Bitly, Links, and Metadata

Ed Stenson | 14th November 2012

Today, DataSift announces two new link resolution services. First, we're delighted to be partnering with bitly, the #1 link sharing platform, which powers 75 percent of the world’s largest media companies and half of the Fortune 500 companies. With over 20,000 white-labeled domains, bitly generates 200M clicks/day.

In addition, DataSift's own Links augmentation is now live, too. This is a massive update to our old TweetMeme links aggregator. Until now, we used TweetMeme to fully resolve links embedded in Tweets and other data sources but, with the increased volume of links traffic coming from Twitter, we were close to hitting TweetMeme’s maximum capacity of 40M links/day.

What's the big deal?

These two services are complementary. By definition, 100 percent of bitly interactions relate to links and clicks on links. In fact, the data volume can be so high that you might want to add an extra line of CSDL to throttle it back. Here's how you do it:

bitly.country_code == "US"
and interaction.sample < 3
// Take a 3% sample of the data

But adding the Links augmentation provides even more opportunities for filtering. For example, here's some CSDL that filters in real time for clicks made within the UK on bitly links that lead to content with Apple in the title.

bitly.country_code == "uk" and
links.title contains "Apple"

On top of that, DataSift's Links augmentation adds metadata to the interactions in your filters. I discuss the significance of metadata in another blog post, Open Graph and Twitter Cards.


In DataSift, there are currently 15 bitly targets that you can filter against. For example:

  • bitly.cname allows you to filter against custom names in bitly, such as those used by the ESPN sports network or the New York Times.
  • bitly.referring_domain allows you to filter for clicks on links from particular domains; that is, links embedded on pages at domains you specify.
  • bitly.country_code allows you to filter for clicks on bitly links from countries you specify.
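
The three targets above can be combined in a single filter. The following is only a sketch: the domain values are hypothetical placeholders, and it assumes both targets hold plain domain strings.

```csdl
// Hypothetical values: substitute your own custom bitly name and referring domain
bitly.cname == "example.co"
and bitly.referring_domain == "news.example.com"
and bitly.country_code == "US"
```

This would keep only clicks on links under that custom name that were made in the US from pages on the referring domain.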

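Meanwhile, our Links augmentation offers 79 targets that you can filter against. For example, links.title, used in the filter above, matches against the title of the content that a link leads to.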

Use cases

Trend analysis

A very compelling use case for bitly is trend analysis. We can already track Likes on Facebook or Retweets on Twitter and expect to see when something goes viral. But what if a story receives relatively little of this kind of attention but large numbers of clicks? To monitor clicks activity rather than sharing activity, bitly and the Links augmentation are perfect. For an all-round perspective, you could monitor bitly, the Links augmentation, Twitter retweets, and Facebook likes simultaneously.

Platform analysis

The bitly.user.agent target is useful when you want to measure the popularity of a particular web client. For example, you can find out which of your content is most popular on mobile devices.
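
As a rough sketch, a mobile-only filter might look like the CSDL below. It assumes that bitly.user.agent carries the raw user agent string and that mobile clients can be recognized by the tokens listed; the custom name is a hypothetical placeholder.

```csdl
// Assumption: mobile user agent strings contain one of these tokens
bitly.user.agent contains_any "iPhone, iPad, Android"
// Hypothetical custom name standing in for your own content
and bitly.cname == "example.co"
```

The token list is illustrative rather than exhaustive, so treat the results as an approximation of mobile traffic.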

Geo analysis

Find out what percentage of people publishing or sharing information about a particular subject are located in a specified area. Take it one stage further and approximate the size of the area that people are Tweeting from. Find out if people enjoyed a rock concert, or determine how quickly a wildfire is spreading.

Timezone analysis

The timezone for a click helps you find out when content is published or shared in different time zones. Use it to compare the kind of content popular in the morning in the US and mainland Europe.


Looking at the combined stats for bitly and the Links augmentation, DataSift resolves an average of 3,500 links per second, collecting the metadata at the same time and caching the results.

To learn more about these services and the engineering that makes them work, take a look at today's blog post by Lorenzo Alberton, DataSift's Chief Technical Architect.
