Ed Stenson | 13th November 2012
Today we announce our new data source, the bitly input stream. With 200M clicks a day, it provides an excellent augmentation to the links embedded in Tweets and in messages from other data sources. In the past, we could see which content was being shared; we can now see which links are actually being clicked. In practical terms, DataSift can now reveal activity that, formerly, was hidden. We can show the real reach of content like never before, providing the complete picture, not just one side of it.
The new data source is unquestionably useful by itself, but here at DataSift we’re always trying to find ways to add more information to the input data, making it richer and more structured. Every time our filtering engine sees an interaction that contains a link, it resolves that link all the way back to the interaction's original target page, even if the link has been shortened several times. Then, it examines the target page, looking for metadata in Open Graph or Twitter Cards format in the page's HTML header. Any metadata that it finds, it adds to the interaction. We believe the result makes the click stream ten times more valuable to our users, so let’s explore in more depth the data that our platform can deliver.
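To make the link-resolution step concrete, here is a minimal Python sketch of following a chain of shortened links back to the final target page. It is an illustration only, not DataSift's implementation: the function name `resolve_link`, the `max_hops` limit, and the `redirects` lookup table (which stands in for real HTTP redirects) are all assumptions made for this example.

```python
def resolve_link(url, redirects, max_hops=10):
    """Follow a chain of shortened links until no further redirect exists.

    `redirects` is a hypothetical lookup table (short URL -> next URL)
    that stands in for the HTTP redirects a real resolver would follow.
    """
    seen = set()
    hops = 0
    while url in redirects:
        # Guard against redirect loops and unreasonably long chains.
        if url in seen or hops >= max_hops:
            raise ValueError(f"redirect loop or too many hops at {url}")
        seen.add(url)
        url = redirects[url]
        hops += 1
    return url


# A link that has been shortened twice (made-up URLs).
REDIRECTS = {
    "http://bit.ly/abc": "http://t.co/xyz",
    "http://t.co/xyz": "https://example.com/story",
}

print(resolve_link("http://bit.ly/abc", REDIRECTS))
```

A real resolver would issue HTTP requests and read the redirect responses instead of consulting a table, but the loop and the safeguards would look much the same.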
Data, metadata, and embedded content
A simple but immensely significant change has arrived in the world of social media, as two apparently separate elements, embedded content and metadata, have come together in a fascinating way. At DataSift, the effect is already impacting about 30 percent of the content passing through our servers, and the trend shows healthy growth.
What are Open Graph and Twitter Cards?
Let's define our terms first. Embedded content on a web page consists of, for example, videos or static images such as photographs.
Meanwhile, metadata is nothing more than data that describes other data. If that data happens to be a piece of embedded content, a press photograph, for instance, it might be accompanied by metadata such as:
- a title
- a description
- a URL that points to the photograph
- the width and height of the photograph
... plus as many other nuggets of relevant information as the image's creator chose to supply.
A key technology here is Facebook’s Open Graph protocol (now an open standard that anyone can use), and more recently Twitter Cards. Given the volume of content being shared on Facebook and Twitter, these two platforms decided to propose a set of metadata properties that content creators could use to influence the way their content is previewed (“embedded”) when shared on Facebook and Twitter.
As an example, the New York Times (one of more than 2,000 newspapers already using Open Graph and Twitter Cards metadata) might specify, for each article, the title, the description, the author, the canonical URL, and the image to use in the preview on Twitter and Facebook.
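As a rough illustration of what that metadata looks like in a page's HTML header, the Python sketch below collects Open Graph tags (`<meta property="og:...">`) and Twitter Cards tags (`<meta name="twitter:...">`) using only the standard library. The class and function names, and the sample page with its values, are made up for this example. It is not how DataSift's engine is implemented.

```python
from html.parser import HTMLParser


class SocialMetaParser(HTMLParser):
    """Collect Open Graph and Twitter Cards <meta> tags from an HTML page."""

    def __init__(self):
        super().__init__()
        self.opengraph = {}
        self.twitter = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        content = attributes.get("content")
        if content is None:
            return
        # Open Graph uses the "property" attribute; Twitter Cards use "name".
        key = attributes.get("property") or attributes.get("name") or ""
        if key.startswith("og:"):
            self.opengraph[key[len("og:"):]] = content
        elif key.startswith("twitter:"):
            self.twitter[key[len("twitter:"):]] = content


def extract_social_metadata(html):
    """Return the Open Graph and Twitter Cards properties found in `html`."""
    parser = SocialMetaParser()
    parser.feed(html)
    return {"opengraph": parser.opengraph, "twitter": parser.twitter}


# A made-up page header, for illustration only.
SAMPLE_PAGE = """
<html><head>
<title>Example story</title>
<meta property="og:title" content="Example story">
<meta property="og:description" content="A made-up article used for illustration.">
<meta property="og:url" content="https://example.com/story">
<meta property="og:image" content="https://example.com/story.jpg">
<meta name="twitter:card" content="summary">
</head><body></body></html>
"""

metadata = extract_social_metadata(SAMPLE_PAGE)
print(metadata["opengraph"]["title"])
print(metadata["twitter"]["card"])
```

The two dictionaries keep the protocols separate, so a page that carries only Open Graph tags simply yields an empty Twitter Cards dictionary.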
Why are Open Graph and Twitter Cards significant?
Open Graph and Twitter Cards allow Facebook and Twitter to present rich content, and these ideas are producing an extraordinary, explosive effect because they benefit so many participants in social media:
- Creators benefit because they now have a way to determine what happens to their content after release. By defining metadata for any creation, whether it's a 3,000-word blog, a photograph, a video, an audio clip, or something brand new on the web, creators can name, annotate, and classify their work.
- Syndicators benefit because metadata makes their lives easier. In the old days, a newspaper article about Hewlett-Packard stock might have discussed $HPQ common stock, or it might have been about inventory shortages of the latest HP server. The only way to be sure was to read the article, or to use natural-language processing to analyze it. Metadata takes the problem away. To describe the article, the syndicator can simply republish the description that they find in the metadata. If it comes from a trusted creator, the description can be relied on. The quality and amount of metadata can be impressive, spanning classification, summary, domain, author information, and more.
- Consumers benefit from metadata because they get a better experience on Twitter and Facebook, by having a compelling, visual preview of the target page embedded in their timeline, and not just a link, so they can immediately make up their mind whether it’s worth following the link to the full article or not.
According to our statistics, more than one-third of all the links we receive point to a page with Open Graph metadata, and about 10 percent also have Twitter Cards (it’s a lower percentage because Twitter Cards is a younger protocol and less generic), so a significant portion of the links will have a wealth of information attached.
Facebook Open Graph
We believe that we're the only company able to filter against Open Graph and Twitter Card data, offering you an opportunity to gain unique insight. Here are a few possible use cases:
- In real time, monitor clicks on bitly links to your site or check out bitly links going to your competitors' sites.
- For stories about TV shows featured on America's top five newspaper websites, which ones are shared in links the most?
- For Tweets that were heavily retweeted, filter for those that contained heavily clicked links.
- For stories on Super Bowl Sunday, exclude the ones that do not have Google News keyword metadata. Stories with Google News keywords will be amongst those most widely read.