Monitoring Eurozone sentiment for just 20 cents an hour

datasift | 5th December 2011

Introduction

I wrote a stream today to monitor social media commentary on the meeting between German Chancellor Angela Merkel and French President Nicolas Sarkozy in Paris. It's "the start of a crucial week for the Eurozone," one report read, and that almost sounds like an understatement. Sentiment will definitely be worth watching today.

Let’s ask some questions about this stream:

  1. What would it cost to run for 24 hours?
  2. How is the cost calculated?
  3. What are all those tags for?
  4. How much output does the stream deliver?

tag "Positive" { salience.title.sentiment >= 3 OR salience.content.sentiment >= 3 }
tag "Neutral" { ( salience.title.sentiment > -3 OR salience.content.sentiment > -3 ) AND ( salience.title.sentiment < 3 OR salience.content.sentiment < 3 ) }
tag "Negative" { salience.title.sentiment <= -3 OR salience.content.sentiment <= -3 }

tag "Klout <10" { klout.score < 10 }
tag "Klout 10+" { klout.score >= 10 AND klout.score < 20 }
tag "Klout 20+" { klout.score >= 20 AND klout.score < 30 }
tag "Klout 30+" { klout.score >= 30 AND klout.score < 40 }
tag "Klout 40+" { klout.score >= 40 AND klout.score < 50 }
tag "Klout 50+" { klout.score >= 50 AND klout.score < 60 }
tag "Klout 60+" { klout.score >= 60 AND klout.score < 70 }
tag "Klout 70+" { klout.score >= 70 }

return {
interaction.content contains_any "Merkel, Sarkozy, #Merkel, #Sarkozy, #euro, #Osbourne"
}

What does it cost?

The filter has 14 lines of code, but the good news is that only one of them is chargeable. Here it is:

interaction.content contains_any "Merkel, Sarkozy, #Merkel, #Sarkozy, #euro, #Osbourne"

The other lines cost you nothing at all.

Our Understanding Billing page shows how DataSift calculates the cost of each operator. Here we're using the contains_any operator with 6 arguments:

"Merkel, Sarkozy, #Merkel, #Sarkozy, #euro, #Osbourne"

According to the billing documentation, that costs 0.2 DPU per hour.

In fact, you can include up to 10 arguments and still pay only 0.2 DPU.

What's a DPU?

The simplest way to think about it is:

  1. DPUs are a measure of cost per hour to run a stream.
  2. A DPU is currently equivalent to 20 US cents.

DataSift's minimum charge is 1 DPU per hour, so although our filter uses only 0.2 DPU, we're billed for 1 DPU. The overall cost to run our Eurozone stream is therefore 20 cents per hour, or $4.80 for a full 24 hours' worth of focused data.
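The cost arithmetic above can be sketched in Python. This is a minimal illustration that uses only the figures quoted in this post (0.2 DPU for a contains_any with up to 10 arguments, a 1 DPU hourly minimum, and 20 US cents per DPU); the function names are our own, not part of any DataSift library, and the prices are those of December 2011.

```python
from decimal import Decimal

# Pricing figures quoted in this post (December 2011); illustrative only.
CENTS_PER_DPU = Decimal("20")         # 1 DPU = 20 US cents
MINIMUM_DPU_PER_HOUR = Decimal("1")   # DataSift's minimum hourly charge


def contains_any_dpu(num_arguments: int) -> Decimal:
    """DPU cost of one contains_any clause, per the figures in this post.

    The post only quotes the price for up to 10 arguments, so larger
    counts are rejected rather than guessed.
    """
    if not 1 <= num_arguments <= 10:
        raise ValueError("price only quoted here for 1 to 10 arguments")
    return Decimal("0.2")


def billed_cost_dollars(stream_dpu: Decimal, hours: int) -> Decimal:
    """Total cost in US dollars, applying the 1 DPU minimum per hour."""
    billed_dpu = max(stream_dpu, MINIMUM_DPU_PER_HOUR)
    return billed_dpu * CENTS_PER_DPU * hours / 100


# Our stream: a single contains_any with 6 arguments; the tags are free.
stream_dpu = contains_any_dpu(6)
print(f"${billed_cost_dollars(stream_dpu, 1):.2f} per hour")   # $0.20 per hour
print(f"${billed_cost_dollars(stream_dpu, 24):.2f} per day")   # $4.80 per day
```

Using Decimal rather than float keeps the currency arithmetic exact, so 24 hours comes out as precisely $4.80.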

What are those tags for?

They're a feature of CSDL that lets you attach metadata to objects conditionally. For example:

tag "Klout 20+" { klout.score >= 20 AND klout.score < 30 }

This command adds the tag "Klout 20+" to every object that comes from a user whose Klout score is at least 20 but less than 30.
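To see exactly where the boundaries fall, here's a hypothetical Python helper that reproduces the Klout tag ranges on the client side. It illustrates the CSDL logic and is not a DataSift API: each bucket includes its lower bound and excludes its upper bound, so a score of 29.5 lands in "Klout 20+" while a score of exactly 30 lands in "Klout 30+". We assume that an object with no Klout score gets no Klout tag, since none of the comparisons would match.

```python
from typing import Optional


def klout_tag(score: Optional[float]) -> Optional[str]:
    """Return the Klout tag the CSDL filter would attach to a score.

    Mirrors the half-open ranges in the tag definitions. Returns None
    when there is no score (our assumption: no tag is attached).
    """
    if score is None:
        return None
    if score < 10:
        return "Klout <10"
    if score >= 70:
        return "Klout 70+"
    # 10 <= score < 70: round down to the nearest multiple of 10.
    lower = int(score // 10) * 10
    return f"Klout {lower}+"


for score in (9.9, 20, 29.5, 30, 85):
    print(score, "->", klout_tag(score))
```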

Most real-world applications built on DataSift use our API and one of our client libraries. Once DataSift has passed objects to your client application, your own code can examine the metadata and perform any analysis you choose. For instance, it would be very easy to generate a bar chart with frequency on the vertical (y) axis and Klout range on the horizontal (x) axis.
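As a sketch of that client-side step, the Python below counts the Klout tags across a batch of objects, producing the frequency for each bar of the chart in x-axis order. We assume each object has already been decoded from JSON into a dict with its tags in a list at obj["interaction"]["tags"]; check the structure your client library actually delivers and adjust the lookup. The sample objects are made up for illustration.

```python
from collections import Counter

# Klout tags defined in the CSDL filter, in x-axis order.
KLOUT_TAGS = ["Klout <10", "Klout 10+", "Klout 20+", "Klout 30+",
              "Klout 40+", "Klout 50+", "Klout 60+", "Klout 70+"]


def klout_histogram(objects):
    """Count how many objects carry each Klout tag.

    Assumes tags live in a list at obj["interaction"]["tags"]
    (an assumption about the delivered JSON, not a documented schema).
    """
    counts = Counter()
    for obj in objects:
        for tag in obj.get("interaction", {}).get("tags", []):
            if tag in KLOUT_TAGS:
                counts[tag] += 1
    # One (tag, count) pair per bar, including empty buckets.
    return [(tag, counts[tag]) for tag in KLOUT_TAGS]


# Made-up objects, for illustration only.
sample = [
    {"interaction": {"tags": ["Positive", "Klout 20+"]}},
    {"interaction": {"tags": ["Neutral", "Klout 20+"]}},
    {"interaction": {"tags": ["Negative", "Klout 50+"]}},
    {"interaction": {}},  # an object with no tags
]
for tag, n in klout_histogram(sample):
    print(f"{tag}: {n}")
```

The resulting pairs can be passed to any plotting library as the x labels and bar heights.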

How much data does this stream produce?

We wrote a few simple lines of PHP to sample the stream for 30 minutes and received 2,727 objects.
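From that sample we can estimate throughput and unit cost. The Python below is a back-of-the-envelope extrapolation that assumes the 30-minute rate holds steady, which a news-driven stream like this one almost certainly won't, so treat the daily figure as a rough order of magnitude. The 20 cents per hour is the billed cost worked out earlier.

```python
# Figures from our 30-minute sample and the billed hourly cost.
SAMPLE_OBJECTS = 2727
SAMPLE_MINUTES = 30
COST_CENTS_PER_HOUR = 20  # 1 DPU minimum charge at 20 US cents per DPU

# Naive extrapolation: assumes the sampled rate stays constant.
hourly = SAMPLE_OBJECTS * 60 / SAMPLE_MINUTES
daily = hourly * 24
cents_per_thousand = COST_CENTS_PER_HOUR / hourly * 1000

print(f"{hourly:,.0f} objects per hour")                 # 5,454 objects per hour
print(f"{daily:,.0f} objects per day, if the rate held")  # 130,896 objects per day, if the rate held
print(f"{cents_per_thousand:.2f} cents per 1,000 objects")  # 3.67 cents per 1,000 objects
```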

