Billing

Billing details are explained in full on our Terms page. The cost of a stream depends on how many operators you include, and some operators are more expensive than others. All streams have a fixed cost; some also have a variable cost because some data suppliers charge for their content. Billing for Historics works in the same way. This page explains in detail how our billing system works.

Don't forget to take a look at our Billing FAQ too.

Overview

The cost of running a stream is a function of two variables:

  1. Data processing effort required to execute the rule

    Each rule is assigned an hourly data processing effort, measured in data processing units (DPUs), according to an analysis of its complexity. The simplest rule incurs an hourly cost of 0.1 DPU.

  2. Interaction throughput of the rule

    The interaction throughput of a rule is the number of data objects it delivers. The cost of accepting a data object depends on the object's source and the licensing agreement we have with the provider. For example, each interaction from Tumblr costs $0.0002, so if you accept 1,000 interactions the cost is $0.20. Note that to receive data objects from some of our data sources, you must first sign their license agreements on the license page of DataSift's platform.

    *Subject to change.
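The two cost components can be sketched in Python as follows. This is a minimal illustration, not DataSift's billing code: the helper name `stream_cost` is hypothetical, and DPU-hours are returned separately from the licensing fee because how DPUs translate into money depends on your plan. The figures reproduce the Tumblr example above.

```python
from decimal import Decimal


def stream_cost(dpu_per_hour: Decimal, hours: int,
                interactions: int, price_per_interaction: Decimal):
    """Return (DPU-hours consumed, licensing fee in dollars) for one stream.

    Hypothetical helper: the two parts are kept apart because DPU pricing
    depends on your subscription, while licensing is charged per object.
    """
    dpu_hours = dpu_per_hour * hours
    licensing = price_per_interaction * interactions
    return dpu_hours, licensing


# Simplest rule (0.1 DPU per hour) running for 24 hours and accepting
# 1,000 Tumblr interactions at the example price of $0.0002 each.
dpu_hours, fee = stream_cost(Decimal("0.1"), 24, 1_000, Decimal("0.0002"))
print(dpu_hours, fee)
```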

Monthly subscription

For DataSift's STREAM platform we offer a monthly subscription where you buy a fixed number of DPUs per month for a fixed price. As your streams run they consume your fixed DPU allowance and, separately, incur a variable licensing fee. The licensing fees are calculated according to the licensing agreement we have with each data provider. Assuming you don't exhaust your DPU allowance, your monthly bill will be the fixed cost plus the licensing fees. If you do exceed your DPU allowance, the excess DPU hours are charged at an overage rate defined by your contract.

You must also set a variable cost limit for your monthly subscription to DataSift. The variable cost limit is the sum of:

  • Licensing costs from your data sources and augmentations
  • Excess DPU costs if you run over your monthly DPU allowance

As long as the combined total of your license costs and excess DPU costs is less than your variable cost limit, you will be able to consume data normally.

If you exceed your variable cost limit, your streams will stop and you will receive the following error message in your stream:

{"error":"You need to have credits or a valid subscription to use the API."}

The remaining DPU allowance is evaluated every five minutes, so it is possible to exceed the limit before the allowance is next evaluated.

*Subject to change.
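A minimal sketch of the monthly arithmetic described above, assuming all figures are known in advance. The function names and the allowance, overage rate, fixed price, and licensing figures are hypothetical; your actual overage rate is defined by your contract. The real limit check happens only every five minutes, so a stream can briefly run past the limit, which this sketch does not model.

```python
from decimal import Decimal


def variable_cost(dpu_used: Decimal, dpu_allowance: Decimal,
                  overage_rate: Decimal, licensing: Decimal) -> Decimal:
    """Licensing fees plus excess DPU-hours charged at the overage rate."""
    excess_dpu = max(Decimal(0), dpu_used - dpu_allowance)
    return licensing + excess_dpu * overage_rate


def monthly_bill(fixed_price: Decimal, variable: Decimal) -> Decimal:
    """Fixed subscription price plus the variable cost."""
    return fixed_price + variable


def streams_can_run(variable: Decimal, variable_cost_limit: Decimal) -> bool:
    # Streams keep running only while the variable cost stays below the limit.
    return variable < variable_cost_limit


# Hypothetical figures: 10,000 DPU-hour allowance, 10,500 used,
# $0.20 overage per DPU-hour, $400 of licensing fees, $3,000 fixed price.
var = variable_cost(Decimal(10_500), Decimal(10_000), Decimal("0.20"), Decimal(400))
print(var, monthly_bill(Decimal(3_000), var), streams_can_run(var, Decimal(2_500)))
```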

Whereas a rule's data processing rate is known as soon as the rule is defined, its throughput cannot be predicted; it can only be estimated. You might want to run some sample executions to get a feel for the throughput cost of a stream.
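One way to turn a sample execution into a rough monthly estimate is to extrapolate its throughput linearly. This sketch assumes constant throughput and a 30-day (720-hour) month, both simplifications, since real throughput varies; the helper name is hypothetical and the $0.0002 price is the Tumblr example from the Overview.

```python
from decimal import Decimal


def estimate_monthly_licensing(sample_interactions: int, sample_hours: int,
                               price_per_interaction: Decimal,
                               hours_in_month: int = 720) -> Decimal:
    """Extrapolate a sample run's licensing cost to a whole month.

    Assumes the interaction rate seen in the sample holds all month.
    """
    rate_per_hour = Decimal(sample_interactions) / Decimal(sample_hours)
    return rate_per_hour * hours_in_month * price_per_interaction


# A 2-hour sample that delivered 5,000 Tumblr interactions.
print(estimate_monthly_licensing(5_000, 2, Decimal("0.0002")))
```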

Notifications

The DataSift billing system calculates the cost of using streams from the DPU rate and licensing costs. DataSift also allows you to monitor your usage by enabling notifications via email and the Dashboard.

You can set a variable cost limit on your account. You will receive notifications when you approach your variable cost limit and when you reach it. You can set or change your variable cost limit at any time during the billing cycle.

The first notification is triggered when you have used 80 percent of your variable cost limit. For example, if you set your variable cost limit to $2,500, you will receive the first notification when you have used $2,000. You will receive the second notification when you reach your variable cost limit, at which point we will stop your streams. It is good practice to monitor your usage and keep your variable cost limit high enough to cover the whole month.

Figure: Preview of notifications in the Dashboard

Figure: Preview of notifications via email

If you notice that you are close to your variable cost limit and then raise it, your usage may be either below or above 80 percent of the new limit, depending on where you set it.

For example, if you set the variable cost limit to $2,000 on your account, you receive the first notification when you have used up $1,600. Suppose that you receive that notification and you raise the variable cost limit to $2,500. There are two possible scenarios to consider:

  • If your usage is below 80 percent of the new variable cost limit (that is, below $2,000), you will receive both notifications: a warning when you reach 80 percent of the new limit, and then a notification when you reach the limit itself.
  • If your usage is already above 80 percent of the new limit, you will only receive a notification when you reach the new variable cost limit.
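The notification rules above can be modelled as a small function that, given your current spend and limit, lists the notifications still to come. This is a hypothetical sketch of the described behaviour, not DataSift's implementation; it reproduces both scenarios for a limit raised to $2,500.

```python
from decimal import Decimal

WARNING_FRACTION = Decimal("0.8")  # first notification at 80 percent of the limit


def pending_notifications(spent: Decimal, limit: Decimal) -> list[tuple[str, Decimal]]:
    """Notifications still to come for the current spend and limit."""
    pending = []
    warning_at = WARNING_FRACTION * limit
    if spent < warning_at:
        pending.append(("warning", warning_at))
    if spent < limit:
        pending.append(("limit reached", limit))
    return pending


# Limit raised from $2,000 to $2,500 after the first warning at $1,600.
print(pending_notifications(Decimal(1_700), Decimal(2_500)))  # below 80 percent
print(pending_notifications(Decimal(2_000), Decimal(2_500)))  # at or above 80 percent
```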

Billing for Historics queries

Historics queries are available subject to one-time activation by your account manager. The cost of running a Historics query is its data processing usage plus licensing costs, and it also depends on the original DPU complexity of the stream you run the query on.

Data processing usage for Historics is calculated from the duration of the query and the sample size of the output data. The duration is the time between the query's start date and time and its end date and time. The sample size can be either 100 percent or 10 percent. For all Historics queries, there is a premium on DPU usage compared with live streaming: for the 100 percent sample size, DPU usage is 125 percent of what you would pay for live streaming of the same filter, and for the 10 percent sample size, it is 40 percent of what you would pay for live streaming of the same filter.

Hence, when you create a Historics query, DataSift is able to calculate the DPU usage before the query is executed. This DPU usage information is displayed on the Confirm New Historic Query page. When running a Historics query through the Historics API, you need to hit the historics/prepare endpoint to create a Historics query and get the total DPU breakdown for your Historics query before it is executed. DPU usage charges are deducted from the monthly DPU allowance.
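You can also estimate the DPU usage yourself before calling historics/prepare. This sketch assumes that Historics DPU usage equals the stream's live hourly DPU cost multiplied by the number of hours in the query window and by the sample-size multiplier (1.25 for 100 percent, 0.40 for 10 percent); the helper is hypothetical, and the figure returned by historics/prepare is the one that counts.

```python
from datetime import datetime
from decimal import Decimal

# DPU usage relative to live streaming of the same filter, keyed by sample size (%).
HISTORICS_MULTIPLIER = {100: Decimal("1.25"), 10: Decimal("0.40")}


def historics_dpu(stream_dpu_per_hour: Decimal, start: datetime,
                  end: datetime, sample: int) -> Decimal:
    """Estimate DPU usage: live DPU-hours over the query window times the multiplier."""
    hours = Decimal(int((end - start).total_seconds())) / 3600
    return stream_dpu_per_hour * hours * HISTORICS_MULTIPLIER[sample]


start, end = datetime(2013, 1, 1), datetime(2013, 1, 2)  # a 24-hour window
print(historics_dpu(Decimal(2), start, end, 100))
print(historics_dpu(Decimal(2), start, end, 10))
```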

On the other hand, licensing costs are calculated based on the volume of data retrieved for a particular Historics query. For a given CSDL filter, licensing costs for a Historics query of 100 percent sample size will be more than for a Historics query of 10 percent sample size.

You can view usage statistics for Historics queries on the Billing page, including total licensing costs, DPU usage, the volume of data retrieved by each query, and the number of Historics hours used. Alternatively, the usage endpoint of the DataSift API gives a more accurate figure for the number of objects processed.

Billing for Historics Preview

Each request has a fixed cost of 10 DPUs plus 2 DPUs per day. For example:

  • 1 day = 12 DPUs
  • 30 days = 70 DPUs
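The Historics Preview formula is simple enough to check in a few lines; `preview_dpu` is a hypothetical helper name.

```python
def preview_dpu(days: int) -> int:
    """Historics Preview cost: a fixed 10 DPUs plus 2 DPUs per day covered."""
    return 10 + 2 * days


print(preview_dpu(1), preview_dpu(30))
```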

There are no licensing fees for a Historics Preview, since you do not actually receive any interactions matching your filter; you receive only aggregate statistics for your selected filter.

The DPUs are deducted from your account only after a complete and successful execution of your Historics Preview request. If your request gets interrupted while it is being processed, you won't get charged. You can only request a single Historics Preview per stream; if you request a new one, the previous request is overwritten.

Billing for Managed Sources

Billing for Managed Sources has two components:

  • There is a charge for the complexity of your query, based on the number and type of operators.
  • Each source is also billed as follows:

Facebook Pages

50 DPUs per Facebook page per month.

Google+

50 DPUs per Google+ page or keyword search per month.

Instagram

50 DPUs per search term per month. Search terms are:

  • username
  • hashtag
  • image caption
  • location/area

Yammer

No charge.
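A sketch of the per-source component of Managed Sources billing, using the monthly rates listed above. The dictionary keys and helper name are hypothetical, and the separate complexity charge for the query is deliberately left out because it follows the operator pricing described later on this page.

```python
# Monthly DPUs per page, search, or term, from the list above.
DPU_PER_ITEM_PER_MONTH = {
    "facebook_page": 50,  # per Facebook page
    "googleplus": 50,     # per Google+ page or keyword search
    "instagram": 50,      # per search term (username, hashtag, caption, location)
    "yammer": 0,          # no charge
}


def managed_source_dpu(items: dict[str, int]) -> int:
    """items maps a source key to how many pages, searches, or terms it covers."""
    return sum(DPU_PER_ITEM_PER_MONTH[source] * count for source, count in items.items())


print(managed_source_dpu({"facebook_page": 3, "instagram": 4, "yammer": 1}))
```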

Billing information

Find your DPU cost via the UI

In DataSift's UI you can check the DPU breakdown:

  1. Select a stream.

  2. Click View Definition.

The DPU breakdown appears below your CSDL code.

Find your DPU cost via the API

DataSift's REST API provides a dpu endpoint that gives the total DPU cost for a rule and the breakdown of its individual elements.

api.datasift.com/dpu

For Historics, DataSift's REST API provides a historics/prepare endpoint that gives the total DPU breakdown for a Historics query.

api.datasift.com/historics/prepare

Find your throughput via the API

DataSift's REST API provides a usage endpoint that gives the number of objects processed.

api.datasift.com/usage

Cost of operators

Some operators in CSDL have a fixed DPU cost while others have a variable cost.

For fixed-cost operators, you simply multiply the number of times you use the operator in a stream by its DPU cost. For example, if you use the exists operator twice in a stream, the cost is 0.2 DPUs.

Operator or keyword              DPUs
contains                         variable (see below)
substr                           0.1
contains_any                     variable (see below)
contains_all                     variable (see below)
contains_near                    variable (see below)
exists                           0.1
in                               variable (see below)
comparisons (==, >, and so on)   0.1
regular expressions              variable (see below)
geo_box                          0.1
geo_radius                       0.1
geo_polygon                      variable (see below)
tag                              variable (see below)
wildcard                         variable (see below)

Regular Expressions

The DPU cost of a regular expression is calculated as:

      cost = the number of characters in the expression divided by 100.

The minimum charge for one regular expression is 0.1 so, for example, a regular expression that includes 10 characters costs 0.1 DPUs while a regular expression that includes 100 characters costs 1.0 DPUs.
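The regular expression rule translates directly into code. This sketch assumes that "number of characters" means the length of the pattern string itself, without surrounding quotes; `regex_dpu` is a hypothetical helper.

```python
def regex_dpu(expression: str) -> float:
    """Characters in the expression divided by 100, with a 0.1 DPU minimum."""
    return max(0.1, len(expression) / 100)


print(regex_dpu("a" * 10), regex_dpu("a" * 100))
```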

geo_polygon

The DPU cost of a geo_polygon depends on the number of vertices it has. To determine the DPU cost of any geo_polygon, divide the number of vertices by 30.

For example, a hexagon has 6 vertices so it has a DPU cost of 0.2. A triangle has 3 vertices so it has a DPU cost of 0.1.
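The same kind of one-line helper works for geo_polygon. The text states no minimum charge, so none is applied here; the helper name is hypothetical.

```python
def geo_polygon_dpu(vertices: int) -> float:
    """DPU cost of a geo_polygon: number of vertices divided by 30."""
    return vertices / 30


print(geo_polygon_dpu(6), geo_polygon_dpu(3))  # hexagon, triangle
```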

contains

The DPU cost for the contains operator is based on the number of values you match against and the way you use the operator.

Using the contains operator to find a phrase

interaction.content contains "My dog ate my homework"

In this case, you can match against a phrase of up to seven words for a cost of 0.1 DPU. The cost increases by 0.1 DPU for every eight additional words in the phrase. Here are the first few DPU cost bands.

Maximum number of values   DPUs
7                          0.1
15                         0.2
23                         0.3
31                         0.4
39                         0.5
and so on...

For example this filter has just one word in the argument so it costs 0.1 DPU:

interaction.content contains "iPad"

This filter has eight words in the argument so it costs 0.2 DPU:

interaction.content contains "iPad is my favorite tablet device right now"

Using the contains operator to find individual words

interaction.content contains "xxx" and
interaction.content contains "yyy" and
interaction.content contains "zzz"

In this case, matching against up to three values costs 0.1 DPU. The cost increases by 0.1 DPU for every four extra values you add. Here are the first few DPU cost bands.

Maximum number of values   DPUs
3                          0.1
7                          0.2
11                         0.3
15                         0.4
19                         0.5
and so on...
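Both band tables for contains follow a simple pattern: a phrase match moves up one 0.1 DPU band every eight words, and separate words move up every four values. The hypothetical helper below encodes that pattern as inferred from the tables (it is not a published formula) and reproduces the examples above.

```python
def contains_dpu(values: int, phrase: bool) -> float:
    """DPU cost of contains, following the band tables above.

    Phrase match: 0.1 DPU per band of 8 words (up to 7, up to 15, ...).
    Separate words: 0.1 DPU per band of 4 values (up to 3, up to 7, ...).
    """
    band = 8 if phrase else 4
    tenths = values // band + 1
    return tenths / 10


print(contains_dpu(len("iPad".split()), phrase=True))
print(contains_dpu(len("iPad is my favorite tablet device right now".split()), phrase=True))
print(contains_dpu(3, phrase=False))  # "xxx", "yyy", "zzz"
```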

in/contains_any/contains_all

The DPU cost for the in, contains_any and contains_all operators is based on the number of values you match against. The following table shows the DPU cost for any filter that uses these operators.

For example, this filter matches against 10 values so it costs 0.2 DPUs.

interaction.content contains_any "apple, microsoft, hp, dell, oracle, google, yahoo, ebay, amazon, facebook"

Maximum number of values   DPUs
9                          0.1
19                         0.2
29                         0.3
39                         0.4
...                        ...
100                        1
1,000                      2
10,000                     4
100,000                    8

The exact cost is determined using a sliding scale, so if you have 99 values in the argument, the cost will be slightly lower than 1 DPU. Note that the table shows how we calculate DPU costs for a list of single keywords. In practice, you will often write filters that use the contains_any operator with a list of phrases of varying length. For example:

interaction.content contains_any "Yesterday, Yellow Submarine, The Long and Winding Road"

When you use phrases, the DPU cost is calculated in a similar way to individual words, but rather than counting the number of phrases, you count the number of words within all the phrases and look up that count on the same discount curve table. However, the phrase word count and the individual word count are not added together before the discount curve is applied; instead, each has its own copy of the discount curve.

For example, this condition has 4 words and 1 phrase.

interaction.content contains_any "a,b,c,d, e f g"

It is charged on the "contains_any" discount curve as:

contains_any: 4 + contains_any: 3 = 0.1 DPU + 0.1 DPU = 0.2 DPU
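The word/phrase split can be sketched as follows. Assumptions: values are separated by commas, a value containing whitespace counts as a phrase, and only the published bands up to 39 values are encoded, because the sliding scale beyond that is not given in enough detail to reproduce. Helper names are hypothetical.

```python
def curve_tenths(count: int) -> int:
    """Tenths of a DPU for `count` values on the published bands (up to 39)."""
    if count == 0:
        return 0
    if count > 39:
        raise ValueError("the sliding scale above 39 values is not reproduced here")
    return count // 10 + 1  # 1-9 -> 0.1, 10-19 -> 0.2, 20-29 -> 0.3, 30-39 -> 0.4


def contains_any_dpu(argument: str) -> float:
    items = [item.strip() for item in argument.split(",") if item.strip()]
    single_words = sum(1 for item in items if len(item.split()) == 1)
    phrase_words = sum(len(item.split()) for item in items if len(item.split()) > 1)
    # Each count gets its own copy of the discount curve; the results are added.
    return (curve_tenths(single_words) + curve_tenths(phrase_words)) / 10


print(contains_any_dpu("a,b,c,d, e f g"))
print(contains_any_dpu("Yesterday, Yellow Submarine, The Long and Winding Road"))
```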

The difference between in and contains_any

There is a subtle difference between the way in and contains_any are handled internally. For example, since the language.tag target contains short, well-defined codes, the correct and most efficient way to filter on it is with the in operator.

language.tag in "en, fr, it"

You could use contains_any instead, although it is not recommended. The overall cost would be the same in either case, because each example shown here has three values in the argument.

language.tag contains_any "en, fr, it"

Suppose you wanted to write a similar filter, this time looking for UK English, which has the language code en-gb. When you use the contains_any operator, DataSift tokenizes the incoming interactions and so the string "en-gb" becomes three separate tokens:

  • en
  • -
  • gb

However, the in operator does not perform tokenization. Therefore, this filter is billed as having three values in the argument:

language.tag in "en-gb, fr, it"

This one is billed as having five values in the argument:

language.tag contains_any "en-gb, fr, it"
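The difference in how values are counted can be sketched as below. The tokenizer is an assumption chosen to reproduce the en-gb example (runs of word characters and individual punctuation marks become separate tokens); DataSift's actual tokenizer may differ, so rely on the /compile endpoint for real figures. The helper name is hypothetical.

```python
import re


def billed_values(operator: str, argument: str) -> int:
    """Count the values billed for an in or contains_any argument."""
    items = [item.strip() for item in argument.split(",") if item.strip()]
    if operator == "in":
        # in does not tokenize: each comma-separated item is one value.
        return len(items)
    # contains_any: assume words and punctuation marks become separate tokens.
    return sum(len(re.findall(r"\w+|[^\w\s]", item)) for item in items)


print(billed_values("in", "en-gb, fr, it"), billed_values("contains_any", "en-gb, fr, it"))
```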

We recommend that you check the DPU cost before you run a filter. The /compile endpoint returns a JSON object that includes the DPU cost.

contains_near

We calculate DPUs based on the number of words you filter for, independently of the distance between them.

Number of values   DPUs
2                  0.2
3                  0.4
4                  0.6
5                  0.8
...                ...
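The contains_near table rises by 0.2 DPU per additional word, which this hypothetical helper encodes; the distance between words is ignored, as stated above.

```python
def contains_near_dpu(words: int) -> float:
    """0.2 DPU per word after the first (2 -> 0.2, 3 -> 0.4, ...)."""
    return 2 * (words - 1) / 10


print([contains_near_dpu(n) for n in range(2, 6)])
```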

substr

When used on the same target, the substr operator is billed at 0.1 DPUs for each tier of four uses.

Maximum number of uses   DPUs
4                        0.1
8                        0.2
12                       0.3
16                       0.4
...                      ...

Otherwise (when each use is on a different target), you are billed 0.1 DPU each time you use substr.

Comparisons

This set of operators comprises: ==, !=, >, >=, <=, and <.

When used on the same target, a comparison operator is billed at 0.1 DPUs for each tier of five uses.

Maximum number of uses   DPUs
5                        0.1
10                       0.2
15                       0.3
20                       0.4
...                      ...

Otherwise (when each use is on a different target), you are billed 0.1 DPU each time you use a comparison operator.
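The tier rules for substr and comparisons can be sketched by grouping uses by operator and target. Assumptions: all comparison operators on the same target share tiers, and uses on different targets are grouped separately, which yields 0.1 DPU per isolated use. The helper and its (operator, target) input format are hypothetical.

```python
from collections import Counter

USES_PER_TIER = {"substr": 4, "comparison": 5}


def tiered_dpu(uses: list[tuple[str, str]]) -> float:
    """uses: (operator, target) pairs, where operator is "substr" or "comparison"."""
    per_group = Counter(uses)  # (operator, target) -> number of uses
    tenths = 0
    for (operator, _target), count in per_group.items():
        size = USES_PER_TIER[operator]
        tenths += (count + size - 1) // size  # 0.1 DPU per started tier
    return tenths / 10


uses = [("substr", "interaction.content")] * 6 + [("comparison", "interaction.content")] * 5
print(tiered_dpu(uses))
```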

Wildcard

Wildcards are charged at double the cost of the contains_any operator.

Tags

Simple tagging

Operators used inside a tag statement are normally charged at 10% of their usual DPU cost.

For example, if the normal cost of a rule is 1 DPU, that same code inside a tag statement would cost 0.1 DPU.

Advanced tagging

If you use tags with namespaces or scoring rules, or cascade tags from one filter to another, the pricing is based on the combined cost of operators in the tagging logic and in the filter definition. We simply count how many times each operator appears, and calculate the overall cost. For example, if you use the contains operator nine times in your tagging and you use it twice in your filtering logic, you will be charged for 11 uses of that operator.

It doesn't matter whether you include tags from external files or define them locally, the cost is the same. Similarly, it doesn't matter whether you define your filter locally or include part of it using the stream keyword, the cost is the same.
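Both tagging rules fit in a short sketch. The helper names are hypothetical; Counter addition simply pools operator uses from the tagging logic and the filter, as in the nine-plus-two example above, before the usual operator pricing is applied.

```python
from collections import Counter
from decimal import Decimal


def simple_tag_dpu(operator_dpu: Decimal) -> Decimal:
    """Simple tagging: operators inside a tag statement cost 10% of normal."""
    return operator_dpu * Decimal("0.1")


def advanced_operator_uses(tagging: Counter, filtering: Counter) -> Counter:
    """Advanced tagging: uses in the tags and in the filter are pooled."""
    return tagging + filtering


print(simple_tag_dpu(Decimal(1)))
print(advanced_operator_uses(Counter(contains=9), Counter(contains=2)))
```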

Chunking and Punctuation

Foreign-language chunking adds a surcharge to the DPUs of any filter it appears in. For example, suppose the cost of a filter is 2 DPUs. If you use Japanese chunking in that filter, it adds 20% to the overall cost. That is: 1.2 * 2 = 2.4 DPUs. This is a one-off fee for the filter, so it doesn't matter whether you use Japanese chunking once or 50 times in a filter; the cost is the same.

If you use Mandarin chunking as well as Japanese chunking in a filter, you will be charged the 20% surcharge twice, so we multiply your DPU cost by 1.4 in this case.

Similarly, punctuation introduces a surcharge of 10% for each element in keep or drop. By element we mean:

  • classic
  • default
  • extended
  • any single punctuation character

So, if you want to use the extended character set and drop commas, the surcharge is 10% + 10%.

The DPUs are rounded to the nearest 0.1.
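Finally, the surcharges and rounding can be combined. This sketch assumes that chunking and punctuation surcharges add together (as the two chunking surcharges do in the Japanese and Mandarin example) and that rounding to the nearest 0.1 rounds halves up; neither detail is confirmed above. The helper name is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

CHUNKING_SURCHARGE = Decimal("0.20")     # per chunking language (one-off per filter)
PUNCTUATION_SURCHARGE = Decimal("0.10")  # per keep/drop element


def apply_surcharges(base_dpu: str, chunking_languages: list[str],
                     punctuation_elements: list[str]) -> Decimal:
    """Add chunking and punctuation surcharges, then round to the nearest 0.1 DPU."""
    factor = (1
              + CHUNKING_SURCHARGE * len(set(chunking_languages))
              + PUNCTUATION_SURCHARGE * len(set(punctuation_elements)))
    total = Decimal(base_dpu) * factor
    return total.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


print(apply_surcharges("2", ["japanese"], []))
print(apply_surcharges("2", ["japanese", "mandarin"], []))
print(apply_surcharges("1", [], ["extended", ","]))  # extended set plus dropping commas
```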