Understanding Billing

Updated on Wednesday, 1 May, 2013 - 10:50

Billing details are explained in full on our Terms page. The cost of a stream depends on which operators you include and how many of them you use; some operators are more expensive than others. All streams have a fixed cost; some have a variable cost too because some data suppliers charge for their content. Billing for Historics works in the same way. This page explains in detail how our billing system works.

Don't forget to take a look at our Billing FAQ too.

Overview

You can compile and preview streams free of charge through the website GUI.

There is a charge to use streams through our APIs. The cost of using a stream via the API is a function of two variables:

 

Data processing effort required to execute the rule

Each rule is assigned an hourly data processing effort, measured in data processing units (DPUs), according to an analysis of its complexity. The simplest rule incurs an hourly cost of 0.1 DPU. However, note that DataSift's minimum charge rate is 1 DPU per hour. Therefore, you can run ten 0.1 DPU streams simultaneously for the same overall DPU cost as one.

Interaction throughput of the rule

The interaction throughput of a rule is the number of data objects it delivers. The cost of accepting a data object depends on the object's source and the licensing agreement we have with the provider. For example, each accepted Tweet costs $0.0001*. That means that if you accept 1,000 Tweets, the cost will be $0.10*. Note that in order to receive data objects, you must sign the license agreements for a number of data sources, including Twitter, on the license page of the website.

 

*Subject to change.
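
The Tweet example above is a simple multiplication. The following is a minimal sketch of that arithmetic, not part of any DataSift library; the function name licensing_cost is hypothetical, and the fee is the figure quoted above, which is subject to change.

```python
from decimal import Decimal

# Per-object licensing fee quoted in the text for Tweets (subject to change).
TWEET_FEE_USD = Decimal("0.0001")


def licensing_cost(accepted_objects: int, fee_per_object: Decimal = TWEET_FEE_USD) -> Decimal:
    """Return the licensing cost in USD for a number of accepted data objects."""
    if accepted_objects < 0:
        raise ValueError("accepted_objects must not be negative")
    # Decimal keeps currency arithmetic exact, unlike binary floats.
    return fee_per_object * accepted_objects


# 1,000 accepted Tweets, as in the example above.
print(licensing_cost(1000))
```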

Payment plans

There are two types of payment plan which differ in the approach to charging for the data processing cost, allowing you to optimize according to your usage pattern:

 

On Demand

Each DPU is charged at a fixed rate of $0.20 per hour*. For example, a rule rated at 1.5 DPU is charged at $0.30* per hour.
 

Note that DataSift's minimum charge is $0.20 per hour, so a 0.5 DPU rule would cost $0.20 per hour. If you use DataSift's multistream capability to run 10 streams simultaneously, and all those streams are rated at 0.1 DPU, the total is 1 DPU and so the total cost to run the streams is $0.20 per hour.
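
The On Demand examples can be reproduced with a short calculation. This is a sketch only: the function name hourly_dpu_charge is hypothetical, the rate is the one quoted here (subject to change), and it assumes the 1 DPU minimum applies to the combined total of streams run together, as the multistream example suggests.

```python
from decimal import Decimal

RATE_PER_DPU_HOUR = Decimal("0.20")  # on-demand rate from the text, subject to change
MINIMUM_DPU = Decimal("1")           # minimum charge rate of 1 DPU per hour


def hourly_dpu_charge(stream_dpus):
    """Return the hourly DPU charge in USD for streams that run together."""
    # Convert through str so that values such as 0.1 stay exact in Decimal.
    total = sum((Decimal(str(dpu)) for dpu in stream_dpus), Decimal("0"))
    # Assumption: the minimum is applied once, to the combined total.
    return max(total, MINIMUM_DPU) * RATE_PER_DPU_HOUR


print(hourly_dpu_charge([1.5]))       # one rule rated at 1.5 DPU
print(hourly_dpu_charge([0.5]))       # below the minimum charge
print(hourly_dpu_charge([0.1] * 10))  # ten 0.1 DPU streams via multistream
```

This covers the data processing charge only; licensing fees for delivered objects are added separately.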
 

Whenever you want, you can buy credits in increments of $10 which allow you to run streams. As your streams run we continuously compute the combined DPU and throughput cost and reduce your credit balance. If your balance drops to zero, your streams stop until you top up your balance.

 

Monthly subscription

You agree to buy a fixed number of DPU hours per month for a fixed price. As your streams run they consume your fixed DPU allowance and, separately, incur a variable licensing fee. The licensing fees are calculated depending on the licensing agreement we have with the provider. Assuming you don't exhaust your DPU allowance, your monthly bill will be the fixed cost plus the licensing fees. If you do exceed your DPU allowance, the excess DPU hours are charged at the on-demand rate.
 

You must also set a variable cost limit for your monthly subscription to DataSift. The variable cost limit is the sum of:

  • Licensing costs from your data sources and augmentations
  • Excess DPU costs if you run over your monthly DPU allowance 

As long as the combined total of your license costs and excess DPU costs is less than your variable cost limit, you will be able to consume data normally.
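
As a rough illustration of how the variable cost relates to the limit, the sketch below adds licensing fees to any excess DPU hours priced at the on-demand rate. The function names are hypothetical, and the $0.20 rate is taken from the On Demand section and is subject to change.

```python
from decimal import Decimal

ON_DEMAND_RATE = Decimal("0.20")  # USD per DPU hour, from the On Demand section


def variable_cost(licensing_fees, used_dpu_hours, allowance_dpu_hours):
    """Return licensing fees plus any excess DPU hours at the on-demand rate."""
    used = Decimal(str(used_dpu_hours))
    allowance = Decimal(str(allowance_dpu_hours))
    # Only DPU hours beyond the monthly allowance count as variable cost.
    excess_hours = max(used - allowance, Decimal("0"))
    return Decimal(str(licensing_fees)) + excess_hours * ON_DEMAND_RATE


def streams_can_run(cost, limit):
    """Streams keep running while the variable cost is below the limit."""
    return Decimal(str(cost)) < Decimal(str(limit))


cost = variable_cost(licensing_fees=100, used_dpu_hours=1200, allowance_dpu_hours=1000)
print(cost, streams_can_run(cost, 150))
```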
 

But if you run over your variable cost limit, your streams will stop, and you will receive the following error message as part of your stream:

  {"error":"You need to have credits or a valid subscription to use the API."}
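
A client could check for this message while consuming a stream. The sketch below assumes each message arrives as a JSON text like the one shown above; the function name stream_error is hypothetical and the transport is not covered.

```python
import json


def stream_error(message: str):
    """Return the error text if the message is an error object, else None."""
    payload = json.loads(message)
    if isinstance(payload, dict) and "error" in payload:
        return payload["error"]
    return None


sample = '{"error":"You need to have credits or a valid subscription to use the API."}'
print(stream_error(sample))
```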

 

*Subject to change.

Whereas a rule's data processing rate is known as soon as the rule is defined, its throughput is impossible to predict; it can only be estimated. You might want to run some sample executions to get a feel for the throughput cost of a stream.

Notifications

The DataSift billing system calculates the cost of using streams from the DPU rate and licensing costs. DataSift also allows you to monitor your usage by enabling notifications via email and the Dashboard. The notifications vary depending on the type of payment plan you are on.

On Demand

If you choose the On Demand plan, you will receive notifications if your credit balance runs low or falls to zero.

Monthly subscription

In a monthly subscription, you can set a variable cost limit on your account. You will receive a notification when you are close to your variable cost limit and another if you reach it. You can set or change your variable cost limit at any time during the billing cycle.

The first notification is triggered when you have used up 80 percent of your variable cost limit. For example, if you set your variable cost limit to $2,500, you will receive the first notification when you have used up $2,000 on your account. You will receive the second notification when you reach your variable cost limit, at which point we will stop your streams. It is good practice to monitor your usage and ensure that your variable cost limit is always high enough for you to be certain that you will not have any problems for the duration of the month.
 

          

[Figure: Preview of notifications in Dashboard]

[Figure: Preview of notifications via email]
 

If you notice that you are close to your variable cost limit and then you raise it, you might be below 80 percent of the new limit or you might be above 80 percent of the new limit; it all depends on where you set your new limit.

For example, if you set the variable cost limit to $2,000 on your account, you receive the first notification when you have used up $1,600. Suppose that you receive that notification and you raise the variable cost limit to $2,500. There are two possible scenarios to consider:

  • If your usage is below 80 percent of the new variable cost limit (that is, below $2,000), you will receive both notifications: a warning when you reach 80 percent of the new variable cost limit and then a notification when you reach the limit itself.
  • If your usage is already above 80 percent of the new variable cost limit, you will only receive a notification when you reach the new limit.
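
The two scenarios can be expressed as a small check. This is an illustrative sketch with a hypothetical function name; amounts are whole US dollars, and usage at exactly 80 percent is treated as already past the warning point, which is an assumption because the text does not cover that case.

```python
from typing import List


def notifications_after_raise(spent: int, new_limit: int) -> List[str]:
    """List the notifications still to come after the limit is raised."""
    if spent >= new_limit:
        raise ValueError("usage must be below the new limit")
    # Integer comparison equivalent to spent / new_limit < 0.8, without rounding.
    if spent * 100 < new_limit * 80:
        return ["80 percent warning", "limit reached"]
    return ["limit reached"]


# Usage of $1,600 with the limit raised from $2,000 to $2,500.
print(notifications_after_raise(1600, 2500))
# Usage already above 80 percent of the new limit.
print(notifications_after_raise(2100, 2500))
```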

 

Billing for Historics queries

You can use Historics queries if you are on a monthly subscription, subject to one-time activation by your account manager. The cost of running a Historics query depends on data processing usage plus licensing costs, and the original DPU complexity of the stream you are running the query on.

Data processing usage for Historics is calculated based on the duration and sample size of the output data. The duration is the timeframe of the query, that is, the time between its start date and time and its end date and time. The sample size of the output data can be either 100 percent or 10 percent. For all Historics queries, there is a premium on the DPU usage compared to usage for live streaming. DPU usage for the 100 percent sample size is 125 percent of what you would pay for live streaming of the same filter. Similarly, for the 10 percent sample size, the DPU usage is 40 percent of what you would pay for live streaming of the same filter.
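
The sample size multipliers can be applied as follows. This sketch assumes that the live streaming usage for the same timeframe is the filter's hourly DPU rating multiplied by the number of hours covered; that baseline and the function name are assumptions, so treat the result as an estimate and rely on the figure DataSift reports before the query runs.

```python
from decimal import Decimal

# Multipliers relative to live streaming of the same filter, from the text.
SAMPLE_MULTIPLIER = {100: Decimal("1.25"), 10: Decimal("0.40")}


def historics_dpu_usage(stream_dpu, hours, sample_percent):
    """Estimate DPU usage for a Historics query over a given timeframe."""
    if sample_percent not in SAMPLE_MULTIPLIER:
        raise ValueError("sample_percent must be 100 or 10")
    # Assumed baseline: hourly DPU rating times the hours in the timeframe.
    live_usage = Decimal(str(stream_dpu)) * Decimal(str(hours))
    return live_usage * SAMPLE_MULTIPLIER[sample_percent]


# A 1 DPU filter over a 24-hour timeframe at both sample sizes.
print(historics_dpu_usage(1, 24, 100))
print(historics_dpu_usage(1, 24, 10))
```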

Hence, when you create a Historics query, DataSift is able to calculate the DPU usage before the query is executed. This DPU usage information is displayed on the Confirm New Historic Query page. When running a Historics query through the Historics API, you need to hit the historics/prepare endpoint to create a Historics query and get the total DPU breakdown for your Historics query before it is executed. DPU usage charges are deducted from the monthly DPU allowance. 

On the other hand, licensing costs are calculated based on the volume of data retrieved for a particular Historics query. For a given CSDL filter, licensing costs for a Historics query of 100 percent sample size will be more than for a Historics query of 10 percent sample size.

You can view usage statistics for Historics queries on the Billing page, including the total licensing costs and the DPU usage for your Historics queries. You can also view the volume of data retrieved by a Historics query and the number of Historics hours used. Alternatively, you can hit the usage endpoint in the DataSift API, which will give you a more accurate figure for the number of objects processed.
 

          

Billing for Historics Preview

Historics Preview is available for all accounts, on any payment plan, be it Subscription or Pay As You Go. Each request has a fixed cost of 20 DPUs. There are no licensing fees charged for a Historics Preview since you will not actually receive any interactions matching your filter. You will only receive aggregate statistics for your selected filter.

The 20 DPUs are deducted from your account only after a complete and successful execution of your Historics Preview request. If your request gets interrupted while it is being processed, you won't get charged. You can only request a single Historics Preview per stream; if you request a new one, the previous request is overwritten.
 

Billing information

Find your DPU cost via the GUI

In DataSift's GUI you can check the DPU breakdown:

1. Select a stream

2. Click View Definition

The DPU breakdown appears below your CSDL code.

Find your DPU cost via the API

DataSift's REST API provides a dpu endpoint that gives the total DPU cost for a rule and the breakdown of its individual elements. 

    api.datasift.com/dpu

For Historics, DataSift's REST API provides a historics/prepare endpoint that gives the total DPU breakdown for a Historics query.

    api.datasift.com/historics/prepare

Find your throughput via the API

DataSift's REST API provides a usage endpoint that gives the number of objects processed.

    api.datasift.com/usage


Cost of operators

Some operators in CSDL have a fixed DPU cost while others have a variable cost.

For fixed-cost operators you simply multiply the number of times you use the operator in a stream by its DPU cost. For example, if you use the substr operator twice in a stream, the cost is 0.2 DPUs.

Operator or Keyword              DPUs
contains                         variable - see below
substr                           0.1
contains_any                     variable - see below
contains_near                    0.2
exists                           0.1
in                               variable - see below
comparisons (==, > and so on)    0.1
regular expressions              variable - see below
geo_box                          0.1
geo_radius                       0.1
geo_polygon                      variable - see below
tag                              variable - see below
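
For the fixed-cost rows in the table above, the total is a sum of count times cost. The sketch below is illustrative only: the dictionary keys and the function name are hypothetical labels, and the values are copied from the table.

```python
from decimal import Decimal

# Fixed-cost rows from the table above; "comparison" stands for ==, > and so on.
FIXED_OPERATOR_DPU = {
    "substr": Decimal("0.1"),
    "contains_near": Decimal("0.2"),
    "exists": Decimal("0.1"),
    "comparison": Decimal("0.1"),
    "geo_box": Decimal("0.1"),
    "geo_radius": Decimal("0.1"),
}


def fixed_operator_cost(usage):
    """Sum count * DPU cost for each fixed-cost operator used in a stream."""
    return sum(
        (FIXED_OPERATOR_DPU[name] * count for name, count in usage.items()),
        Decimal("0"),
    )


# One exists check plus two contains_near conditions.
print(fixed_operator_cost({"exists": 1, "contains_near": 2}))
```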


Regular Expressions

The DPU cost of a regular expression is calculated as:

          cost = the number of characters in the expression divided by 100.

The minimum charge for one regular expression is 0.1 so, for example, a regular expression that includes 10 characters costs 0.1 DPUs while a regular expression that includes 100 characters costs 1.0 DPUs.
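
The rule above translates directly into code. In this sketch the function name is hypothetical, and it assumes that every character of the pattern string counts toward the length.

```python
def regex_dpu_cost(pattern: str) -> float:
    """Return characters / 100, with a minimum charge of 0.1 DPU."""
    return max(0.1, len(pattern) / 100)


print(regex_dpu_cost("a" * 10))   # a 10-character expression
print(regex_dpu_cost("a" * 100))  # a 100-character expression
```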

geo_polygon 

The DPU cost of a geo_polygon depends on the number of vertices it has. To determine the DPU cost of any geo_polygon, divide the number of vertices by 30.

For example, a hexagon has 6 vertices so it has a DPU cost of 0.2. A triangle has 3 vertices so it has a DPU cost of 0.1.
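
The same calculation as a small sketch, with a hypothetical function name. It applies the division exactly as stated and does not assume any rounding or minimum beyond what the examples show.

```python
def geo_polygon_dpu_cost(vertices: int) -> float:
    """Return the DPU cost of a geo_polygon: number of vertices divided by 30."""
    if vertices < 3:
        raise ValueError("a polygon needs at least 3 vertices")
    return vertices / 30


print(geo_polygon_dpu_cost(6))  # hexagon
print(geo_polygon_dpu_cost(3))  # triangle
```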

contains

The DPU cost for the contains operator is based on the number of values you match against and the way you use the operator.

Using the contains operator to find a phrase

    twitter.text contains "My dog ate my homework"

In this case, you can match a phrase of up to seven words for a cost of 0.1 DPU. The cost increases by 0.1 DPU for every eight extra words you add to the matching phrase. Here are the first few DPU cost bands.

Maximum number of values    DPUs
7                           0.1
15                          0.2
23                          0.3
31                          0.4
39                          0.5
and so on...

For example, this filter has just one word in the argument so it costs 0.1 DPU:

    twitter.text contains "iPad"

This filter has eight words in the argument so it costs 0.2 DPU:

    twitter.text contains "iPad is my favorite tablet device right now"

Using the contains operator to find individual words

    twitter.text contains "xxx" and
    twitter.text contains "yyy" and
    twitter.text contains "zzz"

In this case, matching against up to three values costs 0.1 DPU. The cost increases by 0.1 DPU for every four extra values you add. Here are the first few DPU cost bands.

Maximum number of values    DPUs
3                           0.1
7                           0.2
11                          0.3
15                          0.4
19                          0.5
and so on...
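
Both band tables for contains follow the same pattern, so they can be sketched with one helper. The function names are hypothetical, the pattern is extrapolated from the bands listed, and counting phrase words by splitting on whitespace is an assumption about how words are counted.

```python
def banded_dpu_cost(count: int, band_width: int) -> float:
    """Return 0.1 DPU per band started.

    The first band holds band_width - 1 values and every later band holds
    band_width values, which matches the tables above.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return round((count // band_width + 1) * 0.1, 1)


def contains_phrase_dpu_cost(phrase: str) -> float:
    """Cost of one contains operator matching a phrase (bands of 7, 15, 23...)."""
    return banded_dpu_cost(len(phrase.split()), 8)


def contains_values_dpu_cost(value_count: int) -> float:
    """Cost of contains used for individual words (bands of 3, 7, 11...)."""
    return banded_dpu_cost(value_count, 4)


print(contains_phrase_dpu_cost("iPad"))
print(contains_phrase_dpu_cost("iPad is my favorite tablet device right now"))
print(contains_values_dpu_cost(3))
```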

in/contains_any

The DPU cost for the in and contains_any operators is based on the number of values you match against. The following table shows the DPU cost for any filter that uses these operators.

For example, this filter matches against 10 values so it costs 0.2 DPUs.

    twitter.text contains_any "apple, microsoft, hp, dell, oracle, google, yahoo, ebay, amazon, facebook"
 

Maximum number of values    DPUs
9                           0.1
19                          0.2
29                          0.3
39                          0.4
...
100                         1
1,000                       2
10,000                      4
100,000                     8
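
For short lists, the table can be used as a direct lookup. The sketch below covers only the first four bands, because the exact scale for larger lists is not given here; the function name is hypothetical, and the result applies to lists of single keywords.

```python
from bisect import bisect_left

# Upper limits of the first four bands in the table above.
BAND_LIMITS = [9, 19, 29, 39]


def contains_any_dpu_cost(value_count: int) -> float:
    """Look up the DPU cost for in / contains_any with up to 39 single keywords."""
    if value_count < 1 or value_count > BAND_LIMITS[-1]:
        raise ValueError("this sketch only covers 1 to 39 values")
    # bisect_left finds the first band whose limit is >= value_count.
    band = bisect_left(BAND_LIMITS, value_count)
    return round((band + 1) * 0.1, 1)


values = "apple, microsoft, hp, dell, oracle, google, yahoo, ebay, amazon, facebook".split(",")
print(len(values), contains_any_dpu_cost(len(values)))
```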

The exact cost is determined using a sliding scale, so if you have 99 values in the filter, the cost will be slightly lower than 1 DPU. Note that the table shows how we calculate DPU costs for a list of single keywords. In practice, you will often write filters that use the contains_any operator with a list of phrases of varying length. For example:

    twitter.text contains_any "Yesterday, Yellow Submarine, The Long and Winding Road"

Since phrases take longer for DataSift to process than single keywords, the DPU cost is slightly higher. For example, a list of 30 single keywords with the contains_any operator incurs a DPU cost of 0.4. However, if you filter for 10 phrases, each of three words, the DPU cost is 0.5.

We recommend that you check the DPU cost before you run a filter. The /compile endpoint returns a JSON object that includes the DPU cost.

Tags

Operators used inside a tag statement are normally charged at 10% of their usual DPU cost.  

For example, if the normal cost of a rule is 1 DPU, that same code inside a tag statement would cost 0.1 DPU.

If the normal cost is less than 1 DPU, there is no charge.
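
The tag rules above can be summarized in a few lines. This is a sketch with a hypothetical function name; it applies the 10% rate and the no-charge threshold exactly as stated and does not model any exceptions implied by "normally".

```python
def tag_dpu_cost(normal_cost: float) -> float:
    """Return the DPU cost of operators when used inside a tag statement."""
    if normal_cost < 1:
        # Below 1 DPU of normal cost there is no charge.
        return 0.0
    return normal_cost * 0.1


print(tag_dpu_cost(1))    # a rule that normally costs 1 DPU
print(tag_dpu_cost(0.5))  # below the threshold
```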