By default, DataSift filters against all the data from your chosen sources. For example, this filter looks at every post sent to DataSift from Tumblr:

tumblr.text contains "Cincinnati Reds"

If you are performing statistical analysis on the data, you can filter against a random sample of it rather than the full stream.

The interaction.sample target is an internally generated floating-point random number between 0 and 100.

This filter keeps approximately 5.25 percent of the Tumblr posts that mention "Cincinnati Reds" and ignores the rest:

tumblr.text contains "Cincinnati Reds" and interaction.sample < 5.25
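To get a feel for how the threshold behaves, the filter can be imitated locally. The Python sketch below is not DataSift code: `passes_filter` and `kept_fraction` are hypothetical helpers, the sample value is assumed to be drawn uniformly from 0 to 100, and plain substring matching stands in for the `contains` operator.

```python
import random

THRESHOLD = 5.25  # percent, taken from the filter above


def passes_filter(text: str, sample: float) -> bool:
    """Simplified stand-in for:
    tumblr.text contains "Cincinnati Reds" and interaction.sample < 5.25

    Plain substring matching is an assumption; CSDL's own matching
    rules may differ.
    """
    return "Cincinnati Reds" in text and sample < THRESHOLD


def kept_fraction(n_posts: int, seed: int = 0) -> float:
    """Estimate the share of matching posts that pass the sampling condition."""
    rng = random.Random(seed)
    kept = 0
    for _ in range(n_posts):
        # Assumed uniform draw that models interaction.sample for one post.
        sample = rng.uniform(0.0, 100.0)
        if passes_filter("Cincinnati Reds win again", sample):
            kept += 1
    return kept / n_posts


if __name__ == "__main__":
    print(f"kept {kept_fraction(200_000):.2%} of matching posts")
```

Under these assumptions the kept share settles close to the threshold as the number of posts grows, which is why the threshold can be read as a percentage.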

Rate Limiting

You can use interaction.sample to reduce your data consumption.

1.  Filter for a sample of 1 percent of incoming Tumblr interactions:

interaction.sample <= 1 and interaction.type == "tumblr"

2.  Filter for all the Tumblr posts that mention "coffee" and for a 10-percent sample of the posts that mention "tea":

tumblr.text contains "coffee" or
(
    tumblr.text contains "tea" and
    interaction.sample <= 10
)

3.  You can even nest the samples:

interaction.sample < 50
and
(
    tumblr.text contains "coffee" or
    (
        tumblr.text contains "tea" and
        interaction.sample <= 10
    )
)
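The boolean logic of examples 2 and 3 can be checked locally with a small Python sketch. The function names are hypothetical, substring matching is a simplification of `contains`, and the nested case assumes that each interaction carries a single sample value that is reused wherever interaction.sample appears in the filter. The text above does not state this, so treat it as an assumption.

```python
def passes_example_2(text: str, sample: float) -> bool:
    """Model of: all "coffee" posts, plus a 10-percent sample of "tea" posts."""
    # Naive substring checks; a word such as "steam" would also match "tea".
    return "coffee" in text or ("tea" in text and sample <= 10)


def passes_example_3(text: str, sample: float) -> bool:
    """Model of the nested filter: example 2 wrapped in a 50-percent sample.

    Assumes the same sample value is used for both comparisons.
    """
    return sample < 50 and passes_example_2(text, sample)


if __name__ == "__main__":
    for text, sample in [("coffee time", 49.0), ("coffee time", 60.0),
                         ("green tea", 5.0), ("green tea", 20.0)]:
        print(text, sample, passes_example_2(text, sample),
              passes_example_3(text, sample))
```

If the single-value assumption holds, the outer condition halves the "coffee" posts but leaves the "tea" sample at 10 percent, because any value at or below 10 is also below 50. If the platform instead drew an independent value for each comparison, the "tea" share would be smaller.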