By default, DataSift filters against all the data from your chosen sources. For example, this filter looks at every post sent to DataSift from Tumblr:
tumblr.text contains "Cincinnati Reds"
When you are performing statistical analysis, you often do not need every matching post. In those situations you can filter on a random sample of the data instead.
The interaction.sample target is an internally generated floating-point random number between 0 and 100.
This filter keeps approximately 5.25 percent of the matching posts and ignores the rest:
tumblr.text contains "Cincinnati Reds" and interaction.sample < 5.25
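The effect of the threshold can be sketched outside DataSift with a small Python simulation. This is only an illustration, not DataSift's implementation: it assumes the value is drawn uniformly between 0 and 100 for each interaction, and the names `passes_sample` and `simulate` are hypothetical.

```python
import random


def passes_sample(sample_value: float, threshold: float) -> bool:
    """Mimic the CSDL condition `interaction.sample < threshold`."""
    return sample_value < threshold


def simulate(n: int, threshold: float, seed: int = 0) -> float:
    """Return the fraction of n simulated interactions that pass the threshold.

    Assumption: each interaction gets one uniform random value in [0, 100].
    """
    rng = random.Random(seed)
    kept = sum(passes_sample(rng.uniform(0, 100), threshold) for _ in range(n))
    return kept / n


rate = simulate(200_000, 5.25)
print(f"kept {rate:.2%} of simulated interactions")
```

Because the value is random, the fraction kept is close to the threshold but rarely exactly equal to it, which is why a sampled stream delivers approximately, not precisely, the requested percentage.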
You can use interaction.sample to reduce your data consumption. For example:
1. Filter for a sample of 1 percent of incoming Tumblr interactions:
interaction.sample <= 1 and interaction.type == "tumblr"
2. Filter for all the Tumblr posts that mention "coffee" and for a 10-percent sample of the posts that mention "tea":
tumblr.text contains "coffee" or
(
    tumblr.text contains "tea" and
    interaction.sample <= 10
)
3. You can even nest the samples:
interaction.sample < 50
and
(
    tumblr.text contains "coffee" or
    (
        tumblr.text contains "tea" and
        interaction.sample <= 10
    )
)
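To see how a sample condition combines with keyword conditions, the second example above (all "coffee" posts plus a 10-percent sample of "tea" posts) can be modeled in Python. This is a rough sketch under stated assumptions: a lowercase word match stands in for the CSDL contains operator, the sample value is assumed to be uniform between 0 and 100, and `matches` and `kept_fraction` are hypothetical helper names.

```python
import random


def matches(text: str, sample: float) -> bool:
    """Model: contains "coffee" or (contains "tea" and interaction.sample <= 10)."""
    words = text.lower().split()  # simple stand-in for the CSDL contains operator
    return "coffee" in words or ("tea" in words and sample <= 10)


def kept_fraction(text: str, n: int = 100_000, seed: int = 1) -> float:
    """Fraction of n simulated posts with this text that the filter would keep."""
    rng = random.Random(seed)
    # Assumption: one uniform sample value in [0, 100] per simulated post.
    return sum(matches(text, rng.uniform(0, 100)) for _ in range(n)) / n


for post in ("I love coffee", "green tea", "plain water"):
    print(post, "->", f"{kept_fraction(post):.1%}")
```

In this model every "coffee" post passes regardless of its sample value, roughly one in ten "tea" posts passes, and posts that mention neither word never pass.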