By default, DataSift filters against all the data from your chosen sources. For example, this filter looks at every post sent to DataSift from Tumblr:
tumblr.text contains "Cincinnati Reds"
If you are performing statistical analysis and do not need every matching item, you can sample the data instead.
The interaction.sample target is an internally generated floating-point random number between 0 and 100.
This filter samples approximately 5.25 percent of the incoming objects and ignores the rest:
tumblr.text contains "Cincinnati Reds" and interaction.sample < 5.25
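To see why a threshold of 5.25 keeps roughly 5.25 percent of the stream, the following Python sketch simulates the comparison. It assumes the value is spread uniformly between 0 and 100, which the description implies but does not state outright. The function name sample_rate and the fixed seed are illustrative only and are not part of DataSift.

```python
import random


def sample_rate(threshold, trials=100_000, seed=42):
    """Estimate the fraction kept by `interaction.sample < threshold`.

    Assumption: each simulated interaction gets a value drawn
    uniformly between 0 and 100.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    kept = sum(1 for _ in range(trials) if rng.uniform(0, 100) < threshold)
    return kept / trials


if __name__ == "__main__":
    # The estimate should land close to 0.0525 for a threshold of 5.25.
    print(f"kept fraction: {sample_rate(5.25):.4f}")
```

Because the value is random, the share kept over any finite run is only approximately equal to the threshold.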
You can use interaction.sample to reduce your data consumption.
Filter for a sample of 1 percent of incoming Tumblr interactions:
interaction.sample <= 1 and interaction.type == "tumblr"
Filter for all the Tumblr posts that mention "coffee" and for a 10-percent sample of the posts that mention "tea":
tumblr.text contains "coffee" or ( tumblr.text contains "tea" and interaction.sample <= 10 )
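The logic of this filter can be mirrored in a few lines of Python, which makes it easy to check which posts pass for a given sample value. This is a sketch only: the function names are hypothetical, and the case-insensitive whole-word matching is an approximation that may not reproduce how the contains operator tokenizes text.

```python
import re


def words(text):
    """Lowercased word tokens; a rough stand-in for `contains` matching."""
    return set(re.findall(r"\w+", text.lower()))


def matches(text, sample):
    """Mirror of: contains "coffee" or (contains "tea" and sample <= 10)."""
    tokens = words(text)
    return "coffee" in tokens or ("tea" in tokens and sample <= 10)


if __name__ == "__main__":
    print(matches("I love coffee", 99.0))  # coffee passes at any sample value
    print(matches("green tea", 5.0))       # tea passes only when sample <= 10
    print(matches("green tea", 50.0))
```

Posts that mention "coffee" pass regardless of the sample value, so only the "tea" branch is thinned out.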
You can even nest the samples:
interaction.sample < 50 and ( tumblr.text contains "coffee" or ( tumblr.text contains "tea" and interaction.sample <= 10 ) )
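How the nested filter behaves depends on whether both occurrences of interaction.sample see the same number. The Python sketch below assumes they do, that is, each interaction carries a single value; this is an assumption and is not confirmed here. Under it, the outer test halves the "coffee" posts but does not thin the "tea" posts any further, because any value at or below 10 is already below 50. The function names and the word matching are illustrative only.

```python
import re


def words(text):
    """Lowercased word tokens; a rough stand-in for `contains` matching."""
    return set(re.findall(r"\w+", text.lower()))


def nested_matches(text, sample):
    """Mirror of the nested filter, assuming ONE sample value per interaction."""
    tokens = words(text)
    inner = "coffee" in tokens or ("tea" in tokens and sample <= 10)
    return sample < 50 and inner


def pass_count(text):
    """Count how many of 100 evenly spaced sample values (0.5 .. 99.5) pass."""
    return sum(nested_matches(text, step + 0.5) for step in range(100))


if __name__ == "__main__":
    print("coffee:", pass_count("coffee"), "of 100")
    print("tea:", pass_count("tea"), "of 100")
```

If the two comparisons were instead evaluated against independent random values, the "tea" branch would pass about 5 percent of the time rather than 10 percent, so it is worth checking the observed volume before relying on a nested sample.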
Target service: The Common Target: Interaction
Target object: Interaction
Always exists: Yes