By default, a DataSift stream examines every input object that arrives from your chosen data sources. For example, this filter examines every input object that reaches DataSift from the Twitter Firehose:
When you are performing statistical analysis and do not need every object, you can sample the stream instead: process a random fraction of the input and ignore the rest.
The interaction.sample target is an internally generated floating-point random number between 0 and 100.
This filter samples 5.25 percent of the incoming input objects and ignores the rest:
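The mechanism behind a threshold such as 5.25 can be illustrated outside of CSDL. The following Python sketch is a simulation only, not DataSift code: it assumes the sample value is drawn uniformly between 0 and 100 and that an object is kept when that value falls below the threshold. The function names `keep` and `simulate` are hypothetical.

```python
import random


def keep(sample_value: float, threshold: float = 5.25) -> bool:
    """Return True when the sample value falls below the threshold.

    Assumption: an object is kept when its random sample value is
    strictly less than the threshold percentage.
    """
    return sample_value < threshold


def simulate(total: int, threshold: float, seed: int = 0) -> float:
    """Return the percentage of `total` simulated objects that are kept."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    kept = sum(keep(rng.uniform(0, 100), threshold) for _ in range(total))
    return 100 * kept / total


if __name__ == "__main__":
    # With a large number of objects the kept share settles near 5.25 percent.
    print(f"{simulate(200_000, 5.25):.2f}% kept")
```

Because each object is judged independently, the kept share is only approximately equal to the threshold, and the approximation improves as the volume grows.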
Twitter limits you to 500,000 Tweets in a 24-hour period. You can use interaction.sample to reduce your data consumption.
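To choose a threshold that fits the 500,000 Tweet limit, you can work backwards from the volume you expect your filter to match. The Python sketch below shows that arithmetic. The helper name `sample_percent_for_cap` and the example volume are hypothetical, and the result is an estimate rather than a guarantee, because sampling is random.

```python
def sample_percent_for_cap(expected_daily: int, cap: int = 500_000) -> float:
    """Return the largest sample percentage expected to stay within `cap`.

    expected_daily: estimated number of Tweets the unsampled filter would
                    match in 24 hours (an assumption you supply).
    cap:            the 24-hour Tweet limit.
    """
    if expected_daily <= 0:
        raise ValueError("expected_daily must be positive")
    # Never exceed 100 percent: below the cap, no sampling is needed.
    return min(100.0, 100.0 * cap / expected_daily)


if __name__ == "__main__":
    # A filter expected to match 50 million Tweets a day needs a 1 percent sample.
    print(sample_percent_for_cap(50_000_000))
```

In practice it is sensible to pick a threshold somewhat below the computed value, so that a busier than expected day does not exceed the limit.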
1. To sample 1 percent of incoming Tweets:
2. To filter for all the Tweets that mention "coffee" plus a 10 percent sample of the Retweets that mention "coffee":
3. You can even nest the samples:
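The logic of the second example, where original Tweets are always kept but Retweets are sampled, can be sketched in Python as follows. This is an illustration of the decision rule only, not CSDL. The dictionary keys `text` and `is_retweet` are hypothetical stand-ins for the real targets, and the substring test is a simplification of how a keyword match would work.

```python
def matches(tweet: dict, sample_value: float) -> bool:
    """Decide whether a simulated Tweet passes the coffee filter.

    tweet:        hypothetical dict with "text" (str) and "is_retweet" (bool).
    sample_value: random number between 0 and 100 assigned to this object.
    """
    if "coffee" not in tweet["text"].lower():
        return False  # neither branch applies without the keyword
    if tweet["is_retweet"]:
        return sample_value < 10  # keep roughly 10 percent of Retweets
    return True  # keep every original Tweet that mentions coffee


if __name__ == "__main__":
    original = {"text": "Coffee time", "is_retweet": False}
    retweet = {"text": "RT: coffee time", "is_retweet": True}
    print(matches(original, 73.2))  # True: originals are never sampled out
    print(matches(retweet, 73.2))   # False: sample value is not below 10
    print(matches(retweet, 4.1))    # True: sample value is below 10
```

The sampling condition is applied only on the Retweet branch, which is why the overall volume falls without losing any original Tweets.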