Feel free to jump right in with the Historics API: Step-by-Step guide.
The DataSift Historics archive is a large body of content gathered from a variety of social media sites. Historics is useful when you want to turn the clock back and filter against data from the past.
It uses the same CSDL that we use for live streaming. For Twitter, it works 100 times faster than live streaming; it offers 100 percent coverage, or you can choose to run a job on a 10 percent sample instead.
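As a rough illustration of what "100 times faster" means in practice, the sketch below estimates how long a Twitter job might take for a given span of archived data. It assumes the 100x figure applies uniformly, which is a simplification; the function name and constant are hypothetical and are not part of any DataSift library.

```python
from datetime import timedelta

# Assumption: the "100 times faster than live streaming" figure quoted above,
# treated as a fixed ratio. Real throughput may vary from job to job.
SPEEDUP_VS_LIVE = 100


def estimate_runtime(archive_span: timedelta,
                     speedup: int = SPEEDUP_VS_LIVE) -> timedelta:
    """Return a rough processing time for a span of archived Twitter data."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return archive_span / speedup


# One day of archived data at 100x works out to a little under 15 minutes.
print(estimate_runtime(timedelta(days=1)))  # prints 0:14:24
```

Treat the result as a planning estimate only, not a guarantee of how long a job will run.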
Historics jobs run as 'batch' processes in our cluster. You specify the time range that you want to look at and submit 'jobs' to gather the data.
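To make the idea of "time range plus job" concrete, here is a minimal sketch that validates a time range and packs it into a request dictionary. The field names (hash, start, end, name, sample) and the helper itself are assumptions made for illustration; check the Historics API: Step-by-Step guide for the parameters the service actually expects.

```python
from datetime import datetime, timezone


def build_job_request(stream_hash: str, start: datetime, end: datetime,
                      name: str, sample: int = 100) -> dict:
    """Build a dictionary describing a Historics job (hypothetical field names)."""
    # Require timezone-aware datetimes so the range is unambiguous.
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    if end <= start:
        raise ValueError("end must be later than start")
    # The text above mentions full coverage or a 10 percent sample.
    if sample not in (10, 100):
        raise ValueError("sample must be 10 or 100")
    return {
        "hash": stream_hash,             # identifies the compiled CSDL filter
        "start": int(start.timestamp()),  # Unix timestamp, in seconds
        "end": int(end.timestamp()),
        "name": name,
        "sample": sample,
    }


request = build_job_request(
    stream_hash="<your-stream-hash>",  # placeholder, not a real hash
    start=datetime(2012, 7, 1, tzinfo=timezone.utc),
    end=datetime(2012, 7, 8, tzinfo=timezone.utc),
    name="first-week-of-july",
    sample=10,
)
print(request["end"] - request["start"])  # length of the range in seconds
```

Validating the range before submitting is a design choice in this sketch: it catches reversed or ambiguous dates before any work is queued.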
When you query the Historics archive, we give you clear guidance on data availability. You'll see on screen whether we have coverage for the days you've selected. Here's a snapshot of the archive in our staging environment (not production) where data for July 17 has not been loaded.
Note that your Historics queries are run in the timezone you set on your profile in DataSift. You can change it at any time.
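Because the profile timezone decides which instants a selected day covers, the same calendar day can map to different ranges of data. The sketch below shows the effect using a fixed UTC offset; the helper name and the year are hypothetical, and a fixed offset deliberately ignores daylight saving time.

```python
from datetime import datetime, timedelta, timezone


def local_range_to_utc(start_local: datetime, end_local: datetime,
                       utc_offset_hours: float) -> tuple:
    """Interpret naive local datetimes at a fixed UTC offset; return UTC datetimes."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    start_utc = start_local.replace(tzinfo=tz).astimezone(timezone.utc)
    end_utc = end_local.replace(tzinfo=tz).astimezone(timezone.utc)
    return start_utc, end_utc


# The same calendar day, read with two different profile settings.
day_start = datetime(2012, 7, 17)  # the year is an arbitrary example
day_end = datetime(2012, 7, 18)

print(local_range_to_utc(day_start, day_end, 0))   # profile set to UTC
print(local_range_to_utc(day_start, day_end, -5))  # profile set five hours behind UTC
```

With an offset of -5 hours, the selected day starts at 05:00 UTC rather than midnight, so a job covers a different slice of the archive than it would with a UTC profile.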
You can stop a Historics job halfway through. You're billed for the work done so far, and for any data received.
How does billing work with Historics?
Push costs nothing. You pay exactly the same whether you use Push or not.
To learn more, please take a look at our Billing page.
Don't miss A Journey into Optimizing Hadoop Jobs by Lorenzo Alberton, DataSift Chief Technical Architect.