Feel free to jump right in with the Historics API: Step-by-Step guide.
Note: The Historics archive is not currently available to trial customers or to Pay As You Go customers. If you would like to use Historics, the folks on our Sales Team are happy to help you.
The DataSift Historics archive is a large body of content gathered from a variety of social media sites. Historics is useful when you want to turn the clock back and filter against data from the past.
Historics uses the same language, CSDL, that we use for live streaming, but it processes data much faster than a live stream delivers it. It offers 100 percent coverage, or you can run it on a 10 percent sample.
Historics jobs run as 'batch' processes in our cluster. You specify the time range that you want to look at and submit 'jobs' to gather the data.
When you query the Historics archive, we give you clear guidance on data availability: you'll see on screen whether we have coverage for the days you've selected. Here's a snapshot of the archive in our staging environment (not production), where data for July 17 has not been loaded.
Note that your Historics queries are run in the timezone you set on your profile in DataSift. You can change it at any time.
Note also that the end time for any Historics query must be at least one hour in the past.
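The two rules above (times are interpreted in your profile timezone, and the end time must be at least one hour in the past) can be checked before a job is submitted. The following is a minimal sketch, not part of the DataSift client libraries: the function name historics_range and its parameters are hypothetical, and it assumes that a job's start and end are ultimately expressed as Unix timestamps.

```python
from datetime import datetime, timedelta, timezone, tzinfo
from typing import Optional, Tuple

# The end of a Historics query must be at least this far in the past.
MIN_END_AGE = timedelta(hours=1)


def historics_range(
    start_local: datetime,
    end_local: datetime,
    profile_tz: tzinfo,
    now: Optional[datetime] = None,
) -> Tuple[int, int]:
    """Validate a time range and return it as (start, end) Unix timestamps.

    start_local and end_local are naive datetimes, interpreted in
    profile_tz (the timezone set on your profile). This helper is
    hypothetical and only illustrates the rules described in the text.
    """
    # Attach the profile timezone to the naive wall-clock times.
    start = start_local.replace(tzinfo=profile_tz)
    end = end_local.replace(tzinfo=profile_tz)
    now = now or datetime.now(timezone.utc)

    if start >= end:
        raise ValueError("start must be earlier than end")
    if end > now - MIN_END_AGE:
        raise ValueError("end must be at least one hour in the past")

    return int(start.timestamp()), int(end.timestamp())
```

For a named timezone, you could pass zoneinfo.ZoneInfo("Europe/London") (Python 3.9 or later) as profile_tz. The optional now argument exists only so the check can be tested against a fixed clock.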
You can stop a Historics job partway through. You're billed for the work done so far, and for any data received.
What data is available in the archive?
At the time of writing, we have data from these sources and augmentations in the archive:
- Salience Sentiment
- Salience Entities
- Salience Topics
For the latest information, and to find out how far back the archive data goes for each source, consult our Historics Archive Schema pages.
How does billing work with Historics?
Push costs nothing. You pay exactly the same whether you use Push or not.
To learn more, please take a look at our Billing page.
Don't miss A Journey into Optimizing Hadoop Jobs by Lorenzo Alberton, DataSift Chief Technology Officer.