Historics Preview

The amount of information stored inside DataSift's archives of online interactions is measured in petabytes. Normally, you filter on it using our Historics API. The Historics Preview API gives you a peek into the archive to see, for example, how much data a Historics query might deliver. It gives you a way to judge the fitness of the filter you create or the data set you want to apply it to.

Historics Preview is a special case of a Historics query that takes a random 1-percent sample from the archive and runs your query against that data. It still gets queued like ordinary Historics queries but, because it is optimized for speed, you receive your results much sooner.

Taking a 1-percent sample has a number of advantages:

  • It is statistically significant for most cases.
  • DataSift performs up to twenty different statistical analyses; you don't have to evaluate large volumes of data yourself.

However, remember that the sample is randomly chosen, so a full Historics query may deliver somewhat less data or, indeed, more data than the preview predicts.
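To get a feel for how far a full query might drift from the preview, the sketch below scales a match count from the 1-percent sample up to the whole archive and attaches a rough 95% range. It is an illustration only, not part of the DataSift API: the function name is hypothetical, and it assumes every interaction is included in the sample independently with probability 0.01, which may not match how the sample is actually drawn.

```python
import math

SAMPLE_RATE = 0.01  # the 1-percent sample described above


def estimate_full_volume(sample_matches, sample_rate=SAMPLE_RATE, z=1.96):
    """Scale a sample match count up to an estimate for the full archive.

    Returns (estimate, low, high), where low and high bound a rough
    95% range when z is 1.96.

    Assumption (not from the DataSift docs): each interaction lands in
    the sample independently, so the match count is roughly binomial.
    """
    if sample_matches < 0:
        raise ValueError("sample_matches must be non-negative")
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in the range (0, 1]")

    # Point estimate: divide the sample count by the sampling rate.
    estimate = sample_matches / sample_rate

    # Standard error of the scaled-up count under the binomial assumption.
    std_error = math.sqrt(sample_matches * (1 - sample_rate)) / sample_rate

    low = max(0.0, estimate - z * std_error)
    high = estimate + z * std_error
    return estimate, low, high


if __name__ == "__main__":
    estimate, low, high = estimate_full_volume(400)
    print(f"estimate={estimate:.0f}, range={low:.0f}..{high:.0f}")
```

Under these assumptions, 400 matches in the sample scale to about 40,000 interactions, give or take roughly 3,900. The relative uncertainty shrinks as the sample match count grows, so previews of very narrow filters are the least precise.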

You pay just 10 DPUs as a base cost plus 2 DPUs for every day the preview covers. For example, running a preview over one day costs 12 DPUs while running it for 30 days, the maximum period, costs 70 DPUs.
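The pricing rule above is simple arithmetic, sketched here as a small helper. The function and constant names are hypothetical, and the sketch assumes that previews are billed in whole days; it does not check the maximum period.

```python
BASE_COST_DPUS = 10     # flat cost charged for every preview
COST_PER_DAY_DPUS = 2   # additional cost for each day the preview covers


def preview_cost_dpus(days):
    """Return the DPU cost of a preview covering `days` whole days.

    Assumption: billing is per whole day; partial days are not modeled.
    """
    if not isinstance(days, int) or days < 1:
        raise ValueError("days must be a positive whole number")
    return BASE_COST_DPUS + COST_PER_DAY_DPUS * days


if __name__ == "__main__":
    for days in (1, 7):
        print(f"{days} day(s): {preview_cost_dpus(days)} DPUs")
```

A one-day preview comes to 10 + 2 × 1 = 12 DPUs, and a seven-day preview to 10 + 2 × 7 = 24 DPUs.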

Any previews that you create are available for inspection for 14 days before they expire.

Unlike a Historics query, Historics Preview returns results of a statistical analysis performed on the data you want to filter for, but it does not return the actual interactions. If you want to receive the interactions used to construct the preview, you will need to run a Historics query using the same filter. Historics Preview can analyze a period of up to 30 days.

Historics Preview can also return a statistically valid estimate of the fitness of the data set for a particular research purpose, before you run a full Historics query. In fact, sometimes it may be all you need. For example, it's ideal if you are looking for a quick answer to a question such as, 'What were the 10 most popular words on Tumblr an hour ago?'


