Platform Updates - Content Age Filtering, Larger Compressed Data Deliveries

Richard Caudle | 30th April 2014

This is a quick post to update you on some changes we've introduced recently to help you work with our platform and make your life a little easier.

Filtering On Content Age

We aim to deliver you data as soon as we possibly can, but for some sources there can be a delay between publication to the web and our delivery which is out of our control.

In most cases this does not have an impact, but in some situations (perhaps you only want to display extremely fresh content to a user) this is an issue.

For these sources we have introduced a new target, .age, which allows you to specify the maximum time (in seconds) since the content was posted. For instance, to filter for blog posts mentioning 'DataSift' while ensuring that you only receive posts published within the last hour:

blog.content contains "DataSift" AND blog.age < 3600

This new target applies to the Blog, Board, DailyMotion, IMDB, Reddit, Topix, Video and YouTube sources.
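To make the semantics concrete, the filter above keeps an interaction only when the time elapsed since publication is below the threshold. Here's a minimal sketch of that check in Python (the passes_age_filter helper is ours for illustration, not part of the DataSift API):

```python
import time

def passes_age_filter(published_at: float, max_age_seconds: int, now: float = None) -> bool:
    """Return True if content published at `published_at` (Unix seconds)
    is younger than `max_age_seconds` -- mirroring `blog.age < 3600`."""
    if now is None:
        now = time.time()
    return (now - published_at) < max_age_seconds

now = 1_700_000_000
# A post published 30 minutes ago passes a one-hour (3600 s) filter...
print(passes_age_filter(now - 1800, 3600, now))   # True
# ...but one published two hours ago does not.
print(passes_age_filter(now - 7200, 3600, now))   # False
```

Note that the comparison is strict, matching the `<` operator in the CSDL example.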

Push Destinations - New Payload Options

Many of our customers tell us they can take much larger data volumes from our system. We aim to please, so we have introduced options to help you get more data, more quickly.

Increased Payload Sizes

To enable you to receive data more quickly from our push connectors, we have increased the maximum delivery sizes for many of our destinations. See the table below for the new maximum delivery sizes.

Compression Support

As the data we deliver to you is text, compression can be used to greatly reduce the size of files we deliver, making transport far more efficient. Although compression rates do vary, we are typically seeing an 80% reduction in file size with this option enabled.

We have introduced GZip and ZLib compression to our most popular destinations. You can enable compression on a destination by selecting the option in your dashboard, or by specifying the output_param.compression parameter through the API.
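As a sketch of what the API route might look like, the request body for creating a compressed push subscription could be assembled as below. The output_type value and the subscription name are assumptions for illustration; output_param.compression is the parameter named above — check the push API documentation for the exact endpoint and full parameter set:

```python
from urllib.parse import urlencode

# Sketch: build a form-encoded body enabling gzip compression on an
# assumed HTTP push destination. Only `output_param.compression` is
# taken from the announcement; the other fields are illustrative.
params = {
    "name": "compressed-http-delivery",
    "output_type": "http",
    "output_param.compression": "gzip",   # or "zlib", or "none"
}
body = urlencode(params)
print(body)
```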

When data is delivered you can tell it has been compressed in two ways:

HTTP destination: The HTTP header 'X-DataSift-Compression' will have the value none, zlib or gzip as appropriate.

S3, SFTP destinations: Files delivered to your destination will have an additional '.gz' extension if they have been compressed, for example DataSift-xxxxxxxxxxxxxxxxxxx-yyyyyyy.json.gz
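Putting the two cases together, a consumer can branch on the compression indicator before parsing the payload. A minimal sketch using Python's standard library (the decompress_payload helper is ours for illustration):

```python
import gzip
import json
import zlib

def decompress_payload(raw: bytes, compression: str) -> bytes:
    """Decompress a delivered payload according to the compression
    indicator ('none', 'gzip' or 'zlib'), e.g. the value of the
    X-DataSift-Compression header on an HTTP delivery."""
    if compression == "gzip":
        return gzip.decompress(raw)
    if compression == "zlib":
        return zlib.decompress(raw)
    return raw  # 'none': payload was delivered uncompressed

# Round-trip example: gzip-compress a small JSON payload, then recover it.
original = json.dumps({"interactions": []}).encode("utf-8")
compressed = gzip.compress(original)
recovered = decompress_payload(compressed, "gzip")
print(recovered == original)  # True
```

The same helper covers S3/SFTP files: treat a '.gz' suffix as the gzip case.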

Here's a summary of our current push destinations' support for these features.

Destination      Maximum Payload Size    Compression Support
HTTP             200 MB                  GZip, ZLib
S3               200 MB                  GZip
CouchDB          50 MB                   None
ElasticSearch    200 MB                  None
FTP              200 MB                  None
MongoDB          50 MB                   None
PostgreSQL       50 MB                   None
Pull             50 MB                   None
Redis            50 MB                   None
Splunk           50 MB                   None

Stay Up-To-Date

To stay in touch with all the latest developer news, please subscribe to our RSS feed, or follow us on Twitter at @DataSiftDev.
