As an alternative to HTTP Streaming and WebSockets, which require you to maintain a long-lived socket connection with us, we offer a more robust and efficient way to deliver data to your servers using HTTP POST or PUT methods. In this scenario, DataSift acts like a user uploading a file to an HTTP server via a web browser. The interactions you filter for will be delivered in batches, in JSON format. You will need to set up your own HTTP server and write code to handle the uploads. It is your responsibility to ensure that your HTTP server can handle the volume of data sent by DataSift.

Here are some guidelines to help you write code to use HTTP destinations successfully with Push. Read the step-by-step guide to the API to learn how to work with DataSift's APIs.

Configuring HTTP for Push delivery

When you choose to use your own HTTP server to receive interactions from DataSift you need to write code to handle communication with us properly. You are free to choose any technology and software you like, provided you follow the instructions below. For your convenience, we have also provided source code of a simple non-authenticating, non-SSL HTTP server that you can run on your side. It is designed to help you test this connector and you are free to use it and extend it, but it is not supported.


The DataSift HTTP connector currently offers one HTTP authentication method: basic authentication. The alternative is to use no authentication. When you combine basic authentication with SSL and a firewall that only allows traffic to and from the range of IP addresses managed by DataSift (see the appropriate section later on this page), you end up with a very secure communication channel.
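On your side, basic authentication means each request arrives with a standard Authorization header built from the credentials you supply in output_params.auth, which your server can verify. The sketch below uses the standard HTTP Basic scheme; the credential values are placeholders.

```python
import base64

def check_basic_auth(auth_header, username, password):
    """Verify a standard HTTP Basic Authorization header."""
    expected = 'Basic ' + base64.b64encode(
        ('%s:%s' % (username, password)).encode('utf-8')).decode('ascii')
    return auth_header == expected

# Example: a request carrying the right credentials passes the check.
header = 'Basic ' + base64.b64encode(b'user:secret').decode('ascii')
print(check_basic_auth(header, 'user', 'secret'))  # True
```

Compare against the exact expected header rather than parsing the incoming one; there is less room for mistakes.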

SSL Support

For extra security, DataSift can deliver interactions over a secure SSL connection. You need to turn it on and let DataSift know if you want us to verify the validity of your SSL certificate. See the list of the output parameters.

You might receive this error message:

"SSL error. Do you require a signed HTTP payload or are you using a self signed certificate?"

It appears for one of the following reasons:

  • Your CA Receiver Certificate has a passphrase.
  • The CA Receiver Certificate is not a valid public certificate.
  • Your SSL server does not trust the CA, Gandi, used to sign our certificate.


DataSift supports two HTTP methods for data delivery: POST and PUT. You can set the delivery method in the parameters of a /push/create call for the HTTP connector.

HTTP headers

Requests made by the DataSift HTTP connector include additional headers. They contain useful information about the data sent to your HTTP server. You can use them to create unique filenames, database rows and tables, or content handlers.

Element Content
X-Datasift-Hash For a Historics query, this contains the Historics Id.
For a stream, this contains the stream hash.
X-Datasift-Hash-Type Either "historic" or "stream".
X-Datasift-Id The subscription id for this query.
X-Datasift-Remaining-Bytes The number of bytes remaining in the buffer.
X-Datasift-Compression-Type The compression type can be: "none", "gzip", or "zlib".
X-DataSift-Failure-Count The number of times we have failed to deliver this payload (for example, when your HTTP server was unavailable or not running).
X-DataSift-Payload-ID Unique identifier for the payload.
Content-Encoding Set to "gzip" if the content is in GZIP format. If the content is not compressed, this header does not need to be present.
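For example, the headers above can be used to build a unique local filename for each payload. The helper below is a sketch; the naming scheme is our own choice, not something DataSift requires.

```python
import time

def payload_filename(headers):
    """Build a unique local filename from the Push delivery headers."""
    sub_id = headers.get('X-Datasift-Id', 'unknown')
    # X-DataSift-Payload-ID is unique per payload; fall back to a timestamp.
    payload_id = headers.get('X-DataSift-Payload-ID', str(int(time.time())))
    ext = 'gz' if headers.get('Content-Encoding') == 'gzip' else 'json'
    return 'DataSift-%s-%s.%s' % (sub_id, payload_id, ext)

print(payload_filename({'X-Datasift-Id': 'abc123',
                        'X-DataSift-Payload-ID': '42',
                        'Content-Encoding': 'gzip'}))  # DataSift-abc123-42.gz
```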

GZIP compression

The default data delivery format used by DataSift is uncompressed plain-text JSON. When your server is not capable of processing large amounts of data or when you do not have enough bandwidth, you should consider using compression. DataSift is happy to deliver compressed data; all it takes is adding another parameter in a /push/create call. Remember to store and uncompress the data you are receiving on your side.
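On the receiving side, the Content-Encoding (or X-Datasift-Compression-Type) header tells you how to unpack each payload. A minimal sketch using Python's standard library:

```python
import gzip
import zlib

def decode_body(body, encoding):
    """Return the raw JSON bytes for a delivery, whatever the compression."""
    if encoding == 'gzip':
        return gzip.decompress(body)
    if encoding == 'zlib':
        return zlib.decompress(body)
    return body  # "none": plain-text JSON

# Round-trip check with a gzip-compressed payload.
data = b'{"interactions": []}'
print(decode_body(gzip.compress(data), 'gzip') == data)  # True
```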

Responding quickly

When receiving a batch of data, your server must respond with a success message within 10 seconds. Otherwise, the call will time out, the delivery will be considered a failure, and it will be reattempted. Please make sure your code can process and store the data fast enough.
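One common way to stay inside the 10-second window is to do no real work in the request handler: put the raw body on a queue (or write it straight to disk) and return the success response immediately, leaving parsing and storage to a background thread. A sketch of that pattern:

```python
import queue
import threading

deliveries = queue.Queue()

def handle_delivery(body):
    """Called by the HTTP handler: enqueue the payload and return at once."""
    deliveries.put(body)
    return '{"success": true}'

def worker():
    """Runs in the background; processing happens off the request path."""
    while True:
        body = deliveries.get()
        if body is None:
            break  # shutdown sentinel
        # ... parse JSON, write to disk or a database here ...
        deliveries.task_done()

threading.Thread(target=worker, daemon=True).start()
```

The handler's latency is then just the cost of a queue insert, regardless of how slow your storage is.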

Handling POST requests

DataSift needs to make sure that the HTTP server to which it will try to send the data can accept the data. Your server must pass a simple test:

  1. The first thing that DataSift does with HTTP POST is to send an empty JSON object to your URL:


    The empty JSON object string will be sent in the body of the request. Your server-side code must recognize it and react accordingly (see the next step).

  2. Your server must send back a success message with a status code in the 200 to 299 range; otherwise, DataSift will issue an error message and will not send you data. The JSON success message looks like this:

        {"success": true}

    There is no need to call /push/create repeatedly; instead, call /push/validate, which sends the empty JSON object and checks that your server returns the success message.


If you use PUT, you must send back a message with a status code of 200 to 299, but DataSift does not check the content of the message.
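The handshake above boils down to one special case in your handler: an empty JSON object body means "this is a validation check, just answer success". A sketch (store is a placeholder for your own storage routine):

```python
def store(body):
    pass  # placeholder: write the batch somewhere durable

def handle_post(body):
    """Return (status code, response body) for a Push POST delivery."""
    if body == b'{}':
        # Validation ping sent by /push/create or /push/validate.
        return 200, '{"success": true}'
    store(body)
    return 200, '{"success": true}'

print(handle_post(b'{}'))  # (200, '{"success": true}')
```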

Testing Push with a test HTTP server

To help you test the HTTP connector we are giving you some code to play with. The following Python script implements a test HTTP server that you can use as a starting point for your own servers. It performs no authentication or SSL, but it does support compressed data delivery. Whatever data it receives will be written to files in /tmp. The files use the DataSift-<subscription id>-<timestamp>.json filename pattern for uncompressed data and DataSift-<subscription id>-<timestamp>.gz for compressed data. You need Python 2.7 or later and a local installation of the Tornado HTTP server. Once you have both pieces of software installed, run the server:
python ./

import tornado.ioloop
import tornado.web
import time

class MainHandler(tornado.web.RequestHandler):

    def post(self):

        ts = str(int(time.time()))
        of = "json"

        if self.request.body == '{}':
            print 'Got {} from DataSift!'

        if 'X-Datasift-Hash' in self.request.headers:
            print 'Got X-Datasift-Hash:',
            print self.request.headers['X-Datasift-Hash']

        if 'X-Datasift-Hash-Type' in self.request.headers:
            print 'Got X-Datasift-Hash-Type:',
            print self.request.headers['X-Datasift-Hash-Type']

        if 'Content-Encoding' in self.request.headers:
            print 'Got Content-Encoding:',
            of = self.request.headers['Content-Encoding']
            print of

        if 'X-Datasift-Id' in self.request.headers:
            print 'Got X-Datasift-Id:',
            print self.request.headers['X-Datasift-Id']

            # Write the payload to /tmp, choosing the extension based on
            # the compression in use.
            if of == 'gzip':
                f = open("/tmp/DataSift-%s-%s.gz" % (self.request.headers['X-Datasift-Id'], ts), 'wb')
            else:
                f = open("/tmp/DataSift-%s-%s.json" % (self.request.headers['X-Datasift-Id'], ts), 'w')
            f.write(self.request.body)
            f.close()

        print self.request.headers

        self.write('{"success": true}')

application = tornado.web.Application([
    (r"/", MainHandler),
])

if __name__ == "__main__":
    application.listen(8888)  # pick any free port on your server
    tornado.ioloop.IOLoop.instance().start()

Configuring Push for HTTP delivery

  1. To enable delivery, you will need to define a stream or a Historics query. Both return important details required for a Push subscription: a successful stream definition returns a hash, and a Historics query returns an id. You will need one (but not both) to set the value of the hash or historic_id parameter in a call to /push/create. You can obtain that information with a call to /push/get or /historics/get, or from the DataSift dashboard.

  2. Once you have the stream hash or the Historics id, you can supply that information to /push/create. In the example below we are making that call using curl, but you are free to use any programming language or tool.

    curl -X POST '' \
    -d 'name=connectorhttp' \
    -d 'hash=42d388f8b1db997faaf7dab487f11290' \
    -d 'output_type=http' \
    -d 'output_params.method=post' \
    -d 'output_params.url=' \
    -d 'output_params.use_gzip' \
    -d 'output_params.delivery_frequency=60' \
    -d 'output_params.max_size=10485760' \
    -d 'output_params.verify_ssl=false' \
    -d 'output_params.auth.type=none' \
    -d 'output_params.auth.username=YourHTTPServerUsername' \
    -d 'output_params.auth.password=YourHTTPServerPassword' \
    -H 'Authorization: datasift-user:your-datasift-api-key'

  3. For more information, read the step-by-step guide to the API to learn how to use Push with DataSift's APIs.

  4. When a call to /push/create is successful, you will receive a response that contains a Push subscription id. You will need that information to make successful calls to all other Push API endpoints (/push/delete, /push/stop, and others). You can retrieve the list of your subscription ids with a call to /push/get.

  5. You should now check that the data is being delivered to your server. If you are using your own custom solution, you will know how to do it, but if you are using our test server, you need to log in to the machine running our test HTTP server and list the contents of /tmp:

    ls /tmp/DataSift*

    When DataSift is able to connect and deliver interactions to this directory, the test HTTP server will use filenames that follow the patterns described in the "Testing Push with a test HTTP server" section earlier on this page. Please remember that the earliest time you can expect the first data delivery is one second after the period specified in the output_params.delivery_frequency parameter. If there is a longer delay, the stream may have no data in it, or there may be a problem with your server's configuration. In the first case, preview your stream using the DataSift web console; in the second, make a call to /push/log to see if it contains any additional information.

    Please make sure that you watch your usage and add funds to your account when it is running low. Also, stop any subscriptions that are no longer needed, otherwise you will be charged for their usage. There is no need to delete them; you can have as many stopped subscriptions as you like without paying for them. Remember that any subscriptions that were paused automatically due to insufficient funds will resume when you add funds to your account.

  6. To stop delivery, call /push/stop. To remove your subscription completely, call /push/delete.

  7. Familiarize yourself with the output parameters (for example, the host name) you'll need to know when you send data to an HTTP server.

Output parameters

DataSift sends JSON data once it becomes available, according to the "delivery_frequency" interval you configure in the output parameters. Data is bundled into batches of up to "max_size" bytes. Your server must respond with a success message upon each delivery.

Parameter: Description:

output_params.format
default = json_meta
The output format for your data:
  • json_meta - The current default format, where each payload contains a full JSON document. It contains metadata and an "interactions" property that has an array of interactions.
  • json_array - The payload is a full JSON document, but just has an array of interactions.
  • json_new_line - The payload is NOT a full JSON document. Each interaction is flattened and separated by a line break.

If you omit this parameter or set it to json_meta, your output consists of JSON metadata followed by a JSON array of interactions (wrapped in square brackets and separated by commas).

Take a look at our Sample Output for File-Based Connectors page.

If you select json_array, DataSift omits the metadata and sends just the array of interactions.

If you select json_new_line, DataSift omits the metadata and sends each interaction as a single JSON object.
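The practical difference shows up when you parse each payload: json_meta and json_array are single JSON documents, while json_new_line must be split on line breaks first. A sketch, using hypothetical minimal payloads:

```python
import json

def parse_payload(body, fmt):
    """Return the list of interactions for any of the three formats."""
    if fmt == 'json_meta':
        return json.loads(body)['interactions']
    if fmt == 'json_array':
        return json.loads(body)
    if fmt == 'json_new_line':
        return [json.loads(line) for line in body.splitlines() if line.strip()]
    raise ValueError('unknown format: %s' % fmt)

print(len(parse_payload('{"id": 1}\n{"id": 2}', 'json_new_line')))  # 2
```

json_new_line is convenient for line-oriented tooling, since each line is an independent JSON object.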

output_params.method
The verb that you want DataSift to use with the HTTP request:
  • POST
  • PUT

output_params.url
Any valid URL that you want DataSift to deliver to.

For POST requests:

DataSift uses the URL that you specify.

For PUT requests:

DataSift appends a filename to the URL.

For example, suppose that you supply this URL:

Internally, we append a filename in this format: DataSift-<subscription id>-<timestamp>

When you hit the /push/create endpoint for the first time, we make a PUT request with a filename such as DataSift-verify-31546216 (there is no subscription id yet, so we use "verify" in the file name; 31546216 is the time of the test).

When DataSift has data ready for delivery using a PUT request, it sends it with a filename such as DataSift-abcdefghij1234579-31546216 (where abcdefghij1234579 is the subscription id and 31546216 is the time of delivery).

Make sure that the URL is properly encoded, otherwise your /push/create request will fail. An online URL-encoding tool can help you encode data for an HTTP request.
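In Python, for instance, you can percent-encode the delivery URL before placing it in the /push/create request body (the example URL below is a placeholder):

```python
from urllib.parse import quote

url = 'http://example.com/datasift-receiver?token=abc 123'
encoded = quote(url, safe='')
print(encoded)
# http%3A%2F%2Fexample.com%2Fdatasift-receiver%3Ftoken%3Dabc%20123
```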


output_params.delivery_frequency
The minimum number of seconds you want DataSift to wait before sending data again.

output_params.max_size
The maximum amount of data that DataSift will send in a single batch:

  • 102400 (100KB)
  • 256000 (250KB)
  • 512000 (500KB)
  • 1048576 (1MB)
  • 2097152 (2MB)
  • 5242880 (5MB)
  • 10485760 (10MB)
  • 20971520 (20MB)

Note: if you are using compression, output_params.max_size is the uncompressed size of the data. In other words, it is the amount of data we take from the buffer.


output_params.auth.type
The authentication that you want DataSift to use when connecting to output_params.url:

  • basic
  • none

If you choose basic authentication, you must supply output_params.auth.username and output_params.auth.password.

If you specify "none" for authentication, or if you do not include this parameter, DataSift does not check for a username or password.


output_params.verify_ssl
Specify whether or not you want DataSift to verify your SSL certificate, checking that it originates from a legitimate Certificate Authority. Can be:

  • true
  • false

output_params.compression
The compression setting that you want DataSift to use:

  • none
  • zlib
  • gzip

If you set this parameter to zlib, DataSift compresses the data using the ZLIB compression standard with the compression level 6. We also add an additional entry to the header if you choose zlib or gzip:

Content-Encoding: gzip

Valid options are an empty string or "datasift_public". If "datasift_public" is specified, we will use an SSL (TLS) client certificate on the HTTP request to prove that the request comes from DataSift. Our certificate serial number is currently 4f:04:04:0e:5c:46:72:c8:0a:48:85:28:86:90:cf:7d.

In the UI, this is a file upload. Via the API, we require you to send the contents of your public CA certificate, base64 encoded. This output parameter is required if your HTTP server uses a self-signed certificate and you want to set verify_ssl to true; the certificate must not have a passphrase. Otherwise, this parameter is optional.
output_params.auth.username
Required if output_params.auth.type = basic. The username for authentication.

output_params.auth.password
Required if output_params.auth.type = basic. The password for authentication.

Data format delivered:

A JSON document containing an array of JSON objects, each representing one DataSift interaction.
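As an illustration of the shape, a json_meta delivery can be consumed like this. The payload below is a hypothetical minimal example based on the format description above, not official sample output.

```python
import json

# Hypothetical minimal json_meta payload: metadata plus an
# "interactions" array.
payload = '''
{
  "count": 2,
  "interactions": [
    {"interaction": {"id": "1e1", "content": "first"}},
    {"interaction": {"id": "1e2", "content": "second"}}
  ]
}
'''

doc = json.loads(payload)
for item in doc['interactions']:
    print(item['interaction']['content'])
# first
# second
```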

Storage type:

For each delivery, DataSift sends all the data that is available. It is up to you to configure and manage the HTTP server to handle the storage.


Take care when you set the max_size and delivery_frequency output parameters. If the stream your Push subscription is based on generates data faster than your settings permit it to be delivered, the buffer will fill up until it reaches the point where data may be discarded.