The Streaming API offers a high-throughput way to receive curated data from DataSift in real time. It is well suited to continuous streams that have no defined end, and it lets you capture every message in a stream without missing any.

In the simplest case, you open a connection between your client and DataSift's streaming API, then send one or more requests for data from a stream, and finally close the connection when you're done.

You can access the Streaming API using HTTP streaming or websockets.

We strongly encourage developers to read all of the documentation linked from this page.

Authentication

Please consult our API authentication page for details on how to authorize API calls.


How it Works

Before you use the Streaming API, you must write the CSDL code for a stream and compile it, either in DataSift's GUI or with the api.datasift.com/compile endpoint of the REST API.

For test purposes, you might prefer to create and compile your first stream in the GUI, but the GUI imposes a limit of 1,000 stream definitions. If you go over that limit, you'll need to do some housekeeping and delete old streams. The /compile endpoint in the REST API places no limit on the number of streams you can define.

Opening a Connection

The first thing you do when you use the Streaming API is open a connection between your client and DataSift. That connection remains open until your client closes it, a server-side error occurs, or your client can no longer keep up with the rate at which DataSift delivers the stream.

You do not have to request any data over that connection immediately. When no data is coming through, DataSift sends a series of ticks to keep the connection open:

    {"tick":1311694603,"status":"connected","message":"Waiting for data"}

    {"tick":1311694604,"status":"connected","message":"Waiting for data"}

Your client needs to be able to handle these keep-alive ticks.

Once a connection is open, you can set a stream running whenever you want. The response body contains a list of JSON objects separated by newlines. Each JSON object represents one item curated by the stream.
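Because ticks and curated items arrive on the same connection as newline-separated JSON objects, a client typically classifies each line before acting on it. The following Python sketch assumes that any object with a `tick` key is a keep-alive and hands everything else to your application. The names `classify_message` and `iter_messages` are hypothetical, and a real stream may carry other message types (for example, status messages) that you would want to handle separately.

```python
import json


def classify_message(line: str):
    """Classify one line as ("blank", None), ("tick", obj) or ("message", obj)."""
    line = line.strip()
    if not line:
        # Empty lines between objects carry no content.
        return ("blank", None)
    obj = json.loads(line)
    if isinstance(obj, dict) and "tick" in obj:
        # Keep-alive: the connection is open but no data is flowing yet.
        return ("tick", obj)
    # Assumption of this sketch: everything else goes to the application.
    return ("message", obj)


def iter_messages(lines):
    """Yield only non-tick messages from an iterable of raw lines."""
    for line in lines:
        kind, obj = classify_message(line)
        if kind == "message":
            yield obj


if __name__ == "__main__":
    sample = [
        '{"tick":1311694603,"status":"connected","message":"Waiting for data"}',
        "",
        '{"id": "example"}',  # hypothetical curated item
    ]
    print(list(iter_messages(sample)))  # [{'id': 'example'}]
```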


Endpoints

For HTTP streaming, the endpoint is:

    https://stream.datasift.com/

For Websocket streaming, the endpoint is:

    wss://stream.datasift.com/


Client Libraries

Each of our client libraries supports the Streaming API via HTTP streaming, websockets, or both.


Performance

When data throughput is high, your client software must receive data quickly enough to keep up; otherwise, DataSift may disconnect it.

For example, the first time you connect to a new Managed Source, your client typically receives a burst of data covering the past seven days of activity accumulated at the source. After this burst, DataSift slows down to real-time delivery.
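One common way to survive bursts like this is to separate reading from processing, so that slow work never stalls the connection. The Python sketch below is an assumed design, not a DataSift requirement: a reader pushes raw lines into a standard-library `queue.Queue` as fast as they arrive, and a single worker thread processes them in order. The names `run`, `read_stream`, and `process_stream` are hypothetical, and `lines` stands in for whatever iterable of lines your HTTP or websocket client provides.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the input


def read_stream(lines, buf: queue.Queue) -> None:
    """Reader: move raw lines off the connection without parsing them."""
    for line in lines:
        buf.put(line)
    buf.put(_END)


def process_stream(buf: queue.Queue, handle) -> None:
    """Worker: do the slow per-item work away from the reading loop."""
    while True:
        line = buf.get()
        if line is _END:
            break
        handle(line)


def run(lines, handle) -> None:
    buf = queue.Queue()  # unbounded, so the reader never blocks on put()
    worker = threading.Thread(target=process_stream, args=(buf, handle))
    worker.start()
    read_stream(lines, buf)
    worker.join()  # wait until every queued line has been handled
```

Because the queue is unbounded, memory use grows during a large burst instead of the reader blocking. You can cap it with `queue.Queue(maxsize=...)`, but a full queue blocks the reader, which can again cause it to fall behind.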


Success Messages

When we introduced the Historics streaming feature in DataSift, we added success messages that confirm your API calls were successful.

We added these messages to the live Streaming API too, but they are switched off by default so that they don't break your existing code. You can enable them on a stream-by-stream basis like this:

    https://stream.datasift.com/?statuses=true
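If your client builds stream URLs programmatically, it can add the `statuses=true` parameter without disturbing any other query parameters. A minimal Python sketch using the standard library's `urllib.parse` (the helper name `with_statuses` is hypothetical):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_statuses(url: str) -> str:
    """Return url with statuses=true set, keeping any other query parameters."""
    parts = urlsplit(url)
    # Drop any existing statuses value so the parameter appears only once.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "statuses"]
    query.append(("statuses", "true"))
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment)
    )
```

For example, `with_statuses("https://stream.datasift.com/")` returns `https://stream.datasift.com/?statuses=true`.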


Limits

There are some platform limits to keep in mind:

  • Max concurrent connections per user: 200
  • Max subscriptions per connection: 200
  • Max connection rate per user: 1000/minute

The connection rate limit counts only successful connections; failed connection attempts do not count toward it.
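A client that opens many connections can track these limits itself and hold off before the platform rejects it. The Python sketch below is an illustration under stated assumptions: it models the per-minute rate as a sliding 60-second window (DataSift may measure the window differently), counts only successful connections as described above, and uses hypothetical names (`ConnectionBudget`, `can_subscribe`). The clock is injectable so the logic can be tested without waiting.

```python
import time
from collections import deque

# Platform limits from the list above.
MAX_CONCURRENT_CONNECTIONS = 200
MAX_SUBSCRIPTIONS_PER_CONNECTION = 200
MAX_CONNECTIONS_PER_MINUTE = 1000
WINDOW_SECONDS = 60.0  # assumption: a sliding window


class ConnectionBudget:
    """Client-side bookkeeping for the per-user connection limits."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._recent = deque()  # timestamps of recent successful connections
        self.open_connections = 0

    def _prune(self, now: float) -> None:
        # Forget successes that have left the 60-second window.
        while self._recent and now - self._recent[0] >= WINDOW_SECONDS:
            self._recent.popleft()

    def can_connect(self) -> bool:
        self._prune(self._clock())
        return (self.open_connections < MAX_CONCURRENT_CONNECTIONS
                and len(self._recent) < MAX_CONNECTIONS_PER_MINUTE)

    def record_success(self) -> None:
        # Only successful connections count toward the rate limit.
        now = self._clock()
        self._prune(now)
        self._recent.append(now)
        self.open_connections += 1

    def record_close(self) -> None:
        self.open_connections = max(0, self.open_connections - 1)


def can_subscribe(current_subscriptions: int) -> bool:
    """True if one more subscription fits on a connection."""
    return current_subscriptions < MAX_SUBSCRIPTIONS_PER_CONNECTION
```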


IP Access Control

By default, you can access DataSift's Streaming API (stream.datasift.com) from any IP address. However, you can use the DataSift UI to restrict access to specific IP addresses, for example as a security measure. If you attempt to access the Streaming API from an unauthorized IP address, you will receive an error such as:

    IP Address 123.123.123.123 not authorized to access this account
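Because this rejection depends on the address you connect from, reconnecting from the same address will not succeed until the allowed list changes, so it can help to recognize this error and stop retrying. The message above is given only as an example, so this Python sketch matches it loosely. `denied_ip` is a hypothetical helper, and how your client obtains the error text is left open.

```python
import re

# Loose match: the documented wording is an example, not a guaranteed format.
_IP_DENIED = re.compile(r"IP Address (\S+) not authorized")


def denied_ip(message: str):
    """Return the rejected address if message looks like an IP access error, else None."""
    match = _IP_DENIED.search(message)
    return match.group(1) if match else None
```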


Reconnecting

Certain events may require clients to reconnect to an HTTP stream. Make sure you read the reconnection rules.