Streaming API
Introduction
The Streaming API offers a high-throughput way to receive curated data from DataSift in real time. It is well suited to a continuous stream with no defined end, and it lets you capture every message in a stream without missing any.
In the simplest case, you open a connection between your client and DataSift's streaming API, then send one or more requests for data from a stream, and finally close the connection when you're done.
You can access the Streaming API using HTTP or WebSockets. In this section, we discuss:
- HTTP access, which uses the stream.datasift.com endpoint.
- WebSockets access, which uses the websocket.datasift.com endpoint.
- Multiple streams, which can use either endpoint and allow you to request data from several streams over a single connection.
We strongly encourage developers to read all of the documentation linked from this page.
Authentication
Please consult our API authentication page for details on how to authorize API calls.
How it Works
Before you use the Streaming API you must write the CSDL code for a stream and compile it using DataSift's GUI or the api.datasift.com/compile endpoint in the REST API.
Opening a Connection
The first thing you do when you use the Streaming API is open a connection between your client and DataSift. That connection remains open until your client closes it, a server-side error occurs, or your client cannot keep up with the rate at which DataSift delivers the stream.
You do not have to request any data over that connection immediately. When no data is coming through, DataSift returns a series of ticks to keep the connection open:
{"tick":1311694603,"status":"connected","message":"Waiting for data"}
{"tick":1311694604,"status":"connected","message":"Waiting for data"}
Once a connection is open, you can set a stream running whenever you want.
For test purposes, you might prefer to create and compile your first stream using DataSift's GUI, but the GUI imposes a limit of 250 stream definitions. If you go over that limit, you'll need to do some housekeeping and delete old streams. The /compile endpoint in the REST API places no limit on the number of streams you can define.
The response body contains a sequence of JSON objects separated by newlines. Each JSON object represents one item curated by the stream.
Your client needs to be able to handle the keep-alive ticks shown above as well as these data objects.
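As an illustration of handling ticks on the client side, here is a minimal Python sketch that reads newline-delimited JSON and separates keep-alive ticks from data items. It assumes a tick can be recognized by the presence of a "tick" key, as in the sample ticks shown earlier. The function names and the "interaction" field in the sample data are hypothetical and are not taken from the DataSift documentation.

```python
import json
from typing import Iterable, Iterator


def is_tick(message: dict) -> bool:
    """Return True for keep-alive ticks (assumed to carry a "tick" key)."""
    return "tick" in message


def iter_items(lines: Iterable[str]) -> Iterator[dict]:
    """Yield data objects from newline-delimited JSON, skipping ticks and blank lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore empty lines between objects
        message = json.loads(line)
        if is_tick(message):
            continue  # keep-alive only; nothing to process
        yield message


# Simulated response body; the second object is a made-up data item.
sample = [
    '{"tick":1311694603,"status":"connected","message":"Waiting for data"}',
    '{"interaction":{"content":"hello"}}',
    '',
    '{"tick":1311694604,"status":"connected","message":"Waiting for data"}',
]
items = list(iter_items(sample))
```

In a real client, the same generator could be fed from whatever line iterator your HTTP library provides for a streaming response body.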
Endpoints
For HTTP streaming, the endpoint is:
http://stream.datasift.com/
For WebSocket streaming, the endpoint is:
ws://websocket.datasift.com/
Success Messages
When we introduced the Historics streaming feature in DataSift, we added success messages that confirm your API calls were successful.
We added these to the live Streaming API too, but they are switched off by default so that your existing code does not break. You can enable the messages on a stream-by-stream basis by adding the statuses parameter to the endpoint URL, like this:
http://stream.datasift.com/?statuses=true
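If you build the endpoint URL in code, a small helper along these lines can add the parameter without disturbing any other query string values. This is only a sketch: the helper name with_statuses is hypothetical, and it assumes that omitting the parameter is the way to leave success messages off, since they are off by default.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_statuses(url: str, enabled: bool = True) -> str:
    """Return url with statuses=true set, or with the parameter removed."""
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))
    if enabled:
        query["statuses"] = "true"
    else:
        # Success messages are off by default, so just drop the parameter.
        query.pop("statuses", None)
    return urlunsplit(parts._replace(query=urlencode(query)))


stream_url = with_statuses("http://stream.datasift.com/")
```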
Reconnecting
Certain events may require clients to reconnect to an HTTP stream. Make sure you read the reconnection rules.
Sample Output Format
Here's some sample output to show the format:
