Things Every Developer Should Know
Getting a DataSift login and API key
NOTE: your API key and DataSift username are both case sensitive.
Before you start to work with DataSift's API:
Request a DataSift login:
- Visit www.datasift.com.
- Click Register.
- Enter your details.
- Click Create.
Find your API key:
- Log in to DataSift.
- Go to the Dashboard or to the Settings page.
- Click the Copy to Clipboard icon under Developer API Key.
Note that DataSift does not display the API key until you have purchased credits. If you have no credits yet, you cannot make an API call.
Choosing an API
DataSift offers two families of APIs:
| Core REST API for non-real-time access: | Streaming API for real-time access: |
| api.datasift.com/validate | stream.datasift.com |
| api.datasift.com/compile | websocket.datasift.com |
| api.datasift.com/dpu | |
| api.datasift.com/stream | |
| api.datasift.com/usage | |
Pros and Cons
The REST API
| Good For: | Not Good For: |
| Compiling CSDL code. | Applications that process real-time data. |
| Requesting a snapshot of a stream or a segment of a stream. Taking a sample of data. Used mainly for testing purposes. | Capturing every object from a stream. Working with high-throughput streams. |
| Sending a request that returns a "set" response such as the cost of a stream. | |
The Streaming API
The Streaming API requires HTTP or WebSockets.
| Good For: | Not Good For: |
| Capturing real-time information. | Making requests that return a result; for example, requesting the cost of a stream. |
| Capturing a continuous stream which has no defined end. | Coding in a language, such as JavaScript, that needs to see the end of a request before allowing access to any data. |
| Capturing all the messages in a stream, making sure that none are missed. | Coding in a language that does not have support for HTTP streaming. |
| | Coding in a language that does not have WebSockets support. |
| | Compiling CSDL code. |
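To illustrate what "a continuous stream which has no defined end" means for client code, here is a minimal Python sketch that yields each JSON object as soon as it is complete, rather than waiting for the response to finish. It assumes that objects arrive separated by newlines; that delimiter and the name iter_objects are assumptions made for illustration, not a description of DataSift's wire format.

```python
import json
from typing import Iterable, Iterator


def iter_objects(chunks: Iterable[bytes]) -> Iterator[dict]:
    """Yield JSON objects from a stream of byte chunks.

    Assumption: one JSON object per line. Chunks may split an object
    at any point, so incomplete data is buffered until a newline arrives.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line currently held in the buffer.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            line = line.strip()
            if line:  # skip blank keep-alive lines
                yield json.loads(line)
```

Any iterable of byte chunks works as input, so the same function can sit behind an HTTP response reader or a test fixture.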
Choosing a client library
Before you select a programming language, decide how you want to call DataSift's API:
- Directly, using any language of your choice
- Indirectly, using one of the client libraries we supply
Client libraries are available from our GitHub account in two forms:
- Source code
- Compiled DLLs
Our set of client libraries is constantly expanding.
Choosing an authentication technique
We have an entire page devoted to API authentication.
Choosing a return structure
The Streaming API
The Streaming API returns information in JSON (JavaScript Object Notation) format.
The REST API
The REST API offers a choice of formats:
| Format: | Description: |
| JSON | The default format. |
| JSONP | Wraps JSON with a user-specified "callback" function. |
By default, calls to the REST endpoints receive JSON objects in response. For example:
api.datasift.com/compile
You can specify a return format like this:
api.datasift.com/compile.jsonp
api.datasift.com/compile.json
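The suffix convention above can be captured in a small helper. This Python sketch only builds the URL string; the https scheme and the function name endpoint_url are assumptions for illustration, since the examples above show the host and path only.

```python
from typing import Optional

# Assumption: the scheme is not shown in the examples above.
BASE_URL = "https://api.datasift.com"


def endpoint_url(endpoint: str, fmt: Optional[str] = None) -> str:
    """Return the URL for a REST endpoint, optionally with a format suffix."""
    if fmt is not None and fmt not in ("json", "jsonp"):
        raise ValueError(f"unsupported format: {fmt!r}")
    name = endpoint.strip("/")
    suffix = f".{fmt}" if fmt else ""  # no suffix means the default, JSON
    return f"{BASE_URL}/{name}{suffix}"
```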
The return message includes:
- Standard HTTP status codes to reflect the result of the operation. For example: 200 indicates success.
- Error messages in the body of the object
Error messages are always returned in the format you requested. For example, if you requested a JSON object, error messages are passed in JSON objects.
If you choose the JSONP format, it is not possible to receive HTTP status codes because they would prevent execution of the callback function.
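Because a JSONP response carries no usable status code, a client has to inspect the body to find out whether a call failed. The Python sketch below strips the callback wrapper and returns the enclosed object. The field name "error" used in the usage note is an assumption for illustration; check the endpoint documentation for the actual error structure.

```python
import json


def unwrap_jsonp(text: str, callback: str) -> dict:
    """Extract the JSON object from a JSONP response such as `cb({...});`."""
    text = text.strip()
    prefix = callback + "("
    if not text.startswith(prefix):
        raise ValueError("response is not wrapped in the expected callback")
    # Drop the prefix and any trailing semicolon, then the closing parenthesis.
    body = text[len(prefix):].rstrip(";").rstrip()
    if not body.endswith(")"):
        raise ValueError("unterminated JSONP call")
    return json.loads(body[:-1])
```

A caller could then test the result with something like payload.get("error"), assuming errors are reported under that key.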
Formatting parameters
Parameters need to be formatted in UTF-8.
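As a minimal sketch of what UTF-8 formatting looks like in practice, the Python standard library can percent-encode parameters as UTF-8 bytes. The parameter name csdl and the helper name encode_params are used here for illustration only.

```python
from urllib.parse import urlencode


def encode_params(params: dict) -> str:
    """Percent-encode request parameters, encoding text as UTF-8."""
    return urlencode(params, encoding="utf-8")


# Non-ASCII characters become their UTF-8 byte sequences:
# "é" is encoded as %C3%A9.
query = encode_params({"csdl": 'interaction.content contains "café"'})
```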
GET and POST
All the REST API endpoints accept both GET and POST requests.
Requests to the /validate and /compile endpoints might include long strings, depending on the length of the CSDL you are sending. In such situations, you might find POST to be the better option.
For the Streaming API you should use GET requests only.
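One way to act on this advice for the REST endpoints is to choose the method from the length of the resulting URL. The Python sketch below builds, but does not send, a request. The 2000-character threshold, the form-encoded POST body, and the csdl parameter name are assumptions chosen for illustration; authentication is deliberately left out.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Arbitrary threshold for this sketch; not a documented DataSift limit.
MAX_GET_URL_LENGTH = 2000


def build_request(url: str, params: dict) -> Request:
    """Use GET for short requests and fall back to POST for long ones."""
    query = urlencode(params, encoding="utf-8")
    get_url = f"{url}?{query}"
    if len(get_url) <= MAX_GET_URL_LENGTH:
        return Request(get_url, method="GET")
    # Long CSDL: send the same parameters in the request body instead.
    return Request(
        url,
        data=query.encode("ascii"),  # urlencode output is plain ASCII
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

The returned object can be passed to urllib.request.urlopen once authentication has been added.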
Why do we use JSON as our object format?
We selected JSON as our format because:
- JSON provides excellent language-independent representation of data structures
- JSON has a simple specification
- JSON handles mixed content
Here are some valuable JSON resources:
Twitter delete messages
When a Twitter user deletes a Tweet, we receive a notification in the form of a JSON object.
DataSift sends these notifications to you in your streams. Under the terms and conditions of your Twitter license, you must delete any such Tweets that you might have stored. The JSON format of the delete message is documented in our Twitter Deletes page.
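The following Python sketch shows one way a consumer might honor delete notifications when it stores Tweets. It assumes that a delete message carries a true "deleted" flag and the identifier of the affected item under interaction.id; treat that shape as a hypothetical stand-in and use the format documented on the Twitter Deletes page in real code.

```python
def apply_message(store: dict, message: dict) -> None:
    """Store a received object, or remove it if the message is a delete.

    Assumed (hypothetical) message shape:
        {"deleted": true, "interaction": {"id": "..."}}
    """
    interaction_id = message.get("interaction", {}).get("id")
    if interaction_id is None:
        return  # nothing to key on; ignore the message
    if message.get("deleted"):
        # Remove the stored Tweet; a missing entry is not an error.
        store.pop(interaction_id, None)
    else:
        store[interaction_id] = message
```

Here store is an in-memory dict for brevity; the same branch applies to a database delete.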
Twitter status messages
A Twitter User Status Message is a message forwarded from Twitter to you through your DataSift stream, alerting you to a change in the status of a Twitter user's account. We pass these notifications on to you as part of your stream when you are filtering for Tweets from these users. If you are storing Tweets, you must take account of these changes in order to comply with Twitter's Terms of Service.
You can check the meaning of the messages and read our Twitter User Status Message FAQ.
Filtering data versus consuming data
DataSift is all about data: filtering for what you want to receive and then consuming it, perhaps via our Streaming API or perhaps via Push.
It's important to understand that the data you filter against is not identical to the data you receive. For instance, the output data contains elements, such as a "created_at" date, that you can consume but cannot use in a filter.
The easiest way to understand the difference is this:
- Targets are something that you can filter on with CSDL
- Output data is something that you receive from DataSift
Take a look at Targets vs Output Data to learn more.
