Client Libraries: Basics

Updated on Friday, 5 April, 2013 - 10:47

We've rewritten this page but the earlier version is still available if you need it.

 

Are you trying to set up Push or Historics? You're in the right place. Read this page and then go to Client Library: Push and Historics.

 

The best way to understand how our client libraries work is to take a high-level view of the general concepts first and then dip down into your language of choice. All the client libraries follow the same basic principles, which we describe here, although class and method names may vary slightly from library to library. This document is a general overview of the concepts shared across the different language bindings.


 

Client Library Details (Beta)

We're gradually building our low-level documentation for each client library. This section is still in beta but you're very welcome to look:

 

Basic objects

User

All interaction with the API libraries begins with an instance of the User class, which you create with your DataSift username and an API key. Every other object that talks to the DataSift API needs a User object for authentication, and the User class provides methods for creating most of those objects.

Definition

A Definition object represents a stream definition. These are roughly equivalent to the streams as you see them on the DataSift web interface, except that you cannot store them in your DataSift account for later retrieval.

Historic

A Historic object represents a Historics query. These are also roughly equivalent to the queries you can create via the DataSift web interface.

StreamConsumer

This object creates and manages connections to the DataSift streaming server. The default connection type is HTTP, but some of the libraries also support WebSockets. This object is roughly equivalent to the code that runs in your browser when you receive data from a stream on the DataSift web interface.

When you create a StreamConsumer, you supply a way to receive events from the stream. Events include connecting and disconnecting, interactions, deletions, status messages, warnings and errors.

Exceptions

All of the libraries throw exceptions (or their language's equivalent) when errors occur. It's important that you catch and handle all possible errors for every call you make.

 

Putting it all together

This section is in pseudocode but you can translate it almost directly into your language of choice.

We start by creating a User object.
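
Here is a minimal Python sketch of this step. The `User` class is a hypothetical stand-in that only holds credentials, and the real class in your library may have a different name or constructor. Reading the credentials from the `DATASIFT_USERNAME` and `DATASIFT_API_KEY` environment variables is our own assumption, chosen to keep the API key out of source code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical stand-in for the library's User class."""

    username: str
    api_key: str


# Assumption: credentials come from environment variables, not source code.
# The fallback values are placeholders for this sketch only.
user = User(
    username=os.environ.get("DATASIFT_USERNAME", "your_username"),
    api_key=os.environ.get("DATASIFT_API_KEY", "your_api_key"),
)
```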

 

We now have a choice, depending on what we actually want to do. If we have the hash of a stream we want to consume, we can create a StreamConsumer directly from the User object. For this example, assume we don't have a hash but do have some CSDL we want to use. In that case we need to create a Definition object to represent that CSDL.
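
In Python, the User could act as a factory for the Definition, as in the sketch below. `create_definition()` and both classes are hypothetical stand-ins, not the library's actual API, and the CSDL string is only an illustrative filter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    """Hypothetical stand-in for the library's User class."""

    username: str
    api_key: str

    def create_definition(self, csdl: str) -> "Definition":
        # The Definition keeps a reference to the User so that later API
        # calls (validate, compile, consume) can authenticate.
        return Definition(user=self, csdl=csdl)


@dataclass
class Definition:
    """Hypothetical stand-in: a stream definition built from CSDL."""

    user: User
    csdl: str


user = User("your_username", "your_api_key")
definition = user.create_definition('interaction.content contains "music"')
```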

 

First we should check that the CSDL we've supplied is valid, using the validate() method on the Definition object. As mentioned above, if validate() encounters an error (for example, if the CSDL fails to compile), it throws an exception, so we make sure to catch that. We must also catch other errors that may occur, such as authentication or connectivity problems. A generic catch block will handle those in an example, but in your production code you should always catch specific exceptions.
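
The sketch below shows that pattern with hypothetical exception classes (`DataSiftError`, `CompileError`, `AccessDeniedError`). Its `validate()` is a toy that fakes a compile failure locally instead of calling the API. The specific exceptions are caught first, and the hypothetical base exception is the fallback.

```python
class DataSiftError(Exception):
    """Hypothetical base class for all library errors."""


class CompileError(DataSiftError):
    """Hypothetical: the CSDL failed to compile."""


class AccessDeniedError(DataSiftError):
    """Hypothetical: the username/API key pair was rejected."""


class Definition:
    """Hypothetical stand-in with a validate() method (User omitted for brevity)."""

    def __init__(self, csdl: str) -> None:
        self.csdl = csdl

    def validate(self) -> None:
        # Toy check standing in for the real server-side compile: an empty
        # definition or an unterminated string literal "fails to compile".
        if not self.csdl.strip() or self.csdl.count('"') % 2:
            raise CompileError(f"CSDL failed to compile: {self.csdl!r}")


def check(csdl: str) -> str:
    definition = Definition(csdl)
    try:
        definition.validate()
    except CompileError as err:
        # Most specific first: the CSDL itself is wrong and needs fixing.
        return f"invalid CSDL: {err}"
    except AccessDeniedError:
        # A real API call could raise this; the toy validate() never does.
        return "check your username and API key"
    except DataSiftError as err:
        # Fallback for any other library error, e.g. connectivity problems.
        return f"other API error: {err}"
    return "valid"


print(check('interaction.content contains "music"'))  # valid
print(check('interaction.content contains "music'))   # reports a compile error
```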

 

Exception handling is omitted from the rest of this document, but please make sure your code correctly handles every exception the library may throw. Otherwise your program may terminate without warning while you're sleeping, and you'll miss out on some of the lovely data you want to consume.

Now that we know our CSDL compiles properly, we can move on to creating a context for consuming data over a streaming connection. We start by creating our event handler.

 

For most libraries there is a reference class (or interface) which your event handler must implement. This defines the methods that must exist and the parameters they take. The most notable exception to this is the Ruby library which currently uses blocks rather than an event handler class.
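
In Python, such a reference class could be an abstract base class, as sketched below. The sketch is hypothetical. The method names are Python-style spellings of the events described on this page, and `on_disconnect`, `on_warning` and `on_error` in particular are names we chose, because the exact names differ between libraries. Python refuses to instantiate a subclass that leaves any of the methods unimplemented, which is the same enforcement an interface gives you.

```python
from abc import ABC, abstractmethod


class StreamConsumerEventHandler(ABC):
    """Hypothetical reference class listing every event a stream can raise.

    Each method receives the StreamConsumer that raised the event first.
    """

    @abstractmethod
    def on_connect(self, consumer) -> None: ...

    @abstractmethod
    def on_disconnect(self, consumer) -> None: ...

    @abstractmethod
    def on_status(self, consumer, status: str, info: dict) -> None: ...

    @abstractmethod
    def on_warning(self, consumer, message: str) -> None: ...

    @abstractmethod
    def on_error(self, consumer, message: str) -> None: ...

    @abstractmethod
    def on_interaction(self, consumer, interaction: dict) -> None: ...

    @abstractmethod
    def on_deleted(self, consumer, interaction: dict) -> None: ...


class IncompleteHandler(StreamConsumerEventHandler):
    # Implements only one of the required methods.
    def on_connect(self, consumer) -> None:
        print("connected")


try:
    IncompleteHandler()
    rejected = False
except TypeError:
    # Abstract methods left unimplemented make instantiation fail.
    rejected = True

print(rejected)  # True
```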

The first method is onConnect(), which is called when a connection to the DataSift streaming server is successfully established.

 

The opposite of this event is getting disconnected. There's an event for that, too.

 

Let's take a moment to look at the parameter passed to these two events. Every event handler method receives the StreamConsumer object that raised the event as its first parameter, so the handler can make calls on that StreamConsumer.
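
Here is a sketch of these two methods. A hypothetical `FakeConsumer` replaces a real StreamConsumer so the example runs offline, and `on_disconnect` is our own name for the disconnection event. Both methods receive the consumer that raised the event as their first argument.

```python
class FakeConsumer:
    """Hypothetical stand-in for a StreamConsumer; it only carries a name."""

    def __init__(self, name: str) -> None:
        self.name = name


class EventHandler:
    """Connection-related part of a hypothetical event handler."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def on_connect(self, consumer: FakeConsumer) -> None:
        # Called once the connection to the streaming server is established.
        self.log.append(f"connected: {consumer.name}")

    def on_disconnect(self, consumer: FakeConsumer) -> None:
        # Called when the connection ends; the consumer that raised the
        # event is available here for logging or cleanup.
        self.log.append(f"disconnected: {consumer.name}")


handler = EventHandler()
consumer = FakeConsumer("http")
handler.on_connect(consumer)
handler.on_disconnect(consumer)
print(handler.log)  # ['connected: http', 'disconnected: http']
```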

The data coming down the streaming connection consists of a mixture of interaction objects, status messages, warnings and errors. We have events that handle each of these.

Status messages trigger the onStatus() method.

 

Status messages can contain additional information, which is passed in the info HashMap (or your language's equivalent). For example, a status of type "progress" for a Historics query contains the percentage complete in info['progress'].
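
A hypothetical `on_status()` that records Historics progress might look like the sketch below. The method name and signature are assumptions, and `None` stands in for the StreamConsumer when we simulate events.

```python
class EventHandler:
    """Status-handling part of a hypothetical event handler."""

    def __init__(self) -> None:
        self.progress: list[int] = []

    def on_status(self, consumer, status: str, info: dict) -> None:
        # Extra details arrive in `info`; for a Historics "progress" status,
        # info["progress"] holds the percentage complete.
        if status == "progress":
            self.progress.append(int(info["progress"]))
        else:
            print(f"status {status}: {info}")


handler = EventHandler()
handler.on_status(None, "progress", {"progress": 25})
handler.on_status(None, "progress", {"progress": 50})
print(handler.progress)  # [25, 50]
```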

Errors and warnings trigger the following methods.
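
The sketch below gives these methods the hypothetical names `on_warning()` and `on_error()`. Warnings are only logged, while errors are logged and also kept so the rest of the program can react. That split is a design choice for illustration, not documented library behavior.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("stream")


class EventHandler:
    """Error- and warning-handling part of a hypothetical event handler."""

    def __init__(self) -> None:
        self.errors: list[str] = []

    def on_warning(self, consumer, message: str) -> None:
        # This sketch only logs warnings.
        logger.warning("stream warning: %s", message)

    def on_error(self, consumer, message: str) -> None:
        # Errors are logged and also kept, so the caller can decide what to
        # do next, for example stopping the consumer or alerting someone.
        logger.error("stream error: %s", message)
        self.errors.append(message)


handler = EventHandler()
# None stands in for the StreamConsumer when simulating events.
handler.on_warning(None, "example warning")
handler.on_error(None, "example error")
print(handler.errors)  # ['example error']
```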

 

Incoming data (interactions) triggers one of two methods: onInteraction() and onDeleted(). The data passed with a deletion notification is in the same format as a normal interaction, but it contains only the data you need to identify the deleted interaction so that you can remove it from your own storage systems.

 

Note that properly handling delete notifications is required for you to remain compliant with some of the licenses you have signed.
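
Below is a sketch of a hypothetical `on_deleted()` that removes the interaction from your own storage. A plain dict stands in for your database. We assume the interaction ID is at `interaction["interaction"]["id"]`, so check the actual field in the data your library delivers.

```python
class EventHandler:
    """Deletion-handling part of a hypothetical event handler."""

    def __init__(self, storage: dict) -> None:
        # `storage` stands in for your own database, keyed by interaction ID.
        self.storage = storage

    def on_deleted(self, consumer, interaction: dict) -> None:
        # A deletion carries only enough data to identify the interaction.
        # Assumption: the ID is at interaction["interaction"]["id"].
        interaction_id = interaction["interaction"]["id"]
        # pop() with a default silently ignores IDs we never stored.
        self.storage.pop(interaction_id, None)


storage = {"abc": {"content": "first"}, "def": {"content": "second"}}
handler = EventHandler(storage)
handler.on_deleted(None, {"interaction": {"id": "abc"}})
print(sorted(storage))  # ['def']
```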

Interactions trigger the onInteraction() method.
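
A matching hypothetical `on_interaction()` can store each interaction under the same assumed ID field, so a later deletion notification can find it.

```python
class EventHandler:
    """Interaction-handling part of a hypothetical event handler."""

    def __init__(self) -> None:
        # A dict stands in for your own storage, keyed by interaction ID.
        self.storage: dict[str, dict] = {}

    def on_interaction(self, consumer, interaction: dict) -> None:
        # Assumption: the ID is at interaction["interaction"]["id"], the same
        # field a deletion notification would use to identify it.
        self.storage[interaction["interaction"]["id"]] = interaction


handler = EventHandler()
# None stands in for the StreamConsumer when simulating an event.
handler.on_interaction(None, {"interaction": {"id": "abc", "content": "hello"}})
print(len(handler.storage))  # 1
```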

 

And that completes the EventHandler class.

 

Now that we have an event handler ready to receive events, we can get a StreamConsumer from our Definition object.

 

The first parameter is the type of consumer we want. Most of the libraries only support HTTP streaming at this time, but some also support WebSockets. The second parameter is an instance of our EventHandler class.
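
Here is that call sketched with hypothetical stand-ins: a `ConsumerType` enum for the two connection types, and a `get_consumer()` method that only accepts HTTP in this sketch. The names, and the decision to reject WebSockets, are assumptions for illustration.

```python
from enum import Enum


class ConsumerType(Enum):
    """Hypothetical names for the two connection types."""

    HTTP = "http"
    WEBSOCKET = "ws"


class StreamConsumer:
    """Hypothetical stand-in: holds what it needs to connect later."""

    def __init__(self, consumer_type: ConsumerType, csdl: str, handler: object) -> None:
        self.consumer_type = consumer_type
        self.csdl = csdl
        self.handler = handler


class Definition:
    """Hypothetical stand-in for a Definition with a consumer factory."""

    # Assumption: this library, like most, only supports HTTP streaming.
    SUPPORTED = {ConsumerType.HTTP}

    def __init__(self, csdl: str) -> None:
        self.csdl = csdl

    def get_consumer(self, consumer_type: ConsumerType, handler: object) -> StreamConsumer:
        if consumer_type not in self.SUPPORTED:
            raise ValueError(f"unsupported consumer type: {consumer_type.value}")
        return StreamConsumer(consumer_type, self.csdl, handler)


class EventHandler:
    """Placeholder for the event handler class described above."""


definition = Definition('interaction.content contains "music"')
consumer = definition.get_consumer(ConsumerType.HTTP, EventHandler())
```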

We can now start to consume data. In most of the libraries this call does not return, so if your program needs to do other things while connected to the stream, you'll need to run your use of the API library in a separate thread.
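
One way to do that in Python is with the standard `threading` module. The `StreamConsumer` below is a hypothetical stand-in whose `consume()` blocks, counting simulated interactions, until `stop()` is called.

```python
import threading
import time


class StreamConsumer:
    """Hypothetical stand-in whose consume() blocks until stop() is called."""

    def __init__(self) -> None:
        self._stopped = threading.Event()
        self.received = 0

    def consume(self) -> None:
        # Like consume() in most of the libraries, this does not return
        # while connected.
        while not self._stopped.is_set():
            self.received += 1        # pretend an interaction arrived
            self._stopped.wait(0.01)  # pause briefly, waking early on stop()

    def stop(self) -> None:
        self._stopped.set()


consumer = StreamConsumer()
# Run the blocking consume() call in a background thread...
worker = threading.Thread(target=consumer.consume, daemon=True)
worker.start()

# ...so the main program stays free to do other work meanwhile.
time.sleep(0.1)

consumer.stop()
worker.join(timeout=1)
print(f"received {consumer.received} simulated interactions")
```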

 

The library will now compile the definition if necessary, connect to the streaming server, and start consuming data.

 

 

MultiStream support

The discussion above focused on consuming a single definition. Most of the libraries also support consuming multiple definitions through the same stream connection. When you do that, the event handler methods are also passed the hash of the stream that matched each interaction, in addition to their other parameters.
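
Here is a sketch of a multi-stream handler that counts interactions per stream. The `stream_hash` parameter is hypothetical, and passing it last is an assumption, so check where your library actually puts it.

```python
from collections import Counter


class MultiStreamHandler:
    """Hypothetical handler for a multi-stream connection."""

    def __init__(self) -> None:
        self.per_stream: Counter[str] = Counter()

    def on_interaction(self, consumer, interaction: dict, stream_hash: str) -> None:
        # The extra stream_hash says which definition matched, so
        # interactions can be routed or counted per stream.
        self.per_stream[stream_hash] += 1


handler = MultiStreamHandler()
# None stands in for the StreamConsumer when simulating events.
handler.on_interaction(None, {"interaction": {"id": "1"}}, "hash_a")
handler.on_interaction(None, {"interaction": {"id": "2"}}, "hash_b")
handler.on_interaction(None, {"interaction": {"id": "3"}}, "hash_a")
print(dict(handler.per_stream))  # {'hash_a': 2, 'hash_b': 1}
```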

Notes

  • Make sure you have a DataSift login and an API key.
  • The libraries will throw exceptions when something goes wrong, so make sure you're catching and properly handling them.
  • For most libraries, once you start the StreamConsumer it will not return until the stream gets disconnected.
  • You can control the StreamConsumer from any of the event handler methods using the StreamConsumer object they are passed.
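
The last note can be sketched as a handler that stops the stream after a fixed number of interactions. Both classes are hypothetical stand-ins, and the consumer here replays a fixed list instead of connecting to DataSift.

```python
class StreamConsumer:
    """Hypothetical stand-in that replays a fixed list of interactions."""

    def __init__(self, interactions: list, handler) -> None:
        self.interactions = interactions
        self.handler = handler
        self.running = False

    def consume(self) -> None:
        # Blocks until every interaction is delivered or stop() is called.
        self.running = True
        for interaction in self.interactions:
            if not self.running:
                break
            self.handler.on_interaction(self, interaction)
        self.running = False

    def stop(self) -> None:
        self.running = False


class StopAfterN:
    """Handler that uses the consumer it is passed to stop the stream."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.seen = 0

    def on_interaction(self, consumer: StreamConsumer, interaction: dict) -> None:
        self.seen += 1
        if self.seen >= self.limit:
            consumer.stop()  # control the consumer from inside a handler


handler = StopAfterN(limit=3)
StreamConsumer([{"n": i} for i in range(10)], handler).consume()
print(handler.seen)  # 3
```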

 

Community-built libraries

We're happy to see that the DataSift development community has already started to add to the set of libraries we provided at launch.
