Getting Started with DataSift

Shruti Desai | 8th January 2013

DataSift offers organizations a cloud-based platform to filter for real-time social media data. Every second, social media sites generate massive amounts of data. This data can provide valuable insight to your organization. DataSift filters for content as it is posted. For instance, you could filter for the mention of an individual, a message posted on a social media site, or all messages posted within a specified location. DataSift offers you an integrated solution that filters, aggregates, and delivers the exact content that you need. This blog post aims to help you understand the various features that the DataSift platform offers.

With DataSift, you can filter for content in real time using DataSift's own programming language, the Curated Stream Definition Language (CSDL). You use CSDL to write simple pieces of code that filter for the content you need. The code for a single filter contains a target, an operator, and an argument. The target specifies the data source from which the content will be filtered. The argument specifies what you are trying to filter for. The operator defines how the target is compared with the argument. Once you save and run the code, it is referred to as a data stream. The data stream filters for the content you want and delivers the output data in JSON (JavaScript Object Notation) format, which is lightweight and easy to read. You can store this output data in DataSift or use your own data destination. You can create a recording and export the output data received from your streams. You can also go back in time and filter for past content by creating a Historics query for a data stream.
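
To make the target, operator, and argument structure more concrete, here is a minimal Python sketch that applies one such filter to a JSON item. It is not DataSift's implementation or API: the Filter class, the small operator table, and the sample item are all assumptions made up for illustration, and the CSDL line in the comment only shows the general shape of a filter.

```python
import json
from dataclasses import dataclass

# Illustrative only: a CSDL filter has roughly this shape
#   interaction.content   contains   "coffee"
#   (target)              (operator) (argument)

# Hypothetical operator table; real CSDL defines its own set of operators.
OPERATORS = {
    "contains": lambda value, arg: arg.lower() in str(value).lower(),
    "==": lambda value, arg: value == arg,
}


@dataclass
class Filter:
    target: str    # dotted path to a field in the JSON item
    operator: str  # key into OPERATORS
    argument: str  # what you are trying to filter for

    def matches(self, item: dict) -> bool:
        # Walk the dotted target path; a missing field means no match.
        value = item
        for key in self.target.split("."):
            if not isinstance(value, dict) or key not in value:
                return False
            value = value[key]
        return OPERATORS[self.operator](value, self.argument)


# A made-up item, parsed from JSON as stream output would be.
raw = '{"interaction": {"content": "Best coffee in town", "type": "twitter"}}'
item = json.loads(raw)

coffee = Filter("interaction.content", "contains", "coffee")
print(coffee.matches(item))  # prints: True
```

In this sketch the target selects a field, the operator picks the comparison, and the argument supplies the value to compare against, which mirrors the three parts described above.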

Now that you are familiar with how DataSift works, let's look at the DataSift UI and learn how to get started.

The Dashboard

The DataSift platform is easy to navigate. The first step is to create a DataSift account. You can register with your email address, or you can use your Twitter, Facebook, LinkedIn, Google, Foursquare, or Yahoo account.


After signing up with DataSift, log in to your DataSift account to access your Dashboard. The Dashboard is the control panel for your account. You can manage your account from here and access many of DataSift's features. The Dashboard also displays your API details, which are required for authentication when you use the DataSift API.
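
As a rough illustration of how those API details might be used, the Python sketch below builds a request header from them. It rests on two assumptions that this post does not confirm: that the API details consist of a username and an API key, and that the API accepts them joined by a colon in an Authorization header. The auth_header helper is hypothetical, so check the API documentation for the exact scheme before relying on it.

```python
def auth_header(username: str, api_key: str) -> dict:
    """Build an Authorization header from the Dashboard's API details.

    Assumption: credentials are sent as "username:api_key". This format is
    not confirmed by this post.
    """
    if not username or not api_key:
        raise ValueError("both a username and an API key are required")
    return {"Authorization": f"{username}:{api_key}"}


# Placeholder values; substitute the details shown on your Dashboard.
headers = auth_header("your-username", "your-api-key")
print(headers)
```

Keeping the credentials in one small helper means they can be loaded from configuration rather than hard-coded throughout your client code.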


You can also access Settings from the Dashboard, where you can manage your account settings, such as account details, billing details, data licenses, identities, and password.


The Dashboard provides six tabs that navigate you to the different features that make up DataSift. Let's look at these features in brief.


You can create new streams or access existing streams by clicking the Streams tab. You can create streams with the Visual Query Builder or by writing CSDL code manually in the Code Editor.


Visual Query Builder

You don't have to be a developer to create filters for social media data streams. The Visual Query Builder allows you to construct filters for complex social media data streams without using the CSDL programming language. Simply choose a data source such as Twitter, then the relevant target field from a list of available target fields and, lastly, select or enter an argument describing what you want to filter.
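
Conceptually, those selections map onto a single line of filter code. The Python sketch below shows one way that mapping could look. It is an assumption for illustration, not how the Visual Query Builder is implemented: the build_filter function, the default "contains" operator, and the quote-escaping rule are all hypothetical.

```python
def build_filter(source: str, field: str, argument: str,
                 operator: str = "contains") -> str:
    """Compose a one-line filter from a data source, a target field,
    and an argument (hypothetical helper)."""
    target = f"{source}.{field}"
    # Assumed escaping rule: backslash-escape backslashes and double quotes.
    escaped = argument.replace("\\", "\\\\").replace('"', '\\"')
    return f'{target} {operator} "{escaped}"'


print(build_filter("twitter", "text", "coffee"))
# prints: twitter.text contains "coffee"
```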


You can customize the Visual Query Builder to allow users to build queries for a limited set of targets. It can also be integrated to match your organization's graphical identity scheme.

CSDL Code Editor

More advanced users, such as developers, prefer to work directly in our CSDL Code Editor. To create a stream in the Code Editor, simply enter the CSDL commands that define the content you want to filter for. When you click Save & Close, the editor validates your code and notifies you if it finds an error.


Once you have created a stream, you can:

  • preview the output data from your stream.
  • consume the stream via the API.
  • record the stream and export the output data.
  • create a Historics query for your stream.
  • share the CSDL code of your stream.


Once you have created your first streams, you can perform tasks on them. To access or monitor these tasks, click the Tasks tab. All your existing tasks are displayed on this page. You can also delete your tasks or export data from your tasks. You can perform two main tasks on your streams:

  • Create a recording of your stream by clicking the Start a Recording button.
  • Create a Historics query of your stream by clicking the New Historics query button.

Data sources

You can use the DataSift platform to filter for content from a range of data sources such as Twitter, Facebook, and Amazon. The Data Sources tab displays all the websites from which we acquire data for your streams. Our sources include a range of blogs, boards, and media sharing websites, as well as some of the most widely used social media sites. However, keep in mind that you must activate a data source and sign its license if you want to receive its data in your stream output.


Data destinations

DataSift also lets you export your output data to a range of data destinations such as FTP, HTTP, SFTP, Amazon S3, Amazon DynamoDB, ElasticSearch, and Splunk Storm. You can view or access these by clicking the Data Destinations tab, where you can add or edit settings for individual destinations. You must ensure that each destination is correctly configured with its own settings, including authentication details. DataSift allows you to test the connection from the platform to your data destinations.
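
Before you test a connection, it can help to check your own settings for gaps. The Python sketch below does that for two destination types. Everything in it is hypothetical: the REQUIRED_SETTINGS table, the setting names, and the missing_settings function are assumptions for illustration and do not reflect the settings DataSift actually asks for.

```python
# Hypothetical required settings per destination type (not DataSift's real list).
REQUIRED_SETTINGS = {
    "ftp": {"host", "port", "username", "password"},
    "s3": {"bucket", "access_key", "secret_key"},
}


def missing_settings(kind: str, settings: dict) -> set:
    """Return the names of required settings that are absent or empty."""
    if kind not in REQUIRED_SETTINGS:
        raise ValueError(f"unknown destination type: {kind}")
    return {name for name in REQUIRED_SETTINGS[kind] if not settings.get(name)}


# Placeholder values; the empty secret key is flagged as missing.
s3 = {"bucket": "my-bucket", "access_key": "EXAMPLE-KEY", "secret_key": ""}
print(missing_settings("s3", s3))  # prints: {'secret_key'}
```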



You can monitor your usage statistics and the costs of streams that are currently running from the Billings tab. You can also view the total costs, usage, data volume, connected hours, and historic hours from the last seven days.



DataSift offers state-of-the-art technology to filter real-time data relevant to your organization. It offers this service through a feature-packed user interface that is intuitive and easy to use, for non-developers as well as advanced users. You can create streams to filter for content, record the streams, export output data from the streams, and create Historics queries to retrieve data from the past. You can also view the data sources through which we run your streams to filter for content. To export the output data from your streams to external data storage, you can configure your own data destination. Any activities you perform through our UI or the API are logged in your usage statistics. You can also view your billing details and DPU usage.

To try out and preview the DataSift platform, sign up today for a free trial.
