Getting Started - Ruby

This Quickstart Guide

What you'll learn: How to connect to the DataSift platform and stream live data to your machine.
Duration: 10 minutes

Before You Start

Before you get started, make sure you've completed the Developer Quick Start guide, which takes you through:

  • Activating the Tumblr data source
  • Activating the Links augmentation

You'll need to have done these steps to continue.

Once complete, sign in to DataSift and make a note of the username and API key shown on your dashboard. You'll need them in Step 2.


Step 1: Install The Client Library

The Ruby library is available on RubyGems. Note you need to be running Ruby 2.0.0 or above.
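If you are unsure which Ruby you are running, a quick check along these lines can confirm it before you install anything. This is a minimal sketch: the ruby_supported? helper and the MINIMUM_RUBY constant are names made up for illustration, while Gem::Version ships with RubyGems.

```ruby
# Minimum Ruby version stated in this guide.
MINIMUM_RUBY = Gem::Version.new('2.0.0')

# Hypothetical helper: true when the given version string meets the minimum.
def ruby_supported?(version = RUBY_VERSION)
  Gem::Version.new(version) >= MINIMUM_RUBY
end

if ruby_supported?
  puts "Ruby #{RUBY_VERSION} is new enough for this guide."
else
  puts "Ruby #{RUBY_VERSION} is too old; this guide requires 2.0.0 or above."
end
```

Comparing Gem::Version objects rather than raw strings avoids mistakes such as treating "2.10.0" as older than "2.9.0".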

You can install the package at the command line:

gem install datasift

Or add this line to your Gemfile and run bundle install:

gem 'datasift'

Step 2: Create A DataSift Client

With the package installed, you can now write a script to access the API. First, create a client object that will access the API for you.

Start a new Ruby script with the following code (inserting your username and API key):

# include the datasift library
require 'datasift'  

# configuration options
config = {:username => 'YOUR_USERNAME', :api_key => 'YOUR_API_KEY', :enable_ssl => true}

# create a client
@datasift = DataSift::Client.new(config)
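Hard-coding credentials is fine for a quick test, but one alternative is to read them from the environment. The following is a minimal sketch, assuming environment variables named DATASIFT_USERNAME and DATASIFT_API_KEY. Those names, and the datasift_config helper, are hypothetical choices for this example and are not required by the library.

```ruby
# Hypothetical helper: build the configuration hash from environment variables.
# DATASIFT_USERNAME and DATASIFT_API_KEY are names assumed for this sketch only.
def datasift_config(env = ENV)
  {
    :username   => env.fetch('DATASIFT_USERNAME'),
    :api_key    => env.fetch('DATASIFT_API_KEY'),
    :enable_ssl => true
  }
end

begin
  config = datasift_config
  puts "Loaded credentials for #{config[:username]}"
rescue KeyError => e
  # fetch raises KeyError when a variable is not set
  puts "Missing credential: #{e.message}"
end
```

Using fetch rather than [] means a missing variable fails immediately with a clear message instead of passing nil to the API.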

Step 3: Compile A Filter

In order to stream data from the platform, you need to create a filter in CSDL. You compile this filter using the API and receive a hash that represents the filter.

Add the following to your script:

# Declare a filter in CSDL, looking for content mentioning brands
csdl = 'interaction.content contains_any "Calvin Klein, GQ, Adidas"'

# Compile the filter
filter = @datasift.compile csdl

Here we're using the interaction.content property (or target) of the piece of content. This is the text of the Tumblr post in this case.
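If your list of brands lives in Ruby rather than in a hand-written string, the filter can be assembled from an array. This is only a sketch: contains_any_csdl is a hypothetical helper, not part of the client library, and it does no escaping, so it assumes the terms contain no commas or double quotes.

```ruby
# Hypothetical helper: build a contains_any filter from a list of terms.
# Assumes the terms contain no commas or double quotes (no escaping is done).
def contains_any_csdl(target, terms)
  # contains_any takes a single comma-separated string of terms
  %(#{target} contains_any "#{terms.join(', ')}")
end

brands = ['Calvin Klein', 'GQ', 'Adidas']
csdl = contains_any_csdl('interaction.content', brands)
puts csdl
# prints: interaction.content contains_any "Calvin Klein, GQ, Adidas"
```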

Step 4: Streaming Data

With your filter compiled you can now start streaming data. This sample code shows how to declare your event handlers and kick off the data stream.

Warning: When you are running a stream you are consuming your platform credit. Your free credit gives you plenty to play with, but always remember to stop your stream when you're not using it, to make the most of your credit.

Add the following to your script:

# Handler: Message (i.e. a new Tumblr post) is received
on_message = lambda { |message, stream, hash| puts "Received interaction: #{message}" }

# Handler: Message deleted by user
on_delete = lambda { |stream, m| puts "You must delete this to be compliant with T&Cs ==> #{m}" }

# Handler: An error occurred
on_error = lambda { |stream, e| puts "An error has occurred: #{e.message}" }

# Handler: Connected to DataSift
on_connect = lambda do |stream|
  puts 'Connected to DataSift'
  # Subscribe to the compiled filter once the connection is open
  stream.subscribe(filter[:data][:hash], on_message)
end

# Create stream object, and start streaming data
conn = DataSift::new_stream(config, on_delete, on_error, on_connect)
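To see how the handlers fit together without opening a live stream (and without consuming credit), you can drive them by hand. The sketch below uses a FakeStream class invented purely for illustration. It is not part of the datasift gem and only imitates the subscribe call used above, replaying a few canned messages.

```ruby
# Stand-in for a live stream; NOT part of the datasift gem.
class FakeStream
  def initialize(messages)
    @messages = messages
  end

  # Imitates subscribe: calls the handler once per canned message.
  def subscribe(hash, on_message)
    @messages.each { |message| on_message.call(message, self, hash) }
  end
end

received = []

# Same handler shape as in this guide: message, stream, hash
on_message = lambda { |message, stream, hash| received << message }

on_connect = lambda do |stream|
  puts 'Connected (offline stand-in)'
  stream.subscribe('example-hash', on_message)
end

# Simulate the connection event with two canned messages
on_connect.call(FakeStream.new(['first post', 'second post']))
puts "Handled #{received.length} messages"
```

The point is the order of events: the connect handler fires first, it subscribes to a filter hash, and the message handler then runs once per item delivered.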

Step 5: Give It A Whirl

With your script now complete, you can run the example and see data pouring into your console.

ruby [yourscript].rb

There's a complete version of the Ruby script here.

Step 6: Classifying Data

To help you understand, integrate and act upon the data, DataSift allows you to add custom metadata through tags and scores. See the DataSift VEDO documentation to find out more.

As a quick example, you could use tags to identify which brand each post mentions. Try replacing the csdl variable in your script with the following, and run the script again.

# Declare a filter in CSDL that tags each post with the brand it mentions
csdl = 'tag.brand "Calvin Klein" { interaction.content contains "Calvin Klein" }
tag.brand "GQ" { interaction.content contains "GQ" }
tag.brand "Adidas" { interaction.content contains "Adidas" }

return {
  interaction.content contains_any "Calvin Klein, GQ, Adidas"
}'

You can see the tags assigned to the data, under the interaction.tag_tree property of each item.
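Inside your message handler you could pull the tags out of each item. The sketch below works on a hand-built sample hash. The exact layout of a real message (symbol keys, with tag_tree nested under interaction) is an assumption here, and brand_tags is a hypothetical helper, so inspect a real message before relying on it.

```ruby
# Hand-built sample; the real message layout is an assumption in this sketch.
sample_message = {
  :interaction => {
    :content  => 'Loving my new Adidas trainers',
    :tag_tree => { :brand => ['Adidas'] }
  }
}

# Hypothetical helper: return the brand tags on a message, or [] when none exist.
def brand_tags(message)
  interaction = message[:interaction] || {}
  tag_tree = interaction[:tag_tree] || {}
  tag_tree[:brand] || []
end

puts brand_tags(sample_message).inspect
puts brand_tags({}).inspect
```

Falling back to an empty array keeps the handler from raising on items that matched the filter but carry no tags.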

Learn More

That's the end of the quick start guide. To learn more about the platform please take a look at the following resources: