Getting Started - Ruby

What you'll learn: How to record data and get your first analysis results via the API
Duration: 30 minutes

To complete this guide you'll need an identity configured for your account with a valid Facebook access token. This will have been set up by your account manager or a member of our sales team. Use your account API USERNAME and a valid IDENTITY API KEY to complete this example.

Before You Start

Before you start this guide, if you haven't already done so, take a look at our PYLON 101 page to learn the key concepts of the platform.

You work with Pylon by:

  • Recording data into an index
  • Submitting analysis queries to the index to receive analysis results

This guide will help you create one program (or script) to kick off the recording, and another to submit analysis queries. When you come to build your application you'll no doubt want to separate the two flows in a similar way.

Installing the Client Library

The Ruby library is available on RubyGems. Note you need to be running Ruby 2.0.0 or above.

You can install the package at the command line:

gem install datasift

Or by including this line in your Gemfile and running bundle install:

gem 'datasift'

Recording Data

With the library installed, you can start writing your script to record data. First you need to create a client object that will access the API for you.

# include the datasift library
require 'datasift'  

# create a client
config = {:username => 'ACCOUNT_API_USERNAME', :api_key => 'IDENTITY_APIKEY', :enable_ssl => true }
@datasift = DataSift::Client.new(config)

Compiling a Filter

Next, you'll need to compile a filter.

Compiling a filter will give you a hash that you can use when setting up a recording.

# compile a filter to receive a hash
csdl = '( fb.content contains_any "wedding,engaged,engagement,marriage"
          or fb.topics.name in "Wedding,Marriage" )
        OR ( fb.parent.content contains_any "wedding,engaged,engagement,marriage"
          or fb.parent.topics.name in "Wedding,Marriage" )'
compiled = @datasift.pylon.compile csdl
hash = compiled[:data][:hash]

puts "Filter hash: #{hash}"
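If you'd rather maintain the keyword list in one place, you can build the CSDL string programmatically before compiling it. A minimal sketch (assuming the topic targets fb.topics.name and fb.parent.topics.name):

```ruby
# keywords and topics we want to match, kept in one place
keywords = %w[wedding engaged engagement marriage]
topics   = %w[Wedding Marriage]

keyword_list = keywords.join(',')
topic_list   = topics.join(',')

# interpolate the lists into the CSDL filter definition
csdl = %{( fb.content contains_any "#{keyword_list}"
           or fb.topics.name in "#{topic_list}" )
         OR ( fb.parent.content contains_any "#{keyword_list}"
           or fb.parent.topics.name in "#{topic_list}" )}

puts csdl
```

You can then pass the resulting string to @datasift.pylon.compile exactly as above.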

Creating a Recording

Now that you have a hash for your filter, you can use this to start recording data to your index. Starting the recording will give you an id. When you perform analysis you'll use this id to reference the recording.

# start recording
recording = @datasift.pylon.start(hash, 'Pylon Test Filter')
puts "Recording started, ID: #{recording[:data][:id]}"

Start Recording

As you're working with live data, you'll need to let data build up in the index before you can perform any useful analysis.

You can run your controller program now to start the recording of data, whilst you work on the analysis program.

ruby start-recording.rb

When I ran this example, it took around 5 minutes for the index to contain enough data. You can check how much data has been recorded by hitting the /pylon/get API endpoint:

curl -H "Auth: [ACCOUNT_API_USERNAME]:[IDENTITY_APIKEY]" https://api.datasift.com/v1.3/pylon/get?id=[recording id]

Or by looking at your recording task on the DataSift dashboard, within the Pylon tab, then the Recordings tab.
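You can make the same check from Ruby and wait until the index holds a workable amount of data. The response shape below (in particular the volume field) is an assumption based on the /pylon/get endpoint; verify it against the client documentation for your version:

```ruby
# A sample /pylon/get response body, as a Ruby hash. In a real script you
# would fetch this through the client, along the lines of:
#   status = @datasift.pylon.get('', recording_id)[:data]
# (hypothetical call shape -- check the datasift gem for the exact method)
status = {
  :name   => 'Pylon Test Filter',
  :status => 'running',
  :volume => 4200
}

# only start analyzing once the index holds a reasonable amount of data
MIN_INTERACTIONS = 1000
ready = status[:status] == 'running' && status[:volume] >= MIN_INTERACTIONS
puts ready ? "Index ready (#{status[:volume]} interactions)" : 'Keep waiting...'
```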

Analyzing Data

Now you can start writing your program to analyze data in your index. First, create a new script and create a DataSift client as before.

Note that we need to tell the program which recording to analyze. For now we'll take the id as a command-line argument; it was output by your first program.

# include the datasift library
require 'datasift'  

# check id argument is given
if ARGV.empty?
  puts 'Usage: ruby analyze.rb [recording id]'
  exit
end

recording_id = ARGV[0]

# create a client
config = {:username => 'ACCOUNT_API_USERNAME', :api_key => 'IDENTITY_APIKEY', :enable_ssl => true }
@datasift = DataSift::Client.new(config)

Submitting an Analysis Query

You submit analysis queries using the /pylon/analyze API endpoint.

To do so you need to specify the following parameters:

  • Analysis type - how you want the data aggregated, e.g. time series or frequency distribution
  • Threshold - for frequency distributions, the number of categories to return
  • Target - the data field of the interaction you want to analyze and plot

As a simple example, let's analyze the distribution of authors in the index by their age group:

# analysis query without a filter
params = {
  :analysis_type => "freqDist",
  :parameters => {
    :threshold => 5,
    :target => "fb.author.age"
  }
}

puts @datasift.pylon.analyze('', params, '', nil, nil, recording_id)

Analysis Thresholds and Limits

It's important you understand analysis thresholds to get the most from Pylon. Thresholds help you work within limits that ensure the privacy of authors. Read more in our in-depth guide, Understanding Audience-Size Gating.

Interpreting Your Results

Run your program to get your first analysis results.

ruby analyze.rb [recording id]

You'll see that the result is a JSON object, which you can easily use in your application.

If the 'redacted' value in your results is true, there is not enough data in your index to give you results. You'll need to wait until your index contains more data, or you could try reducing the threshold value.
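Assuming the freqDist response has the shape sketched below (the field names here are assumptions; verify them against the API reference), you might unpack the results and check for redaction like this:

```ruby
require 'json'

# A sample /pylon/analyze response body. The structure is an assumption --
# confirm the exact field names against the API docs for your account.
response = JSON.parse('{
  "interactions": 127000,
  "unique_authors": 98400,
  "analysis": {
    "analysis_type": "freqDist",
    "redacted": false,
    "results": [
      {"key": "25-34", "interactions": 41000, "unique_authors": 32000},
      {"key": "18-24", "interactions": 35000, "unique_authors": 28000}
    ]
  }
}', :symbolize_names => true)

analysis = response[:analysis]
if analysis[:redacted]
  puts 'Not enough data yet -- results were redacted'
else
  # print one line per category in the frequency distribution
  analysis[:results].each do |row|
    puts "#{row[:key]}: #{row[:unique_authors]} unique authors"
  end
end
```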

Using Analysis Filters

The /pylon/analyze endpoint also allows you to specify filters to run against your index, before performing analysis:

  • Filter - specify a CSDL filter to drill into the dataset
  • Start & end - specify a time window

Your current query doesn't specify these parameters, so the query runs against the entire dataset.
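The start and end parameters take Unix timestamps. A quick sketch restricting analysis to the last 24 hours (the parameter order assumes the analyze(hash, params, filter, start, end, id) call shape used above; the API call itself is left commented out):

```ruby
# restrict analysis to the last 24 hours using Unix timestamps
end_time   = Time.now.to_i
start_time = end_time - (24 * 60 * 60)

# the time window is passed as the start and end arguments, e.g.:
#   puts @datasift.pylon.analyze('', params, '', start_time, end_time, recording_id)
puts "Analyzing from #{Time.at(start_time)} to #{Time.at(end_time)}"
```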

Now let's update your query to add a CSDL filter to grab a portion of the dataset, then perform analysis.

Replace your code for submitting an analysis query with the following:

# analysis query with a filter
params = {
  :analysis_type => "freqDist",
  :parameters => {
    :threshold => 5,
    :target => "fb.author.age"
  }
}

# filter to just females posting or engaging
filter = 'fb.author.gender == "female" OR fb.parent.author.gender == "female"'
puts @datasift.pylon.analyze('', params, filter, nil, nil, recording_id)

Run your program once more and take a look at the JSON output. The results will change because of the filter you've applied.

Stopping Your Recording

Excellent, you're all done! Before you forget, stop your data recording; otherwise you'll use up some of your recording quota.

Of course, in production solutions you'll likely want to leave your recording running permanently, or for long periods of time to collect data.

The quickest way to do this is to log in to the DataSift dashboard. Click on the Pylon tab, then the Recordings tab within it, and click Stop next to your recording.

Of course you can do this using the API too!

curl -X PUT https://api.datasift.com/v1.3/pylon/stop -H "Auth: [ACCOUNT_API_USERNAME]:[IDENTITY_APIKEY]" -H "Content-type: application/json" -d '{"id": "[recording id]"}'
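From Ruby, the client exposes the same operation. The method name below mirrors the pylon.start call used earlier but is an assumption; confirm it against the datasift gem's documentation:

```ruby
# Stop a recording by id. In a complete script the id would come from
# ARGV, just as in the analysis script above.
def stop_recording(client, recording_id)
  # method name is an assumption -- confirm against the datasift gem
  client.pylon.stop(recording_id)
end
```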

Next Steps

So now that you've got to grips with the API, how can you learn more?

Why not see how you can build more complex filters and queries, or learn how to add more value to data in your index?

Take a look at our Developer Guide and In-Depth Guides to deepen your knowledge of the platform.

Check out our Code Examples and start building your solution in no time.