Getting Started - Python

What you'll learn: How to record data and get your first analysis results via the API
Duration: 30 minutes

Warning

To complete this guide you'll need an identity configured for your account with a valid Facebook access token. This will have been set up by your account manager or a member of our sales team. Use your account API username and a valid identity API key to complete this example.

Before You Start

If you haven't already done so, take a look at our PYLON 101 page to learn the key concepts of the platform before you start this guide.

You work with Pylon by:

  • Recording data into an index
  • Submitting analysis queries to the index to receive analysis results

This guide will help you create one program (or script) to kick off the recording, and another to submit analysis queries. When you come to build your application you'll no doubt want to separate the two flows in a similar way.

Installing the Client Library

The Python library is available as a package on PyPI.

You can install the package at the command line:

pip install datasift

Recording Data

With the library installed, you can start writing your script to record data. First, you need to create a client object that will access the API for you.

# import required library
from datasift import Client

# create a client using your account API username and identity API key
datasift = Client('ACCOUNT_API_USERNAME', 'IDENTITY_API_KEY')

Compiling a Filter

Next, you'll need to compile a filter.

Compiling a filter will give you a hash that you will need when setting up a recording.

# compile a filter to receive a hash
csdl = '''( fb.content contains_any "wedding,engaged,engagement,marriage"
          or fb.topics.name in "Wedding,Marriage" )
        OR ( fb.parent.content contains_any "wedding,engaged,engagement,marriage"
          or fb.parent.topics.name in "Wedding,Marriage" )'''
compiled = datasift.pylon.compile(csdl)
print(compiled)
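
If you want to check your CSDL before compiling it, the client library also exposes a validate call. A minimal sketch, assuming your version of the library provides datasift.pylon.validate and that it accepts the same CSDL string:

# hedged sketch: validate the CSDL before compiling it
validated = datasift.pylon.validate(csdl)
print(validated)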

Creating a Recording

Now that you have a hash for your filter, you can use this to start recording data to your index. Starting the recording will give you an id. When you perform analysis you'll use this id to reference the recording.

# start recording
start = datasift.pylon.start(compiled['hash'], 'Pylon Test Filter')
print('Recording started, ID: ' + start['id'])
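
API calls can fail, for example if your identity isn't configured correctly or the hash is wrong, so you may want to wrap the call. A sketch, assuming your version of the client library raises DataSiftApiException from datasift.exceptions:

from datasift.exceptions import DataSiftApiException

# hedged sketch: handle API errors when starting the recording
try:
    start = datasift.pylon.start(compiled['hash'], 'Pylon Test Filter')
    print('Recording started, ID: ' + start['id'])
except DataSiftApiException as e:
    print('Could not start recording: {0}'.format(e))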

Start Recording

As you're working with live data, you'll need to let data start filling up the index before you can perform any useful analysis.

You can run your controller program now to start the recording of data, whilst you work on the analysis program.


Note

When I ran this example, it took around 5 minutes for the index to contain enough data. You can check how much data has been recorded by hitting the /pylon/get API endpoint:

curl -H "Auth: [ACCOUNT_API_USERNAME]:[IDENTITY_APIKEY]" [PYLON_GET_URL]?id=[recording id]

Alternatively, look at your recording task on the DataSift dashboard, within the Pylon tab, then the Recordings tab.
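
You can also do this check from Python rather than curl. A small sketch, assuming datasift.pylon.get accepts the recording id and returns the recording's details (including the volume recorded so far) as a dict:

# hedged sketch: fetch the recording's details, including its volume so far
recording = datasift.pylon.get(start['id'])
print(recording)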

Analyzing Data

Now you can start writing your program to analyze the data in your index. First, create a new script, creating a DataSift client as before.

Note that we need to tell the program which recording to analyze. We'll take this as a command-line argument for now; it was output by your first program.

# include the datasift libraries
import sys
from datasift import Client

# check a recording id argument is given
if not len(sys.argv) > 1:
    print('Usage: python [script name] [recording id]')
    sys.exit(1)

recording_id = sys.argv[1]

# create a client using your account API username and identity API key
datasift = Client('ACCOUNT_API_USERNAME', 'IDENTITY_API_KEY')

Submitting an Analysis Query

You submit analysis queries using the /pylon/analyze API endpoint.

To do so you need to specify the following parameters:

  • Analysis type - how you want the data aggregated, e.g. time series or frequency distribution
  • Threshold - for frequency distributions, the number of categories to return
  • Target - The data field of the interaction you want to analyze and plot

As a simple example, let's analyze the distribution of authors in the index by their age group:

# analysis query without a filter
analyze_parameters = {'analysis_type': 'freqDist', 'parameters': {'threshold': 5, 'target': 'fb.author.age'}}
print(datasift.pylon.analyze(recording_id, analyze_parameters))
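
The frequency distribution above is only one of the analysis types mentioned earlier. As a hedged sketch, here's what a time series query might look like, assuming the 'timeSeries' type accepts an 'interval' parameter such as 'hour' or 'day':

# hedged sketch: a time series analysis query, assuming 'timeSeries'
# accepts an 'interval' parameter (e.g. 'hour' or 'day')
timeseries_parameters = {'analysis_type': 'timeSeries', 'parameters': {'interval': 'hour'}}
print(datasift.pylon.analyze(recording_id, timeseries_parameters))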

Note: Analysis Thresholds and Limits

It's important you understand analysis thresholds to get the most from PYLON. Thresholds help you work within limits that ensure the privacy of authors. Read more in our in-depth guide, Understanding Audience-Size Gating.

Interpreting Your Results

Run your program to get your first analysis results.

python [script name] [recording id]

You'll see that the result is a JSON object, which you can easily use in your application.

If the 'redacted' value in the response is true, there is not enough data in your index to give you results. You'll need to wait until your index contains more data, or you could try reducing the threshold value.
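
Once you have an unredacted result you'll usually want to pull out the individual categories. A sketch, assuming the response contains an 'analysis' object with a 'redacted' flag and a 'results' list whose entries carry 'key', 'interactions' and 'unique_authors' fields; adjust the keys to match the JSON you actually receive:

# hedged sketch: print each age group with its interaction and author counts,
# assuming the response shape described above
response = datasift.pylon.analyze(recording_id, analyze_parameters)
analysis = response['analysis']
if analysis['redacted']:
    print('Not enough data yet - wait a while or lower the threshold')
else:
    for entry in analysis['results']:
        print('{0}: {1} interactions, {2} unique authors'.format(
            entry['key'], entry['interactions'], entry['unique_authors']))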

Using Analysis Filters

The /pylon/analyze endpoint also allows you to specify filters to run against your index, before performing analysis:

  • Filter - specify a CSDL filter to drill into the dataset
  • Start & end - specify a time window

Your current query does not specify these parameters, so it runs against the entire dataset.

Now let's update your query to add a CSDL filter to grab a portion of the dataset, then perform analysis.

Replace your code for submitting an analysis query with the following:

# analysis query with a filter
analyze_parameters = {'analysis_type': 'freqDist', 'parameters': {'threshold': 5, 'target': 'fb.author.age'}}
analyze_filter = 'fb.author.gender == "female" OR fb.parent.author.gender == "female"'
print(datasift.pylon.analyze(recording_id, analyze_parameters, analyze_filter))

Run your program once more and take a look at the JSON output. The results will change because of the filter you've applied.
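
You can narrow the analysis to a time window in a similar way by passing start and end values. A sketch, assuming the client library's analyze call accepts optional start and end arguments as Unix timestamps:

import time

# hedged sketch: analyze only the last 24 hours of the index,
# assuming analyze() accepts optional start and end Unix timestamps
end_time = int(time.time())
start_time = end_time - (24 * 60 * 60)
print(datasift.pylon.analyze(recording_id, analyze_parameters, analyze_filter, start_time, end_time))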

Stopping Your Recording

Excellent, you're all done! Before you forget, stop your data recording; otherwise you'll use up some of your recording quota.

Of course, in production solutions you'll likely want to leave your recording running permanently, or for long periods of time to collect data.

The quickest way to do this is to log in to the DataSift dashboard: click the Pylon tab, then the Recordings tab, and click Stop next to your recording.

Of course you can do this using the API too!

curl -H "Auth: [ACCOUNT_API_USERNAME]:[IDENTITY_APIKEY]" -H "Content-type: application/json" -X PUT [PYLON_STOP_URL]?id=[recording id]
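
If you'd rather stay in Python, the client library exposes a stop call too. A minimal sketch, assuming datasift.pylon.stop takes the recording id:

# hedged sketch: stop the recording using the client library,
# passing the id that was output when you started the recording
datasift.pylon.stop(recording_id)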

Next Steps

So now that you've got to grips with the API, how can you learn more?

Why not see how you can build more complex filters and queries, or learn how to add more value to data in your index?

Take a look at our Developer Guide and In-Depth Guides to deepen your knowledge of the platform.

Check out our Code Examples and start building your solution in no time.