Getting Started - Python

This Quickstart Guide

What you'll learn: How to connect and stream live data from the DataSift platform to your machine.
Duration: 10 minutes


Before You Start

Before you get started, make sure you've completed the Developer Quick Start guide, which takes you through:

  • Activating the Tumblr data source
  • Activating the Links augmentation

You'll need to have done these steps to continue.

Once you've done this, sign in to DataSift and note the username and API key shown on your dashboard. You'll need them in a minute.

Step 1: Install The Client Library

The Python client library is available on PyPI as the datasift package.

You can install the package at the command line:

pip install datasift
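If you want to confirm from Python that the install succeeded, here is a minimal sketch that uses only the standard library's importlib. The helper is_installed is hypothetical and not part of the DataSift library.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    # find_spec returns None when nothing with this name is on the import path
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("datasift"):
        print("datasift is installed")
    else:
        print("datasift not found; run: pip install datasift")
```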

Step 2: Create A DataSift Client

With the package installed, you can now write a script to access the API. First, create a client object that will access the API for you.

Start a new Python script with the following code, inserting your username and API key:

# Include the DataSift library
import datasift

# Create a client
client = datasift.Client('DATASIFT_USERNAME', 'DATASIFT_API_KEY')
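Hard-coding credentials makes it easy to share your API key by accident. One option is to read them from environment variables. This is a minimal sketch: the helper load_credentials and the variable names DATASIFT_USERNAME and DATASIFT_API_KEY are assumptions chosen for this example, not something the library reads on its own.

```python
import os


def load_credentials(env=None):
    """Return (username, api_key) read from environment variables.

    DATASIFT_USERNAME and DATASIFT_API_KEY are names chosen for this sketch;
    the datasift library does not look them up by itself.
    """
    env = os.environ if env is None else env
    username = env.get("DATASIFT_USERNAME")
    api_key = env.get("DATASIFT_API_KEY")
    if not username or not api_key:
        raise RuntimeError("Set DATASIFT_USERNAME and DATASIFT_API_KEY before running")
    return username, api_key


# Usage in your script (requires the datasift package):
#   import datasift
#   client = datasift.Client(*load_credentials())
```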

Step 3: Compile A Filter

To stream data from the platform, you need to write a filter in CSDL. You compile this filter using the API and receive a hash that represents it.

Add the following to your script:

# Declare a filter in CSDL, looking for content mentioning brands
csdl = 'interaction.content contains_any "Calvin Klein, GQ, Adidas"'

# Compile the filter
fltr = client.compile(csdl)

Here we're using the interaction.content target, the property of each piece of content that the filter examines. In this case, it's the text of the Tumblr post.

Step 4: Streaming Data

With your filter compiled you can now start streaming data. This sample code shows how to declare your event handlers and kick off the data stream.

Warning: While a stream is running, you are consuming your platform credit. Your free credit gives you plenty to play with, but to make the most of it, always stop your stream when you're not using it.

Add the following to your script:

# Handler: Message deleted by user
@client.on_delete
def on_delete(interaction):
    print("You must delete this to be compliant with the T&Cs:", interaction)

# Handler: Connection was closed
@client.on_closed
def on_close(wasClean, code, reason):
    print("Stream subscriber shutting down because", reason)

# Handler: Picks up any error, warning or information messages from the platform
@client.on_ds_message
def on_ds_message(msg):
    print('DS Message %s' % msg)

# Handler: Connected to DataSift, so subscribe to the compiled filter's hash
@client.on_open
def on_open():
    print("Connected to DataSift")

    # Handler: Called for each interaction that matches the filter
    @client.subscribe(fltr['hash'])
    def on_interaction(interaction):
        print("Received interaction:", interaction)

# Start streaming
client.start_stream_subscriber()
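The on_interaction handler prints each whole interaction. If you only want the post text, here is a minimal sketch. It assumes each interaction arrives as a Python dict whose "interaction" object holds a "content" field, mirroring the interaction.content target used in the filter. The helper post_text is hypothetical.

```python
def post_text(interaction):
    """Return the post text from an interaction, or '' if it is missing."""
    # Assumed shape: {"interaction": {"content": "...", ...}, ...}
    return interaction.get("interaction", {}).get("content", "")


sample = {"interaction": {"content": "New Adidas drop today"}}
print(post_text(sample))  # New Adidas drop today
```

Inside on_interaction, you could then print post_text(interaction) instead of the full interaction.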

Step 5: Give It A Whirl

With your script now complete, you can run the example and see data pouring into your console.

python [yourscript].py

There's a complete version of the Python script here.

Step 6: Classifying Data

To help you understand, integrate and act upon the data, DataSift allows you to add custom metadata through tags and scores. See the DataSift VEDO documentation to learn more.

As a quick example, you could use tags to identify which brand each post mentions. Try replacing the csdl variable in your script with the following, and run the script again.

# Declare a filter in CSDL that tags each brand it finds
csdl = '''tag.brand "Calvin Klein" { interaction.content contains "Calvin Klein" }
tag.brand "GQ" { interaction.content contains "GQ" }
tag.brand "Adidas" { interaction.content contains "Adidas" }

interaction.content contains_any "Calvin Klein, GQ, Adidas"'''
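Writing one tag rule per brand by hand gets repetitive as the list grows. Here is a minimal sketch that generates the same tagged CSDL from a Python list. The helper tagged_brand_csdl is hypothetical, and it assumes brand names contain no double quotes, because it performs no escaping.

```python
def tagged_brand_csdl(brands):
    """Build one tag.brand rule per brand, followed by a filter matching any of them."""
    # Each rule labels an interaction whose content contains that brand name
    lines = [
        'tag.brand "{0}" {{ interaction.content contains "{0}" }}'.format(brand)
        for brand in brands
    ]
    # A blank line, then the filter that decides which interactions are delivered
    lines.append("")
    lines.append('interaction.content contains_any "{}"'.format(", ".join(brands)))
    return "\n".join(lines)


csdl = tagged_brand_csdl(["Calvin Klein", "GQ", "Adidas"])
print(csdl)
```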

You can see the tags assigned to the data, under the interaction.tag_tree property of each item.
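To act on the tags in code rather than reading them in the console, here is a minimal sketch. It assumes each interaction is a dict in which interaction.tag_tree maps the "brand" namespace to a list of tag values. That structure is an assumption inferred from the tag.brand rules above, and the helper brand_tags is hypothetical.

```python
def brand_tags(interaction):
    """Return the brand tags attached to an interaction, or [] if there are none."""
    # Assumed shape: {"interaction": {"tag_tree": {"brand": ["GQ", ...]}}}
    tag_tree = interaction.get("interaction", {}).get("tag_tree", {})
    return list(tag_tree.get("brand", []))


sample = {
    "interaction": {
        "content": "Adidas and GQ team up",
        "tag_tree": {"brand": ["GQ", "Adidas"]},
    }
}
print(brand_tags(sample))  # ['GQ', 'Adidas']
```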

Learn More

That's the end of the quick start guide. To learn more about the platform, take a look at the following resources: