Getting Started - Python

What you'll learn: How to submit an analysis task to the API and retrieve analysis results
Duration: 20 minutes

To complete this guide you'll need an identity configured for your account with a valid LinkedIn access token. This will have been set up by your account manager or a member of our sales team. Use your account API username and a valid identity API key to complete this example.

Before you start

If you haven't already done so, take a look at our PYLON 101 page to learn the key concepts of the platform before you start this guide.

You work with PYLON for LinkedIn Engagement Insights by submitting analysis tasks to a pre-recorded index, waiting for the analysis to complete, and then retrieving the analysis results.

This guide will show how you can write a Python script to submit analysis queries and retrieve analysis results.

Installing the client library

The Python library is available as a package on PyPI.

You can install the package via the command line:

pip install datasift

Analyzing data

First, create a new script (for example analyze.py, the name used when running the program later in this guide). In it, create a DataSift client and specify the id of the index to analyze:

# import the DataSift client library
from datasift import Client

# create a client
datasift = Client("ACCOUNT_API_USERNAME", "IDENTITY_API_KEY")

index_id = "INDEX_ID"

Submitting an analysis task

You submit analysis queries using the POST /pylon/{service}/task API endpoint.

As a simple example, let's analyze the frequency distribution of the top five articles being shared and engaged with on LinkedIn.

To submit a frequency distribution analysis task you need to specify the following parameters:

  • Analysis type - how you want the data aggregated, in this case 'freqDist'
  • Threshold - the number of categories to return
  • Target - the data field of the interaction you want to analyze

The following code sets the parameters and submits the analysis task to PYLON:

# analysis query without a filter
task_parameters = {
    'parameters': {
        'analysis_type': 'freqDist',
        'parameters': {
            'threshold': 5,
            'target': 'li.all.articles.title'
        }
    }
}

print('Create an analysis task')
task = datasift.pylon.task.create(
    subscription_id=index_id,
    name='You can name your analysis tasks',
    parameters=task_parameters,
    service='linkedin'
)

print(task)

Analysis Thresholds and Limits - It's important you understand analysis thresholds to get the most from PYLON. Thresholds help you work within limits which respect the privacy of authors. Read more in our in-depth guide - Understanding Limits.

Retrieving your results

When you run the code above an analysis task will be submitted to PYLON. The API will return a new task id:

{u'id': u'ee35db10ad1539a34200772f9090aa98ef25ff4c'}

This task id is used to retrieve the results of the analysis.

Depending on the complexity of the analysis you have requested, it can take a few seconds to complete, so the results are not immediately available. When you retrieve the task from the API, inspect its status to see whether the results are ready or whether you need to wait longer.

If the task status is completed, the analysis has finished and its results are included in the task you retrieve. If the task status is queued or running, the analysis has not finished yet and you need to wait before retrieving the task again.

# use the Task ID to retrieve the analysis task results
import time

while True:
    results = datasift.pylon.task.get(
        task['id'],
        service='linkedin'
    )
    if results['status'] == 'completed':
        print('Task complete:')
        print(results['result'])
        break
    else:
        # implement a short backoff before trying again
        print('Task still running. Waiting...')
        time.sleep(2)
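
The loop above keeps polling until the task completes. As a minimal sketch, assuming you would rather give up after a fixed number of attempts, the polling can be wrapped in a helper. The wait_for_task function, its parameters and the stub responses are hypothetical names used for illustration and are not part of the datasift library.

```python
import time


def wait_for_task(fetch_task, max_attempts=30, delay=2.0, sleep=time.sleep):
    """Poll fetch_task() until the task is completed or attempts run out.

    fetch_task is any zero-argument callable that returns the task as a
    dict with a 'status' key (a hypothetical wrapper around the API call).
    """
    for _ in range(max_attempts):
        task = fetch_task()
        if task['status'] == 'completed':
            return task
        # still queued or running: wait before asking again
        sleep(delay)
    raise RuntimeError('Task not completed after %d attempts' % max_attempts)


# Stub standing in for the API so the sketch runs offline.
_responses = iter([
    {'status': 'queued'},
    {'status': 'running'},
    {'status': 'completed', 'result': {'interactions': 100}},
])

finished = wait_for_task(lambda: next(_responses), delay=0, sleep=lambda s: None)
print(finished['result'])
```

In the script from this guide, fetch_task could be lambda: datasift.pylon.task.get(task['id'], service='linkedin').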

Running your analysis

Run your program to get your first analysis results.

python analyze.py

The API returns the result as a JSON object, which the client library parses into a Python dictionary that you can use directly in your application.

Running your code will analyze the top five article titles, and will give a result similar to the following:

{
  u'analysis': {
    u'analysis_type': u'freqDist',
    u'redacted': False,
    u'results': [
      {
        u'unique_authors': 60300,
        u'key': u"'Is this LinkedIn appropriate?'",
        u'interactions': 114300
      },
      {
        u'unique_authors': 77000,
        u'key': u'The Coming Tech Backlash',
        u'interactions': 91000
      },
      {
        u'unique_authors': 49700,
        u'key': u'Really - Always leave the office on time.',
        u'interactions': 90900
      },
      {
        u'unique_authors': 40100,
        u'key': u'Tired of wasting time in meetings? Try this',
        u'interactions': 73900
      },
      {
        u'unique_authors': 57700,
        u'key': u"Amazon reportedly bidding for bankrupt retailer American Apparel; 'Routine' jobs are disappearing, and more news",
        u'interactions': 65100
      }
    ],
    u'parameters': {
      u'threshold': 5,
      u'target': u'li.all.articles.title'
    }
  },
  u'unique_authors': 8253400,
  u'interactions': 28436500
}
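
As a sketch of how you might use this result in an application, the following prints one row per article title. It assumes the result has the shape shown above. The sample data is a trimmed copy of that output, and top_categories is a hypothetical helper name.

```python
def top_categories(result):
    """Return (key, interactions, unique_authors) tuples from a freqDist result."""
    rows = []
    for entry in result['analysis']['results']:
        rows.append((entry['key'], entry['interactions'], entry['unique_authors']))
    return rows


# Trimmed copy of the sample output shown in this guide.
sample_result = {
    'analysis': {
        'analysis_type': 'freqDist',
        'redacted': False,
        'results': [
            {'unique_authors': 77000,
             'key': 'The Coming Tech Backlash',
             'interactions': 91000},
            {'unique_authors': 49700,
             'key': 'Really - Always leave the office on time.',
             'interactions': 90900},
        ],
    },
    'unique_authors': 8253400,
    'interactions': 28436500,
}

rows = top_categories(sample_result)
for key, interactions, unique_authors in rows:
    print('%-45s %8d interactions %8d authors' % (key, interactions, unique_authors))
```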

Using analysis filters

The POST /pylon/{service}/task endpoint also allows you to specify filters to run against your index, before performing analysis:

  • Filter - specify a CSDL filter to drill into the dataset
  • Start & end - specify a time window

Your current query does not specify these parameters, so it analyzes all interactions recorded in the last 24 hours.

Now let's update your query to add a CSDL filter to grab a portion of the dataset, then perform analysis.

Try replacing your task_parameters object with the following to add a longer date range (5 days) and an analysis query filter. In this example, we determine the top industry sectors of members who engage with articles written by individuals in the medical industry, but who do not work in the medical industry themselves.

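import datetime
import time

query_filter = """
    li.root.user.member.employer_industry_sectors IN "medical" AND
    NOT li.user.member.employer_industry_sectors IN "medical"
"""

# time window: the last 5 days, as integer Unix timestamps
# (strftime("%s") is not portable across platforms, so use time.mktime)
today = datetime.date.today()
start = int(time.mktime((today - datetime.timedelta(days=5)).timetuple()))
end = int(time.mktime(today.timetuple()))

# analysis query with a filter
task_parameters = {
    'filter': query_filter,
    'start': start,
    'end': end,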
    'parameters': {
        'analysis_type': 'freqDist',
        'parameters': {
            'threshold': 5,
            'target': 'li.user.member.employer_industry_sector_names'
        }
    }
}
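
If you plan to run several filtered queries, it may help to wrap the construction of task_parameters in a small function. The sketch below is one way to do that. Both build_freqdist_task and to_timestamp are hypothetical helpers, not part of the datasift library, and the sketch assumes the time window is expressed as Unix timestamps in seconds.

```python
import datetime
import time


def to_timestamp(day):
    """Convert a datetime.date to a Unix timestamp at local midnight."""
    return int(time.mktime(day.timetuple()))


def build_freqdist_task(target, threshold, query_filter=None, days=None, today=None):
    """Build a freqDist task_parameters dict with an optional filter and window."""
    params = {
        'parameters': {
            'analysis_type': 'freqDist',
            'parameters': {'threshold': threshold, 'target': target},
        }
    }
    if query_filter is not None:
        params['filter'] = query_filter
    if days is not None:
        # window runs from `days` days ago up to local midnight today
        today = today or datetime.date.today()
        params['start'] = to_timestamp(today - datetime.timedelta(days=days))
        params['end'] = to_timestamp(today)
    return params


unfiltered = build_freqdist_task('li.all.articles.title', 5)
filtered = build_freqdist_task(
    'li.user.member.employer_industry_sector_names', 5,
    query_filter='li.user.member.employer_industry_sectors IN "medical"',
    days=5,
)
print(sorted(filtered.keys()))
```

Keeping the optional keys out of the dictionary when they are not supplied preserves the behavior of the unfiltered query.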

Run your program once more and take a look at the JSON output. Changing items such as the analysis target, timeframe or query filter will change your analysis task results.

If the value of the redacted field is true when you run your program, your filter may be too restrictive to return results. Try making your query filter a little more inclusive.

Read our in-depth guide on redaction and quantization to learn more about analysis query limits.
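
A small check like the following can make redaction explicit in your script. It is a sketch that assumes the redacted flag sits under the analysis key, as in the sample output earlier in this guide. The shape of the redacted response used here is an assumption, and has_results is a hypothetical helper name.

```python
def has_results(result):
    """Return True if the analysis was not redacted and contains results."""
    analysis = result.get('analysis', {})
    if analysis.get('redacted', False):
        return False
    return len(analysis.get('results', [])) > 0


# Illustrative data only; the redacted response shape is assumed.
redacted_result = {
    'analysis': {'analysis_type': 'freqDist', 'redacted': True, 'results': []}
}
ok_result = {
    'analysis': {
        'analysis_type': 'freqDist',
        'redacted': False,
        'results': [{'key': 'example', 'interactions': 100, 'unique_authors': 100}],
    }
}

for name, result in [('redacted', redacted_result), ('ok', ok_result)]:
    if has_results(result):
        print(name, '-> results available')
    else:
        print(name, '-> redacted or empty; try a more inclusive filter')
```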

Next steps

Why not see how you can build more complex analysis query filters, or learn how to add more value to data in your index?

Take a look at our Developer Guide and In-Depth Guides to deepen your knowledge of the platform.