Getting Started - Java

What you'll learn: How to record data and get your first analysis results via the API
Duration: 30 minutes


Warning: To complete this guide you'll need an identity configured for your account with a valid Facebook access token. This will have been set up by your account manager or a member of our sales team. Use your account API username and a valid identity API key to complete this example.

Before You Start

If you haven't already done so, take a look at our PYLON 101 page to learn the key concepts of the platform before you start this guide.

You work with Pylon by:

  • Recording data into an index
  • Submitting analysis queries to the index to receive analysis results

This guide will help you create a program including one method for starting a recording, and another for performing analysis. When you come to build your application you'll no doubt want to separate the two flows in a similar way.

Installing the Client Library

The Java library is available as a Maven package.

Maven is a dependency manager and package repository, and is supported by popular IDEs.

Create a new Java project in your favourite IDE, and add a pom.xml file.

Add a dependency to your pom.xml file:
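A dependency entry looks like the following. The version shown here is illustrative; check Maven Central for the latest release of the datasift-java client:

```xml
<dependencies>
    <dependency>
        <groupId>com.datasift.client</groupId>
        <artifactId>datasift-java</artifactId>
        <!-- Illustrative version; check Maven Central for the current release -->
        <version>3.2.0</version>
    </dependency>
</dependencies>
```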


Alternatively, you can use the JAR file directly in your project; the latest JAR file can also be downloaded from Maven.

Recording Data

With your project created and the library installed, you can start writing your application.

For this example we'll create a command line application. Create a new class in your Java project called Main.

Firstly you'll need to include the following imports at the top of the file:

import com.datasift.client.*;
import com.datasift.client.pylon.*;
import com.datasift.client.pylon.PylonRecording.PylonRecordingId;

Next, use the following code to define the structure of your Main class. Here we're creating one method for kicking off our data recording, and another for submitting analysis queries.

You'll need to insert your API username and key here!

public class Main {
    private static DataSiftClient _datasift;

    public static void main(String[] args) {
        DataSiftConfig config = new DataSiftConfig("ACCOUNT_API_USERNAME", "IDENTITY_APIKEY");
        _datasift = new DataSiftClient(config);

        PylonRecordingId recordingId = StartRecording();
        Analyze(recordingId);
    }

    private static PylonRecordingId StartRecording() {
        // Filter compilation and recording code goes here
        return null;
    }

    private static void Analyze(PylonRecordingId recordingId) {
        // Analysis query code goes here
    }
}


Compiling a Filter

To record data you'll need to compile a filter.

Compiling a filter will give you a hash that you will need when setting up a recording.

Add the following code to the StartRecording method.

String csdl = "( fb.content contains_any \"wedding,engaged,engagement,marriage\" " +
                "OR fb.topics.name in \"Wedding,Marriage\" ) " +
                "OR ( fb.parent.content contains_any \"wedding,engaged,engagement,marriage\" " +
                "OR fb.parent.topics.name in \"Wedding,Marriage\" )";
PylonStream stream = _datasift.pylon().compile(csdl).sync();
String hash = stream.hash();
System.out.println("Hash: " + hash);
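The CSDL above is hand-concatenated, which makes the escaped quotes easy to get wrong. If your keyword list changes often, you can build the contains_any clauses programmatically instead. This small helper is purely illustrative and not part of the DataSift client library:

```java
import java.util.List;

public class CsdlBuilder {
    // Builds a CSDL contains_any clause for one target,
    // e.g. fb.content contains_any "wedding,engaged"
    static String containsAny(String target, List<String> keywords) {
        return target + " contains_any \"" + String.join(",", keywords) + "\"";
    }

    public static void main(String[] args) {
        List<String> keywords =
                List.of("wedding", "engaged", "engagement", "marriage");
        String csdl = "( " + containsAny("fb.content", keywords) + " ) OR ( "
                + containsAny("fb.parent.content", keywords) + " )";
        System.out.println(csdl);
    }
}
```

The resulting string can be passed to `compile()` exactly as before.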

Creating a Recording

Now that you have a hash for your filter, you can use this to start recording data to your index. Starting the recording will give you an id. When you perform analysis you'll use this id to reference the recording.

Add the following code to the StartRecording method.

PylonRecordingId recordingId = _datasift.pylon().start(stream, "Example recording").sync();

System.out.println("Started recording, id: " + recordingId.getId());

return recordingId;

Start Recording

As you're working with live data, you'll need to let the data start filling up the index before you can perform any useful analysis.

Kick off recording by running your application now. You can work on the rest of the application whilst data starts filling your index.

Note: When I ran this example, it took around five minutes for the index to contain enough data. You can check how much data has been recorded by hitting the /pylon/get API endpoint:

curl -H "Auth: [ACCOUNT_API_USERNAME]:[IDENTITY_APIKEY]" https://api.datasift.com/[api version]/pylon/get?id=[recording id]

Alternatively, look at your recording on the DataSift dashboard, under the Pylon tab, then the Recordings tab.

Analyzing Data

Now you can start writing code to analyze data in your index.

Submitting an Analysis Query

You submit analysis queries using the /pylon/analyze API endpoint.

To do so you need to specify the following parameters:

  • Analysis type - how you want the data aggregated, e.g. time series or frequency distribution
  • Threshold - for frequency distributions, the number of categories to return
  • Target - the data field of the interaction you want to analyze and plot
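For orientation, these parameters map onto the JSON body that the client sends to the /pylon/analyze endpoint. The shape below is a rough sketch of the request, not a normative reference; the target and threshold values are illustrative:

```json
{
  "id": "<recording id>",
  "parameters": {
    "analysis_type": "freqDist",
    "parameters": {
      "threshold": 5,
      "target": "fb.author.age"
    }
  }
}
```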

As a simple example, let's analyze the distribution of authors in the index by their age group.

Add the following code to your Analyze method:

// Build the query parameters object: a frequency distribution over author age
PylonParametersData paramData = new PylonParametersData(null, null, 5, "fb.author.age");
PylonQueryParameters queryParams = new PylonQueryParameters("freqDist", paramData);

// Submit the query without a filter
PylonQuery noFilter = new PylonQuery(recordingId, queryParams, null, null, null);
PylonResult resultNoFilter = _datasift.pylon().analyze(noFilter).sync();

System.out.println("No filter: " + resultNoFilter.getResponse().data());

Note: It's important you understand analysis thresholds to get the most from Pylon. Thresholds help you work within limits that ensure the privacy of authors. Read more in our in-depth guide, Understand Audience-Size Gating.

Interpreting Your Results

Run your program to get your first analysis results.

You'll see that the result is a JSON object, which you can easily use in your application.

If the 'redacted' value is true when you run your program, there is not enough data in your index to return results. You'll need to wait until your index contains more data, or you could try reducing the threshold value.
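As a purely illustrative sketch (the field names follow the documented pattern but the numbers are made up; check the API reference for the exact schema), a freqDist response has roughly this shape, with the 'redacted' flag alongside the results:

```json
{
  "analysis": {
    "analysis_type": "freqDist",
    "parameters": { "threshold": 5, "target": "fb.author.age" },
    "redacted": false,
    "results": [
      { "key": "25-34", "interactions": 1200, "unique_authors": 900 },
      { "key": "35-44", "interactions": 800, "unique_authors": 600 }
    ]
  },
  "interactions": 4000,
  "unique_authors": 2500
}
```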

Using Analysis Filters

The /pylon/analyze endpoint also allows you to specify filters to run against your index, before performing analysis:

  • Filter - specify a CSDL filter to drill into the dataset
  • Start & end - specify a time window

Your current query does not specify these parameters, so it runs against the entire dataset.
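The start and end values are Unix timestamps in seconds. For example, to build a window covering the last seven days in plain Java (no client classes involved):

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class TimeWindow {
    public static void main(String[] args) {
        // Analysis time windows are expressed as Unix timestamps in seconds
        Instant end = Instant.now();
        Instant start = end.minus(7, ChronoUnit.DAYS);

        long startSeconds = start.getEpochSecond();
        long endSeconds = end.getEpochSecond();
        System.out.println("start=" + startSeconds + ", end=" + endSeconds);
    }
}
```

You would pass these two long values as the start and end arguments of the query.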

Now let's add a second query that applies a CSDL filter to select a portion of the dataset before performing analysis.

Add the following code to your Analyze method:

// Submit the same query WITH an analysis filter
PylonQuery withFilter = new PylonQuery(recordingId, queryParams,
        "fb.author.gender == \"female\" OR fb.parent.author.gender == \"female\"", null, null);
PylonResult resultWithFilter = _datasift.pylon().analyze(withFilter).sync();
System.out.println("WITH filter: " + resultWithFilter.getResponse().data());

Run your program once more and take a look at the JSON output. Compare the results of the two queries.

Stopping Your Recording

Excellent, you're all done! Before you forget, stop your data recording; otherwise you'll use up some of your recording quota.

Of course, in production solutions you'll likely want to leave your recording running permanently, or for long periods of time to collect data.

The quickest way to do this is to log in to the DataSift dashboard. Click on the Pylon tab, then the Recordings tab within it, and click Stop next to your recording.

Of course you can do this using the API too, via the /pylon/stop endpoint:

curl -X PUT https://api.datasift.com/[api version]/pylon/stop -H "Auth: [ACCOUNT_API_USERNAME]:[IDENTITY_APIKEY]" -H "Content-type: application/json" -d '{"id": "[recording id]"}'

Next Steps

So now that you've got to grips with the API, how can you learn more?

Why not see how you can build more complex filters and queries, or learn how to add more value to data in your index?

Take a look at our Developer Guide and In-Depth Guides to deepen your knowledge of the platform.

Check out our Code Examples and start building your solution in no time.