pylon/analyze

Analyze a recording and provide results in the form of frequency distribution or time series data.

To learn how to use the endpoints together, take a look at our PYLON API step-by-step page.

Send an HTTP POST request to:

https://api.datasift.com/v1.3/pylon/analyze

A successful call to this endpoint returns: 200 OK plus a JSON object.

By writing a nested query, you can drill down into the data, for example by analyzing the age breakdown for each author gender. Query filters allow analysis queries to be applied to a subset of the data in a recording.

Parameters

Parameter Description
id
required

The id of the recording you want to analyze.

parameters
required

This is an object containing the following items:

  • analysis_type
  • parameters
  • child

These are described below.

filter
optional

CSDL for a query filter. Take a look at the list of targets you can use in query filters.

It is available for both timeSeries and freqDist.

You do not need to specify a query filter. If you omit this parameter, DataSift defaults to no query filtering, and uses the content of your index in the analysis query.

start
optional

Optional start timestamp for filtering by date. It is available for both timeSeries and freqDist.

Time ranges are treated as inclusive on the start and exclusive on the end.

Default:

  • If you specify a start and no end, the analysis will run from the start point until now.
  • If you omit the start and end parameters, the Analysis Query will resort to a default time period.
    • For a frequency distribution analysis the last 24 hours will be analyzed.
    • For time series analysis the maximum period allowed by the selected interval will be analyzed. For example, if the interval is hours the analysis query defaults to 336 hours (2 weeks).
end
optional

Optional end timestamp for filtering by date. It is available for both timeSeries and freqDist.

Time ranges are treated as inclusive on the start and exclusive on the end.

Default:

  • If you specify a start and no end, the analysis will run from the start point until now.
  • If you omit the start and end parameters, the Analysis Query will resort to a default time period.
    • For a frequency distribution analysis the last 24 hours will be analyzed.
    • For time series analysis the maximum period allowed by the selected interval will be analyzed. For example, if the interval is hours the analysis query defaults to 336 hours (2 weeks).

Limits:

We restrict analysis time ranges for timeSeries queries:

  • minute interval <= 60 minutes
  • hour interval <= 336 hours
  • daily interval <= 32 days
  • weekly interval <= 4 weeks
  • monthly interval <= 1 month
analysis_type
required

This is part of the top-level parameters element. It can be:

  • timeSeries
  • freqDist
parameters
required

Required for both timeSeries and freqDist analyses. This is an object containing:

  • interval (for timeSeries)
  • span (for timeSeries)
  • threshold (for freqDist)
  • target (for freqDist)

These are described individually below.

interval
required for timeSeries

The resolution to break down timeSeries analyses by:

  • month
  • week
  • day
  • hour
  • minute

This parameter is not used for freqDist.

span
optional

How many interval units to span. Only used for timeSeries.

If the interval is "week" and span is 2, the output is grouped into two-week buckets.

A span value greater than 1 can be applied to the intervals week, day, hour, minute but not to month. Remember the index can contain data for 32 days total.

This parameter is not used for freqDist.

threshold
required for freqDist

The maximum number of results to return.

This parameter is not used for timeSeries.

target
required for freqDist

The fb target to analyze.

This parameter is not used for timeSeries.

child
optional

Optional for a frequency distribution analysis. Only required for nested analysis queries. Up to three levels of nesting are permitted (parent and child nested to two levels). This is an object containing:

  • analysis_type
  • parameters

The use of these parameters in a child object is described below.

analysis_type
required for nested analysis

Required as part of a child object definition. The type of analysis in the child query of a nested analysis. The only permitted value is:

  • freqDist
parameters
required for nested analysis

Required as part of a child object definition. The analysis parameters for the child query of a nested analysis. This is an object containing:

  • threshold
  • target

The threshold and target parameters are described earlier on this page.

offset
optional

An offset that is automatically applied to the start and end parameters to adjust for your timezone.

The offset is expressed in hours:

Example Description
8 Adjust for a timezone that is eight hours ahead of UTC.
+8 Adjust for a timezone that is eight hours ahead of UTC.
-8 Adjust for a timezone that is eight hours behind UTC.

For example, by setting the UTC offset to -8 and passing a time range which matches a 24-hour PST day, you can receive time series results at daily intervals and ensure that an author generating interactions which fall on two different UTC days, but the same PST day, is never double counted.
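The defaults and limits described for start and end can be summarized as a small client-side helper. This is a minimal sketch, not DataSift code: resolve_window and MAX_RANGE are hypothetical names, treating a month as 31 days is an assumption, and the API remains the authority on which ranges it accepts.

```python
from datetime import timedelta

# Longest timeSeries range per interval, from the Limits list on this page.
# A month has no fixed length; 31 days is an assumption of this sketch.
MAX_RANGE = {
    "minute": timedelta(minutes=60),
    "hour": timedelta(hours=336),
    "day": timedelta(days=32),
    "week": timedelta(weeks=4),
    "month": timedelta(days=31),
}


def resolve_window(analysis_type, now, interval=None, start=None, end=None):
    """Return the (start, end) Unix timestamps an analysis is expected to cover.

    `now` is the current Unix time in seconds, passed in so results are reproducible.
    """
    if start is None and end is not None:
        # The page does not describe an end without a start, so refuse to guess.
        raise ValueError("end without start is not covered by the documented defaults")
    if end is None:
        end = now  # a start with no end runs from the start point until now
    if start is None:
        # Neither bound given: fall back to the documented default period.
        if analysis_type == "freqDist":
            default = timedelta(hours=24)
        else:
            default = MAX_RANGE[interval]
        start = end - int(default.total_seconds())
    if analysis_type == "timeSeries":
        if end - start > MAX_RANGE[interval].total_seconds():
            raise ValueError(f"range too long for interval {interval!r}")
    return start, end
```

For instance, resolve_window("timeSeries", now, interval="hour") returns a window of 336 hours ending at now, matching the default described for the hour interval.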

Examples

Sample requests

Frequency distribution

This example shows an analysis query for the three most frequently occurring genders in a recording.

curl -X POST https://api.datasift.com/v1.3/pylon/analyze \
    -d '{"id":"d1b7d73b47c639ea3cc290595bca888ca4388afe","parameters":{"analysis_type":"freqDist","parameters":{"threshold":3,"target":"fb.author.gender"}}}' \
    -H 'Authorization: username:api_key' \
    -H "Content-type: application/json"
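The same request can be assembled from a script. The following is a rough Python equivalent of the curl call, using only the standard library; build_analyze_request is a hypothetical helper name, and the Authorization value simply mirrors the username:api_key form shown in the curl example. The sketch builds the request object without sending it.

```python
import json
import urllib.request

ANALYZE_URL = "https://api.datasift.com/v1.3/pylon/analyze"


def build_analyze_request(recording_id, analysis_type, parameters,
                          username, api_key, **top_level):
    """Build (but do not send) a POST request for /pylon/analyze.

    `top_level` carries the optional top-level fields: filter, start, end.
    """
    body = {
        "id": recording_id,
        "parameters": {
            "analysis_type": analysis_type,
            "parameters": parameters,
        },
    }
    body.update(top_level)
    return urllib.request.Request(
        ANALYZE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"{username}:{api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_analyze_request(
    "d1b7d73b47c639ea3cc290595bca888ca4388afe",
    "freqDist",
    {"threshold": 3, "target": "fb.author.gender"},
    username="username",
    api_key="api_key",
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would send it; that step is left out here so the sketch runs without credentials.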

Frequency distribution with time ranges

This example shows an analysis query for the three most frequently occurring genders in a recording which occurred in the specified time range.

curl -X POST https://api.datasift.com/v1.3/pylon/analyze \
    -d '{"start": 1435662000, "end": 1435748400, "id":"d1b7d73b47c639ea3cc290595bca888ca4388afe","parameters":{"analysis_type":"freqDist","parameters":{"threshold":3,"target":"fb.author.gender"}}}' \
    -H 'Authorization: username:api_key' \
    -H "Content-type: application/json"

Frequency distribution with query filter

Query Filters allow a subset of the recording to be analyzed. The query filter is written in CSDL. This example shows the age groups in the index, only where the gender is male.

curl -X POST https://api.datasift.com/v1.3/pylon/analyze \
    -d '{"filter":"fb.author.gender == \"male\"","id":"d1b7d73b47c639ea3cc290595bca888ca4388afe","parameters":{"analysis_type":"freqDist","parameters":{"threshold":3,"target":"fb.author.age"}}}' \
    -H 'Authorization: username:api_key' \
    -H "Content-type: application/json"

Frequency distribution with nesting

Nested analysis queries allow each result of a frequency distribution analysis to be broken down by the values of another target. For more information, see the How To page. This example shows the author age groups for each gender.

curl -X POST https://api.datasift.com/v1.3/pylon/analyze \
    -d '{"id": "d1b7d73b47c639ea3cc290595bca888ca4388afe", "parameters": { "analysis_type": "freqDist", "parameters": { "threshold": 3, "target": "fb.author.gender" }, "child": { "analysis_type": "freqDist", "parameters": { "threshold": 2, "target": "fb.author.age" } } } }' \
    -H 'Authorization: username:api_key' \
    -H "Content-type: application/json"
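Writing the nested parameters object by hand is error-prone, so it can help to generate it. The sketch below builds it from a list of (target, threshold) pairs, outermost level first. The function name nest is hypothetical, and placing a third level as a child inside the child object is an assumption drawn from the three-level limit stated on this page, not a documented layout.

```python
def nest(levels):
    """Build the top-level 'parameters' object for a nested freqDist query.

    `levels` is a list of (target, threshold) pairs, outermost level first.
    """
    if not 1 <= len(levels) <= 3:
        raise ValueError("between one and three levels are permitted")
    node = None
    # Work from the innermost level outwards so each level can wrap the previous one.
    for target, threshold in reversed(levels):
        current = {
            "analysis_type": "freqDist",
            "parameters": {"threshold": threshold, "target": target},
        }
        if node is not None:
            current["child"] = node
        node = current
    return node


parameters = nest([("fb.author.gender", 3), ("fb.author.age", 2)])
print(parameters)
```

The result for the two pairs above has the same structure as the parameters object in the curl example.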

Time series

This example shows an analysis query requesting the volume of interactions and unique authors for every four hours between the start and end times. Query filters can be used with time series analysis; however, nesting is not possible.

curl -X POST https://api.datasift.com/v1.3/pylon/analyze \
    -d '{"start": 1435662000, "end": 1435748400, "id":"d1b7d73b47c639ea3cc290595bca888ca4388afe","parameters":{"analysis_type": "timeSeries","parameters": {"interval": "hour", "span": 4}}}' \
    -H 'Authorization: username:api_key' \
    -H "Content-type: application/json"

Sample outputs

Let's begin with a note about the top-level interactions and unique_authors fields when they appear at the outermost indentation in the JSON output in the following examples. In the first example they are:

"interactions": 186100,  
 "unique_authors": 153300,

These numbers relate to the entire analysis. They show that your call to /pylon/analyze processed a total of 186,100 interactions from your index, from 153,300 unique authors.

The number of interactions depends on:

  • whether or not you use a query filter
  • the values you choose for the start and end parameters

If you use a query filter these top-level counts are made after the query filter is applied.

If you specify start or end parameters the analysis processes interactions inside that timespan but excludes all other interactions. If you omit the start or end parameters DataSift applies the default values described earlier on this page.

In order to provide analysis results quickly across large indexes of data the platform employs estimation algorithms. As a result, /pylon/analyze returns close approximations of the unique author counts. Due to this approximation technique the sum of unique author counts for each individual result in your analysis is unlikely to be exactly equal to the total unique author count for your query.
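To make the point concrete, the short Python sketch below sums the per-result counts and compares them with the top-level totals. The figures are copied from the frequency distribution sample output on this page; the variable names are only illustrative.

```python
# Figures copied from the frequency distribution sample output on this page.
response = {
    "interactions": 186100,
    "unique_authors": 153300,
    "analysis": {
        "results": [
            {"key": "female", "interactions": 96200, "unique_authors": 78400},
            {"key": "male", "interactions": 85400, "unique_authors": 65900},
            {"key": "unknown", "interactions": 2300, "unique_authors": 1800},
        ]
    },
}

results = response["analysis"]["results"]
summed_authors = sum(r["unique_authors"] for r in results)
summed_interactions = sum(r["interactions"] for r in results)

# The per-result sums differ from the top-level totals in this sample.
print("unique_authors: total", response["unique_authors"], "sum of results", summed_authors)
print("interactions:   total", response["interactions"], "sum of results", summed_interactions)
```

In this sample the three results add up to 146,100 unique authors against a reported total of 153,300, so client code should read the totals from the top-level fields rather than recomputing them from the individual results.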

Frequency distribution

This output is from an analysis query for the three most frequently occurring genders in a recording.

HTTP/1.1 200 OK

{
    "interactions": 186100,
    "unique_authors": 153300,
    "analysis": {
            "analysis_type": "freqDist",
            "parameters": {
                "target": "fb.author.gender",
                "threshold": 3
            },
            "redacted": false,
            "results": [{
                "interactions": 96200,
                "key": "female",
                "unique_authors": 78400
            }, {
                "interactions": 85400,
                "key": "male",
                "unique_authors": 65900
            }, {
                "interactions": 2300,
                "key": "unknown",
                "unique_authors": 1800
            }]
    }
}

Frequency distribution with time ranges

This output is from an analysis query for the three most frequently occurring genders in a recording which occurred in the specified time range.

HTTP/1.1 200 OK

{
    "interactions": 185200,
    "unique_authors": 152700,
    "analysis": {
            "analysis_type": "freqDist",
            "parameters": {
                "target": "fb.author.gender",
                "threshold": 3
            },
            "redacted": false,
            "results": [{
                "interactions": 95700,
                "key": "female",
                "unique_authors": 77500
            }, {
                "interactions": 85000,
                "key": "male",
                "unique_authors": 65900
            }, {
                "interactions": 2300,
                "key": "unknown",
                "unique_authors": 1800
            }]
    }
}

Frequency distribution with query filter

This output is from an analysis query showing the age groups in the index, only where the gender is male.

HTTP/1.1 200 OK

{
    "interactions": 85500,
    "unique_authors": 65700,
    "analysis": {
            "analysis_type": "freqDist",
            "parameters": {
                "target": "fb.author.age",
                "threshold": 3
            },
            "redacted": false,
            "results": [{
                "interactions": 17800,
                "key": "25-34",
                "unique_authors": 14700
            }, {
                "interactions": 17200,
                "key": "35-44",
                "unique_authors": 14200
            }, {
                "interactions": 17100,
                "key": "45-54",
                "unique_authors": 13500
            }]
    }
}

Frequency distribution with nesting

This output is from a nested analysis query showing the author age groups for each gender.

HTTP/1.1 200 OK

{
    "interactions": 185000,
    "unique_authors": 157900,
    "analysis": {
        "analysis_type": "freqDist",
        "parameters": {
            "target": "fb.author.gender",
            "threshold": 3
        },
        "results": [
            {
                "key": "male",
                "interactions": 97800,
                "unique_authors": 82000,
                "child": {
                    "analysis_type": "freqDist",
                    "parameters": {
                        "target": "fb.author.age",
                        "threshold": 2
                    },
                    "results": [
                        {
                            "key": "45-54",
                            "interactions": 21300,
                            "unique_authors": 16900
                        },
                        {
                            "key": "55-64",
                            "interactions": 19000,
                            "unique_authors": 16000
                        }
                    ],
                    "redacted": false
                }
            },
            {
                "key": "female",
                "interactions": 82600,
                "unique_authors": 69900,
                "child": {
                    "analysis_type": "freqDist",
                    "parameters": {
                        "target": "fb.author.age",
                        "threshold": 2
                    },
                    "results": [
                        {
                            "key": "55-64",
                            "interactions": 18700,
                            "unique_authors": 15600
                        },
                        {
                            "key": "65+",
                            "interactions": 17600,
                            "unique_authors": 14300
                        }
                    ],
                    "redacted": false
                }
            },
            {
                "key": "unknown",
                "interactions": 2000,
                "unique_authors": 1600,
                "child": {
                    "analysis_type": "freqDist",
                    "parameters": {
                        "target": "fb.author.age",
                        "threshold": 2
                    },
                    "results": [
                        {
                            "key": "25-34",
                            "interactions": 400,
                            "unique_authors": 300
                        },
                        {
                            "key": "45-54",
                            "interactions": 300,
                            "unique_authors": 300
                        }
                    ],
                    "redacted": false
                }
            }
        ],
        "redacted": false
    }
}
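Nested output is often easier to work with once it is flattened into rows. The following is a minimal Python sketch that walks the analysis object recursively; flatten is a hypothetical helper, and the data is a trimmed copy of the sample output above.

```python
def flatten(analysis, path=()):
    """Turn a (possibly nested) freqDist analysis into flat rows.

    Each row is (keys, interactions, unique_authors), where `keys` is the
    tuple of result keys from the outermost level down to the innermost.
    """
    rows = []
    for result in analysis["results"]:
        keys = path + (result["key"],)
        child = result.get("child")
        if child:
            # Descend into the child analysis and report its results instead.
            rows.extend(flatten(child, keys))
        else:
            rows.append((keys, result["interactions"], result["unique_authors"]))
    return rows


# Trimmed copy of the nested sample output: two genders, two age groups each.
analysis = {
    "results": [
        {"key": "male", "interactions": 97800, "unique_authors": 82000,
         "child": {"results": [
             {"key": "45-54", "interactions": 21300, "unique_authors": 16900},
             {"key": "55-64", "interactions": 19000, "unique_authors": 16000},
         ]}},
        {"key": "female", "interactions": 82600, "unique_authors": 69900,
         "child": {"results": [
             {"key": "55-64", "interactions": 18700, "unique_authors": 15600},
             {"key": "65+", "interactions": 17600, "unique_authors": 14300},
         ]}},
    ]
}

for keys, interactions, unique_authors in flatten(analysis):
    print(" / ".join(keys), interactions, unique_authors)
```

Note that this sketch reports only the innermost level; the parent counts (for example 97,800 interactions for male) are dropped and would need to be kept separately if required.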

Time series

This output is from an analysis query requesting the volume of interactions and unique authors for every four hours between the start and end times.

HTTP/1.1 200 OK

{
    "interactions": 185200,
    "unique_authors": 152700,
    "analysis": {
        "analysis_type": "timeSeries",
        "parameters": {
            "interval": "hour",
            "span": 4
        },
        "redacted": false,
        "results": [
            {
                "interactions": 2200,
                "key": 1435651200,
                "unique_authors": 2000
            },
            {
                "interactions": 15000,
                "key": 1435665600,
                "unique_authors": 13500
            },
            {
                "interactions": 44100,
                "key": 1435680000,
                "unique_authors": 36500
            },
            {
                "interactions": 52200,
                "key": 1435694400,
                "unique_authors": 44500
            },
            {
                "interactions": 46300,
                "key": 1435708800,
                "unique_authors": 36200
            },
            {
                "interactions": 18900,
                "key": 1435723200,
                "unique_authors": 16100
            },
            {
                "interactions": 6300,
                "key": 1435737600,
                "unique_authors": 5400
            }
        ]
    }
}
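The key of each time series result looks like a Unix timestamp in seconds marking the start of a bucket. Under that assumption, which this page does not state explicitly, the keys can be converted to readable UTC times as sketched below; label_buckets is a hypothetical helper and the data is the first three results from the sample above.

```python
from datetime import datetime, timezone

# First three results copied from the time series sample output above.
results = [
    {"key": 1435651200, "interactions": 2200, "unique_authors": 2000},
    {"key": 1435665600, "interactions": 15000, "unique_authors": 13500},
    {"key": 1435680000, "interactions": 44100, "unique_authors": 36500},
]


def label_buckets(results):
    """Pair each bucket's UTC start time (ISO 8601) with its interaction count.

    Assumes `key` is a Unix timestamp in seconds.
    """
    return [
        (datetime.fromtimestamp(r["key"], tz=timezone.utc).isoformat(), r["interactions"])
        for r in results
    ]


for when, count in label_buckets(results):
    print(when, count)
```

Consecutive keys in the sample are 14,400 seconds apart, which is consistent with the requested interval of hour and span of 4.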

Responses

Response code Description
Status 200 OK
{
    "truncated":<boolean>,
    "interactions":<integer>,
    "unique_authors":<integer>,
    "results": [
        ...
    ]
}
Status 400 Bad Request
{
   "error":"<error message>",
   "original_error":"<if caused by an underlying Exception, that Exception's message>"
}
Status 404 Not Found The id could not be found.
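A client typically branches on these three response codes. The sketch below shows one way to do so in Python, given a status code and raw response body that have already been received; parse_analyze_response and AnalyzeError are hypothetical names, and any status other than the three listed above is simply reported as unexpected.

```python
import json


class AnalyzeError(Exception):
    """Raised when /pylon/analyze does not return 200 OK."""


def parse_analyze_response(status, body):
    """Return the decoded JSON for a 200 response, or raise AnalyzeError."""
    if status == 200:
        return json.loads(body)
    if status == 400:
        payload = json.loads(body)
        message = payload.get("error", "bad request")
        original = payload.get("original_error")
        if original:
            # Include the underlying exception message when one is supplied.
            message = f"{message} (caused by: {original})"
        raise AnalyzeError(message)
    if status == 404:
        raise AnalyzeError("the id could not be found")
    raise AnalyzeError(f"unexpected status {status}")


data = parse_analyze_response(200, '{"interactions": 186100, "unique_authors": 153300}')
print(data["interactions"])
```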

Notes

  1. All calls to the API must be properly authenticated with a DataSift username and API key.
  2. All calls to the API must be versioned. The current version is v1.3.
  3. The Rate Limit Cost for this endpoint is 25. However, this cost is not taken from your regular allowance of credits. Instead it is taken from a special allowance described under the API rate limit for /pylon/analyze section on our platform allowances page. Your exact rate limit for this endpoint depends on your package.
  4. Note that the username is the one you use to log in to app.datasift.com and to make calls to any of the REST endpoints, but the API key is the one returned by your call to the POST /account/identity endpoint. That is, it is an identity-based API key.
  5. Take a look at our Changelog page to review the changes we've made to the DataSift API over time.

Revision history

v1.3

Up to v1.2 this endpoint required the hash of the CSDL interaction filter you wanted to analyze.

From v1.3 it takes the id of the recording you want to analyze. The id is supplied in the JSON returned by your call to the /pylon/start endpoint.

Resource information

Rate limit cost: 25

Requires authentication: Yes

Response formats: JSON, JSONP