Understanding Limits

In this guide we'll look at the limits you are subject to when building your PYLON for LinkedIn Engagement Insights solution, and at how you can monitor those limits to keep your application running in a production environment.

Limits are documented in the relevant locations across this site (for example the platform allowances page), but this guide pulls the limits from every aspect of the platform together in one place.

With PYLON for LinkedIn Engagement Insights you are subject to the following categories of limits:

  • Platform limits - these limits apply to all customers and are independent of your package
  • Account limits - these limits are applied based on the package you have purchased

Platform limits

Regardless of your account package, you will always need to keep the following platform limits in mind. The majority of platform limits are hard limits that you need to be aware of when designing your solution.

Recorded data expiry

All analysis tasks are performed on a pre-recorded index containing the data from the last 30 days. Each day at 00:00 UTC, data which is over 30 days old expires from the recording.

Query filter complexity

Filters for analysis queries are limited to 30 conditions, each with a maximum of 100 arguments.

If you attempt to submit a query to the POST /pylon/{service}/task endpoint which exceeds these limits you'll receive an error response.
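
For example, here's a minimal sketch (in Python, using the requests library) of submitting a task and handling the error response. The endpoint path comes from this guide, but the base URL, service name, authentication header, and request body shape are assumptions here; check the API reference for the exact schema.

  import requests

  # Assumed base URL and auth header format -- consult the API
  # reference for the exact values your account should use
  API_BASE = 'https://api.datasift.com/v1.4'
  HEADERS = {'Authorization': 'your_username:your_api_key'}

  task = {
      'subscription_id': 'your_subscription_id',  # hypothetical identifier
      'name': 'Country breakdown',
      'parameters': {
          'analysis_type': 'freqDist',
          'parameters': {'target': 'li.user.member.country', 'threshold': 10},
      },
  }

  # 'linkedin' as the {service} value is an assumption
  response = requests.post(API_BASE + '/pylon/linkedin/task',
                           headers=HEADERS, json=task)

  if not response.ok:
      # Filters with more than 30 conditions, or more than 100
      # arguments per condition, are rejected with an error response
      print('Task rejected:', response.status_code, response.text)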

Frequency distribution results limit

A maximum of 200 elements can be returned in a frequency distribution result.

Nested frequency distribution results limits

Maximum nesting depth

The maximum depth of nesting is three levels - one parent and two children.

For each level of the analysis, the maximum number of results that can be returned is 200. The number of results to return for each level is specified using the threshold parameter.
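
To make this concrete, a three-level nested query with a threshold at each level might be shaped like the sketch below. The nesting-via-child structure is an assumption about the request schema, not a verbatim request body; check the API reference for the exact shape.

  # A sketch of three-level nesting with a threshold per level
  nested_analysis = {
      'analysis_type': 'freqDist',
      'parameters': {'target': 'li.user.member.country', 'threshold': 200},
      'child': {
          'analysis_type': 'freqDist',
          'parameters': {'target': 'li.user.member.functions', 'threshold': 27},
          'child': {
              'analysis_type': 'freqDist',
              'parameters': {'target': 'li.user.member.employer_industry_sectors',
                             'threshold': 14},
          },
      },
  }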

Nested threshold product

Additionally, the overall 'threshold product' for a nested query is limited to 80,000, except for queries that analyze user skills, for which the limit is 1,000.

You can calculate the threshold product of a query by multiplying the threshold parameters together.

For example, the following nesting of targets and thresholds exceeds the overall threshold limit, because multiplying the thresholds together (200 x 27 x 18) gives 97,200:

  • li.user.member.country (threshold = 200)
    • li.user.member.functions (threshold = 27)
      • li.user.member.employer_industry_sectors (threshold = 18)

You can reduce the threshold values to stay within the limit:

  • li.user.member.country (threshold = 200)
    • li.user.member.functions (threshold = 27)
      • li.user.member.employer_industry_sectors (threshold = 14)

Here the threshold product is 75,600 and so the query is allowed by the platform.
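
As a sanity check you can compute the product client-side before submitting a task. A minimal sketch, walking the nested_analysis structure from the earlier example:

  THRESHOLD_PRODUCT_LIMIT = 80000  # 1,000 for user-skills queries

  def threshold_product(analysis):
      # Walk down the 'child' chain, multiplying the thresholds together
      product = 1
      while analysis is not None:
          product *= analysis['parameters']['threshold']
          analysis = analysis.get('child')
      return product

  # 200 x 27 x 14 = 75,600, so this query is within the limit
  assert threshold_product(nested_analysis) <= THRESHOLD_PRODUCT_LIMIT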

Identifying nested analysis targets

Any analysis target may be used as the parent target, but only a subset of low-cardinality targets (fewer than 50 unique values) can be used as child targets. The PYLON Target Explorer tool lists all targets; the Properties section for each target shows where it may be used.

For example, the li.user.company.industry_sector target may be used in Analysis and Child Analysis Queries, and also in Query Filters.

Time series results limit

Interval limits

Each interval has a different limit on the duration an analysis query can cover.

  • Minute interval
    • An analysis with a minute interval cannot cover a period greater than 60 minutes. The period can cross boundaries from one day to another.
  • Hour interval
    • An analysis with an hour interval cannot cover a period greater than 336 hours (2 weeks). The period can cross boundaries from one day to another.
  • Day interval
    • An analysis with a day interval cannot cover a period greater than 32 days.
  • Week interval
    • An analysis with a week interval cannot cover a period greater than 4 weeks.
  • Month interval
    • An analysis with a month interval cannot cover a period greater than 1 month.
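
Because these limits are fixed, you can validate a query's period client-side before submitting it. A minimal sketch; the limits come from the list above, while the helper itself is purely illustrative:

  from datetime import datetime, timedelta

  # Maximum period each interval can cover, from the list above.
  # One month is approximated as 31 days here.
  MAX_PERIOD = {
      'minute': timedelta(minutes=60),
      'hour':   timedelta(hours=336),  # 2 weeks
      'day':    timedelta(days=32),
      'week':   timedelta(weeks=4),
      'month':  timedelta(days=31),
  }

  def period_is_valid(interval, start, end):
      return (end - start) <= MAX_PERIOD[interval]

  start = datetime(2016, 6, 1)
  end = start + timedelta(days=20)
  assert period_is_valid('day', start, end)       # 20 days <= 32 days
  assert not period_is_valid('hour', start, end)  # 480 hours > 336 hours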

Account limits

As a DataSift customer your account has a PYLON for LinkedIn Engagement Insights package assigned. In this section we'll look at the limits your package applies. Account limits can be monitored via the API; here we'll show you how.

API rate limit

Your account is subject to an API rate limit, based on a number of credits you can spend each hour.

Each call to the API has an associated cost in credits. The cost of each PYLON API call is listed on the platform allowances page.

You can monitor your API usage by inspecting the headers returned with each API response. Every request you make to the API returns the following headers (used in the sketch after this list):

  • X-RateLimit-Limit - Your account's assigned rate limit (in credits)
  • X-RateLimit-Remaining - Your current remaining credits
  • X-RateLimit-Cost - The cost of the call you just made
  • X-RateLimit-Reset-Ttl - The number of seconds until your rate limit resets
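
For example, a sketch of reading these headers after a request. The header names come from the list above; the endpoint and auth details are assumptions, as in the earlier sketches:

  import requests

  response = requests.get('https://api.datasift.com/v1.4/pylon/linkedin/task',
                          headers={'Authorization': 'your_username:your_api_key'})

  limit     = int(response.headers['X-RateLimit-Limit'])
  remaining = int(response.headers['X-RateLimit-Remaining'])
  cost      = int(response.headers['X-RateLimit-Cost'])
  reset_ttl = int(response.headers['X-RateLimit-Reset-Ttl'])

  # Back off when credits run low rather than hitting the hard limit
  if remaining < cost:
      print('Low on credits; limit resets in %d seconds' % reset_ttl)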

Analysis task rate limit

As well as a general API allowance, your account package also determines how many tasks you can submit to the POST /pylon/{service}/task endpoint.

The number of tasks you can submit, and the rate at which they are processed, is determined by the number of 'slots' included in your account package. A typical account is allocated one slot.

For each 'slot':

  • you can queue up to 1,000 analysis tasks at any time
  • tasks are processed at a best effort rate of 160 tasks per hour
  • if you have tasks on your queue then they will be processed in turn until the queue is empty (at the best effort rate)

Tasks are processed in the order that they are submitted. For example, with one slot a full queue of 1,000 tasks takes just over six hours to clear at the best effort rate of 160 tasks per hour.

It's important that you plan the tasks you submit so that you make the most of your allowance. Your usage is most likely to be split between:

  • Your own exploration of data - Naturally you'll want to explore the available data from time to time, but this is unlikely to use up many of your queries.
  • Repeated sets of queries to populate dashboards, data stores and baselines - Here you can design your query sets to fit within your limits.
  • Ad-hoc queries made by end customers (if you choose to provide this) - This is more difficult to predict as it depends on the feature you provide to your users. If you have enough headroom you can go ahead and provide live exploration to your users. On the other hand you may want to cache results or only allow users to explore results you've previously stored.

However you choose to use your tasks, we recommend that you cache analysis results in your data store so that you get the most from your allowance.

The following headers are returned when you post a new analysis task:

  • X-Tasks-Queued - The number of tasks currently in your queue
  • X-Tasks-Queue-Limit - The maximum number of tasks you can queue at any time

By monitoring the number of tasks in your queue you can check the rate at which the platform processes them, and the space remaining in your queue for new queries.
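
Building on the earlier sketch, you might pause submissions when the queue is close to full (again, the endpoint and auth details are assumptions):

  import time
  import requests

  response = requests.post('https://api.datasift.com/v1.4/pylon/linkedin/task',
                           headers={'Authorization': 'your_username:your_api_key'},
                           json=task)  # task body as sketched earlier

  queued = int(response.headers['X-Tasks-Queued'])
  queue_limit = int(response.headers['X-Tasks-Queue-Limit'])

  # With little headroom left, wait for the queue to drain before
  # submitting more tasks (processed at ~160 tasks per hour per slot)
  if queue_limit - queued < 10:
      time.sleep(60)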