Redaction and Quantization Rules

This guide explains the redaction and quantization rules that PYLON applies to respect the privacy of LinkedIn members.

Read this guide to learn why and how the limits are applied, and how to maximise your insights while working within them.

What is Redaction and Quantization?

When you make queries to a PYLON index you receive aggregated results rather than the raw data. Aggregated results both make the data easy to work with and respect the privacy of the authors who created the content.

That said, even with aggregated results you could use analysis queries to segment further and further down to very small audiences. For this reason the following rules are applied to protect the privacy of authors:

  • REDACTION - All analysis results must represent at least 100 unique authors, otherwise no analysis results will be returned. In addition, each individual data point within an analysis must represent at least 100 unique authors, otherwise it will be omitted from the results.

  • QUANTIZATION - PYLON rounds all returned interaction and user counts down to the nearest 100.

When you're creating queries it's easy to hit the redaction limits. This is natural when you explore the data. If your query would result in fewer than 100 unique authors being returned, you will receive no results: your results have been redacted. You'll need to broaden your query so that the 100 unique author limit is no longer hit.
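
The two rules can be illustrated with a minimal sketch. This is not PYLON's implementation; the function names are hypothetical and the logic only mirrors the rules as stated in this guide (a 100 unique author minimum, and counts rounded down to the nearest 100).

```python
AUTHOR_MINIMUM = 100  # redaction limit stated in this guide
QUANTUM = 100         # counts are rounded down to the nearest 100


def quantize(count: int) -> int:
    """Round a count down to the nearest multiple of QUANTUM."""
    return (count // QUANTUM) * QUANTUM


def apply_rules(interactions: int, unique_authors: int):
    """Return the counts a caller would see, or None if redacted.

    Hypothetical helper: it models the rules, not the service itself.
    """
    if unique_authors < AUTHOR_MINIMUM:
        return None  # redacted: too few unique authors
    return {
        "interactions": quantize(interactions),
        "unique_authors": quantize(unique_authors),
    }


print(apply_rules(1234, 199))  # {'interactions': 1200, 'unique_authors': 100}
print(apply_rules(500, 99))    # None
```

Note that 199 unique authors is reported as 100, so a returned count is best read as "at least this many".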

Identifying Redacted Results

When using the API, redacted results are indicated by the redacted property in the returned analysis result:

{
  "interactions": 0,
  "unique_authors": 0,
  "analysis": {
    "analysis_type": "freqDist",
    "parameters": {
      "target": "fb.topics.category",
      "threshold": 15
    },
    "results": [],
    "redacted": true
  }
}
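
A short sketch of how client code might check for this flag, assuming the response has already been received as JSON text with the shape shown above. The helper name is_redacted is illustrative only.

```python
import json


def is_redacted(response: dict) -> bool:
    """Return True when the analysis result is flagged as redacted."""
    return bool(response.get("analysis", {}).get("redacted", False))


# Sample response copied from the example above.
raw = """
{
  "interactions": 0,
  "unique_authors": 0,
  "analysis": {
    "analysis_type": "freqDist",
    "parameters": {"target": "fb.topics.category", "threshold": 15},
    "results": [],
    "redacted": true
  }
}
"""

response = json.loads(raw)
if is_redacted(response):
    print("Redacted: broaden the filter or the time period and retry.")
```

Checking the flag rather than testing for an empty results list keeps the redacted case distinct from any other reason a result might be empty.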

How is it Applied?

Redaction is a simple concept, but it's not immediately obvious how it will impact your analysis queries.

All Analysis Queries

Regardless of the type of your query, the query result must represent at least 100 unique authors for you to receive any results from the API.

Time Series

Only time intervals that represent at least 100 unique authors will be returned in your result.

When exploring your data as a time series you start broad, then use the API parameters to dig deeper:

  • start, end - segment data to a time period
  • filter - segment the data using CSDL
  • interval & span - increase the resolution of results

If you dig too deep you'll see redacted results. If this happens, your filter or time period has reduced the number of interactions to analyze to the point where there is not enough data to represent 100 unique authors.

Frequency Distributions

Only categories that represent at least 100 unique authors will be returned.

When exploring your data through frequency distributions you start broad, then use the API parameters to dig deeper:

  • start, end - segment data to a time period
  • filter - segment the data using CSDL
  • threshold - the number of categories to return

If you dig too deep you'll see redacted results. If this happens, your filter or time period has reduced the number of interactions to analyze to the point where there is not enough data for the frequency distribution.

Getting Maximum Results

When you hit the redaction limit your results are redacted, which can prove frustrating. It can take a little practice with the various options at your disposal to dig into your data and get detailed results.

Let's look at each option in turn to see how you can use it to your advantage.

Time Series

Default Parameters

The best place to start is with the system defaults for a time series query. If you leave these parameters empty, they default as follows:

  • start, end - the last 24 hours
  • filter - no filter (all interactions in the time period)

The interval and span parameters are required. A good place to start is 1 hour intervals (interval = hour, span = 1).

So this initial query gives you intervals of 1 hour over the last 24 hours from the time you submit the query.

Filter, Start & End Parameters

Next, segment your data, for instance by choosing:

  • A time period using the start and end parameters
  • A demographic group using the filter parameter

There is no 'best' order in which to segment your data set further, as this depends largely on the data you've chosen to record.

However, when you first explore your data it's best to add conditions one at a time. For example, first specify your time period, then add one condition to your filter, then add the remaining conditions one by one until you've dug as deep as you can.

If you segment to too small a subset, then you may receive redacted results. At this point you could decide to broaden your subset again by removing a condition, or you can adjust your interval and span parameters to ask for less granular results.
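
The one-condition-at-a-time approach can be sketched as a small loop. Everything here is an assumption for illustration: run_query is a hypothetical callable standing in for whatever client you use to submit a filter, the condition strings are placeholders rather than real CSDL, and the stub below fakes the audience sizes.

```python
from typing import Callable, List


def deepest_unredacted(
    conditions: List[str],
    run_query: Callable[[str], dict],
) -> List[str]:
    """Add filter conditions one at a time, skipping any that cause redaction."""
    kept: List[str] = []
    for condition in conditions:
        candidate = kept + [condition]
        csdl = " AND ".join(candidate)  # combine conditions into one filter
        result = run_query(csdl)
        if result.get("analysis", {}).get("redacted"):
            continue  # too narrow: leave this condition out
        kept = candidate
    return kept


def fake_run_query(csdl: str) -> dict:
    """Stub: pretend each extra condition halves an audience of 500 authors."""
    n_conditions = csdl.count(" AND ") + 1
    authors = 500 // (2 ** n_conditions)
    return {"analysis": {"redacted": authors < 100}}


print(deepest_unredacted(
    ["condition_a", "condition_b", "condition_c"], fake_run_query
))  # ['condition_a', 'condition_b']
```

With the stub, the third condition would leave only 62 authors, so it is dropped and the first two are kept.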

Interval & Span Parameter

The smaller the intervals you specify, the more likely you are to hit the redaction limit.

For instance, if you have 2,000 unique authors in your data set for one day, requesting data in hourly intervals is likely to give you individual hours with fewer than 100 authors, so many of your intervals will not return data.
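
As a rough worked estimate of that example, assuming authors are spread evenly across the day and each author is active in only one hour (real data will be less uniform):

```python
import math

authors_per_day = 2000
hours_per_day = 24
author_minimum = 100  # redaction limit stated in this guide

# Average unique authors per hourly interval under the even-spread assumption.
avg_per_hour = authors_per_day / hours_per_day

# Smallest whole number of hours per interval whose average reaches the limit.
min_span_hours = math.ceil(author_minimum / avg_per_hour)

print(round(avg_per_hour, 1))  # 83.3
print(min_span_hours)          # 2
```

An average of about 83 authors per hour falls below the limit, which is why many hourly intervals would be omitted, while 2-hour intervals would average about 167.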

The span parameter is very useful for giving you the most granular results possible.

The best way to approach intervals is to pick your desired interval limit and make a query. If many of the intervals are not returned with data, increasing the span parameter could help you receive more data.

For instance, if you choose 'hour' for your interval and your query returns few results, increasing the span to 3 (so that each interval covers 3 hours) may return more results, as the intervals are now larger.
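
The effect of a larger span can be simulated locally. This sketch does not call PYLON; it assumes you have hypothetical per-hour sets of author IDs (which the real service never exposes) and only shows why merging hours into larger intervals lets more of them pass the 100 unique author limit. The counts are left unquantized for clarity.

```python
from typing import Dict, List, Set

AUTHOR_MINIMUM = 100  # redaction limit stated in this guide


def returned_intervals(
    hourly_authors: List[Set[int]], span: int
) -> Dict[int, int]:
    """Group hourly buckets into span-hour intervals.

    Returns a mapping of interval start hour to unique author count,
    omitting intervals with fewer than AUTHOR_MINIMUM unique authors.
    """
    kept: Dict[int, int] = {}
    for start in range(0, len(hourly_authors), span):
        # Union the sets so an author active in several hours counts once.
        authors = set().union(*hourly_authors[start:start + span])
        if len(authors) >= AUTHOR_MINIMUM:
            kept[start] = len(authors)
    return kept


# Six hours of made-up data, 40 distinct authors in each hour.
hours = [set(range(h * 40, h * 40 + 40)) for h in range(6)]

print(returned_intervals(hours, span=1))  # {}
print(returned_intervals(hours, span=3))  # {0: 120, 3: 120}
```

With a span of 1 every hour is below the limit and nothing is returned; with a span of 3 both intervals are returned.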

Frequency Distribution

Default Parameters

The best place to start is with the system defaults for a frequency distribution query:

  • start, end - the last 24 hours
  • filter - no filter (all interactions in your index)

So by default you will be analyzing the last 24 hours of your data set.

There is no default threshold value.

Filter, Start & End Parameters

As with time series queries, you'll next want to segment your data set using the start, end and filter parameters.

Again, when you first explore your data it's best to add conditions one at a time. For instance, select your time period, then add conditions to the filter one at a time.

If you segment to too small a subset, then you may receive redacted results. At this point you could decide to broaden your subset again by removing a condition or increasing your time period.

Threshold Parameter

The threshold parameter states how many categories you would like in your results. For instance, if you specify 6 and are analyzing links being shared, you are requesting the 6 most shared links in your data set.

Only categories with at least 100 unique authors represented will be returned. So if your threshold value is 20 but only 10 links across your entire data set have each been shared by at least 100 unique authors, you will only receive 10 results.
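
The interaction between threshold and redaction can be sketched as follows. The data and the helper name are made up, and the order of operations (drop categories under the limit, then take the top entries by interaction count) is an assumption; the guide only states that categories under the limit are not returned.

```python
from typing import Dict, List, Tuple

AUTHOR_MINIMUM = 100  # redaction limit stated in this guide


def top_categories(
    stats: Dict[str, Tuple[int, int]], threshold: int
) -> List[str]:
    """Return up to `threshold` category names, most interactions first.

    stats maps category -> (interactions, unique_authors).
    """
    eligible = [
        (interactions, name)
        for name, (interactions, authors) in stats.items()
        if authors >= AUTHOR_MINIMUM  # categories under the limit are omitted
    ]
    eligible.sort(key=lambda item: item[0], reverse=True)
    return [name for _, name in eligible[:threshold]]


links = {
    "link_a": (900, 400),
    "link_b": (700, 150),
    "link_c": (650, 90),   # fewer than 100 authors: omitted
    "link_d": (300, 100),
    "link_e": (120, 30),   # fewer than 100 authors: omitted
}

print(top_categories(links, threshold=4))  # ['link_a', 'link_b', 'link_d']
```

Four categories were requested but only three meet the limit, so three are returned.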

Selecting Time Periods

The time period you choose is critical to maximising the results you retrieve from your index.

With frequency distribution analysis, the time period is simply the period used for counting the results. It's important to remember that if you specify no start or end parameter, only the last 24 hours will be analyzed. So in most situations, aside from exploration, you will want to specify these parameters explicitly.
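
A minimal sketch of computing an explicit time period, assuming start and end are supplied as Unix timestamps in seconds (check the API reference for the exact format). The helper name time_window is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def time_window(days: int, now: datetime) -> dict:
    """Return start and end timestamps covering the last `days` days."""
    start = now - timedelta(days=days)
    return {"start": int(start.timestamp()), "end": int(now.timestamp())}


# Analyze the last 7 days instead of the default last 24 hours.
window = time_window(7, now=datetime.now(timezone.utc))
print(window["end"] - window["start"])  # 604800 seconds, i.e. 7 days
```

Passing a timezone-aware now keeps the calculation independent of the machine's local time zone.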

When you start working with time series analysis, your results can easily be redacted if you don't specify a time period. Read our in-depth guide on Calculating Time Spans to learn more.