Understanding Audience-Size Gating

This guide explains audience-size gating, which is applied to analysis results when you use PYLON.

Read this guide to learn why and how this limit is applied, and how to maximise your insights working within the size gate.

What is Audience-Size Gating?

When you query a PYLON index, you receive aggregated results rather than the raw data. Aggregated results make the data easy to work with, and they also let us protect the privacy of the authors who created the content.

Even with aggregated results, you could use analysis queries to segment the data further and further until you reach very small audiences.

For this reason the following rules are applied to guarantee privacy for authors:

  • AUDIENCE-SIZE GATE - PYLON applies an audience-size gate to all analysis results. Every analysis result must represent at least 1000 unique authors; otherwise the analysis result will not be returned. In addition, each individual data point within an analysis must represent at least 100 unique authors; otherwise it will be omitted from the results.

  • QUANTIZATION - PYLON rounds all returned interaction and user counts down to the nearest 100.
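
The two rules can be modelled in a few lines of Python. This is an illustrative sketch only, not PYLON's implementation: the function names and the input shape (a mapping from a data point label to its interaction and unique-author counts) are assumptions made for the example.

```python
# Illustrative model of the two privacy rules; not PYLON's actual implementation.
MIN_TOTAL_AUTHORS = 1000  # the whole analysis must represent at least this many unique authors
MIN_POINT_AUTHORS = 100   # each data point must represent at least this many unique authors
QUANTUM = 100             # returned counts are rounded down to the nearest 100


def quantize(count):
    """Round a count down to the nearest 100."""
    return (count // QUANTUM) * QUANTUM


def apply_gate(total_unique_authors, points):
    """Apply both rules to a hypothetical analysis.

    `points` maps a label to an (interactions, unique_authors) tuple.
    Returns None when the whole analysis would be redacted.
    """
    if total_unique_authors < MIN_TOTAL_AUTHORS:
        return None  # the whole analysis is redacted
    return {
        label: (quantize(interactions), quantize(authors))
        for label, (interactions, authors) in points.items()
        if authors >= MIN_POINT_AUTHORS  # small data points are omitted
    }
```

In this model, an analysis with 1500 unique authors keeps a data point with 950 authors (reported as 900) and drops one with 60 authors.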

When you're creating queries it's easy to hit the audience-size limit; this is natural as you explore your recorded data set. If your query would represent fewer than 1000 unique authors, you will receive no results because they have been redacted. You'll need to broaden your query so that it represents at least 1000 unique authors.

Identifying Redacted Results

Redacted results are clearly indicated in both the dashboard and the API.

When using the dashboard you might see redacted results for an individual chart:

[Image: redacted chart]

When using the API, redacted results are indicated by the redacted property in the returned analysis result:

{
  "interactions": 0,
  "unique_authors": 0,
  "analysis": {
    "analysis_type": "freqDist",
    "parameters": {
      "target": "fb.topics.category",
      "threshold": 15
    },
    "results": [],
    "redacted": true
  }
}
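
A client can check this property before trying to read the results. The sketch below parses the response shown above; the helper name is_redacted is an assumption for the example and is not part of any PYLON client library.

```python
import json

# Response body copied from the redacted example above.
raw = """
{
  "interactions": 0,
  "unique_authors": 0,
  "analysis": {
    "analysis_type": "freqDist",
    "parameters": {"target": "fb.topics.category", "threshold": 15},
    "results": [],
    "redacted": true
  }
}
"""


def is_redacted(response):
    """Return True when the analysis result was redacted."""
    return bool(response.get("analysis", {}).get("redacted", False))


response = json.loads(raw)
if is_redacted(response):
    print("Redacted: broaden the filter or time period and retry.")
```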

How is it Applied?

The audience-size gate is a simple concept, but it's not immediately obvious how it will impact your analysis queries.

All Analysis Queries

Regardless of the type of query, the result must represent at least 1000 unique authors for the API to return any results.

For a Time Series - at least 1000 unique authors must be included across the time span of your query.

For a Frequency Distribution - at least 1000 unique authors must be included in total across all of the categories in the result.

Time Series

Only time intervals that represent at least 100 unique authors will be returned in your result.

As long as your overall analysis represents at least 1000 unique authors, all intervals within the time period that represent at least 100 unique authors will be returned. Intervals that represent fewer than 100 unique authors are omitted.

When exploring your data as a time series you start broad, then use the API parameters to dig deeper:

  • start, end - segment data to a time period
  • filter - segment the data using CSDL
  • interval & span - increase the resolution of results

If you dig too deep you'll see redacted results. This can happen if you:

  • Segment the data into too small a subset or time period - Your filter or time period reduces the number of interactions to analyze so that there is not enough data to represent 1000 unique authors.

  • Have little data recorded in your index - Your original interaction filter recorded only a small data set to your index. You could broaden your interaction filter or wait for more data to be recorded.

Frequency Distributions

Only categories that represent at least 100 unique authors will be returned.

As long as your overall analysis represents at least 1000 unique authors, all categories within the time period that represent at least 100 unique authors will be returned. Categories that represent fewer than 100 unique authors are omitted.

When exploring your data through frequency distributions you start broad, then use the API parameters to dig deeper:

  • start, end - segment data to a time period
  • filter - segment the data using CSDL
  • threshold - the number of categories to return

If you dig too deep you'll see redacted results. This can happen if you:

  • Segment the data into too small a subset or time period - Your filter or time period reduces the number of interactions to analyze so that there is not enough data for the frequency distribution.

  • Have little data recorded in your index - Your original interaction filter recorded only a small data set to your index.

Getting Maximum Results

When you hit the audience-size limit your results are redacted, which can prove frustrating. It can take a little practice with the various options at your disposal to dig into your data and get detailed results.

Let's look at each option in turn to see how you can use it to your advantage.

Time Series

Default Parameters

The best place to start is with the system defaults for a time series query. If you leave these parameters empty, they default to the following:

  • start, end - the last 24 hours
  • filter - no filter (all interactions in the time period)

The interval and span parameters are required. A good place to start is 1 hour intervals (interval = hour, span = 1).

So this initial query gives you intervals of 1 hour over the last 24 hours from the time you submit the query.
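
As a rough sketch, the starting query could be assembled like this. The body layout is an assumption: it mirrors the nesting of the analysis object in the response shown earlier, and the analysis type name "timeSeries" and the helper time_series_request are illustrative rather than confirmed by this guide.

```python
def time_series_request(interval="hour", span=1, start=None, end=None, csdl_filter=None):
    """Build a hypothetical request body for a time series analysis."""
    body = {
        "parameters": {
            "analysis_type": "timeSeries",  # assumed type name
            "parameters": {"interval": interval, "span": span},
        }
    }
    # Leave out start, end and filter to fall back on the defaults:
    # the last 24 hours and no filter.
    if start is not None:
        body["start"] = start
    if end is not None:
        body["end"] = end
    if csdl_filter is not None:
        body["filter"] = csdl_filter
    return body


# 1 hour intervals over the default time period, with no filter.
default_query = time_series_request()
```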

If you get redacted results with these default parameters, check that you are recording a good number of interactions to your index to be analyzed. If you are sure this is the case, you can use the start and end parameters to analyze your entire recording.

Filter, Start & End Parameters

Next segment your data, for instance choosing:

  • A time period using the start and end parameters
  • A demographic group using the filter parameter
  • A tag you added in classification

There is no 'best' order in how to increasingly segment your data set as this will depend largely on the data you've chosen to record.

However, when you first explore your data it's best to add conditions one at a time. For example first specify your time period, then add one condition to your filter, then add remaining conditions until you've dug as deep as you can.

If you segment to too small a subset, then you may receive redacted results. At this point you could decide to broaden your subset again by removing a condition, or you can adjust your interval and span parameters to ask for less granular results.
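
The one-condition-at-a-time approach can be sketched as a small loop. Here run_analysis is a hypothetical callable, standing in for whatever client code you use, that takes a CSDL filter string and returns a parsed analysis response; joining conditions with AND is one simple way to combine them.

```python
def dig_deeper(conditions, run_analysis):
    """Add CSDL conditions one at a time until the result is redacted.

    Returns the conditions that could be applied without redaction.
    """
    accepted = []
    for condition in conditions:
        candidate = accepted + [condition]
        response = run_analysis(" AND ".join(candidate))
        if response["analysis"]["redacted"]:
            break  # this condition narrowed the audience too far
        accepted = candidate
    return accepted
```

When the loop stops early, you can keep the accepted conditions and either drop the last one you tried or ask for less granular results.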

Interval & Span Parameter

The smaller the intervals you specify the more likely you are to hit the audience-size gate limit.

For instance, if you have 2,000 unique authors in your data set for one day, requesting data in hourly intervals is likely to give you individual hours with fewer than 100 authors, so many of your intervals will not return data.

The span parameter is very useful for giving you the most granular results possible.

The best way to approach intervals is to pick your desired interval limit and make a query. If many of the intervals are not returned with data, increasing the span parameter could help you receive more data.

For instance if you choose 'hour' for your interval then make a query which returns few results, increasing the span to 3 (so specifying your interval size as 3 hours) may return more results as the intervals are now larger.
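
The effect of a larger span can be simulated on synthetic data. The sketch below is an assumption-laden model, not PYLON code: it represents each hour as a set of author IDs and counts how many intervals clear the 100-author limit for a given span.

```python
def intervals_returned(authors_by_hour, span, min_authors=100):
    """Count the intervals of `span` hours that represent enough unique authors.

    `authors_by_hour` is a list with one set of author IDs per hour.
    """
    returned = 0
    for i in range(0, len(authors_by_hour), span):
        # An author active in several hours of the interval is counted once.
        bucket = set().union(*authors_by_hour[i:i + span])
        if len(bucket) >= min_authors:
            returned += 1
    return returned


# Synthetic data: 24 hours with 60 distinct authors in each hour.
hours = [set(range(h * 60, (h + 1) * 60)) for h in range(24)]
print(intervals_returned(hours, span=1))  # 0
print(intervals_returned(hours, span=3))  # 8
```

In real data the same authors often appear in neighbouring hours, so a larger span raises the unique-author count by less than the sum of the individual hours.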

Frequency Distribution

Default Parameters

The best place to start is with the system defaults for a frequency distribution query:

  • start, end - the last 24 hours
  • filter - no filter (all interactions in your index)

So by default you will be analyzing the last 24 hours of your data set.

There is no default threshold value.

Filter, Start & End Parameters

As with time series queries, you'll next want to segment your data set using the start, end and filter parameters.

Again when you first explore your data it's best to add conditions one at a time. For instance selecting your time period, then adding conditions to the filter one at a time.

If you segment to too small a subset, then you may receive redacted results. At this point you could decide to broaden your subset again by removing a condition or increasing your time period.

Threshold Parameter

The threshold parameter states how many categories you would like in your results. For instance if you specify 6 and are analyzing links being shared, you are requesting the 6 most shared links in your data set.

Only categories that represent at least 100 unique authors will be returned. So if your threshold value is 20 but only 10 links in your entire data set have each been shared by at least 100 unique authors, you will only receive 10 results.
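
The interaction between threshold and the 100-author limit can be sketched as follows. This is a simplified model: the function name is hypothetical, and ranking categories by unique-author count is an assumption made to keep the example short.

```python
def freq_dist(authors_by_category, threshold, min_authors=100):
    """Return up to `threshold` categories, largest first, omitting small ones.

    `authors_by_category` maps a category to its unique-author count.
    """
    eligible = [
        (category, authors)
        for category, authors in authors_by_category.items()
        if authors >= min_authors
    ]
    eligible.sort(key=lambda item: item[1], reverse=True)
    return eligible[:threshold]


counts = {"a": 500, "b": 90, "c": 150, "d": 100}
# A threshold of 20 still yields only the three categories that clear the limit.
print(freq_dist(counts, threshold=20))
```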

These tips help you build your initial analysis queries. Once you have your analysis running in production you might find that previously successful queries become redacted. This may happen because the volume of data being recorded by your Interaction Filter has dropped or the data being recorded has simply changed as real world conversation has altered. It's important that you monitor your Interaction Filters once they are live - see Recording Data for more details.

Selecting Time Periods

The time period you choose is critical to maximising the results you retrieve from your index.

With frequency distribution analysis the time period is simply the period which will be used for counting the results. It's important to remember that if you specify no start or end parameter only the last 24 hours will be analyzed. So in most situations aside from exploration you will want to specify these parameters explicitly.
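
One way to set the time period explicitly is to compute it relative to the current time. The sketch below assumes that start and end are expressed as Unix timestamps in seconds; the helper name time_window is illustrative.

```python
from datetime import datetime, timedelta, timezone


def time_window(days, now=None):
    """Return (start, end) as Unix timestamps covering the last `days` days."""
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    return int(start.timestamp()), int(end.timestamp())


# Analyze the last 7 days instead of the default 24 hours.
start, end = time_window(7)
```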

When you start working with time series analysis, your results can easily be redacted if you don't specify a time period. Read our in-depth guide on Calculating Time Spans to learn more.