Redaction and Quantization

When you query a PYLON index, you receive aggregated results rather than raw data. This approach gives you data that is easy to work with and also protects the privacy of the individual authors who created the content.

Even with aggregated results, you might think that you could segment the results into smaller and smaller audiences until individual authors could be identified.

To guarantee privacy we apply the following rules:

  • REDACTION - PYLON applies an audience-size gate to all results: the overall audience being analyzed must represent at least 1,000 unique authors. A result set that collectively represents fewer than 1,000 unique authors cannot be returned. In addition, each individual data point must represent at least 100 unique authors. A data point that does not is redacted and removed from the analysis result.

  • QUANTIZATION - PYLON rounds all returned author counts down to the nearest 100.
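
The two rules above can be sketched in a few lines of Python. This is only an illustration of the behavior described here, not PYLON's implementation: the function name, the constants, and the shape of the returned dictionary are hypothetical, and the overall audience is assumed to be the sum of the per-value counts unless a separate total is supplied.

```python
from typing import Dict, Optional

MIN_AUDIENCE = 1000    # overall audience-size gate (unique authors)
MIN_DATA_POINT = 100   # minimum unique authors per data point
QUANTUM = 100          # author counts are rounded down to this step


def apply_privacy_rules(counts: Dict[str, int], total: Optional[int] = None) -> dict:
    """Illustrative model of redaction and quantization (hypothetical helper).

    counts maps each value (a language, an hour, ...) to its unique author count.
    total is the overall audience; if omitted it is assumed to be the sum of counts.
    """
    if total is None:
        total = sum(counts.values())

    # Rule 1a: the whole result is redacted below the audience-size gate.
    if total < MIN_AUDIENCE:
        return {"redacted": True, "results": {}}

    # Rule 1b: drop data points below the per-point threshold.
    # Rule 2: round the surviving counts down to the nearest 100.
    results = {
        key: (n // QUANTUM) * QUANTUM
        for key, n in counts.items()
        if n >= MIN_DATA_POINT
    }
    return {"redacted": False, "results": results}
```

For instance, counts of 611, 512 and 95 pass the gate (1,218 in total), the 95 is dropped, and the other two come back as 600 and 500.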


The response from the /pylon/analyze endpoint contains the 'redacted' property. This is set to true when the result has been redacted because fewer than 1,000 unique authors are represented. In that case no analysis results are returned.

If the analysis result is not redacted, the 'redacted' property is set to false. In this case, all data points that represent at least 100 unique authors are returned. If your analysis is not redacted but some data points are missing, those data points have been redacted. In some circumstances your analysis might not be redacted and yet you receive no results: the entire analysis represents at least 1,000 unique authors, but none of the data points within it represents at least 100 unique authors.
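
Client code therefore has three cases to tell apart. The sketch below shows one way to do that in Python. Only the 'redacted' property comes from this page; the function name and the 'results' key holding a list of data points are assumptions made for illustration, so check them against the actual response you receive.

```python
def classify_analysis(analysis: dict) -> str:
    """Classify an analysis object into one of three outcomes.

    Assumes (hypothetically) that the object carries the 'redacted' flag
    and a 'results' list of data points.
    """
    if analysis.get("redacted", False):
        # Fewer than 1,000 unique authors overall: nothing is returned.
        return "redacted"
    if not analysis.get("results"):
        # Passed the audience gate, but every data point fell below 100 authors.
        return "empty"
    # At least one data point survived redaction.
    return "ok"
```

Treating "empty" separately from "redacted" is useful because the remedy differs: a redacted analysis needs a larger audience, whereas an empty one may only need coarser data points, for example a longer time-series interval.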

Example 1: frequency distribution analysis

Consider a frequency distribution analysis that uses fb.language as an analysis target to generate a frequency distribution of stories about DataSift in different languages:

English 611 unique authors
French 512 unique authors
German 95 unique authors redacted

This analysis includes an overall audience of 1,218 unique authors. Because this count meets the 1,000-author minimum, any individual result representing at least 100 unique authors can be shown.

We round these author counts down to the nearest 100, and redact any individual result that represents fewer than 100 unique authors. That means you will receive results from this analysis, but those results will exclude German entirely because it did not have at least 100 unique authors. The returned result set will show 600 unique authors posting in English and 500 unique authors posting in French.

Consider a second example:

English 411 unique authors redacted
French 312 unique authors redacted
German 95 unique authors redacted

The overall audience of 818 unique authors is less than 1,000, so the entire response is redacted. You will receive no data.

Example 2: time series analysis

Again, you must have at least 1,000 authors represented in the entire response. For example, consider a time series analysis in which the underlying data looks like this:

Hour 1: 611 unique authors
Hour 2: 50 unique authors redacted
Hour 3: 512 unique authors

The overall unique author count is 1,173. Because this count meets the 1,000-author minimum, you will receive results from this analysis, but those results will exclude hour 2 entirely because it did not have at least 100 unique authors. The returned result will show 600 authors posting in hour 1 and 500 authors posting in hour 3.

Consider another time series analysis in which the underlying data looks like this:

Hour 1: 611 unique authors redacted
Hour 2: 50 unique authors redacted
Hour 3: 312 unique authors redacted

The overall unique author count is 973, so the entire response is redacted. You will receive no data.

Example 3: complete redaction

Complete redaction of the data points can occur in both timeSeries and freqDist analyses. A timeSeries is used here to illustrate what happens.

If you have at least 1,000 authors represented in the entire response but none of the hours has at least 100 unique authors, every hour is redacted:

Hour 1: 99 unique authors redacted
Hour 2: 99 unique authors redacted
Hour 3: 99 unique authors redacted
Hour 4: 99 unique authors redacted
Hour 5: 99 unique authors redacted
Hour 6: 99 unique authors redacted
Hour 7: 99 unique authors redacted
Hour 8: 99 unique authors redacted
Hour 9: 99 unique authors redacted
Hour 10: 99 unique authors redacted
Hour 11: 99 unique authors redacted

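The overall count of 1,089 unique authors passes the audience-size gate, so the 'redacted' property is false, but every data point is redacted. You will receive no data at all.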
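
The arithmetic behind this case can be checked directly. The snippet below is a small illustration only; the variable names are hypothetical, and the overall audience is taken to be the sum of the hourly counts, as in the example above.

```python
# Eleven hourly data points, each with 99 unique authors.
hours = {f"Hour {i}": 99 for i in range(1, 12)}

overall = sum(hours.values())     # 11 * 99 unique authors in total
passes_gate = overall >= 1000     # overall audience-size gate

# Data points that meet the 100-author threshold (none do here).
surviving = {hour: n for hour, n in hours.items() if n >= 100}
```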

What does unique mean?

An author who contributes interactions to multiple values in a result set is represented as a unique author for each value.

Using the three-hour time series as an example, an author who posts in the first hour and third hour is counted as a unique author for hour one and hour three. An author does not have to be unique across the whole time series.
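
To make the counting rule concrete, here is a minimal Python sketch that counts unique authors separately for each value. It is purely illustrative: PYLON does not expose raw author identifiers, so the (author_id, bucket) pairs and the function name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def unique_authors_per_bucket(
    interactions: Iterable[Tuple[str, Hashable]]
) -> Dict[Hashable, int]:
    """Count distinct authors within each bucket (an hour, a language, ...)."""
    authors_by_bucket = defaultdict(set)
    for author_id, bucket in interactions:
        # A set ignores repeat posts by the same author in the same bucket.
        authors_by_bucket[bucket].add(author_id)
    return {bucket: len(authors) for bucket, authors in authors_by_bucket.items()}
```

An author who posts twice in hour 1 and once in hour 3 contributes one unique author to hour 1 and one to hour 3, so adding the per-hour figures together can count the same person more than once.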

Sub-minute time ranges

Setting a time range of less than 1 minute results in an error.

Rounded interaction counts

Responses from the /pylon/analyze and /pylon/get endpoints round down the interaction volumes to the nearest 100.

For more information, take a look at our In-Depth Guide.