Making Use of Super Public Data

Super Public data gives you access to the raw text of posts that users have chosen to make public. This sample content is a great way to improve your filters and classifiers.

What is a Super Public post?

A Super Public post is a story that is:

  • Posted by someone who has “Who can see your future posts?” set to “Public” under their Privacy Settings
  • Posted by someone who has the Follow setting enabled, allowing non-friends to see their stories
  • Not posted to someone else’s Timeline

Essentially these are stories that an author has chosen to share publicly.

If you have access to the Super Public feature, any posts that match your running Interaction Filters will be cached alongside your usual PYLON recordings.

How can they help me?

Unlike other stories, Super Public posts give you access to the raw text content of the post. Here's an example post:

    {
        "fb": {
            "media_type": "photo",
            "content": "I love how the rear seats fold flat in the BMW X5",
            "language": "en",
            "hashtags": [ ... ],
            "topics": [
                {
                    "name": "BMW",
                    "id": 565634324
                }
            ],
            "topic_ids": [ 565634324 ]
        },
        "interaction": {
            "subtype": "story",
            "content": "I love how the rear seats fold flat in the BMW X5"
        },
        "tag_tree": {
            "automotive": {
                "bmw": {
                    "X_Series": [ ... ]
                }
            }
        },
        "links": {
            "code": [ ... ],
            "domain": [ ... ],
            "normalized_url": [ ... ],
            "url": [ ... ]
        }
    }

Demographic details and sentiment values are not available for Super Public posts. However, the content, topics, links targets, and any tags or scores you've added with classification rules are available.
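Because some fields are simply missing from Super Public posts, code that reads them should not assume every key exists. The sketch below is one way to pull out the fields that are available, assuming posts arrive as JSON objects shaped like the example above. The `summarize` and `tag_paths` helpers are hypothetical, and the `"X5"` leaf value under `X_Series` is a placeholder, since the example does not show that value.

```python
import json

# A post shaped like the example above. The "X5" leaf under X_Series is a
# hypothetical placeholder; the documented example does not show that value.
RAW_POST = """
{
  "fb": {
    "media_type": "photo",
    "content": "I love how the rear seats fold flat in the BMW X5",
    "language": "en",
    "topics": [{"name": "BMW", "id": 565634324}]
  },
  "interaction": {
    "subtype": "story",
    "content": "I love how the rear seats fold flat in the BMW X5"
  },
  "tag_tree": {"automotive": {"bmw": {"X_Series": ["X5"]}}}
}
"""


def tag_paths(tree, prefix=()):
    """Flatten a nested tag_tree into dotted paths, one per leaf value."""
    paths = []
    for key, value in tree.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            paths.extend(tag_paths(value, path))
        elif value:
            paths.extend(".".join(path + (str(leaf),)) for leaf in value)
        else:
            paths.append(".".join(path))
    return paths


def summarize(post):
    """Collect the fields a Super Public post carries.

    Demographic and sentiment fields are absent, so every lookup uses .get()
    with a default instead of assuming the key exists.
    """
    fb = post.get("fb", {})
    return {
        "content": post.get("interaction", {}).get("content", ""),
        "language": fb.get("language"),
        "topics": [t.get("name") for t in fb.get("topics", [])],
        "tags": tag_paths(post.get("tag_tree", {})),
        "link_urls": post.get("links", {}).get("url", []),
    }


summary = summarize(json.loads(RAW_POST))
print(summary["topics"])  # ['BMW']
print(summary["tags"])    # ['automotive.bmw.X_Series.X5']
```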

The posts are very useful for two purposes:

  • Validation - You can use these posts to check the validity of your Interaction Filter
  • Machine learning - You can use these posts to train machine learning models and build classifiers to run within your Interaction Filters


Because demographic and sentiment details are not available, you may receive no Super Public samples if your Interaction Filter requires these targets. For example, if your filter reads:

    fb.content contains_any "BMW, Honda, Ford" AND == "United States"

Then you will receive no Super Public posts, because your filter requires the author to be in the US and Super Public posts have no value for this target. You can work around this limitation by modifying your country condition. Because all non-public interactions have a country and no Super Public posts do, you can modify your filter as follows:

    fb.content contains_any "BMW, Honda, Ford"
    AND ( in "United States" or (not exists))

This records only interactions from US authors into your index, but still gives you Super Public posts from all countries to inspect.
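The effect of the workaround comes down to boolean logic on a field that may be missing. The sketch below models both filters in plain Python to show why a post with no country value fails the first filter but passes the second. It is not CSDL: the `contains_any` function is only a rough stand-in for the CSDL operator (case-insensitive whole-word match), and the flattened `"country"` key is a hypothetical simplification of the real country target.

```python
import re

KEYWORDS = ["BMW", "Honda", "Ford"]


def contains_any(text, keywords):
    """Rough stand-in for CSDL contains_any: case-insensitive whole-word match."""
    return any(
        re.search(r"\b" + re.escape(k) + r"\b", text, re.IGNORECASE)
        for k in keywords
    )


def original_filter(post):
    # Keyword match AND country equals "United States".
    return contains_any(post["content"], KEYWORDS) and post.get("country") == "United States"


def workaround_filter(post):
    # Keyword match AND (country is "United States" OR country does not exist).
    country = post.get("country")
    return contains_any(post["content"], KEYWORDS) and (
        country == "United States" or country is None
    )


# Hypothetical, simplified posts: "country" stands in for the country target.
posts = {
    "us_story": {"content": "Just bought a Ford", "country": "United States"},
    "uk_story": {"content": "My Honda is great", "country": "United Kingdom"},
    "super_public": {"content": "I love how the rear seats fold flat in the BMW X5"},
}

for name, post in posts.items():
    print(name, original_filter(post), workaround_filter(post))
```

The `super_public` post fails the original filter but passes the workaround, while `uk_story` still fails both, so non-public interactions from outside the US stay out of the index.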

How do I access Super Public posts?

You can access these posts using the /pylon/sample endpoint, specifying the recording you'd like the related Super Public posts for.

All Super Public posts that match your Interaction Filter are stored; however, you can retrieve a maximum of 100 posts an hour through the endpoint.
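If you script your requests, you can track the hourly allowance on the client side before calling the endpoint. Here is a minimal sketch that assumes a rolling one-hour window with a 100-post cap. The platform may count differently (for example, on fixed hour boundaries), and the `HourlySampleBudget` class is hypothetical, not part of any PYLON client library.

```python
import time
from collections import deque


class HourlySampleBudget:
    """Track how many sample posts were retrieved in the last hour.

    Assumes a rolling one-hour window with a 100-post cap; the platform's
    own accounting (rolling vs. fixed hour) may differ.
    """

    def __init__(self, limit=100, window_seconds=3600):
        self.limit = limit
        self.window_seconds = window_seconds
        self._events = deque()  # (timestamp, posts_retrieved) pairs

    def _prune(self, now):
        # Drop retrievals that are older than the window.
        while self._events and now - self._events[0][0] >= self.window_seconds:
            self._events.popleft()

    def remaining(self, now=None):
        now = time.time() if now is None else now
        self._prune(now)
        return self.limit - sum(count for _, count in self._events)

    def record(self, count, now=None):
        now = time.time() if now is None else now
        if count > self.remaining(now):
            raise ValueError("retrieval would exceed the hourly cap")
        self._events.append((now, count))


budget = HourlySampleBudget()
budget.record(60, now=0)
print(budget.remaining(now=10))  # 40
```

Passing `now` explicitly makes the class easy to test. In a real script you would omit it and let it use the current time.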

Although you can retrieve only 2,400 Super Public posts per day from the API (100 per hour over 24 hours), you can use query filters and time ranges to find the most relevant posts.
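One way to use time ranges within the hourly cap is to sample each hour of a recording separately, so a day's allowance covers the whole day instead of clustering in one period. The sketch below only plans the windows as Unix timestamps. How each window maps onto the endpoint's time-range parameters is not covered in this section, so that step is left out, and the `hourly_windows` helper is hypothetical.

```python
from datetime import datetime, timedelta, timezone

HOURLY_CAP = 100  # posts per hour via /pylon/sample, as described above


def hourly_windows(start, end):
    """Split [start, end) into consecutive windows of at most one hour.

    Returns (since, until) pairs as Unix timestamps.
    """
    windows = []
    cursor = start
    while cursor < end:
        nxt = min(cursor + timedelta(hours=1), end)
        windows.append((int(cursor.timestamp()), int(nxt.timestamp())))
        cursor = nxt
    return windows


day_start = datetime(2016, 1, 1, tzinfo=timezone.utc)
windows = hourly_windows(day_start, day_start + timedelta(days=1))
print(len(windows), len(windows) * HOURLY_CAP)  # 24 2400
```

A full day splits into 24 windows, and 24 × 100 gives the 2,400-post daily ceiling.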

Platform limits for Super Public data

To make the most of Super Public data, it's important to understand the following limits.

  • For each of your recordings, the platform caches up to 1 million Super Public posts per day.
    • This limit is separate from the 1 million interactions per day limit set on recordings.
    • The volume of Super Public posts is around 5% of the volume of topic data stories we receive from Facebook. In practice, this means that when your recording hits the 1 million interaction limit, far fewer Super Public posts will have been cached, and these continue to be cached until their separate limit is reached.
  • Super Public posts expire under the same 32-day retention period as Topic Data.
  • Identity-level recording limits also apply to Super Public post caching. If the identity running the recording is limited to 100,000 interactions, the same limit applies to the related cache of Super Public posts.
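The limits above combine through simple arithmetic. The sketch below encodes them as plain functions. The helper names are hypothetical, and applying the approximate 5% ratio to a single recording's volume is an assumption: the ratio is stated for Facebook topic data as a whole, so treat the estimate as a rough guide only.

```python
from datetime import datetime, timedelta, timezone

PLATFORM_DAILY_LIMIT = 1_000_000   # Super Public posts cached per recording per day
APPROX_SUPER_PUBLIC_RATIO = 0.05   # "around 5%" of topic data story volume
RETENTION = timedelta(days=32)     # same retention period as Topic Data


def daily_cache_limit(identity_limit=None):
    """Platform cap, lowered to the identity-level limit when one applies."""
    if identity_limit is None:
        return PLATFORM_DAILY_LIMIT
    return min(PLATFORM_DAILY_LIMIT, identity_limit)


def estimated_super_public(daily_volume):
    """Very rough estimate: apply the ~5% ratio to a daily volume.

    Assumes the platform-wide ratio carries over to a single recording.
    """
    return round(daily_volume * APPROX_SUPER_PUBLIC_RATIO)


def is_expired(created_at, now):
    """True once a cached post is older than the 32-day retention period."""
    return now - created_at > RETENTION


print(daily_cache_limit(), daily_cache_limit(100_000))
print(estimated_super_public(1_000_000))
```

At 1 million matching interactions a day, the ratio suggests roughly 50,000 Super Public posts. That is well under the separate 1 million cache limit, which is why a recording usually hits its interaction limit first.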

Validating your Interaction Filters

The first use case for Super Public posts is to validate your Interaction Filters.

For instance, if you're filtering on certain topics or content keywords, you can check that the Super Public posts match these criteria. This confirms that the right data is being recorded into your index.
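A quick way to review keyword matches is to group the sample posts by the keyword that matched them, then read each group. The sketch below does this with a case-insensitive whole-word match, which only approximates how your Interaction Filter matches. The sample strings are hypothetical examples, not real posts.

```python
import re
from collections import defaultdict


def group_by_keyword(contents, keywords):
    """Map each keyword to the sample posts it matched (case-insensitive, whole word)."""
    groups = defaultdict(list)
    for text in contents:
        for keyword in keywords:
            if re.search(r"\b" + re.escape(keyword) + r"\b", text, re.IGNORECASE):
                groups[keyword].append(text)
    return dict(groups)


# Hypothetical sample contents for illustration.
samples = [
    "Harrison Ford was brilliant in that film",
    "Test drove the new Ford Focus today",
    "Honda Civic or BMW 3 Series?",
]

for keyword, matched in group_by_keyword(samples, ["BMW", "Honda", "Ford"]).items():
    print(keyword, len(matched))
```

Reading the `Ford` group makes the "Harrison Ford" mismatch easy to spot.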

To improve your Interaction Filter you might:

  • Find false-positive terms to add to your filters as exclusions
  • Expand the lists of words and phrases you use for filtering to reflect the way people really talk
  • Collect a steady stream of posts over time to understand how conversations change and new terms arise
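To find words and phrases your filter is missing, you can count which terms recur across the sample posts you've collected. The sketch below counts, for each term, how many posts mention it, skipping terms already in the filter. The stopword list and the three-character minimum are arbitrary choices for illustration, and the sample strings are hypothetical.

```python
import re
from collections import Counter

# A deliberately small stopword list; extend it for real use.
STOPWORDS = {"the", "and", "for", "with", "this", "that", "was", "are", "you", "but", "not"}


def candidate_terms(contents, known_keywords, top_n=10):
    """Count, per term, how many sample posts mention it (document frequency).

    Terms already in the filter, stopwords, and words shorter than three
    characters are skipped, leaving candidates to review by hand.
    """
    known = {k.lower() for k in known_keywords}
    counts = Counter()
    for text in contents:
        words = set(re.findall(r"[a-z0-9']+", text.lower()))
        counts.update(
            w for w in words
            if w not in known and w not in STOPWORDS and len(w) >= 3
        )
    return counts.most_common(top_n)


samples = [
    "Love my new Focus, the Ford ST handles brilliantly",
    "Is the Focus worth it over a Civic?",
    "Focus ST vs Civic Type R",
]
print(candidate_terms(samples, ["Ford", "Honda", "BMW"], top_n=3))
```

Here "focus" and "civic" rise to the top, suggesting model names worth adding to the keyword lists. Running the same count on batches collected in different weeks can also reveal new terms as they appear.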

You can also validate that your classification rules work as expected: Super Public posts are run through your tagging and scoring rules, just like non-public posts.
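One way to quantify how well a tagging rule works is to have someone review a batch of Super Public posts and compare their judgement with the tags the rule applied. The sketch below computes precision and recall for a single tag. It assumes the applied tags have already been flattened from each post's `tag_tree` into a set of paths. The `tag_agreement` helper and the sample data are hypothetical.

```python
def tag_agreement(reviewed, tag):
    """Precision and recall of one tag against hand-reviewed labels.

    `reviewed` is a list of (assigned_tags, expected_tags) pairs of sets:
    assigned_tags from the post's tag_tree, expected_tags from a reviewer.
    """
    tp = fp = fn = 0
    for assigned, expected in reviewed:
        in_assigned, in_expected = tag in assigned, tag in expected
        if in_assigned and in_expected:
            tp += 1
        elif in_assigned:
            fp += 1
        elif in_expected:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall


reviewed = [
    ({"automotive.bmw"}, {"automotive.bmw"}),  # correct tag
    ({"automotive.bmw"}, set()),               # tagged, but reviewer disagrees
    (set(), {"automotive.bmw"}),               # missed by the rule
    (set(), set()),                            # correctly untagged
]
print(tag_agreement(reviewed, "automotive.bmw"))  # (0.5, 0.5)
```

Low precision points to a rule that is too broad, and low recall points to one that misses relevant posts.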

Read our design pattern on validating filters using Super Public data.

Building machine learned classifiers

Machine learned classifiers give you a powerful way of adding your own value to data.

The first step to build a machine learned classifier is to acquire a training set of data. You can regularly request data from the /pylon/sample endpoint to build up a training set.
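Because you request samples repeatedly, the same post may come back more than once. The sketch below accumulates batches into a training set, skips duplicate text, and leaves a `label` field for annotation. It assumes each batch is a list of post objects shaped like the example earlier on this page and deduplicates on content because no post identifier is shown there. Extracting that list from the endpoint's response is not shown, and the `TrainingSet` class is hypothetical.

```python
import hashlib
import json


class TrainingSet:
    """Accumulate sample posts across requests, skipping duplicate text."""

    def __init__(self):
        self._seen = set()
        self.rows = []

    @staticmethod
    def _key(text):
        # Normalise case and whitespace so trivially different copies hash the same.
        return hashlib.sha256(" ".join(text.lower().split()).encode("utf-8")).hexdigest()

    def add_batch(self, posts):
        """Add posts from one sample request; return how many were new."""
        added = 0
        for post in posts:
            text = post.get("interaction", {}).get("content", "")
            if not text:
                continue
            key = self._key(text)
            if key in self._seen:
                continue
            self._seen.add(key)
            # "label" is left empty for a human annotator to fill in.
            self.rows.append({"text": text, "label": None})
            added += 1
        return added

    def to_jsonl(self):
        return "\n".join(json.dumps(row) for row in self.rows)


training = TrainingSet()
batch_1 = [{"interaction": {"content": "I love how the rear seats fold flat in the BMW X5"}}]
batch_2 = batch_1 + [{"interaction": {"content": "The Honda Civic is cheap to run"}}]
print(training.add_batch(batch_1), training.add_batch(batch_2))  # 1 1
```

JSON Lines output keeps one post per line, which most labelling and training tools can read directly.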