What is DataSift PYLON for Facebook Topic Data?

For the first time, PYLON lets you build applications that analyze Facebook topic data.

DataSift technology sits within the Facebook firewall. We receive a stream of data from Facebook as users post stories and engage with those stories.

Using our platform, you create a filter that records the data you'd like to analyze into a private index. You then submit analysis queries to your index and build the aggregated results into your application.

Using classification rules, you can add value to data before it is recorded into the index. For example, you might create a classifier that identifies people discussing your products. You can then use this extra metadata in your analysis queries to build rich analysis results.

What is Facebook topic data?

When someone adds a post to their wall, or likes, comments on or shares another user's post, these interactions form the basis of Facebook topic data.

PYLON gives you access to these interactions for analysis using a privacy-safe model that lets you build analysis results into your application.

Facebook topic data covers two types of interaction. The first is stories, which represent a user adding a post to their timeline. The second is engagements, which represent a user engaging with a story by liking it, commenting on it or resharing it with their friends.

Every interaction includes demographic details of the author (age, gender and location), the topics being discussed, links that have been shared and the sentiment of the post. These details are made available through targets and can be used both when recording interactions and when performing analysis queries.

Topics are entities and concepts that have been extracted from story content. For example, if a story mentions a well-known car brand such as "BMW", the story will be flagged as having the topic "BMW" in the category "cars". Topics are valuable because you can use them when recording interactions and performing analysis, which saves you from building your own list of keywords.
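To make the shape of an interaction more concrete, here is a minimal Python sketch of the details described above. It is an illustration only: the class name, field names and value formats are assumptions made for this example and are not the actual PYLON targets.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Interaction:
    """Hypothetical model of one interaction; field names are illustrative."""
    kind: str                     # "story" or "engagement"
    author_age: str               # assumed to be an age range
    author_gender: str
    author_location: str
    topics: List[Dict[str, str]] = field(default_factory=list)
    links: List[str] = field(default_factory=list)
    sentiment: str = "neutral"    # assumed representation of sentiment


def topic_names(interaction: Interaction) -> List[str]:
    """Return the names of the topics attached to an interaction."""
    return [topic["name"] for topic in interaction.topics]


# A story that mentions BMW, flagged with the topic "BMW" in the category "cars".
story = Interaction(
    kind="story",
    author_age="25-34",
    author_gender="female",
    author_location="California",
    topics=[{"name": "BMW", "category": "cars"}],
    sentiment="positive",
)
```

Here `topic_names(story)` collects the topic names for the story, which is the kind of value you could filter or analyze on instead of maintaining your own keyword list.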

Learn more about the data available from the Facebook topic data source:

Guaranteed user privacy

PYLON guarantees the privacy of Facebook users through a Facebook-approved privacy model. The model protects users because:

  • Social data never leaves Facebook
  • User identity is removed before processing
  • Results are provided in anonymous, aggregated form subject to audience size-gate limits
  • Data is only retained for 30 days
  • Data from minors is not available

You can safely build applications that analyze Facebook topic data knowing that the privacy model protects users.
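As a rough illustration of what "anonymous, aggregated form subject to audience size-gate limits" can mean in practice, the Python sketch below counts unique authors per value and withholds any group that is too small. The threshold value, the function name and the `author_key` field (an opaque, anonymous key) are all assumptions made for this example; the actual limits and mechanics of PYLON are not given here.

```python
# Hypothetical threshold, chosen only for illustration.
MIN_AUDIENCE = 100


def gated_counts(interactions, target, min_audience=MIN_AUDIENCE):
    """Count unique authors per value of `target`, hiding small audiences."""
    authors_by_value = {}
    for item in interactions:
        # Group the anonymous author keys by the value being analyzed.
        authors_by_value.setdefault(item[target], set()).add(item["author_key"])
    # Only aggregated counts are returned, and only for large enough groups.
    return {
        value: len(authors)
        for value, authors in authors_by_value.items()
        if len(authors) >= min_audience
    }


# Five unique female authors and two unique male authors.
sample = (
    [{"gender": "female", "author_key": n} for n in range(5)]
    + [{"gender": "male", "author_key": 100 + n} for n in range(2)]
)
```

With a threshold of 3, `gated_counts(sample, "gender", min_audience=3)` keeps only the group that meets the size gate, and no individual record is ever returned.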


Filtering and recording data

To record data into an index, you create an interaction filter, which selects the data you want from the Facebook data feed.

Selecting the data you need from the huge volume of real-time activity happening on Facebook sounds difficult, but it's actually very simple using CSDL. CSDL is DataSift's own language for processing semi-structured streaming data.

With CSDL you have access to a wide range of operators, including advanced text operators. You can write both simple, powerful filters and filters as complex as you need.

You write your filters to operate against targets, which are the data fields of each interaction. For example, you can filter for interactions from a certain country only, or look for interactions that mention your chosen topics. You can combine many conditions to build very precise filters.

Interactions that match your filter are recorded into your index for analysis.
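The idea of combining conditions on targets and recording only the matches can be sketched in Python as follows. This is not CSDL and not the PYLON API: the field names (`author_country`, `topics`) and both function names are hypothetical, and a real filter would be written in CSDL and run by the platform.

```python
def matches(interaction, countries, topics):
    """Return True when the interaction satisfies both conditions."""
    in_country = interaction.get("author_country") in countries
    mentions_topic = bool(set(interaction.get("topics", [])) & set(topics))
    return in_country and mentions_topic


def record(feed, countries, topics):
    """Build an index containing only the interactions that match the filter."""
    return [item for item in feed if matches(item, countries, topics)]


feed = [
    {"author_country": "United Kingdom", "topics": ["BMW"]},
    {"author_country": "France", "topics": ["BMW"]},
    {"author_country": "United Kingdom", "topics": ["Football"]},
]

# Keep only UK interactions that mention one of the chosen topics.
index = record(feed, countries={"United Kingdom"}, topics={"BMW", "Audi"})
```

Only the first interaction in `feed` meets both conditions, so it is the only one recorded into `index`.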

Learn more about filtering and recording data in PYLON:


Classifying data

Using classification, you can add your own unique value to your recorded data and give yourself many more options in your analysis.

Classification rules are written in CSDL and are part of your Interaction Filter. You can use classification to:

  • Tag interactions with key features - For example, you could tag your customer's brands and products
  • Run machine-learned classifiers - For example, you could train a machine-learned classifier to classify an author's intent to buy a product

You can also make use of our off-the-shelf classifiers to add extra value to your data.
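As a sketch of the tagging idea, the Python below attaches brand tags to an interaction before it would be recorded. Real classification rules are written in CSDL; the rule format, the tag names and the `content` field used here are assumptions made purely for illustration.

```python
# Hypothetical tagging rules: tag name -> keywords that trigger it.
TAG_RULES = {
    "brand.bmw": ["bmw"],
    "brand.audi": ["audi"],
}


def classify(interaction, rules=TAG_RULES):
    """Return a copy of the interaction with a sorted list of matching tags."""
    text = interaction["content"].lower()
    tags = sorted(
        tag
        for tag, keywords in rules.items()
        if any(keyword in text for keyword in keywords)
    )
    return {**interaction, "tags": tags}


tagged = classify({"content": "Loving my new BMW"})
untagged = classify({"content": "Great weather today"})
```

Once tags like these are stored with each interaction, they become extra values you can break your analysis down by.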

Learn more about how you can classify data and analyze the results with PYLON:


Indexes

When you run an Interaction Filter, the data that matches the filter is recorded into an index. You then submit queries to the index to perform analysis.

Data is stored for 30 days in your index, although you can store the results of your analysis queries permanently.

In PYLON you can create and run multiple Interaction Filters and therefore record data into multiple indexes. For example you might want to create an index for each of your customers, or create an index for each market you are analyzing.

Learn more about how to use indexes to serve your customers:


Analyzing data

Once data is recorded into your index, you use analysis queries to analyze it.

Currently PYLON supports two types of analysis result: time series and frequency distributions. There is a wide range of targets you can use for your analysis, covering demographics, topics, classification and sentiment.
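To show what these two result types look like, here is a small Python sketch that computes both over an in-memory list. It only illustrates the concepts: the function names, the `timestamp` field (seconds since the epoch) and the parameters are assumptions, not the PYLON query API.

```python
from collections import Counter


def frequency_distribution(index, target, limit=3):
    """Count interactions per value of `target`, most frequent first."""
    counts = Counter(item[target] for item in index)
    return counts.most_common(limit)


def time_series(index, interval_seconds=3600):
    """Count interactions per time bucket, keyed by the bucket start time."""
    buckets = Counter()
    for item in index:
        bucket_start = item["timestamp"] - item["timestamp"] % interval_seconds
        buckets[bucket_start] += 1
    return sorted(buckets.items())


index = [
    {"gender": "female", "timestamp": 0},
    {"gender": "female", "timestamp": 10},
    {"gender": "male", "timestamp": 3600},
]

by_gender = frequency_distribution(index, "gender")
per_hour = time_series(index)
```

A frequency distribution answers "how are interactions split across the values of a target?", while a time series answers "how does the volume change over time?".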

When you submit a query, you can specify a time window and a filter (written in CSDL) so that you analyze only a subset of your index. For example, you can select only female authors who live in California to analyze a very specific demographic. Again, there is a wide range of targets you can use.
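The combination of a time window and a query filter can be sketched in Python like this. In PYLON the filter would be written in CSDL; here it is an ordinary Python function, and the field names and the `analyze_subset` helper are hypothetical.

```python
from collections import Counter


def analyze_subset(index, start, end, query_filter, target):
    """Count values of `target` for interactions inside the window and filter."""
    subset = [
        item
        for item in index
        if start <= item["timestamp"] < end and query_filter(item)
    ]
    return dict(Counter(item[target] for item in subset))


def female_in_california(item):
    """Query filter: only female authors who live in California."""
    return item["gender"] == "female" and item["region"] == "California"


index = [
    {"timestamp": 100, "gender": "female", "region": "California", "age": "25-34"},
    {"timestamp": 200, "gender": "female", "region": "Texas", "age": "25-34"},
    {"timestamp": 300, "gender": "male", "region": "California", "age": "35-44"},
    {"timestamp": 9000, "gender": "female", "region": "California", "age": "35-44"},
]

# Age breakdown of female Californian authors in the window [0, 1000).
ages = analyze_subset(index, 0, 1000, female_in_california, "age")
```

The last interaction matches the filter but falls outside the time window, so it is not counted.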

PYLON also supports nested analysis queries. With a nested query you can perform multi-level analysis in a single request that would otherwise take many separate analysis requests.
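Conceptually, a nested query breaks each value of one target down by a second target. The Python sketch below shows the shape of such a two-level result; the function and field names are assumptions for this example and do not reflect how PYLON expresses nested queries.

```python
from collections import Counter, defaultdict


def nested_distribution(index, parent, child):
    """Count values of `child` within each value of `parent`."""
    nested = defaultdict(Counter)
    for item in index:
        nested[item[parent]][item[child]] += 1
    # Convert to plain dictionaries so the result is easy to inspect.
    return {value: dict(counts) for value, counts in nested.items()}


index = [
    {"gender": "female", "age": "25-34"},
    {"gender": "female", "age": "25-34"},
    {"gender": "female", "age": "35-44"},
    {"gender": "male", "age": "25-34"},
]

age_by_gender = nested_distribution(index, "gender", "age")
```

Without nesting, you would need one request to list the genders and then a further filtered request per gender to get each age breakdown.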

By using query filters, nested queries and combining multiple query results you can build rich analysis results.

Learn more about analysis queries and advanced analysis options:

Super Public text samples

Super Public text samples are posts that Facebook users have chosen to share publicly. PYLON gives you access to these posts when they match your interaction filter.

Super Public posts are useful as they allow you to:

  • Validate your Interaction Filters
  • Build machine-learned classifiers from actual Facebook stories

Learn more about how you can take advantage of Super Public text samples:

Get Started

With PYLON for Facebook topic data you can build rich analysis applications.

To get started with PYLON take a look at our Get Started page.