Facebook topic data is the data source you work with to analyze Facebook audiences in PYLON. In this guide you'll learn more about the data available to you for analysis.
- What is Facebook topic data?
- The Facebook topic data model
- What data is included?
- Working with Facebook topic data
- Next steps…
Facebook topic data is a data source that you can analyze using PYLON. The source provides real-time access to posts (and engagements on posts) which users are posting on their Facebook timelines. All activity is anonymized to protect user privacy; however, demographic details of the author are provided for every activity.
As soon as somebody posts a new story on their timeline, or one of their friends likes, comments on or shares the story, this event is supplied by the Facebook topic data source to the platform. If you have an interaction filter running as a recording, the event will be recorded into your index when it takes place, provided it matches your filter criteria.
All events that the Facebook topic data source provides are called interactions. The Facebook topic data source provides two types of interactions:
- Stories - Posts (or status updates) a user adds to their timeline
- Engagements - An interaction with a story, such as a like, comment, or reshare
Each interaction (story or engagement) provided by the Facebook topic data source has a number of data fields that you can work with when you write interaction filters and analysis queries.
The data fields that you can work with are called targets. Targets are organized into a hierarchy of namespaces. Depending on the type of interaction different targets will be available to you to work with. For example, for story interactions the content of the story is provided in the fb.content target. However engagements have no content so the fb.content target will be empty, but the content for the story they relate to is provided in the fb.parent.content target.
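To make the difference concrete, here is a minimal Python sketch. It models interactions as plain dictionaries keyed by target name, which is an assumption made purely for illustration and not how PYLON delivers data; the helper name story_text is hypothetical.

```python
def story_text(interaction: dict) -> str:
    """Return the text of the story an interaction refers to.

    Stories carry their own text in fb.content. Engagements have no
    content of their own, so the text of the story they relate to is
    read from fb.parent.content instead.
    """
    if interaction.get("fb.type") == "story":
        return interaction.get("fb.content", "")
    return interaction.get("fb.parent.content", "")


# Illustrative interactions (not real data)
story = {"fb.type": "story", "fb.content": "I love my new car!"}
like = {"fb.type": "like", "fb.parent.content": "I love my new car!"}

print(story_text(story))
print(story_text(like))
```

Both calls return the same text, because the like refers back to the story through the parent namespace.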
This diagram shows you which targets are available for each type of interaction:
Take a look at the targets page to see all the data that's available for filtering in your interaction filters and analysis queries.
For most use cases you'll want to record both stories and engagements into your index.
Stories are posts that people add to their timeline. A story can be one of the following types:
- post - a simple text post
- link - a link shared to external content
- video - a video clip
- photo - a photo or image
- photo_album - a photo album
- reshare - a story created by a friend that a user chooses to reshare
- note - Facebook's long-form content; these are less frequently seen in the data
Data for the story is available in these target namespaces:
- fb.* - Details of the story itself, such as content of the post
- fb.author.* - Demographic details of the post's author
- fb.topics.* - Topics extracted from the post
Engagements represent people interacting with a story. An engagement can be one of three types:
- Like - A user or a page 'likes' a story
- Comment - A person comments on a story
- Share - A person shares a story from a friend on to their own timeline
For engagements the fb.type target provides the type of engagement: like, comment or share.
Data for the engagement is available in these target namespaces:
- fb.author.* - Demographic details of the person who engaged
- fb.parent.* - Details of the parent story (content, links etc.)
- fb.parent.author.* - Demographic details of the parent story's author
- fb.parent.topics.* - Topics extracted from the parent story
It is important to note that the content of comments is not available for analysis.
You can access the values of parent story targets as long as the engagement has been 'hydrated' from the story cache. The story will be cached in the large majority of cases as long as you have filtered for stories in your interaction filter. You can read more about the process of engagement hydration in our understanding engagement hydration guide.
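As a rough mental model only, hydration can be pictured as a lookup against a cache of stories. The sketch below is an assumption-laden illustration: the parent_id key, the dictionary cache and the hydrate function are all hypothetical and do not describe the platform's actual implementation.

```python
def hydrate(engagement: dict, story_cache: dict) -> dict:
    """Copy parent-story values onto an engagement if the story is cached."""
    hydrated = dict(engagement)
    # parent_id is a hypothetical key linking an engagement to its story
    story = story_cache.get(engagement["parent_id"])
    if story is None:
        # Story not cached: the fb.parent.* values stay unavailable
        return hydrated
    hydrated["fb.parent.content"] = story["fb.content"]
    hydrated["fb.parent.author.gender"] = story["fb.author.gender"]
    return hydrated


# Illustrative cache holding one story that matched the interaction filter
story_cache = {
    "s1": {"fb.content": "I love my new car!", "fb.author.gender": "male"},
}

like_cached = hydrate({"fb.type": "like", "parent_id": "s1"}, story_cache)
like_missing = hydrate({"fb.type": "like", "parent_id": "s2"}, story_cache)

print(like_cached.get("fb.parent.content"))
print(like_missing.get("fb.parent.content"))
```

The second engagement has no parent values because its story was never cached, which is why filtering for the stories themselves matters.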
You'll often want to filter or analyze based on the content of the stories being posted or engaged with. Depending on the type of the story there are a variety of targets available for filtering and analysis.
These simple text posts are often called 'status updates' on Facebook. These stories contain text-only content, with no links, videos or photos.
A story is labeled as type 'link' when the author posts a link alongside their text.
You can access the link shared using the fb.link target, the title of the page shared using fb.link_title and any text posted alongside using fb.content, or the parent equivalent for engagements. We recommend for most use cases that you use the links augmentation when filtering and analyzing links as this caters for multiple links being shared in a story.
_Note that it is possible for a story to contain both a photo or video and a link. In this case fb.media_type will be set to 'photo' or 'video' as these take precedence over the link media type._
Videos and Photos
People can share videos and images on Facebook. Although PYLON currently does not support analysis of the media itself, you can access any text posted alongside using fb.content (or fb.parent.content target for engagements).
People can choose to reshare content they see on the timeline.
Reshares are a special case which it is important to understand. When a person 'reshares' a story this creates two new interactions:
- The first is a new story with fb.media_type set to 'reshare'.
- The second is a new engagement with fb.parent.media_type set to photo, video or link, as appropriate for the original story. Its fb.parent.content target contains the content from the original post.
Resharing is best explained using an example scenario. See how a chain of reshares is created:
David creates a story containing a link to a page about a new car he's just purchased, with the text "I love my new car!"
- The story contains David's demographics in fb.author.* and his content in fb.content.
Steve 'reshares' David's post with his own text: "That looks awesome!"
- One story and one engagement are created.
- The story contains Steve's text in fb.content and his demographics in fb.author.*. The story also contains David's link in fb.link and links.url (note that links.url will also contain any additional links in Steve's post).
- The engagement contains David's text in fb.parent.content but does not contain Steve's text. It also contains David's demographics in fb.parent.author.* and Steve's demographics in fb.author.*.
Anna 'reshares' Steve's reshare but adds no text of her own.
- One story and one engagement are generated.
- The story contains Anna's demographics in fb.author.* and fb.content is blank as she added no new text.
- The engagement contains Steve's text in fb.parent.content, and David's link in fb.link and links.url. It also contains Anna's demographics in fb.author.* and Steve's demographics in fb.parent.author.*.
If you are recording both stories and engagements into your index you need to keep in mind the above process. You might for instance add a query filter to your analysis query to exclude reshares.
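The David, Steve and Anna scenario can be traced with a small Python model. This is a sketch under stated assumptions: interactions are plain dictionaries, a single name stands in for the whole fb.author.* namespace, the link URL is made up, and the "share" value for fb.type is assumed. It tracks only content, link and author.

```python
from typing import Tuple


def reshare(parent_story: dict, author: str, text: str = "") -> Tuple[dict, dict]:
    """Return the (story, engagement) pair a reshare produces in this model."""
    story = {
        "fb.type": "story",
        "fb.media_type": "reshare",
        "fb.content": text,                      # resharer's own text, may be blank
        "fb.author": author,                     # stands in for fb.author.*
        "fb.link": parent_story.get("fb.link"),  # link carried along the chain
    }
    engagement = {
        "fb.type": "share",                      # assumed value for a reshare engagement
        "fb.author": author,                     # the person resharing
        "fb.parent.content": parent_story["fb.content"],
        "fb.parent.author": parent_story["fb.author"],
    }
    return story, engagement


david_story = {
    "fb.type": "story",
    "fb.media_type": "link",
    "fb.content": "I love my new car!",
    "fb.author": "David",
    "fb.link": "http://example.com/new-car",     # hypothetical URL
}

steve_story, steve_eng = reshare(david_story, "Steve", "That looks awesome!")
anna_story, anna_eng = reshare(steve_story, "Anna")

print(steve_eng["fb.parent.content"])  # David's text, not Steve's
print(anna_eng["fb.parent.content"])   # Steve's text
print(repr(anna_story["fb.content"]))  # blank, Anna added no text
```

Note how each engagement only sees one step back in the chain: Anna's engagement refers to Steve's reshare, not to David's original story.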
People can use Facebook's Notes application to write long form content. Notes are now rarely published.
Hashtags are used on Facebook just like on Twitter and Instagram.
Stories are posted on Facebook in a huge variety of languages.
If you use keywords in your filters you will naturally bias your results towards the language of those keywords. For example, if you filter for Spanish keywords you'll receive mainly Spanish content.
In our examples area you can see example interaction filters and analysis queries using content and language targets.
All interactions given by the Facebook topic data source give you a full set of demographics details for the author. The details are based on information supplied by the author in their Facebook account.
For stories, demographic details of the author are available in the fb.author.* namespace.
For engagements the fb.parent.author.* namespace represents demographics of the parent story's author. The fb.author.* namespace represents demographics of the person who engaged with the story.
The authors of all stories and engagements are either a user (an individual person) or a page (usually a company or organization).
Distinguishing between these two categories of author is extremely valuable for your analysis. The fb.author.type target represents the author if the interaction is a story. For engagements the fb.author.type target represents the person engaging and the fb.parent.author.type target represents the author of the story being engaged with.
The age group of the author, categorized into a set number of bands. Notice there is no band below 18 years old as this data is not available.
When the value for an author is "unknown" the most likely reason is that the user has not provided this information or the author is of type 'page'.
The fb.author.age target represents the author if the interaction is a story. For engagements the fb.author.age target represents the person engaging and the fb.parent.author.age target represents the author of the story being engaged with.
The gender of the author. This can be one of male, female or unknown.
When the value for an author is "unknown" the most likely reason is that the user has not provided this information or the author is a 'page'.
The fb.author.gender target represents the author if the interaction is a story. For engagements the fb.author.gender target represents the person engaging and the fb.parent.author.gender target represents the author of the story being engaged with.
The location of the author to state or region level.
You can filter on an author's country using the fb.author.country and fb.author.country_code targets which represent the author posting the story, or the user engaging if the interaction is an engagement. Equivalent targets are also available in the fb.parent.* namespace.
Slightly more complex is the notion of region as this varies by the country the author is within. The region is provided in the fb.author.region and fb.author.country_region targets, and the parent equivalents. We recommend using the country_region targets as these give you the country context of the region. If an author is in the USA the region represents their state. If the author is in a country such as the UK then the region represents the country within the UK - so England, Scotland, Wales, Northern Ireland. The full list of regions for each country is available on the countries and regions page.
When you start working with demographic targets they can seem initially confusing as they are quite different to other groups of targets.
For example if you want to filter by content keywords you use the fb.content target for stories and the fb.parent.content target for engagements. An interaction depending on its type will have one of the targets, not both.
However, for demographics, both stories and engagements have the fb.author.* targets. So filtering say on fb.author.age will capture both stories and engagements by an age group. Whereas filtering on fb.parent.author.age will capture engagements on stories written by an age group. The difference is subtle but very important to understand.
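The following Python sketch illustrates that difference on three made-up interactions. The dictionaries, the id labels and the age band strings are assumptions for illustration only; matches is a hypothetical helper that mimics an equality condition on a single target.

```python
def matches(interaction: dict, target: str, value: str) -> bool:
    """True when the interaction carries the target with the given value."""
    return interaction.get(target) == value


# Illustrative interactions; age band labels are assumed, not taken from the data
interactions = [
    {"id": "story by 25-34", "fb.type": "story",
     "fb.author.age": "25-34"},
    {"id": "like by 25-34 on story by 18-24", "fb.type": "like",
     "fb.author.age": "25-34", "fb.parent.author.age": "18-24"},
    {"id": "like by 35-44 on story by 25-34", "fb.type": "like",
     "fb.author.age": "35-44", "fb.parent.author.age": "25-34"},
]

by_author = [i["id"] for i in interactions
             if matches(i, "fb.author.age", "25-34")]
by_parent_author = [i["id"] for i in interactions
                    if matches(i, "fb.parent.author.age", "25-34")]

print(by_author)         # a story and an engagement by the age group
print(by_parent_author)  # only the engagement on a story by the age group
```

The same age value selects different interactions depending on which namespace carries it.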
Take a look at our example interaction filters and analysis queries to learn more.
Where possible, stories will be given a set of topics that have been inferred from the content of the story.
The topics provided exist in the Facebook Graph. They cover a huge range of subjects, including movies, brands and famous people. Topics are extremely powerful, giving you a richer, structured understanding of stories. Topics are organized into categories - a full list of categories is available here.
For each topic identified, the id, category and name are available. These can be accessed using the fb.topic_ids, fb.topics.category and fb.topics.name targets respectively for stories, and their parent equivalents when working with engagements. Just as with locations, the fb.topics.category_name target provides access to the topic name together with the category.
As an example, if a person posts the following content:
I'm off to see Counting Crows at Madison Square Garden tonight!
Topics for the band "Counting Crows" and venue "Madison Square Garden" will be inferred from the content.
The topics will be available in the following targets:
| fb.topic_ids | fb.topics.category | fb.topics.name | fb.topics.category_name |
|---|---|---|---|
| 5491862434 | Musician/band | Counting Crows | Musician/band \| Counting Crows |
| 28859306498 | Sports venue | Madison Square Garden | Sports venue \| Madison Square Garden |
Depending on the topic category there are a number of additional targets you can work with. You can see all the possible targets on the Targets page. For instance a topic for a company such as Lamborghini provides you with a website address and company description - see https://www.facebook.com/Lamborghini/info?tab=page_info.
Read our in-depth guide on Discovering Topics to learn more about how to find and work with topics on Facebook.
In our examples area you can see example interaction filters and analysis queries using topics.
Sentiment of stories is provided by Facebook's sentiment analysis engine. Sentiment is classed as positive, negative or neutral. If no sentiment is detected no value is provided.
A list of languages supported for sentiment is available here.
As interactions arrive from Facebook the DataSift platform carries out a number of augmentations on the data, providing additional targets you can work with in your filters and analysis.
The interaction augmentation brings together commonly useful data fields across data sources on the DataSift platform. You might be familiar with this namespace if you've used other DataSift products.
Within the interaction.* namespace the most useful targets are interaction.tags, interaction.tag_tree and interaction.ml.categories. These give you access to classification you've added to interactions using classifier rules. Read our classifying data guide to learn more.
The links augmentation gives you richer details for links shared in stories.
However, often there are multiple links shared within the content of a story. The links augmentation extracts all links contained in the text of a story so it gives you a more comprehensive coverage of shared links. These links are available in the links.* namespace.
In our examples area you can see example interaction filters and analysis queries using shared links.
Facebook has a large number of features. Before you start working with the data let's look at what is (and what is not) included in Facebook topic data.
- Stories posted by
- Engagements on stories (likes, comments and reshares) by
- Activity in public groups.
- Sponsored posts -- Posts by authors of type 'page' are received by PYLON when they are created. These posts may subsequently be sponsored; however, PYLON does not currently allow you to distinguish between sponsored and non-sponsored posts in your analysis.
- 'Instant Articles' -- Stories include a link to the content on the publisher's site.
- Facebook live videos -- The value of fb.media_type is 'video', just like other shared videos and clips.
- Not included
- Replies and likes on comments.
- Activity in closed or secret groups.
- Facebook Messenger activity.
- Facebook at Work activity.
- Stories where the author chooses 'Only Me' for who can see the post.
- Reviews on Facebook pages.
- Dark Posts. These stories can be created by authors of type 'page'. The stories do not appear on the author's page but are instead targeted to appear in specific users' news feeds.
Note that reactions are all currently received as likes.
When you record interactions on Facebook into an index you need to consider both stories and engagements.
As an example, if you're looking to capture engagement around car brands you'll want to capture both stories posted about the brands and engagements with those stories. This allows you to analyze both original posts and engagements on those posts, such as likes and shares.
If you are looking to analyze engagements only you need to also make sure you capture the story interactions being engaged with. This is because the fb.parent.* namespace is completed from the original story only if the story is in your index. You can read more about the process of engagement 'hydration' in our understanding engagement hydration guide.
So when creating your interaction filters you'll want to use both the fb.* and the fb.parent.* namespaces. For example:
fb.content contains_any "BMW,Audi,Nissan,General Motors" OR fb.parent.content contains_any "BMW,Audi,Nissan,General Motors"
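If you generate filters programmatically, the duplication across the two namespaces is easy to automate. The Python helper below is a minimal sketch; content_filter is a hypothetical name, and it simply builds the filter text shown above rather than calling any PYLON API.

```python
def content_filter(keywords: list) -> str:
    """Build a filter matching keywords in stories and in parent stories."""
    joined = ",".join(keywords)
    return (
        f'fb.content contains_any "{joined}" '
        f'OR fb.parent.content contains_any "{joined}"'
    )


csdl = content_filter(["BMW", "Audi", "Nissan", "General Motors"])
print(csdl)
```

The sketch does not escape keywords, so terms that themselves contain commas or double quotes would need extra handling.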
For examples of filtering Facebook topic data see the examples page.
Again you need to keep in mind both stories and engagements when you create your analysis queries.
For example, do you want to analyze just original stories that share a link, engagements on a story that share a link, or both?
Depending on the set of targets you are working with this can be confusing. For instance, in an analysis filter, fb.language will analyze only story interactions and fb.parent.language will only analyze content from stories that are being engaged with. This is because an engagement does not have the fb.language target and a story does not have the fb.parent.language target so the two sets are mutually exclusive. However, both stories and engagements have the fb.author.* targets so if you use these for your analysis you will analyze both.
If you're in doubt you can add a filter to your analysis query using fb.type, for example, to ensure you only analyze stories:
fb.type == "story"
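To see why the choice of target decides which interactions are analyzed, consider this small Python model. The dictionaries and the language and gender values are illustrative assumptions, and having is a hypothetical helper that keeps only interactions carrying a given target.

```python
# Illustrative interactions: one story and two engagements
interactions = [
    {"fb.type": "story", "fb.language": "es", "fb.author.gender": "female"},
    {"fb.type": "like", "fb.parent.language": "es", "fb.author.gender": "male"},
    {"fb.type": "comment", "fb.parent.language": "en", "fb.author.gender": "female"},
]


def having(target: str, items: list) -> list:
    """Keep only the interactions that carry the given target."""
    return [i for i in items if target in i]


with_language = having("fb.language", interactions)                # stories only
with_parent_language = having("fb.parent.language", interactions)  # engagements only
with_gender = having("fb.author.gender", interactions)             # both kinds

# Equivalent of adding fb.type == "story" as a filter
stories_only = [i for i in interactions if i["fb.type"] == "story"]

print(len(with_language), len(with_parent_language), len(with_gender))
print(len(stories_only))
```

In this model the fb.language and fb.parent.language sets never overlap, while fb.author.gender spans every interaction until the fb.type condition narrows it to stories.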
See the examples page for common analysis queries you might want to perform.
Now that you have an understanding of the data available in the Facebook topic data source, your next step is to start recording the data to an index for analysis.
Take a look at the following resources to get started: