Investigating Audience Snacking Habits with Facebook Topic Data

Richard Caudle | 3rd December 2015

You might have seen our recent blog post where we highlighted some surprising findings about snacking habits based on research using Facebook topic data. In this post we'll take a look at how the research was carried out using our platform.

Testing long-held assumptions

We're all increasingly looking to make better data-informed decisions. In this case an agency working on behalf of a brand of popular snack was looking to improve the effectiveness of their advertising around the time of sporting events.

Until now the brand had assumed the following to be true:

  • Men between 18 and 24 engage most with the snack brand.
  • Pre-game excitement peaks just before game time.
  • Most games were watched with friends.

We used Facebook topic data to investigate whether these assumptions were true.

The conclusion of the research was in fact that:

  • Women between 35 and 65 engage most with the snack brand.
  • Pre-game excitement peaks 6 hours before the game.
  • Most games were watched with family.

The brand can use these insights to carry out better targeted advertising in future. Let's take a look at how the research was carried out.

Working with PYLON

Before we look at the detailed steps, here's a quick reminder of how PYLON works in practice.


You work with PYLON by:

  • Filtering the stream of data from Facebook to stories and engagements (such as likes and comments) you'd like to analyze. Filtered data is recorded into an index.
  • Classifying the data using your own custom rules to add extra metadata for your use case.
  • Analyzing the data you have recorded to the index.

You can learn more about the platform in our What is PYLON? guide. Now let's look at these steps in the context of this specific use case.

Filtering stories and engagements

The first step of working with Facebook topic data is recording data from a target audience for your analysis.

Using the DataSift platform you can capture stories and engagements on stories by creating a filter in CSDL. The filter specifies what data you'd like to be recorded from the Facebook data source to your index for analysis. The rules in your filter operate against the values of targets (data fields) of the stories and engagements.

For example this filter would capture stories and engagements relating to some popular snacks created by people in the US:

    (
        fb.content contains_any "cheese puff, nachos, pork rind"
        OR fb.parent.content contains_any "cheese puff, nachos, pork rind"
        OR fb.content wildcard "corn chip*, bombay*mix"
        OR fb.parent.content wildcard "corn chip*, bombay*mix"
    )
    AND fb.author.country == "United States"

Using a filter like this you can start a recording. The recording will store any posts that mention the snacks and any engagements (likes, comments and reshares) from the audience into a private index that you can query for your analysis.

So for example if someone posts the following status:

Stocking up on nachos before the big game!

This story and any likes, comments or reshares on the story will be recorded to your index.

The above example used simple keywords for filtering conditions, but you can also take advantage of topics, which are inferred automatically from the content of posts. For the example post, topics such as sport, snacks and nachos would be inferred. So another example filter, based on topics instead of keywords, would be:

    (
        fb.topics.category in "snack,sport"
        OR fb.parent.topics.category in "snack,sport"
        OR fb.topics.name in "nachos,chips,crisps"
        OR fb.parent.topics.name in "nachos,chips,crisps"
    )
    AND fb.author.country == "United States"

CSDL gives you many powerful options for filtering data. It's important to test and improve your filters to make sure you are capturing the right data for your analysis.
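One way to sanity-check a keyword list before you start a recording is to run it over a small sample of posts you have labelled by hand. The sketch below is a rough local approximation in Python, not CSDL itself: `matches_any` and `precision_recall` are hypothetical helpers, the matching is simple case-insensitive whole-word matching, and the sample posts are invented for illustration.

```python
import re

def matches_any(text, phrases):
    """Rough stand-in for a keyword rule: True if any phrase appears
    as whole words in the text, ignoring case. Not real CSDL semantics."""
    for phrase in phrases:
        pattern = r"\b" + re.escape(phrase.strip()) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False

def precision_recall(samples, phrases):
    """samples: list of (text, is_relevant) pairs labelled by hand."""
    tp = fp = fn = 0
    for text, is_relevant in samples:
        matched = matches_any(text, phrases)
        if matched and is_relevant:
            tp += 1          # captured and wanted
        elif matched and not is_relevant:
            fp += 1          # captured but not wanted
        elif not matched and is_relevant:
            fn += 1          # wanted but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Invented sample posts, labelled by hand as relevant (True) or not (False).
SAMPLES = [
    ("Stocking up on nachos before the big game!", True),
    ("Cheese puff all over the sofa again", True),
    ("Nachos the cat turned three today", False),
    ("Grabbing tortilla chips for tonight", True),
]
KEYWORDS = ["cheese puff", "nachos", "pork rind"]

precision, recall = precision_recall(SAMPLES, KEYWORDS)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low precision suggests the keywords are too broad (the cat called Nachos), while low recall suggests terms are missing (tortilla chips). Either result is a prompt to revise the filter and test again.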

Adding value through classification

Facebook topic data is already a rich data set, but you can add additional value using classification rules. In this case the agency was interested in analyzing the number of people watching sport with their friends versus their family.

By adding classification rules to a filter the platform will record additional metadata for each story and engagement. You can use this additional metadata in your analysis.

tag "family" {
    fb.all.content contains_any "with my father, with my dad, with my daddy"
    OR ( fb.all.content contains_any "with my" AND ( fb.topics.name IN "Father" OR fb.parent.topics.name IN "Father" ) )
    OR fb.all.content contains_any "with my brother, with my brothers, with my sister, with my sisters"
    OR ( fb.all.content contains_any "with my" AND ( fb.topics.name IN "Sibling" OR fb.parent.topics.name IN "Sibling" ) )
    OR fb.all.content contains_any "with my mother, with my mom"
    OR ( fb.all.content contains_any "with my" AND ( fb.topics.name IN "Mother" OR fb.parent.topics.name IN "Mother" ) )
}

tag "friends" {
    fb.all.content contains_any "with my friend, with my friends, with my mate"
    OR ( fb.all.content contains_any "with my" AND ( fb.topics.name IN "Friendship" OR fb.parent.topics.name IN "Friendship" ) )
}

When a story or engagement matches the filter conditions the classification rules are applied before the data is recorded to your index. So in this case if the content of a post reads:

Settling in to watch the match with my brother!

The story will be tagged with "family" when it is stored to the index.
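To get a feel for how tagging behaves before you record any data, you could mimic the keyword parts of the tags locally. This is only a sketch under stated assumptions: `apply_tags` is a hypothetical helper, it covers the "with my ..." phrases only (the topic-based conditions are left out because topics are inferred by the platform), and its whole-word matching is an approximation of what the platform does.

```python
import re

# Phrase lists taken from the keyword conditions of the example tags.
# The topic-based conditions are deliberately omitted from this sketch.
TAG_PHRASES = {
    "family": [
        "with my father", "with my dad", "with my daddy",
        "with my brother", "with my brothers",
        "with my sister", "with my sisters",
        "with my mother", "with my mom",
    ],
    "friends": ["with my friend", "with my friends", "with my mate"],
}

def apply_tags(text):
    """Return the sorted list of tags whose phrases appear in the text."""
    tags = []
    for tag, phrases in TAG_PHRASES.items():
        for phrase in phrases:
            pattern = r"\b" + re.escape(phrase) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                tags.append(tag)
                break  # one matching phrase is enough for this tag
    return sorted(tags)

print(apply_tags("Settling in to watch the match with my brother!"))
print(apply_tags("Pizza with my friends, then the game with my dad"))
```

Note that a single post can pick up more than one tag, which is worth bearing in mind when you compare tagged segments later.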

The agency also wanted to gauge when the audience was most excited leading up to a game. There are many ways you could look to classify excitement. One way would be to combine sentiment with keywords. For example you could include a rule such as:

tag "excited" { fb.sentiment == "positive" AND fb.content contains_any "can't wait, excited, looking forward to" }

Sentiment is provided by Facebook's sentiment analysis engine. Combining this signal with keywords gives a strong indication of excitement.

Again when carrying out such research it's important to test and improve your classification rules so you get accurate analysis results from your index.

Finding audience insights

Once you've recorded data to your index you can immediately perform initial analysis using analysis queries.

You can perform a time series analysis to see how an audience engaged over time. You can perform a frequency distribution analysis to quantify the engagement by segments of your audience. A more advanced form of analysis is a nested query where you can segment and quantify your audience by multiple dimensions.

You also have the option of using query filters to filter to a portion of your recorded data before performing analysis. So for example you could use the example tags above and filter to only stories and engagements relating to friends before performing a time series analysis.

You can use analysis queries to perform in-depth analysis and test your hypotheses.

Which age group engaged most with the brand?

The first assumption was that men between 18 and 24 engage the most with the brand. To test this assumption we compared the demographic breakdown of the engaged audience with that of the wider Facebook audience. This technique is called baselining, and it immediately revealed that the assumption was wrong.

To perform an age-gender breakdown you can use the following parameters for your analysis query. This example is a nested query that uses the fb.author.gender and fb.author.age targets. These targets give access to the gender and age of the person who posted a story or engaged with a story.

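analyze_parameters = {
    # Outer level: split the audience by gender
    'analysis_type': 'freqDist',
    'parameters': {'threshold': 2, 'target': 'fb.author.gender'},
    # Nested level: split each gender by age group
    'child': {
        'analysis_type': 'freqDist',
        'parameters': {'threshold': 5, 'target': 'fb.author.age'}
    }
}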

datasift.pylon.analyze('snack index id', analyze_parameters)
datasift.pylon.analyze('wider Facebook index id', analyze_parameters)

Running this query first on the index for your audience and then on an index which contains interactions from the wider Facebook audience allows you to compare the two audiences.


On this chart the grey shading represents the wider Facebook audience. You can see that females between the ages of 35 and 64 are significantly overrepresented compared to the average Facebook audience.
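The comparison behind a chart like this can be sketched in a few lines of Python. Everything here is an assumption for illustration: the response shape (an `analysis` object with `results`, each carrying a `key`, a `unique_authors` count and a nested `child`) should be checked against the API documentation, `fake_response` is a hypothetical helper that builds such a shape, and the counts are invented rather than taken from the research.

```python
def segment_shares(response):
    """Flatten a nested gender -> age result into
    {(gender, age): share of unique authors}."""
    counts = {}
    for gender in response["analysis"]["results"]:
        for age in gender["child"]["results"]:
            counts[(gender["key"], age["key"])] = age["unique_authors"]
    total = sum(counts.values())
    return {segment: n / total for segment, n in counts.items()}

def baseline_index(audience, baseline):
    """100 means a segment is as common as in the baseline;
    above 100 means it is overrepresented."""
    a, b = segment_shares(audience), segment_shares(baseline)
    return {seg: round(100 * a[seg] / b[seg]) for seg in a if b.get(seg)}

def fake_response(counts):
    """Build a response-shaped dict from {gender: {age: unique_authors}}.
    Hypothetical helper so the sketch runs without calling the API."""
    return {"analysis": {"results": [
        {"key": g, "child": {"results": [
            {"key": a, "unique_authors": n} for a, n in ages.items()]}}
        for g, ages in counts.items()]}}

# Invented numbers, for illustration only.
snack = fake_response({"female": {"18-24": 100, "35-44": 400},
                       "male": {"18-24": 200, "35-44": 300}})
wider = fake_response({"female": {"18-24": 250, "35-44": 250},
                       "male": {"18-24": 250, "35-44": 250}})

index = baseline_index(snack, wider)
for segment, value in sorted(index.items()):
    print(segment, value)
```

Working with shares rather than raw counts matters here, because the two indexes will almost certainly contain very different volumes of data.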

When did pre-game excitement peak?

The second assumption stated that pre-game excitement peaks just before game time. To test this assumption we performed time series analysis of stories that we classified as showing excitement.

To classify the data we used a tag similar to the example above. This tag allowed us to use a query filter for the time series analysis.

analyze_parameters = {
    'analysis_type': 'timeSeries',
    # Two-hour intervals: an interval of 'hour' with a span of 2
    'parameters': {
        'interval': 'hour',
        'span': 2
    }
}

datasift.pylon.analyze('recording id', analyze_parameters, 'interaction.tags IN "excited"')

From this chart you can see that excitement peaks around 10am, which was 6 hours before the start of the game.
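If you would rather read the peak from the numbers than from a chart, a short Python sketch like the one below would do it. It assumes each time series result carries a Unix timestamp in `key` and a count in `interactions`; `peak_before` is a hypothetical helper, and the date, the UTC time zone and the counts are all invented for illustration.

```python
from datetime import datetime, timedelta, timezone

def peak_before(results, kickoff):
    """Return (peak_time, hours_before_kickoff) for the busiest interval."""
    peak = max(results, key=lambda r: r["interactions"])
    peak_time = datetime.fromtimestamp(peak["key"], tz=timezone.utc)
    hours = (kickoff - peak_time) / timedelta(hours=1)
    return peak_time, hours

# Invented example: a 16:00 kick-off and two-hour intervals from 06:00.
kickoff = datetime(2015, 11, 1, 16, 0, tzinfo=timezone.utc)
start = datetime(2015, 11, 1, 6, 0, tzinfo=timezone.utc)
counts = [300, 700, 1200, 900, 800, 600]  # made-up interaction counts
results = [
    {"key": int((start + timedelta(hours=2 * i)).timestamp()), "interactions": n}
    for i, n in enumerate(counts)
]

peak_time, hours = peak_before(results, kickoff)
print(peak_time.strftime("%H:%M"), hours)
```

In practice you would convert the timestamps to the time zone of the game before reading off the hour.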


Who do people watch games with?

The third assumption stated that people watch games mostly with their friends. To test this assumption we performed time series analysis of people mentioning their friends, versus those mentioning family.

To classify the data we used tags similar to the example above. These allowed us to use query filters for two time series analyses, one for friends and one for family.

analyze_parameters = {
    'analysis_type': 'timeSeries',
    # Two-hour intervals: an interval of 'hour' with a span of 2
    'parameters': {
        'interval': 'hour',
        'span': 2
    }
}

datasift.pylon.analyze('recording id', analyze_parameters, 'interaction.tags IN "friends"')
datasift.pylon.analyze('recording id', analyze_parameters, 'interaction.tags IN "family"')

By plotting the two time series on one chart it was clear that more people mentioned family members.
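Before plotting, the two result sets need to be lined up interval by interval. The Python sketch below shows one way this might be done. It assumes each result has a Unix timestamp in `key` and a count in `interactions`; `aligned_rows` and `totals_by_tag` are hypothetical helpers, and the counts are invented for illustration.

```python
def totals_by_tag(series_by_tag):
    """Sum interactions per tag across all intervals."""
    return {tag: sum(r["interactions"] for r in results)
            for tag, results in series_by_tag.items()}

def aligned_rows(series_by_tag):
    """Merge several time series into rows of (timestamp, {tag: count}),
    using 0 where an interval is absent from one of the series."""
    keys = sorted({r["key"] for results in series_by_tag.values() for r in results})
    lookup = {tag: {r["key"]: r["interactions"] for r in results}
              for tag, results in series_by_tag.items()}
    return [(k, {tag: lookup[tag].get(k, 0) for tag in series_by_tag})
            for k in keys]

# Invented counts at two-hour (7200 second) intervals.
series = {
    "friends": [{"key": 0, "interactions": 200},
                {"key": 7200, "interactions": 300}],
    "family": [{"key": 0, "interactions": 400},
               {"key": 7200, "interactions": 500},
               {"key": 14400, "interactions": 100}],
}

totals = totals_by_tag(series)
rows = aligned_rows(series)
print(totals)
for timestamp, counts in rows:
    print(timestamp, counts)
```

Filling gaps with 0 is a simplification. An interval may be absent from a series for reasons other than there being no activity, so treat missing intervals with care when you interpret the chart.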


Learn more…

PYLON for Facebook Topic Data gives analysts access to a vast new audience to test their assumptions and to inform better decisions.

To learn more about the platform take a look at our What is PYLON? guide.

Also, keep an eye on this blog for more Facebook topic data use cases which we'll be posting soon.
