Analyzing Content by Keyword

As you know, you cannot retrieve raw content from your index; this is enforced by PYLON's privacy model. However, you can still work with keywords when you submit analysis queries, because they can be used in query filters.

In fact, using keywords in this way is a great way to check the quality of your recorded data and to 'prototype' changes, such as new tags you'd like to add.

Remember, when you write any query you need to consider whether you are looking for stories, engagements, or both. The fb.content target represents content found in original stories, whereas the fb.parent.content target represents content found in stories which are the parent of an engagement.

You can read our introduction to Facebook topic data for more details.

Analysis Query Filters

When you submit an analysis query, you can provide an optional filter using the filter parameter. This parameter accepts CSDL, allowing you to subset the data in your index before the analysis is run.

Filtering by Content

To filter content by keywords and phrases, use the fb.content, fb.parent.content and fb.all.content targets.

  • fb.content - If the interaction is a story, this is the text content
  • fb.parent.content - If the interaction is an engagement, this is the text content of the story being engaged with
  • fb.all.content - This target covers both stories and engagements. It is populated by the story content if the interaction is a story, or by the parent story's content if the interaction is an engagement

You can look for keywords in these targets using operators such as contains_any and wildcard.

If you are familiar with creating CSDL filters, be aware that not all operators can be used in query filters. The Query Filter Operators page summarizes the supported operators. For instance, regular expressions aren't currently supported.

The following examples explicitly use the fb.content and fb.parent.content targets to help you understand exactly what is being filtered for. You can use the fb.all.content target instead if you wish, but make sure you understand what you are filtering for and how it affects your results.

contains_any

The contains_any operator lets you look for any occurrence of a set of words and phrases in content.

For example, here we're filtering for well-known car brands that appear in stories:

fb.content contains_any "BMW,Audi,Nissan,General Motors"
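If your keyword list lives in code, the condition above can be assembled from it. The following is a minimal Python sketch; contains_any_filter is a hypothetical helper (not part of any PYLON client library), and it rejects keywords containing commas or double quotes rather than assuming how CSDL would escape them.

```python
def contains_any_filter(target, keywords):
    """Return a CSDL contains_any condition for the given target.

    Hypothetical helper for illustration only.
    """
    cleaned = [k.strip() for k in keywords if k.strip()]
    if not cleaned:
        raise ValueError("at least one keyword is required")
    for keyword in cleaned:
        # Commas separate list items and double quotes delimit the argument,
        # so this sketch refuses them instead of guessing at escaping rules.
        if "," in keyword or '"' in keyword:
            raise ValueError(f"unsupported character in keyword: {keyword!r}")
    joined = ",".join(cleaned)
    return f'{target} contains_any "{joined}"'


brands = ["BMW", "Audi", "Nissan", "General Motors"]
print(contains_any_filter("fb.content", brands))
```

Passing "fb.parent.content" as the target produces the equivalent condition for stories that are being engaged with.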

contains_near

The contains_near operator lets you look for words that appear close to each other. This can be extremely useful when, for example, you search for book titles, movie titles, or anything else where authors are likely to leave out small words in conversation.

For example, if you're searching for 'The Girl with the Dragon Tattoo' in stories that are being engaged with, people may leave out 'the' or 'with'. To cater for this, you could use the following CSDL, which looks for the three key words within five words of each other:

fb.parent.content contains_near "girl,dragon,tattoo:5"
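The argument format in the example above (comma-separated words, a colon, then the distance) can be generated from a word list. This is a small Python sketch; contains_near_filter is a hypothetical name, and the format is inferred only from the example on this page.

```python
def contains_near_filter(target, words, distance):
    """Return a CSDL contains_near condition.

    Hypothetical helper; the "word,word:distance" layout mirrors the
    example above and is not validated against PYLON itself.
    """
    if len(words) < 2:
        raise ValueError("contains_near needs at least two words")
    if distance < 1:
        raise ValueError("distance must be a positive number of words")
    joined = ",".join(words)
    return f'{target} contains_near "{joined}:{distance}"'


print(contains_near_filter("fb.parent.content", ["girl", "dragon", "tattoo"], 5))
```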

wildcard

The wildcard operator is great when searching for groups of similar words, or common misspellings.

For example, if you're looking for words relating to printing, you'll want to search for words including print, prints, printer, printers, printing and printable. With CSDL this is easy:

fb.content wildcard "print*"

Or, if you're looking for a word that's often misspelled, such as accommodation, you can use a wildcard to cater for an 'optional' m:

fb.content wildcard "accom*odation" OR fb.parent.content wildcard "accom*odation"
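Because you cannot inspect raw content, it can help to try a pattern against sample words locally before using it in a filter. The sketch below uses Python's standard fnmatch module as a stand-in. That is an assumption: fnmatch follows shell-style rules and tests one whole word at a time, which may differ from how PYLON evaluates the wildcard operator.

```python
from fnmatch import fnmatchcase


def matching_words(pattern, words):
    """Return the words that a shell-style wildcard pattern matches in full."""
    return [word for word in words if fnmatchcase(word, pattern)]


candidates = ["print", "prints", "printer", "printing", "printable", "sprint", "paint"]
print(matching_words("print*", candidates))

spellings = ["accommodation", "accomodation", "acommodation"]
print(matching_words("accom*odation", spellings))
```

In this local check, print* keeps the five print words and drops sprint and paint, while accom*odation accepts both the one-m and two-m spellings but not acommodation. Note that * stands for any run of characters, so the second pattern is looser than a strictly optional m.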

An example full analysis query would be as follows:

{
    "analysis_type": "freqDist",
    "filter": "fb.parent.content contains_near \"girl,dragon,tattoo:5\" ",
    "parameters": {
        "target": "links.url",
        "threshold": 5
    }
}
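Writing the escaped quotes in the filter string by hand is error-prone. One option is to build the query as a data structure and let a JSON encoder handle the escaping. The following Python sketch does that; build_freq_dist_query is a hypothetical helper, and only the field names shown in the example above are used.

```python
import json


def build_freq_dist_query(target, threshold, csdl_filter=None):
    """Return a freqDist analysis query as a dict (hypothetical helper)."""
    query = {
        "analysis_type": "freqDist",
        "parameters": {"target": target, "threshold": threshold},
    }
    # The filter is optional, so only include it when one is supplied.
    if csdl_filter:
        query["filter"] = csdl_filter
    return query


csdl = 'fb.parent.content contains_near "girl,dragon,tattoo:5"'
body = json.dumps(build_freq_dist_query("links.url", 5, csdl), indent=4)
print(body)
```

The CSDL is written with plain double quotes, and json.dumps adds the backslashes when it serializes the filter value.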

Filtering by Language

When filtering by keywords, you naturally create a bias towards a language. If you filter for Spanish keywords, you'll naturally receive mainly Spanish content.

If you want full control over the language of the content you capture, however, you can use the fb.language and fb.parent.language targets.

Filtering to one or more languages

You may need to filter to a single language. For example, to filter to stories written in Spanish:

fb.language == "es"

Or for a list of languages:

fb.language in "en,es"

Aligning keywords with languages

For tighter control, you can align your content conditions with languages:

fb.language == "es" AND fb.content contains "café"

An example query would be as follows:

{
    "analysis_type": "freqDist",
    "filter": "fb.language == \"es\" AND fb.content contains \"café\"",
    "parameters": {
        "target": "links.url",
        "threshold": 5
    }
}
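If you track different keywords for each language, the aligned conditions can be generated from a mapping. The Python sketch below is illustrative only: language_keyword_filter is a hypothetical helper, and grouping clauses with parentheses and joining them with OR is an assumption that you should check against the Query Filter Operators page.

```python
import json


def language_keyword_filter(keywords_by_language, target="fb.content",
                            language_target="fb.language"):
    """Return one CSDL clause per language, joined with OR (hypothetical helper)."""
    clauses = []
    for language, keywords in keywords_by_language.items():
        joined = ",".join(keywords)
        clauses.append(
            f'({language_target} == "{language}" AND {target} contains_any "{joined}")'
        )
    return " OR ".join(clauses)


csdl = language_keyword_filter({"es": ["café"], "en": ["coffee", "cafe"]})
query = {
    "analysis_type": "freqDist",
    "filter": csdl,
    "parameters": {"target": "links.url", "threshold": 5},
}
# ensure_ascii=False keeps accented keywords readable instead of \uXXXX escapes.
print(json.dumps(query, indent=4, ensure_ascii=False))
```

Both encodings of the accented character are valid JSON; ensure_ascii=False only makes the request body easier to read and review.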

Filtering by Hashtags

Hashtags are used on Facebook just like on Twitter and Instagram.

You can filter for hashtags using the fb.hashtags and fb.parent.hashtags targets:

fb.hashtags in "sun,summer,sunny"

Analysis Targets

It's not possible to use content targets (fb.content and fb.parent.content) as analysis targets due to PYLON's privacy model.

You can, however, analyze the languages and hashtags used by authors.

For example, to analyze the top languages used in stories:

{
    "analysis_type": "freqDist",
    "parameters": {
        "target": "fb.language",
        "threshold": 5
    }
}

Or the top hashtags in stories people are engaging with:

{
    "analysis_type": "freqDist",
    "parameters": {
        "target": "fb.parent.hashtags",
        "threshold": 5
    }
}
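If you generate analysis queries in code, a small guard can catch a content target being passed as an analysis target before the query is submitted. This is a Python sketch under stated assumptions: freq_dist_query is a hypothetical helper, and the blocked set contains only the two content targets named in this section.

```python
# Targets this section says cannot be used as analysis targets.
CONTENT_TARGETS = {"fb.content", "fb.parent.content"}


def freq_dist_query(target, threshold=5):
    """Return a freqDist query dict, refusing content targets (hypothetical helper)."""
    if target in CONTENT_TARGETS:
        raise ValueError(
            f"{target} can be used in a filter but not as an analysis target"
        )
    return {
        "analysis_type": "freqDist",
        "parameters": {"target": target, "threshold": threshold},
    }


top_languages = freq_dist_query("fb.language")
top_parent_hashtags = freq_dist_query("fb.parent.hashtags")
```

The two dicts match the language and hashtag queries shown above, while freq_dist_query("fb.content") raises a ValueError.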