Analyzing Tagged Data

A key feature of PYLON is the ability to classify data with custom rules and then use these classifications to greatly increase your analysis options.

You can use the tags you've added to data both in your analysis query filters and as targets to be analyzed.

Analysis Query Filters

Tagging your data gives you many more ways to subset it, so you can run analysis tailored specifically to your use case.

When you submit an analysis query, you can provide an optional filter using the filter parameter. This parameter accepts CSDL, allowing you to subset the data in your index before the analysis is run.

Simple Tags

For example, you could add the following tags to the interaction filter for your recording:

tag "BMW" { fb.parent.content contains_any "BMW" or fb.content contains_any "BMW" } 
tag "Honda" { fb.parent.content contains_any "Honda" or fb.content contains_any "Honda" } 
tag "Ford" { fb.parent.content contains_any "Ford" or fb.content contains_any "Ford" }
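To see roughly what these rules do, here is a local approximation in Python. It is a sketch built on assumptions, not PYLON's matcher: it treats contains_any as a case-insensitive, whole-word match against each listed term, and models an interaction as a dict with hypothetical `content` and `parent_content` keys standing in for fb.content and fb.parent.content.

```python
import re

# Hypothetical local model of the three simple tag rules above:
# each tag maps to the contains_any terms it checks.
SIMPLE_TAGS = {
    "BMW": ["BMW"],
    "Honda": ["Honda"],
    "Ford": ["Ford"],
}


def contains_any(text, terms):
    """Approximate CSDL contains_any: case-insensitive whole-word/phrase match."""
    if not text:
        return False
    return any(
        re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE)
        for term in terms
    )


def apply_simple_tags(interaction):
    """Return the tags whose rule matches parent_content or content (assumed keys)."""
    return [
        tag
        for tag, terms in SIMPLE_TAGS.items()
        if contains_any(interaction.get("parent_content"), terms)
        or contains_any(interaction.get("content"), terms)
    ]


print(apply_simple_tags({"content": "Test drove a BMW and a Ford today"}))
# ['BMW', 'Ford']
print(apply_simple_tags({"content": "Nice!", "parent_content": "New Honda Civic"}))
# ['Honda']
```

Because the match is on whole words in this sketch, text such as "Fordham" would not receive the Ford tag.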

With these tags in place you can now subset data in your index by the automotive brand.

For example, you could use the following filter to analyze only interactions that mention BMW:

interaction.tags == "BMW"

Or you could filter on multiple brands:

interaction.tags IN "BMW,Honda"
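To reason about what each filter keeps, the Python sketch below mimics both over interactions that have already been tagged. It assumes `==` on interaction.tags keeps interactions carrying that tag, and `IN` keeps interactions carrying at least one of the comma-separated tags; the helper names and sample data are hypothetical.

```python
# Hypothetical tagged interactions: each records its interaction.tags values.
interactions = [
    {"id": 1, "tags": ["BMW"]},
    {"id": 2, "tags": ["Honda", "Ford"]},
    {"id": 3, "tags": ["Ford"]},
    {"id": 4, "tags": []},
]


def tags_equal(items, tag):
    """Mimic: interaction.tags == "<tag>" (tag present on the interaction)."""
    return [i["id"] for i in items if tag in i["tags"]]


def tags_in(items, csv_tags):
    """Mimic: interaction.tags IN "<tag1>,<tag2>" (any listed tag present)."""
    wanted = {t.strip() for t in csv_tags.split(",")}
    return [i["id"] for i in items if wanted & set(i["tags"])]


print(tags_equal(interactions, "BMW"))     # [1]
print(tags_in(interactions, "BMW,Honda"))  # [1, 2]
```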

For example, if you want to analyze the top links shared by an audience, you can now break that analysis down by brand:

{
    "analysis_type": "freqDist",
    "filter": "interaction.tags == \"BMW\"",
    "parameters": {
        "target": "links.url",
        "threshold": 5
    }
}
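Writing the escaped quotes in the filter string by hand is error-prone. One option, sketched below in Python, is to build the query body as a dictionary and let the standard json module handle the escaping. `build_freq_dist_query` is a hypothetical helper, and sending the body to PYLON is left to whichever client you use.

```python
import json


def build_freq_dist_query(target, threshold, csdl_filter=None):
    """Hypothetical helper: build a freqDist analysis query body."""
    query = {
        "analysis_type": "freqDist",
        "parameters": {"target": target, "threshold": threshold},
    }
    if csdl_filter is not None:
        # The CSDL filter is a plain string; json.dumps escapes its quotes.
        query["filter"] = csdl_filter
    return query


body = build_freq_dist_query("links.url", 5, 'interaction.tags == "BMW"')
print(json.dumps(body, indent=4))
```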

Namespaced Tags

You can, of course, use namespaced tags in the same way. Their advantage is that they let you build large taxonomies of tags in an organized fashion that is easy to analyze.

Using namespaces, let's expand our example tags:

tag.car.brand "BMW" { fb.parent.content contains_any "BMW" or fb.content contains_any "BMW" } 
tag.car.brand "Ford" { fb.parent.content contains_any "Ford" or fb.content contains_any "Ford" } 
tag.car.ford "E-150" { fb.parent.content contains_any "E150,E 150,E-150" OR fb.content contains_any "E150,E 150,E-150" } 
tag.car.ford "E-350" { fb.parent.content contains_any "E350,E 350,E-350" OR fb.content contains_any "E350,E 350,E-350" } 
tag.car.bmw "3 series" { fb.parent.content contains_any "3 series, 3-series, 3series" OR fb.content contains_any "3 series, 3-series, 3series" } 
tag.car.bmw "5 series" { fb.parent.content contains_any "5 series, 5-series, 5series" OR fb.content contains_any "5 series, 5-series, 5series" }
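Because every rule repeats the same term list for fb.parent.content and fb.content, a large taxonomy may be easier to maintain if you generate the CSDL. The Python sketch below is one hypothetical way to do that from a dictionary; the `TAXONOMY` layout and the `tag_rule` and `build_csdl` helpers are assumptions, not part of PYLON.

```python
# Hypothetical taxonomy: namespace -> {tag name: contains_any terms}.
TAXONOMY = {
    "car.brand": {"BMW": ["BMW"], "Ford": ["Ford"]},
    "car.ford": {"E-150": ["E150", "E 150", "E-150"]},
    "car.bmw": {"3 series": ["3 series", "3-series", "3series"]},
}


def tag_rule(namespace, name, terms):
    """Build one namespaced tag rule that checks both content targets."""
    term_list = ",".join(terms)
    return (
        f'tag.{namespace} "{name}" {{ '
        f'fb.parent.content contains_any "{term_list}" OR '
        f'fb.content contains_any "{term_list}" }}'
    )


def build_csdl(taxonomy):
    """Emit one rule per line, in dictionary order."""
    return "\n".join(
        tag_rule(ns, name, terms)
        for ns, tags in taxonomy.items()
        for name, terms in tags.items()
    )


print(build_csdl(TAXONOMY))
```

This sketch joins terms with a bare comma, as in the E-150 and E-350 rules above.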

With these tags in place, you could filter by brand:

interaction.tag_tree.car.brand == "BMW"

Or by model:

interaction.tag_tree.car.bmw == "3 series"

Or by combinations of both:

interaction.tag_tree.car.brand == "BMW" AND interaction.tag_tree.car.bmw == "3 series"

Note that interaction.tag_tree targets are case-sensitive, so in the examples above a filter such as interaction.tag_tree.car.brand == "bmw" would not match any tags.
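The Python sketch below models the brand, model, and combined filters above locally, including the case-sensitive comparison. It assumes an interaction's tag tree can be represented as a nested dictionary whose leaf level holds a list of tag names; `tag_tree_has` and the sample data are hypothetical.

```python
# Hypothetical tag_tree for one interaction mentioning a BMW 3 series.
interaction = {
    "tag_tree": {
        "car": {
            "brand": ["BMW"],
            "bmw": ["3 series"],
        }
    }
}


def tag_tree_has(item, path, value):
    """Mimic interaction.tag_tree.<path> == "<value>" (case-sensitive)."""
    node = item["tag_tree"]
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return False
        node = node[key]
    return isinstance(node, list) and value in node


# By brand, by model, and both combined with AND.
print(tag_tree_has(interaction, "car.brand", "BMW"))     # True
print(tag_tree_has(interaction, "car.bmw", "3 series"))  # True
print(
    tag_tree_has(interaction, "car.brand", "BMW")
    and tag_tree_has(interaction, "car.bmw", "3 series")
)  # True
# Case matters: "bmw" does not match the "BMW" tag.
print(tag_tree_has(interaction, "car.brand", "bmw"))  # False
```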

It's important to note that an interaction filter can include at most 10,000 tag and scoring rules in total, counting any rules brought in from other filters with the stream keyword.

Analysis Targets

Once your data is tagged, you can also run frequency distribution analysis on the tags themselves.

When you submit an analysis query you can specify tags as your analysis target. You can do so for both simple and namespaced tags.

Simple Tags

When analyzing the results of simple tags, you use the interaction.tags target. Analyzing this target gives you the number of interactions for each tag in the set of data you're analyzing.

If you applied the simple tags example above, you could analyze the spread of brands across the data in your index using this query:

{
    "analysis_type": "freqDist",
    "parameters": {
        "target": "interaction.tags",
        "threshold": 3
    }
}
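Conceptually, the result counts interactions per tag. The Python sketch below reproduces that count on invented sample data using collections.Counter. It assumes threshold limits the number of tags returned to the most frequent ones, and it ignores any processing PYLON may apply beyond counting; `freq_dist` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical interaction.tags values for five interactions.
tagged = [["BMW"], ["BMW", "Ford"], ["Honda"], ["Ford"], ["BMW"]]


def freq_dist(tag_lists, threshold):
    """Count interactions per tag and keep the `threshold` most frequent."""
    counts = Counter()
    for tags in tag_lists:
        counts.update(set(tags))  # count each tag at most once per interaction
    return counts.most_common(threshold)


print(freq_dist(tagged, 3))  # [('BMW', 3), ('Ford', 2), ('Honda', 1)]
```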

Namespaced Tags

When analyzing the results of namespaced tag rules, you use the interaction.tag_tree target.

If you applied the namespaced tags example above, you could analyze the spread of brands like so:

{
    "analysis_type": "freqDist",
    "parameters": {
        "target": "interaction.tag_tree.car.brand",
        "threshold": 3
    }
}

Notice that you specify the level of your tag structure that contains the 'leaves' (the tags themselves) you want to analyze. You cannot specify a level that contains sub-namespaces, such as interaction.tag_tree.car in this example, so you can analyze only one group of tags at a time.
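The leaf-level rule can be expressed as a simple check: a valid target path must resolve to a group of tags, not to a namespace that contains further namespaces. The Python sketch below encodes the namespaced example as a hypothetical nested dictionary and rejects non-leaf paths; it is a local aid for building queries, not how PYLON itself validates them.

```python
# Hypothetical schema for the namespaced example: dict = namespace,
# list = leaf level holding the tag names.
TAG_SCHEMA = {
    "car": {
        "brand": ["BMW", "Ford"],
        "ford": ["E-150", "E-350"],
        "bmw": ["3 series", "5 series"],
    }
}

PREFIX = "interaction.tag_tree."


def leaf_tags_for_target(target):
    """Return the tags at a target's level, or raise if it is not a leaf level."""
    if not target.startswith(PREFIX):
        raise ValueError(f"not a tag_tree target: {target}")
    node = TAG_SCHEMA
    for key in target[len(PREFIX):].split("."):
        if not isinstance(node, dict) or key not in node:
            raise ValueError(f"unknown namespace in {target}")
        node = node[key]
    if isinstance(node, dict):
        raise ValueError(f"{target} contains sub-namespaces: {sorted(node)}")
    return node


print(leaf_tags_for_target("interaction.tag_tree.car.brand"))  # ['BMW', 'Ford']
```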

You can, of course, combine this with a filter based on your tags. For example, to analyze the top models within the BMW brand:

{
    "analysis_type": "freqDist",
    "filter": "interaction.tag_tree.car.brand == \"BMW\"",
    "parameters": {
        "target": "interaction.tag_tree.car.bmw",
        "threshold": 3
    }
}
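To make the combination concrete, the Python sketch below applies the BMW brand filter first and then counts the car.bmw model tags among the remaining interactions. The sample tag tree data and the `top_models` helper are hypothetical, and the logic is a local approximation of filter-then-count, not PYLON's implementation.

```python
from collections import Counter

# Hypothetical tag_tree values for four interactions.
interactions = [
    {"car": {"brand": ["BMW"], "bmw": ["3 series"]}},
    {"car": {"brand": ["BMW"], "bmw": ["5 series"]}},
    {"car": {"brand": ["BMW"], "bmw": ["3 series"]}},
    {"car": {"brand": ["Ford"], "ford": ["E-150"]}},
]


def top_models(items, brand, model_ns, threshold):
    """Filter on car.brand == brand, then count tags under car.<model_ns>."""
    counts = Counter()
    for tree in items:
        car = tree.get("car", {})
        if brand in car.get("brand", []):
            counts.update(set(car.get(model_ns, [])))
    return counts.most_common(threshold)


print(top_models(interactions, "BMW", "bmw", 3))  # [('3 series', 2), ('5 series', 1)]
```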