New Tokenized Targets for PYLON Query Filters

Ed Stenson | 23rd March 2016

In this post I'll look at tokenized targets in query filters. If you're new to the PYLON platform, take a look at our PYLON 101 and Get Started guides.

PYLON offers two types of CSDL filter:

  • interaction filters
  • query filters

An interaction filter takes data from Facebook, filters it, and records the result to an index. For instance, you might write an interaction filter to sift through Facebook topic data looking for stories relating to the automotive sector.

A query filter is optional. It takes data from the index and filters it, and PYLON then lets you run analysis on the result. For example, if you want to use the automotive index to discover the age breakdown of people talking about Tesla vehicles, you would create a query filter that excludes all the stories and engagements relating to other brands, then run your analysis query on the remaining interactions, which all focus on Tesla.
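As a rough local analogy (this is not PYLON itself), the two stages can be mimicked with standard Unix tools: a first grep plays the interaction filter that builds the "index", a second grep plays the query filter, and a frequency count plays the analysis query. The sample records and their "brand age_range" layout are invented purely for illustration.

```shell
# Toy analogy only: each line is a made-up interaction "brand age_range".
raw_data='tesla 25-34
ford 35-44
tesla 18-24
recipes 25-34
bmw 25-34
tesla 25-34'

# Stage 1, like an interaction filter: keep automotive interactions and
# "record" them to an index (here, just a shell variable).
index=$(printf '%s\n' "$raw_data" | grep -E '^(tesla|ford|bmw) ')

# Stage 2, like a query filter: narrow the index down to Tesla only.
tesla_only=$(printf '%s\n' "$index" | grep '^tesla ')

# Stage 3, like a frequency-distribution analysis: count interactions per
# age range and print "age_range count" lines.
age_breakdown() {
  printf '%s\n' "$1" | awk '{ print $2 }' | sort | uniq -c | awk '{ print $2, $1 }'
}

age_breakdown "$tesla_only"
```

The point of the analogy is that the index is built once, while the query filter and analysis can be rerun against it with different criteria.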

What's new?

Until now, only a handful of targets have been tokenized for query filters, but the latest release brings many more. Here's the full list:

Tokenized targets bring out the full power of the contains operator. For example, suppose the URL of a news story is:

This filter, which uses the contains operator, will match the URL:

links.url contains ""

Without tokenization, you could only match against the entire value (in this case, the entire URL), so you would have written the filter like this:

links.url == ""
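To make the difference concrete, here is a toy model of the two behaviors. It is a sketch under an assumption: that tokenization lowercases the value and splits it on every non-alphanumeric character, so a contains argument matches when its tokens appear as a consecutive run. PYLON's actual tokenizer rules aren't described here and may differ, and the URL below is a hypothetical example.com address.

```shell
# Toy model of tokenized matching. The tokenizer rules are an ASSUMPTION
# (lowercase, split on any non-alphanumeric character); PYLON's may differ.

# Print the tokens of $1 separated by single spaces.
tokens() {
  t=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cs '[:alnum:]' ' ')
  t=${t# }   # drop a single leading space left by a leading separator
  t=${t% }   # drop a single trailing space left by a trailing separator
  printf '%s' "$t"
}

# Succeed if the tokens of argument $2 appear as a consecutive run in value $1.
token_contains() {
  padded_value=" $(tokens "$1") "
  padded_arg=" $(tokens "$2") "
  case "$padded_value" in
    *"$padded_arg"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeed only if the whole value equals the argument, like ==.
exact_match() {
  [ "$1" = "$2" ]
}

url="http://www.example.com/life-style/gadgets-and-tech/some-story.html"  # hypothetical URL

token_contains "$url" "gadgets-and-tech" && echo "contains gadgets-and-tech: match"
token_contains "$url" "gadget" || echo "contains gadget: no match (not a whole token)"
exact_match "$url" "gadgets-and-tech" || echo "== gadgets-and-tech: no match"
```

Under this model, a contains argument only needs to line up with whole tokens somewhere in the URL, whereas == needs the complete URL.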

Using tokenization in query filters

A few days ago I wrote an interaction filter that records to an index the interactions containing links to content at The Independent:

links.url contains ""

So far I've recorded more than 1.2 million interactions. It's reasonable to assume that these include links to every section of the site. Suppose I want to study reader demographics. Some of the main sections at The Independent are:

Title       Path
News        /
Voices      /voices
Culture     /arts-entertainment
Lifestyle   /life-style
Tech        /life-style/gadgets-and-tech
Sport       /sport

We can use a query filter to narrow the index further, selecting, for example, just stories in the Sport section, and then feed those interactions into an analysis query to generate an age-gender breakdown of the people sharing links to that section of the site. We can then repeat the process for each of the other sections to produce a separate age-gender breakdown for each one.
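Repeating the process per section amounts to generating one query filter per row of the table. The sketch below does that in shell. The domain is a hypothetical placeholder (the real one isn't shown in this post), the "domain + path" form of the argument is an assumption, and the News row is left out because its path, as listed, is the site root, which every URL shares.

```shell
# Hypothetical placeholder: the real domain is not shown in this post.
site="www.example.com"

# Section titles and paths taken from the table (News omitted, see above).
sections='Voices /voices
Culture /arts-entertainment
Lifestyle /life-style
Tech /life-style/gadgets-and-tech
Sport /sport'

# Print one query filter per section as "Title<TAB>CSDL".
section_filters() {
  printf '%s\n' "$sections" | while read -r title path; do
    printf '%s\tlinks.url contains "%s%s"\n' "$title" "$site" "$path"
  done
}

section_filters
```

One thing to watch for: because the Tech path sits inside /life-style, a Lifestyle filter built this way would likely also match Tech stories, so you may need to exclude the Tech path from the Lifestyle query filter if you want the two breakdowns to be disjoint.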

The combined query filter and analysis query for the sport section might look like this:

curl -X POST \
    -d '{ "filter": "links.url contains \"\"", "id": "1f42c40be4446f63aa0c5008ab7f700e",
          "parameters": { "analysis_type": "freqDist", "parameters": { "threshold": 2, "target": "" },
            "child": { "analysis_type": "freqDist", "parameters": { "threshold": 9, "target": "" } } },
          "start": "1447675740" }' \
    -H 'Authorization: id:api_key' \
    -H "Content-type: application/json"

In the JSON body, the query filter lives in the "filter" parameter and the analysis query lives in the "parameters" object.
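Because the query filter is itself a JSON string, any double quotes in the CSDL have to be escaped, as in the call above. A small helper can take care of that when building one request body per section. This is a sketch: json_escape and build_body are hypothetical helper names, and the two analysis targets are passed in as arguments because they aren't shown in this post.

```shell
# Hypothetical helper: escape a CSDL string for use inside a JSON string
# (backslashes first, then double quotes).
json_escape() {
  printf '%s\n' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Hypothetical helper: build an /pylon/analyze request body with a parent
# freqDist and a child freqDist, mirroring the structure of the call above.
# $1 = recording id, $2 = query filter (CSDL), $3 = start timestamp,
# $4 = parent analysis target, $5 = child analysis target (supply your own).
build_body() {
  printf '{"filter": "%s", "id": "%s", "parameters": {"analysis_type": "freqDist", "parameters": {"threshold": 2, "target": "%s"}, "child": {"analysis_type": "freqDist", "parameters": {"threshold": 9, "target": "%s"}}}, "start": "%s"}\n' \
    "$(json_escape "$2")" "$1" "$4" "$5" "$3"
}
```

To send a body built this way, you could pipe it into curl with -d @-, which tells curl to read the request data from standard input, keeping the same -H headers as in the call above and supplying the full /pylon/analyze endpoint URL yourself.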

When I tested the call to /pylon/analyze on my own data, the age/gender breakdowns for unique authors in the Voices and Lifestyle sections of the site looked like this:

One conclusion you might draw here is that millennials of either gender publish more links to the Lifestyle section of the site than to the Voices section.

To recap:

  • The interaction filter filled our index with data about people sharing links to the site, regardless of the section they linked to.
  • The query filter let us focus on any one section and exclude the ones we didn't immediately want to study.
  • The analysis query allowed us to generate an age-gender breakdown.

The query filter and analysis query live together in a single call to the /pylon/analyze endpoint, so we call that endpoint once per section to analyze each part of the site in turn.
