Query Filters

Query filters are used to select matching interactions in an index ready for analysis.

Query filters are written in CSDL, but they support less complexity and a smaller range of operators than interaction filters.

This page summarizes the differences between interaction and query filters.

Supported operators

You can use the full set of CSDL operators in your interaction filters but only a subset of the CSDL operators in your query filters.

Operator or feature, with its availability in query filters:

contains
  • Works on tokenized targets only.

substr
  • Not available.

contains_any
  • Works on tokenized targets only.
  • Limited to 100 items in the argument list.

wildcard
  • You can use either version of wildcards (? or *).
  • Works on any target (tokenized and non-tokenized).
  • An argument cannot start with a wildcard character.
  • Limited to 100 items in the argument list.
  • You can only use alphanumeric characters in your arguments.

contains_all
  • Works on tokenized targets only.
  • Limited to 100 items in the argument list.

contains_near
  • Works on any target (tokenized and non-tokenized).
  • Limited to 100 items in the argument list.

exists
  • Works on any target (tokenized and non-tokenized).

in
  • Works on string and numeric targets; for example, "red, white, blue" or [1,2,3].
  • Limited to 100 items in the argument list.

url_in
  • Not available.

==, !=
  • Work on non-tokenized targets only.

>, >=
  • Work on numeric targets only.

regex_partial
  • Not available.

regex_exact
  • Not available.

geo_box, geo_radius, geo_polygon
  • Not available.

The cs modifier for case sensitivity
  • Not available. See below.
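
The availability rules above lend themselves to a simple pre-flight check before a filter is submitted. The following is a minimal sketch in Python, not part of any official client library: the helper name check_operator and the idea of passing the argument list as already-split items are assumptions made for illustration. Only the operator names and the 100-item limit come from the table.

```python
# Hypothetical pre-flight check built from the availability table.
# Operator names and the 100-item limit come from the table; the helper
# itself is illustrative and not part of any official library.

UNAVAILABLE = {
    "substr", "url_in", "regex_partial", "regex_exact",
    "geo_box", "geo_radius", "geo_polygon",
}
LIST_LIMITED = {"contains_any", "contains_all", "contains_near", "wildcard", "in"}
MAX_ITEMS = 100


def check_operator(operator: str, items: list[str]) -> list[str]:
    """Return a list of problems; an empty list means no rule was violated."""
    problems = []
    if operator in UNAVAILABLE:
        problems.append(f"{operator} is not available in query filters")
    if operator in LIST_LIMITED and len(items) > MAX_ITEMS:
        problems.append(
            f"{operator} accepts at most {MAX_ITEMS} items, got {len(items)}"
        )
    return problems


print(check_operator("contains_any", ["coffee", "tea"]))
print(check_operator("substr", ["coffee"]))
print(check_operator("in", [str(n) for n in range(101)]))
```

The check covers availability and list length only; it does not know whether a given target is tokenized, so the tokenization rules still need to be verified against the target documentation.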

Equals and not equal

When using == and != in an interaction filter, the following expressions are not equivalent:

  1. not links.domain == "facebook.com"
  2. links.domain != "facebook.com"

The first example states that facebook.com is not in the list of values in links.domain.
The second example states that there is something in the list of values that is not facebook.com.

However, in a query filter the same two expressions behave differently:

  1. not links.domain == "facebook.com"
  2. links.domain != "facebook.com"

The first example matches every interaction that does not have facebook.com in its list of domain values, and also every interaction that has facebook.com alongside at least one other value.
The second example matches only interactions in which facebook.com does not appear in the list of values at all.
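
To make the four cases easier to compare, here is a small Python model of the behavior described above. It is only a sketch of the prose, treating a multi-valued target as a plain list; the function names are invented for illustration and it says nothing about how the filtering engine is implemented.

```python
# Illustrative model of the semantics described in the text, with a
# multi-valued target represented as a Python list. Function names are
# hypothetical.

def interaction_not_eq(values, x):
    """Interaction filter, 'not target == x': x is not in the list."""
    return x not in values


def interaction_neq(values, x):
    """Interaction filter, 'target != x': some value in the list is not x."""
    return any(v != x for v in values)


def query_not_eq(values, x):
    """Query filter, 'not target == x': x is absent, or present with other values."""
    return x not in values or any(v != x for v in values)


def query_neq(values, x):
    """Query filter, 'target != x': x must not appear in the list."""
    return x not in values


mixed = ["facebook.com", "twitter.com"]
for check in (interaction_not_eq, interaction_neq, query_not_eq, query_neq):
    print(check.__name__, check(mixed, "facebook.com"))
```

For the mixed list, the two "not ... ==" forms and the two "!=" forms give opposite answers in the two filter types, which is the difference to watch for when porting a filter.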

Wildcard operator

The wildcard operator supports only the following characters in its arguments:

  • Western alphabet letters, a to z (upper and lower case).
  • Numbers 0 to 9.

Accented letters, punctuation and characters from other alphabets are not currently supported.
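
These character rules can be checked before a filter is compiled. The sketch below is a hypothetical Python helper, not an official validator. It assumes that arguments are separated by commas, that whitespace around each item is ignored, and that a space inside an item counts as unsupported punctuation; none of these assumptions is confirmed by this page.

```python
import re

# One item: letters a-z / A-Z and digits, plus the ? and * wildcards,
# and the item must not start with a wildcard character.
_ITEM = re.compile(r"[A-Za-z0-9][A-Za-z0-9?*]*")


def invalid_wildcard_items(argument: str) -> list[str]:
    """Return the comma-separated items that break the stated rules.

    Assumption: surrounding whitespace is ignored and items are split on commas.
    """
    items = [item.strip() for item in argument.split(",")]
    return [item for item in items if not _ITEM.fullmatch(item)]


print(invalid_wildcard_items("coffee*, caf?"))
print(invalid_wildcard_items("coffee*, café*, *latte, flat-white"))
```

An empty result means every item uses only supported characters; anything returned would need to be rewritten, for example by replacing an unsupported character with ?.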

For instance, if you want to filter for content mentioning carparks, car parking, and other variations of the term, the following filter would not compile because of the - character:

fb.all.content wildcard "carpark*, car-park*"

You could instead replace the - with a ?. This would then match the - in content and allow the filter to compile:

fb.all.content wildcard "carpark*, car?park*"

But now

Case sensitivity and tokenization

Non-tokenized targets

Some of the targets in query filters are non-tokenized and case sensitive. For instance, imagine a target called example.author.gender that filters against a data source called example.com. Suppose that the example.com data source can hold these values for example.author.gender:

  • female
  • male
  • unknown

This query filter will succeed:

example.author.gender == "female"

This query filter will find no data:

example.author.gender == "Female"

Thus it is important to run some initial tests on the targets you plan to use with your chosen source. Where possible, our target pages give guidance and sample values that we have seen, which should help in most situations. But remember that example.com has complete control of its data and can change these values at any time. A more robust filter might be:

example.author.gender in "Female, female, FEMALE"
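
If you build filters programmatically, the case variants can be generated rather than typed by hand. This is a minimal Python sketch; the helper name case_variant_filter is hypothetical, and it only covers the three casings used in the example above (capitalized, lower case, upper case).

```python
def case_variant_filter(target: str, value: str) -> str:
    """Build an 'in' filter that lists common casings of a single value."""
    variants = []
    for candidate in (value.capitalize(), value.lower(), value.upper()):
        if candidate not in variants:  # skip duplicates such as "65+"
            variants.append(candidate)
    joined = ", ".join(variants)
    return f'{target} in "{joined}"'


print(case_variant_filter("example.author.gender", "female"))
```

Remember that the in operator is limited to 100 items, so this approach suits a handful of values rather than a long list.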

Tokenized targets

Some targets are tokenized and case insensitive when you use them in query filters:

Logical operators

You can use up to 30 logical operators (and, or, not) in a query filter. For example, this query filter uses two logical operators:

fb.author.gender == "female" and fb.author.country == "United States" and fb.content contains "coffee"

This query filter uses four logical operators:

fb.author.gender == "male" and not fb.author.age == "65+" and fb.author.country_code == "US" and fb.content contains "coffee"
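
A rough way to stay under the limit is to count the operators before submitting a filter. The Python sketch below is an approximation, not the parser's own counting logic: it removes quoted strings so that words inside values are ignored, then counts the keywords and, or, and not. Treating the keywords as case insensitive is an assumption.

```python
import re

MAX_LOGICAL_OPERATORS = 30

# A double-quoted string, allowing backslash-escaped characters inside it.
_QUOTED = re.compile(r'"(?:\\.|[^"\\])*"')
# Assumption: keywords are matched case-insensitively.
_LOGICAL = re.compile(r"\b(?:and|or|not)\b", re.IGNORECASE)


def count_logical_operators(csdl: str) -> int:
    """Count and/or/not keywords that appear outside quoted values."""
    without_strings = _QUOTED.sub('""', csdl)
    return len(_LOGICAL.findall(without_strings))


csdl = (
    'fb.author.gender == "male" and not fb.author.age == "65+" '
    'and fb.author.country_code == "US" and fb.content contains "coffee"'
)
count = count_logical_operators(csdl)
print(count, count <= MAX_LOGICAL_OPERATORS)
```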

Precedence of logical operators

In query filtering, OR takes precedence over AND, so:

A and B or C and D

is the same as:

(((B or C) and A) and D)

Note: this is not the same as the operator precedence for interaction filters. We recommend that you use parentheses to indicate exactly the precedence you require.
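
To see why parentheses matter, the Python sketch below compares the query filter reading of A and B or C and D with the more conventional AND-first reading (the one Python itself uses). It simply enumerates all truth assignments; the function names are illustrative.

```python
from itertools import product


def query_filter_reading(a, b, c, d):
    # OR binds first, as described above: A and (B or C) and D
    return a and (b or c) and d


def and_first_reading(a, b, c, d):
    # AND binds first: (A and B) or (C and D)
    return (a and b) or (c and d)


# Collect every assignment of (A, B, C, D) where the two readings disagree.
differences = [
    combo
    for combo in product([False, True], repeat=4)
    if query_filter_reading(*combo) != and_first_reading(*combo)
]

for a, b, c, d in differences:
    print(f"A={a} B={b} C={c} D={d}")
```

The two readings disagree on four of the sixteen assignments, for example when A and B are true but C and D are false, so an unparenthesized filter can silently select a different set of interactions than intended.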

Character escaping

You need to escape double-quote (") characters if they are contained within parameter values you want to use with operators in query filters.

To illustrate this idea, let's look at an example:

  1. You want to filter for a topic name containing quotes, such as 'Mike "Coach K" Krzyzewski'.

  2. The string parameter required by the CSDL parser would read:

Mike \"Coach K\" Krzyzewski

This is because the CSDL parser requires a backslash before each double quote to interpret the double quote character correctly.

  3. The CSDL for your query filter including the target name would be:

fb.topics.name == "Mike \\"Coach K\\" Krzyzewski"

The additional backslash has been added to escape the double quote within the string value for the condition.

  4. Finally you would encode your filter parameter in JSON for your analysis call:

"filter": "fb.topics.name == \"Mike \\\\\"Coach K\\\\\" Krzyzewski\""

Note that encoding the filter CSDL in JSON introduces JSON's own escaping syntax.
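
The layers of escaping above are easy to get wrong by hand, so it can help to generate them. The following Python sketch reproduces the steps for the example value; the helper name csdl_equals is hypothetical, and it handles only double quotes, since that is the only character this page covers. Values that already contain backslashes are outside its scope.

```python
import json


def csdl_equals(target: str, value: str) -> str:
    """Build 'target == "value"' with double quotes escaped as in steps 2 and 3."""
    # Each double quote in the value becomes two backslashes plus the quote.
    escaped = value.replace('"', '\\\\"')
    return f'{target} == "{escaped}"'


csdl = csdl_equals("fb.topics.name", 'Mike "Coach K" Krzyzewski')
print(csdl)

# Step 4: JSON encoding adds its own escaping for backslashes and quotes.
payload = json.dumps({"filter": csdl})
print(payload)
```

Letting a JSON library perform the final step avoids counting backslashes manually; decoding the payload again returns the CSDL string from step 3 unchanged.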