Regular Expressions

DataSift supports regular expressions for complex pattern matching against any CSDL target that is a string. Regular expressions can be difficult to write, but they give you a great deal of power and flexibility for pattern matching. DataSift uses the Google RE2 engine, which differs from PCRE in a few ways; for example, RE2 does not support backreferences or lookaround assertions. We have included a range of resources and CSDL examples to help you get acquainted with the way regular expressions are used within the DataSift platform.

CSDL includes two operators for regular expressions:

  • regex_partial - filter for content that can appear anywhere within the target you select.
  • regex_exact - filter for content that matches the entire target you choose.
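
The difference between the two operators can be sketched outside CSDL. The following Python snippet is only an analogy: it uses Python's built-in re module rather than RE2, and the function names regex_partial and regex_exact are illustrative stand-ins, not DataSift code.

```python
import re

def regex_partial(pattern: str, target: str) -> bool:
    """Rough analogue of regex_partial: the pattern may match anywhere in the target."""
    return re.search(pattern, target) is not None

def regex_exact(pattern: str, target: str) -> bool:
    """Rough analogue of regex_exact: the pattern must match the entire target."""
    return re.fullmatch(pattern, target) is not None

text = "order 66 shipped"
print(regex_partial(r"[0-9]+", text))  # True: digits appear somewhere in the text
print(regex_exact(r"[0-9]+", text))    # False: the text is not digits only
print(regex_exact(r"[0-9]+", "66"))    # True: the whole target is digits
```

In practice, regex_exact behaves as if the pattern were anchored at both ends of the target, so a pattern that works with regex_partial usually needs leading and trailing wildcards to match the same content with regex_exact.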

Regular expressions in streams can be very simple, like this one, which filters for any content that contains a lower-case letter:

interaction.content regex_partial "[a-z]"

Or they can be more ambitious. Can you figure out what this regex would find?

interaction.content regex_partial "([a-z0-9_\\.-]+)@([\\da-z\\.-]+)\\.([a-z\\.]{2,6})"

The expression would find content containing an email address.
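
To see how the three capture groups divide an address, here is a minimal sketch in Python. It assumes the pattern as the regex engine receives it, with each CSDL "\\" reduced to a single "\", and it uses Python's re module rather than RE2. The helper name find_email and the sample address are made up for illustration.

```python
import re

# The pattern as the regex engine sees it, after the CSDL parser has
# turned each doubled backslash into a single one.
EMAIL = re.compile(r"([a-z0-9_\.-]+)@([\da-z\.-]+)\.([a-z\.]{2,6})")

def find_email(text):
    """Return the three captured groups for the first match, or None."""
    match = EMAIL.search(text)
    return match.groups() if match else None

print(find_email("contact jane.doe@example.com today"))
# ('jane.doe', 'example', 'com')
print(find_email("no address here"))
# None
```

The first group captures the part before the @ sign, the second captures the domain name, and the third captures the final suffix of two to six characters.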

CSDL regex examples

Here are some examples of regular expressions in CSDL.

Escaped characters

We strongly recommend that you take a look at our page about escaped characters.

Escape sequences are familiar to programmers but remember that DataSift's regex engine is embedded inside CSDL. Thus, you have to escape some characters twice, first to get them past the CSDL parser and then to indicate to the regex engine that the character is a literal, not a metacharacter.

The need to escape backslashes ("\") affects how you write arguments for the regex_exact and regex_partial operators. For example, to match a newline you write "\\n" in the regular expression rather than "\n"; otherwise the CSDL compiler will assume that you want to insert a literal newline into the argument.
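
If you build CSDL from a program, the doubling can be automated. The sketch below is a hypothetical helper, not part of any DataSift library: to_csdl_argument doubles every backslash in a regex and wraps the result in double quotes. It assumes the pattern contains no double quotes, which would need their own escaping.

```python
def to_csdl_argument(pattern: str) -> str:
    """Double each backslash so the regex survives the CSDL parser, then quote it.

    Hypothetical helper; assumes the pattern contains no double quotes.
    """
    return '"' + pattern.replace("\\", "\\\\") + '"'

newline_regex = r"\n"  # two characters: a backslash followed by n
print(to_csdl_argument(newline_regex))
# "\\n"

decimal_regex = r"\d+\.\d+"  # a simple decimal number such as 3.14
print("interaction.content regex_partial " + to_csdl_argument(decimal_regex))
# interaction.content regex_partial "\\d+\\.\\d+"
```

Writing the regex once in its natural form and escaping it in a single place makes it easier to test the pattern in an external tool before embedding it in CSDL.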

Learning resources

Here are some useful resources for learning about regular expressions:

Software for testing regular expressions

Here are some useful tools for testing regular expressions: