Translating PowerTrack rules to CSDL

In this guide we'll look at how you can translate your PowerTrack rules into CSDL, DataSift's filtering and classification language.

Before you get started, take a look at our 'What is DataSift STREAM?' page. There you'll learn the key features of the DataSift platform and the terms we'll use in this guide.

Translating the PowerTrack rules model to DataSift

Migrating your PowerTrack rules to the DataSift platform is a relatively straightforward process.

As a PowerTrack user you will typically have:

  • a set of rules that select data from a source (such as Tumblr) that you are migrating to DataSift.
  • one stream of data delivered to your application which includes all data selected by your rules.
  • tags for a number of your rules that allow you to identify which of your rules caused a piece of data to be selected. You may be using tags to match data to your end customers.

Equivalent features are supported on the DataSift platform. You can replicate this setup by:

  • creating a set of filters, one for each of your rules, by translating each PowerTrack rule into CSDL.
  • creating one overarching filter that includes the individual filters that replicate your rules, allowing you to deliver all data to your application as one stream.
  • using tags to identify which filter caused a piece of data to be selected.

In this guide we explain how you can manually translate your rules. We also provide a command-line tool that carries out the same steps as below and can be used to help automate the translation of large numbers of rules.

Manually translating your PowerTrack rules

Step 1: Translating each PowerTrack rule to a CSDL filter

PowerTrack rules and CSDL filters are similar in that they allow you to specify which data items (called interactions on the DataSift platform) to select based upon values of fields within the data. They also allow you to combine a number of conditions using logical operators and parentheses.

Taking an example PowerTrack rule for Tumblr:

(audi OR bmw OR honda OR url_contains:"" OR url_contains:"" OR url_contains:"")

This rule looks for content where the body contains any of the keywords "audi", "bmw", or "honda", or where the post links to any of the brands' websites. Specifying just a keyword implies you want to examine the body of a post for that keyword. Specifying the url_contains operator means you want to examine the URLs of links shared in a post.

The equivalent written as CSDL would be:

tumblr.body contains_any "audi, bmw, honda"  
OR links.domain in ",,"

Here we have two conditions. The first uses the tumblr.body target to access the body of a post and the contains_any operator to examine the value of the target. The second condition uses the links.domain target to access the domain of any link in the post and the in operator to specify a list of values to test against. The two conditions are combined with the OR logical operator.

PowerTrack operators specify the data field to inspect and how to inspect the value, whereas in CSDL the target specifies what field to inspect and the operator says how to inspect it.

To complete the CSDL for the rule we recommend you specify the source you want to filter against. DataSift filters are run by default against all sources enabled in your account. As you may add additional sources in future it's a good idea to specify sources explicitly in your CSDL filter so you don't receive data from other sources unexpectedly.

(tumblr.body contains_any "audi, bmw, honda"  
OR links.domain in ",,")
AND interaction.type == "tumblr"

The additional condition uses the interaction.type target to specify that only data from the Tumblr source should be selected.
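
If you have many rules of this shape, the CSDL above can be generated rather than typed by hand. The following Python sketch assembles the same three conditions from a list of keywords and a list of domains. The function name build_tumblr_filter and the example.com and example.org domains are illustrative assumptions and are not part of any DataSift library. The sketch does not escape special characters, so it rejects values that contain a double quote or a comma.

```python
def build_tumblr_filter(keywords, domains):
    """Return CSDL that selects Tumblr posts whose body contains any of
    `keywords` or that link to any of `domains` (hypothetical helper)."""
    for value in list(keywords) + list(domains):
        # No escaping is attempted in this sketch, so refuse risky input.
        if '"' in value or "," in value:
            raise ValueError("unsupported character in %r" % value)

    conditions = []
    if keywords:
        conditions.append('tumblr.body contains_any "%s"' % ", ".join(keywords))
    if domains:
        conditions.append('links.domain in "%s"' % ",".join(domains))
    if not conditions:
        raise ValueError("at least one keyword or domain is required")

    # Group the content conditions, then restrict the filter to Tumblr.
    grouped = "(" + "\nOR ".join(conditions) + ")"
    return grouped + '\nAND interaction.type == "tumblr"'


if __name__ == "__main__":
    print(build_tumblr_filter(["audi", "bmw", "honda"],
                              ["example.com", "example.org"]))
```

The generated text still needs to be compiled on the platform before it can be used.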

CSDL filters need to be compiled before they are run on the platform. Compiling a filter provides a hash, which is the identifier for the filter. Take a look at the 'compiling and previewing a filter' section of the migration guide for instructions on how to compile a filter and preview its output.

You can use our PowerTrack to CSDL cheat sheet to write your CSDL manually, or use our online tool to translate individual PowerTrack rules to CSDL automatically.

Step 2: Combining your filters into one stream

The DataSift platform allows you to stream the data from any of your individual filters to your application. However, with PowerTrack you have been consuming one stream which contains data from all of your rules. To replicate this model you can create an overarching filter that includes the output from all of your individual filters.

The stream keyword in CSDL includes the output of one filter in another filter, and you can use it to create your overarching filter. Write a stream clause for each filter you created from your rules, giving the hash of the compiled filter, then combine the clauses using the OR logical operator. For example:

stream "filter-1-hash" 
OR stream "filter-2-hash" 
OR stream "filter-3-hash" 
OR stream "filter-4-hash"

You can use this new filter to deliver data from all of your filters in one stream.
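
When you have more than a handful of hashes, the overarching filter can be generated from a list. This is a minimal Python sketch; the helper name combine_streams is an assumption made for illustration, and the hashes are the placeholders used above.

```python
def combine_streams(hashes):
    """Return CSDL that ORs together one stream clause per filter hash
    (hypothetical helper, not part of any DataSift library)."""
    if not hashes:
        raise ValueError("at least one filter hash is required")
    return "\nOR ".join('stream "%s"' % h for h in hashes)


if __name__ == "__main__":
    print(combine_streams(["filter-1-hash", "filter-2-hash", "filter-3-hash"]))
```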

Step 3: Using tags to match delivered data to rules

Once you've combined your filters into one overarching filter you need a way to identify which filter delivered which interactions. You can use CSDL tags to tag interactions based on the filter they originate from.

By this point you have translated each of your rules into a CSDL filter, compiled it, and received a hash for it. In your overarching filter you can now add one tag rule per individual filter, so that each interaction is tagged according to the filter that matched it.

tag.filter "1" { stream "filter-1-hash" } 
tag.filter "2" { stream "filter-2-hash" } 
tag.filter "3" { stream "filter-3-hash" } 
tag.filter "4" { stream "filter-4-hash" } 
return { 
    stream "filter-1-hash" 
    OR stream "filter-2-hash" 
    OR stream "filter-3-hash" 
    OR stream "filter-4-hash" 
}

Notice that the filter from the previous step is now enclosed in a return statement. You must use this syntax when adding tag rules to filters.

PowerTrack allows you to add tags to rules just as we have above, specifying a text label. Many PowerTrack customers use this feature to match the data selected by rules to their end customers.

For instance this PowerTrack rule would match any posts that contain the word 'bmw' and add the "BMW" tag to each matching post:

{"value":"\"bmw\" ","tag":"BMW"}

If you have specified tag names in your PowerTrack rules you could use these for the tag labels:

tag.filter "audi" { stream "audi-filter-hash" } 
tag.filter "bmw" { stream "bmw-filter-hash" } 
tag.filter "honda" { stream "honda-filter-hash" } 
return { 
    stream "audi-filter-hash" 
    OR stream "bmw-filter-hash" 
    OR stream "honda-filter-hash" 
}

When you add tag rules to a filter you need to recompile it and receive a new hash. You can then use this new hash to stream tagged data to your application.
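
The tagged form of the overarching filter follows a regular pattern, so it can also be generated. The Python sketch below takes (label, hash) pairs and emits one tag rule per pair followed by a return block. The helper name build_tagged_filter is an assumption made for illustration, and no escaping of labels is attempted.

```python
def build_tagged_filter(tagged_hashes):
    """Return CSDL with one tag.filter rule per (label, hash) pair and a
    return block that ORs all of the streams (hypothetical helper)."""
    if not tagged_hashes:
        raise ValueError("at least one (label, hash) pair is required")

    # One tag rule per individual filter.
    tag_lines = ['tag.filter "%s" { stream "%s" }' % (label, h)
                 for label, h in tagged_hashes]
    # The filter itself goes inside the return block.
    streams = "\n    OR ".join('stream "%s"' % h for _, h in tagged_hashes)
    return "\n".join(tag_lines) + "\nreturn {\n    " + streams + "\n}"


if __name__ == "__main__":
    print(build_tagged_filter([("audi", "audi-filter-hash"),
                               ("bmw", "bmw-filter-hash"),
                               ("honda", "honda-filter-hash")]))
```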

Note that a filter (including any child filters it contains) can contain a maximum of 10,000 tags. If you do need to use more than 10,000 tags in total you may need to run multiple filters to work around this limit.
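
One way to stay under the limit is to split your (label, hash) pairs into batches and build one overarching filter per batch. The Python sketch below shows only the batching step. It assumes each individual filter contributes exactly one tag and that the child filters contain no tags of their own; the names MAX_TAGS_PER_FILTER and split_into_batches are illustrative.

```python
MAX_TAGS_PER_FILTER = 10000  # limit stated in this guide


def split_into_batches(tagged_hashes, limit=MAX_TAGS_PER_FILTER):
    """Split (label, hash) pairs into consecutive batches of at most
    `limit` items, one batch per overarching filter (hypothetical helper)."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [tagged_hashes[i:i + limit]
            for i in range(0, len(tagged_hashes), limit)]


if __name__ == "__main__":
    pairs = [(str(n), "filter-%d-hash" % n) for n in range(25000)]
    for batch in split_into_batches(pairs):
        print(len(batch))
```

Each batch would then be compiled and consumed as a separate stream.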

See the 'delivering data to your application' section in the migration guide for guidance on delivering data.

Translating rules using the translation tool

If you have been using PowerTrack for some time now you may have a large number of rules. You can use our command-line tool to help with your translation.

The tool will run through the steps outlined in the manual process above:

  • each individual rule will be translated to a CSDL filter and compiled on the platform.
  • the individual filters will be combined in one overarching filter.
  • a tag will be added to the overarching filter for each source rule.
  • the hash returned by the tool is the result of compiling the final overarching filter.
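
The sequence above can be pictured as a small pipeline. The Python sketch below is not the tool's implementation. The translate and compile_csdl arguments are hypothetical callables that you would supply yourself: one that turns a PowerTrack rule into CSDL text, and one that compiles CSDL on the platform and returns its hash.

```python
def migrate(rules, translate, compile_csdl):
    """Illustrative pipeline: translate and compile each rule, then build
    and compile a tagged overarching filter.

    rules        -- list of (tag_label, powertrack_rule) pairs
    translate    -- hypothetical callable: PowerTrack rule -> CSDL text
    compile_csdl -- hypothetical callable: CSDL text -> filter hash
    """
    if not rules:
        raise ValueError("at least one rule is required")

    # Steps 1 and 2: translate each rule and compile it to get its hash.
    tagged_hashes = [(label, compile_csdl(translate(rule)))
                     for label, rule in rules]

    # Step 3: one tag rule per source rule, plus the combined streams.
    tag_lines = ['tag.filter "%s" { stream "%s" }' % (label, h)
                 for label, h in tagged_hashes]
    streams = "\n    OR ".join('stream "%s"' % h for _, h in tagged_hashes)
    overarching = "\n".join(tag_lines) + "\nreturn {\n    " + streams + "\n}"

    # Step 4: the hash of the overarching filter is the result.
    return compile_csdl(overarching)
```

Passing the two callables in keeps the sketch independent of any particular client library or endpoint.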

For details of the command-line tool contact your account manager.

The translation tool gives you a fast way to migrate your rules so you can get started with DataSift quickly. However, the CSDL it produces is a literal translation and is not optimised. Read our 'optimizing translated rules' guide to learn more.