Filter Swapping (part 1)

Ed Stenson | 9th March 2016

In this two-part blog I'm going to look at use cases for filter swapping, a PYLON feature that lets you change the CSDL code of an interaction filter without stopping your recording.

If you're new to the PYLON platform take a look at our PYLON 101 and Get Started guides.

Use Case: Countries

Suppose you want to monitor a brand in your own country initially and then gradually expand your geographical coverage to include other countries. Your plan might look like this:

  1. Start with US only
  2. Add UK
  3. Add Mexico and Canada
  4. Add France, Germany, Italy, and Spain

You can already perform this kind of filtering in PYLON but you would need to start a new recording at each step. With filter swapping you can perform the entire study with just one interaction filter which you can leave running perpetually.

Step 1: Create the initial US-only interaction filter

It's easy to write a filter in CSDL to record data to an index. It can be as simple as this:

fb.all.content contains "mybrand" and fb.author.country == "United States"

This interaction filter finds stories about "mybrand" from authors who choose a Current City in the United States in their Facebook profile. It will also find engagements (likes, comments, shares) on those stories where the author of the engagement is in the United States.

We have a complete step-by-step guide for the PYLON API but the steps you need to get started are:

  1. Hit the /pylon/validate endpoint to check that your CSDL is valid. This step is optional.
  2. Compile your CSDL code using the /pylon/compile endpoint. This endpoint returns a hash for the CSDL code.
  3. Pass the hash to the /pylon/start endpoint to kick off a recording. The endpoint returns a recording id.

The recording id is important; we'll need it in the next step.
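The three steps above can be sketched in Python. This is a minimal sketch rather than an official client: post is a hypothetical callable that sends a payload to a PYLON endpoint and returns the decoded JSON reply (raising on an error response), and the payload and response key names (csdl, hash, name, id) are assumptions to check against the API reference.

```python
from typing import Callable, Dict

# Hypothetical transport: takes an endpoint path and a payload,
# returns the decoded JSON reply, and raises on an error response.
Post = Callable[[str, Dict[str, str]], Dict[str, str]]


def start_recording(post: Post, csdl: str, name: str) -> str:
    """Validate, compile, and start a recording; return the recording id."""
    post("/pylon/validate", {"csdl": csdl})            # optional sanity check
    compiled = post("/pylon/compile", {"csdl": csdl})  # yields a hash for the CSDL
    started = post("/pylon/start", {"hash": compiled["hash"], "name": name})
    return started["id"]                               # keep this for later swaps
```

Passing the transport in as an argument keeps authentication and HTTP details out of the sketch, and makes it easy to try the call sequence against a stub before pointing it at the real API.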

Step 2: Add UK

Next you can add a new argument to expand your coverage to the UK. Since you're now using a list of countries, you'll need to change the operator from == to in. The new CSDL code for the interaction filter is:

fb.all.content contains "mybrand" and fb.author.country in "United States, United Kingdom"

To replace the old CSDL with a new CSDL filter:

  1. Hit the /pylon/validate endpoint to check that your new CSDL is valid. This step is optional.
  2. Compile your new code using the /pylon/compile endpoint. This endpoint returns a new hash for the new CSDL code.
  3. Pass the new hash and the id of the recording that you want to update to the /pylon/update endpoint. This swaps your interaction filter from the old CSDL code to the new code.

If the recording is currently running, the change takes place within a few seconds. If the recording has stopped for any reason, the new CSDL code is still applied to your interaction filter, but you will need to hit /pylon/start to resume recording to your index with the new CSDL.
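Here is a rough Python sketch of the swap. As before, post is a hypothetical transport callable, and the key names (csdl, hash, id) are assumptions. How you find out whether the recording is running is left to the caller, and restarting a stopped recording by passing only its id to /pylon/start is also an assumption.

```python
from typing import Callable, Dict

# Hypothetical transport: endpoint path and payload in, decoded JSON out.
Post = Callable[[str, Dict[str, str]], Dict[str, str]]


def swap_filter(post: Post, recording_id: str, new_csdl: str,
                running: bool = True) -> str:
    """Compile new CSDL and swap it into an existing recording.

    Returns the hash of the newly compiled CSDL.
    """
    post("/pylon/validate", {"csdl": new_csdl})  # optional sanity check
    new_hash = post("/pylon/compile", {"csdl": new_csdl})["hash"]
    post("/pylon/update", {"id": recording_id, "hash": new_hash})
    if not running:
        # A stopped recording keeps the new CSDL but records nothing
        # until it is started again.
        post("/pylon/start", {"id": recording_id})
    return new_hash
```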

Step 3: Add Mexico and Canada

Now you just need to repeat Step 2 each time you want to change your CSDL again. To add Mexico and Canada our CSDL becomes:

fb.all.content contains "mybrand" and fb.author.country in "United States, United Kingdom, Mexico, Canada"

Again, compile the code and then hit /pylon/update to replace the CSDL.
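Since only the country list changes from step to step, you could generate the CSDL instead of editing it by hand. The helper below is a hypothetical sketch: build_csdl is my own name, the default target fb.author.country is an assumption about which target holds the author's country, and the function does no escaping of quotes in its inputs.

```python
from typing import Sequence


def build_csdl(brand: str, countries: Sequence[str],
               target: str = "fb.author.country") -> str:
    """Build the brand-plus-countries filter used in the steps above."""
    if not countries:
        raise ValueError("at least one country is required")
    if len(countries) == 1:
        # A single country uses the == operator.
        clause = f'{target} == "{countries[0]}"'
    else:
        # Several countries use the in operator with a comma-separated list.
        joined = ", ".join(countries)
        clause = f'{target} in "{joined}"'
    return f'fb.all.content contains "{brand}" and {clause}'


print(build_csdl("mybrand", ["United States", "United Kingdom", "Mexico", "Canada"]))
```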

Step 4: Add France, Germany, Italy, and Spain

For completeness I'll include the CSDL for the final step:

fb.all.content contains "mybrand" and fb.author.country in "United States, United Kingdom, Mexico, Canada, France, Germany, Italy, Spain"

Some issues to think about

Filter swapping is a tremendously useful technique, but bear in mind that it affects the homogeneity of your index. At the start of the recording you were looking at US data only. Suppose that you allowed that first CSDL filter to run for a day, then added "United Kingdom" as your second argument and allowed that to run for a further day. Your index now contains 48 hours' worth of US data and 24 hours of UK data. Depending on the analysis you want to perform, this imbalance can introduce a skew, so it's important to keep a record of your filter swaps.
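One way to keep track of that history is to log each swap and work out how long each country has been covered. The sketch below is illustrative only; the history format (start hour plus country list) and the function name are my own, not part of PYLON.

```python
from typing import Dict, List, Sequence, Tuple

# Each entry: (hour at which the filter went live, countries it matched).
SwapHistory = Sequence[Tuple[float, List[str]]]


def coverage_hours(history: SwapHistory, now: float) -> Dict[str, float]:
    """Return how many hours of recording each country has accumulated."""
    totals: Dict[str, float] = {}
    for i, (start, countries) in enumerate(history):
        # A filter version runs until the next swap, or until now for the last one.
        end = history[i + 1][0] if i + 1 < len(history) else now
        for country in countries:
            totals[country] = totals.get(country, 0.0) + (end - start)
    return totals


history = [
    (0, ["United States"]),
    (24, ["United States", "United Kingdom"]),
]
print(coverage_hours(history, now=48))
# {'United States': 48.0, 'United Kingdom': 24.0}
```

Dividing any per-country count by its coverage hours gives a rate that can be compared across countries that joined the filter at different times.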

DataSift automatically deletes recorded data after 32 days. That is, on day 33, DataSift deletes the part of the recording that was made on day 1. Make sure that you perform all the analysis you want on an index within that 32-day window. This expiration feature is beneficial in some ways because:

  • it means you don't need to worry about cleanup.
  • if you stop adding arguments to your CSDL, the entire recording will become more and more representative of the CSDL arguments in Step 4 as, day by day, old data expires from the index and new data is recorded.
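The expiry arithmetic can be sketched in a few lines. This assumes whole-day granularity and that a day's data disappears at the start of its deletion day; both are simplifications of mine, and the function names are hypothetical.

```python
RETENTION_DAYS = 32  # from the text: day 1's data is deleted on day 33


def deletion_day(recorded_day: int, retention_days: int = RETENTION_DAYS) -> int:
    """Day on which data recorded on recorded_day is deleted."""
    return recorded_day + retention_days


def homogeneous_from(last_swap_day: int, retention_days: int = RETENTION_DAYS) -> int:
    """First day on which the index holds only data recorded after the last swap.

    The swap day itself holds a mix of old and new filter output, so the
    index becomes homogeneous once that day's data has been deleted.
    """
    return deletion_day(last_swap_day, retention_days)


print(deletion_day(1))        # 33
print(homogeneous_from(10))   # 42
```

So if your final swap happened on day 10, the index reflects only the final filter from day 42 onwards, provided you make no further swaps.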

In part 2 of this blog I'll look at a more ambitious, real-world use case including API calls, the JSON responses you'll receive, and ways to overcome common problems that you might run into.
