Ed Stenson | 9th March 2016
In this two-part blog post I'm going to look at use cases for a PYLON feature called filter swapping, which allows you to change the CSDL code of an interaction filter without stopping your recording.
Use Case: Countries
Suppose you want to monitor a brand in your own country initially and then gradually expand your geographical coverage to include other countries. Your plan might look like this:
- Start with US only
- Add UK
- Add Mexico and Canada
- Add France, Germany, Italy, and Spain
You can already perform this kind of filtering in PYLON, but without filter swapping you would need to start a new recording at each step. With filter swapping you can run the entire study with a single interaction filter, which you can leave running indefinitely.
Step 1: Create the initial US-only interaction filter
It's easy to write a filter in CSDL to record data to an index. It can be as simple as this:
fb.all.content contains "mybrand" and fb.author.country == "United States"
This interaction filter finds stories that mention "mybrand" from authors who have set a Current City in the United States in their Facebook profile. It also finds engagements (likes, comments, shares) on those stories where the author of the engagement is in the United States.
We have a complete step-by-step guide to the PYLON API, but in short the steps you need to get started are:
- Hit the /pylon/validate endpoint to check that your CSDL is valid. This step is optional.
- Compile your CSDL code using the /pylon/compile endpoint. This endpoint returns a hash for the CSDL code.
- Pass the hash to the /pylon/start endpoint to kick off a recording. The endpoint returns a recording id.
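To make the three steps concrete, here is a minimal Python sketch of the calls in order. It is not DataSift's client library: the `call` transport is a hypothetical stand-in for whatever HTTP client you use (it is assumed to send the parameters to the endpoint, raise on an error response, and return the decoded JSON), and the parameter and field names (`csdl`, `hash`, `name`, `id`) are my assumptions about the request and response shapes rather than a verified reference.

```python
from typing import Any, Callable, Dict

# Hypothetical transport: (endpoint path, parameters) -> decoded JSON response.
# Injecting it keeps the sketch independent of any HTTP library and testable offline.
Transport = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def start_recording(call: Transport, csdl: str, name: str) -> str:
    """Validate and compile CSDL, then start a recording; returns the recording id."""
    # Optional: check the CSDL first (the transport is assumed to raise if invalid).
    call("/pylon/validate", {"csdl": csdl})
    # Compile the CSDL; the response is assumed to carry the hash under "hash".
    csdl_hash = call("/pylon/compile", {"csdl": csdl})["hash"]
    # Start recording; keep the returned id, since filter swaps will need it.
    started = call("/pylon/start", {"hash": csdl_hash, "name": name})
    return started["id"]
```

Passing in a fake transport that returns canned responses is an easy way to check the call order before wiring up a real client.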
The recording id is important; we'll need it in the next step.
Step 2: Add UK
Next you can add a new argument to expand your coverage to the UK. Because you'll now be matching against a list of countries, you need to change the operator from == to in. The new CSDL code for your interaction filter is:
fb.all.content contains "mybrand" and fb.author.country in "United States, United Kingdom"
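If you generate your CSDL programmatically, the operator change can be handled by a small helper. This Python sketch uses a hypothetical `country_filter` function that emits == for a single country and in with a comma-separated list for several, reproducing the two filters above; it assumes the brand and country names contain no double quotes.

```python
from typing import Sequence


def country_filter(brand: str, countries: Sequence[str]) -> str:
    """Build CSDL matching brand mentions by authors in any of the given countries."""
    if not countries:
        raise ValueError("at least one country is required")
    content = f'fb.all.content contains "{brand}"'
    if len(countries) == 1:
        # One country: the equality operator is enough.
        return f'{content} and fb.author.country == "{countries[0]}"'
    # Several countries: switch to "in" with a comma-separated list.
    joined = ", ".join(countries)
    return f'{content} and fb.author.country in "{joined}"'
```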
To replace the old CSDL with the new code:
- Hit the /pylon/validate endpoint to check that your new CSDL is valid. This step is optional.
- Compile your new code using the /pylon/compile endpoint. This endpoint returns a new hash for the new CSDL code.
- Pass the new hash and the id of the recording that you want to update to the /pylon/update endpoint. This swaps your interaction filter from the old CSDL code to the new code.
If the recording is currently running, the change takes effect within a few seconds. If the recording has stopped for any reason, the new CSDL code is still applied to your interaction filter, but you will need to hit /pylon/start to resume recording to your index with the new filter.
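Here is the swap itself as a Python sketch, using the same kind of hypothetical `call` transport as before. The `running` flag is an assumption: the caller is expected to know whether the recording is live, for example from its own records. Restarting a stopped recording by passing its `id` to /pylon/start is likewise an assumption about the request shape, as are the `id` and `hash` parameter names for /pylon/update.

```python
from typing import Any, Callable, Dict

# Hypothetical transport: (endpoint path, parameters) -> decoded JSON response.
Transport = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def swap_filter(call: Transport, recording_id: str, new_csdl: str, running: bool) -> str:
    """Swap a recording's CSDL in place and return the new hash."""
    call("/pylon/validate", {"csdl": new_csdl})  # optional validity check
    new_hash = call("/pylon/compile", {"csdl": new_csdl})["hash"]
    # Point the existing recording at the new hash; the recording id does not change.
    call("/pylon/update", {"id": recording_id, "hash": new_hash})
    if not running:
        # A stopped recording keeps the new CSDL but must be restarted explicitly.
        call("/pylon/start", {"id": recording_id})
    return new_hash
```

Keeping the recording id fixed is the whole point: every stage of the study lands in the same index.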
Step 3: Add Mexico and Canada
From now on you just repeat Step 2 each time you want to change your CSDL. To add Mexico and Canada, the CSDL becomes:
fb.all.content contains "mybrand" and fb.author.country in "United States, United Kingdom, Mexico, Canada"
Again, compile the code and then hit /pylon/update to swap in the new CSDL.
Step 4: Add France, Germany, Italy, and Spain
For completeness I'll include the CSDL for the final step:
fb.all.content contains "mybrand" and fb.author.country in "United States, United Kingdom, Mexico, Canada, France, Germany, Italy, Spain"
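Because each step only adds countries, the whole plan can be written down once and turned into the CSDL for every stage. This Python sketch (the `PLAN` list and `staged_filters` function are hypothetical) keeps a running list of covered countries and produces the four filters shown in Steps 1 to 4; you would then feed each one through the Step 2 swap when its turn comes.

```python
from typing import List, Sequence

# Hypothetical rollout plan: each stage lists only the countries it adds.
PLAN: List[List[str]] = [
    ["United States"],
    ["United Kingdom"],
    ["Mexico", "Canada"],
    ["France", "Germany", "Italy", "Spain"],
]


def staged_filters(brand: str, plan: Sequence[Sequence[str]]) -> List[str]:
    """Return one CSDL filter per stage, covering every country added so far."""
    covered: List[str] = []
    filters: List[str] = []
    for added in plan:
        covered.extend(added)  # countries accumulate; none are removed
        if len(covered) == 1:
            condition = f'fb.author.country == "{covered[0]}"'
        else:
            joined = ", ".join(covered)
            condition = f'fb.author.country in "{joined}"'
        filters.append(f'fb.all.content contains "{brand}" and {condition}')
    return filters
```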
Some issues to think about
Filter swapping is a tremendously useful technique, but bear in mind that it affects the homogeneity of your index. At the start of the recording you were looking at US data only. Suppose you let that first CSDL filter run for a day, then added "United Kingdom" as a second argument and let that run for a further day. Your index now contains 48 hours' worth of US data but only 24 hours of UK data. Depending on the analysis you want to perform, this imbalance can introduce a skew, so it's important to keep a record of when each filter swap took place.
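A quick way to keep track of that skew is to log when each swap took effect and tally the hours of coverage per country. This Python sketch assumes a hypothetical log of (hour the filter took effect, countries covered) pairs, measures coverage time rather than data volume, and ignores data expiry. With the two-day example above it reproduces the 48 versus 24 hour split.

```python
from typing import Dict, List, Sequence, Tuple


def coverage_hours(swaps: Sequence[Tuple[float, List[str]]], now: float) -> Dict[str, float]:
    """Hours of recording that covered each country.

    swaps: (hour the filter took effect, countries it covers), oldest first.
    now: hours elapsed since the recording started.
    """
    hours: Dict[str, float] = {}
    for i, (start, countries) in enumerate(swaps):
        # Each filter is in force until the next swap, or until now for the last one.
        end = swaps[i + 1][0] if i + 1 < len(swaps) else now
        for country in countries:
            hours[country] = hours.get(country, 0.0) + (end - start)
    return hours
```

For example, `coverage_hours([(0, ["United States"]), (24, ["United States", "United Kingdom"])], now=48)` gives 48 hours for the United States and 24 for the United Kingdom.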
DataSift automatically deletes recorded data 32 days after it was recorded: on day 33, for example, DataSift deletes the part of the recording that was made on day 1. Make sure that you perform all the analysis you want on an index within that 32-day window. This expiration feature is beneficial in some ways because:
- it means you don't need to worry about cleanup.
- if you stop adding arguments to your CSDL, the entire recording will become more and more representative of the CSDL arguments in Step 4 as, day by day, old data expires from the index and new data is recorded.
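The rolling window and its effect on representativeness can be sketched in Python as below. This assumes expiry happens at whole-day granularity and that each day contributes a similar volume of data, both simplifications; the function names are hypothetical. If each of the four steps above ran for one day, the final filter took effect on day 4, so on day 33 the index holds days 2 to 33, of which 30 of 32 days (93.75%) were recorded with the final CSDL, and from day 35 onward all of it was.

```python
def retained_days(current_day: int, retention_days: int = 32) -> range:
    """Days (numbered from 1) whose data is still in the index on current_day."""
    # Anything older than the retention window has already been deleted.
    first = max(1, current_day - retention_days + 1)
    return range(first, current_day + 1)


def share_under_filter(current_day: int, filter_start_day: int, retention_days: int = 32) -> float:
    """Fraction of retained days recorded with a filter that took effect on filter_start_day."""
    days = retained_days(current_day, retention_days)
    matching = sum(1 for day in days if day >= filter_start_day)
    return matching / len(days)
```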
In part 2 of this blog I'll look at a more ambitious, real-world use case including API calls, the JSON responses you'll receive, and ways to overcome common problems that you might run into.