CSDL Notes

Using != and NOT

CSDL offers two different methods of negation. One uses the != operator:

twitter.source != "web"

The other uses the NOT logical operator:

NOT twitter.source == "web"

At first glance, these appear to behave identically. However, there is one important difference: the twitter.source target is not always populated, so we need to consider what DataSift's filtering engine does when this target is missing. It's easy to check which targets are always populated and which might sometimes be unpopulated. Go to the documentation page for the target you're using and look in the top right corner of the page.

Inside the filtering engine, an operator returns a value of True when it finds a match or False otherwise. If, overall, the result is True for an interaction, we deliver that interaction to you. Suppose, for a particular Tweet, twitter.source exists and contains the value "iPhone". In this case:

This filter:                   Returns this value internally:
twitter.source != "web"        True
twitter.source == "web"        False
NOT twitter.source == "web"    True

Now, let's look at what happens when twitter.source is unpopulated:

This filter:                   Returns this value internally:
twitter.source != "web"        False
twitter.source == "web"        False
NOT twitter.source == "web"    True

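Because twitter.source does not exist in this interaction, both the == and the != operators return False, and NOT then inverts the False from == into True. Clearly, the two filters are different: the NOT version also matches interactions in which the target is missing. So, for targets that might sometimes be unpopulated, it's best to filter using the != operator.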
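The two tables above can be reproduced with a small Python model. This is only a sketch of the rules described in this section, not DataSift's implementation: the interaction is assumed to be a plain dict, a missing key stands in for an unpopulated target, and the helper names eq, ne, and not_ are hypothetical.

```python
# Toy model of the evaluation rules described above; not DataSift's engine.
# Assumption: an interaction is a dict, and an unpopulated target is a missing key.

def eq(interaction, target, value):
    """Model of ==: returns False when the target is unpopulated."""
    return target in interaction and interaction[target] == value

def ne(interaction, target, value):
    """Model of !=: also returns False when the target is unpopulated."""
    return target in interaction and interaction[target] != value

def not_(result):
    """Model of NOT: inverts whatever the inner operator returned."""
    return not result

populated = {"twitter.source": "iPhone"}
unpopulated = {}  # twitter.source is missing

for label, interaction in [("populated", populated), ("unpopulated", unpopulated)]:
    print(
        label,
        ne(interaction, "twitter.source", "web"),         # twitter.source != "web"
        eq(interaction, "twitter.source", "web"),         # twitter.source == "web"
        not_(eq(interaction, "twitter.source", "web")),   # NOT twitter.source == "web"
    )
# populated True False True
# unpopulated False False True
```

In this model the only row where the != filter and the NOT filter disagree is the unpopulated one, which is the difference the tables show.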

This behavior is not restricted to the == and != operators, of course. You would see the same problem when combining NOT with the contains operator, for example. The twitter.user.description target holds a user's 160-character biography, but it can be blank. The following filter appears to match every Tweet from a user who does not include the word "data" in their bio but, in fact, it also matches Tweets from users who leave their bio blank:

NOT twitter.user.description contains "data"
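The effect of this filter can be illustrated with a similar Python sketch. The user names and bios are invented for illustration, a blank bio is assumed to be represented as None, and the hypothetical contains helper uses a simple case-insensitive substring check, which is a simplification rather than a description of how CSDL matches text.

```python
# Toy model, not DataSift's engine.
# Assumptions: a blank bio is None; matching is a case-insensitive substring check.

def contains(text, word):
    """Model of contains: returns False when the target is unpopulated."""
    return text is not None and word.lower() in text.lower()

# Hypothetical users and their twitter.user.description values.
bios = {
    "alice": "I love data science",
    "bob": "Coffee and cycling",
    "carol": None,  # blank bio
}

# NOT twitter.user.description contains "data"
matched = [user for user, bio in bios.items() if not contains(bio, "data")]
print(matched)  # ['bob', 'carol']
```

Here carol is matched even though she has no bio at all, because contains returns False for her and NOT turns that into True.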