Storage and Access of Filter Definitions

At the core of the DataSift platform is the filtering engine, which uses our Curated Stream Definition Language (CSDL) to filter high volumes of real-time and historic data. Each filter definition that you create is stored within the DataSift platform on our own private servers. This document explains how DataSift defines, stores, and accesses CSDL filter definitions, with particular attention to security and intellectual property considerations.

Filter Creation and Hash Identifier

When you create a CSDL filter via our API or our User Interface, we store the definition of that filter (the CSDL code) within the DataSift platform. We also generate a unique identifier for it, known as the "stream hash" or "hash identifier". An example may look like this:


If you create the filter via our API, we send this hash directly back to you. If you create it via the User Interface, the hash is available there, immediately after you save your code. We generate the hash using a one-way algorithm and we use the hash both internally and externally to uniquely identify CSDL filter definitions. We do not expose the mapping between the hash and the filter definition outside of the platform.

Because the identifier is generated by a one-way process, the filter definition cannot be derived from the hash. As a result, if you lose your CSDL definition, you cannot retrieve it from DataSift. You can still use the hash to filter data, but the hash alone does not reveal the definition behind it.
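
The behaviour described above can be illustrated with a minimal sketch. It is not DataSift's implementation: this document does not state which one-way algorithm the platform uses or what input it hashes, so SHA-256 over the CSDL text is used here purely as a stand-in, and the names FilterStore, create, and exists are hypothetical.

```python
import hashlib


class FilterStore:
    """Hypothetical in-memory stand-in for the platform's filter storage."""

    def __init__(self):
        # Private mapping from hash to CSDL; never returned to callers.
        self._definitions = {}

    def create(self, csdl: str) -> str:
        # Assumption: SHA-256 stands in for the unspecified one-way algorithm.
        stream_hash = hashlib.sha256(csdl.encode("utf-8")).hexdigest()
        self._definitions[stream_hash] = csdl
        return stream_hash

    def exists(self, stream_hash: str) -> bool:
        # Callers can check or use a hash, but cannot read back its definition.
        return stream_hash in self._definitions


store = FilterStore()
stream_hash = store.create('interaction.content contains "example"')
```

In this sketch the caller only ever holds the hash. The store exposes no method that returns the CSDL, which mirrors the statement that the mapping is not exposed outside the platform.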

Stream Authentication

We control access to each filter definition in DataSift. We authenticate every request to consume a stream using a combination of a unique username and API key, and we log each request. You can change both the username and the API key via our User Interface.
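
As a rough sketch of the check described above, the following shows one way a request could be authenticated against a username and API key pair and then logged. The credential table, the authorize function, and the log format are all hypothetical and chosen for illustration; they do not describe DataSift's internal code.

```python
import hmac
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("stream_access")

# Hypothetical credential table: username -> API key.
CREDENTIALS = {"alice": "key-123"}


def authorize(username: str, api_key: str, stream_hash: str) -> bool:
    """Return True if the username and API key match; log every attempt."""
    expected = CREDENTIALS.get(username)
    # compare_digest avoids leaking key contents through timing differences.
    allowed = expected is not None and hmac.compare_digest(
        expected.encode("utf-8"), api_key.encode("utf-8")
    )
    log.info("user=%s stream=%s allowed=%s", username, stream_hash, allowed)
    return allowed
```

Changing a username or API key in this model amounts to updating the credential table, after which requests made with the old values are rejected.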