The DataSift Glossary

The DataSift Glossary contains terms and vocabulary used frequently to talk about features and aspects of our platform. Following each definition are links to related articles in our help center for further exploration of each term.  

 


 

API - Application programming interface. If you write external code that works alongside DataSift, it will communicate with DataSift via calls to our REST API or Streaming API.

 

API key - A unique identifier for your DataSift account. To access our platform via its API, you need to have a DataSift account and an API key.
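
As a minimal sketch, suppose you keep your credentials in environment variables (the names `DATASIFT_USERNAME` and `DATASIFT_API_KEY` are hypothetical) and the API accepts them as a `username:api_key` pair in an `Authorization` header (also an assumption; check the API reference for the exact scheme). A program might then load them like this:

```python
import os


def load_credentials(env=os.environ):
    """Read a DataSift username and API key from a mapping of env vars.

    The variable names are hypothetical; use whatever your deployment defines.
    """
    username = env.get("DATASIFT_USERNAME")
    api_key = env.get("DATASIFT_API_KEY")
    if not username or not api_key:
        raise RuntimeError("Set DATASIFT_USERNAME and DATASIFT_API_KEY first")
    return username, api_key


def auth_header(username, api_key):
    # Assumed header format: "Authorization: <username>:<api_key>"
    return {"Authorization": f"{username}:{api_key}"}


if __name__ == "__main__":
    user, key = load_credentials({"DATASIFT_USERNAME": "alice",
                                  "DATASIFT_API_KEY": "0123abcd"})
    print(auth_header(user, key))
```

Keeping the key out of source code means you can rotate it without touching your program.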

 

Argument - A user-defined value that an operator tests against a target. For example, the argument in this filter is "iPad":

 

    twitter.text contains "iPad"
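
To make the parts of a predicate concrete, here is a toy Python sketch that splits one simple predicate into its target, operator, and argument. It is not DataSift's CSDL parser; it only handles the `target operator "quoted argument"` shape shown above.

```python
import re

# Matches: <dotted target> <operator word or symbol> "<argument>"
PREDICATE = re.compile(r'^\s*([a-z_][\w.]*)\s+(\S+)\s+"([^"]*)"\s*$', re.IGNORECASE)


def split_predicate(text):
    """Return (target, operator, argument) for a simple quoted predicate."""
    match = PREDICATE.match(text)
    if match is None:
        raise ValueError(f"not a simple predicate: {text!r}")
    return match.groups()


print(split_predicate('twitter.text contains "iPad"'))
# ('twitter.text', 'contains', 'iPad')
```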

 

Client library - Code supplied by DataSift or by a third party that makes it easy to access our API. You can write programs that call the API directly (in fact, that's exactly what the client libraries do), but it's usually easier to communicate with DataSift's API through a client library.
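
The sketch below shows, in miniature, what a client library does for you: it turns a method call into an HTTP request. The class name, the base URL, and the `Authorization` header format are assumptions made for illustration; the endpoint names `validate` and `compile` follow the REST calls described in this glossary. The class only builds requests and never sends them, so it runs offline.

```python
from urllib.parse import urlencode


class TinyDataSiftClient:
    """Hypothetical, minimal client: builds REST requests without sending them."""

    # Assumed base URL; check the API reference for the real one and its version.
    BASE_URL = "https://api.datasift.com/v1"

    def __init__(self, username, api_key):
        self.headers = {"Authorization": f"{username}:{api_key}"}  # assumed scheme

    def build_request(self, endpoint, **params):
        """Return (url, headers) for a GET request to the given endpoint."""
        query = urlencode(params)
        url = f"{self.BASE_URL}/{endpoint}"
        return (f"{url}?{query}" if query else url), dict(self.headers)

    def validate(self, csdl):
        return self.build_request("validate", csdl=csdl)

    def compile(self, csdl):
        return self.build_request("compile", csdl=csdl)


client = TinyDataSiftClient("alice", "0123abcd")
url, headers = client.compile('twitter.text contains "iPad"')
print(url)
```

A real client library would also send the request, handle errors, and decode the JSON response; those steps are omitted here.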

 

Connector - A means of connecting to a Data Destination when you are using our Push service for data delivery.

 

CSDL - Curated Stream Definition Language - The language used to define streams in DataSift.

 

Filter - Code written in CSDL that defines a DataSift stream. A filter consists of one or more predicates. Each predicate consists of a Target, an Operator and, in nearly all cases, an Argument. Each predicate returns a Boolean value. A filter can consist of just one predicate, such as:

 

    twitter.text contains "LadyGaga"

 

However, it usually comprises multiple predicates linked together with logical operators.
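
Because a filter is just CSDL text, you can also assemble one programmatically. Here is a small, hypothetical helper (not part of any DataSift library) that joins predicates with a logical operator and wraps each group in parentheses so that precedence stays explicit:

```python
def combine(operator, predicates):
    """Join CSDL predicates with AND or OR, parenthesising the group."""
    operator = operator.upper()
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be AND or OR")
    if not predicates:
        raise ValueError("need at least one predicate")
    if len(predicates) == 1:
        return predicates[0]
    return "(" + f" {operator} ".join(predicates) + ")"


artists = combine("or", ['twitter.text contains "LadyGaga"',
                         'twitter.text contains "Beyonce"'])
csdl = combine("and", [artists, 'language.tag == "en"'])
print(csdl)
```

The explicit parentheses mean you never have to rely on remembering how AND and OR bind relative to each other.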

 

Hash - A 32-character alphanumeric code that identifies a DataSift stream, or a 20-character code that identifies a DataSift historic stream. Hashes are not case sensitive.
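
Going only by the lengths given above, a sketch like the following could tell the two kinds of hash apart before you send one to the API. The function name and the choice to normalize to lowercase are illustrative; because hashes are not case sensitive, lowercasing is a safe way to compare them. The sample hashes are made up.

```python
import re


def classify_hash(value):
    """Return ("stream" | "historic", normalised_hash) based on length alone."""
    value = value.strip()
    if not re.fullmatch(r"[0-9A-Za-z]+", value):
        raise ValueError("a hash must be alphanumeric")
    kinds = {32: "stream", 20: "historic"}
    kind = kinds.get(len(value))
    if kind is None:
        raise ValueError(f"unexpected hash length {len(value)}")
    # Hashes are not case sensitive, so compare them in one canonical case.
    return kind, value.lower()


print(classify_hash("2459B03A13577579BCA76471778A5C3D"))
```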

 

Historics - The DataSift Historics service gives you access to an archive of content gathered from a variety of social media sites. Historics is useful when you want to turn the clock back and filter against data from the past.
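
A Historics query pairs a filter with a window of past time. As a hedged sketch (the field names below are illustrative, not a documented request format), this builds such a query from a stream hash and two UTC datetimes. It converts them to Unix timestamps and rejects windows that end before they start or that reach into the future.

```python
from datetime import datetime, timezone


def historics_query(stream_hash, start, end, now=None):
    """Build an illustrative Historics request body (field names are hypothetical)."""
    now = now or datetime.now(timezone.utc)
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    if not start < end <= now:
        raise ValueError("need start < end <= now")
    return {
        "hash": stream_hash,
        "start": int(start.timestamp()),
        "end": int(end.timestamp()),
    }


q = historics_query(
    "2459b03a13577579bca76471778a5c3d",  # made-up stream hash
    datetime(2013, 1, 1, tzinfo=timezone.utc),
    datetime(2013, 1, 2, tzinfo=timezone.utc),
)
print(q["end"] - q["start"])  # 86400 seconds: one day
```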

 

Interaction - A single object such as a Tweet, a Facebook message, or a Wikipedia edit, passing through DataSift. Since our platform normalizes data when we receive it, all interactions, regardless of their source, are handled in the same way inside DataSift. For instance, this filter delivers all interactions that contain the word "data" no matter which source they come from:

 

    interaction.content contains "data"
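
Normalization is what lets one filter run across every source. The toy sketch below maps three differently shaped raw records onto a common `interaction` shape with a `content` field, then applies a word match to that field (a rough approximation of `contains` as a case-insensitive, whole-word match). The raw record layouts and the `normalize` function are invented for illustration; real interactions carry many more fields.

```python
import re

# Invented raw records: each source names its text field differently.
raw_records = [
    {"source": "twitter", "id": "t1", "text": "Big data is everywhere"},
    {"source": "wikipedia", "id": "w1", "diff_summary": "Fixed database link"},
    {"source": "facebook", "id": "f1", "message": "Data, data, data!"},
]

# Which raw field holds the main text for each source (illustrative only).
CONTENT_FIELD = {"twitter": "text", "wikipedia": "diff_summary", "facebook": "message"}


def normalize(record):
    """Map a source-specific record onto a common interaction shape."""
    return {
        "interaction": {
            "type": record["source"],
            "id": record["id"],
            "content": record[CONTENT_FIELD[record["source"]]],
        }
    }


def content_contains(interaction, word):
    # Rough stand-in for CSDL `contains`: case-insensitive, whole words only.
    pattern = r"\b" + re.escape(word) + r"\b"
    text = interaction["interaction"]["content"]
    return re.search(pattern, text, re.IGNORECASE) is not None


interactions = [normalize(r) for r in raw_records]
matches = [i["interaction"]["id"] for i in interactions if content_contains(i, "data")]
print(matches)  # ['t1', 'f1']
```

The Wikipedia record is not matched because "database" is not the whole word "data".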

 

Logical Operator - Logical operators combine the Boolean results of predicates to determine the result of a stream definition. The three supported logical operators are AND, OR, and NOT. By combining them, it is easy to build other operators; for example, you can implement NAND using AND and NOT together.
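
The NAND example can be checked with a quick truth table. This Python sketch treats each predicate result as a Boolean and builds NAND, NOR, and XOR from the three primitives alone; the same composition applies to predicates in a filter.

```python
from itertools import product


def nand(a, b):
    """NOT (a AND b)"""
    return not (a and b)


def nor(a, b):
    """NOT (a OR b)"""
    return not (a or b)


def xor(a, b):
    """(a OR b) AND NOT (a AND b)"""
    return (a or b) and not (a and b)


print("a      b      NAND   NOR    XOR")
for a, b in product([False, True], repeat=2):
    print(f"{a!s:<6} {b!s:<6} {nand(a, b)!s:<6} {nor(a, b)!s:<6} {xor(a, b)!s:<6}")
```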

 

Nested Stream Definition - A stream definition that runs inside another stream definition. The nested definition is evaluated against the same input object, and its Boolean result is used within the outer stream definition.
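
One way to picture this is to treat each stream definition as a function from an interaction to a Boolean. In the hypothetical sketch below, `mentions_apple` plays the role of the nested definition and `spanish_apple` uses its result as if it were one more predicate. This models the idea only; it says nothing about how DataSift evaluates nested definitions internally.

```python
def mentions_apple(interaction):
    """Nested definition: does the content mention Apple? (word match, any case)"""
    words = interaction["content"].lower().split()
    return "apple" in words


def spanish_apple(interaction):
    """Outer definition: reuses the nested definition's Boolean result."""
    return mentions_apple(interaction) and interaction["language"] == "es"


sample = [
    {"content": "Me encanta Apple", "language": "es"},
    {"content": "I love Apple", "language": "en"},
]
print([spanish_apple(i) for i in sample])  # [True, False]
```

Reusing the nested definition keeps the Apple test in one place, so several outer definitions can share it.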

 

Operator - A keyword that appears in a CSDL filter. The simplest filters consist of a target, an operator and, in nearly all cases, an argument. An operator compares an argument you supply against a target you select. For example, the CSDL operator in this filter is "contains", and it filters for occurrences of "iPad" in the body of a Tweet:

 

    twitter.text contains "iPad"
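
The job of an operator can be modeled as a lookup from keyword to comparison function. The table below is a rough, hypothetical approximation of three CSDL operators: `contains` as a case-insensitive whole-word match, `substr` as a case-insensitive substring match, and `==` as plain equality. The real operators follow more detailed rules than these one-liners.

```python
import re

# Hypothetical approximations of a few CSDL operators.
OPERATORS = {
    "contains": lambda target, arg: re.search(
        r"\b" + re.escape(arg) + r"\b", target, re.IGNORECASE) is not None,
    "substr": lambda target, arg: arg.lower() in target.lower(),
    "==": lambda target, arg: target == arg,
}


def evaluate(target_value, operator, argument):
    """Apply the named operator to a target value and a user-supplied argument."""
    try:
        compare = OPERATORS[operator]
    except KeyError:
        raise ValueError(f"unsupported operator: {operator}") from None
    return compare(target_value, argument)


tweet = "Just bought a new iPad mini"
print(evaluate(tweet, "contains", "iPad"))   # True
print(evaluate(tweet, "contains", "iPa"))    # False: not a whole word
print(evaluate(tweet, "substr", "iPa"))      # True
```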

 

Post - Any piece of content posted on a social-media site. For example, it might be a Tweet, photograph, video, Facebook status update, blog post, message board post, or wiki entry, or it might be an item from a data source that was added just today.

 

Push - Push is a simple and robust mechanism for periodically delivering your data directly to a Data Destination such as Amazon S3, an FTP server, or Google BigQuery.

 

REST API - An API that follows REST (Representational State Transfer), a common architectural style for client-server communication. In DataSift, you use REST API calls to validate and compile CSDL code, request information about your account (for example, your credit balance), and test a stream before you put it into production. The /stream endpoint in the REST API, which you can use when you are testing a stream, is buffered.

 

Stream - A flow of curated real-time data from a collection of real-time sources; in other words, the data delivered by a DataSift filter. A stream is built using CSDL, either in the DataSift UI or via the /compile endpoint in the REST API.

 

Stream Definition - The CSDL code that defines the way you want DataSift to filter. Here's an example of a stream definition written in our CSDL programming language to filter for Spanish-language Tweets that mention Apple:

 

    twitter.text contains "Apple" and language.tag == "es"

 

Streaming API - The streaming API is a real-time, high-throughput way to integrate your own software with DataSift in a production environment. For example, if you need to filter for mentions of a breaking news story, your CSDL code will run on DataSift's platform and your output will flow from the streaming API into your own systems. The streaming API is not buffered.
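
On the receiving side, a streaming consumer is essentially a loop over incoming records. Assuming the connection yields one JSON object per line (an assumption made for this sketch; check the streaming API documentation for the actual framing and transport), the processing step might look like this, with an in-memory stream standing in for the network:

```python
import io
import json


def consume(lines, handle):
    """Parse one JSON object per non-empty line and pass each to `handle`."""
    count = 0
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines (e.g. keep-alives, if the transport sends any)
            continue
        handle(json.loads(line))
        count += 1
    return count


fake_stream = io.StringIO(
    '{"interaction": {"id": "1", "content": "Breaking news"}}\n'
    "\n"
    '{"interaction": {"id": "2", "content": "More news"}}\n'
)
received = []
print(consume(fake_stream, received.append))  # 2
```

Passing a `handle` callback keeps the parsing loop independent of whatever your own systems do with each interaction.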

 

Tagging - A feature that allows developers to add metadata tags to stream definitions. The tags provide additional information that helps with further processing of the output data.
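
Conceptually, tagging attaches labels to each interaction that matches a rule, so that downstream code can route or count results without re-running the filter. The sketch below models that idea with a hypothetical dictionary of tag names and Python predicates; it is not CSDL tagging syntax.

```python
# Hypothetical tag rules: tag name -> predicate over the interaction content.
TAG_RULES = {
    "tablet": lambda text: "ipad" in text.lower(),
    "phone": lambda text: "iphone" in text.lower(),
}


def tag_interaction(interaction):
    """Return a copy of the interaction with a sorted list of matching tags."""
    tags = sorted(name for name, rule in TAG_RULES.items() if rule(interaction["content"]))
    return {**interaction, "tags": tags}


tagged = tag_interaction({"id": "42", "content": "Comparing the iPad and the iPhone"})
print(tagged["tags"])  # ['phone', 'tablet']
```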

 

Target - An element that you can filter on using the CSDL language. The simplest filters consist of a target, an operator and, in nearly all cases, an argument. For example, the target in this filter is "twitter.text":

 

    twitter.text contains "iPad"
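
A target name such as `twitter.text` reads like a path into a nested record. This sketch resolves such a dotted path against a hypothetical, heavily simplified interaction dictionary, returning `None` when any part of the path is missing:

```python
def resolve_target(record, target):
    """Follow a dotted target path (e.g. "twitter.text") through nested dicts."""
    value = record
    for key in target.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


# Simplified, invented record; real interactions carry many more fields.
record = {"twitter": {"text": "New iPad review", "user": {"lang": "en"}}}
print(resolve_target(record, "twitter.text"))       # New iPad review
print(resolve_target(record, "twitter.user.lang"))  # en
print(resolve_target(record, "twitter.geo"))        # None
```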

 

Type - Every target has a single data type. The type can be "string", "int", "float", or an array of one of these types, or it can be "geo".
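
Because each target has a single type, an argument can be checked before a filter is compiled. The mapping below lists a few targets with types chosen for illustration (treat them as assumptions, not a reference), and the hypothetical `check_argument` function verifies a Python value against them, treating a list as an array of the element type. The "array(...)" notation is invented for this sketch, and "geo" is left out to keep it short.

```python
# Illustrative target -> type mapping; these entries are assumptions, not DataSift docs.
TARGET_TYPES = {
    "twitter.text": "string",
    "language.tag": "string",
    "twitter.user.followers_count": "int",
    "example.keywords": "array(string)",  # hypothetical array-typed target
}

PY_TYPES = {"string": str, "int": int, "float": float}


def _matches(value, py_type):
    # bool is a subclass of int in Python, so reject it explicitly.
    return isinstance(value, py_type) and not isinstance(value, bool)


def check_argument(target, value):
    """Return True if `value` fits the declared type of `target`."""
    declared = TARGET_TYPES[target]
    if declared.startswith("array(") and declared.endswith(")"):
        element_type = PY_TYPES[declared[len("array("):-1]]
        return isinstance(value, list) and all(_matches(v, element_type) for v in value)
    return _matches(value, PY_TYPES[declared])


print(check_argument("twitter.user.followers_count", 1000))    # True
print(check_argument("twitter.user.followers_count", "1000"))  # False
print(check_argument("example.keywords", ["ipad", "iphone"]))  # True
```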

 

Version - A saved revision of a stream definition. Each time you save a stream definition in the UI, you create a new version, so together the versions record the history of changes made to that definition.