The DataSift Glossary
The DataSift Glossary contains terms and vocabulary used frequently to talk about features and aspects of our platform. Following each definition are links to related articles in our help center for further exploration of each term.
API key - A unique identifier for your DataSift account. To access our platform via its API, you need to have a DataSift account and an API key.
Client library - Code supplied by DataSift or by a third party that you can use to access our API easily. You can write programs that hit the API directly (in fact, that's exactly what the client libraries do) but it's easier to write code that communicates with DataSift's API via a client library.
CSDL - Curated Stream Definition Language - The language used to define streams that are curated by the DataSift engine.
Filter - Code written in CSDL defining a DataSift stream. A filter consists of one or more predicates. Each predicate consists of a Target, an Operator and (nearly always) an Argument, and returns a Boolean value. A filter can consist of just one predicate, such as twitter.text contains "iPad".
However, it usually comprises multiple predicates linked together with logical operators.
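To make the target/operator/argument structure concrete, here is a minimal Python sketch. It is an illustrative model only, not DataSift's engine or client library: the evaluate function, the tuple layout, and the dictionary-shaped input object are assumptions made for this example, and the matching rules of the real operators may differ.

```python
# Illustrative model of a predicate; not DataSift's actual implementation.
def evaluate(predicate, input_object):
    """Return the Boolean result of a (target, operator, argument) predicate."""
    target, operator, argument = predicate
    value = input_object.get(target)
    if value is None:
        # The input object has no data for this target, so nothing can match.
        return False
    if operator == "contains":
        # Plain substring match; an assumption made for this sketch.
        return argument in value
    if operator == "==":
        return value == argument
    raise ValueError(f"unsupported operator: {operator}")


predicate = ("twitter.text", "contains", "iPad")
tweet = {"twitter.text": "Just unboxed my new iPad"}
print(evaluate(predicate, tweet))  # True
```

A filter with several predicates would combine the Boolean results of calls like this one using logical operators.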
Input Object - The data source used to execute a Stream Definition, for example, a Tweet from Twitter.
Logical Operator - Logical operators combine the Boolean results of predicates to determine the result of a stream definition. The three supported logical operators are AND, OR and NOT. Using these in combination, it is easy to fashion other operators; for example, you can implement NAND using AND and NOT together.
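The NAND construction mentioned above can be sketched in a few lines of Python. This uses Python's own Boolean operators rather than CSDL syntax, and the function name nand is chosen only for this example.

```python
def nand(a: bool, b: bool) -> bool:
    """NAND expressed with only AND and NOT: NOT (a AND b)."""
    return not (a and b)


# Print the full truth table: NAND is False only when both inputs are True.
for a in (False, True):
    for b in (False, True):
        print(a, b, nand(a, b))
```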
Nested Stream Definition - A stream definition that is executed against the input object from within another stream definition. The Boolean result of the nested stream definition is then used within the enclosing stream.
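As a rough illustration of nesting, the Python sketch below models each stream definition as a function that returns a Boolean for an input object. The function names and the dictionary-shaped input object are hypothetical; in DataSift the definitions would be written in CSDL.

```python
def mentions_ipad(input_object):
    """Inner (nested) stream definition: does the text mention iPad?"""
    return "iPad" in input_object.get("twitter.text", "")


def ipad_without_giveaways(input_object):
    """Outer stream definition that reuses the nested definition's result."""
    nested_result = mentions_ipad(input_object)  # executed against the same input object
    return nested_result and "giveaway" not in input_object.get("twitter.text", "")


print(ipad_without_giveaways({"twitter.text": "My iPad arrived today"}))       # True
print(ipad_without_giveaways({"twitter.text": "iPad giveaway, retweet now"}))  # False
```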
Operator - A keyword that appears in a CSDL filter. The simplest filters consist of a target, an operator and (nearly always) an argument. An operator compares an argument you supply against a target you select. For example, in the filter twitter.text contains "iPad", the operator is "contains", and the filter matches occurrences of "iPad" in the body of a Tweet.
Post - Any piece of content posted on a social-media site. For example, it might be a Tweet, photograph, video, Facebook status update, blog post, message board post, wiki entry, or it might be a brand new data source added just today.
Private Stream - A stream which is not searchable and can only be viewed by its owner.
Public Stream - A searchable stream that can be viewed by any DataSift user.
REST API - A REpresentational State Transfer API, a commonly found architecture for implementing client-server communication. In DataSift, you will use REST API calls to validate and compile CSDL code, request information concerning your account (for example, your credit balance), and test a stream before you put it into production. The /stream endpoint (which you can use when you are testing a stream) in the REST API is buffered.
Stream Definition - The CSDL code that defines the way you want DataSift to filter. For example, a stream definition written in our CSDL programming language might filter for Spanish-language Tweets that mention Apple.
Stream Output - The collection of input objects that have matched the stream definition. These may be accessed through the DataSift site preview or using the API.
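The relationship between a stream definition and its stream output can be sketched in Python as follows. This is a simplified model, not CSDL and not DataSift's API: the stream_definition function, the sample input objects, and the "twitter.lang" field used to hold the language code are all assumptions for illustration.

```python
def stream_definition(input_object):
    """Match Spanish-language Tweets that mention Apple (two predicates joined by AND)."""
    text = input_object.get("twitter.text", "")
    language = input_object.get("twitter.lang")  # hypothetical field name
    return "Apple" in text and language == "es"


input_objects = [
    {"twitter.text": "Apple presenta un nuevo iPad", "twitter.lang": "es"},
    {"twitter.text": "Apple announces a new iPad", "twitter.lang": "en"},
    {"twitter.text": "Me encanta el verano", "twitter.lang": "es"},
]

# The stream output is the collection of input objects that matched.
stream_output = [obj for obj in input_objects if stream_definition(obj)]
print(len(stream_output))  # 1
```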
Streaming API - The streaming API is a real-time, high-throughput way to integrate your own software with DataSift in a production environment. For example, if you need to filter for mentions of a breaking news story, your CSDL code will run on DataSift's platform and your output will flow from the streaming API into your own systems. The streaming API is not buffered.
Tagging - A feature that allows developers to add metadata tags to streams. The metadata tags provide additional information for further processing of the output data.
Target - An element that you can filter on using CSDL. The simplest filters consist of a target, an operator and (nearly always) an argument. For example, in the filter twitter.text contains "iPad", the target is "twitter.text".
Type - A target has a single type. It can be "string", "int", "float", "geo", or an array of one of these types.
Version - The history of changes made to a stream definition. Each save of the stream definition creates a new version.