Getting Started

Updated on Wednesday, 2 April, 2014 - 18:14

DataSift is the complete real-time social media solution for enterprises looking for comprehensive data that delivers actionable insight.

DataSift is Big Data. We have 100% of the Twitter Firehose always available in real time.

Our media curation platform is cloud-based and highly scalable. It delivers a variety of metadata, including enriched augmentations such as geolocation, social influence, and sentiment analysis, in one place, so you will never miss a thing. The data is delivered through our APIs or through our extensive analytics partnerships, and our flexible pricing means that, whether you're a large enterprise or a single developer, you can get access to the data that you need.

Aggregating, Filtering, and Analyzing

DataSift filters for information as it is posted. For example, you could filter for:

  • any mention of an individual.
  • any message from a particular social media site.
  • any message sent within 25 miles of the 10 largest U.S. cities.
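To make these three example filters concrete, here is a minimal local sketch in Python. It is not DataSift's filtering language and does not call any DataSift API; the `Message` class, the helper names, and the short `CITY_CENTERS` list are all hypothetical and exist only for illustration. The distance check uses the standard haversine formula.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

# Hypothetical sample of city centers as (latitude, longitude) in degrees.
# A real filter for the 10 largest U.S. cities would list all ten.
CITY_CENTERS = [
    (40.7128, -74.0060),   # New York
    (34.0522, -118.2437),  # Los Angeles
]


@dataclass
class Message:
    """Hypothetical stand-in for one social media message."""
    source: str                                     # e.g. "twitter"
    content: str                                    # message text
    location: Optional[Tuple[float, float]] = None  # (lat, lon) or None


def miles_between(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in miles between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def mentions(msg: Message, name: str) -> bool:
    """True if the message text mentions the given individual."""
    return name.lower() in msg.content.lower()


def from_source(msg: Message, source: str) -> bool:
    """True if the message came from the given social media site."""
    return msg.source == source


def within_miles(msg: Message,
                 centers: Iterable[Tuple[float, float]],
                 radius: float = 25.0) -> bool:
    """True if the message was sent within `radius` miles of any center."""
    if msg.location is None:
        return False
    return any(miles_between(msg.location, c) <= radius for c in centers)
```

For example, `within_miles(msg, CITY_CENTERS)` is true for a message sent from Newark, New Jersey, which lies a few miles from the New York entry, and false for a message that carries no location.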

You can aggregate data to monitor streams of messages from more than one social media site simultaneously, or you can exclude individual sites.

We augment our real-time streams with third-party solutions such as Klout.

Here's a full list of the information your filters can target.

Historics

You can also filter against the DataSift Historics archive, a large body of content gathered from a variety of social media sites. Historics is useful when you want to turn the clock back and filter against data from the past.

Historics uses the same filtering language that you use for live data. For Twitter, it runs 100 times faster than live streaming and offers 100% coverage.

REST API

The REST API enables developers to access DataSift's core functionality. Using DataSift's simple, powerful programming language, you can access the social media sites you want to monitor. The REST API provides easy ways to test and compile your code. You can run one or more streams at the same time and export data in real time.

Push API

Push is a simple and robust mechanism for periodically delivering your data directly to a Data Destination. With the Push mechanism, we buffer up to 60 minutes' worth of data for you. Periodically, we connect to your Data Destination to deliver the accumulated data in a short burst. You control the time interval between these bursts.

Push removes the need for you to maintain an online connection with us over long periods of time. It can deliver higher-throughput streams more reliably, and it works with a range of widely supported, stable, standard protocols and destinations, such as HTTP POST, FTP, and Amazon DynamoDB.
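The buffer-then-burst behavior described above can be sketched locally in a few lines of Python. This is an illustration of the general pattern only, not DataSift's implementation: the `PushBuffer` class, its parameters, and the `deliver` callback are hypothetical names. Discarding items older than the buffer window is also an assumption; the text says only that up to 60 minutes' worth of data is buffered.

```python
import time
from typing import Any, Callable, List, Tuple


class PushBuffer:
    """Hypothetical sketch: accumulate items, deliver them in periodic bursts."""

    def __init__(self,
                 deliver: Callable[[List[Any]], None],
                 interval_seconds: float,
                 max_age_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._deliver = deliver            # sends one batch to the destination
        self._interval = interval_seconds  # time between bursts
        self._max_age = max_age_seconds    # buffer window (60 minutes here)
        self._clock = clock
        self._items: List[Tuple[float, Any]] = []  # (arrival time, item)
        self._last_delivery = clock()

    def add(self, item: Any) -> None:
        """Buffer one item together with its arrival time."""
        self._items.append((self._clock(), item))

    def tick(self) -> int:
        """Deliver a burst if the interval has elapsed; return items sent."""
        now = self._clock()
        # Assumption: items older than the buffer window are discarded.
        self._items = [(t, i) for t, i in self._items
                       if now - t <= self._max_age]
        if now - self._last_delivery < self._interval or not self._items:
            return 0
        batch = [i for _, i in self._items]
        self._deliver(batch)  # if this raises, the items stay buffered
        self._items.clear()
        self._last_delivery = now
        return len(batch)
```

A caller would invoke `tick()` on a timer. Because the buffer is cleared only after `deliver` returns, a failed delivery leaves the items in place for the next attempt, as long as they are still inside the buffer window.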

Streaming API

The Streaming API offers data from all our sources in near real time. This API is for developers building applications that do the heavy lifting, capturing continuous streams of data with no defined end and making sure that nothing is missed. If you're building an application for a major data-mining task, the Streaming API is the place to start. It also allows you to run multiple streams at the same time, filtering for two or more different sets of information simultaneously.
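As a rough illustration of filtering for several sets of information at once over a feed with no defined end, the Python sketch below passes each incoming message through every named filter and yields the matches lazily. It runs locally and does not use the Streaming API; the `route` function, the filter names, and the dictionary message shape are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Tuple

Predicate = Callable[[Dict[str, Any]], bool]


def route(messages: Iterable[Dict[str, Any]],
          filters: Dict[str, Predicate]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (filter name, message) for every filter a message satisfies.

    `messages` may be an endless iterator; results are produced lazily,
    one message at a time, so nothing needs to fit in memory at once.
    """
    for msg in messages:
        for name, predicate in filters.items():
            if predicate(msg):
                yield name, msg


# Two hypothetical filters applied to the same feed.
filters = {
    "coffee": lambda m: "coffee" in m["content"].lower(),
    "twitter_only": lambda m: m["source"] == "twitter",
}

feed = [
    {"source": "twitter", "content": "Coffee time"},
    {"source": "blog", "content": "Best coffee in town"},
    {"source": "twitter", "content": "Hello"},
]

for name, msg in route(feed, filters):
    print(name, msg["content"])
```

A message that satisfies both filters is yielded once per filter, so each consumer sees a complete set of matches for its own filter.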