What you'll learn: How to access data sources and create your first data filter.
DataSift is a platform that allows you to access and process social and news data sources, and integrate the results into your applications or analysis.
For an overview of the platform, take a look at the What is DataSift STREAM? page. Here we'll focus on creating our first filter and seeing the results that would be delivered to an application.
To complete this guide you'll need a DataSift account. This will have been set up by your account manager or a member of our sales team.
Log in to your account using your username and password.
To access data from the platform you'll need to activate at least one data source. Tumblr is a popular source, so let's start there.
Select Data Sources in the menu.
Click on the Tumblr source. Then click the Activate button to activate the source.
You'll need to complete the form and agree to the terms & conditions for the data source.
A data source gives you access to data, whereas an Augmentation adds extra value to data before you apply your filter. Here we'll activate the Links augmentation which expands shortened links in posts, and gives you access to the metadata of the destination URL.
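One way to picture an augmentation is as a step that enriches each post before any filter sees it. The Python sketch below is hypothetical: the `urls` input field, the shape of the added `links` section, the lookup table that stands in for real redirect resolution, and the stripping of a leading `www.` are all assumptions for illustration, not DataSift's actual implementation.

```python
from urllib.parse import urlparse

# Hypothetical lookup table standing in for real link resolution.
# A real service would follow HTTP redirects; this sketch avoids the network.
SHORT_LINK_DESTINATIONS = {
    "http://bit.ly/example": "https://www.adidas.com/us/originals",
}


def augment_links(interaction: dict, resolver: dict) -> dict:
    """Return a copy of the post with a hypothetical 'links' section added."""
    # Expand each shortened link; unknown links are kept unchanged.
    expanded = [resolver.get(url, url) for url in interaction.get("urls", [])]

    domains = []
    for url in expanded:
        host = urlparse(url).hostname or ""
        # Assumption: a leading "www." is dropped so domains compare cleanly.
        if host.startswith("www."):
            host = host[len("www."):]
        domains.append(host)

    augmented = dict(interaction)  # shallow copy; the input post is not modified
    augmented["links"] = {"url": expanded, "domain": domains}
    return augmented


post = {"content": "New drop! http://bit.ly/example", "urls": ["http://bit.ly/example"]}
print(augment_links(post, SHORT_LINK_DESTINATIONS)["links"]["domain"])  # ['adidas.com']
```

Because the enrichment happens before filtering, a filter can then test the expanded destination (here `adidas.com`) even though the original post only contained a shortened link.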
Return to the Data Sources page by clicking on Data Sources in the top menu.
Click on the Links Augmentation, which is on page two of the list. Then click the Activate button to activate it.
Now that you have enabled a data source and enriched the data with an augmentation, the next step is to create a filter to stream data from the platform.
Click on Filters in the top menu. Click the Create a Filter button.
Enter a name and description for your new filter. Select CSDL Code Editor as your choice of editor.
Click Start Editing to continue.
Now you need to define your filter using DataSift's filtering language, CSDL. As this is a quickstart, we'll keep things simple for now.
Copy and paste this code into your filter. Then click Save.
```
// Filter to Tumblr content in English
interaction.type == "tumblr"
AND language.tag == "en"
AND (
    // Mentions of brands in content
    interaction.content contains_any "Calvin Klein, GQ, Adidas"
    OR
    // Content reblogged from brand blogs
    tumblr.reblogged.root.url contains_any "http://calvinklein.tumblr.com/,http://gq.tumblr.com/,http://adidasoriginals.tumblr.com/"
    OR
    // Content that links to brand websites
    links.domain in "calvinklein.com,gq.com,adidas.com"
)
```
This filter matches only Tumblr content that is written in English. Within that content, it selects posts that mention any of three brands, share a link to one of the brand websites, or are reblogged from one of the brand blogs.
Notice that each condition has three parts:
- Target: The part of the data you want to inspect. For example, interaction.content is the content of the post.
- Operator: The comparison to be made. For example, contains_any searches the target value for any of the keywords.
- Argument: The value(s) you are searching for. For example, "Calvin Klein, GQ, Adidas".
You can use logical operators such as AND and OR to combine conditions. Read more about the CSDL language in our language guide.
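To see how targets, operators, and arguments fit together, here is a minimal Python sketch that evaluates the quickstart filter locally. It assumes each post is a flat dictionary keyed by target names, and it only approximates `contains_any` (a case-insensitive whole-word or phrase match) and `in` (an exact match against a comma-separated list); the real CSDL engine runs on the platform and has its own matching rules. The helper names `equals`, `contains_any`, `in_list`, `all_of`, and `any_of` are hypothetical.

```python
import re
from typing import Any, Callable, Dict, List

# Hypothetical representation: a post as a flat dict keyed by CSDL target names.
Interaction = Dict[str, Any]
Predicate = Callable[[Interaction], bool]


def _split(argument: str) -> List[str]:
    # The arguments in the example filter are comma-separated lists.
    return [item.strip() for item in argument.split(",") if item.strip()]


def equals(target: str, argument: str) -> Predicate:
    # Operator "==": the target value must equal the argument exactly.
    return lambda data: data.get(target) == argument


def contains_any(target: str, argument: str) -> Predicate:
    # Approximation of contains_any: any keyword appears as a whole word or
    # phrase in a text target, ignoring case.
    patterns = [re.compile(r"(?<!\w)" + re.escape(k) + r"(?!\w)", re.IGNORECASE)
                for k in _split(argument)]

    def check(data: Interaction) -> bool:
        text = data.get(target)
        return isinstance(text, str) and any(p.search(text) for p in patterns)
    return check


def in_list(target: str, argument: str) -> Predicate:
    # Approximation of "in": the value (or any value of a list target)
    # exactly equals one of the listed items.
    allowed = set(_split(argument))

    def check(data: Interaction) -> bool:
        value = data.get(target)
        values = value if isinstance(value, list) else [value]
        return any(v in allowed for v in values)
    return check


def all_of(*conditions: Predicate) -> Predicate:
    # Logical AND: every condition must match.
    return lambda data: all(c(data) for c in conditions)


def any_of(*conditions: Predicate) -> Predicate:
    # Logical OR: at least one condition must match.
    return lambda data: any(c(data) for c in conditions)


quickstart_filter = all_of(
    equals("interaction.type", "tumblr"),
    equals("language.tag", "en"),
    any_of(
        contains_any("interaction.content", "Calvin Klein, GQ, Adidas"),
        contains_any("tumblr.reblogged.root.url",
                     "http://calvinklein.tumblr.com/,http://gq.tumblr.com/,"
                     "http://adidasoriginals.tumblr.com/"),
        in_list("links.domain", "calvinklein.com,gq.com,adidas.com"),
    ),
)

post = {"interaction.type": "tumblr", "language.tag": "en",
        "interaction.content": "Loving my new Adidas trainers"}
print(quickstart_filter(post))                            # True
print(quickstart_filter({**post, "language.tag": "fr"}))  # False
```

Each factory plays the role of one operator and takes the target and argument, so every call mirrors a single CSDL condition, while `all_of` and `any_of` reproduce the nesting of AND and OR in the filter.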
Also notice that the links.domain target is available because you activated the Links augmentation.
Now that you've created a filter you can preview the data it will deliver to your application.
Click Live Preview on your new filter's page.
At the bottom of the preview page, click the Play button to preview your filter.
Wait a moment for data to appear.
Now click the Pause button to pause the stream. Click on one of the posts to inspect the data in depth.
Notice all of the data fields; you can filter against most of them. Click the plus sign next to each field to see more. You can use as much or as little of this data as you need in your apps and analysis.
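Targets such as `interaction.content` read like paths into the nested data shown in the preview. Assuming each post is a JSON object whose nested keys correspond to those dotted targets, this short Python sketch lists every field path in a post. The sample post is a hypothetical stand-in, not real platform output.

```python
import json
from typing import Any, Dict


def flatten(node: Any, prefix: str = "") -> Dict[str, Any]:
    """Map nested JSON objects to dotted paths such as 'interaction.content'."""
    if isinstance(node, dict):
        flat: Dict[str, Any] = {}
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            flat.update(flatten(value, path))
        return flat
    # Leaf value (string, number, list, ...): record it under its full path.
    return {prefix: node}


# Hypothetical sample post; real posts carry many more fields.
sample = json.loads("""
{
  "interaction": {"type": "tumblr", "content": "Loving the new Adidas drop"},
  "language": {"tag": "en"},
  "links": {"domain": ["adidas.com"]}
}
""")

for path, value in flatten(sample).items():
    print(path, "=", value)
```

Listing the paths this way makes it easier to spot which fields you might use as targets in your own filters.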
Now that you've created a filter and previewed data using the dashboard, your next step is to get started with the API.
Take a look at a quick start guide for your preferred language: