Richard Caudle's picture

Keeping Tags Organised Using Namespaces

The launch of DataSift VEDO introduced new features to allow you to add structure to social data. In my last post I introduced you to tagging and explained how you can use the feature to categorise data before it reaches your application.

In this post I'll introduce you to tag namespaces, a simple and elegant way to organise tag sets. Many of our customers have built tag sets containing hundreds of tags for their projects. As you use tagging more and more you'll find that namespaces are a great way to keep your tags clean and structured.

What Is A Tag Namespace?

In my last post I showed you how to declare a tag, for example:

tag "iPhone" { bitly.user.agent substr "iPhone" OR interaction.source contains "iPhone" }

This is a great start, but as our customers have increasingly adopted tagging they’ve ended up with hundreds of tags without any structure.

Say I have some tags that identify a user’s device and, alongside them, tags that identify companies. It would be great if I could break any matching tags into groups. Just as well-organised code uses namespaces to group classes by function or business model, you can do exactly the same with tags by using namespaces.

How To Add A Namespace...

You can add a namespace to a tag using this syntax:

tag.[namespace] "[tag name]" { // CSDL filter }

For example:

tag.device "iPhone" { interaction.source contains "iPhone" }

That’s one level of namespace, but why stop there? You can add further levels:

tag.[namespace].[namespace] "iPhone" { interaction.source contains "iPhone" }

I’m sure you get the idea. For most use cases 2 or 3 levels will no doubt do the trick.

In The Real World...

Say I’d like to track conversations around some companies. For a simple example, I can use their stock symbols. When I get this data in my application, though, it would be great if the companies were grouped for me by index.

This is really easy with tag namespaces:

If somebody sends a tweet from their iPhone about Nike, this is what I'll receive in my app:

When I receive this data it’s nice and easy to look into the tag tree and split my data into buckets, or run logic as necessary.
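
As a rough sketch of that bucketing step in application code, the Python below walks a tag tree and groups interactions by namespace path. The shape of tag_tree used here (nested objects whose leaves are lists of tag names), the device and company.dow namespaces, and the helper names are all assumptions made for illustration, not output copied from the platform.

```python
from collections import defaultdict


def walk_tag_tree(tree, prefix=""):
    """Yield (namespace_path, tag_name) pairs from a nested tag tree."""
    for key, value in tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # A nested object is treated as a deeper namespace level.
            yield from walk_tag_tree(value, path)
        else:
            # Assumed leaf shape: a list of tag names.
            for tag in value:
                yield path, tag


def bucket_by_namespace(interactions):
    """Group (tag, interaction) pairs under each namespace path."""
    buckets = defaultdict(list)
    for item in interactions:
        tree = item.get("interaction", {}).get("tag_tree", {})
        for path, tag in walk_tag_tree(tree):
            buckets[path].append((tag, item))
    return buckets


# Hypothetical sample: a tweet sent from an iPhone that mentions Nike.
sample = {
    "interaction": {
        "content": "Loving my new Nike trainers",
        "tag_tree": {"device": ["iPhone"], "company": {"dow": ["NKE"]}},
    }
}

buckets = bucket_by_namespace([sample])
print(sorted(buckets))  # ['company.dow', 'device']
```

Keying the buckets on the dotted namespace path keeps the grouping logic independent of how many levels the taxonomy has.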

Tag_tree vs Tags

If you’ve used tags before, you’ll notice that tags with namespaces are not output in the ‘tags’ property of the interaction; they appear in the ‘tag_tree’ property instead. This keeps backward compatibility for existing customers who use tags without namespaces.

See our docs for a full explanation of this change.
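
In application code, one way to cope with both shapes is to read whichever property is present. The sketch below assumes that tags is a flat list of names and that tag_tree is a nested object with lists at the leaves. Both shapes and the helper name collect_tags are assumptions, so check the documentation for the exact format.

```python
def collect_tags(item):
    """Return every tag on an interaction as a sorted list of strings.

    Tags without a namespace are read from "tags"; namespaced tags are read
    from "tag_tree" and returned as dotted paths such as "device.iPhone".
    """
    inner = item.get("interaction", {})
    found = set(inner.get("tags", []))

    def walk(node, prefix):
        for key, value in node.items():
            path = f"{prefix}.{key}" if prefix else key
            if isinstance(value, dict):
                walk(value, path)
            else:
                # Assumed leaf shape: a list of tag names.
                found.update(f"{path}.{tag}" for tag in value)

    walk(inner.get("tag_tree", {}), "")
    return sorted(found)


# Hypothetical payloads in the old and new styles.
old_style = {"interaction": {"tags": ["iPhone", "iOS"]}}
new_style = {"interaction": {"tag_tree": {"device": ["iPhone"]}}}

print(collect_tags(old_style))  # ['iOS', 'iPhone']
print(collect_tags(new_style))  # ['device.iPhone']
```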

And There’s More!

So we’ve covered a quick example, but of course you can take things much further. You can use namespaces to build rich deep taxonomies to cover your business model. For inspiration check out our library of tag definitions. You can import these tags into your streams right now.

For full details of the features see our technical documentation.

Next time I’ll be looking at scoring, a way to give relative numerical values to interactions. This is a great feature for modelling priorities and confidence scoring cases.

If you’re new to DataSift, what’s stopping you? Register now and feast from the world of social data.

Richard Caudle's picture

Introducing Tags - Categorise Data To Fit Your Model

The launch of DataSift VEDO introduced new features to allow you to add structure to social data. These new features allow you to add custom metadata to social interactions, saving you post-processing work in your application.

In this post I’ll introduce you to tagging, and explain why this will make working with social data a whole load easier. Tagging allows you to categorise data to match your business model. Keep watching this space as over the next few posts I’ll cover all of the new features in detail.

What Are Tags?

Tags are a simple but powerful way to add custom metadata to social interactions before they are delivered to your application. Once the platform has filtered your sources of social data using your filter, you can use the same language (CSDL) to add tags and classify interactions, saving you post-processing effort.

A Quick Example - Categorising User Devices

Device identification is incredibly useful when you’re trying to analyse audiences and how they interact. Using CSDL I can identify the device used to create the content and tag that interaction appropriately.

Let’s look at two interactions, the first from Twitter and the second from Bitly.

For a Twitter interaction, the interaction.source target tells us which application was used to post the content. For Bitly interactions, the bitly.user.agent target (the user-agent string) gives us a detailed profile of the browser or device used to post the link.

As different sources provide context information in a variety of formats and in different structures, writing application code to process this data is time consuming. By using tags we can simplify this task hugely and use the full power of CSDL to carry out this work.


I can use the tag keyword to add tags to my data above. Any interaction that matches the CSDL in the braces will be given the declared tag. The syntax for declaring a tag is:

tag "[tag name]" { // CSDL to match interactions }


Carrying on my example I'll create three tags to apply to my data:

In this definition I’m tagging interactions based on the user-agent and source properties. (Including both iOS and iPhone might seem strange, but this demonstrates that you can add multiple tags to an interaction!)


You’ll notice I’ve used the substr operator to inspect the user-agent field as often these strings are stripped of white space. 

bitly.user.agent substr "Blackberry"

Will match the following:

BlackBerry9700/ Profile/MIDP-2.1 Configuration/CLDC-1.1 VendorID/144
BlackBerry8520/ Profile/MIDP-2.0 Configuration/CLDC-1.1 VendorID/121


For the source property, on the other hand, the contains operator works perfectly because these values have a cleaner format.

interaction.source contains "iPhone"

Will match:

Twitter for iPhone
UberSocial for iPhone
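
To make the difference concrete, here is a rough Python model of the two operators. It is only an approximation for illustration, not how the platform evaluates CSDL: it assumes that substr matches the text anywhere in the string, that contains matches whole words only, and that both ignore case (as the Blackberry example above suggests). The function names are made up for this sketch.

```python
import re


def substr_match(value, text):
    """Approximate substr: case-insensitive match anywhere in the string."""
    return text.lower() in value.lower()


def contains_match(value, text):
    """Approximate contains: case-insensitive match on whole words only."""
    pattern = r"(?<!\w)" + re.escape(text) + r"(?!\w)"
    return re.search(pattern, value, re.IGNORECASE) is not None


user_agent = "BlackBerry9700/ Profile/MIDP-2.1 Configuration/CLDC-1.1 VendorID/144"
source = "Twitter for iPhone"

print(substr_match(user_agent, "Blackberry"))    # True
print(contains_match(user_agent, "Blackberry"))  # False: "BlackBerry9700" is one word
print(contains_match(source, "iPhone"))          # True
```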


When the sample interactions pass through my definition the result will be:

The first interaction has been given two tags because ‘iPhone’ is included in both the iPhone and iOS tags, whereas the second interaction only matches the Blackberry tag.

When this data arrives at my application it is decorated with clean metadata. I can inspect the tags array and easily apply business rules rather than have to perform text processing.
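
As a minimal sketch of what that can look like, the Python below counts interactions per tag by reading the tags array directly. It assumes each delivered item carries its tags as a list under interaction.tags; the sample payloads are trimmed, made-up stand-ins for the two interactions in this example.

```python
from collections import Counter


def count_by_tag(interactions):
    """Count how many interactions carry each tag."""
    counts = Counter()
    for item in interactions:
        # A missing "tags" array simply contributes nothing.
        counts.update(item.get("interaction", {}).get("tags", []))
    return counts


# Hypothetical, trimmed payloads for the two sample interactions.
interactions = [
    {"interaction": {"source": "Twitter for iPhone", "tags": ["iPhone", "iOS"]}},
    {"interaction": {"tags": ["Blackberry"]}},
]

counts = count_by_tag(interactions)
print(counts["iOS"], counts["iPhone"], counts["Blackberry"])  # 1 1 1
```

No string matching on user agents or source names is needed at this point, because the classification has already happened before delivery.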

Of course, I could extend my definition to cover many more data sources and devices, but regardless of the complexity CSDL gives us the power to classify interactions and deliver structured data to applications.

And That’s Just The Start!

My example covered just one scenario where tagging can be extremely effective and efficient.
Our latest release takes tagging to the next level, allowing you to tag and numerically score interactions, and to build reusable tag taxonomies fit for complex use cases. I’ll be explaining these new features in detail in my next few posts, so watch this space.

For inspiration check out our library of tag definitions. You can import these tags into your streams right now. And for full details of the features see our technical documentation.

If you’re new to DataSift, what’s stopping you? Register now and experience the power of our platform for yourself!!

Richard Caudle's picture

Announcing DataSift VEDO - Giving Structure To Social Data

Today we announced the arrival of DataSift VEDO. In this post I’ll outline what this means to you as a developer or analyst.

DataSift VEDO gives you a robust solution to add structure to social data, solving one of the common challenges when working with unstructured ‘big data’. VEDO lets you define rules to classify data so that it fits your business model. The data delivered to your application needs less post-processing and is much easier to work with. The new features will save you time and give you a load more possibilities for your social data.

Data Is Meaningless Without Structure

When working with big data such as social content, one challenge you will always need to tackle is giving unstructured data meaningful structure. If you’re working with our platform currently, you will no doubt be extracting data to your server and running post-processing rules to organise the data to meet your needs.

Processing unstructured data is expensive and not much fun, but it’s where we excel. VEDO lets you offload processing onto our platform. You can now use CSDL (the same language you use for filtering) to add custom metadata labels and scores to data specifically for your use case.

Introducing Tagging And Scoring

VEDO introduces two new features that let you attach this metadata: tagging and scoring.

Tagging allows you to categorise interactions to match your business model. Any interaction that matches a tagging rule will be given the appropriate text label, serving as a boolean flag to indicate whether an interaction belongs to a category.

Scoring builds on tagging, allowing you to attach numerical values to interactions rather than just labels. Scoring allows you to build up a score over many rules, and allows you to model subtle concepts such as priority, intention and weighting.
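
The idea of building up a score over many rules can be pictured with a small Python model. This is purely a conceptual sketch and not how the platform implements scoring: the keyword rules, their weights and the function name are invented for illustration.

```python
def score_interaction(content, rules):
    """Sum the scores of every rule whose keyword appears in the content."""
    text = content.lower()
    return sum(score for keyword, score in rules if keyword in text)


# Hypothetical rules modelling "priority" for a support team.
priority_rules = [("refund", 5.0), ("broken", 3.0), ("thanks", -2.0)]

print(score_interaction("My order arrived broken, I want a refund", priority_rules))  # 8.0
print(score_interaction("Thanks for the quick delivery", priority_rules))             # -2.0
```

Each rule contributes independently, so many small signals can add up to a single number that an application can sort or threshold on.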

As you begin to use tagging and scoring more and more, you will want to be able to organise your growing set of rules. To help we have also introduced tag namespaces and reusable tag definitions. Tag namespaces allow you to define taxonomies of tags. You can group tags at any number of levels in namespaces and build deep schemas to fully reflect your model. Reusable tag definitions allow you to perfect your rules and reuse them across any number of streams and projects.

Definition Library

Tagging and scoring are powerful features, but at this point you might not have grasped exactly how they can help you. Therefore alongside the tagging features we’ve also introduced a library of definitions to get you started. Some definitions you can use immediately in your streams (and benefit from our experience), and some serve as example definitions to show you what is now possible.

For example, we have definitions that help you score content for quality (for example, how likely the content is to be a job advert) and make it easier to exclude spam. On the other hand, we have an example definition that shows how you can use the new features to classify conversations for customer service teams, picking out rants, raves and enquiries.

You can view the library here.

There’s More...

Although tagging is the main theme of the new release, there is an awful lot more happening here at DataSift. Alongside the release of VEDO we’re giving you more power, more connectivity and a wider range of sources to play with.

For instance we’ve just introduced delivery destinations for MySQL and PostgreSQL. These new destinations allow you to map your filtered data directly to a tabular schema and have it pushed directly into your database.

We’re also in the process of bringing many more sources onboard (you may have seen our recent announcements!), including many Asian social networks.

Look out for improvements to help you work with a wider variety of languages, updates to our developer tools and client libraries, and much, much more. I’ll cover all of these soon.

Watch this space

In summary there’s far too much to cover in detail here. So watch this space, as over the coming weeks I’ll cover every feature of the new release in depth, with worked examples and sample code so you can take advantage of all these new powers for yourself.

If you can’t wait, all of these new features are fully documented in our Documentation area. Again, check out the new library for inspiration.

If you’re new to DataSift, what’s stopping you? Register now and experience the power of our platform for yourself!!

Jason's picture

Deprecating Historics "volume_info" Output Field

On December 2nd, 2013, we plan to remove the "volume_info" field from the DataSift Historics API call response. Please ensure that your application does not expect to receive this field from Historics API calls by this date.

If you are using one of the official DataSift API client libraries, support for this has already been implemented in the following versions of the libraries:

  • Java - 2.2.1+
  • Python - 0.5.4+
  • Ruby - 2.0.3+
  • PHP - 2.1.4+
  • .NET - 0.5.0+
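
If you parse Historics responses yourself, one defensive pattern is to treat the field as optional. The Python below is a minimal sketch of that idea; the id and status keys, the sample values and the function name are assumptions for illustration, not a description of the full response.

```python
def summarise_historic(response):
    """Build a summary that never requires the "volume_info" field."""
    summary = {"id": response.get("id"), "status": response.get("status")}
    # Copy volume_info only when present, so its removal cannot raise KeyError.
    if "volume_info" in response:
        summary["volume_info"] = response["volume_info"]
    return summary


# Hypothetical responses before and after the field is removed.
before = {"id": "abc123", "status": "running", "volume_info": {"twitter": 100}}
after = {"id": "abc123", "status": "running"}

print(summarise_historic(before))
print(summarise_historic(after))
```
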
Jason's picture

New delivered_at meta field for Push

DataSift is adding a new metadata field to each JSON object delivered via Push in the json_meta output format - a delivered_at timestamp. This new timestamp represents the time DataSift delivered this particular object. An example of a json_meta formatted Push delivery containing this new field can be seen below:
{"count":3, "hash":"4ede6111534c5e29145f", "hash_type":"historic", "id":"58802d124916ed826a08d58d791f85c5", "delivered_at":"Tue, 08 Oct 2013 09:53:33 +0000", "interactions":[{...
Please ensure your application is capable of accepting new output fields to prevent this change from interrupting your data delivery. This change is due to be released on Monday, October 14th, 2013.
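
As a hedged sketch of a tolerant consumer, the Python below reads a json_meta delivery, parses delivered_at when it is present and ignores any fields it does not know about. It assumes the timestamp always uses the format shown in the example above; the function name, the some_future_field key and the trimmed sample payload are made up for illustration.

```python
import json
from email.utils import parsedate_to_datetime


def read_delivery(raw):
    """Return (delivered_at, interactions) from a json_meta delivery body."""
    payload = json.loads(raw)
    stamp = payload.get("delivered_at")
    # Older deliveries may lack the field, so fall back to None.
    delivered_at = parsedate_to_datetime(stamp) if stamp else None
    # Unknown keys in the payload are simply left unread.
    return delivered_at, payload.get("interactions", [])


# Trimmed, hypothetical delivery with an extra field the code has never seen.
raw = (
    '{"count": 0, "hash": "4ede6111534c5e29145f", "hash_type": "historic", '
    '"delivered_at": "Tue, 08 Oct 2013 09:53:33 +0000", '
    '"interactions": [], "some_future_field": true}'
)

delivered_at, interactions = read_delivery(raw)
print(delivered_at.isoformat())  # 2013-10-08T09:53:33+00:00
```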

