Keeping Tags Organised Using Namespaces

Richard Caudle | 17th December 2013

The launch of DataSift VEDO introduced new features to allow you to add structure to social data. In my last post I introduced you to tagging and explained how you can use the feature to categorise data before it reaches your application.

In this post I'll introduce you to tag namespaces, a simple and elegant way to organise tag sets. Many of our customers have built tag sets containing hundreds of tags for their projects. As you use tagging more and more you'll find that namespaces are a great way to keep your tags clean and structured.

What Is A Tag Namespace?

In my last post I showed you how to declare a tag, for example:

tag "iPhone" { bitly.user.agent substr "iPhone" OR interaction.source contains "iPhone" }

This is a great start, but as our customers have increasingly adopted tagging they’ve ended up with hundreds of tags without any structure.

Say I have some tags which identify a user’s device, alongside others which identify companies. It would be great if I could break any matching tags into groups. Just as well organised code uses namespaces to group classes by function or business model, you can do exactly the same with tags by using namespaces.

How To Add A Namespace...

You can add a namespace to a tag using this syntax:

tag.[namespace] "[tag name]" { // CSDL filter }

For example:

tag.device "iPhone" { interaction.source contains "iPhone" }

That’s one level of namespace, but why stop there? You can nest namespaces by adding further dot-separated levels:

tag.[namespace].[sub-namespace] "iPhone" { interaction.source contains "iPhone" }

I’m sure you get the idea. For most use cases 2 or 3 levels will no doubt do the trick.
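If you maintain a large tag set, you may prefer to generate the declarations rather than type them by hand. Here is a minimal Python sketch of that idea. The helper name `tag_declaration` is hypothetical (it is not part of any DataSift library), it assumes nested levels are joined with dots after `tag`, and it does no escaping or validation of the values you pass in.

```python
def tag_declaration(namespace, name, csdl_filter):
    """Build one CSDL tag declaration as a string.

    namespace   -- list of namespace levels, e.g. ["device"]; empty for none
    name        -- the tag name, e.g. "iPhone"
    csdl_filter -- the CSDL condition that goes between the braces

    Hypothetical helper: no escaping or validation is performed.
    """
    prefix = ".".join(["tag"] + list(namespace))
    return f'{prefix} "{name}" {{ {csdl_filter} }}'


# One level of namespace, matching the example above.
one_level = tag_declaration(
    ["device"], "iPhone", 'interaction.source contains "iPhone"'
)

# Two levels: each extra list item adds one more namespace level.
two_levels = tag_declaration(
    ["dowjones", "company"], "Nike", 'interaction.content contains_any "$NKE"'
)

print(one_level)
print(two_levels)
```

Keeping the namespace as a list makes it easy to add or remove a level later without touching the rest of the definition.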

In The Real World...

Say I’d like to track conversations around some companies. As a simple example, I can use their stock symbols. When I get this data in my application, though, it would be great if the companies were grouped for me by index.

This is really easy with tag namespaces:

tag.dowjones.company "Nike" { interaction.content contains_any "$NKE" }
tag.dowjones.company "Walt Disney" { interaction.content contains "$DIS" }
tag.nasdaq.company "Apple" { interaction.content contains_any "$AAPL" }
tag.nasdaq.company "Google" { interaction.content contains_any "$GOOG" }

tag.device "Android" { interaction.source contains "Android" }
tag.device "Apple" { interaction.source contains_any "iPhone, iPad" }

If somebody sends a tweet from their iPhone about Nike, this is what I'll receive in my app:

{
  "interaction": {
    "content": "$NKE Raises Quarterly Dividend 14%",
    "source": "Twitter for iPhone",
    "tag_tree": {
      "dowjones": {
        "company": ["Nike"]
      },
      "device": ["Apple"]
    }
  }
}

When I receive this data it’s nice and easy to look into the tag tree and split my data into buckets, or run logic as necessary.
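To give a flavour of that, here is a minimal Python sketch that walks the tag tree and buckets interactions by namespace path. The function names are my own for illustration, and it assumes the shape shown in the sample output above: nested objects for namespaces, with a list of matching tag names at each leaf.

```python
from collections import defaultdict


def flatten_tag_tree(tag_tree, prefix=""):
    """Yield (namespace_path, tag) pairs from a nested tag_tree dict."""
    for key, value in tag_tree.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # A nested namespace: recurse one level deeper.
            yield from flatten_tag_tree(value, path)
        else:
            # A leaf: the list of tags that matched in this namespace.
            for tag in value:
                yield path, tag


def bucket_by_namespace(items):
    """Group interaction content by (namespace path, tag)."""
    buckets = defaultdict(list)
    for item in items:
        interaction = item.get("interaction", {})
        for path, tag in flatten_tag_tree(interaction.get("tag_tree", {})):
            buckets[(path, tag)].append(interaction.get("content"))
    return dict(buckets)


# The sample interaction from above.
sample = {
    "interaction": {
        "content": "$NKE Raises Quarterly Dividend 14%",
        "source": "Twitter for iPhone",
        "tag_tree": {
            "dowjones": {"company": ["Nike"]},
            "device": ["Apple"],
        },
    }
}

buckets = bucket_by_namespace([sample])
for (path, tag), contents in sorted(buckets.items()):
    print(path, tag, len(contents))
```

Because the walk is recursive, the same code copes with one, two or more levels of namespace without changes.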

Tag_tree vs Tags

If you’ve used tags before, you’ll notice that instead of the tags being output in the ‘tags’ property of interaction, namespaced tags now appear in the ‘tag_tree’ property. This allows us to keep backward compatibility for existing customers using tags without namespaces.

See our docs for a full explanation of this change.
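If your application has to cope with both styles, one option is to fold any old-style tags into the tree shape so the rest of your code only deals with one structure. The Python sketch below shows the idea. It is only an illustration: it assumes ‘tags’ holds a flat list of tag names, and both the function name and the "legacy" namespace are made up for this example.

```python
def normalised_tag_tree(interaction, legacy_namespace="legacy"):
    """Return a tag tree that also contains any un-namespaced tags.

    Assumptions (not taken from the docs): "tags" is a flat list of names,
    and "legacy" is a namespace name that your own tags do not use.
    """
    # Shallow copy so the incoming interaction is left untouched.
    tree = dict(interaction.get("tag_tree", {}))
    legacy_tags = interaction.get("tags", [])
    if legacy_tags:
        existing = tree.get(legacy_namespace, [])
        if isinstance(existing, dict):
            raise ValueError(f"'{legacy_namespace}' is already a nested namespace")
        tree[legacy_namespace] = list(existing) + list(legacy_tags)
    return tree


old_style = {"content": "Sent from my phone", "tags": ["iPhone"]}
new_style = {"content": "$NKE news", "tag_tree": {"device": ["Apple"]}}

print(normalised_tag_tree(old_style))
print(normalised_tag_tree(new_style))
```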

And There’s More!

So we’ve covered a quick example, but of course you can take things much further. You can use namespaces to build rich, deep taxonomies to cover your business model. For inspiration check out our library of tag definitions. You can import these tags into your streams right now.

For full details of the features see our technical documentation.

Next time I’ll be looking at scoring, a way to give relative numerical values to interactions. This is a great feature for use cases such as modelling priority and scoring confidence.
