Build Reusable Tagging And Scoring Rules To Use Across Your Projects

Richard Caudle | 19th December 2013

The launch of DataSift VEDO introduced new features to allow you to add structure to social data. In my last few posts I introduced you to tags, tag namespaces and scoring and explained how you can use these features to classify data before it reaches your application.

In this post I will show you how, once you’ve spent time building these rules, you can reuse them across many projects and get maximum value from your hard work.

Creating A Reusable Tag Definition

On our platform a ‘tag definition’ is a stream you define which contains only tag rules, and no return statement.

To be clear, until now you may have used tags and a filter (as a return statement) together in one stream, for example:

tag "iPhone" { interaction.source contains "iPhone" }
tag "Android" { interaction.source contains "Android" }

return {
  interaction.content contains_any "Apple,Google,Microsoft"
}

You can break out your tags into a reusable tag definition by saving just the tags in a new stream without the return statement:

tag "iPhone" { interaction.source contains "iPhone" }
tag "Android" { interaction.source contains "Android" }

When you save the stream (or compile using the API) you will be given a hash which represents this definition.

Using The Definition In A Stream

Now that we have a hash for the tag definition, we can make use of it in another stream using the tags keyword.

tags "b18638ed3c5eaee2929dccb11d721579" // Hash of your tag definition

return {
  interaction.content contains_any "Apple,Google,Microsoft"
}

This will import the tag rules just as if they were written in place in the same stream. The tags will be applied as appropriate to interactions that match the filter in the return statement.

Using this simple but powerful feature you can create a library of valuable tag & scoring rules for your business model and reuse the same definitions across any number of streams.

Applying A Namespace

This is already a great feature, but things get even better when we throw in a namespace. If you have a great set of tag rules that you reuse often in your streams, you might want to organise your namespaces differently depending on the exact stream.

When you import a set of tags you can wrap them within a namespace:

// Import tags into top-level user namespace
tags.user "b18638ed3c5eaee2929dccb11d721579"

return {
  interaction.content contains_any "Apple,Google,Microsoft"
}

Now the tags will be applied as before, but they will sit within a top-level namespace of user.

In fact, in practice I’ve found that this helps keep tag rules much more concise. You can declare tag rules with a shallow namespace, but when you import them you can wrap them in namespaces to build a very rich taxonomy.
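If you keep a library of definition hashes, the import lines can be generated rather than typed by hand. The following is a minimal Python sketch, not part of the DataSift platform or its client libraries: build_stream is a hypothetical helper that renders a stream using only the tags, tags.<namespace> and return syntax shown above. It assumes a stream may import more than one definition.

```python
def build_stream(imports, filter_csdl):
    """Render a stream that imports tag definitions, then applies a filter.

    imports:     list of (namespace, definition_hash) pairs; pass None or ""
                 as the namespace to import without wrapping.
    filter_csdl: the filter condition placed inside the return block.
    """
    lines = []
    for namespace, definition_hash in imports:
        # `tags.user "<hash>"` wraps the imported tags in a namespace,
        # while plain `tags "<hash>"` imports them as declared.
        keyword = f"tags.{namespace}" if namespace else "tags"
        lines.append(f'{keyword} "{definition_hash}"')
    lines.append("")
    lines.append("return {")
    lines.append(f"  {filter_csdl}")
    lines.append("}")
    return "\n".join(lines)


# Reproduce the example stream from this post.
print(build_stream(
    [("user", "b18638ed3c5eaee2929dccb11d721579")],
    'interaction.content contains_any "Apple,Google,Microsoft"',
))
```

Because the namespace is chosen at import time, the same definition hash can be passed with a different namespace for each stream you generate.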

Something To Remember

It’s important to note that the hash for your tag definition will change if you update the tag definition itself. If you’ve used the stream keyword before you’ll be familiar with this concept.

This makes sense when you consume streams via the API: you can make changes to definitions on the fly and switch to the new definitions when it suits your application.

You just need to remember that whenever you change a reusable tag definition, you must take the new hash and update the streams which import the definition.
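If you keep the CSDL for your streams as text, this update can be scripted. Here is a minimal Python sketch under that assumption: update_imports is a hypothetical helper (not a DataSift API) that swaps an old definition hash for a new one wherever it appears in a tags import line. The new hash in the example is a made-up placeholder.

```python
import re


def update_imports(csdl, old_hash, new_hash):
    """Replace old_hash with new_hash in every `tags` import line."""
    # Matches `tags "<hash>"` and `tags.<namespace> "<hash>"` at the start
    # of a line; the keyword and namespace are captured and written back.
    pattern = re.compile(
        r'^([ \t]*tags(?:\.[\w.]+)?[ \t]+)"' + re.escape(old_hash) + r'"',
        flags=re.MULTILINE,
    )
    return pattern.sub(lambda m: m.group(1) + '"' + new_hash + '"', csdl)


stream = '''tags.user "b18638ed3c5eaee2929dccb11d721579"

return {
  interaction.content contains_any "Apple,Google,Microsoft"
}'''

# Placeholder value standing in for the hash of the updated definition.
new_hash = "0123456789abcdef0123456789abcdef"
print(update_imports(stream, "b18638ed3c5eaee2929dccb11d721579", new_hash))
```

Only import lines are touched, so the filter in the return statement is left exactly as written. Each updated stream still has to be saved or recompiled afterwards.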

Let’s Not Stop There Though…

Reusable tag definitions are super powerful because they allow you to build up a library of rules which you can use across projects and throughout your organisation.

For example, you could build the following reusable definitions:

  • A spam classifier tailored perfectly to your use case
  • A rich taxonomy to exactly fit your business model or industry
  • An influencer model to use across many projects

To give you a head start we’ve also released our own library of reusable definitions. In minutes you can benefit from our hard work!

For full details of all the features, see our technical documentation.

This post concludes my series on our new tagging and scoring features. Don’t go away though as there are many more features I’ll cover in the coming weeks, and I’ll also take you through some much richer real-world examples.
