Introducing Machine Learned Classifiers To Inspire Your Next Solution

Richard Caudle | 20th January 2014

The launch of DataSift VEDO introduced new features that allow you to add structure to social data. Alongside it, we introduced the DataSift library to help you build solutions faster and learn more quickly.

Today we continue this theme by adding further items to the library. These include examples of machine learned classifiers which are sure to whet your appetite and get your creative juices flowing.

Machine Learned Classifiers

Since we announced VEDO there's been a lot of buzz around the possibilities of machine learning. Look out for a blog post coming very soon that covers the topic in depth.

We've introduced the following classifiers to the library to give you a taste of just what's possible:

  • Customer Service Routing - Many organisations employ staff to read customer service tweets and route them to the correct team. This classifier is trained specifically for airline customer services and shows how you could automate this process and save staffing costs.
  • Product Purchase Stage - Knowing which stage a customer has reached, from initially assessing a product through to ownership, is incredibly powerful. This classifier demonstrates the concept and has been trained for PS4 discussion.
  • People vs Organizations - In many use cases you will want to distinguish between content created by organisations and individuals. This generic classifier allows you to do just that at scale.

These classifiers have been created by our Data Science team. They take a large sample of interactions from the platform, manually classify them, and use machine learning to learn the key signals that determine which category each interaction should belong to. The result is a set of scoring rules that form the classifier, which can then be run continuously against live or historic data.
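
To make the idea of scoring rules concrete, here is a minimal sketch in Python. It is not the rule format the platform uses and not how the Data Science team builds its classifiers. The keywords, category names and weights are all invented for illustration; in a real classifier the weights would be learned from the manually classified sample.

```python
from collections import defaultdict

# Hypothetical scoring rules: (keyword, category, weight).
# Every keyword, category name and weight here is invented for
# illustration; real weights would be learned from labelled data.
SCORING_RULES = [
    ("lost", "baggage", 1.5),
    ("luggage", "baggage", 2.0),
    ("delayed", "flight_status", 1.8),
    ("cancelled", "flight_status", 2.2),
    ("refund", "billing", 2.5),
    ("charged", "billing", 1.7),
]


def score(text):
    """Return the total score per category for one piece of text."""
    # Naive tokenisation: lowercase and split on whitespace.
    words = set(text.lower().split())
    totals = defaultdict(float)
    for keyword, category, weight in SCORING_RULES:
        if keyword in words:
            totals[category] += weight
    return dict(totals)


def classify(text):
    """Return the highest-scoring category, or None if no rule matched."""
    totals = score(text)
    if not totals:
        return None
    return max(totals, key=totals.get)
```

Calling classify("My luggage is lost") adds up both baggage rules and returns that category, while text that matches no rule returns None, so it can be left unclassified rather than forced into a bucket.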

You can try out any of the classifiers now by creating a stream from the example code at the bottom of the library item page. For more details see my previous post.

Geo-Based Classifiers

Knowing a user's location can be extremely valuable for many use cases, yet location as a field can be very tricky to normalise.

As an example of how VEDO can help you with this process, we've introduced the following classifiers, which normalise geo-location information:

  • Major Airports - Categorises tweets made in and around major airports
  • NBA Arenas - Categorises tweets made in and around NBA venues
  • NFL Stadiums - Categorises tweets made in and around NFL stadia

Outside of game days you'll see little traffic around sporting venues, but try running these on a match day to see the power of these definitions!
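
As a rough illustration of what "in and around" a venue can mean, the Python sketch below tags a coordinate with an airport when it falls within a fixed radius. This is an assumed technique, not a description of how the library classifiers are defined. The airport coordinates are approximate, and the 5 km radius and the function names are hypothetical.

```python
import math

# Approximate (latitude, longitude) pairs, for illustration only.
AIRPORTS = {
    "LHR": (51.4700, -0.4543),
    "JFK": (40.6413, -73.7781),
}

RADIUS_KM = 5.0  # hypothetical "in and around" radius
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_airport(lat, lon):
    """Return the code of the closest airport within RADIUS_KM, else None."""
    best_code, best_distance = None, None
    for code, (airport_lat, airport_lon) in AIRPORTS.items():
        distance = haversine_km(lat, lon, airport_lat, airport_lon)
        if distance <= RADIUS_KM and (best_distance is None
                                      or distance < best_distance):
            best_code, best_distance = code, distance
    return best_code
```

A point a few hundred metres from the Heathrow coordinates is tagged LHR, while a point in central London falls outside the radius and gets no tag.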

Improved Classifiers

Alongside introducing new classifiers and increasing the library's breadth, we've also worked hard on further improving two existing classifiers. We think you'll find these two extremely useful in your solutions:

  • Professions & Roles - We've restructured the taxonomy around professional function, based on the LinkedIn hierarchy.
  • Twitter Source - This classifier has also been restructured to bucket applications into useful categories, including whether content has been created manually (say by a user on their mobile phone) or by an automated service.

Even More To Follow

We're not stopping here. Expect to see more and more items being added to the library, covering a wider range of use cases and industries. Keep an eye out for new items and please watch this blog for further news.

To stay in touch with all the latest developer news please subscribe to our RSS feed at http://dev.datasift.com/blog/feed

