Blog

Big Data, Bigger Networking

One of the really attractive aspects of working at a startup like DataSift is managing the challenges that come with rapid organic growth. Our experiences with the networking aspects of Hadoop are an excellent example of this. When we started working with Hadoop in mid-2011 we very…

A Journey into Optimizing Hadoop Jobs

At DataSift we are in the enviable position of receiving the full Twitter Firehose in real time (currently 400 million messages/day), plus many other data sources and augmentations. So far, we've been offering the ability to filter the messages and deliver the custom output stream in real time.…

Language Detection v2.0

Hello, my name is Christopher Gilbert, and I am a senior member of the DataSift engineering team. Today, I am pleased to announce the release of a major revision of the language augmentation service, Language Detector v2.0, which provides improved accuracy and increases the number of languages that…

Newcomer's Guide to DataSift's Streaming API

A couple of days ago, a data scientist friend asked me how to stream data through DataSift's APIs. He'd been recording his streams and then analyzing the JSON or CSV output. Now he wants to go to the next level. In other words, he's already familiar with creating streams and running them. Here are…

HubFlow - GitHub and the GitFlow Model Together

We really like GitHub for hosting our Git repositories, and we've found Vincent Driessen's GitFlow very useful for organizing how we work inside each of them. But how to use the two together could be clearer and easier, especially if you're also new to Git. To help, we've adapted…
