Blog posts in Communication

Gerrit Schultz - Internship at DataSift

Gerrit Schultz describes the time he recently spent from August to November as an intern in the Development group at DataSift. I'm very happy that, as part of my university studies, I have the chance to work as an intern with DataSift. It's certainly been a brilliant experience. From the first day I've been involved in the regular development process, and after only a few days I could see my first work results live in production. I had chosen to join the front-end...

Read Gerrit Schultz - Internship at DataSift

Regular Expressions

You've probably written streams that use CSDL's native operators, such as contains and any, but you might not have tried our embedded regular expression (regex) engine yet. If you already know how to write a regex, just read our regular expression page, take a look at the escaping guidelines, and check out our regex_partial and regex_exact keywords; you'll then be ready to write your first regex stream. If you haven't used a regex before, read on... ...

Read Regular Expressions
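The difference between regex_partial and regex_exact is easiest to see side by side. The sketch below is a rough Python analogy, not DataSift's implementation. It assumes that regex_partial succeeds when the pattern matches anywhere in the target text and that regex_exact succeeds only when the pattern matches the entire text. The helper functions and the sample pattern are hypothetical, and Python's `re` dialect may differ in places from the engine CSDL uses.

```python
import re

# Hypothetical stand-ins for the CSDL keywords (assumed semantics, not DataSift's code).

def regex_partial(pattern: str, text: str) -> bool:
    """Assumed regex_partial behaviour: True if the pattern matches anywhere in text."""
    return re.search(pattern, text) is not None


def regex_exact(pattern: str, text: str) -> bool:
    """Assumed regex_exact behaviour: True only if the pattern matches all of text."""
    return re.fullmatch(pattern, text) is not None


# Illustrative pattern: "iPhone", optionally followed by a model number such as " 4S".
pattern = r"iPhone( \d+S?)?"

for text in ["I love my iPhone 4S", "iPhone 4S", "iphone"]:
    print(f"{text!r}: partial={regex_partial(pattern, text)}, "
          f"exact={regex_exact(pattern, text)}")
```

The first sample matches only partially, because the pattern covers just part of the text. The second matches both ways. The third matches neither, since the pattern is case-sensitive.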

High Scalability

DataSift is the subject of the latest post on the High Scalability blog, which includes a detailed overview of the platform architecture and of the problems involved in meaningfully filtering unstructured data from the Twitter API in real time. ‘You have to be able to reliably consume it, normalize it, merge it with other data, apply functions on it, store it, query it, distribute it, and oh yah, monetize it. Most of that in realish-time. And if you are trying to create a...

Read High Scalability