
By Jacek Artymiak

The Query Builder

At DataSift we love open source. We use it and we create it. As part of our commitment, we're proud to announce that a major new component of the DataSift platform, the Query Builder, is now available. It's open source and you can download it from GitHub today. Take a look at our demo page to try out the Query Builder.

What is the Query Builder?

Everyone talks about Big Data, but not many people know how to handle it. We live it. We created the Query Builder to bring the advanced functionality of DataSift to business users.

We consume over a billion items per day, process them, augment them with analytical data, and make them available in JSON format. The Query Builder includes a built-in dictionary of all 450 targets that users can include in their DataSift filters, so even novices can get started right away.

The Query Builder is a code generator that produces shareable, SQL-like commands. Users build queries visually through a point-and-click interface, and they can use the Advanced Logic Editor to build complex filters by combining simpler ones.
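Building a complex filter out of simpler ones can be sketched in a few lines. The Python below is not the Query Builder's own code (the Query Builder is written in JavaScript). It is a minimal illustration that assumes CSDL combines conditions with the `and`, `or`, and `not` keywords and accepts parentheses for grouping. The helper names `all_of`, `any_of`, and `negate`, and the example targets, are hypothetical.

```python
def _group(expr: str) -> str:
    """Wrap a sub-filter in parentheses so combining it cannot change its meaning."""
    return f"({expr})"


def all_of(*exprs: str) -> str:
    """Every sub-filter must match (logical AND)."""
    return " and ".join(_group(e) for e in exprs)


def any_of(*exprs: str) -> str:
    """At least one sub-filter must match (logical OR)."""
    return " or ".join(_group(e) for e in exprs)


def negate(expr: str) -> str:
    """Exclude interactions that match the sub-filter (logical NOT)."""
    return f"not {_group(expr)}"


if __name__ == "__main__":
    # Hypothetical simple filters, as a user might build them one at a time.
    coffee = 'interaction.content contains "coffee"'
    tea = 'interaction.content contains "tea"'
    english = 'language.tag == "en"'
    print(all_of(any_of(coffee, tea), english))
    # prints: ((interaction.content contains "coffee") or (interaction.content contains "tea")) and (language.tag == "en")
```

Wrapping every operand in parentheses is deliberately conservative. The output contains some redundant brackets, but operator precedence can never silently change the grouping the user chose.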

Responsive design and standards compliance for the post-PC era

You worked hard on your site and the last thing you want to put on it is an ugly widget that clashes with the rest of the page. Rest assured that we've put a lot of time and effort into making sure the Query Builder is standards-compliant, responsive, and ready for post-PC touch screen devices. We strive to follow the latest standards for good design, responsiveness, and programming, be they official or commonly agreed upon. In a browser, the Query Builder supports the latest versions of Internet Explorer, Firefox, Safari, Opera, and Chrome.

The Query Builder is built with standard tools and technologies (JavaScript, HTML5, jQuery, and CSS). Its responsive design fits a broad range of screen sizes and is fully compatible with the iPhone, iPad, and Android devices as well as laptops and desktops. It even includes graphical assets for Retina displays. And since it works equally well with a mouse or a touch screen, filtering for answers in an ocean of a billion interactions is as easy as sending a Tweet from your iPhone.

Getting Started

The Query Builder project is hosted on GitHub. To embed it on a web page, log in to your server, change the working directory to the document root, and clone the repository with a single command:

    git clone https://github.com/datasift/editor.git

Alternatively, download the project archive and unpack it to the document root directory on your web server.

Either way, you should end up with a directory that contains several subdirectories. Most of the time you will only need datasift-editor/minified, unless you want to make extensive modifications to the code base and its resources. Read our configuration guides before you do that: in most cases, you only need to make small changes to the Query Builder's object initialization code by overriding the exposed configuration options.

The Query Builder produces code just like a programmer would using a code editor, but in a user-friendly way. The code is based on DataSift's Curated Stream Definition Language, CSDL, with added machine-readable comments that allow it to work with the Query Builder. This enhanced version of CSDL is known as JavaScript CSDL, or JCSDL.

This lets users generate and share CSDL code without knowing how to program or writing a single line of code. To deploy it, clone the Query Builder repository (or upload the files the Query Builder needs to your server) and add eight lines of HTML code to the page where you want to embed it.

Modular and highly customizable by nature, the Query Builder is easy to embed in a web page, a blog, or a web view inside a desktop or mobile application. You can customize it to match a variety of integration and branding requirements.

Customizing the Query Builder

The Query Builder code you can download today from GitHub is exactly the same code we use on our website. We give you full freedom of choice when it comes to the use of our code and the approach to implementation.

The simplest customization is to make the Query Builder follow the look and feel of your site by overriding its CSS style definitions with your own. To go a step further, you can replace the Query Builder's graphical assets with your own. The stylesheet is designed so that quick changes take minimal effort: load your own CSS after the original stylesheet, so that your rules take precedence, and all will be well.

Next, you might decide to customize the functionality and behavior of the Query Builder. You can modify the responsiveness of the interface or narrow down the choice of data sources available to users. Changes like these do not require extensive programming knowledge and can be implemented quickly by someone with a little JavaScript experience. You can find an example of restricting the Query Builder's functionality, along with a working demo, on our developer documentation site.
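Here is a language-neutral illustration of narrowing the choice of data sources. It filters a target dictionary down to an allow-list before the dictionary is shown to users. In the Query Builder itself, this is done through its JavaScript configuration options, which are documented on the developer site. The dictionary layout, the function name `restrict_sources`, and the target names below are all hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical layout: data source name -> target fields it offers.
TargetDictionary = Dict[str, List[str]]


def restrict_sources(targets: TargetDictionary, allowed: Iterable[str]) -> TargetDictionary:
    """Return a copy of `targets` that keeps only the allowed data sources.

    Names in `allowed` that do not exist in `targets` are ignored, and the
    original ordering of `targets` is preserved.
    """
    allowed_set = set(allowed)
    return {
        source: list(fields)
        for source, fields in targets.items()
        if source in allowed_set
    }


if __name__ == "__main__":
    all_targets: TargetDictionary = {
        "twitter": ["twitter.text", "twitter.user.screen_name"],
        "facebook": ["facebook.message"],
        "interaction": ["interaction.content", "interaction.type"],
    }
    # Only expose Twitter and the cross-source "interaction" targets.
    print(restrict_sources(all_targets, ["twitter", "interaction"]))
```

Returning a filtered copy instead of mutating the dictionary in place means one full dictionary can serve several differently restricted embeddings.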

We have added built-in help in the form of tooltips so that end users of the Query Builder can learn more about DataSift's targets and operators. The tooltips are downloaded directly from our servers, so any changes appear on your users' screens as soon as we publish them. You don't have to do anything unless you want to create your own tooltips.

We also support you, the developer. We have a whole site dedicated to the subject of embedding, styling, and configuring the Query Builder. 

Connecting to DataSift

Once your implementation of the Query Builder is fully operational, it's time to connect it to our platform. You need to capture the JCSDL generated by your users, pass it on to DataSift, capture the results, and present them back to the user. You have full freedom to implement your own solution here, as well as full freedom in how you manage users.
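As a starting point for the first hop (passing a user's JCSDL to DataSift), here is a minimal Python sketch that uses only the standard library. It assumes a REST compile endpoint at `https://api.datasift.com/v1/compile` that accepts the filter in a `csdl` form field and authenticates with an `Authorization: username:api_key` header. Treat the URL, the field name, and the header format as assumptions to check against the DataSift API documentation. The function names are hypothetical. The HTTP call is injectable so the request can be tested without network access.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict

# Assumption: verify this endpoint against the DataSift API documentation.
COMPILE_URL = "https://api.datasift.com/v1/compile"


def build_compile_request(jcsdl: str, username: str, api_key: str) -> urllib.request.Request:
    """Wrap the user's JCSDL in a form-encoded POST request.

    Following the post, the JCSDL is passed on to DataSift unchanged.
    """
    body = urllib.parse.urlencode({"csdl": jcsdl}).encode("utf-8")
    return urllib.request.Request(
        COMPILE_URL,
        data=body,
        # Assumed header format: "username:api_key".
        headers={"Authorization": f"{username}:{api_key}"},
        method="POST",
    )


def compile_filter(
    jcsdl: str,
    username: str,
    api_key: str,
    send: Callable[[urllib.request.Request], Any] = urllib.request.urlopen,
) -> Dict[str, Any]:
    """Send the request and decode the JSON reply into a dict."""
    request = build_compile_request(jcsdl, username, api_key)
    with send(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

The decoded reply is where your own processing and presentation code takes over.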

This is where you can add a lot of your own creativity and value. Processing and presentation of the results is one important area where you can create your own tools and make your users happy. We have prepared a sample implementation to get you started. Read through the code, try it, see what it does, and create your own magic. And you do not have to worry about backward compatibility. If you follow our configuration procedures, upgrading your installation of the Query Builder will be as simple as unpacking an archive.

You are also free to manage your own users in any way you like. You can require your users to provide their own DataSift credentials, or you can use a single set of DataSift credentials for company-wide access without having to manage multiple accounts. Alternatively, you can manage your users' accounts for them based on their internal credentials.
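These three approaches can coexist in one application. The sketch below shows one possible precedence order: first the credentials the user supplied, then an account you manage for that user, then a company-wide account. The `Credentials` type, the function name, and the order itself are hypothetical design choices, not something DataSift prescribes.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Credentials:
    """A DataSift username and API key (hypothetical container)."""

    username: str
    api_key: str


def resolve_credentials(
    user_id: str,
    user_supplied: Optional[Credentials] = None,
    per_user: Optional[Dict[str, Credentials]] = None,
    shared: Optional[Credentials] = None,
) -> Credentials:
    """Pick the credentials to use for one user's request.

    Hypothetical policy: the user's own credentials win, then an account you
    manage for that user, then the company-wide account.
    """
    if user_supplied is not None:
        return user_supplied
    if per_user and user_id in per_user:
        return per_user[user_id]
    if shared is not None:
        return shared
    raise LookupError(f"no DataSift credentials available for {user_id!r}")
```

Raising an error when nothing matches, rather than silently falling back, keeps a misconfigured deployment from sending requests under the wrong account.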

So, there you have it. Now go make something amazing and let the world know about it.

By Ed Stenson

Open Source Software at DataSift

DataSift is built on open source software. Here are some of the comments our developers have made on the subject:

    "It's like having a bigger team"

    "We learn from the best by reading and using their code."

    "Without open source, we wouldn't have PHP, we wouldn't have Python, we wouldn't have Perl."

    "At DataSift, we're building a world-class platform, and we need to use the very best tools for the job."

From PHP to Hadoop, everything we do to filter over one billion items every day is built with components that the international community of developers has shared. Even our favorite data delivery format, JSON, is an open standard. It's obvious that the future lies in open source.

DataSift engineers contribute to and release a great deal of open source software. Some of the most important projects we use and contribute to include:

  • Apache Hadoop - distributed computing framework, including HDFS and MapReduce
  • D3.js - a JavaScript library for turning data into dynamic, interactive graphics
  • Chef - configuration management tool
  • Redis - advanced key-value store
  • ZeroMQ - advanced socket library

Take a look at some of the open source projects we love, and see what else DataSift's engineers are building.

Development

Today, we're releasing a new data tool, the visual Query Builder. It's the latest in a series of open source projects, all of which are available from DataSift's GitHub account. Here's a summary of our recent work:

Query Builder

The Query Builder is a browser-based graphical tool that allows users to create and edit filters without needing to learn the DataSift Curated Stream Definition Language (CSDL). It started life as an internal project at DataSift, where our staff quickly recognized its potential. The Query Builder is a serious tool that can be used to build complex CSDL filters without using DataSift's Code Editor.

HubFlow

HubFlow is an adaptation of GitFlow and of the GitFlow tools Git extension for working with GitHub.

In his original blog post, Vincent Driessen lists all of the individual Git commands you need to create the different branches in the GitFlow model. They’re all standard Git commands … and if you’re also still getting your head around Git (and still learning why it is different to centralised source control systems like Subversion, or replicated source control systems like Mercurial), they add to what is already quite a steep learning curve.

Vincent created an extension for Git, called GitFlow, which turns most of the steps you need to do into one-line commands. At DataSift, we used it for six months, and we liked it - but we wanted it to do even more. We also wanted it to work better with GitHub, so to reduce confusion with the original GitFlow tools, we’ve decided to maintain our own fork of the GitFlow tools called HubFlow.

Arrow

The Arrow dashboard is a visualization tool designed to show the full capabilities of DataSift. It's a framework that helps us to visualize and analyze DataSift's output streams. The goal was to find a way to show the huge amount of information that we filter. Arrow is open source too; in other words, we built this awesome project and we want you to play with it!

The visualizations are written using the D3 library for rendering. We currently support three types of visualizations: pie charts, line charts, and maps.

We designed Arrow to be as flexible as possible, so you can pull out the visualizations and use them in your projects, or even create visualizations of your own.

This is only a glimpse of Arrow; there's much, much more to explore.

Dropwizard Extra

A suite of additional abstractions and utilities that extend Dropwizard, organized into several modules.

Sound of Twitter

A small application that uses DataSift to visualize sentiment on Twitter with lights and sounds. You can see a demo on YouTube or read more on DataSift Labs.

Sublime Text CSDL plug-in

A Sublime Text plug-in that validates and compiles DataSift CSDL, consumes a sample set of interactions, and provides correct syntax highlighting. Do it all without leaving Sublime Text!

Code and documentation licensing

The majority of open source software exclusively developed by DataSift is licensed under the liberal terms of the MIT License. The documentation is generally available under the Creative Commons Attribution 3.0 Unported License. In the end, you are free to use, modify and distribute any documentation, source code or examples within our open source projects as long as you adhere to the licensing conditions present within the projects.

Note that our engineers like to hack on their own open source projects in their free time. DataSift does not grant any license, whether express or implied, for code that our engineers publish outside our official repositories on GitHub.

Contact us

We support a variety of open source organizations and we're grateful to the open source community for their contributions. Our goal is to maintain our healthy, reciprocal relationship. If you have questions or encounter problems, please Tweet us at @DataSiftOS.

By Gerrit Schultz

Gerrit Schultz - Internship at DataSift

Gerrit Schultz describes the time he recently spent, from August to November, as an intern in the Development group at DataSift.

I'm very happy to have had the chance to work as an intern with DataSift as part of my university studies. It's certainly been a brilliant experience.

From the first day, I was involved in the regular development process, and after only a few days I could see my first work live in production. I had chosen to join the front-end team. The next sprint included a revamp of a central part of the UI, including the Stream Preview and CSDL Code Editor, as well as the integration of the Query Builder as a new feature. This meant that during the following month I could contribute to another big release while pursuing my goal of getting deeper into different JavaScript technologies. Being an active part of the team and working on a big code base in a real-world scenario gave me lots of hands-on experience.

An internship at DataSift is far from making coffee - unless you want some for yourself. The only time you spend in the kitchen is when you want to grab your favourite chocolate from the fridge. As an intern with DataSift you're given the chance to contribute to the development of serious software and broaden your knowledge in your own field of interest. I had the free choice of what I wanted the focus of my internship to be, and I've been given amazing support from everyone around me.

During my last few days, I was looking for some fresh input and wanted to try out a bit of Scala. As soon as I mentioned my interest, I was assigned a few tasks on a project for filtering Facebook posts, given an introduction to the existing code, and handed a Scala book. Everything is possible as long as you are keen on trying out new stuff.

Besides that, the working atmosphere is fantastic. Sometimes you almost forget that you're in an office. Occasional foam bullet gun fights are just as much part of the work life as helicopters being manoeuvred through the room and people playing card games after enjoying some delicious catering provided by DataSift for lunch. Even events like going to the cinema are arranged from time to time.

Overall, it's a great place to work and a very good choice for a challenging, enriching internship.

Gerrit Schultz

We're always looking for good people. If you have what it takes, if you're looking for an internship or a placement year, or if you're a recent graduate, you can reach us at [email protected] Please ensure that you are eligible to work in the United Kingdom and that you approach us directly, not through a recruitment agency.

By Lukas Klein

Lukas Klein - Internship at DataSift

Lukas Klein describes the time he recently spent, in August and September, as an intern in the Development group at DataSift.

The last month has been really exciting for me, Lukas, a 19-year-old student from Germany. I decided to work for DataSift as an intern before starting university, and it really paid off. Before this, I had worked either for small one-man businesses or for big players like SAP, but never for such a fast-growing startup. Even though I've never been to the Valley, I think working for DataSift is much like working for a San Francisco-based startup, except it's in the center of Europe.

The company is very engineering-driven, so decisions are made by people who really know how the technology works, which is a huge benefit if you're a developer. The first time I came into the office, which is located in the Enterprise Center of the University of Reading, everyone welcomed me warmly (until the first Nerf gun battle started, but that's another story) and showed me around.

When you work at DataSift, nobody tells you what to use; you can choose whatever tools you want, whether it's a Mac or a PC running Linux (if you're using Windows, DataSift is probably not the right place for you), vim or Sublime Text, Coke or Pepsi (the fridge is always full!). All the people in the office are highly skilled, and there's an expert for whatever question you have. (I'm sure there's even someone who can help you build a nuclear Nerf gun that can shoot Curiosity off Mars.)

I used my time at DataSift to dive into technologies I'd had little time to use before, such as Node.js, Backbone, and Redis. Working with the DataSift API is really straightforward, and by the end of my first day I had a visualization up and running that showed the current trending articles on bbc.co.uk as growing bubbles.

The great thing about DataSift is that you can turn almost any idea into a real product. One morning, when I was working on my CSDL (DataSift's own curation language), I thought it would be nice to do this in my favorite editor, Sublime Text. So I simply wrote a plug-in for Sublime Text and put it on GitHub, and minutes later Stuart forked it and helped me extend it. Here in the office everyone helps each other, and if you're stuck on a problem, it's only a matter of minutes until someone comes up with a great solution.

I've really enjoyed my time in the UK, and I can only advise every student who's into cutting-edge technology to check DataSift out. Conclusion: if you don't mind getting hit by several foam bullets a day and know how to assemble IKEA furniture (you have to build your own desk), you should definitely come to DataSift!

By Shruti Desai

Getting Started with DataSift

DataSift offers organizations a cloud-based platform to filter for real-time social media data. Every second, social media sites generate massive amounts of data. This data can provide valuable insight to your organization. DataSift filters for content as it is posted. For instance, you could filter for the mention of an individual, a message posted on a social media site, or all messages posted within a specified location. DataSift offers you an integrated solution that filters, aggregates, and delivers the exact content that you need. This blog post aims to help you understand the various features that the DataSift platform offers.

With DataSift, you can filter for content in real time. This is achieved with the help of DataSift's own programming language, the Curated Stream Definition Language (CSDL). You use CSDL to write simple pieces of code that filter for the content you need. The code for a single filter contains a target, an operator, and an argument. The target specifies the data source from which the content will be filtered, the argument specifies what you are trying to filter for, and the operator defines how the target is compared against the argument. Once you save and run the code, it is referred to as a data stream. The data stream filters for the content you want and delivers the output data in JSON (JavaScript Object Notation), a lightweight and easy-to-read format. You can store this output data in DataSift or use your own data destination. You can create a recording and export the output data received from your streams. You can also go back in time and filter for content posted in the past by creating a Historics query for a data stream.
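To make the target, operator, and argument structure concrete, here is a minimal Python sketch that models a single filter and renders it as a line of CSDL. It assumes that CSDL string arguments are double-quoted with backslash escapes and that numeric arguments are written bare. The `Condition` class is hypothetical, and the example targets are illustrations only, so check real target names against the DataSift documentation.

```python
from dataclasses import dataclass
from typing import Union

Argument = Union[str, int, float]


@dataclass(frozen=True)
class Condition:
    """A single CSDL filter: target, operator, and argument."""

    target: str      # where to look, e.g. "interaction.content"
    operator: str    # how to compare, e.g. "contains" or ">"
    argument: Argument

    def to_csdl(self) -> str:
        """Render the condition as one line of CSDL."""
        if isinstance(self.argument, str):
            # Assumed quoting rule: escape backslashes first, then double quotes.
            escaped = self.argument.replace("\\", "\\\\").replace('"', '\\"')
            rendered = f'"{escaped}"'
        else:
            rendered = str(self.argument)
        return f"{self.target} {self.operator} {rendered}"


if __name__ == "__main__":
    print(Condition("interaction.content", "contains", "coffee").to_csdl())
    # prints: interaction.content contains "coffee"
```

Keeping the three parts as separate fields, instead of building strings by hand, means quoting is handled in one place. A point-and-click tool can also fill in each part independently.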

Now that you are familiar with how DataSift works, let's look at the DataSift UI and learn how to get started.

The Dashboard

The DataSift platform is easy to navigate, and the first step is to create a DataSift account. You can register with your email address or with your Twitter, Facebook, LinkedIn, Google, Foursquare, or Yahoo account.

After signing up, log in to your DataSift account to access your Dashboard. The Dashboard is the control panel for your account: you can manage your account from here and access many of DataSift's features. The Dashboard also displays your API details, which are required for authentication when you use the DataSift API.

You can also access Settings from the Dashboard, where you can manage your account settings, such as account details, billing details, data licenses, identities, and password.

The Dashboard provides six tabs that take you to the different features that make up DataSift. Let's look at these features briefly.

Streams

You can create new streams or access existing ones by clicking the Streams tab. You can create streams with the Visual Query Builder or by writing CSDL code manually in the Code Editor.

Visual Query Builder

You don't have to be a developer to create filters for social media data streams. The Visual Query Builder allows you to construct filters for complex social media data streams without writing CSDL code. Simply choose a data source such as Twitter, then choose the relevant target from the list of available targets, and finally select or enter an argument describing what you want to filter for.

You can customize the Visual Query Builder to allow users to build queries for a limited set of targets. You can also style it to match your organization's visual identity.

CSDL Code Editor

More advanced users, such as developers, may prefer to work directly in our CSDL Code Editor. To create a stream in the Code Editor, simply enter the CSDL code that defines the content you want to filter for. When you click Save & Close, the editor validates your code and notifies you if it finds an error.

Once you have created a stream, you can:

  • preview the output data from your stream.
  • consume the stream via the API.
  • record the stream and export the output data.
  • create a Historics query for your stream.
  • share the CSDL code of your stream.
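Consuming a stream ultimately means reading JSON interactions and pulling out the fields you care about. The sketch below assumes that interactions arrive one JSON object per line. It also assumes that fields are addressed by the same dotted names used for CSDL targets (for example `interaction.content`). Verify both the delivery format and the field names against the DataSift documentation. The function names are hypothetical.

```python
import json
from typing import Any, Iterable, Iterator, Optional


def get_path(obj: Any, dotted: str) -> Optional[Any]:
    """Follow a dotted path such as 'interaction.content' through nested dicts.

    Returns None if any step along the path is missing.
    """
    current = obj
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def iter_field(lines: Iterable[str], dotted: str) -> Iterator[Any]:
    """Yield one field from each JSON interaction (assumed one per line).

    Blank lines and interactions that lack the field are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        value = get_path(json.loads(line), dotted)
        if value is not None:
            yield value


if __name__ == "__main__":
    sample = [
        '{"interaction": {"type": "twitter", "content": "I love coffee"}}',
        "",
        '{"interaction": {"type": "facebook"}}',
    ]
    for content in iter_field(sample, "interaction.content"):
        print(content)
```

Because `iter_field` is a generator, it processes interactions as they are read rather than loading a whole stream into memory.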


Tasks

Once you have created your first streams, you can perform tasks on them. To access or monitor these tasks, click the Tasks tab, which displays all your existing tasks. From there, you can also delete tasks or export data from them. You can perform two main tasks on your streams:

  • Create a recording of your stream by clicking the Start a Recording button.
  • Create a Historics query of your stream by clicking the New Historics query button.

Data sources

You can use the DataSift platform to filter for content from a range of data sources such as Twitter, Facebook, and Amazon. The Data Sources tab displays all the websites from which we acquire data for your streams. Our sources include a range of blogs, message boards, and media-sharing websites, as well as some of the most widely used social media sites. Keep in mind, however, that you must activate a data source and sign its license if you want to receive its data in your stream output.

Data destinations

DataSift can also export your output data to a range of data destinations, such as FTP, HTTP, SFTP, Amazon S3, Amazon DynamoDB, ElasticSearch, and Splunk Storm. You can view and access these by clicking the Data Destinations tab, where you can also add or edit settings for individual destinations. You must ensure that each destination is correctly configured with its own settings, including authentication details. DataSift lets you test the connection from the platform to your data destinations.

Billing

You can monitor your usage statistics and the costs of streams that are currently running from the Billing tab. You can also view the total costs, usage, data volume, connected hours, and historic hours for the last seven days.

Summary

DataSift offers state-of-the-art technology to filter real-time data relevant to your organization, through a feature-packed user interface that is intuitive and easy to use. The DataSift GUI can be used by non-developers as well as advanced users. You can create streams to filter for content, record those streams, export their output data, and create Historics queries to retrieve data from the past. You can also view the data sources that your streams draw their content from. To export the output data from your streams to external data storage, you can configure your own data destinations. Any activities you perform through our UI or the API are logged in your usage statistics, and you can also view your billing details and DPU usage.

To try out and preview the DataSift platform, sign up today for a free trial.


Subscribe to Datasift Documentation Blog