Recording and exporting data

Shruti Desai | 7th November 2012

The DataSift UI has a new tab. You can now use the Tasks tab to manage recording and exporting of the output data received from your streams. You can also use it to record and track Historic queries.
Recordings can be scheduled, or you can start a recording and stop it manually. Once a recording has been running for over an hour, you can export it, either in parts or as a whole.
Log in to DataSift to try it out.

Recording a stream

You can record a stream by clicking on the Streams tab and selecting a stream to record.

Alternatively, you can use the Tasks tab to start a new recording.

  1. Click the Tasks tab at the top of the screen.

    This page displays links to access and manage recordings and data exports, as well as Historic queries.

  2. Click Start a Recording.

  3. Select a stream.

  4. Check the Start now box or enter a start time.
  5. Enter a finish time. Alternatively, leave the Finish field blank if you want to stop the recording manually.
  6. Give the task a name.

  7. Click Create.

    A summary page displays details such as the name of the Stream, the Timeframe and the Processing Cost.

  8. Click Confirm & Start Recording to continue.

    Your recording has been created. You can stop the recording at any time by clicking Stop Task.

  9. Select your recording to view how many interactions have been recorded by DataSift.

Exporting a stream

  1. Click the Tasks tab at the top of the screen.

    Your recordings and Historic queries are displayed on this page. Find the stream you want to export from the list of Recordings and click Export Data for that stream.

  2. Enter a name for your new export.
  3. Select a format: JSON or CSV.
  4. Choose the start date and time. By default, the export begins at the start of the recording, but you can change it to start at any point between the start and finish times of your recording.
  5. Choose a finish date and time.
  6. Select a destination for your export: DataSift Storage or Amazon S3.

    When you select Amazon S3 as the destination for your export, you need to provide access credentials from your Amazon S3 account: the Access Key ID and the Secret Key. You also need to provide the name of the bucket where you want to store your export.

    For more information on using Amazon S3 for storage, please see the Addendum below.

  7. Uncheck the All checkbox for Filter Columns to choose which targets to include and which to exclude. By default, the export includes all the targets that DataSift has recorded for you.

  8. Click Create.

    You can track the progress of your export. Once the export is ready, a Download link is available. If you selected DataSift Storage as the destination for your export, the download link will expire in 7 days.
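The time constraints on an export (the recording must have run for over an hour, and the export window must fall inside the recording) can be sketched as a small check. This is only an illustration of the rules described above, not DataSift's own validation; the function name `check_export_window` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta

# "When a recording has been running for over an hour, you can export it."
MIN_RECORDING_LENGTH = timedelta(hours=1)


def check_export_window(recording_start, recording_end, export_start, export_end):
    """Return a list of problems with a requested export window (empty if none).

    Hypothetical helper: it only mirrors the rules stated in this post.
    For a recording that is still running, pass the current time as
    recording_end.
    """
    problems = []
    if recording_end - recording_start <= MIN_RECORDING_LENGTH:
        problems.append("recording has not been running for over an hour")
    if not (recording_start <= export_start <= recording_end):
        problems.append("export start is outside the recording")
    if not (export_start < export_end <= recording_end):
        problems.append("export finish must follow the start and fall within the recording")
    return problems
```

For example, a recording that ran from 09:00 to 12:00 accepts an export of 10:00 to 11:00, while an export starting at 08:00 is reported as starting outside the recording.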

Deleting a recording

  1. Click the Tasks tab at the top of the screen.

  2. Select the Recordings tab on the left. Your recordings are listed on this page.

  3. To delete a recording, click Delete Task for that recording.

Output formats

Recordings can be large, so DataSift compresses the data. Export files are downloaded in GZIP format, and you must unpack them before you can use them.
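If you prefer to unpack an export in code rather than with a desktop tool, a minimal Python sketch might look like the following. It assumes a JSON export that holds one object per line; that layout and the function name `read_json_export` are assumptions for illustration, so check your own file before relying on it.

```python
import gzip
import json


def read_json_export(path):
    """Yield one interaction (a dict) per non-empty line of a gzipped JSON export.

    Assumption: the export stores one JSON object per line.
    """
    # "rt" opens the compressed file as text, decompressing on the fly.
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Reading line by line keeps memory use low, which matters for long recordings.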

You can receive the data in comma-separated value (CSV) format or JavaScript Object Notation (JSON) format.

  • CSV is easy to parse, but the simplest way to look at a CSV file is to import it into a spreadsheet.
  • JSON is an easy-to-read, lightweight format for data exchange. Objects are sent in text format, one after another. In raw format it can be difficult to read, but a good free formatter is jsonlint.com. Try formatting this sample of raw JSON in jsonlint.com.

    {"company name": "DataSift", "sector": "big data software", "location" : "San Francisco", "products": ["DataSift","TweetMeme"]}
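As an alternative to an online formatter, the same sample can be formatted locally. This short sketch uses only Python's standard `json` module; the variable names are illustrative.

```python
import json

# The raw sample from above, as a single line of text.
raw = '{"company name": "DataSift", "sector": "big data software", "location" : "San Francisco", "products": ["DataSift","TweetMeme"]}'

record = json.loads(raw)                  # parse the text into a dict
formatted = json.dumps(record, indent=2)  # re-serialize with indentation
print(formatted)
```

The parsed `record` is an ordinary dictionary, so `record["products"]` gives the list of product names.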


Addendum

Please note that when you use the Amazon S3 storage service to store your exports, certain restrictions apply to the name of the bucket where you store the export and to your Amazon Secret Key. These restrictions are:

  • You cannot use a bucket with an underscore ( _ ) in the bucket name. Any restrictions by Amazon on naming an S3 bucket will also apply. To view naming conventions for an Amazon S3 bucket, please refer to Amazon S3 documentation.
  • You cannot use an Amazon Secret Key that includes a forward slash ( / ). If your Amazon Secret Key includes a forward slash, please export to DataSift storage first, download the export, and then upload it to your Amazon S3 storage.

If ignored, these restrictions are known to cause problems within our API, and the export will be unsuccessful. Make sure you use the alternative methods suggested above to work around them.

Please also note that these restrictions do not apply to Push connectors.
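The two restrictions above are easy to check before you create an export. The sketch below is a hypothetical pre-flight helper, not part of any DataSift or Amazon library, and it does not cover Amazon's own bucket naming rules.

```python
def s3_export_problems(bucket_name, secret_key):
    """Return the restrictions from this addendum that the given values violate.

    Hypothetical helper: it checks only the two DataSift-specific rules
    described above, not Amazon's general bucket naming rules.
    """
    problems = []
    if "_" in bucket_name:
        problems.append("bucket name contains an underscore")
    if "/" in secret_key:
        problems.append("secret key contains a forward slash")
    return problems
```

An empty list means neither restriction applies; otherwise, fall back to the workarounds described above.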

