This connector is available only to customers on the Enterprise Edition.

 

Zoomdata is a data analysis and visualization service that helps you turn raw numeric and textual data into animated, graphical representations.

Configuring Zoomdata for Push delivery

To use Zoomdata with Push delivery, follow the instructions below, skipping any steps you have already completed. In this guide we use a 64-bit CentOS 6.4 Linux distribution running on an EC2 M1 Medium Instance:
 

  1. Launch a 64-bit CentOS 6 image on an EC2 M1 Medium Instance.

    Make sure that you have a copy of the key required to connect to the newly created instance.
     
  2. Set up security policies.

    Modify the Security Group associated with your CentOS server instance to allow inbound connections to ports 22 and 8443.

    Check your firewall settings: iptables on the server may still block packets even after you open the right ports in the AWS firewall, which is independent of iptables. As a last resort, you can flush all iptables rules with this command:
     
    sudo iptables -F
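
    A less drastic approach is to open only the two required ports. The commands below are a sketch, assuming the default iptables service on CentOS 6; adjust them if your server uses a custom rule set:

    sudo iptables -I INPUT -p tcp --dport 22 -j ACCEPT
    sudo iptables -I INPUT -p tcp --dport 8443 -j ACCEPT
    sudo service iptables save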
     
  3. Try to log in to your CentOS server using ssh.

    If this is the first time, use the root username. Be careful: you will be logged in as the superuser, with full control over the system.
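
    For example (the key file name below is a placeholder for the key you downloaded when you launched the instance):

    ssh -i zoomdata-key.pem root@ec2-XXX-XXX-XXX-XXX.location.compute.amazonaws.com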
     
  4. Update CentOS:
     
    su -c 'yum update'
     
  5. Download the Zoomdata CentOS RPM package.

    Zoomdata is distributed as a virtual appliance or as a CentOS RPM package. For the purpose of this document we are going to use the RPM package.
     
  6. Download support packages listed on the Zoomdata documentation site.
     
  7. Copy the Zoomdata RPM and support packages to your CentOS server.
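
    For example, using scp (the package file names below are placeholders for the files you downloaded in the previous steps):

    scp -i zoomdata-key.pem zoomdata.rpm support-package.rpm root@ec2-XXX-XXX-XXX-XXX.location.compute.amazonaws.com:/tmp/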
     
  8. Install support packages and Zoomdata.

    Follow the instructions on the Zoomdata documentation site.
     
  9. Start MongoDB.

    mongod --config /etc/mongod.conf
     
  10. Start Zoomdata.

    /opt/zoomdata/bin/startup.sh
     
  11. Connect to your Zoomdata server.

    Type https://ec2-XXX-XXX-XXX-XXX.location.compute.amazonaws.com:8443/zoomdata-web/ into your web browser.

    Use username admin and password admin.
     
  12. You are now ready to set up the Zoomdata connector.
     

Configuring Push for delivery to Zoomdata

  1. To enable delivery, you will need to define a stream or a Historics query. Both return important details required for a Push subscription: a successful stream definition returns a hash, while a Historics query returns an id. You will need either (but not both) to set the value of the hash or historic_id parameter in a call to /push/create. To obtain that information, make a call to /push/get or /historics/get, or use the DataSift dashboard.
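
    For example, a call like the one below lists your Historics queries and their ids. This is a sketch only: replace the username and API key placeholders with your own credentials, and note that the authentication header follows the general pattern of the DataSift REST API and is an assumption here.

    curl -X GET 'https://api.datasift.com/v1/historics/get' \
        -H 'Authorization: YOUR_USERNAME:YOUR_API_KEY'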
     
  2. Once you have the stream hash or the Historics id, you can give that information to /push/create. In the example below we are making that call using curl, but you are free to use any programming language or tool.
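
    The call below is a sketch only: replace the username, API key, and stream hash placeholders with your own values, and substitute the Zoomdata connection details (host, port, source, and credentials) described under Output parameters below. The name and output_type parameters and the authentication header are not defined in this document; they follow the general pattern of the DataSift REST API, so check the API documentation for the exact form your account requires.

    curl -X POST 'https://api.datasift.com/v1/push/create' \
        -H 'Authorization: YOUR_USERNAME:YOUR_API_KEY' \
        -d 'name=zoomdata-example' \
        -d 'hash=YOUR_STREAM_HASH' \
        -d 'output_type=zoomdata' \
        -d 'output_params.host=https://zoomdata.example.com' \
        -d 'output_params.port=8443' \
        -d 'output_params.source=DataSift' \
        -d 'output_params.auth.username=joe' \
        -d 'output_params.auth.password=secret'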


    For more information, read the step-by-step guide to the API to learn how to use Push with DataSift's APIs.
     
  3. When a call to /push/create is successful, you will receive a response that contains a Push subscription id. You will need that information to make successful calls to all other Push API endpoints (/push/delete, /push/stop, and others). You can retrieve the list of your subscription ids with a call to /push/get.
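
    For example (a sketch; substitute your own credentials, and note that the authentication header is an assumption based on the general DataSift REST API):

    curl -X GET 'https://api.datasift.com/v1/push/get' \
        -H 'Authorization: YOUR_USERNAME:YOUR_API_KEY'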
     
  4. You should now check that the data is being delivered to your server. Log in and use the graphical interface to explore data.

    Please remember that the earliest you can expect the first data delivery is one second, with subsequent deliveries performed at 10-second intervals.

    If there is a longer delay, the stream may have no data in it, or there may be a problem with your server's configuration. If you want more information, make a call to /push/log and check the value of the success field. If it is set to failure, check the value of the message field for clues. Also, make a call to /push/get and see whether the response includes information about DataSift retrying to deliver data to your data delivery destination. When the status field is set to retrying, you should verify that your server is receiving data.
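
    For example, this sketch retrieves your Push delivery log (substitute your own credentials; the authentication header is an assumption based on the general DataSift REST API):

    curl -X GET 'https://api.datasift.com/v1/push/log' \
        -H 'Authorization: YOUR_USERNAME:YOUR_API_KEY'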

    Please make sure that you watch your usage and add funds to your account when it is running low. Also, stop any subscriptions that are no longer needed; otherwise you will be charged for their usage. There is no need to delete them: you can have as many stopped subscriptions as you like without paying for them. Remember that any subscriptions that were paused automatically due to insufficient funds will resume when you add funds to your account.
     
  5. To stop delivery, call /push/stop. To remove your subscription completely, call /push/delete.
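
    For example (a sketch; both endpoints require the subscription id returned by /push/create, and the id parameter name and authentication header are assumptions based on the general DataSift REST API):

    curl -X POST 'https://api.datasift.com/v1/push/stop' \
        -H 'Authorization: YOUR_USERNAME:YOUR_API_KEY' \
        -d 'id=YOUR_SUBSCRIPTION_ID'

    curl -X POST 'https://api.datasift.com/v1/push/delete' \
        -H 'Authorization: YOUR_USERNAME:YOUR_API_KEY' \
        -d 'id=YOUR_SUBSCRIPTION_ID'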
     
  6. Familiarize yourself with the output parameters (for example, the host name) that you'll need when you send data to a Zoomdata server.

Notes

Twitter sends delete messages which identify Tweets that have been deleted. Under your licensing terms, you must process these delete messages and delete the corresponding Tweets from your storage.

 

Output parameters

Parameter: Description:
output_params.host
required

The HTTPS URL (a host name or an IP address) of the Zoomdata host that DataSift will connect to.
Example value: https://zoomdata.example.com

output_params.port
required

The port that you want to use on your server.
Example value: 8443

output_params.source
optional
default = DataSift

The label used to mark data delivered to Zoomdata.

This will be used to label data sets in the Zoomdata interface and database.
Example value: DataSift

output_params.auth.username
required

The username for authorization.
Example value: joe

output_params.auth.password
required

The password for authorization.
Example value: secret

 

Data format delivered: 

JSON document containing an array of JSON objects, each representing one DataSift interaction. Here's an example of the output.
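
The sketch below is illustrative only: the exact fields delivered depend on your stream definition and data sources, and the field names and values shown are generic examples rather than values taken from this document.

[
    {
        "interaction": {
            "id": "1e1e875ad43aa234a074f5c1e2e4d111",
            "type": "twitter",
            "created_at": "Mon, 01 Jul 2013 12:00:00 +0000",
            "content": "Example interaction content"
        }
    },
    {
        "interaction": {
            "id": "1e1e875ad43aa234a074f5c1e2e4d222",
            "type": "facebook",
            "created_at": "Mon, 01 Jul 2013 12:00:05 +0000",
            "content": "Another example interaction"
        }
    }
]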

Storage type: 

For each delivery, DataSift sends one file containing all the available interactions.

Limitations: 

Take care when you set the max_size and delivery_frequency output parameters. If your stream generates data faster than you allow it to be delivered, the buffer will fill up until it reaches the point where data may be discarded.