Falling prices of memory chips make it possible to process large amounts of data in RAM. One of the tools that come in very handy when you need to manage gigabytes of data in RAM is Redis, an advanced key-value store distributed under an Open Source license.

DataSift can deliver interactions in a format of your choice straight to your Redis server. You will need to set up your own Redis server. It is your responsibility to ensure that your Redis server can handle the volume of data sent by DataSift.

Here are some guidelines to help you use Redis data destinations successfully with Push. Read the step-by-step guide to the API to learn how to work with DataSift's APIs.

Configuring Redis for Push delivery

To set up and use Redis for Push delivery, follow the instructions below, skipping the steps you have already completed. We are going to use Debian and Ubuntu Linux distributions for our examples, but the principles apply to any operating system:

  1. Update your operating system's package information. Refer to your system's specific instructions for this step. Debian and Ubuntu users can do it using:

    sudo apt-get update
     
  2. Install Redis on your system. The exact commands used for that purpose will differ from one operating system to another. Debian and Ubuntu users will issue the following command:

    sudo apt-get install redis-server

    You can also build Redis from the sources, if you want to use the latest version and if you are not afraid of building software yourself. For more help, refer to your system's documentation and the Redis installation guide.

    If the installation fails, check that your OS update was successful and try again.
     
  3. Add the following line to /etc/redis/redis.conf:

    port 6379

    This line may already be present in the Redis configuration file. You can set the port parameter to any other value in the valid TCP port range (1-65535), although for security reasons you should only use non-privileged ports (1024-65535).
     
  4. Remove the following line from /etc/redis/redis.conf:

    bind 127.0.0.1

    When you remove bind, the Redis server will listen for connections on all interfaces. If you want to set bind to a specific address, use ifconfig to find out what IP addresses have been assigned to which interface.
     
  5. Save the changes and restart Redis:

    sudo /etc/init.d/redis-server restart
     
  6. Test that Redis is working and that you can reach it from the outside. You can do that with telnet on a computer connected to the public internet:

    telnet redis.example.com 6379

    The server should respond with a message similar to the one shown below:

    Connected to redis.example.com.
    Escape character is '^]'.

     
  7. Type the following command:

    ping
     
  8. Hit the Enter/Return key. The server should respond with the following message:

    +PONG

    This confirms that you can send commands to a remote Redis server and receive responses.
     
  9. Type the following command:

    quit
     
  10. Hit the Enter/Return key. The server should respond with the following message:

    +OK

    The telnet client will then exit with the following message:

    Connection closed by foreign host.
     
  11. You are now ready to set up the Redis connector.
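
If you prefer a script to an interactive telnet session, the reachability check in steps 6 to 10 can be sketched in Python. This is a minimal sketch, not part of DataSift's tooling: the function names ping_over and redis_ping are made up for illustration, and the sketch assumes that the server does not require a password (a protected server would answer with an error line instead of +PONG).

```python
import socket


def ping_over(conn: socket.socket) -> str:
    """Send PING on an already connected socket and return the reply line."""
    # Redis accepts "inline" commands terminated by CRLF, which is what a
    # telnet session sends when you type a command and hit Enter.
    conn.sendall(b"PING\r\n")
    reply = b""
    while not reply.endswith(b"\r\n"):
        chunk = conn.recv(64)
        if not chunk:  # the server closed the connection early
            break
        reply += chunk
    return reply.decode("ascii", errors="replace").strip()


def redis_ping(host: str, port: int = 6379, timeout: float = 5.0) -> str:
    """Connect to host:port, send PING, and return the server's reply line."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        return ping_over(conn)
```

Calling redis_ping("redis.example.com") from a machine outside your network should return "+PONG" when the server is reachable. A connection error or timeout suggests that the port or bind settings, or a firewall, are blocking access.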
     

 

Configuring Push for delivery to Redis

  1. To enable delivery, you will need to define a stream or a Historics query. Both return important details required for a Push subscription: a successful stream definition returns a hash, and a Historics query returns an id. You will need one of them (but not both) to set the value of the hash or historic_id parameter in a call to /push/create. If you do not have that information to hand, you can obtain it with a call to /push/get or /historics/get, or from the DataSift dashboard.
     
  2. Once you have the stream hash or the Historics id, you can supply that information to /push/create. You can make that call using curl or any other programming language or tool.
     
  3. For more information, read the step-by-step guide to the API to learn how to use Push with DataSift's APIs.
     
  4. When a call to /push/create is successful, you will receive a response that contains a Push subscription id. You will need that information to make successful calls to all other Push API endpoints (/push/delete, /push/stop, and others). You can retrieve the list of your subscription ids with a call to /push/get.
     
  5. You should now check that the data is being delivered to your server. Use telnet to connect to the remote Redis server and get the list of keys using the following command:

    keys *

    When DataSift is able to connect and deliver interactions to your Redis server, the list of keys should include the key defined in output_params.list (see "Output parameters"). Use that name as the parameter of the following Redis command:

    llen RedisListName

    As new interactions are delivered, the number of interactions reported by the llen command will grow. If you notice a delay in the delivery of interactions, this might be because the stream has no data in it or because there is a problem with your server's configuration. In the first case, preview your stream using the DataSift web console; in the second case, make a call to /push/log to see if it has any additional information.

    Please make sure that you watch your usage and add funds to your account when it is running low. Also, stop any subscriptions that are no longer needed; otherwise you will be charged for their usage. There is no need to delete them: you can have as many stopped subscriptions as you like without paying for them. Remember that any subscriptions that were paused automatically due to insufficient funds will resume when you add funds to your account.
     
  6. To stop delivery, call /push/stop. To remove your subscription completely, call /push/delete.
     
  7. Familiarize yourself with the output parameters (for example, the host name) you'll need to know when you send data to a Redis server.
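
The delivery check in step 5 can also be scripted. The sketch below encodes the llen command in the Redis wire protocol and reads the integer reply. It is an illustration under stated assumptions, not DataSift code: the helper names are hypothetical, RedisListName stands for whatever you set in output_params.list, and the sketch assumes a server without a password and a list stored in the default database.

```python
import socket


def encode_command(*parts: str) -> bytes:
    """Encode a command as a Redis protocol array of bulk strings."""
    out = [b"*%d\r\n" % len(parts)]
    for part in parts:
        data = part.encode("utf-8")
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def read_line(conn: socket.socket) -> bytes:
    """Read one CRLF-terminated reply line and return it without the CRLF."""
    line = b""
    while not line.endswith(b"\r\n"):
        chunk = conn.recv(1)
        if not chunk:
            raise ConnectionError("connection closed before a full reply arrived")
        line += chunk
    return line[:-2]


def llen_over(conn: socket.socket, list_name: str) -> int:
    """Send LLEN for list_name on a connected socket and return the length."""
    conn.sendall(encode_command("LLEN", list_name))
    reply = read_line(conn)
    if not reply.startswith(b":"):  # integer replies start with a colon
        raise RuntimeError("unexpected reply: %r" % reply)
    return int(reply[1:])


def list_length(host: str, list_name: str, port: int = 6379,
                timeout: float = 5.0) -> int:
    """Connect to the server and return the current length of the list."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        return llen_over(conn, list_name)
```

Calling list_length("redis.example.com", "RedisListName") twice, a few seconds apart, and comparing the two numbers shows whether interactions are still arriving.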

 

Notes

Twitter sends delete messages which identify Tweets that have been deleted. Under your licensing terms, you must process these delete messages and delete the corresponding Tweets from your storage.
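
As a rough illustration of what processing delete messages can look like on your side, here is a minimal Python sketch. It rests on assumptions that this page does not confirm: a delete message is assumed to be a JSON object with a top-level "deleted": true flag, and both regular interactions and delete messages are assumed to identify the Tweet through interaction.id. The function name apply_items and the dictionary used as storage are hypothetical; check the layout of the messages you actually receive before relying on it.

```python
import json


def apply_items(raw_items, store):
    """Apply delivered JSON strings to store, a dict keyed by interaction id.

    Assumption: delete messages carry "deleted": true at the top level.
    """
    for raw in raw_items:
        item = json.loads(raw)
        interaction_id = item.get("interaction", {}).get("id")
        if interaction_id is None:
            continue  # nothing to key on, so skip the item
        if item.get("deleted") is True:
            # Remove the stored Tweet; ignore ids that were never stored.
            store.pop(interaction_id, None)
        else:
            store[interaction_id] = item
    return store
```

The same idea applies to any storage backend: look up the id carried by the delete message and remove the matching record.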

 

Output parameters

DataSift sends JSON data once it becomes available, according to the "delivery_frequency" interval you configure in the output parameters. Data is bundled into batches of up to "max_size" bytes. Your server must respond with a success message upon each delivery.

Parameter (required or optional): Description

output_params.host (required)
  The name of the Redis host that DataSift will connect to.

output_params.port (optional)
  The port that you want to use on your server.

output_params.database (optional)
  The numeric id of an existing database.

output_params.list (required)
  The name of a list that stores interactions. The list does not have to exist; it will be created if necessary.

output_params.format (optional, default = json_interaction)
  The output format for your data:
  • json_interaction_meta - This is specific to the Redis connector for now. Each interaction is sent separately, framed with metadata.
  • json_interaction - This is specific to the Redis connector for now. Each interaction is sent separately with no metadata.

  If you omit this parameter or set it to json_interaction_meta, each interaction will be delivered with accompanying metadata. Both the interactions and the metadata are delivered as JSON objects.

  Take a look at our Sample Output for File-Based Connectors page.

  If you select json_interaction, DataSift omits the metadata and sends each interaction as a single JSON object.

output_params.auth.password (optional)
  The password for authentication.
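
The parameters above are sent as part of a call to /push/create. As a sketch of how they fit together, the hypothetical helper below assembles the request fields and enforces the rule that you supply a stream hash or a Historics id, but not both. The output_params names, hash, and historic_id are taken from this page; the name and output_type fields and the value "redis" are assumptions, so check them against the /push/create reference before use.

```python
from urllib.parse import urlencode


def build_push_create_params(name, host, list_name, stream_hash=None,
                             historic_id=None, port=None, database=None,
                             fmt=None, password=None):
    """Return the fields for a /push/create call as a dictionary."""
    if (stream_hash is None) == (historic_id is None):
        raise ValueError("supply exactly one of stream_hash or historic_id")
    params = {
        "name": name,            # assumed field: a label for the subscription
        "output_type": "redis",  # assumed field and value: selects the connector
        "output_params.host": host,
        "output_params.list": list_name,
    }
    if stream_hash is not None:
        params["hash"] = stream_hash
    else:
        params["historic_id"] = historic_id
    optional = {
        "output_params.port": port,
        "output_params.database": database,
        "output_params.format": fmt,
        "output_params.auth.password": password,
    }
    # Leave out optional parameters that were not provided.
    params.update({k: v for k, v in optional.items() if v is not None})
    return params


def as_form_body(params):
    """Encode the fields as an application/x-www-form-urlencoded body."""
    return urlencode(params)
```

The resulting body can be passed to curl or to any HTTP client of your choice.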

 

Data format delivered: 

A JSON document containing an array of JSON objects, each representing one DataSift interaction. The output may include meta information, depending on the format output parameter described under "Output parameters" above.

Storage type: 

For each delivery, DataSift sends all the data that is available. It is up to you to configure and manage the Redis server to handle the storage.

Limitations: 

Make sure that your server has plenty of RAM, because Redis keeps the delivered data in memory. If one machine is not capable of handling the amount of data sent by DataSift, have a look at distributed Redis management solutions. If the stream your Push subscription is based on generates data faster than your delivery settings allow it to be delivered, the buffer will fill up until it reaches the point where data may be discarded.
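
To get a feel for how much RAM you might need, a back-of-the-envelope estimate can help. The sketch below is plain arithmetic, not a DataSift formula: the function name is hypothetical, the input figures are invented for illustration, and the result ignores the per-item overhead that Redis adds, so treat it as a lower bound.

```python
def estimated_backlog_bytes(interactions_per_second: float,
                            avg_interaction_bytes: float,
                            backlog_seconds: float) -> int:
    """Estimate the raw payload size of interactions held in the list.

    backlog_seconds is how long interactions stay in Redis before your
    own application removes them.
    """
    return int(interactions_per_second * avg_interaction_bytes * backlog_seconds)
```

For example, with the made-up figures of 50 interactions per second, 3,000 bytes per interaction, and one hour of backlog, estimated_backlog_bytes(50, 3000, 3600) gives 540,000,000 bytes of payload.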