Amazon AWS DynamoDB

Updated on Tuesday, 5 August, 2014 - 14:12

Amazon AWS DynamoDB is a high-performance, scalable NoSQL database as a service. It saves you time and money, because it is hosted in the Amazon AWS cloud and you only pay for the resources you actually use.

So that you can take advantage of the power and scalability of Amazon Web Services and avoid the burden of managing your own data store, we can deliver your data directly to Amazon DynamoDB.

Configuring Amazon AWS DynamoDB for Push delivery

To use Amazon AWS DynamoDB with Push delivery, follow the instructions below, skipping the steps you have already completed. It does not matter which operating system you use, as long as you can connect to the internet:

  1. Create a new DynamoDB table.

    There are two ways to do this: programmatically or via a web browser. Whichever way you choose, make sure that the table's Primary Key Type is set to Hash and Range, that its Hash Attribute Name is set to InteractionID, and that its Range Attribute Name is set to InteractionCreatedAt.
     
  2. You are now ready to set up the Amazon AWS DynamoDB connector.
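
If you create the table programmatically, the key layout described in step 1 can be sketched as a set of DynamoDB CreateTable parameters. This is a minimal sketch, not a definitive implementation: the function name build_table_definition is hypothetical, and the attribute types and capacity units are left as arguments because this page does not state them.

```python
def build_table_definition(table_name, hash_type, range_type,
                           read_units, write_units):
    """Return CreateTable parameters for a Hash and Range table.

    hash_type and range_type are DynamoDB attribute type codes
    ("S", "N" or "B"). This page does not say which types to use,
    so they are deliberately left to the caller.
    """
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "InteractionID", "AttributeType": hash_type},
            {"AttributeName": "InteractionCreatedAt", "AttributeType": range_type},
        ],
        "KeySchema": [
            # Hash Attribute Name
            {"AttributeName": "InteractionID", "KeyType": "HASH"},
            # Range Attribute Name
            {"AttributeName": "InteractionCreatedAt", "KeyType": "RANGE"},
        ],
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


# The table name, types and capacity values here are placeholders
# (assumptions), not values taken from this page.
definition = build_table_definition("interactions", "S", "N", 5, 5)
```

The resulting dictionary could be passed to an AWS SDK, for example boto3's create_table(**definition), assuming that SDK is installed and configured with your credentials.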

Configuring Push for delivery to Amazon AWS DynamoDB

  1. To enable delivery, you will need to define a stream or a Historics query. Both return important details required for a Push subscription. A successful stream definition returns a hash; a Historics query returns an id. You will need one of them (but not both) to set the value of the hash or historic_id parameter in a call to /push/create. To obtain that information, call /push/get or /historics/get, or use the DataSift dashboard.
     
  2. Once you have the stream hash or the Historics id, you can give that information to /push/create. You can make that call using curl or any other programming language or tool.

  3. For more information, read the step-by-step guide to the API to learn how to use Push with DataSift's APIs.
     
  4. When a call to /push/create is successful, you will receive a response that contains a Push subscription id. You will need that information to make successful calls to all other Push API endpoints (/push/delete, /push/stop, and others). You can retrieve the list of your subscription ids with a call to /push/get.
     
  5. You should now check that the data is being delivered to your Amazon AWS DynamoDB instance. Log in to your AWS account and examine the table.

    When DataSift is able to connect and deliver interactions to your Amazon AWS DynamoDB instance, it will populate the table listed as the value of the output_params.table output parameter. Each interaction is stored in a separate row.

    Please make sure that you watch your usage and add funds to your account when it is running low. Also, stop any subscriptions that are no longer needed; otherwise you will be charged for their usage. There is no need to delete them. You can have as many stopped subscriptions as you like without paying for them. Remember that any subscriptions that were paused automatically due to insufficient funds will resume when you add funds to your account.
     
  6. To stop delivery, call /push/stop. To remove your subscription completely, call /push/delete.
     
  7. Familiarize yourself with the output parameters (for example, the host name) you'll need to know when you send data to an Amazon AWS DynamoDB server.
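
As a rough illustration of steps 1 and 2, the request body for /push/create could be assembled as shown here. It is a sketch under stated assumptions: the function name build_push_create_body is hypothetical, and the name parameter and the output_type value "dynamodb" are assumptions not confirmed on this page. The hash, historic_id and output_params.* names come from the text. The standard library's urlencode takes care of URL-encoding the AWS keys.

```python
from urllib.parse import urlencode


def build_push_create_body(name, table, access_key, secret_key, region,
                           stream_hash=None, historic_id=None):
    """Build a URL-encoded form body for a call to /push/create."""
    # Exactly one of the two identifiers must be supplied, never both.
    if (stream_hash is None) == (historic_id is None):
        raise ValueError("supply exactly one of stream_hash or historic_id")

    params = {
        "name": name,                # assumed parameter name
        "output_type": "dynamodb",   # assumed connector identifier
        "output_params.table": table,
        "output_params.auth.access_key": access_key,
        "output_params.auth.secret_key": secret_key,
        "output_params.region": region,
    }
    if stream_hash is not None:
        params["hash"] = stream_hash
    else:
        params["historic_id"] = historic_id

    # urlencode percent-encodes characters such as "/" and "+",
    # which commonly appear in AWS secret keys.
    return urlencode(params)


# All values here are placeholders.
body = build_push_create_body(
    name="dynamodb-delivery",
    table="interactions",
    access_key="YOUR_ACCESS_KEY",
    secret_key="abc/def+ghi",
    region="dynamodb.eu-west-1.amazonaws.com",
    stream_hash="YOUR_STREAM_HASH",
)
```

The returned string can be sent as the POST body with curl's --data option or with any HTTP client. Authentication against the DataSift API is omitted here.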

Notes

  • Twitter sends delete messages which identify Tweets that have been deleted. Under your licensing terms, you must process these delete messages and delete the corresponding Tweets from your storage. If you're using DynamoDB, DataSift handles those messages for you and deletes the relevant Tweets automatically from your DynamoDB table.
     
  • We buffer your data inside DataSift and periodically write it to Amazon DynamoDB. To keep write throughput requirements low, we write a few times each minute, and adapt the number of interactions written to the throughput level you've set on your DynamoDB table.
     
  • Amazon AWS DynamoDB has different levels of throughput charged at different rates. Please refer to the Amazon AWS DynamoDB documentation for details.
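
To get a feel for how provisioned write throughput bounds the number of interactions that fit into one periodic write, the estimate below may help. It is not DataSift's actual algorithm, only a back-of-the-envelope sketch. It assumes DynamoDB's standard accounting, in which one write capacity unit covers one write per second of an item up to 1KB and larger items consume one unit per started KB. The function name and the example figures are hypothetical.

```python
import math


def max_items_per_flush(write_capacity_units, flush_interval_seconds,
                        avg_item_kb):
    """Estimate how many items fit into the write capacity
    accumulated over one flush interval."""
    # Each item consumes one unit per started KB, and at least one unit.
    units_per_item = max(1, math.ceil(avg_item_kb))
    available_units = write_capacity_units * flush_interval_seconds
    return int(available_units // units_per_item)


# Example: 10 write units, one write every 20 seconds, 2KB interactions.
estimate = max_items_per_flush(10, 20, 2)
```

With these example figures the estimate is 100 interactions per write. Raising the table's write throughput or reducing the size of the interactions raises that ceiling.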

 

Output parameters

Parameter: output_params.table (required)
Description: The name of the Amazon DynamoDB table where the data is stored.

Parameter: output_params.auth.access_key (required)
Description: Your Amazon AWS access key. Make sure this value is properly URL-encoded. Please create custom credentials to ensure that access to your DynamoDB account is restricted.

Parameter: output_params.auth.secret_key (required)
Description: Your Amazon AWS secret key. Make sure this value is properly URL-encoded. Please create custom credentials to ensure that access to your DynamoDB account is restricted.

Parameter: output_params.region (required)
Description: The id of the AWS region your DynamoDB table lives in. This has to be the full-length id. Currently, the following ids are supported:

    dynamodb.ap-northeast-1.amazonaws.com
    dynamodb.eu-west-1.amazonaws.com
    dynamodb.us-east-1.amazonaws.com
    dynamodb.us-west-2.amazonaws.com
    dynamodb.us-west-1.amazonaws.com
    dynamodb.ap-southeast-1.amazonaws.com
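
Because output_params.region must be one of the full-length ids listed above, it can be worth checking the value before creating a subscription. The helper below is a hypothetical sketch: it accepts either a short region name or a full-length id and returns the full-length form, rejecting anything not in the list.

```python
SUPPORTED_REGION_IDS = (
    "dynamodb.ap-northeast-1.amazonaws.com",
    "dynamodb.eu-west-1.amazonaws.com",
    "dynamodb.us-east-1.amazonaws.com",
    "dynamodb.us-west-2.amazonaws.com",
    "dynamodb.us-west-1.amazonaws.com",
    "dynamodb.ap-southeast-1.amazonaws.com",
)


def full_region_id(region):
    """Return the full-length id for a region, or raise ValueError."""
    if region.startswith("dynamodb."):
        candidate = region
    else:
        # Expand a short name such as "eu-west-1" to the full-length form.
        candidate = "dynamodb.%s.amazonaws.com" % region
    if candidate not in SUPPORTED_REGION_IDS:
        raise ValueError("unsupported region: %r" % region)
    return candidate


region_id = full_region_id("eu-west-1")
```

The list is copied from this page and reflects what was supported when the page was written.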

 

Data format delivered: 

DynamoDB native format. Each interaction is stored as 1 document.

Storage type: 

One interaction per document.

Limitations: 

Throughput can exceed 10,000 Kb/second if you provision for it.

Each DynamoDB record must be 64KB or smaller. For comparison, a typical Tweet might be 2KB. Most other interaction types are likely to be smaller than 64KB but you're likely to encounter problems if you use the Newscred or Wikipedia data sources with DynamoDB.

Newscred interactions might exceed 64KB although the largest we have seen was almost 60KB. Wikipedia interactions frequently exceed 64KB, and interactions of 1MB are very common; the largest we have seen was 7.8MB.
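
If you hold sample interactions locally, a quick size check can show whether a data source is likely to hit the 64KB limit. The sketch below uses the length of the UTF-8 JSON serialization as an approximation. DynamoDB computes record size from attribute names and values, so the figure is indicative only, and the function names are hypothetical.

```python
import json

MAX_RECORD_BYTES = 64 * 1024  # the 64KB record limit described above


def approximate_size_bytes(interaction):
    """Approximate an interaction's size as its UTF-8 JSON length."""
    return len(json.dumps(interaction, ensure_ascii=False).encode("utf-8"))


def likely_fits(interaction):
    """Return True if the interaction is probably within the limit."""
    return approximate_size_bytes(interaction) <= MAX_RECORD_BYTES


# A made-up small interaction and a made-up oversized one.
small = {"interaction": {"content": "hello world"}}
large = {"interaction": {"content": "x" * 70000}}
results = (likely_fits(small), likely_fits(large))
```

Here results is (True, False): the small interaction fits, while the 70,000-character one does not.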