Configuring DynamoDB

The DynamoDB connector supports tables with a single HashKey or with a HashKey and a RangeKey. It detects the names and AttributeTypes of these keys automatically. The only constraint DataSift imposes is that the HashKey must be a string, because we populate it with our target, which is a hexadecimal number.

We place the interaction time into the RangeKey as a Unix timestamp; if no RangeKey is set, the 'ts' column will contain this timestamp.
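As a simplified illustration of where the timestamp ends up, the Python sketch below builds the key portion of a stored item. The function name build_item, the default HashKey name "id", and the sample values are hypothetical, and the plain dictionary ignores DynamoDB's typed attribute encoding; it is not the connector's actual code.

```python
def build_item(hash_value, interaction_time, hash_key="id", range_key=None):
    """Return the key attributes for one interaction (illustrative only)."""
    # The HashKey always holds a string value.
    item = {hash_key: str(hash_value)}
    # The Unix timestamp goes into the RangeKey if the table has one,
    # otherwise into a column named 'ts'.
    timestamp_column = range_key if range_key else "ts"
    item[timestamp_column] = int(interaction_time)
    return item
```

With a RangeKey named "timestamp" the timestamp is stored under that name; without a RangeKey it appears under 'ts'.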

Do not use DataSift target names as HashKey or RangeKey names; otherwise key data may be lost. Recommended key names are "id" for the HashKey and "timestamp" for the RangeKey.
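A table that follows these recommendations could be described as in the sketch below. The function name build_table_definition and the default capacity values are hypothetical, and the numeric type chosen for the RangeKey is an assumption based on it holding a Unix timestamp. The dictionary keys follow the shape accepted by the create_table call in boto3's DynamoDB client.

```python
def build_table_definition(table_name, hash_key="id", range_key="timestamp",
                           read_units=10, write_units=100):
    """Return keyword arguments describing a HashKey (+ optional RangeKey) table."""
    # The HashKey must be a string ("S").
    key_schema = [{"AttributeName": hash_key, "KeyType": "HASH"}]
    attribute_definitions = [{"AttributeName": hash_key, "AttributeType": "S"}]
    if range_key is not None:
        # Assumption: store the Unix timestamp as a number ("N").
        key_schema.append({"AttributeName": range_key, "KeyType": "RANGE"})
        attribute_definitions.append(
            {"AttributeName": range_key, "AttributeType": "N"}
        )
    return {
        "TableName": table_name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attribute_definitions,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,    # hypothetical default
            "WriteCapacityUnits": write_units,  # hypothetical default
        },
    }
```

If boto3 is installed and AWS credentials are configured, the result can be passed to the client, for example boto3.client("dynamodb", region_name="us-east-1").create_table(**build_table_definition("interactions")), where the region and table name are placeholders.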


DataSift also checks the table's write throughput and uses it to moderate the rate at which data is inserted. DynamoDB rounds the size of each interaction up to the next kilobyte, so DataSift accounts for this by delivering data at roughly half to two thirds of the table's throughput.
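The arithmetic behind this can be sketched as follows. The function names are hypothetical, and the sketch assumes that write throughput is expressed in kilobytes per second; it illustrates the rounding effect only and is not DataSift's actual throttling logic.

```python
import math

def billed_kilobytes(item_size_bytes):
    """Size counted against throughput: rounded up to a whole kilobyte."""
    return max(1, math.ceil(item_size_bytes / 1024))

def target_delivery_range(write_throughput_kb_per_s):
    """Approximate delivery band: half to two thirds of the write throughput."""
    return (write_throughput_kb_per_s / 2, write_throughput_kb_per_s * 2 / 3)

def interactions_per_second(write_throughput_kb_per_s, item_size_bytes):
    """Whole items of a given size that fit in the full throughput each second."""
    return write_throughput_kb_per_s // billed_kilobytes(item_size_bytes)
```

For example, a 1,500-byte interaction is counted as 2 KB, so a table with 100 KB/s of write throughput can absorb at most 50 such interactions per second, even though their raw size adds up to less than 100 KB.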

Check the subscription logs for any warnings about the maximum throughput being exceeded. If you receive a warning, increase your provisioned throughput capacity in DynamoDB. Since we buffer data for up to one hour, you do not need to pause or otherwise interrupt the delivery process as long as you make this change promptly.

Delivery will resume once enough throughput has been provisioned. For more information on write and read throughput settings, consult Amazon's DynamoDB documentation.

Set the write throughput for the table to at least 100 KB/s. Be aware that some blog posts or other large interactions might exceed this limit.

Optionally, you can configure alarms in DynamoDB.


Note the region in which your table was created. DynamoDB tables can currently be accessed only through the region in which they reside.