We have also set up an Amazon S3 bucket to which we push files containing batches of compliance events. This bucket also contains a small number of historical interactions.
Each of our public social firehose interaction types has its own directory at the top level. Within each of these top-level directories, we have a directory for each year (yyyy), month (mm) and then day (dd). For example:
disqus/2018/04/29/
disqus/2018/04/30/
disqus/2018/05/01/
disqus/2018/05/02/
wordpress/2018/04/29/
wordpress/2018/04/30/
wordpress/2018/05/01/
wordpress/2018/05/02/
tumblr/2018/04/29/
tumblr/2018/04/30/
tumblr/2018/05/01/
tumblr/2018/05/02/
reddit/2019/03/14/
reddit/2019/03/15/
reddit/2019/03/16/
reddit/2019/03/17/
This layout makes it easy to process data for any point in time. Note that the day refers to the UTC date on which we push the data into S3, not the date of the interaction, so you may need to process files on either side of a day boundary.
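Assuming the `source/yyyy/mm/dd/` layout described above, a small Python sketch can compute the day prefixes to scan for a given UTC date. Because a day refers to the push date rather than the interaction date, it also includes the adjacent days on either side of the boundary (the `day_prefixes` helper name is ours, not part of the feed):

```python
from datetime import datetime, timedelta, timezone

def day_prefixes(source: str, when: datetime) -> list[str]:
    """Return the source/yyyy/mm/dd/ prefixes covering the UTC day of
    `when` plus the adjacent days, since interactions may appear in
    files pushed on either side of a day boundary."""
    day = when.astimezone(timezone.utc)
    return [
        "{}/{:%Y/%m/%d}/".format(source, day + timedelta(days=offset))
        for offset in (-1, 0, 1)
    ]

print(day_prefixes("wordpress", datetime(2018, 5, 25, tzinfo=timezone.utc)))
# → ['wordpress/2018/05/24/', 'wordpress/2018/05/25/', 'wordpress/2018/05/26/']
```

You would pass each prefix to your S3 client of choice to list the files pushed on that day.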
Within each day directory, we push batches of newline-delimited JSON files. This is the same technique we use for our Push delivery system. We deliver multiple files each day. A file is never modified after it has been uploaded to the S3 bucket, so customers can read existing data while we push new compliance messages to the feed in close to real time.
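Since each file is newline-delimited JSON, a consumer can process it one line at a time. The sketch below parses an in-memory batch this way; the field names in the sample messages (`action`, `source`, `id`) are illustrative assumptions, not the documented schema:

```python
import json
from io import StringIO

# A hypothetical two-message compliance batch; real field names may differ.
batch = StringIO(
    '{"action": "delete", "source": "wordpress", "id": "1a2b"}\n'
    '{"action": "delete", "source": "wordpress", "id": "3c4d"}\n'
)

def read_batch(fp):
    """Yield one JSON object per non-empty line of a newline-delimited JSON file."""
    for line in fp:
        line = line.strip()
        if line:
            yield json.loads(line)

ids = [msg["id"] for msg in read_batch(batch)]
print(ids)  # → ['1a2b', '3c4d']
```

Because files are immutable once uploaded, it is safe to process each file exactly once and track which keys you have already consumed.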
An example WordPress compliance file for 25th May 2018 looks like this:
Reddit feeds contain additional compliance message types, so the files are split into deletes and updates like this:
- For deletes:
- For updates:
With the content like this:
- Disqus delete
- Tumblr post delete
- Tumblr blog delete
- IntenseDebate comment delete
- WordPress post/page delete
- WordPress comment delete
- WordPress blog delete
- Reddit delete
- Reddit update
For self-hosted WordPress blogs, there might be a few extra content types, such as jetpack-testimonial (see the Jetpack documentation).
Delete events for these content types should all have the same structure, as documented in the following example:
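Because deletes for every content type share one structure, a consumer does not need special-case handling for content types it has never seen before. The sketch below makes that point; the field names (`action`, `content_type`, `id`) and the `handle_delete` helper are illustrative assumptions, not the documented schema:

```python
import json

# Illustrative messages only; real field names may differ.
messages = [
    '{"action": "delete", "content_type": "post", "id": "p-1"}',
    '{"action": "delete", "content_type": "jetpack-testimonial", "id": "t-9"}',
]

def handle_delete(msg: dict) -> str:
    # All delete events share one structure, so unknown content types
    # (e.g. from self-hosted blogs) need no special casing.
    return "deleted {} {}".format(msg["content_type"], msg["id"])

for raw in messages:
    msg = json.loads(raw)
    if msg["action"] == "delete":
        print(handle_delete(msg))
```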
The bucket data is available on request by contacting email@example.com. You can use the AWS SDKs to read this data, or consume it manually using a UI tool.