For the Facebook topic data source, the DataSift platform receives two streams of data: a stream of stories and a stream of engagements.
The engagements we receive do not contain data fields from the stories they relate to, only the ID of the parent story. On the platform we attach story data fields to engagements so that you can filter, classify and analyze engagements based on parent story attributes. We call this process engagement hydration.
How does engagement hydration work?
The diagram above shows how we hydrate engagements:
- Your interaction filter is run against stories arriving from Facebook.
- Stories that match your interaction filter are:
  - passed on through classification and stored in your index.
  - stored in the context cache.
- Additional stories are added to the context cache based on global platform rules discussed below.
- As engagements arrive from Facebook, the context cache is checked using the parent story ID to see whether the parent story is stored. If it is, the context builder adds data fields from the story to the engagement.
- Engagements are then run through your interaction filter.
- Engagements that match your interaction filter are passed on through classification and stored in your index.
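The steps above can be sketched in Python as follows. This is a simplified model, not the platform's implementation: stories and engagements are assumed to be plain dicts, the interaction filter is a Python callable, and classification is a no-op placeholder. The field names (`id`, `parent_id`, `topic`, `parent`) and all function names are hypothetical; on the platform, hydrated story fields appear under the `fb.parent.*` targets. The global caching rules and cache eviction are left out.

```python
from typing import Any, Callable, Dict, List

# Hypothetical item shapes: a story is a dict with an "id" and its own fields;
# an engagement carries only "parent_id" (the parent story ID) plus its own fields.
Item = Dict[str, Any]
Filter = Callable[[Item], bool]


def classify(item: Item) -> Item:
    """Placeholder for classification; returns the item unchanged."""
    return item


def process_story(story: Item, interaction_filter: Filter,
                  context_cache: Dict[str, Item], index: List[Item]) -> None:
    """Run the filter over an incoming story; matching stories are indexed and cached."""
    if interaction_filter(story):
        index.append(classify(story))
        context_cache[story["id"]] = story


def hydrate(engagement: Item, context_cache: Dict[str, Item]) -> Item:
    """Copy the parent story's fields under "parent" if the story is cached."""
    parent = context_cache.get(engagement["parent_id"])
    if parent is None:
        return engagement  # parent not cached: engagement stays un-hydrated
    return {**engagement, "parent": dict(parent)}


def process_engagement(engagement: Item, interaction_filter: Filter,
                       context_cache: Dict[str, Item], index: List[Item]) -> None:
    """Hydrate first, then filter, so the filter can test parent story fields."""
    hydrated = hydrate(engagement, context_cache)
    if interaction_filter(hydrated):
        index.append(classify(hydrated))


def about_cars(item: Item) -> bool:
    """Hypothetical filter: a car story, or an engagement whose parent is one."""
    parent = item.get("parent") or {}
    return item.get("topic") == "Cars" or parent.get("topic") == "Cars"


cache: Dict[str, Item] = {}
index: List[Item] = []
process_story({"id": "s1", "topic": "Cars"}, about_cars, cache, index)
process_engagement({"parent_id": "s1", "type": "like"}, about_cars, cache, index)
process_engagement({"parent_id": "s2", "type": "like"}, about_cars, cache, index)
print(len(index))  # 2: story s1 and the hydrated like on s1
```

The engagement on `s2` is not recorded: its parent story was never cached, so it has no parent fields and the filter cannot match it.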
For an engagement to be hydrated, its parent story must exist in the context cache. Because of the vast number of stories posted on Facebook globally, we do not store every story in the context cache. Stories enter the context cache with sub-second latency, so they are available before engagements on them arrive.
The context cache is global across all recordings running on the platform; this gives each engagement the greatest chance of being hydrated. Stories are removed from the cache on a least-recently-used basis: stories that have been engaged with recently are kept, while those that are no longer being engaged with are removed to make room for new stories. Stories are typically stored for between 4 and 20 days.
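Least-recently-used eviction can be illustrated with a small Python sketch built on `collections.OrderedDict`. The `ContextCache` class and its method names are hypothetical, and the capacity is a made-up item count; the platform does not expose such a setting, which is why retention varies (typically 4 to 20 days) rather than being fixed. The key point it shows is that looking up a story for an engagement marks that story as recently used.

```python
from collections import OrderedDict
from typing import Any, Dict, Optional


class ContextCache:
    """Minimal LRU cache sketch mapping story ID -> story fields.

    `capacity` is a hypothetical item count chosen for illustration only.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._stories: "OrderedDict[str, Dict[str, Any]]" = OrderedDict()

    def add_story(self, story_id: str, story: Dict[str, Any]) -> None:
        """Insert (or refresh) a story, evicting the least recently used if full."""
        if story_id in self._stories:
            self._stories.move_to_end(story_id)
        self._stories[story_id] = story
        if len(self._stories) > self.capacity:
            self._stories.popitem(last=False)  # drop the least recently used story

    def lookup_for_engagement(self, story_id: str) -> Optional[Dict[str, Any]]:
        """Return the parent story and mark it as recently used, or None if evicted."""
        story = self._stories.get(story_id)
        if story is not None:
            self._stories.move_to_end(story_id)
        return story

    def __contains__(self, story_id: str) -> bool:
        return story_id in self._stories


cache = ContextCache(capacity=2)
cache.add_story("a", {"topic": "Cars"})
cache.add_story("b", {"topic": "Food"})
cache.lookup_for_engagement("a")           # an engagement on "a" makes it most recent
cache.add_story("c", {"topic": "Travel"})  # evicts "b", the least recently used
print("a" in cache, "b" in cache, "c" in cache)  # True False True
```

Story `b` is evicted even though it was added after `a`, because `a` was engaged with more recently.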
If the parent story for an engagement does not exist in the context cache then the targets relating to the story (fb.parent.*) will not be populated. If your interaction filter uses these targets you might not record some engagements you were expecting.
Therefore, it's important that you design your interaction filter to record the parent stories of any engagements you'd like to analyze.
To help hydrate more engagements for customers we add additional stories to the context cache, regardless of the recordings that are running.
Authors are split into two groups: 'page' and 'user'. 'Page' type authors usually represent organizations and brands. For many use cases, posts by these authors that contain links are the key stories customers want to analyze engagements for, so we add every story created by a 'page' type author that contains a link to the context cache.
This means that engagements on stories created by 'page' type authors that contain a link will always be hydrated (within the caching period), whereas engagements on stories created by 'user' type authors will only be hydrated if those stories are stored in the context cache, for example because your interaction filter matches them.
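The rule for which stories enter the context cache can be summarized as a single predicate. This Python sketch is an illustration under assumptions: the `Story` record, its field names and the `enters_context_cache` function are hypothetical. It combines two facts from this page: a story matched by any running recording's interaction filter is cached (the cache is shared across recordings), and every 'page' type story containing a link is cached regardless of recordings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Story:
    """Hypothetical story record; field names are illustrative only."""
    story_id: str
    author_type: str            # "page" or "user"
    link: Optional[str] = None  # None when the story contains no link


def enters_context_cache(story: Story, matched_by_any_recording: bool) -> bool:
    """Sketch of the caching rule: matched by some recording, or a 'page' link story."""
    is_page_link_story = story.author_type == "page" and story.link is not None
    return matched_by_any_recording or is_page_link_story


brand_post = Story("s1", author_type="page", link="https://example.com/launch")
personal_post = Story("s2", author_type="user", link="https://example.com/photo")
print(enters_context_cache(brand_post, matched_by_any_recording=False))     # True
print(enters_context_cache(personal_post, matched_by_any_recording=False))  # False
print(enters_context_cache(personal_post, matched_by_any_recording=True))   # True
```

A 'user' type story with a link is only cached when some recording's filter matches it, which is why your own filter needs to capture those stories.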
Example interaction filter
Let's look at some examples to explain hydration in practice.
If you're looking to analyze engagements on stories about cars, you might start with the following CSDL:
fb.parent.topics.category == "Cars"
Running this as your interaction filter would not record the parent stories, because stories themselves have no fb.parent.* targets. As a result, only engagements on stories cached for other reasons, such as link stories created by 'page' type authors, would be hydrated.
Even if you don't intend to use the stories themselves in your analysis, you need to add a condition to your filter that records them, so that they exist in the context cache:
fb.parent.topics.category == "Cars" OR fb.topics.category == "Cars"
With this change, the car stories themselves are recorded and added to the context cache, so engagements on them can be hydrated.
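To see which items each CSDL condition can match, here is a Python sketch that models a condition as a predicate over a flat dict keyed by target name. The item layout and the `equals` helper are assumptions made for illustration; real items carry many more targets, and this model assumes an engagement has no fb.topics.category of its own. It shows that a story can only satisfy the fb.topics condition, while an engagement satisfies the fb.parent.topics condition only once it has been hydrated.

```python
from typing import Any, Callable, Dict

# Simplified model (an assumption, not the platform's data format): items are
# flat dicts keyed by CSDL target names. Stories carry fb.* targets; a hydrated
# engagement carries fb.parent.* targets copied from its parent story.
Item = Dict[str, Any]


def equals(target: str, value: str) -> Callable[[Item], bool]:
    """Mimic a CSDL `target == "value"` condition; missing targets never match."""
    return lambda item: item.get(target) == value


story_condition = equals("fb.topics.category", "Cars")
parent_condition = equals("fb.parent.topics.category", "Cars")

story: Item = {"fb.topics.category": "Cars"}
hydrated_engagement: Item = {"fb.parent.topics.category": "Cars"}
unhydrated_engagement: Item = {}  # parent story was not in the context cache

for name, item in [("story", story),
                   ("hydrated engagement", hydrated_engagement),
                   ("unhydrated engagement", unhydrated_engagement)]:
    print(f"{name}: story condition={story_condition(item)}, "
          f"parent condition={parent_condition(item)}")
```

The un-hydrated engagement matches neither condition, which is how engagements on uncached stories go missing from a recording.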
Engagement hydration in-depth
To learn more about engagement hydration, take a look at our training video.
Why aren't all engagements hydrated using the context cache?
Caching every story posted on Facebook would be impractical because of the huge volume.
What happens if an engagement arrives for a story created a few days ago?
Stories are removed from the context cache on a least-recently-used basis. If an engagement arrives for a story posted several days earlier, whether the story is still in the cache depends on how much engagement it has received over time. If the story has received ongoing engagement, it will have been kept in the cache, so the new engagement will be hydrated. If it hasn't, the story will have been removed from the cache and the new engagement will not be hydrated.
What happens if I pause then resume my recording?
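Stories that were posted whilst your recording was paused will not have been added to the context cache by your recording. Engagements on these stories will not be hydrated unless the stories were cached for another reason, for example because they are link stories created by 'page' type authors.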