This guide explains the components of PYLON, how they work together, and how you can get the most out of the platform.
How can you get the most from your data index? How should you divide filtering between your interaction filters and analysis queries? How can you serve many use cases and many customers? This guide answers these questions and more.
Table of Contents
- PYLON Principles
- Recording Interactions
- Analyzing Interactions
- Common Architectures
Before we look at each component in depth, let's recap how PYLON is designed to be used.
PYLON is designed to allow you to capture sets of data into indexes, then segment and analyze the recorded data with analysis queries.
When you design your solution, keep in mind these principles:
- Each interaction filter you record from will save data into a separate index.
- Create and allocate indexes for each customer or use case as you see fit. This allows you to tune your filter for each one and keeps the data siloed.
- With PYLON you are not subject to data licensing costs, but you do need to keep within platform and account volume limits.
- Keep your interaction filters broad enough to capture all the data you need for analysis, while removing noise that would reduce the quality of your analysis. Then use analysis filters to segment the data when you come to analyze it.
- You can leave recordings running indefinitely, making sure you always have fresh data in your index.
- Although both interaction filters and analysis queries provide filtering, interaction filters are long-running and more powerful, with the full range of CSDL targets and operators available.
- Complex filtering, such as noise reduction, is therefore better handled upfront in interaction filters.
- Analysis filters are less powerful, but can be run instantly on the data in your index.
Now let's take a deeper look at recording and the components involved.
First, a quick recap of the key terms that relate to recording interactions:
- Interaction Filter - Written in CSDL, an interaction filter defines which interactions you want recorded to an index. It can also include optional classification rules to apply to those interactions.
- Recording - Runs an interaction filter for a period of time (usually ongoing), saving any matching interactions to an index.
- Index - An indexed store for interactions you record that serves analysis queries. Note that a separate index is created for each interaction filter you record from.
- Recording Limits - Your recordings are subject to platform and account limits, depending on your package.
Now let's take a look at each of these in more depth.
Interaction filters are written in CSDL. This gives you access to the full range of operators and targets when defining your filter.
Interaction filters can include classification rules (powered by VEDO). Classification rules allow you to add extra value to the data, which can then be used as part of your analysis.
Interaction filters run against the raw data stream that we receive into the platform for the sources you have enabled. It's here that you have maximum access to data fields you'd like to filter on or classify.
When an interaction filter is run in a recording, each interaction that arrives in the system (for sources you have enabled) is checked to see if it is a match. Any classification rules you have defined are then run on the matching interaction, and it is then saved into the index.
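The per-interaction flow described above (match, then classify, then save) can be modeled in a few lines. This is a minimal Python sketch, not DataSift's implementation: the dict shape of an interaction, the `record` function, the `matches_filter` predicate and the `tag_rules` mapping are all hypothetical stand-ins for a compiled CSDL filter and its VEDO classification rules.

```python
from typing import Callable, Iterable

Interaction = dict  # hypothetical shape, e.g. {"content": "..."}


def record(stream: Iterable[Interaction],
           matches_filter: Callable[[Interaction], bool],
           tag_rules: dict[str, Callable[[Interaction], bool]],
           index: list[Interaction]) -> None:
    """Check each arriving interaction, classify the matches, and save them to the index."""
    for interaction in stream:
        if not matches_filter(interaction):
            continue  # non-matching interactions are neither classified nor stored
        # Classification rules run only on interactions that matched the filter.
        tags = [tag for tag, rule in tag_rules.items() if rule(interaction)]
        index.append({**interaction, "tags": tags})


index: list[Interaction] = []
stream = [{"content": "Loved the new Bond film"}, {"content": "Traffic is terrible today"}]
record(stream,
       matches_filter=lambda i: "film" in i["content"].lower(),
       tag_rules={"movie.bond": lambda i: "bond" in i["content"].lower()},
       index=index)
print(index)  # [{'content': 'Loved the new Bond film', 'tags': ['movie.bond']}]
```

The ordering matters for cost and clarity: classification only ever sees interactions that already passed the filter, so the tags in the index always describe data you chose to keep.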
It is important that you tune your filter so that it captures all of the data you need for your analysis, but excludes noise that might skew your results. You can filter out data in analysis queries, but it is much easier to work with a clean data set.
PYLON is designed for you to constantly record data into your indexes, and always have the latest data available for analysis.
For instance, imagine you are looking to analyze conversations around movies. The principle of PYLON is for you to always be able to analyze the latest up-to-the-minute conversations.
You can, of course, start and stop your recordings whenever you like. However, if you stop a recording, the index being recorded to will gradually empty as the data it contains expires over time. Stopping a recording does not freeze the data set already in the index.
When data is saved into an index it is immediately indexed, giving you fast query access across a potentially huge data set.
All data stored in an index is siloed and kept secure. No other DataSift users can access this data, and you can assign indexes as you see fit for your end customers, confident that they will only be able to see their own results.
Data stored in an index will be kept for 32 days, after which it will expire and no longer be available.
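The effect of the 32-day retention on a stopped recording can be sketched as follows. This Python model is illustrative only: the `recorded_at` field and the `available` helper are hypothetical, and treating an interaction as expired once it is exactly 32 days old is an assumption about how the boundary is handled.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=32)  # retention period stated in this guide


def available(index: list[dict], now: datetime) -> list[dict]:
    """Return the interactions that have not yet expired at time `now`."""
    # Assumption: an interaction expires once it is 32 days old or older.
    return [i for i in index if now - i["recorded_at"] < RETENTION]


# A recording stopped at `stop`, having saved data 40, 10 and 0 days earlier.
stop = datetime(2024, 1, 1, tzinfo=timezone.utc)
index = [{"id": n, "recorded_at": stop - timedelta(days=d)} for n, d in [(1, 40), (2, 10), (3, 0)]]

print([i["id"] for i in available(index, stop + timedelta(days=20))])  # [2, 3]
print(available(index, stop + timedelta(days=33)))  # []
```

No new data arrives after `stop`, so each later check returns fewer interactions until the index is empty.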
Because indexes give you fast query access, you can record data and then use analysis queries to explore the data set you have captured. You can also pass this performance on to your end users.
Although it's good practice to keep your interaction filter broad, there are platform and account limits you need to keep in mind.
The platform and account limits you are subject to are detailed on the Platform Allowances page.
It's important to keep in mind the individual recording limit and your overall account recording limit.
Interaction Filter Recording Limit
Each filter you run as a recording can record a maximum of 1 million interactions per day. If you exceed this limit, data will stop being recorded until the daily limit resets at midnight Pacific Standard Time.
You could consider splitting your recording so that the data is segmented into different indexes; this is discussed later in this document. However, you are still subject to your overall account limit.
Account Recording Limit
Your account will have an overall recording limit, defined by your pricing package. This is the maximum number of interactions you can record in a day across all of your recordings.
If you are running multiple recordings, your account limit will be used up as data arrives in each recording. When your limit is reached, data will stop being recorded across all of the recordings you have running (regardless of the volume each has recorded individually) until the daily limit resets at midnight Pacific Standard Time.
This is important to consider because you could have one recording that consumes your account limit very quickly, stopping all your other recordings from saving data.
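The interaction between the two limits can be simulated to see how one busy recording starves the others. This is a hypothetical Python model, not platform code: the `DailyLimits` class and its methods are invented for illustration, the 1 million per-recording default comes from this guide, and the account limit passed in is a placeholder for whatever your package defines.

```python
from collections import defaultdict

RECORDING_DAILY_LIMIT = 1_000_000  # per-recording limit stated in this guide


class DailyLimits:
    """Hypothetical model of the per-recording and account-wide daily limits."""

    def __init__(self, account_limit: int, recording_limit: int = RECORDING_DAILY_LIMIT):
        self.account_limit = account_limit
        self.recording_limit = recording_limit
        self.reset()

    def reset(self) -> None:
        """Call at the daily reset (midnight Pacific Standard Time on the platform)."""
        self.account_used = 0
        self.per_recording = defaultdict(int)

    def try_record(self, recording_id: str) -> bool:
        """Return True if one more interaction may be saved for this recording."""
        if self.account_used >= self.account_limit:
            return False  # account limit reached: every recording stops
        if self.per_recording[recording_id] >= self.recording_limit:
            return False  # this recording has hit its own daily cap
        self.account_used += 1
        self.per_recording[recording_id] += 1
        return True


# Small placeholder numbers so the effect is easy to follow.
limits = DailyLimits(account_limit=150, recording_limit=100)
brand = sum(limits.try_record("brand") for _ in range(120))  # busy recording
niche = sum(limits.try_record("niche") for _ in range(60))   # quieter recording
print(brand, niche)  # 100 50
```

The busy recording is capped at its own limit, but it has already used most of the shared account allowance, so the quieter recording loses 10 of its 60 interactions.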
You will need to update your interaction filters over time. Initially you'll iterate quickly as you create filters for a new project; once a project is live, you'll make occasional changes to filters as requirements change.
You can update the CSDL definition for a recording by using the pylon/update endpoint. If the recording is currently running, the filter will be updated almost immediately, and any interactions that match the new filter definition will be recorded to the same index.
For advice on managing changes to interaction filters see Recording Data.
Now let's take a deeper look at analyzing interactions.
Again, a quick recap on the key terms that relate to analysis:
- Analysis Filter - Written in CSDL, you can optionally specify a filter when making an analysis query. This allows you to select a sub-set of data to analyze.
- Analysis Query - The combination of an optional analysis filter, the target you would like to analyze, and the analysis to perform on that target's data.
Now let's take a look at each of these in more depth.
Analysis queries are the mechanism you use to explore your data and return analysis results.
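To make the three parts of an analysis query concrete, here is a minimal in-memory Python model. It is not the PYLON API: the `AnalysisQuery` class, the `run` helper, the `author.age` field, the `"frequency_distribution"` label and the use of a Python predicate in place of a CSDL analysis filter are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AnalysisQuery:
    target: str  # field to analyze, e.g. "author.age" (hypothetical name)
    analysis: str = "frequency_distribution"  # hypothetical label for the analysis to perform
    analysis_filter: Optional[Callable[[dict], bool]] = None  # stands in for an optional CSDL filter


def run(query: AnalysisQuery, index: list[dict]) -> Counter:
    """Apply the optional filter, then perform the analysis on the target's values."""
    if query.analysis != "frequency_distribution":
        raise ValueError(f"unsupported analysis in this sketch: {query.analysis}")
    subset = [i for i in index if query.analysis_filter is None or query.analysis_filter(i)]
    return Counter(i[query.target] for i in subset if query.target in i)


index = [
    {"author.age": "18-24", "tags": ["movie.a"]},
    {"author.age": "25-34", "tags": ["movie.b"]},
    {"author.age": "18-24", "tags": ["movie.b"]},
]
everyone = run(AnalysisQuery(target="author.age"), index)
movie_b = run(AnalysisQuery(target="author.age",
                            analysis_filter=lambda i: "movie.b" in i["tags"]), index)
print(everyone)  # Counter({'18-24': 2, '25-34': 1})
print(movie_b)   # Counter({'25-34': 1, '18-24': 1})
```

The second query reuses the same target and analysis and only adds a filter on a classification tag, which is the pattern for drilling into deeper subsets.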
You can make many queries to your index, analyzing one aspect at a time and segmenting the data into deeper subsets as you go. This lets you build up a picture of the data, for instance to power an analysis dashboard for end users.
Analysis results are returned quickly and we expect you to make many queries, but you are still subject to API limits. We recommend caching the results of queries that you need often.
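A time-based cache is one simple way to follow the caching recommendation above. This is a hypothetical Python sketch: `QueryCache`, the `(index_id, target)` key and the 300-second lifetime are illustrative choices, and `run_query` stands for whatever function in your own application actually submits the analysis request.

```python
import time
from typing import Callable, Hashable


class QueryCache:
    """Hypothetical time-based cache for analysis results."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries: dict = {}  # key -> (time stored, result)

    def get_or_run(self, key: Hashable, run_query: Callable[[], object]) -> object:
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh cached result: no API call spent
        result = run_query()  # your own call to the analysis API (not shown)
        self._entries[key] = (now, result)
        return result


# Usage with a fake clock and a fake query so the behavior is easy to check.
calls = []
fake_now = [0.0]
cache = QueryCache(ttl_seconds=300, clock=lambda: fake_now[0])

def fake_query():
    calls.append(1)
    return {"18-24": 2}

key = ("index-123", "author.age")  # hypothetical index id and target
cache.get_or_run(key, fake_query)
cache.get_or_run(key, fake_query)  # served from the cache
fake_now[0] = 301.0
cache.get_or_run(key, fake_query)  # expired, so the query runs again
print(len(calls))  # 2
```

Because recordings keep adding data, the lifetime is a trade-off between freshness and API usage; a dashboard that refreshes every few minutes does not need a new query on every page view.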
If you've included classification rules in your recording filter, you can reference these in your analysis. The combination of data classification and analysis queries is extremely powerful.
For more details on classification see Classifying Data.
Analysis filters allow you to slice your data in many different ways and to examine small segments. Mastering analysis filters is key to getting the most from your recorded data.
When writing an analysis filter you can:
- Specify an optional time period
- Combine up to 10 conditions in your CSDL filter
- For each operator (such as contains) specify up to 100 values
Not all targets are available for filtering in analysis queries - check out the target documentation for details.
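Checking an analysis filter against the limits listed above before submitting it can save a failed request. In this Python sketch the filter is represented as a list of `Condition` objects rather than parsed CSDL; the `Condition` class, `check_filter`, and the example targets are hypothetical, and the platform remains the authority on what it accepts.

```python
from dataclasses import dataclass

MAX_CONDITIONS = 10            # conditions per analysis filter (from this guide)
MAX_VALUES_PER_OPERATOR = 100  # values per operator such as contains (from this guide)


@dataclass
class Condition:
    target: str        # e.g. "interaction.tags" (hypothetical example)
    operator: str      # e.g. "in", "contains_any"
    values: list[str]


def check_filter(conditions: list[Condition]) -> list[str]:
    """Return a list of problems; an empty list means the filter fits the stated limits."""
    problems = []
    if len(conditions) > MAX_CONDITIONS:
        problems.append(f"{len(conditions)} conditions (max {MAX_CONDITIONS})")
    for c in conditions:
        if len(c.values) > MAX_VALUES_PER_OPERATOR:
            problems.append(f"{c.target} {c.operator}: {len(c.values)} values "
                            f"(max {MAX_VALUES_PER_OPERATOR})")
    return problems


ok = [Condition("interaction.tags", "in", ["movie.a", "movie.b"])]
too_many_values = [Condition("interaction.content", "contains_any",
                             [f"word{n}" for n in range(150)])]
print(check_filter(ok))               # []
print(check_filter(too_many_values))  # ['interaction.content contains_any: 150 values (max 100)']
```

This only checks counts; whether a given target can be used in an analysis filter at all still needs to be confirmed in the target documentation.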
You can filter data based on tags added to data through classification rules in interaction filters. Adding classification to data allows you to add your taxonomy to data, and in turn makes the data easier to segment.
Both interaction filters and analysis filters allow you to filter data. Interaction filters are used to build your data set for analysis, whereas analysis filters are used to segment the recorded data set for an analysis query.
Keeping with the movie example, imagine you're analyzing people discussing the latest box office movies:
- Interaction Filter
  - Filter to include conversations about movies in the box office list
  - Filter to remove noise, such as marketing promotions for movies
  - Classify each movie being mentioned, so each movie is clearly identified
- Analysis Queries
  - To analyze the ages of all people discussing the movies - no analysis filter is required for your query
  - To analyze the ages of people discussing a particular movie - add an analysis filter to your analysis query to select only content with the required movie's tag
When you design your solution consider how you will serve use cases and your customers using the indexes you will create.
The simplest way to use PYLON is to record one interaction filter, which creates one index that you can submit analysis queries to. This setup is perfect for serving a single use case from one data set, and is likely how you'll start using the platform.
Again with our movie example, if you are designing a solution that analyzes the demographics of people discussing movies currently in the box office:
- Interaction filter - Select discussions around current box office movies, classifying each mention so that every title is identified with a tag.
- Index - Recording your filter will create an index, containing box office movie discussions.
- Analysis queries - Analyze your recorded data, segmenting by titles you have tagged, and any other dimension you wish.
If you require separate data for different use cases or scenarios you can record data from multiple interaction filters into separate indexes. You simply issue your analysis requests to the index with the data you require.
Building on the previous example, perhaps you are analyzing discussions around movies in both the US and UK box offices. As the titles in the box office differ between markets, and the language used by authors also varies, you can capture each market into its own index:
- Interaction filters - Create two filters, one for US discussions and one for UK.
- Indexes - Recording your filters will create two indexes, one for US data and one for UK.
- Analysis queries - Analyze your recorded data, submitting queries to each index as appropriate.
If you'd like to serve multiple groups of end users, such as multiple customers, you can create separate indexes for each customer (or even multiple indexes for each customer). You can then serve analysis to each customer from the indexes you have assigned to them.
Building on the previous example, perhaps you have two film studios as customers, each wanting to analyze discussions relating to their titles. You can create interaction filters for each customer, and keep their data siloed from each other. You can also add custom requirements for each customer such as custom classification rules.
- Interaction filters - Create two filters, one for each customer, each filtering to that customer's list of movies.
- Indexes - Recording your filters will create two indexes, one for each customer.
- Analysis queries - Analyze your recorded data, submitting queries to each index as appropriate. Note that you need to keep track of which index belongs to which customer.
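Keeping track of which index belongs to which customer is your application's job, and a small registry checked before every query makes that explicit. This Python sketch is hypothetical: `IndexRegistry`, its methods and the index and customer identifiers are invented for illustration and are not part of the PYLON API.

```python
class IndexRegistry:
    """Hypothetical record of which indexes belong to which customer."""

    def __init__(self) -> None:
        self._owner: dict[str, str] = {}  # index id -> customer id

    def assign(self, index_id: str, customer_id: str) -> None:
        if index_id in self._owner and self._owner[index_id] != customer_id:
            raise ValueError(f"{index_id} is already assigned to another customer")
        self._owner[index_id] = customer_id

    def indexes_for(self, customer_id: str) -> list[str]:
        return sorted(i for i, c in self._owner.items() if c == customer_id)

    def authorize(self, customer_id: str, index_id: str) -> str:
        """Return index_id if the customer owns it; refuse otherwise."""
        if self._owner.get(index_id) != customer_id:
            raise PermissionError(f"{customer_id} may not query {index_id}")
        return index_id


registry = IndexRegistry()
registry.assign("idx-studio-a", "studio-a")
registry.assign("idx-studio-b", "studio-b")
registry.authorize("studio-a", "idx-studio-a")  # allowed
try:
    registry.authorize("studio-a", "idx-studio-b")
except PermissionError as err:
    print(err)  # studio-a may not query idx-studio-b
```

Routing every analysis request through a check like `authorize` means a bug elsewhere in your application cannot send one studio's results to the other.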
If you need to serve a customer whose use case requires a very high volume of data to be recorded, you will need to consider splitting your recording into multiple indexes.
Imagine you wish to record all mentions of a very popular brand such as Apple - this could cause you to hit the platform limit of a maximum 1 million interactions per day per recording.
Here we suggest you:
- Create an initial filter and briefly record data to allow you to estimate the data volume you will expect
- Analyze the data in your index to determine a clean way to 'split' your recording
- Create separate Interaction Filters for each of your splits
- Record from each of your Interaction Filters, resulting in a distinct index for each split
- Submit analysis requests to each index, combining results before presenting these to your end users
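The first step, estimating volume from a brief recording, can be turned into a rough count of how many splits you need. This Python sketch is an assumption-laden estimate, not a platform formula: `splits_needed` is hypothetical, the 80% headroom is an arbitrary safety margin, and it assumes both that the sample window is representative of a full day and that the volume divides evenly across splits, which real demographic splits will not.

```python
import math

RECORDING_DAILY_LIMIT = 1_000_000  # per-recording limit stated in this guide


def splits_needed(sample_count: int, sample_hours: float, headroom: float = 0.8) -> int:
    """Estimate how many recordings are needed so each stays under the daily limit."""
    daily_estimate = sample_count * 24 / sample_hours  # extrapolate the sample to a day
    per_recording_budget = RECORDING_DAILY_LIMIT * headroom  # leave room for spikes
    return max(1, math.ceil(daily_estimate / per_recording_budget))


# 150,000 interactions in a 2-hour sample suggests about 1.8 million per day.
print(splits_needed(150_000, 2))  # 3
```

Whatever number this gives, the combined volume of all the splits still counts against your account limit.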
How you choose to split your recording will depend on your use case. We recommend you choose a split that avoids the risk of double-counting unique authors.
For example, if you choose to split by gender, there are three possible values for gender and they are mutually exclusive. Recording interactions from males into one index and females into another means no content can appear in both. Therefore, when you perform analysis and receive a unique user count from each index, you can safely add the values together to give a total. In general, demographic targets are a good way to split your recordings.
You could choose to split recordings by topics, brands or products. If you do, make sure that whenever you combine results from the indexes, you do so in a way that does not produce erroneous results, such as double-counting authors who appear in more than one index.
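The difference between an exclusive split and an overlapping one can be demonstrated with small author sets. This Python sketch is for illustration only: `combined_unique` and the author IDs are hypothetical, and in practice an analysis query returns counts rather than the underlying author sets, which is exactly why the true unique total cannot be recovered from an overlapping split.

```python
def combined_unique(per_index_authors: dict[str, set[str]]) -> tuple[int, int]:
    """Return (sum of per-index unique counts, true unique count across indexes)."""
    summed = sum(len(authors) for authors in per_index_authors.values())
    union = set().union(*per_index_authors.values())
    return summed, len(union)


by_gender = {"male": {"a1", "a2"}, "female": {"a3"}}              # mutually exclusive split
by_brand = {"brand_x": {"a1", "a2"}, "brand_y": {"a2", "a3"}}      # author a2 is in both
print(combined_unique(by_gender))  # (3, 3): summing is safe
print(combined_unique(by_brand))   # (4, 3): summing double-counts a2
```

With the gender split the summed count equals the true total, while with the brand split the sum overstates unique authors and only the per-index figures can be reported safely.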