Considerations for Product Builders

In this guide we'll look at the important aspects you need to consider when building a product that uses PYLON for Facebook Topic Data.

Serving multiple customers using your PYLON account

You will want to serve many customers with your PYLON-based product.

As a product builder you need to ensure that:

  1. when serving each customer you are accessing the PYLON API on their behalf with their registered Facebook token.
  2. you share your account limits across your customers so you can continue to serve each fairly.

Accessing PYLON on behalf of your customers

To comply with terms and conditions you need to ensure that when you call the PYLON API you use a valid Facebook token that represents your end customer. You need a valid token for each end customer.

Identities allow you to manage your tokens and API calls. You must create a separate identity for each of your end customers, assign the end customer's token to the identity, and then call the PYLON API using the appropriate identity for that end customer.

Read more about identities in the Managing Identities guide.

Sharing your recording allowance across your customers

Your PYLON account is subject to a number of limits. Your account recording limit is stated in your contract and sets a monthly limit on the number of interactions you can record across all of your running recordings. You need to consider how you will share this limit across your end customers. Your decision may depend on the package you sell to your customers.

To apportion this limit across your customers you can use identity recording limits. These allow you to set a daily recording limit for each identity. Because an identity represents an end customer, you are in effect setting a daily recording limit for each of your customers.
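As a rough illustration of the arithmetic, the sketch below converts a monthly account limit into a daily budget and splits it by weight. The function name, the customer names, the weights and the 30-day month are all assumptions made for this example. The resulting numbers are what you would then set as identity recording limits.

```python
def daily_identity_limits(monthly_account_limit, customer_weights, days_in_month=30):
    """Split a monthly account limit into per-customer daily limits by weight.

    Integer division rounds down, so the daily limits never add up to more
    than the daily share of the account limit.
    """
    total_weight = sum(customer_weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    daily_budget = monthly_account_limit // days_in_month
    return {
        customer: daily_budget * weight // total_weight
        for customer, weight in customer_weights.items()
    }


# Hypothetical packages: "acme" pays for three times the volume of the others.
limits = daily_identity_limits(3_000_000, {"acme": 3, "globex": 1, "initech": 1})
```

Weighting by package is only one option. You could equally give every customer the same share, or hold back part of the daily budget as headroom.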

In some cases you might choose to configure multiple identities for one customer. For instance, you might use one identity to run ongoing recordings on their behalf and another to create one-off reports. Or, if you would like to set recording limits for each recording you create, you could consider creating an identity for each recording with these limits set. In these cases you can use the customer's Facebook token for each identity.

recording-limits

We recommend that you always apply recording limits to your identities so that you can ensure ongoing service for each of your customers. Imagine that a recording you have running for a customer suddenly sees a large spike in activity. If you have not set an identity limit for the customer this spike could consume a large amount of your overall account recording limit and therefore impact the amount of data recorded for your other customers.

Read more about identities and limits in the Managing Identities guide.

Sharing your API limits across your customers

Your PYLON account is subject to two distinct API rate limits.

Firstly, your overall API rate limit. This limit is fixed for all customers and applies to all API calls except those to the pylon/analyze endpoint.

Secondly, your pylon/analyze endpoint limit. This limit depends on your DataSift package.

api-limits

When serving multiple end customers you need to consider how these rate limits should be apportioned across your customers. You may also choose to design the pricing packages you offer to your customers accordingly.

To apportion your pylon/analyze limit across your customers you can use identity analysis query limits. These allow you to set an hourly limit of calls to the endpoint for each identity. Because an identity represents an end customer, you are in effect setting an hourly query limit for each of your customers.

Read more about identities and limits in the Managing Identities guide. The Understanding Limits & Monitoring Usage guide describes how you can also use headers returned by API calls to monitor your API usage.

Managing volumes of individual recordings

Even if you have applied identity limits to each of your customers, you may need to monitor the volume of interactions recorded by each recording in your account. For instance, you may start a number of recordings for one end customer (using the same identity). If one of the recordings sees a spike that consumes the entire recording limit for the identity, the customer's other recordings will be impacted.

In cases like this you can choose to proactively monitor the volume of each recording and take action if necessary.

You can use the pylon/get endpoint to see how many interactions have been recorded by each recording. You can use pylon/stop to stop a recording, or pylon/update to update the recording's filter conditions as necessary.

Creating interaction filters

You will need to create interaction filters using CSDL to record data for each of your end customers. As a product builder you need to consider how you will manage this process.

You should consider the following options:

  1. Allow your users to create their own filters, using the open source Visual Query Builder component.
  2. Allow your users to create their own filters, building your own user interface from scratch.
  3. Have your own internal team create filters based upon customer requirements.

If you do allow your customers to create filters, bear in mind that writing filters can involve a steep learning curve for your users. Building a unique, easy-to-use interface for this process is one way to differentiate your product.

Building a user interface with Visual Query Builder

We maintain an open source user interface component that product builders can use as part of their application.

vqb

The component can be styled and customized, and ensures that valid CSDL is generated for interaction filters.

The component is available on GitHub here: https://github.com/datasift/editor-PYLON

Additional guidance for using the component is here: http://dev.datasift.com/tools/query-builder/embed

Creating your own user interface

If you choose to implement your own interface then you will need to consider how you ensure your customers create filters with valid CSDL code.

For instance, you could create a point-and-click interface that allows users to select a number of keywords and topics, then choose locations and demographics and generate CSDL for the filter behind the scenes. This saves your users from needing to learn CSDL.
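A minimal sketch of this idea follows, assuming your interface collects keywords, countries and an optional gender. The function names are hypothetical. The targets and operators used (fb.content with contains_any, fb.author.country with in, fb.author.gender with ==) are assumptions for illustration, so check them against the CSDL target and operator reference before relying on them.

```python
def _quote(value):
    """Escape backslashes and double quotes for use inside a CSDL string."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def _join(values):
    """Join list items with commas; this sketch rejects items containing commas."""
    for value in values:
        if "," in value:
            raise ValueError("commas inside a value are not handled in this sketch")
    return ", ".join(values)


def build_csdl(keywords, countries=None, gender=None):
    """Turn point-and-click selections into a single CSDL filter string."""
    if not keywords:
        raise ValueError("at least one keyword is required")
    # Target and operator names below are assumptions; verify before use.
    clauses = ['fb.content contains_any "%s"' % _quote(_join(keywords))]
    if countries:
        clauses.append('fb.author.country in "%s"' % _quote(_join(countries)))
    if gender:
        clauses.append('fb.author.gender == "%s"' % _quote(gender))
    return " AND ".join(clauses)


csdl = build_csdl(["coffee", "espresso"], countries=["United States"], gender="female")
```

Generating the filter from a fixed set of clause templates keeps the output predictable, which makes it easier to guarantee that only valid CSDL reaches the API.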

The recording data guide takes you through the basics of creating interaction filters. You may want to create an interface similar to that which you find in your dashboard for creating PYLON interaction filters.

However you design your interface you will need to consider which targets and operators to allow, and the complexity of boolean logic to support.

You may also want to allow your customers to define tags for classification. The classifying data guide covers the key concepts of classification which you might want to consider.

Managing filters through your internal team

If you choose to manage filter creation for customers then we recommend that you arrange training for your team members. We offer a range of training courses, and offer self-paced training online.

Managing recordings

When you've decided how you will create your filters you will next need to consider how you manage your recordings.

PYLON is designed so that you can run ongoing recordings. You can start and stop recordings at will, and you can update the filter definition used for a recording on-the-fly.

Starting and stopping recordings

You can start, stop and resume recordings using the pylon/start and pylon/stop endpoints.

You might choose to expose this control to your end customers. If you choose to do so, make sure you use identity limits to protect other customers you serve from your account.

The managing recordings guide explains how to control your recordings.

Monitoring & reacting to recorded volumes

Whether you or your end customer is in control of managing recordings, you will want to keep a close eye on the volumes being recorded by each recording.

You can use the pylon/get endpoint to monitor the volume of interactions in each recording in your account.

You should consider how you might want to react to recording volumes. For instance, you may want to:

  • Alert your end customer when they get near their allocated limit
  • Automatically stop recordings when they reach certain volumes
  • Update filter definitions for recordings to reduce the volumes recorded
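The reactions listed above can be sketched as a simple threshold check. This example assumes you have already read the recorded volume for each recording (for example from pylon/get) into plain numbers. The function name, recording names and the 80% and 100% thresholds are illustrative assumptions.

```python
def choose_action(recorded, limit, alert_ratio=0.8, stop_ratio=1.0):
    """Return "ok", "alert" or "stop" for one recording's volume against its limit."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    usage = recorded / limit
    if usage >= stop_ratio:
        return "stop"
    if usage >= alert_ratio:
        return "alert"
    return "ok"


# Hypothetical volumes for three recordings that share a 10,000 interaction allowance each.
volumes = {"rec-a": 4_000, "rec-b": 8_500, "rec-c": 10_200}
actions = {name: choose_action(volume, 10_000) for name, volume in volumes.items()}
```

In a real product the "alert" result might trigger an email to the end customer, and "stop" might lead to a call to pylon/stop or a filter update through pylon/update.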

For more details on monitoring platform usage see our in-depth guide.

Updating recording filter conditions on-the-fly

PYLON allows you to adjust your filter conditions for recordings, even when they are running. You can update a recording's filter using the pylon/update endpoint.

There are a number of reasons why you might want to update a filter:

  • Your customer's requirements change.
  • Things in the real world change. If you are recording, for example, conversations around box office movies, you will want to change your filter as the list of movies changes.
  • You are getting near your limits. If you notice you are recording a high volume of data you can adjust your filter to exclude data.

If you allow your customers to define their own filters, you should allow them to update their filters when they choose. You can call the pylon/update endpoint behind the scenes to apply changes.

Read our blog post for a worked example.

Remembering changes made to filters

It is almost inevitable that for long running recordings you will update the interaction filter over time.

Of course, updating a filter will impact your analysis. For example, if you add more countries to your recording then you need to be aware of this when you next perform analysis queries.

We recommend you maintain a version history for your recordings' filter definitions. Each time you update a recording's filter you should make a note of the timestamp and the CSDL used for the filter.

Maintaining this record allows you to associate analysis results with the filter running at the time, and also allows you to revert to previous filter conditions if you make a mistake in an update.
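One way to keep such a record is sketched below. This is an in-memory illustration only and the class and method names are hypothetical. In practice you would persist the versions in your own database alongside the recording they belong to.

```python
import bisect


class FilterHistory:
    """Ordered history of the CSDL versions applied to one recording."""

    def __init__(self):
        self._timestamps = []
        self._csdl = []

    def record(self, timestamp, csdl):
        """Note the CSDL that took effect at the given Unix timestamp."""
        if self._timestamps and timestamp < self._timestamps[-1]:
            raise ValueError("versions must be recorded in time order")
        self._timestamps.append(timestamp)
        self._csdl.append(csdl)

    def filter_at(self, timestamp):
        """Return the CSDL in effect at a timestamp, or None if none was recorded yet."""
        index = bisect.bisect_right(self._timestamps, timestamp) - 1
        return self._csdl[index] if index >= 0 else None

    def previous(self):
        """Return the version before the current one, for reverting a bad update."""
        return self._csdl[-2] if len(self._csdl) >= 2 else None


history = FilterHistory()
history.record(1000, 'fb.content contains "coffee"')
history.record(2000, 'fb.content contains_any "coffee, espresso"')
```

Looking up `history.filter_at(...)` with the start of an analysis period tells you which filter produced the data you are analyzing.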

If you are allowing your end customer to update filter conditions for recordings then you should also let them access this version history.

Presenting results to your customers

The analysis results you choose to present and how you choose to present these to your customers is critical to the success of your product.

Choosing the right analysis to perform

PYLON allows you to perform time series and frequency distribution analysis queries. You'll use these standard analysis types for your core charts and visualizations. The basics of performing analysis are covered in the analyzing data guide.

To make your product stand out we recommend you include multi-dimensional analysis results in your application. You can use nested queries to perform this analysis.

right-analysis

To see what can be achieved with nested queries see our examples page.

Allowing your users to dig into the data

The pylon/analyze endpoint supports three important parameters that you should be aware of, and that you can choose to expose to your users. These parameters allow your users to explore the data in depth.

The start and end parameters allow you to analyze exact time periods. You could consider using these to allow your users to 'zoom in' to specific time periods in their analysis.

The filter parameter allows you to filter to a subset of data in an index before performing the analysis. You could consider allowing your users to select topics, demographics and other aspects of interest in your interface, then perform detailed analysis filtered by the selected values. For example, your user could select females only and your analysis results could update to reflect this change.

Create unique results with custom analysis queries

You can also use the filter parameter to perform custom analysis queries and provide unique results in your product.

Imagine there are brand mentions tagged in your recorded data. You could submit a time series analysis query for each brand, using the filter parameter and combine the results on one chart.

unique-results-1

Or you could break down the mentions in regions, again using the filter parameter submitting a query for each region.

unique-results-2

You cannot achieve this analysis result with a nested query because of cardinality rules.

The filter parameter allows you to analyze data in unique ways and make your product distinctive.
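The per-brand approach described above can be sketched by building one query body per brand and then submitting each to pylon/analyze. The request shape shown here (id, start, end, filter and a nested parameters object with an analysis_type) and the tag target name are assumptions for illustration, so confirm them against the pylon/analyze reference. The function name and brand names are hypothetical.

```python
def brand_time_series_queries(recording_id, brands, start, end,
                              tag_target="interaction.tag_tree.brand"):
    """Build one time series query body per brand, each narrowed by the filter parameter."""
    queries = {}
    for brand in brands:
        escaped = brand.replace("\\", "\\\\").replace('"', '\\"')
        queries[brand] = {
            "id": recording_id,          # assumed field name for the recording
            "start": start,
            "end": end,
            "filter": '%s == "%s"' % (tag_target, escaped),
            "parameters": {
                "analysis_type": "timeSeries",
                "parameters": {"interval": "day"},
            },
        }
    return queries


queries = brand_time_series_queries("rec-a", ["Ford", "Honda"], 1456790400, 1457395200)
```

Each brand costs one call against your pylon/analyze limit, so a chart comparing many brands or regions is a good candidate for caching.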

Choose visualizations carefully

The same analysis result can be visualized in many ways. How you choose to present a result can make the result more compelling, but also easier to digest.

This chart shows the same data as the map above. The map may be more compelling, but this chart makes the result easier to compare between states.

choose-visualizations-1

Network graphs can be a great way to visualize relationships between topics discussed by your audience. Relationships can be analyzed using the fb.topic_graph and fb.parent.topic_graph targets.

choose-visualizations-2

Read our blog post to learn more.

Allowing your customers to export results

It's important to keep in mind that your customers may want to make use of analysis results elsewhere.

Consider allowing your users to export analysis results to CSV or a spreadsheet format. You might also consider offering integrations into common business applications and systems.
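A CSV export can be as simple as the sketch below. It assumes each result row is a dictionary with key, interactions and unique_authors fields. That shape and the sample values are assumptions for this example, so map the columns to whatever your analysis results actually contain.

```python
import csv
import io


def results_to_csv(results, columns=("key", "interactions", "unique_authors")):
    """Render a list of result rows (dictionaries) as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(columns)
    for row in results:
        # Missing fields become empty cells rather than raising an error.
        writer.writerow([row.get(column, "") for column in columns])
    return buffer.getvalue()


# Made-up rows standing in for a frequency distribution result.
sample_results = [
    {"key": "25-34", "interactions": 1200, "unique_authors": 1100},
    {"key": "35-44", "interactions": 900, "unique_authors": 850},
]
csv_text = results_to_csv(sample_results)
```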

Benefit from super public text samples

One important feature of PYLON you shouldn't overlook is super public text samples.

Displaying super public text samples

As you cannot display raw text from interactions recorded into your index you should consider displaying super public text samples alongside analysis results. This gives your customers the confidence that the analysis relates to conversations they were expecting.

It's important to note that terms for consuming Facebook topic data state that super public text samples can only be displayed within applications that are protected by a customer login.

Removing noise from your filters

Super public samples are also valuable when you are looking to improve your interaction filters.

Inspecting super public samples shows you noise that is being captured by your filter. You can display samples to your user if they are responsible for authoring their own filters so that they can adjust their filter conditions.

Read our pattern on validating interaction filters for more details.

Training machine learned classifiers

If you are looking to classify concepts such as intent and sentiment in conversations you can use super public samples to train a machine learned model, translate this model to VEDO scoring rules and have interactions classified as they are recorded into your index.

Storing super public samples

Remember that when you hit the pylon/sample endpoint this retrieves super public samples cached for your recording, and removes the retrieved samples from the cache.

Therefore you can only retrieve a particular sample once. It's important that you store the super public samples you retrieve, otherwise they will be lost.
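A minimal sketch of persisting samples as soon as they are retrieved is shown below, using SQLite from the Python standard library. The table layout and function names are assumptions. Each sample is stored as raw JSON so the sketch does not depend on the exact fields returned by pylon/sample.

```python
import json
import sqlite3


def open_sample_store(path=":memory:"):
    """Open (or create) a small SQLite store for retrieved samples."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS samples ("
        "id INTEGER PRIMARY KEY, recording_id TEXT NOT NULL, "
        "retrieved_at INTEGER NOT NULL, payload TEXT NOT NULL)"
    )
    return conn


def store_samples(conn, recording_id, retrieved_at, samples):
    """Save every retrieved sample in one transaction and return how many were saved."""
    rows = [(recording_id, retrieved_at, json.dumps(sample, sort_keys=True))
            for sample in samples]
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT INTO samples (recording_id, retrieved_at, payload) VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


store = open_sample_store()
saved = store_samples(store, "rec-a", 1456790400, [{"text": "hello"}, {"text": "world"}])
```

Writing the whole batch in a single transaction means a failure part-way through does not leave you with a partially stored set of samples that can never be fetched again.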

Retaining results long term

A key feature of PYLON's privacy model is that data is removed from your index after 32 days. If you are looking to provide analysis results to your customers for further in the past you will need to consider how you can work within this limitation.

We recommend you implement an archive of analysis results, storing query results in your database. Aside from allowing you to provide longer term analysis results you can also use your archive to:

  • Conserve your pylon/analyze rate limit - You can serve analysis results you frequently need from your archive which acts as a cache.
  • Improve your application's performance - Analysis queries are computationally expensive and can take a number of seconds to return. You can improve the performance of your application by serving results from your archive.

Implementing an analysis archive

When you implement your archive we recommend the following approach:

  • Decide on the charts and reports you will need data for.
  • Design your database schema to serve these reports.
  • Design the analysis queries you will need to submit to fetch the data.
  • Decide on how often you need to submit the queries and create a schedule of queries.
  • Make sure your queries use the start and end parameters to analyze data only for the required period.
  • Run your queries according to your schedule on an ongoing basis so that you build a growing archive of data.
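For the scheduling steps above, the sketch below generates one start and end pair per day as Unix timestamps in UTC, which you could pass as the start and end parameters of each scheduled query. The function name is hypothetical, and treating each window as a full UTC day is an assumption. Check how pylon/analyze treats the end boundary before relying on adjacent windows not overlapping.

```python
from datetime import datetime, timedelta, timezone


def daily_windows(first_day, days):
    """Return (start, end) Unix timestamp pairs, one per UTC day, in order."""
    start = datetime(first_day.year, first_day.month, first_day.day, tzinfo=timezone.utc)
    windows = []
    for _ in range(days):
        end = start + timedelta(days=1)
        windows.append((int(start.timestamp()), int(end.timestamp())))
        start = end
    return windows


# Three consecutive daily windows starting on 1 March 2016.
windows = daily_windows(datetime(2016, 3, 1), 3)
```

Storing the window boundaries with each archived result makes it straightforward to tie the result back to the filter version that was running at the time.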

The key to building a good analysis archive is deciding up front the results you need to present.

Read our design pattern to learn more.

You should also cater for storing super public text samples in your archive to sit alongside your analysis results.

The risk of double-counting unique authors

When you create an analysis archive you will fill your archive with query results covering distinct periods of time. For instance you might submit a query each day to record the top links shared by an audience.

If you present a list of links for one day to your user then the unique author count will be correct. However, if you decide to roll-up a number of daily results, for instance to create a weekly summary of shared links there is a risk you will double-count authors, as the same author might be part of two distinct daily analysis results.

You have a number of options:

  • Consider this risk acceptable and use the summed unique author count across the 7 days.
  • Submit the same query on a schedule that matches the roll-up report, in this case once a week, and use this result for the weekly report.
  • Report the number of interactions instead of unique authors as there is no double-counting risk here.
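The size of the double-counting effect can be illustrated with a toy example. PYLON never exposes author identities, so the author names below are invented purely to show why summing daily unique counts overstates the real figure.

```python
# Invented authors active on each day of a three-day roll-up.
daily_authors = {
    "mon": {"ann", "bob", "cat"},
    "tue": {"bob", "cat", "dan"},
    "wed": {"ann", "eve"},
}

# What you get by adding up the daily unique author counts from your archive.
summed_daily_uniques = sum(len(authors) for authors in daily_authors.values())

# What a single query covering the whole period would count.
true_uniques = len(set().union(*daily_authors.values()))

overcount = summed_daily_uniques - true_uniques
```

Here the summed figure is 8 while only 5 distinct authors took part, because three authors appear on more than one day. The more your audience returns day after day, the larger this gap becomes.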

Catering for changes to your interaction filter

If you have a long-running recording it is highly likely you will update the CSDL definition for the recording as your requirements, or your customer's, evolve over time.

As discussed above it is important to keep track of the changes. Record each time you update the filter for a recording so that you can tie these changes to your archive.

DataSift training courses

We offer a wide range of training courses that teach you the key concepts of PYLON.

You might find the following course modules helpful:

  • DS-211 - Implementing PYLON for Facebook Topic Data Course:
    • Managing Identities & Tokens Module covers creating and managing identities, tokens and service limits.
    • Writing Interaction Filters Module describes CSDL structure, targets, operators, hydration.
    • Classifying Interactions
    • Writing Analysis Queries

Details for enrolling on courses can be found on our website.