Managing Recordings

Data is recorded into indexes for your analysis. In this guide you'll learn how to manage your indexes and stay within your account limits.

Recordings, indexes and hashes

When you work with PYLON you start by recording data you'd like to analyze into an index. First, you compile an interaction filter, for which the platform returns a hash. You then use this hash to start a recording. The platform returns an id for the recording, which you use to reference it in future.

The terms recording and index are easily confused. Recording is the term used for filtering data from data sources and recording the matching data into an index. Index is the term used for the store of data that results from a recording.


What is in my index?

When you start recording data, your first question will be 'What have I managed to record?'. It's important you regularly check what data you have recorded and that you are within your account limits.

How much data have I recorded?

You can use the /pylon/get endpoint to see how much data has been recorded into your indexes and whether you've hit your account limits.

For example in Python you can check the volume for a recording as follows:

from datasift import Client
datasift = Client("your username", "identity API key")
# Pass the id of your recording as a string
recording = datasift.pylon.get("[id for your recording]")
print(recording)

The response from the API takes this form:

{
    "volume": 12300,
    "start": 1436085514,
    "end": 1436089932,
    "status": "running",
    "name": "Automotive example",
    "id": "4b12ed1bb6962e2562466f4e749482d8"
}

In this example the number of interactions in the index is 12,300.
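If you check volumes regularly, a small helper can turn the response into a readable summary. The following is a minimal sketch that works on a plain dictionary shaped like the example response above; the function name summarize_recording is our own and is not part of the client library.

```python
from datetime import datetime, timezone


def summarize_recording(response):
    """Build a one-line summary from a /pylon/get style response dict."""
    # "start" is a Unix timestamp; convert it to a UTC datetime for display.
    started = datetime.fromtimestamp(response["start"], tz=timezone.utc)
    return "{} [{}]: {:,} interactions, started {} UTC".format(
        response["name"],
        response["status"],
        response["volume"],
        started.strftime("%Y-%m-%d %H:%M"),
    )


# The example response shown above, used here as sample input.
example = {
    "volume": 12300,
    "start": 1436085514,
    "end": 1436089932,
    "status": "running",
    "name": "Automotive example",
    "id": "4b12ed1bb6962e2562466f4e749482d8",
}
print(summarize_recording(example))
```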

You could also perform a time series analysis of the data in your index to see the volume recorded over time.

# Request a time series of the interactions in your index
result = datasift.pylon.analyze("[id for your recording]", {"analysis_type": "timeSeries"})
print(result)

What data have I recorded?

The privacy model of PYLON prevents you from seeing the raw data you have recorded for non-public posts. However, you can use analysis queries and super public text samples to understand what interactions are stored in your index.

Super public text samples are posts users have chosen to share publicly. Any of these posts that match your interaction filter are cached alongside your recording. Looking at these posts is a good way to validate your interaction filter. Read more about super public text samples in our guide.

You can use analysis queries to explore your recorded data. For example you might start by analyzing the types of Facebook interactions you are recording:

datasift.pylon.analyze("[id for your recording]", {
    "analysis_type": "freqDist",
    "parameters":
    {
        "target": "fb.type",
        "threshold": 4
    }
})

Or perhaps the topics identified in the interactions recorded:

datasift.pylon.analyze("[id for your recording]", {
    "analysis_type": "freqDist",
    "parameters":
    {
        "target": "fb.topics.name",
        "threshold": 10
    }
})

Learn more in our analyzing data guide. See our examples page for more analysis queries you can try.

Data retention

PYLON is designed to allow you to record a 'rolling window' of data for analysis. For privacy reasons, data you record into an index is retained for 32 days. When an interaction is 32 days old it is automatically removed from your index.

If you need to provide analysis results for a longer period of time you will need to store analysis results from your index outside of the platform in your own data store.
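One possible way to keep results beyond the retention window is to save each analysis result locally as you fetch it. The sketch below uses SQLite from the Python standard library and assumes the result can be serialized as JSON. The table name, the save_analysis helper and the placeholder result are all hypothetical; swap in whatever data store and schema suit your application.

```python
import json
import sqlite3
import time


def save_analysis(conn, recording_id, result, fetched_at=None):
    """Store one analysis result as JSON, keyed by recording id and fetch time."""
    if fetched_at is None:
        fetched_at = int(time.time())
    conn.execute(
        "CREATE TABLE IF NOT EXISTS analysis_results "
        "(recording_id TEXT, fetched_at INTEGER, result_json TEXT)"
    )
    conn.execute(
        "INSERT INTO analysis_results VALUES (?, ?, ?)",
        (recording_id, fetched_at, json.dumps(result)),
    )
    conn.commit()


# ":memory:" keeps the example self-contained; use a file path to persist data.
conn = sqlite3.connect(":memory:")
# Hypothetical placeholder standing in for a real analysis result.
placeholder_result = {"note": "replace with the result of an analysis query"}
save_analysis(conn, "4b12ed1bb6962e2562466f4e749482d8", placeholder_result)
```

Running a job like this on a schedule (for example once a day) gives you a history that outlives the 32-day window.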

Starting and stopping recordings

You can start and stop recordings at any time, but in most use cases you'll want to run recordings for long periods of time.

Whilst you're exploring PYLON you'll no doubt run short recordings to investigate the data you are recording. When you are serving a customer (for example if you've built a dashboard application for a customer which shows people talking about their brand and their competitors), it's likely you'll want to leave your recording running indefinitely. If you stop your recording and start it again there will be a gap in your data set for that period of time.

You can stop a recording in the dashboard or using the API.

In the dashboard click the PYLON tab, and then click My PYLON Filters. Here you can pause, resume and stop recordings.

When using the API, call the /pylon/stop endpoint.

from datasift import Client
datasift = Client("your username", "valid identity API key")
datasift.pylon.stop('[id for your recording]')

You can resume a recording in the dashboard or using the /pylon/start API endpoint.

Monitoring recording limits

It is important that you monitor your recordings, firstly to stay within your platform limits and secondly to check that the data you are recording is complete and accurate.

The important limits you need to keep in mind are:

  • The maximum number of recordings you can run at any time (as specified in your account package details)
  • The maximum number of interactions you can record in a month, which is enforced as a daily limit (as specified in your account package details)
  • A maximum of 1 million interactions can be recorded into an index each day

Platform and account limits are detailed on the Platform Allowances page.

Concurrent recording limit

Your package will specify the maximum number of recordings you can run simultaneously.

You can monitor the number of recordings you have running by hitting the /pylon/get endpoint. Use your account name and account API key to get a full list of your recordings and count the number that are running.
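Counting running recordings can be sketched as follows. This assumes you have already extracted a list of recording dictionaries from the response, each with a status field like the one in the single-recording example earlier in this guide; the exact shape of the list response may differ, so treat the sample data and the count_running helper as illustrative only.

```python
def count_running(recordings):
    """Count the recordings whose status is 'running'."""
    return sum(1 for rec in recordings if rec.get("status") == "running")


# Hypothetical sample data standing in for a list of your recordings.
sample_recordings = [
    {"id": "rec-1", "status": "running"},
    {"id": "rec-2", "status": "stopped"},
    {"id": "rec-3", "status": "running"},
]
print(count_running(sample_recordings))
```

Compare the count with the concurrent recording limit in your package before starting a new recording.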

Account recording limit

Your package also specifies your allowed index capacity per month for your account. This translates to a daily, account-level interaction limit based on the following formula:

Daily account limit* = Index capacity / 30 days

*The daily account limit is always rounded up to the nearest 1 million. For example, if your index capacity is 12 million, your daily account limit would be 1 million.
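As a quick check of the formula, here is a minimal sketch of the calculation, including the rounding up to the nearest 1 million. The function name daily_account_limit is our own.

```python
MILLION = 1_000_000


def daily_account_limit(index_capacity, days=30):
    """Divide monthly index capacity by 30 days, rounded up to the nearest million."""
    # Integer ceiling division avoids floating point rounding surprises.
    millions = -(-index_capacity // (days * MILLION))
    return millions * MILLION


print(daily_account_limit(12 * MILLION))  # the 12 million example above
print(daily_account_limit(45 * MILLION))  # 1.5 million per day before rounding
```

With an index capacity of 12 million the function returns 1 million, matching the example above; with 45 million it returns 2 million.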

If you hit your limit then no more interactions will be recorded to any of your indexes for the remainder of the day (until midnight PST).

You can monitor the volume you have recorded in the current day by hitting the /pylon/get endpoint. Regardless of whether you request information for one or all your recordings you will receive data similar to the following:

{
    "id":"abc123abc123abc123abc123abc123ab",
    "volume":2000,
    "reached_capacity":false,
    "remaining_index_capacity":10000,
    "remaining_account_capacity":20000
}

The remaining_account_capacity value is how many more interactions you can record for the current day for your account.

You will be sent notifications when you reach 50%, 90% and 100% of your daily account-level recording limit. You can configure notification options for your account by visiting Notification Preferences in account settings.
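If you'd rather track this in your own monitoring, you can work out how much of the daily limit you have used from remaining_account_capacity. The sketch below assumes you know your daily account limit (it is not part of the response shown above) and reuses the 50%, 90% and 100% notification thresholds; the account_usage helper is hypothetical.

```python
NOTIFICATION_THRESHOLDS = (50, 90, 100)


def account_usage(daily_limit, remaining_account_capacity):
    """Return (percent of daily limit used, thresholds already crossed)."""
    used = daily_limit - remaining_account_capacity
    percent_used = 100.0 * used / daily_limit
    crossed = [t for t in NOTIFICATION_THRESHOLDS if percent_used >= t]
    return percent_used, crossed


# Assumed daily limit of 1 million, with the remaining capacity from the example.
percent_used, crossed = account_usage(1_000_000, 20000)
print(percent_used, crossed)
```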

Per recording limit - 1 million interactions per day

Regardless of your account package, you can only record 1 million interactions per day in a recording.

If this limit restricts your application, see our Designing with Filters, Indexes and Queries guide for workarounds.

You can monitor how much data has been stored by a recording using the /pylon/get endpoint.

A typical response might be:

{
    "id":"abc123abc123abc123abc123abc123ab",
    "volume":2000,
    "reached_capacity":false,
    "remaining_index_capacity":10000
}

The key fields to take note of are:

  • reached_capacity - Whether your index has hit the daily limit
  • remaining_index_capacity - The remaining daily limit for the index
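A simple check on these two fields might look like the sketch below. It works on a dictionary shaped like the typical response above; the needs_attention helper and the min_remaining threshold are our own choices, so pick a threshold that suits your recording volumes.

```python
def needs_attention(recording, min_remaining=100_000):
    """Flag a recording that has hit, or is close to, its daily index limit."""
    if recording.get("reached_capacity", False):
        return True
    # A missing field is treated as zero remaining, so it is flagged for review.
    return recording.get("remaining_index_capacity", 0) < min_remaining


# The typical response shown above, used here as sample input.
typical = {
    "id": "abc123abc123abc123abc123abc123ab",
    "volume": 2000,
    "reached_capacity": False,
    "remaining_index_capacity": 10000,
}
print(needs_attention(typical))
```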

Deleting recordings

It is not possible to delete interaction filters or the associated recordings and indexes. All interactions saved to an index during a recording expire after 32 days.

Next steps...

Now that you know more about managing your recordings we recommend you take a look at the following resources: