Sample up to 100 interactions per hour per recording from the Super Public feed.

An HTTP GET or POST request sent to:

A successful call to this endpoint returns: 200 OK plus a JSON object. If you include the optional filter parameter, call this endpoint with POST.

Note that samples are returned in reverse chronological order. The platform will deliver interactions until it reaches one of:

  • the limit you specified with the count parameter.
  • your sample limit (of 100 interactions per hour), if you hit that before you reach your count.
  • the last interaction available, if there are fewer interactions in the cache than you requested.
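The three stopping conditions above amount to taking the smallest of three numbers. A minimal sketch, assuming you already know how much of your hourly limit is left and how many interactions are cached (the function and argument names here are illustrative, not part of the API):

```python
def expected_delivery(count: int, remaining_limit: int, available: int) -> int:
    """Estimate how many interactions a single call would return.

    count            -- value of the count parameter
    remaining_limit  -- what is left of the 100-per-hour sample limit
    available        -- interactions currently held in the sample cache
    """
    # Delivery stops at whichever bound is reached first.
    return max(0, min(count, remaining_limit, available))
```

For example, asking for 50 interactions with only 12 left in your hourly limit would yield at most 12.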

Although you can only retrieve 2,400 Super Public posts per day from the API, you can use the filter parameter and time ranges (start and end parameters) to find the most relevant posts.

Learn more by reading our making use of super public data guide.

Blank posts are excluded from sampling. That is, if the only populated value is the media type, we consider the post to be blank and do not include it in the sample cache. This will happen, for example, if an author posts a photo but does not add a comment.


Parameter Description

id

The id of the recording you want to sample.


count

The number of interactions you want to sample. If specified, it must be an integer between 10 and 100.

Default: 10.

If there are currently fewer than 100 interactions available and you omit this parameter, you will receive all the available interactions (but remember that you are limited to 100 interactions per recording per hour).
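Checking the count value on the client side avoids spending a call on a request that would be rejected. A minimal sketch of the stated rule (an integer between 10 and 100, or omitted); the helper name is hypothetical:

```python
def validate_count(count=None):
    """Return count unchanged if it is acceptable, or None if it was omitted."""
    if count is None:
        # Omitting the parameter is allowed; the platform applies its own default.
        return None
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(count, bool) or not isinstance(count, int):
        raise TypeError("count must be an integer")
    if not 10 <= count <= 100:
        raise ValueError("count must be between 10 and 100")
    return count
```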


start

Optional start timestamp for sampling by date.

Time ranges are treated as inclusive on the start and exclusive on the end.


  • If you omit the start and end parameters, the platform will deliver the most recent interactions.
  • If you specify a start time but no end time, the platform will deliver the most recent interactions working backwards up to the start time specified.
  • If you require samples from a specific time period we recommend you specify both a start and end time.
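The inclusive-start, exclusive-end rule can be pictured as a half-open interval. The sketch below assumes timestamps are integer Unix seconds (suggested by the "must be an integer" error messages, but not stated outright); the function name is illustrative:

```python
def in_sample_range(timestamp, start=None, end=None):
    """Return True if timestamp falls inside the half-open range [start, end)."""
    if start is not None and timestamp < start:
        return False  # earlier than the inclusive start
    if end is not None and timestamp >= end:
        return False  # at or after the exclusive end
    return True
```

An interaction stamped exactly at the start time is included, while one stamped exactly at the end time is not.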

end

Optional end timestamp for your sample.


filter

CSDL for a query filter. You can use any query filter target except* and fb.sentiment.

You do not need to specify a query filter. You can use it to narrow down the scope of your sample. For example, your interaction filter might load posts about automobiles into your index, but you could use the query filter to take a sample of that data, restricting the output to just one brand.


Sample request

Retrieve the 50 most recent interactions in the sample queue and delete them from the queue:

curl -X GET '' \
    -H 'Content-type: application/json' \
    -H 'Authorization: username:api_key'

Retrieve up to 100 interactions in the sample queue and delete them from the queue. Note that this example uses a query filter to limit the results:

curl -X POST '' \
    -d '{"id": "d1b7d73b47c639ea3cc290595bca888ca4388afe", "filter":"fb.content contains \"Lamborghini\""}' \
    -H 'Content-type: application/json' \
    -H 'Authorization: username:api_key'
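The same two requests can be prepared from Python. This is a sketch only: the endpoint URL is passed in by the caller because it is not shown above, sending GET parameters in the query string is an assumption, and build_sample_request is a hypothetical helper. It builds the request without sending it, and it switches to POST whenever a filter is supplied.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_sample_request(endpoint_url, username, api_key, recording_id,
                         csdl_filter=None, count=None):
    """Build (but do not send) a request for the sample endpoint."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"{username}:{api_key}",
    }
    params = {"id": recording_id}
    if count is not None:
        params["count"] = count

    if csdl_filter is None:
        # Assumption: without a filter, parameters travel in the query string.
        url = f"{endpoint_url}?{urlencode(params)}"
        return urllib.request.Request(url, headers=headers, method="GET")

    # With a filter, send a JSON body; json.dumps escapes the quotes in the CSDL.
    params["filter"] = csdl_filter
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(endpoint_url, data=body,
                                  headers=headers, method="POST")
```

Passing the result to urllib.request.urlopen would send it; letting json.dumps handle the quoting avoids hand-escaping the CSDL string as in the curl example.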


HTTP/1.1 200 OK

{
  "remaining": 88,
  "reset_at": 1453911838,
  "interactions": [{
        "fb": {
            "content": "I love how the rear seats fold flat in the BMW X5",
            "language": "en",
            "hashtags": [ ... ],
            "topics": [{
                "name": "BMW",
                "id": 565634324
            }],
            "topic_ids": [ ... ]
        },
        "interaction": {
            "media_type": "photo",
            "subtype": "story",
            "content": "I love how the rear seats fold flat in the BMW X5",
            "created_at": "Thu, 21 Jan 2016 16:36:04 +0000",
            "id": "079701744092c80b6ee07044959243c3"
        },
        "tag_tree": {
            "automotive": {
                "bmw": {
                    "X_Series": [ ... ]
                }
            }
        },
        "links": {
            "code": [ ... ],
            "domain": [ ... ],
            "normalized_url": [ ... ],
            "url": [ ... ]
        }
  }]
}

Note that you may see more topic ids than topic names for an interaction. This can occur because the DataSift platform itself maps topic ids to names; the names are not provided by the source feed. The list of topics in the Facebook graph is vast and constantly changing, and the platform maintains only a percentage of the most frequently used topics for this mapping process. Not all topics exist in this map, so some ids cannot be mapped and you may see this mismatch.
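If you need to know which ids arrived without a name, you can compare the two lists. A minimal sketch, assuming the fb object has the topics and topic_ids fields shown in the sample response; the helper name and the second id in the usage example are made up for illustration:

```python
def unmapped_topic_ids(fb):
    """Return the topic ids in fb["topic_ids"] that have no entry in fb["topics"]."""
    named_ids = {topic["id"] for topic in fb.get("topics", [])}
    # Preserve the order in which the ids were delivered.
    return [tid for tid in fb.get("topic_ids", []) if tid not in named_ids]


example_fb = {
    "topics": [{"name": "BMW", "id": 565634324}],
    "topic_ids": [565634324, 111],  # 111 is a made-up id with no name
}
print(unmapped_topic_ids(example_fb))
```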

Output Fields

Property Type Description
remaining string The number of interactions you can retrieve this hour for this recording before you hit your sample limit.
reset_at int A Unix timestamp indicating when your sample limit will be reset.
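These two fields are enough to decide whether to call again now or wait. The sketch below is illustrative (parse_sample_limit is a hypothetical helper). It converts remaining with int() because the table lists it as a string while the sample response shows a number, and it assumes reset_at is in seconds.

```python
import time


def parse_sample_limit(payload, now=None):
    """Return (remaining, seconds_until_reset) from a decoded sample response."""
    remaining = int(payload["remaining"])  # accepts "88" as well as 88
    if now is None:
        now = time.time()
    # Never report a negative wait if the reset time has already passed.
    wait_seconds = max(0, payload["reset_at"] - now)
    return remaining, wait_seconds
```

If remaining is 0, sleeping for the returned number of seconds before the next call should keep you within the limit.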


Response code Description
Status 200 OK See above for example of the JSON output.
Status 400 Bad Request

The request is invalid. For example, the CSDL is invalid or was not provided, or the start or end parameter failed validation:

  "error": "start must be an integer"

  "error": "start must be in the past"

  "error": "end must be an integer"

  "error": "end must be in the past"

  "error": "end must be after start"

404 Not Found
  "error": "Subscription not found"
429 Too Many Requests
  "error": "Exceeded Hourly API rate limit for sample interactions",
  "reset_at": <timestamp>

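A client can branch on the response codes listed above. This is a sketch rather than a prescribed handling strategy; the function name and the action labels it returns are invented for illustration.

```python
def interpret_response(status, payload):
    """Map an HTTP status and decoded JSON body to an (action, detail) pair."""
    if status == 200:
        return ("ok", payload.get("interactions", []))
    if status == 429:
        # Hourly sample limit exceeded; reset_at says when to try again.
        return ("retry_after_reset", payload.get("reset_at"))
    if status in (400, 404):
        # Invalid request or unknown recording; surface the error message.
        return ("error", payload.get("error"))
    return ("unexpected", status)
```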

We offer sampling to help you check that your filters are working as you expect and because sample data is useful in machine learning applications. Stories that are "Super Public" are available to sample. You can use the /pylon/sample endpoint to retrieve the actual content of the stories in JSON format. To qualify as Super Public, a story must:

  • be posted by someone who has “Who can see your future posts?” set to “Public” under their Privacy Settings.
  • be posted by someone who has the Follow setting enabled, allowing non-friends to see their stories.
  • not be posted to someone else’s Timeline.

Interactions available for sampling are held in a queue. When you retrieve sample interactions they are deleted from the queue.
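Because retrieved interactions are removed from the queue, it is worth persisting each batch as soon as it arrives. One simple option, shown here as a sketch with a hypothetical helper name, is to serialize the batch as JSON Lines and append it to a file of your choice:

```python
import json


def to_jsonl(interactions):
    """Serialize a list of interactions as JSON Lines, one object per line."""
    return "".join(json.dumps(item, sort_keys=True) + "\n" for item in interactions)
```

Appending the returned string to a file opened in "a" mode keeps earlier batches intact.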

Stories are available for sampling but engagements are not available.

Some data elements are not available in the sample data you receive:

  • fb.sentiment

How many interactions will I receive?

The number of interactions you receive depends on three factors:

  • The count parameter.
  • The actual number of interactions available in the queue.
  • Your sample limit.

The sample limit allows you to receive 100 sampled interactions for each of your recordings per hour.

The sample limit is based on the number of interactions that you retrieve (using the /pylon/sample endpoint) not on the number of interactions that you record. In other words, you don't need to worry about how much data you put into your index, just how many samples you retrieve from it.

Posts retrieved from previous hours can be included in the current hour's sample limit if there are fewer than 100 relevant posts in the current hour. To maximize the number of interactions you receive, you can programmatically hit the endpoint every hour to make sure you collect 2,400 interactions per day. For example, let's say there were 2,400 Super Public posts during Game of Thrones last night. If you specify the same time range in one request every hour today, you would receive all of those.
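The arithmetic behind this example is 100 interactions per hour over 24 hours, which gives 2,400 per day. A small sketch for estimating how many hourly calls a given backlog would need, assuming each call retrieves the full hourly limit (the names are illustrative):

```python
import math

HOURLY_SAMPLE_LIMIT = 100  # per recording, as described above


def hours_to_drain(backlog, hourly_limit=HOURLY_SAMPLE_LIMIT):
    """Return the number of hourly calls needed to retrieve backlog interactions."""
    return math.ceil(backlog / hourly_limit)


print(hours_to_drain(2400))  # the 2,400-post example takes a full day of calls
```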


  1. All calls to the API must be properly authenticated with a DataSift username and API key.
  2. All calls to the API must be versioned. The current version is v1.3.
  3. Note that the username is the one you use to log in to and to make calls to any of the REST endpoints but the API key is the one that was returned by your call to the POST /account/identity endpoint. That is, it is an identity-based API key.
  4. Take a look at our Changelog page to review the changes we've made to the DataSift API over time.

Revision history


Up to v1.2 this endpoint required the hash of the CSDL interaction filter you wanted to sample.

From v1.3 it takes the id of the recording you want to sample. The id is supplied in the JSON returned by your call to the /pylon/start endpoint.

Resource information

Rate limit cost: 25

Requires authentication: Yes

Response formats: JSON, JSONP