source/create

Create a Managed Source.

An HTTPS POST request sent to:

https://api.datasift.com/v1.3/source/create

A successful call to this endpoint returns 201 Created and a JSON object describing the new source.

Parameters

Parameter Description
source_type
required

Data source type. A string, for example facebook_page.

name
required

Your own name for the source. A string.

Example value: Automotive Facebook pages

parameters
required

Source-specific configuration.

The key-value pairs used here depend on the value of the source_type parameter. See the documentation for your source type for full details.

resources
required

An array of source-specific resources. See the documentation for your source type for full details.

Remember that resources is an array

The resources parameter is an array, so you can combine several definitions, of the same type or of different types, in a single JSON block. For instance, Instagram supports five resource types (user, tag, area, location, and popular), all of which can be combined in the same resources array.
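A minimal Python sketch of combining definitions on the client side follows. The helper combine_resources is hypothetical, and the resource objects use the Facebook Pages format from the example on this page. It also drops exact duplicates, because adding the same resource more than once to a source causes the 409 Conflict error listed under Errors.

```python
import json


def combine_resources(*groups):
    """Merge several lists of resource definitions into one resources array.

    Hypothetical helper. Exact duplicates are dropped, since adding the same
    resource twice to a source is rejected with 409 Conflict.
    """
    combined = []
    seen = set()
    for group in groups:
        for resource in group:
            # Canonical JSON (sorted keys) lets us compare dicts by value.
            key = json.dumps(resource, sort_keys=True)
            if key not in seen:
                seen.add(key)
                combined.append(resource)
    return combined


guardian = {"parameters": {"title": "The Guardian", "id": "10513336322"}}
bbc = {"parameters": {"title": "BBC News", "id": "228735667216"}}

resources = combine_resources([guardian], [bbc, guardian])
print(json.dumps(resources))
```

Only definitions that are identical as JSON are detected; two entries for the same page with different titles would both be kept.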

auth
required

An array of source-specific identities (authentication credentials). See the documentation for your source type for full details.

Remember that auth is an array

The auth parameter is an array, so you can combine several definitions in a single JSON block, for example to pass several access tokens at once.
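As a sketch, assuming the auth format shown in the Facebook Pages example on this page, the hypothetical helper below builds an auth array holding three tokens. The token strings are placeholders, and expires_at is included only when a timestamp is given, since it is optional.

```python
import json


def build_auth(tokens):
    """Build an auth array from (token, expires_at) pairs.

    Hypothetical helper. expires_at is an optional UNIX timestamp;
    pass None to omit it.
    """
    auth = []
    for value, expires_at in tokens:
        credential = {"parameters": {"value": value}}
        if expires_at is not None:
            credential["expires_at"] = expires_at
        auth.append(credential)
    return auth


# Placeholder tokens: substitute real long-lived access tokens.
auth = build_auth([
    ("TOKEN_ONE", 2112112110),
    ("TOKEN_TWO", None),
    ("TOKEN_THREE", 2112112110),
])
print(json.dumps(auth))
```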

validate
optional

Allows you to suppress validation of your token with the source. Can be:

  • true, t, or 1
  • false, f, or 0

Defaults to true.

If you set this parameter to false, creating a new Managed Source will be faster. However, because DataSift does not validate your token, you will not see any warning or error messages about it until you attempt to use the new Managed Source.
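If you build requests programmatically, a small client-side helper can map a Python boolean or any of the accepted spellings to one canonical value. This is a hypothetical Python sketch: it accepts only the exact spellings listed above, and always sending the strings true and false is a choice of convenience, since any listed form is accepted.

```python
_TRUE = {"true", "t", "1"}
_FALSE = {"false", "f", "0"}


def validate_param(value=True):
    """Normalize a validate setting to the string sent in the request.

    Hypothetical helper. Accepts a bool or one of the documented spellings
    (true, t, 1, false, f, 0); anything else raises ValueError.
    """
    if isinstance(value, bool):
        return "true" if value else "false"
    text = str(value)  # lets callers pass the integers 1 and 0
    if text in _TRUE:
        return "true"
    if text in _FALSE:
        return "false"
    raise ValueError(f"unsupported validate value: {value!r}")


# Skip token validation for a faster create call.
print(validate_param(False))
```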

Examples

Here is an example call that creates a Facebook Pages managed source to monitor two pages, with one access token used for authentication.

curl -X POST https://api.datasift.com/v1.3/source/create \
    -d 'source_type=facebook_page' \
    -d 'name=news_source' \
    --data-urlencode 'parameters={"comments": true, "likes": true, "posts_by_others": false, "page_likes": false}' \
    --data-urlencode 'resources=[{"parameters":{"title":"The Guardian","id":"10513336322"}},{"parameters":{"title":"BBC News","id":"228735667216"}}]' \
    --data-urlencode 'auth=[{"parameters": {"value":"EZBXlFZBUgBYmjHkxc2pPmzLeJJYmAvQkwZCRdm0A1NAjidHy1h"}, "expires_at":2112112110}]' \
    -H 'Authorization: datasift-user:your-datasift-api-key'
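For readers scripting this call, here is a minimal Python sketch that builds (but does not send) an equivalent request using only the standard library. The helper name build_create_request and the placeholder credentials are hypothetical; the URL, form fields, and Authorization header format come from the curl example above.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.datasift.com/v1.3/source/create"


def build_create_request(username, api_key, source_type, name,
                         parameters, resources, auth):
    """Build (but do not send) a POST request equivalent to the curl call."""
    form = {
        "source_type": source_type,
        "name": name,
        # JSON-valued fields are serialized, then URL-encoded by urlencode().
        "parameters": json.dumps(parameters),
        "resources": json.dumps(resources),
        "auth": json.dumps(auth),
    }
    body = urllib.parse.urlencode(form).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"{username}:{api_key}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )


req = build_create_request(
    "datasift-user", "your-datasift-api-key",
    "facebook_page", "news_source",
    {"comments": True, "likes": True, "posts_by_others": False, "page_likes": False},
    [{"parameters": {"title": "The Guardian", "id": "10513336322"}},
     {"parameters": {"title": "BBC News", "id": "228735667216"}}],
    # Placeholder token: substitute a real long-lived Facebook access token.
    [{"parameters": {"value": "your-facebook-access-token"}, "expires_at": 2112112110}],
)
# urllib.request.urlopen(req) would send it; a 4xx or 5xx reply raises
# urllib.error.HTTPError.
```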

Let's break this call down in detail.

We are interested in comments on, and likes of, posts created by the page owners, but not in posts by other people on the page's wall. This is configured through the top-level parameters object, whose fields vary depending on the source type:

{
  "comments": true,
  "likes": true,
  "posts_by_others": false,
  "page_likes": false
}

The resources to be tracked (pages, in this case) need to be wrapped in a JSON array containing one object per page. The Facebook id of the page, along with some useful metadata, is stored under the parameters key of that object:

resources = [
  {
    "parameters": {
      "title": "The Guardian",
      "id": "10513336322"
    }
  },
  {
    "parameters": {
      "title": "BBC News",
      "id": "228735667216"
    }
  }
]

The reason for nesting the resource parameters inside a separate object becomes apparent once we inspect the reply. To pass the array along with the request, we need to URL-encode it first.
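The wrapping and encoding steps can be sketched in Python. The helper pages_to_resources is hypothetical; the object layout follows the resources array above, and the standard-library urlencode performs the URL-encoding.

```python
import json
import urllib.parse


def pages_to_resources(pages):
    """Wrap each Facebook page (title -> id) in its own parameters object."""
    return [{"parameters": {"title": title, "id": page_id}}
            for title, page_id in pages.items()]


resources = pages_to_resources({
    "The Guardian": "10513336322",
    "BBC News": "228735667216",
})

# Serialize to JSON, then URL-encode it as a form field.
encoded = urllib.parse.urlencode({"resources": json.dumps(resources)})
print(encoded)

# Decoding reverses both steps and yields the original structure.
decoded = json.loads(urllib.parse.parse_qs(encoded)["resources"][0])
print(decoded == resources)  # True
```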

As with resources, the authentication credentials are wrapped in a JSON array containing one object per credential. In addition to the access token, we can optionally provide an expiration date for the token as a UNIX timestamp. This allows DataSift to send an automated notification five days before the token expires, warning you that it is about to become invalid so that you can replace it.

auth = [
  {
    "expires_at": 2112112110,
    "parameters": {
      "value": "EZBXlFZBUgBYmjHkxc2pPmzLeJJYmAvQkwZCRdm0A1NAjidHy1h"
    }
  }
]
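The arithmetic behind the expiry warning is simple. This Python sketch, with hypothetical helper names, computes an expires_at value a given number of days ahead and the point five days before a given expires_at. The exact moment at which DataSift sends the notification is not specified beyond "five days prior", so subtracting exactly 5 × 86,400 seconds is an approximation.

```python
from datetime import datetime, timedelta, timezone

NOTICE_PERIOD = timedelta(days=5)


def expires_at_from_now(days):
    """UNIX timestamp for a token that expires `days` days from now."""
    return int((datetime.now(timezone.utc) + timedelta(days=days)).timestamp())


def notification_time(expires_at):
    """Approximate time of the expiry warning: five days before expires_at."""
    return expires_at - int(NOTICE_PERIOD.total_seconds())


print(notification_time(2112112110))  # 2111680110
print(datetime.fromtimestamp(2112112110, tz=timezone.utc).isoformat())
```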

Finally, we need to specify the type and name of the source, as well as our DataSift authentication credentials, which brings us to the request above.

The response looks like this:

{
  "id": "da4f8df71a0f43698acf9240b5ad668f",
  "name": "news_source",
  "source_type": "facebook_page",
  "created_at": 1391707662,
  "status": "stopped",
  "parameters": {
    "comments": true,
    "likes": true,
    "posts_by_others": false,
    "page_likes": false
  },
  "resources": [
    {
      "parameters": {
        "id": "10513336322",
        "title": "The Guardian"
      },
      "resource_id": "706af7cce1484d098553a2be580fa3bb"
    },
    {
      "parameters": {
        "id": "228735667216",
        "title": "BBC News"
      },
      "resource_id": "b4e2ae889ff346129d687fa4c56caed2"
    }
  ],
  "auth": [
    {
      "expires_at": 2112112110,
      "parameters": {
        "value": "EZBXlFZBUgBYmjHkxc2pPmzLeJJYmAvQkwZCRdm0A1NAjidHy1h"
      },
      "identity_id": "d38e5598142746e19689ddee65ddca55"
    }
  ]
}

All the data we sent has been returned, along with some new fields. In particular, the source itself, each resource, and each authentication credential (identity) have been assigned a unique identifier: id for the source, resource_id for each resource, and identity_id for each authentication token.

The nested parameters field in resource and auth objects should now make more sense: it cleanly separates source-specific parameters from fields common to all sources, such as the identifiers or the expiration date.

The source id is used to refer to the source in other calls to the API, while the resource and identity ids are necessary for making calls to the /source/update endpoint.

Finally, the status field indicates the current status of a source, which will be either stopped or running.
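The identifiers in the reply are what you keep for later calls. Here is a minimal Python sketch using an abbreviated copy of the example response above. The helper extract_ids is hypothetical, and keying resources by their parameters id assumes a source type, such as Facebook Pages, whose resources carry an id parameter.

```python
import json

# Abbreviated copy of the example response above.
response_text = """
{
  "id": "da4f8df71a0f43698acf9240b5ad668f",
  "status": "stopped",
  "resources": [
    {"parameters": {"id": "10513336322", "title": "The Guardian"},
     "resource_id": "706af7cce1484d098553a2be580fa3bb"},
    {"parameters": {"id": "228735667216", "title": "BBC News"},
     "resource_id": "b4e2ae889ff346129d687fa4c56caed2"}
  ],
  "auth": [
    {"expires_at": 2112112110,
     "parameters": {"value": "EZBXlFZBUgBYmjHkxc2pPmzLeJJYmAvQkwZCRdm0A1NAjidHy1h"},
     "identity_id": "d38e5598142746e19689ddee65ddca55"}
  ]
}
"""


def extract_ids(source):
    """Collect the identifiers needed for follow-up calls such as /source/update."""
    return {
        "source_id": source["id"],
        "status": source["status"],
        # Key resources by their source-specific parameters (here, the page id).
        "resource_ids": {r["parameters"]["id"]: r["resource_id"]
                         for r in source["resources"]},
        "identity_ids": [a["identity_id"] for a in source["auth"]],
    }


ids = extract_ids(json.loads(response_text))
print(ids["resource_ids"]["228735667216"])  # b4e2ae889ff346129d687fa4c56caed2
```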

Output Fields

Property: Description:
auth The list of source-specific authentication credentials.
created_at The source creation timestamp.
comments Found in the Facebook Pages, Google+, and Instagram sources, inside the parameters element. Present when the source is supposed to deliver interactions that represent comments.
distance The radius (measured in meters) of a circle (centered at lat/lng) for Instagram area or location updates.
event_types The type of Google+ updates provided by the parent resource. When present, found once per resource.
exact_match Search match scope for Instagram updates.
expires_at Expiry time and date for an authentication token.
extract_links Links augmentation algorithm. Found in Yammer resources.
foursq Foursquare id for Instagram updates.
id The source id or the Facebook id of a Facebook Page.
lat Latitude for Instagram updates.
likes Found in the Facebook Pages and Instagram sources, inside the parameters element. Present when the source is supposed to deliver like interactions.
lng Longitude for Instagram updates.
name The name of a source or a set of authentication credentials. Each source has one name element and at least one set of authentication credentials.
parameters Source- or resource-specific parameters. Each source and each resource has one parameters element. Alternatively, parameters of an authentication token.
plus_ones Found in the Google+ source, inside the parameters element. Present when the source is supposed to deliver interactions that represent +1 events.
posts_by_others Found in the Facebook Pages source, inside the parameters element. Present when the source is supposed to deliver interactions related to posts created on the monitored page by the users who interact with that page, but do not administer it.
refresh_token Google+ refresh token.
resources The list of all resources defined for the current source. For additional information, see the resources parameter above.
source_type The source type.
status The status of a source. Each source has one status element.
title The title of a Facebook Page. Found once in each resource listed in the resources element.
type The type of Google+ or Instagram updates provided by the parent resource. Found once in each resource listed in the resources element.
url The URL of a Facebook Page. Found once in each resource listed in the resources element.
user_id The Google+ id of the user whose updates are provided by the parent resource. When present, found once per resource.
search_string The search string used by the parent Google+ resource. When present, found once per resource.
value Can be an Instagram user id, when found in an Instagram resource; or an authentication token when found in the auth element.

Errors

If there is an error, we return an HTTP 4xx or 5xx status code, along with a message explaining the error.

If you try to create a source with no resources: 400 Bad Request.

{"original_error": "resources must have at least 1 entry", "error": "Bad request: resources must have at least 1 entry"}

If you try to create a source without having signed the appropriate license: 403 Forbidden.

{"error": "The license has not been signed for this source. Please sign the license at datasift.com"}

If you try to create a source with an auth token that another source already uses, or add the same resource more than once to a source: 409 Conflict.

{"error": "A source with the same auth token already exists, or the same resource has been added more than once to this source"}

If Managed Sources experiences an internal error: 500 Internal Server Error.

{"error": "There was an error loading the requested source."}
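A client can turn these replies into readable messages. The Python sketch below is hypothetical: it reads the error field shown in the examples above, falls back to the raw body when the reply is not a JSON object, and its hints cover only the status codes listed on this page.

```python
import json

# Hints for the status codes documented above; anything else is "unexpected".
HINTS = {
    400: "bad request",
    403: "license not signed",
    409: "duplicate auth token or resource",
    500: "Managed Sources internal error",
}


def describe_error(status, body):
    """Turn an error reply from /source/create into a short message."""
    try:
        payload = json.loads(body)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        payload = None
    message = payload.get("error", body) if isinstance(payload, dict) else body
    return f"{status}: {message} ({HINTS.get(status, 'unexpected status')})"


print(describe_error(403, '{"error": "The license has not been signed for this source. '
                          'Please sign the license at datasift.com"}'))
```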

Notes

  1. All calls to the API must be properly authenticated with a DataSift username and API key.
  2. Source-, resource-, and auth-specific fields are provided as part of the various parameters objects described above.
  3. Sources, resources and auth objects are associated with a unique ID in the API response, which can be used to refer to them when retrieving or updating a managed source.
  4. You will encounter problems if you use the Facebook Graph API Explorer to generate tokens. Those are 'short-lived tokens' which are not suitable for use with DataSift. Instead, generate tokens via the Facebook API or in DataSift's UI.
  5. We recommend that you do not submit more than 100 pages or terms at a time when creating or updating sources.
  6. All calls to the API must be versioned. The current version is v1.3.
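To follow the 100-item recommendation in note 5, you can split a long list of resources into batches. This is a minimal Python sketch with a hypothetical batches helper and made-up page resources; how you submit the later batches (for example, with /source/update) depends on that endpoint's parameters, which are not covered here.

```python
def batches(items, size=100):
    """Yield successive slices of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical list of 250 page resources.
pages = [{"parameters": {"title": f"Page {n}", "id": str(n)}} for n in range(250)]
sizes = [len(batch) for batch in batches(pages)]
print(sizes)  # [100, 100, 50]
```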

Resource information

Rate limit cost: 25

Requires authentication: Yes

Response formats: JSON, JSONP