Introduction to the Links Augmentation

Browse the Links augmentation targets for a full list of supported CSDL targets available to filter against.

The Links augmentation looks at any links within the content of a message and resolves them to their final endpoint. At the same time, it collects parts of the linked page's content so that you can filter against the page that the link points to.

DataSift follows all types of shortened links (for example, bit.ly and Twitter's own t.co shortener) and follows each redirect until the final web page is found. The final resolved link is also visible (as links.url) to be filtered against.

The Links augmentation works in near real time. Only links that have not previously been discovered are taken out of the real-time flow; once resolved, they are re-inserted into the flow of data, normally in under two seconds.

How it Works

Here are the key points you need to know first:

  • We resolve all links even if they are shortened.
  • We follow all redirects through to the final URL.
  • We do this in near real time, so new links are normally resolved in under two seconds.
  • We fetch some content from the page that a link points to, such as the title, and some <meta> fields.
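
The redirect-following step in the points above can be pictured with a minimal Python sketch. It is only an illustration, not DataSift's implementation: the function name resolve_link, the max_hops limit, and the REDIRECTS table (which stands in for real HTTP redirect responses) are all assumptions made for this example.

```python
def resolve_link(url, redirects, max_hops=10):
    """Follow redirects from `url` until a URL with no further redirect is found.

    `redirects` is a hypothetical lookup table standing in for HTTP
    redirect responses; a real resolver would issue network requests.
    """
    seen = {url}
    for _ in range(max_hops):
        target = redirects.get(url)
        if target is None:
            # No further redirect: this is the final endpoint.
            return url
        if target in seen:
            raise ValueError(f"redirect loop detected at {target}")
        seen.add(target)
        url = target
    raise ValueError("too many redirects")


# Hypothetical chain: a t.co link wrapping a bit.ly link wrapping the page.
REDIRECTS = {
    "http://t.co/abc123": "http://bit.ly/xyz789",
    "http://bit.ly/xyz789": "https://www.example.com/article",
}

final_url = resolve_link("http://t.co/abc123", REDIRECTS)
print(final_url)
```

In this sketch the final value plays the role of links.url: the address left once every shortener and redirect has been followed.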

Use Cases

  1. You can filter against the title of a linked page:

    links.title contains "something"

  2. You can filter against specific domains. We use the in operator here rather than contains because this target is an array of strings:

    links.domain in "yahoo.com, nytimes.com"

Multiple Links in One Input Object

An input object might contain more than one link, so the Links augmentation is designed to handle multiple links. The targets for the Links augmentation are arrays of strings or arrays of integers, with one array element for each link. For example, for an Interaction that contains three links, each array has three elements.

Below is a basic example showing the structure of some of the output fields you could expect to receive if an interaction contained links to the homepages of LinkedIn, Google and eBay:

{
  "links": {
    "title": [
      "Welcome! | LinkedIn",
      "Google",
      "Electronics, Cars, Fashion, Collectibles, Coupons and More | eBay"
    ],
    "url": [
      "https://www.linkedin.com",
      "https://www.google.com",
      "http://www.ebay.com"
    ],
    "domain": [
      "linkedin.com",
      "google.com",
      "ebay.com"
    ],
    ...
  }
}
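
If you consume this output in your own code, it can be convenient to regroup the parallel arrays into one record per link. The Python sketch below shows one way to do that. It assumes that the element at a given index in each array describes the same link, as the sample above suggests; the payload literal and the records name are made up for the illustration.

```python
import json

# A trimmed-down copy of the sample above (hypothetical payload).
payload = json.loads("""
{
  "links": {
    "title": ["Welcome! | LinkedIn", "Google", "Electronics, Cars, Fashion, Collectibles, Coupons and More | eBay"],
    "url": ["https://www.linkedin.com", "https://www.google.com", "http://www.ebay.com"],
    "domain": ["linkedin.com", "google.com", "ebay.com"]
  }
}
""")

links = payload["links"]

# Pair up the elements that share an index: one dict per link.
records = [
    {"title": title, "url": url, "domain": domain}
    for title, url, domain in zip(links["title"], links["url"], links["domain"])
]

for record in records:
    print(record["domain"], "->", record["url"])
```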

When filtering using CSDL, you perform operations on these arrays as if they were simple strings or integers. For example, the following filter succeeds if it finds a match on at least one element in the array.

    links.title contains "Cincinnati Bengals"
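
The "at least one element" behavior can be modeled in a few lines of Python. This is a simplified sketch of the matching rule only, not of the CSDL engine: the helper names any_contains and any_in are hypothetical, a plain substring test stands in for the contains operator, and details such as case handling and tokenization are ignored.

```python
def any_contains(values, phrase):
    """True if `phrase` occurs in at least one element of `values`."""
    return any(phrase in value for value in values)


def any_in(values, comma_separated):
    """True if at least one element of `values` equals an item in the list."""
    allowed = {item.strip() for item in comma_separated.split(",")}
    return any(value in allowed for value in values)


# Hypothetical target arrays for an interaction with two links.
titles = ["Google", "Cincinnati Bengals | Official Site"]
domains = ["google.com", "bengals.com"]

print(any_contains(titles, "Cincinnati Bengals"))   # one title matches
print(any_in(domains, "yahoo.com, nytimes.com"))    # no domain matches
```

A filter over an array target therefore behaves like an "any" test: a single matching link is enough for the whole interaction to match.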