links.normalized_url

The normalized version of the original URL. The normalizer performs the following actions on every link that it finds in every incoming interaction:

  • Removes "www."
  • Converts the URL to lower case
  • Removes any of the following:
    • /default.html
    • /default.htm
    • /default.aspx
    • /default.asp
    • /index.php
    • /index.html
    • /index.htm
    • /index.aspx
    • /index.asp
  • Removes the trailing slash from the end of the URL
  • Removes any trailing anchor fragment (the part of the URL that begins with #)
  • Removes Urchin Tracking Module (UTM) tags, such as utm_source

Normalization is performed on interactions before they reach DataSift's filtering engine, so your filters only ever see the normalized form.

For example, if the original URL was this:

http://www.example.com/data/?utm_source=sourceexample

its normalized version would be:

http://example.com/data
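
The steps listed above can be approximated with a short Python sketch. This is not DataSift's implementation; it is an illustration for checking what a URL might look like after normalization. The function name normalize_url, the order in which the steps are applied, and the choices to strip "www." only directly after the scheme and to treat any query parameter starting with utm_ as a UTM tag are all assumptions.

```python
import re

# Default and index page names taken from the list above.
_DEFAULT_PAGES = (
    "/default.html", "/default.htm", "/default.aspx", "/default.asp",
    "/index.php", "/index.html", "/index.htm", "/index.aspx", "/index.asp",
)


def normalize_url(url: str) -> str:
    """Approximate the documented normalization steps (hypothetical helper)."""
    url = url.lower()                     # convert the URL to lower case
    url = url.split("#", 1)[0]            # drop any trailing anchor fragment
    base, _, query = url.partition("?")   # separate the path from the query string

    # Assumption: "www." is only removed when it directly follows the scheme.
    base = re.sub(r"^(https?://)www\.", r"\1", base)

    # Remove a default or index page at the end of the path.
    for page in _DEFAULT_PAGES:
        if base.endswith(page):
            base = base[: -len(page)]
            break

    # Remove a single trailing slash from the path.
    if base.endswith("/"):
        base = base[:-1]

    # Assumption: UTM tags are query parameters whose names start with "utm_".
    kept = [p for p in query.split("&") if p and not p.startswith("utm_")]
    return base + ("?" + "&".join(kept) if kept else "")


print(normalize_url("http://www.example.com/data/?utm_source=sourceexample"))
```

Run against the example URL above, the sketch yields http://example.com/data, which matches the documented result. Query parameters that are not UTM tags are kept unchanged.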

Write your CSDL so that it does not filter for elements that are removed by the normalization process. For example, if you filter on the links.normalized_url target for this value:

http://www.example.com/data

you will receive no data at all, because "www." is removed from every link in every interaction prior to filtering.
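
One way to catch this kind of mistake before deploying a filter is to check the value you plan to compare against links.normalized_url. The following Python sketch is a hypothetical helper, not part of DataSift or CSDL; the function name filter_value_problems and the exact patterns are assumptions based on the list of normalization steps above.

```python
import re
from typing import List


def filter_value_problems(value: str) -> List[str]:
    """Return reasons why a filter value could never equal a normalized URL.

    Hypothetical helper: the checks mirror the documented normalization steps.
    """
    # Path portion only, so a slash or index page before "?" is still detected.
    path = value.split("#", 1)[0].split("?", 1)[0]

    checks = [
        ('contains "www." after the scheme',
         re.search(r"^[a-z]+://www\.", value, re.IGNORECASE)),
        ("contains upper-case characters",
         re.search(r"[A-Z]", value)),
        ("path ends with a default or index page",
         re.search(r"/(default\.(html?|aspx?)|index\.(php|html?|aspx?))$",
                   path, re.IGNORECASE)),
        ("path ends with a trailing slash",
         path.endswith("/")),
        ("contains an anchor fragment",
         "#" in value),
        ("contains a utm_ parameter",
         re.search(r"[?&]utm_", value, re.IGNORECASE)),
    ]
    return [reason for reason, hit in checks if hit]


for problem in filter_value_problems("http://www.example.com/data"):
    print(problem)
```

For the value shown above, the helper reports only the "www." problem. An empty list means that none of the documented removals apply to the value.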

Examples

  1. Filter for posts that contain links that point to a specified page:

links.normalized_url == "http://nytimes.com/2013/05/01/dining/making-lunch-with-michael-pollan-and-michael-moss.html"

Notes

Remember that some URLs may be subject to redirect services such as Captcha, so the URL that appears in a post is not always the final destination. In such a situation, we recommend that you filter against the links.hops target as well as links.normalized_url. If there is a match, links.hops contains a record of both the specified URL and the redirected normalized URL.

links.normalized_url == "http://example.com/mypage?xyz=42"
or links.hops url_in "http://example.com/mypage?xyz=42"

Resource information

Target service: Augmentation Target: Links

Target object: Links: General

Type: array(string)

Array: Yes

Always exists: No