links.normalized_url

The normalized version of an original URL. The normalizer performs the following actions on every link that it finds in every incoming interaction:

  • Removes "www."
  • Converts the URL to lower case
  • Removes any of the following:
    • /default.html
    • /default.htm
    • /default.aspx
    • /default.asp
    • /index.php
    • /index.html
    • /index.htm
    • /index.aspx
    • /index.asp
  • Removes the trailing slash from the end of the URL
  • Removes any trailing anchor (the part of the URL that starts with #)
  • Removes Urchin Tracking Module (UTM) parameters, such as utm_source

Normalization is performed on interactions before they reach DataSift's filtering engine, so filters on this target only ever see the normalized form of a link.

For example, if the original URL was this:

http://www.example.com/data/?utm_source=sourceexample

its normalized version would be:

http://example.com/data
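
To check what a URL is likely to look like after normalization before you put it in a filter, the rules listed above can be approximated in a few lines of Python. This is only a sketch, not DataSift's actual normalizer: the function name normalize_url and the order of the steps are assumptions, as are the choices to strip "www." only from the start of the host name and to treat any query parameter whose name begins with "utm_" as an Urchin Tracking Module tag.

```python
from urllib.parse import urlsplit, urlunsplit

# Default page names taken from the list of removed elements above.
DEFAULT_PAGES = (
    "/default.html", "/default.htm", "/default.aspx", "/default.asp",
    "/index.php", "/index.html", "/index.htm", "/index.aspx", "/index.asp",
)


def normalize_url(url: str) -> str:
    """Approximate the documented normalization rules (hypothetical helper)."""
    # Convert the whole URL to lower case.
    scheme, netloc, path, query, _fragment = urlsplit(url.lower())

    # Remove "www." (assumption: only as a prefix of the host name).
    if netloc.startswith("www."):
        netloc = netloc[len("www."):]

    # Remove a default page name from the end of the path.
    for page in DEFAULT_PAGES:
        if path.endswith(page):
            path = path[: -len(page)]
            break

    # Remove UTM parameters (assumption: any parameter named utm_*).
    kept = [
        pair for pair in query.split("&")
        if pair and not pair.split("=", 1)[0].startswith("utm_")
    ]
    query = "&".join(kept)

    # Remove the trailing slash; the anchor is dropped by passing an
    # empty fragment when the URL is reassembled.
    path = path.rstrip("/")
    return urlunsplit((scheme, netloc, path, query, ""))


print(normalize_url("http://www.example.com/data/?utm_source=sourceexample"))
```

Under these assumptions the call above prints http://example.com/data, matching the example. Running a URL through a helper like this before writing it into CSDL is one way to avoid filtering for elements that normalization has already removed.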

Write your CSDL so that it does not filter for elements that are removed by the normalization process. For example, if you filter in the links.normalized_url target for this:

http://www.example.com/data

you will receive no data at all, because "www." is removed from every link in every interaction prior to filtering.

Examples

  1. Filter for posts that contain links that point to a specified page:

    links.normalized_url == "http://nytimes.com/2013/05/01/dining/making-lunch-with-michael-pollan-and-michael-moss.html"

Notes

Also see the Filtering by Shared Links example.

The links.* targets contain any link in an interaction. They are more frequently populated than the fb.link and fb.parent.link targets.

Resource information

Target service: PYLON for Facebook Topic Data

Target object: Links

Type: array(string)

Array: Yes

Tokenized for query filters: Yes

Interaction filter: Yes

Analysis target: Yes

Query filter: Yes

Child analysis target: No