Links Filtering Questions

sfeinstein's picture

I have a few questions about filtering by links. I want to consume Facebook and Twitter content that meets certain criteria, but only where none of the links in the content have certain substrings in the domain. I have 300 or so of these blacklist substrings.

1) Am I right that the links augmentation and links.domain are the way to go?

2) In the content, to be identified as a link, does a substring have to start with http:// or https://? Will it find and filter things like "blah.com", "www.blah.com", "www.blah.com/foo/bar?param=baz", without the protocol part?

3) I do not need to filter on or process the landing page content for links. Will it be included in the output if I use the links augmentation, and if so, can that be disabled?

4) Is it possible to filter on expanded links (e.g. on whatever a bitly expands to) but NOT the actual final link if redirects are taken into account?

Thank you!

Comments

Jason's picture

1. The Links Augmentation, with a combination of links.domain and links.url or links.normalized_url, will be the best way to filter. If I shorten a link using something like bit.ly and then post that short link in my Tweet, the twitter.links field returned in the interaction will be the bit.ly link, not the page that this link resolves to. Resolving the short link to its final page is what the Links Augmentation does.

2. Some of this processing is actually done on Twitter's side. If you enter 'blah.com' into a Tweet (without any protocol data), Twitter will identify this as a link, and pass it on to us as a link. This is another reason why our links augmentation can be so useful: In cases where people include strings which can be incorrectly identified as a link, for example 'will.i.am', we use our Links Augmentation to let you know that although this may be a link, we could not resolve it. In a case where we failed to resolve a link, links.code would be populated with a 500 or 404, rather than the standard 200 HTTP response code.

3. Using the regular links augmentation, we just return basic information about the link. Unfortunately, if a data source is enabled, you cannot currently prevent it from returning this information to you. You can, however, ignore it when you receive it.

4. Take a look at the links.hops target; that's exactly what this is for. It returns every link in the redirect chain, excluding the final resolved URL.
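For point 3, ignoring unwanted link data on the client side might also be combined with a local blacklist check. The following is only a rough Python sketch, not DataSift code: it assumes each received interaction is a dict whose "links" object holds a "url" list (resolved URLs) and a "hops" list of lists (one redirect chain per link). That layout, and the function names hostname, is_blacklisted and keep_interaction, are assumptions made for illustration, so check them against the data you actually receive.

```python
from urllib.parse import urlparse


def hostname(url):
    """Return the lowercase host of a URL, tolerating a missing scheme."""
    if "://" not in url:
        # urlparse only recognises the host when a scheme is present
        url = "http://" + url
    return (urlparse(url).hostname or "").lower()


def is_blacklisted(urls, blacklist):
    """True if any URL's host contains any of the blacklist substrings."""
    hosts = [hostname(u) for u in urls]
    return any(bad.lower() in host for host in hosts for bad in blacklist)


def keep_interaction(interaction, blacklist, check_hops=True):
    """Decide whether to keep an interaction.

    ASSUMPTION: interaction["links"]["url"] is a list of resolved URLs and
    interaction["links"]["hops"] is a list of redirect chains (lists of URLs).
    """
    links = interaction.get("links", {})
    urls = list(links.get("url", []))
    if check_hops:
        for chain in links.get("hops", []):
            urls.extend(chain)
    return not is_blacklisted(urls, blacklist)
```

Calling keep_interaction(item, blacklist) on each received item would drop anything whose resolved URL or intermediate hop matches; passing check_hops=False would look at the resolved URLs only.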

sfeinstein's picture

Thanks, Jason, for replying to this and another of my questions so quickly and thoroughly. Tell your boss that your customers appreciate the great support you provide!

emeeecom's picture

Jason, does that mean that using the links augmentation with links.domain would be the best option for filtering posts that point to a specific domain, including links posted as shortened URLs? Will this cover all posts, including Facebook, Twitter, etc.?

thanks,

Jason's picture

Yes, the Links Augmentation is perfect for tracking any links that resolve to a certain link or domain. The links augmentation does cover links from both Twitter and Facebook.

corabalaw's picture

Can the links augmentation resolve links to every site?

get fit

Jason's picture

The links augmentation can resolve links to any site. Searching for something like: links.domain in "getfitgal.com" will track any links shared to your site.
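If you track more than one domain, the filter string can be generated rather than typed by hand. This is a small Python sketch under one assumption that should be checked against the CSDL documentation: that the in operator accepts a comma-separated list of values inside the quotes. The helper name domain_filter is made up for this example.

```python
def domain_filter(domains):
    """Build a filter string of the form: links.domain in "a.com,b.com".

    ASSUMPTION: the `in` operator takes a comma-separated list in one
    quoted string. Verify this against the CSDL documentation.
    """
    # Normalise, de-duplicate and sort so the output is stable
    cleaned = sorted({d.strip().lower() for d in domains if d.strip()})
    if not cleaned:
        raise ValueError("at least one domain is required")
    for d in cleaned:
        # Quotes or commas would break the generated string
        if '"' in d or "," in d:
            raise ValueError(f"unsupported character in domain: {d!r}")
    return 'links.domain in "{}"'.format(",".join(cleaned))


print(domain_filter(["getfitgal.com"]))
```

With a single domain this reproduces the filter quoted above.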