Links Filtering Questions

Posted by sfeinstein

I have a few questions about filtering by links. I want to consume Facebook and Twitter content that meets certain criteria, but only where none of the links in the content have certain substrings in their domain. I have 300 or so of these blacklist substrings.

1) Am I right that the links augmentation and links.domain are the way to go?

2) To be identified as a link in the content, does a substring have to start with http:// or https://? Will it find and filter things like "blah.com", "www.blah.com", and "www.blah.com/foo/bar?param=baz", without the protocol part?

3) I do not need to filter on or process the landing page content for links. Will it be included in the output if I use the links augmentation, and if so, can that be disabled?

4) Is it possible to filter on expanded links (e.g. on whatever a bitly link expands to) but NOT on the actual final link, if redirects are taken into account?
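
To make the requirement concrete, here is a rough Python sketch of the check described above, done client-side on the raw text. It is only an illustration of the intended behavior, not how the links augmentation or links.domain actually works. The names BLACKLIST, LINK_RE, extract_domains and passes_filter are hypothetical, and the regex is an assumption about what should count as a link (with or without the protocol part).

```python
import re
from urllib.parse import urlsplit

# Hypothetical blacklist; the real list would hold 300 or so substrings.
BLACKLIST = ["blah", "spam"]

# Assumed definition of a link: optional http(s)://, a dotted host name,
# and an optional path/query. This is deliberately loose, so it will also
# match things like "file.txt"; tighten it as needed.
LINK_RE = re.compile(
    r"(?:https?://)?(?:[a-z0-9-]+\.)+[a-z]{2,}(?:/[^\s]*)?",
    re.IGNORECASE,
)

def extract_domains(text):
    """Return the lower-cased host name of every link-like substring in text."""
    domains = []
    for match in LINK_RE.finditer(text):
        link = match.group(0)
        # urlsplit only recognizes the host when a scheme is present.
        if not re.match(r"https?://", link, re.IGNORECASE):
            link = "http://" + link
        host = urlsplit(link).hostname
        if host:
            domains.append(host)
    return domains

def passes_filter(text, blacklist=BLACKLIST):
    """True if no link domain in text contains a blacklisted substring."""
    return not any(
        sub in domain
        for domain in extract_domains(text)
        for sub in blacklist
    )
```

With this sketch, passes_filter("check blah.com") is rejected because the domain contains "blah", while passes_filter("check http://example.org/blah") passes because only the path, not the domain, contains the substring.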

Thank you!

1 year 11 months ago