Tagging Links - Content Publishers

One of the quickest ways to add value to data in PYLON is classification using tags.

In this simple example we'll build a classifier that groups the domains of shared links into useful classes relating to movie content.

Tagging allows you to add labels to interactions as they are recorded which can be used later in your analysis.

Our developer guide gives you an introduction to the concepts of classification.

Identifying Classes

The first step in building a classifier is to spend time deciding which classes you want to identify.

This example classifier groups shared content into movie-focused content publishers based on domain. It might take you a few attempts to perfect your set of classes, but this is time well spent as it is key to the quality of your analysis.

For this classifier we identified the following classes of interest:

  • Video - Video sharing sites such as YouTube and Vimeo
  • Social Networks - Social networks such as Twitter and Google+
  • File Sharing - File sharing sites
  • Link Shortening - Link shortening services (where the link is not resolved to a final useful site)
  • Link Publishing Service - Services such as Feedburner and dlvr.it
  • Music Streaming - Music services such as Spotify and Last.fm
  • Shopping - Online retailers such as Amazon and Etsy
  • Broadcasters - Sites relating to TV broadcasters
  • News - Online sites for major newspapers
  • Online News / Blogs - Online news sites and blogs
  • Hollywood/Movie/Celeb/Gossip - Magazine and news sites for celebrity news
  • Fashion - Fashion-related magazine and news sites
  • Sports - Sports news sites
  • Fundraising/Campaign/Community/Religion - Fundraising, campaign, community and religious sites
  • Anime/Manga/Comic/Fun - Comics and fun content
  • Review - Popular movie review sites

We decided on this list after creating an initial recording for people discussing box office movies and analyzing the links being shared.

Building the Classifier

This simple classifier groups domains from shared links. Here we can make use of the links.* targets. The links targets expose any links shared in stories or engagements.

Note that a link is surfaced in the links.* targets whether it is shared in a story or in an engagement, so there is no need to specify different targets to capture both.

Each of the tags will follow this syntax:

tag.movies "[class]" { links.url any "[list of domains]" }

Looking at the syntax in detail:

  • tag - declares that this is a tag rule
  • .movies - states that this tag is in the 'movies' namespace
  • [class] - is the label for the class
  • links.url any - looks for any of the listed domains in links shared in both stories and engagements

Looking at the 'Video' class as an example, if we want to look for YouTube domains the tag would be:

tag.movies "Video" { links.url any "youtube.com,youtu.be"}
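Writing sixteen rules by hand invites typos, so it can help to generate them. The following is a minimal Python sketch, not part of PYLON itself, that formats one tag rule from a namespace, a class label and a list of domains. The function name build_tag_rule is hypothetical; it only produces text in the syntax shown above.

```python
def build_tag_rule(namespace: str, label: str, domains: list[str]) -> str:
    """Return one CSDL tag rule in the form used in this example.

    Hypothetical helper: it only formats text and does not call any API.
    """
    # Drop blanks and repeats while keeping the original order.
    cleaned = list(dict.fromkeys(d.strip().lower() for d in domains if d.strip()))
    if not cleaned:
        raise ValueError("at least one domain is required")
    domain_list = ",".join(cleaned)
    return f'tag.{namespace} "{label}" {{ links.url any "{domain_list}" }}'


if __name__ == "__main__":
    print(build_tag_rule("movies", "Video", ["youtube.com", "youtu.be"]))
```

Keeping the domain lists in data (for example one list per class) and generating the rules from them makes it easier to add or move a domain later.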

Repeating the process for all tags and including all our domains, our final classifier becomes:

tag.movies "Video" { links.url any "wazzalau.com,theladbible.com,youtube.com,vimeo.com,youtu.be,musiclove.fm,buzcast.com,hulu.com,glovishare.com,netflix.com"} 
tag.movies "Social Networks" { links.url any "path.com,en.m.wikipedia.org,m.yahoo.com,docs.google.com,fb.me,en.wikipedia.org,bing.com,search.yahoo.com,plus.google.com,rebelmouse.com,twitter.com,t.co,ask.fm,tvshowtime.com,reddit.com,pic.twitter.com,swarmapp.com,shots.com,flickr.com,pinterest.com,beta.twitmusic.com,uk.yahoo.com,instagram.com,tumblr.com,vine.co,twimg.com,google.co.uk,weheartit.com"} 
tag.movies "File Sharing" { links.url any "promodj.com,dpsradio.com,1063.mobi,theonlinewatchmovies.com,mediafire.com,streamdb3web.securenetsystems.net,makeavoice.com,paper.li,ustream.tv,gifini.com,kutthroatrecordz.com,i.imgur.com,dailymotion.com,listen.radionomy.com,stream.ngaradio.org,blogtalkradio.com,cotnradio.com,top-collections.try-before-you-buy.com,rootdownfm.com,twitch.tv,top-songs.try-before-you-buy.com,datpiff.com,stationportal.com,ff2-us.funplusgame.com,newzcard.com,imabigfanof.criminalcasegame.com,imgur.com,streamdb4web.securenetsystems.net,hiphopencounter.com,monstermmorpg.com,streamlicensing.com,azmovielist.net,streamdb3.securenetsystems.net,streamdb5web.securenetsystems.net,player.radioloyalty.com,top-movies.try-before-you-buy.com,live365.com,luufy.com,trailer7.com,fileparadox.com,twitpic.com"} 
tag.movies "Link Shortening" { links.url any "mvp.to,rss2twi.com,cur.lv,po.st,tiny.cc,bitly.com,fw.to,bit.ly,smarturl.it,wp.me,linkis.com,blogtrottr.com,adf.ly,gmodules.com,ziddu.com,ht.ly,zurl.ir,api.twitter.com,getm.pt,ow.ly,tinyurl.com,goo.gl"} 
tag.movies "Link Publishing Service" { links.url any "readfulapp.com,feeds.feedburner.com,dlvr.it,snsanalytics.com,weeder.org,publisher.vegmomos.com,ift.tt"} 
tag.movies "Music Streaming" { links.url any "kisselpaso.com,somafm.com,radionomy.com,mixcloud.com,soundcloud.com,last.fm,play.anghami.com,play.spotify.com,tunein.com,urbanradio.leanplayer.com"} 
tag.movies "Shopping" { links.url any "epicmobonline.com,ebay.co.uk,ebay.com,gekoo.co,amzn.to,deals.ebay.com,ebay.to,itunes.apple.com,etsy.com,amazon.co.uk,play.google.com,listia.com,beatport.com,500px.com,amightygirl.com,amazon.com,sunfrogshirts.com,search.itunes.apple.com,theatermania.com,ebay.ca,paypal.com,ebsp.co.uk"} 
tag.movies "Broadcasters" { links.url any "nbcnews.com,edition.cnn.com,msnbc.com,hlntv.com,money.cnn.com,cnbc.com,tvline.com,mtv.co.uk,uk.eonline.com,nbc.com,mtv.com,bbcamerica.com,www.today.com,itv.com,cbsnews.com,pbs.org,abc.net.au,abcnews.go.com,vh1.com,gmanetwork.com,cbs.com,bbc.co.uk,bbc.com,aljazeera.com"} 
tag.movies "News" { links.url any "telegraph.co.uk,washingtontimes.com,usatoday.com,nytimes.com,independent.co.uk,latimes.com,wsj.com,nydailynews.com,time.com,dailymail.co.uk,bostonglobe.com,theguardian.com,washingtonpost.com"} 
tag.movies "Online News / Blogs" { links.url any "di.sn,onceuponatimeabc.wikia.com,businessinsider.com,huffingtonpost.com,thenerve.us,examiner.com,reuters.com,comingsoon.net,mediamatters.org,capcy.com,blog.vh1.com,canewse.com,blogs.indiewire.com,theroot.com,planetnewsworld.wordpress.com,upworthy.com,news-junkies.com,salon.com,cbc.ca,news24.com,sciencealert.com,aquavibes.blogspot.co.uk,ora.tv,gizmodo.com,buzzfeed.com,gawker.com,uk.businessinsider.com,hellogiggles.com,newslocker.com,essence.com,zenexp.com,moviepilot.com,faithtap.com,loveactual.net,inthenews.thetalkingsloth.com,nblo.gs,dorovibes.com,ihorror.com,thefreethoughtproject.com,flyheight.com,io9.com,vox.com,news.sky.com,uniladmag.com,dailygeeky.com,news.yahoo.com,uk.ign.com,popwatch.ew.com,business2community.com,hollywoodelevator.com,atlantablackstar.com,aol.it,bbcrepeater.tumblr.com,thedailybeast.com,dailykos.com,rt.com,newsbusters.org,twnewsjp.com,mashable.com,news.google.com,globalgrind.com,littlethings.com,blogs.disney.com,techcrunch.com,westernjournalism.com,thebingbing.com,on.cc.com,dailydot.com,npr.org,viralphotos.kathelpedmegain.com,usnewse.com,breitbart.com,uknewse.com,superindykingsblog.com,labnol.asia,needtocheck.com,rawstory.com,theblaze.com,metalinjection.net,empirenews.net,dragplus.com,theverge.com,boredpanda.com,ww.itimes.com,mediaite.com,javaskop.com,omgfacts.com,addictinginfo.org,huffingtonpost.co.uk"} 
tag.movies "Hollywood/Movie/Celeb/Gossip" { links.url any "etonline.com,vanityfair.com,zimbio.com,whosay.com,twilight-gossip.com,dramafever.com,hollywoodlife.com,bet.us,gossipgawker.com,tmz.com,celebritybabies.people.com,sugarscape.com,jezebel.com,koreaboo.com,uproxx.com,tryhairstyle.com,popsugar.com,thewrap.com,gossipdawg.com,wet.pt,ikwiz.com,perezhilton.com,justjared.com,dailyexo.tumblr.com,usmagazine.com,ohkpop.com,mirror.co.uk,hollywoodreporter.com,allkpop.com,smb2stfinitesubs1.blogspot.co.uk,kpopstarz.com,vulture.com,zirkovi.com,people.com"} 
tag.movies "Fashion" { links.url any "instyle.com,luckymag.com,papermag.com,vogue.fr,esquire.com,gq.com,on.allure.com,anothermag.com,elle.co.jp,on.elle.com,cosmopolitan.co.uk,seventeen.com,oceandrive.com,dazeddigital.com,vogue.co.uk,vmagazine.com,wmagazine.com,vogue.it,tumblr.instyle.com,harpersbazaar.com,complex.com,marieclaire.com,video.vogue.com,media.wwd.com,tv.esquire.com,marieclaire.co.uk,madamenoire.com,vogue.com,elle.com,video.gq.com,flare.com,origin-elle.com,refinery29.com,vogue.es,elle.fr,en.vogue.fr,style.com,hintmag.com,glamourmagazine.co.uk,ellecanada.com,wwd.com,lucire.com,nylonguysmag.com,teenvogue.com,uk.complex.com,fashionallure.com,glamour.com,vogue.com.au,ca.complex.com,swinglifestyle.com,cosmopolitan.com,allure.com,gq-magazine.co.uk,dailyelle.fr,news.instyle.com"} 
tag.movies "Sports" { links.url any "sbnation.com,nfl.com,nba.com,espn.go.com,m.espn.go.com,skysports.com,ninerfans.com,scores.espn.go.com,bleacherreport.com,betprepared.com,wwe.com,cbssports.com"} 
tag.movies "Fundraising/Campaign/Community/Religion" { links.url any "jackandjackthemovie.vhx.tv,change.org,conservativetribune.com,kickstarter.com,secure.avaaz.org,mojahedin.com,tpnn.com,fahlo.me,ndtv.com,violencefreefamilies.org.au,patreon.com,mankind.org.uk,puritandownloads.com,mediahoarders.com,indiegogo.com,hotpeachpages.net,conservativereport.org,gofundme.com,thehotline.org,blog.theveteranssite.com"} 
tag.movies "Anime/Manga/Comic/Fun" { links.url any "collective-evolution.com,i.ntere.st,action.18mr.org,goodanime.net,brainfall.com,quizfreak.com,myanimelist.net,comicbookmovie.com,animenewsnetwork.com,avclub.com,comicbook.com"} 
tag.movies "Review" { links.url any "openculture.com,slate.com,forbes.com,shazam.com,slashfilm.com,spoilertv.com,newyorker.com,dorkly.com,polygon.com,wired.com,playbill.com,insidemovies.ew.com,screenrant.com,fandango.com,mentalfloss.com,insidetv.ew.com,reason.com,cinemablend.com,engadget.com,radiotimes.com,cnet.com,imdb.com,billboard.com,denofgeek.com,rollingstone.com,deadline.com,variety.com"}

You can see the final classifier in the library.
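With lists this long it is easy to place the same domain under two classes by accident. A link to such a domain would presumably be counted under both labels, which may or may not be what you intend. The sketch below is an illustrative Python check, run locally before you submit the classifier; the function name and the sample data are hypothetical.

```python
from collections import defaultdict


def find_shared_domains(classes: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each domain that is listed under more than one class to those classes."""
    seen = defaultdict(list)
    for label, domains in classes.items():
        # set() so a domain repeated inside one class is only counted once.
        for domain in set(domains):
            seen[domain].append(label)
    return {domain: labels for domain, labels in seen.items() if len(labels) > 1}


if __name__ == "__main__":
    # Hypothetical sample, not the real classifier lists.
    sample = {
        "Video": ["youtube.com", "vimeo.com"],
        "Music Streaming": ["soundcloud.com", "youtube.com"],
    }
    print(find_shared_domains(sample))
```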

Applying the Classifier to a Recording

It's easy to apply a set of tags to an interaction filter and create a recording.

For this example let's say we have an interaction filter already defined with the following CSDL:

return { 
    fb.topics.category in "Movie,Film,TV Network,TV Programme,TV Channel,TV Show,TV/Movie Award" 
    OR fb.parent.topics.category in "Movie,Film,TV Network,TV Programme,TV Channel,TV Show,TV/Movie Award" 
}

First, make sure your filter conditions are wrapped in a return statement. A return statement is mandatory when a filter includes tags.

Then you can include your tags, either by adding them before the return statement:

tag.movies "Video" { links.url any "wazzalau.com,theladbible.com,youtube.com,vimeo.com,youtu.be,musiclove.fm,buzcast.com,hulu.com,glovishare.com,netflix.com"} 
tag.movies "Social Networks" { links.url any "path.com,en.m.wikipedia.org,m.yahoo.com,docs.google.com,fb.me,en.wikipedia.org,bing.com,search.yahoo.com,plus.google.com,rebelmouse.com,twitter.com,t.co,ask.fm,tvshowtime.com,reddit.com,pic.twitter.com,swarmapp.com,shots.com,flickr.com,pinterest.com,beta.twitmusic.com,uk.yahoo.com,instagram.com,tumblr.com,vine.co,twimg.com,google.co.uk,weheartit.com"} 
tag.movies "File Sharing" { links.url any "promodj.com,dpsradio.com,1063.mobi,theonlinewatchmovies.com,mediafire.com,streamdb3web.securenetsystems.net,makeavoice.com,paper.li,ustream.tv,gifini.com,kutthroatrecordz.com,i.imgur.com,dailymotion.com,listen.radionomy.com,stream.ngaradio.org,blogtalkradio.com,cotnradio.com,top-collections.try-before-you-buy.com,rootdownfm.com,twitch.tv,top-songs.try-before-you-buy.com,datpiff.com,stationportal.com,ff2-us.funplusgame.com,newzcard.com,imabigfanof.criminalcasegame.com,imgur.com,streamdb4web.securenetsystems.net,hiphopencounter.com,monstermmorpg.com,streamlicensing.com,azmovielist.net,streamdb3.securenetsystems.net,streamdb5web.securenetsystems.net,player.radioloyalty.com,top-movies.try-before-you-buy.com,live365.com,luufy.com,trailer7.com,fileparadox.com,twitpic.com"} 
tag.movies "Link Shortening" { links.url any "mvp.to,rss2twi.com,cur.lv,po.st,tiny.cc,bitly.com,fw.to,bit.ly,smarturl.it,wp.me,linkis.com,blogtrottr.com,adf.ly,gmodules.com,ziddu.com,ht.ly,zurl.ir,api.twitter.com,getm.pt,ow.ly,tinyurl.com,goo.gl"} 
tag.movies "Link Publishing Service" { links.url any "readfulapp.com,feeds.feedburner.com,dlvr.it,snsanalytics.com,weeder.org,publisher.vegmomos.com,ift.tt"} 
tag.movies "Music Streaming" { links.url any "kisselpaso.com,somafm.com,radionomy.com,mixcloud.com,soundcloud.com,last.fm,play.anghami.com,play.spotify.com,tunein.com,urbanradio.leanplayer.com"} 
tag.movies "Shopping" { links.url any "epicmobonline.com,ebay.co.uk,ebay.com,gekoo.co,amzn.to,deals.ebay.com,ebay.to,itunes.apple.com,etsy.com,amazon.co.uk,play.google.com,listia.com,beatport.com,500px.com,amightygirl.com,amazon.com,sunfrogshirts.com,search.itunes.apple.com,theatermania.com,ebay.ca,paypal.com,ebsp.co.uk"} 
tag.movies "Broadcasters" { links.url any "nbcnews.com,edition.cnn.com,msnbc.com,hlntv.com,money.cnn.com,cnbc.com,tvline.com,mtv.co.uk,uk.eonline.com,nbc.com,mtv.com,bbcamerica.com,www.today.com,itv.com,cbsnews.com,pbs.org,abc.net.au,abcnews.go.com,vh1.com,gmanetwork.com,cbs.com,bbc.co.uk,bbc.com,aljazeera.com"} 
tag.movies "News" { links.url any "telegraph.co.uk,washingtontimes.com,usatoday.com,nytimes.com,independent.co.uk,latimes.com,wsj.com,nydailynews.com,time.com,dailymail.co.uk,bostonglobe.com,theguardian.com,washingtonpost.com"} 
tag.movies "Online News / Blogs" { links.url any "di.sn,onceuponatimeabc.wikia.com,businessinsider.com,huffingtonpost.com,thenerve.us,examiner.com,reuters.com,comingsoon.net,mediamatters.org,capcy.com,blog.vh1.com,canewse.com,blogs.indiewire.com,theroot.com,planetnewsworld.wordpress.com,upworthy.com,news-junkies.com,salon.com,cbc.ca,news24.com,sciencealert.com,aquavibes.blogspot.co.uk,ora.tv,gizmodo.com,buzzfeed.com,gawker.com,uk.businessinsider.com,hellogiggles.com,newslocker.com,essence.com,zenexp.com,moviepilot.com,faithtap.com,loveactual.net,inthenews.thetalkingsloth.com,nblo.gs,dorovibes.com,ihorror.com,thefreethoughtproject.com,flyheight.com,io9.com,vox.com,news.sky.com,uniladmag.com,dailygeeky.com,news.yahoo.com,uk.ign.com,popwatch.ew.com,business2community.com,hollywoodelevator.com,atlantablackstar.com,aol.it,bbcrepeater.tumblr.com,thedailybeast.com,dailykos.com,rt.com,newsbusters.org,twnewsjp.com,mashable.com,news.google.com,globalgrind.com,littlethings.com,blogs.disney.com,techcrunch.com,westernjournalism.com,thebingbing.com,on.cc.com,dailydot.com,npr.org,viralphotos.kathelpedmegain.com,usnewse.com,breitbart.com,uknewse.com,superindykingsblog.com,labnol.asia,needtocheck.com,rawstory.com,theblaze.com,metalinjection.net,empirenews.net,dragplus.com,theverge.com,boredpanda.com,ww.itimes.com,mediaite.com,javaskop.com,omgfacts.com,addictinginfo.org,huffingtonpost.co.uk"} 
tag.movies "Hollywood/Movie/Celeb/Gossip" { links.url any "etonline.com,vanityfair.com,zimbio.com,whosay.com,twilight-gossip.com,dramafever.com,hollywoodlife.com,bet.us,gossipgawker.com,tmz.com,celebritybabies.people.com,sugarscape.com,jezebel.com,koreaboo.com,uproxx.com,tryhairstyle.com,popsugar.com,thewrap.com,gossipdawg.com,wet.pt,ikwiz.com,perezhilton.com,justjared.com,dailyexo.tumblr.com,usmagazine.com,ohkpop.com,mirror.co.uk,hollywoodreporter.com,allkpop.com,smb2stfinitesubs1.blogspot.co.uk,kpopstarz.com,vulture.com,zirkovi.com,people.com"} 
tag.movies "Fashion" { links.url any "instyle.com,luckymag.com,papermag.com,vogue.fr,esquire.com,gq.com,on.allure.com,anothermag.com,elle.co.jp,on.elle.com,cosmopolitan.co.uk,seventeen.com,oceandrive.com,dazeddigital.com,vogue.co.uk,vmagazine.com,wmagazine.com,vogue.it,tumblr.instyle.com,harpersbazaar.com,complex.com,marieclaire.com,video.vogue.com,media.wwd.com,tv.esquire.com,marieclaire.co.uk,madamenoire.com,vogue.com,elle.com,video.gq.com,flare.com,origin-elle.com,refinery29.com,vogue.es,elle.fr,en.vogue.fr,style.com,hintmag.com,glamourmagazine.co.uk,ellecanada.com,wwd.com,lucire.com,nylonguysmag.com,teenvogue.com,uk.complex.com,fashionallure.com,glamour.com,vogue.com.au,ca.complex.com,swinglifestyle.com,cosmopolitan.com,allure.com,gq-magazine.co.uk,dailyelle.fr,news.instyle.com"} 
tag.movies "Sports" { links.url any "sbnation.com,nfl.com,nba.com,espn.go.com,m.espn.go.com,skysports.com,ninerfans.com,scores.espn.go.com,bleacherreport.com,betprepared.com,wwe.com,cbssports.com"} 
tag.movies "Fundraising/Campaign/Community/Religion" { links.url any "jackandjackthemovie.vhx.tv,change.org,conservativetribune.com,kickstarter.com,secure.avaaz.org,mojahedin.com,tpnn.com,fahlo.me,ndtv.com,violencefreefamilies.org.au,patreon.com,mankind.org.uk,puritandownloads.com,mediahoarders.com,indiegogo.com,hotpeachpages.net,conservativereport.org,gofundme.com,thehotline.org,blog.theveteranssite.com"} 
tag.movies "Anime/Manga/Comic/Fun" { links.url any "collective-evolution.com,i.ntere.st,action.18mr.org,goodanime.net,brainfall.com,quizfreak.com,myanimelist.net,comicbookmovie.com,animenewsnetwork.com,avclub.com,comicbook.com"} 
tag.movies "Review" { links.url any "openculture.com,slate.com,forbes.com,shazam.com,slashfilm.com,spoilertv.com,newyorker.com,dorkly.com,polygon.com,wired.com,playbill.com,insidemovies.ew.com,screenrant.com,fandango.com,mentalfloss.com,insidetv.ew.com,reason.com,cinemablend.com,engadget.com,radiotimes.com,cnet.com,imdb.com,billboard.com,denofgeek.com,rollingstone.com,deadline.com,variety.com"} 

return { 
    fb.topics.category in "Movie,Film,TV Network,TV Programme,TV Channel,TV Show,TV/Movie Award" 
    OR fb.parent.topics.category in "Movie,Film,TV Network,TV Programme,TV Channel,TV Show,TV/Movie Award" 
}

Or, to make your code more maintainable, we recommend saving your tag definitions as a separate filter and then including them in your other filters using the tags keyword.
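If you assemble the inline version programmatically, the only structural requirements described above are that the tag rules come first and that the conditions sit inside a return block. The following Python sketch illustrates that layout; with_tags is a hypothetical helper that only builds a string, and the conditions in the usage example are shortened for readability.

```python
def with_tags(tag_rules: list[str], filter_conditions: str) -> str:
    """Place tag rules ahead of a return block that wraps the filter conditions."""
    if not tag_rules:
        raise ValueError("at least one tag rule is required")
    body = filter_conditions.strip()
    if body.lower().startswith("return"):
        # The wrapper is added here, so the caller should pass bare conditions.
        raise ValueError("pass the bare conditions, without the return wrapper")
    return "\n".join(tag_rules) + "\n\nreturn {\n    " + body + "\n}"


if __name__ == "__main__":
    rules = ['tag.movies "Video" { links.url any "youtube.com,youtu.be" }']
    print(with_tags(rules, 'fb.topics.category in "Movie,Film"'))
```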

Analyzing Classified Data

Once you've recorded data based upon a classifier you can make use of the tags in your analysis queries.

For instance you can ask for a frequency distribution of the classes across your index using the following analysis query:

{
    "analysis_type": "freqDist",
    "parameters": {
        "target": "interaction.tag_tree.movies",
        "threshold": 5
    }
}
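If you build queries in code rather than by hand, the same parameters can be produced as a dictionary and serialized to JSON. This is a minimal Python sketch that reproduces the object above; the function name freq_dist_parameters is hypothetical, and how the result is submitted to the platform is outside its scope.

```python
import json


def freq_dist_parameters(target: str, threshold: int) -> dict:
    """Build the frequency distribution parameters shown in this example."""
    if threshold < 1:
        raise ValueError("threshold must be a positive integer")
    return {
        "analysis_type": "freqDist",
        "parameters": {
            "target": target,
            "threshold": threshold,
        },
    }


if __name__ == "__main__":
    query = freq_dist_parameters("interaction.tag_tree.movies", 5)
    print(json.dumps(query, indent=4))
```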

Or you could filter to just conversations relating to videos by specifying the following as your filter parameter:

interaction.tag_tree.movies == "Video"

Of course you can then add further filter conditions and dig down by demographic or other tags you choose to add to the data.
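As a rough illustration of combining conditions, the Python sketch below joins the tag condition with any further conditions using AND. The helper name tag_filter is hypothetical, and the fb.author.gender condition in the usage example is an assumption chosen for illustration; substitute whichever targets are available in your index.

```python
def tag_filter(namespace: str, label: str, extra_conditions: tuple[str, ...] = ()) -> str:
    """Return a filter string matching one tag label plus optional extra conditions."""
    parts = [f'interaction.tag_tree.{namespace} == "{label}"']
    # Parenthesize each extra condition so any OR inside it stays grouped.
    parts.extend(f"({condition})" for condition in extra_conditions)
    return " AND ".join(parts)


if __name__ == "__main__":
    # The demographic condition below is an assumed example, not from this guide.
    print(tag_filter("movies", "Video", ('fb.author.gender == "female"',)))
```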