The language a Tweet was written in, as identified by Twitter's machine language detection algorithms. The values are valid BCP 47 language identifiers, and may represent any of the languages listed on Twitter's advanced search page, or "und" if no language could be detected.
DataSift already has a language detection mechanism, of course, offered by our Language augmentation. But remember that there is a third way to find out which language a user prefers: by examining the language the author selected on their Settings page on Twitter. You can filter against this using twitter.user.lang, twitter.retweet.user.lang, or twitter.retweeted.user.lang. Take care, though, because users select their language from a drop-down list. They might make a mistake, select a language that is not their own, or Tweet in more than one language. The bottom line is that there might be a discrepancy between the language of the Tweet and the main language the user specified in their profile.
- Filter for Tweets written in a language other than English:
twitter.lang != "en" and interaction.sample < 1 // Limit our example to 1% of the Twitter firehose
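- As a sketch of the discrepancy described above, filter for Tweets detected as Spanish whose authors set English as their profile language (the language codes "es" and "en" are illustrative choices only):
twitter.lang == "es" and twitter.user.lang == "en" and interaction.sample < 1 // Limit our example to 1% of the Twitter firehose
- Similarly, a filter for Tweets where no language could be detected might look like this:
twitter.lang == "und" and interaction.sample < 1 // Limit our example to 1% of the Twitter firehose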
Target service: Twitter
Target object: Twitter: Tweet