Can I apply interaction.sample to only a portion of a stream?

argusinsights's picture

I want to get a sample of the retweets that a stream is pulling in, without necessarily limiting the stream as a whole. Is there a variation of interaction.sample that I can use, or an obscure argument that exists?

Comments

Jason's picture

I assume from this you would still like to receive all regular Tweets, but just receive a sample of the retweets sent. There are a couple of ways you could achieve this:

1. Receive all Tweets, but only receive retweets which have been retweeted 'x' times

So, this CSDL statement will receive any normal Tweets, and any Tweets which have been retweeted 10 or 50 times:

twitter.text contains "coffee" or
( twitter.retweet.text contains "coffee" and
  twitter.retweet.count in "10, 50" )

 

2. Receive all Tweets, and receive 10% of all retweets

twitter.text contains "coffee" or
( twitter.retweet.text contains "coffee" and
  interaction.sample <= 10 )

argusinsights's picture

That sounds great, and number 2 is what I am looking for. I have a follow-up question though. To take your example above, what happens if the stream itself is being sampled as a whole?

twitter.text contains "coffee" or
( twitter.retweet.text contains "coffee" and
interaction.sample <= 10 )
and interaction.sample < 50

Jason's picture

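First off, I would recommend wrapping some brackets around the first part of the statement, so that the outer interaction.sample condition applies to the whole filter rather than only to the retweet part, like so: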

( twitter.text contains "coffee" or
  ( twitter.retweet.text contains "coffee" and
    interaction.sample <= 10 ) )
and interaction.sample < 50

Each interaction is randomly assigned an interaction.sample value, which is a floating-point number from 0 to 100. If we say "interaction.sample < 50", we will receive any interactions which have been assigned an interaction.sample value of less than 50.

So, using the CSDL above, we will receive any interactions which match the filter and have an interaction.sample value of less than 50. To match on the retweet part of the stream, the interaction.sample value will also need to be 10 or less.

Essentially, you just need to remember that interaction.sample is not worked out as a percentage of anything. If you nest "interaction.sample <= 10" inside "interaction.sample < 50", you will still receive the same number of interactions for the "interaction.sample <= 10" part as you would if you had not nested it inside another interaction.sample condition. You will still be receiving any interactions with an interaction.sample value of 10 or less.
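
To make the nesting behaviour concrete, here is a rough Python simulation of the bracketed filter. It is only a sketch, not DataSift code: the dictionary fields is_retweet and sample are hypothetical stand-ins, every simulated interaction is assumed to mention "coffee", and the sample value is assumed to be drawn uniformly from 0 to 100. The key point it models is that each interaction carries a single sample value, which both conditions test.

```python
import random


def matches(interaction):
    """Python stand-in for the bracketed CSDL filter (hypothetical fields)."""
    sample = interaction["sample"]  # one value per interaction, reused by both tests
    is_tweet = not interaction["is_retweet"]
    # ( tweet or ( retweet and interaction.sample <= 10 ) )
    inner = is_tweet or (interaction["is_retweet"] and sample <= 10)
    # ... and interaction.sample < 50
    return inner and sample < 50


def simulate(n=100_000, seed=0):
    """Return the fraction of tweets and retweets that pass the filter."""
    rng = random.Random(seed)
    counts = {"tweet": [0, 0], "retweet": [0, 0]}  # [seen, kept]
    for _ in range(n):
        is_retweet = rng.random() < 0.5  # assumed 50/50 mix, purely illustrative
        interaction = {"is_retweet": is_retweet, "sample": rng.uniform(0, 100)}
        kind = "retweet" if is_retweet else "tweet"
        counts[kind][0] += 1
        counts[kind][1] += matches(interaction)
    return {kind: kept / seen for kind, (seen, kept) in counts.items()}


rates = simulate()
print(rates)
```

Under these assumptions, roughly half of the regular Tweets and roughly a tenth of the retweets pass. The retweets do not drop to 5% (10% of 50%), because any retweet with a sample value of 10 or less already satisfies "interaction.sample < 50".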