1 Purpose

To extract and visualise tweets and re-tweets of #BluePlanet2 for October-November, 2017.

Borrowing extensively from http://thinktostart.com/twitter-authentification-with-r/

We used the Twitter search API to extract ‘all’ tweets with the #BluePlanet2 hashtag. As the Twitter search API documentation (sort of) makes clear, this may not be all such tweets but merely the most relevant (whatever that means) from within a sample (whatever that means).

“It allows queries against the indices of recent or popular Tweets and behaves similarly to, but not exactly like the Search feature available in Twitter mobile or web clients, such as Twitter.com search. The Twitter Search API searches against a sampling of recent Tweets published in the past 7 days.” https://dev.twitter.com/rest/public/search, Accessed 12/5/2017

It is therefore possible that not quite all tweets have been extracted, although it seems likely that we have captured most human tweeting, which was our main intention. Future work should instead use the Twitter streaming API.

2 Load Data

You can either collect the data from scratch (which takes a while) or load the pre-collected data (but then you have to remember to re-run the collection :-)
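A minimal sketch of this collect-or-load choice, assuming the collected tweets are cached as an RDS file. The helper names `loadOrCollect` and `collectTweets`, the cache file name, and the `n` and `since` values are hypothetical and not taken from the original code; `searchTwitter()` and `twListToDF()` are twitteR functions.

```r
# Load cached tweets if the cache file exists, otherwise collect and cache them.
# 'collectFn' is any zero-argument function returning a data.frame of tweets.
loadOrCollect <- function(cacheFile, collectFn, refresh = FALSE) {
  if (!refresh && file.exists(cacheFile)) {
    return(readRDS(cacheFile))
  }
  tweetsDF <- collectFn()
  saveRDS(tweetsDF, cacheFile)
  tweetsDF
}

# Hypothetical collector: needs the twitteR package and prior authentication
# via twitteR::setup_twitter_oauth(). 'n' and 'since' are illustrative values.
collectTweets <- function() {
  tweets <- twitteR::searchTwitter("#BluePlanet2", n = 100000, since = "2017-10-28")
  twitteR::twListToDF(tweets)
}

# Usage (not run here because it needs Twitter credentials):
# tweetsDF <- loadOrCollect("bluePlanet2_tweets.rds", collectTweets)
```

Setting `refresh = TRUE` forces a new collection, which is one way to avoid forgetting to update the cached data.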

This produces a data table with the following variables (after some processing):

##  [1] "text"             "favorited"        "favoriteCount"   
##  [4] "replyToSN"        "created"          "truncated"       
##  [7] "replyToSID"       "id"               "replyToUID"      
## [10] "statusSource"     "screenName"       "retweetCount"    
## [13] "isRetweet"        "retweeted"        "longitude"       
## [16] "latitude"         "createdLocal"     "obsDateTimeMins" 
## [19] "obsDateTimeHours" "obsDateTime5m"    "obsDateTime10m"  
## [22] "obsDateTime15m"   "obsDate"          "obsHourMin"      
## [25] "isRetweetLab"
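The derived time variables listed above could be built along the following lines. This is a sketch under assumptions, not the original processing code: the function name `addTimeBins`, the default time zone `"CET"` and the label strings are hypothetical, and only some of the listed columns are shown. It uses `with_tz()` and `floor_date()` from lubridate.

```r
library(data.table)
library(lubridate)

# Add local-time and time-bin columns to a data.table of tweets.
# Expects a POSIXct column 'created' and a logical column 'isRetweet'.
addTimeBins <- function(dt, localTz = "CET") {
  dt <- copy(dt)  # leave the caller's table untouched
  dt[, createdLocal := with_tz(created, tzone = localTz)]
  dt[, obsDateTime5m := floor_date(createdLocal, "5 minutes")]
  dt[, obsDateTime15m := floor_date(createdLocal, "15 minutes")]
  dt[, obsDateTimeHours := floor_date(createdLocal, "hour")]
  dt[, obsDate := as.Date(createdLocal, tz = localTz)]
  # Label strings are an assumption; the original labels are not shown
  dt[, isRetweetLab := ifelse(isRetweet, "Re-tweet", "Tweet")]
  dt[]
}
```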

The table has 28,160 tweets (and 30,396 re-tweets) from 31,550 tweeters between 2017-10-28 00:00:16 and 2017-10-30 08:49:45 (Central European Time).
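Summary figures of this kind can be computed directly from the table. The sketch below assumes that 'tweeters' means unique `screenName` values and that the time range is taken from `createdLocal`; the function name `summariseTweets` is hypothetical.

```r
library(data.table)

# Headline counts for a data.table of tweets with columns
# 'isRetweet' (logical), 'screenName' and 'createdLocal' (POSIXct).
summariseTweets <- function(dt) {
  list(
    nTweets   = dt[isRetweet == FALSE, .N],
    nRetweets = dt[isRetweet == TRUE, .N],
    nTweeters = uniqueN(dt$screenName),
    first     = min(dt$createdLocal),
    last      = max(dt$createdLocal)
  )
}
```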

3 Analysis

3.1 Tweets and Tweeters over time

All (re)tweets containing #BluePlanet2 2017-10-28 to 2017-10-30

Although there had been some low-level tweeting about #BluePlanet2 on Saturday, it exploded on Sunday evening, as you’d expect.

In the next sections we look day by day.
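A day-by-day view can be sketched by filtering on `obsDate` and counting (re)tweets per 5-minute bin. The function names `countPerBin` and `plotDay` are hypothetical, and the choice of a line plot is an assumption; `ggplotly()` from plotly is what makes a ggplot2 graph zoomable.

```r
library(data.table)

# Count tweets and re-tweets per 5-minute bin for a single day.
# Expects columns 'obsDate', 'obsDateTime5m' and 'isRetweetLab'.
countPerBin <- function(dt, theDay) {
  dt[obsDate == as.Date(theDay),
     .(nTweets = .N),
     keyby = .(obsDateTime5m, isRetweetLab)]
}

# Draw the counts as an interactive line plot
# (needs the ggplot2 and plotly packages to be installed).
plotDay <- function(counts) {
  p <- ggplot2::ggplot(counts,
                       ggplot2::aes(x = obsDateTime5m, y = nTweets,
                                    colour = isRetweetLab)) +
    ggplot2::geom_line()
  plotly::ggplotly(p)
}

# Usage (not run): plotDay(countPerBin(tweetsDT, "2017-10-29"))
```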

3.1.1 B-Day -1: Saturday 28/10/2017

This plot is zoomable - try it!

All (re)tweets containing #BluePlanet2 Saturday 28th October 2017

3.1.2 B-Day: Sunday 29/10/2017 (First episode)

This plot is zoomable - try it!

All (re)tweets containing #BluePlanet2 Sunday 29th October 2017

3.1.3 B-Day +1: Monday 30/10/2017 (post-broadcast excitement)

This plot is zoomable - try it!

All (re)tweets containing #BluePlanet2 Monday 30th October 2017

It is interesting that tweeting picks up again around 08:30-09:00 as people tweet about the previous night’s viewing (and extensively re-tweet).

3.1.4 B-Day +2: Tuesday 31/10/2017 (more post-broadcast excitement)

If you see nothing, nothing happened yet :-)

This plot is zoomable - try it!

3.2 Location (lat/long)

We wanted to make a nice map but sadly most (re)tweets (58,349 of 58,556) have no lat/long set.

All logged lat/long values
latitude      longitude     nTweets
NA            NA              58349
18.2233       -66.4289           23
21.0285       105.8048           28
53.4196       -8.2406            65
51.5063       -0.1271            68
51.50711486   -0.12731805         1
53.7951864    -1.56100815         1
53.7448691    -1.9005455          1
53.7448736    -1.9003858          1
52.3731       4.8932              1
51.46575334   -0.37608365         2
54.59337297   -5.81751796         1
51.53175946   -0.05256008         1
53.7447977    -1.9004963          1
53.7447533    -1.9006219          1
51.46581737   -0.37607814         1
52.483        -1.89359            1
53.7449284    -1.9002738          1
53.7448107    -1.9003428          1
51.53202288   -0.05261762         1
51.46577441   -0.37605668         1
53.96097      -1.08595            1
-36.16666667  175.38333333        1
-37.0167      175.85              2
-33.8671      151.207             2
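A table like this can be produced by counting rows per lat/long pair. The sketch below uses data.table, where `NA` forms its own group; the function names `latLongCounts` and `shareGeocoded` are hypothetical.

```r
library(data.table)

# Number of (re)tweets per distinct latitude/longitude pair,
# including the group where both are missing (NA).
latLongCounts <- function(dt) {
  dt[, .(nTweets = .N), by = .(latitude, longitude)]
}

# Share of (re)tweets that carry usable coordinates.
shareGeocoded <- function(dt) {
  dt[, mean(!is.na(latitude) & !is.na(longitude))]
}
```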

One day we’ll draw a map.

NB: twitteR no longer returns twitter’s best guess at ‘location’ :-(

3.3 Screen name

Next we’ll try by screen name.

Here’s a really bad visualisation of all tweeters tweeting over time. Each row of pixels is a tweeter (the names are illegible) and a green dot indicates a few tweets in the 5 minute period while a red dot indicates a lot of tweets.

N tweets per 5 minutes by screen name

Yeah, that worked well.

So let’s re-do that for the top 50 tweeters so we can see their tweetStreaks…

Top tweeters:

Top 15 tweeters (all days)
screenName nTweets
MatthewJHorn 216
worththelicence 113
aimasters 103
GrainneBlair 73
ImAnitaSharma 73
trendinaliaGB 68
AnnSmith63 67
EvilAwkeye 67
trendinaliaIE 65
bunnygail1977 64
SerendipitySays 62
_SeaGrassRoots 61
LeeFergusson 58
CRHClover 56
BBCEarth 54
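A ranking like the one above can be sketched by counting rows per `screenName` and sorting in descending order. The function name `topTweeters` is hypothetical, and whether the original counts include re-tweets is not stated, so this version counts every row.

```r
library(data.table)

# The n most prolific tweeters, with their (re)tweet counts.
topTweeters <- function(dt, n = 15) {
  counts <- dt[, .(nTweets = .N), by = screenName]
  head(counts[order(-nTweets)], n)
}
```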

And their tweetStreaks…

N tweets per 5 minutes by screen name (top 50, reverse alphabetical)

Spot the twitterBots…
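The data behind a tweetStreak plot can be sketched by restricting the table to the most prolific tweeters and counting per 5-minute bin. The function names `tweetStreaks` and `plotStreaks` are hypothetical, and the use of tiles with a green-to-red fill is an assumption based on the description of the plots, not the original plotting code.

```r
library(data.table)

# Tweets per 5-minute bin for the n most prolific tweeters.
# Expects columns 'screenName' and 'obsDateTime5m'.
tweetStreaks <- function(dt, n = 50) {
  ranked <- dt[, .N, by = screenName][order(-N)]
  top <- head(ranked, n)$screenName
  dt[screenName %in% top,
     .(nTweets = .N),
     keyby = .(screenName, obsDateTime5m)]
}

# One row per tweeter, one tile per 5-minute bin: green = few, red = many
# (needs the ggplot2 package to be installed).
plotStreaks <- function(streaks) {
  ggplot2::ggplot(streaks,
                  ggplot2::aes(x = obsDateTime5m, y = screenName,
                               fill = nTweets)) +
    ggplot2::geom_tile() +
    ggplot2::scale_fill_gradient(low = "green", high = "red")
}
```

Accounts that show long, evenly spaced runs of tiles are the obvious candidates for automated tweeting.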

4 About

Analysis completed in 219.767 seconds ( 3.66 minutes) using knitr in RStudio with R version 3.4.0 (2017-04-21) running on x86_64-apple-darwin15.6.0.

A special mention must go to twitteR (Gentry, n.d.) for the twitter API interaction functions and lubridate (Grolemund and Wickham 2011) which allows time-zone manipulation without too many tears.

Other R packages used:

  • base R - for the basics (R Core Team 2016)
  • data.table - for fast (big) data handling (Dowle et al. 2015)
  • readr - for nice data loading (Wickham, Hester, and Francois 2016)
  • ggplot2 - for slick graphs (Wickham 2009)
  • plotly - fancy, zoomable slick graphs (Sievert et al. 2016)
  • twitteR - twitter API search (Gentry, n.d.)
  • knitr - to create this document (Xie 2016)

References

Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.

Gentry, Jeff. n.d. TwitteR: R Based Twitter Client. http://lists.hexdump.org/listinfo.cgi/twitter-users-hexdump.org.

Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.

R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Sievert, Carson, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, and Pedro Despouy. 2016. Plotly: Create Interactive Web Graphics via ’Plotly.js’. https://CRAN.R-project.org/package=plotly.

Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.

Wickham, Hadley, Jim Hester, and Romain Francois. 2016. Readr: Read Tabular Data. https://CRAN.R-project.org/package=readr.

Xie, Yihui. 2016. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.