@dataknut
To extract and visualise tweets and re-tweets of #BluePlanet2 for October-November, 2017.
Borrowing extensively from http://thinktostart.com/twitter-authentification-with-r/
We used the Twitter search API to extract ‘all’ tweets with the #BluePlanet2 hashtag. As the Twitter search API documentation (sort of) makes clear, this may not be all such tweets but merely the most relevant (whatever that means) from within a sample (whatever that means).
“It allows queries against the indices of recent or popular Tweets and behaves similarly to, but not exactly like the Search feature available in Twitter mobile or web clients, such as Twitter.com search. The Twitter Search API searches against a sampling of recent Tweets published in the past 7 days.” https://dev.twitter.com/rest/public/search, Accessed 12/5/2017
It is therefore possible that not quite all tweets have been extracted, although it seems likely that we have captured most human tweeting, which was our main intention. Future work should instead use the Twitter streaming API.
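As a rough illustration of the extraction step, here is a minimal sketch using functions from the twitteR package. The function name `collect_tweets`, its arguments and the default `n` are made up for this example, and you need your own API key and secret; it is not the code actually used for this report.

```r
# Sketch only: collect tweets for one hashtag with the twitteR package.
# consumer_key and consumer_secret are placeholders you must supply yourself.
collect_tweets <- function(hashtag, consumer_key, consumer_secret, n = 1000) {
  # With no access token given, twitteR uses browser based authentication.
  twitteR::setup_twitter_oauth(consumer_key, consumer_secret)
  # The Search API only searches a sample of recent tweets (past 7 days).
  tweets <- twitteR::searchTwitter(hashtag, n = n)
  # Convert the list of status objects into a data frame.
  twitteR::twListToDF(tweets)
}

# Not run here (needs credentials and network access):
# tweetsDF <- collect_tweets("#BluePlanet2", "<your key>", "<your secret>")
```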
You can either collect data from scratch (takes a while) or load the pre-collected data, in which case you have to remember to re-run the collection :-)
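The collect-or-load choice could be handled along these lines. This is a sketch under assumptions: `load_or_collect`, `cache_file` and `fake_collect` are hypothetical names, and the stand-in collector just returns a toy data frame so the example runs without Twitter access.

```r
# Sketch: reuse saved tweets if a cache file exists, otherwise collect and save.
# collect_fn is a hypothetical zero-argument function returning a data frame.
load_or_collect <- function(cache_file, collect_fn) {
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))  # fast path: pre-collected data
  }
  tweets <- collect_fn()         # slow path: query the API
  saveRDS(tweets, cache_file)    # keep a copy for the next run
  tweets
}

# Toy usage with a stand-in collector.
cache_file <- tempfile(fileext = ".rds")
fake_collect <- function() {
  data.frame(text = c("a", "b"), isRetweet = c(FALSE, TRUE))
}
first  <- load_or_collect(cache_file, fake_collect)  # collects, writes cache
second <- load_or_collect(cache_file, function() stop("should not be called"))
```

Deleting the cache file forces a fresh collection on the next run.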
This produces a data table with the following variables (after some processing):
## [1] "Using browser based authentication"
## [1] "text" "favorited" "favoriteCount"
## [4] "replyToSN" "created" "truncated"
## [7] "replyToSID" "id" "replyToUID"
## [10] "statusSource" "screenName" "retweetCount"
## [13] "isRetweet" "retweeted" "longitude"
## [16] "latitude" "createdLocal" "obsDateTimeMins"
## [19] "obsDateTimeHours" "obsDateTime5m" "obsDateTime10m"
## [22] "obsDateTime15m" "obsDate" "obsHourMin"
## [25] "isRetweetLab"
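The processing that creates the derived time variables is not shown, so the following is only one plausible way to build them with data.table and lubridate. The toy data, the "Europe/Paris" time zone and the "Tweet"/"Re-tweet" labels are assumptions made for the example.

```r
library(data.table)
library(lubridate)

# Toy stand-in for the data returned by the API (times in UTC).
dt <- data.table(
  created   = as.POSIXct(c("2017-10-29 20:07:42", "2017-10-29 20:16:05"),
                         tz = "UTC"),
  isRetweet = c(FALSE, TRUE)
)

# Assumption: "Europe/Paris" stands in for whichever local zone is wanted.
dt[, createdLocal := with_tz(created, tzone = "Europe/Paris")]

# Round each timestamp down to the start of its time bin.
dt[, obsDateTimeMins  := floor_date(createdLocal, "minute")]
dt[, obsDateTime5m    := floor_date(createdLocal, "5 minutes")]
dt[, obsDateTime10m   := floor_date(createdLocal, "10 minutes")]
dt[, obsDateTime15m   := floor_date(createdLocal, "15 minutes")]
dt[, obsDateTimeHours := floor_date(createdLocal, "hour")]

# Date and clock time as separate variables (guessed definitions).
dt[, obsDate    := as.Date(createdLocal, tz = "Europe/Paris")]
dt[, obsHourMin := format(createdLocal, "%H:%M")]

# Human-readable label for plots and tables.
dt[, isRetweetLab := ifelse(isRetweet, "Re-tweet", "Tweet")]
```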
The table has 28,160 tweets (and 30,396 re-tweets) from 31,550 tweeters between 2017-10-28 00:00:16 and 2017-10-30 08:49:45 (Central European Time).
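Summary numbers like these could be computed roughly as follows. The toy data and the result names (`nTweets`, `nRetweets`, `nTweeters`, `period`) are invented for the example.

```r
library(data.table)

# Toy data: four tweets from three tweeters.
dt <- data.table(
  screenName   = c("a", "b", "a", "c"),
  isRetweet    = c(FALSE, TRUE, TRUE, FALSE),
  createdLocal = as.POSIXct("2017-10-29 21:00:00", tz = "Europe/Paris") +
    c(0, 60, 120, 180)
)

nTweets   <- sum(!dt$isRetweet)       # original tweets
nRetweets <- sum(dt$isRetweet)        # re-tweets
nTweeters <- uniqueN(dt$screenName)   # distinct screen names
period    <- range(dt$createdLocal)   # first and last tweet time
```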
Although there had been some low level tweeting about #BluePlanet2 on Saturday, it exploded on Sunday evening as you’d expect.
In the next sections we look day by day.
This plot is zoomable - try it!
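A zoomable plot of this kind could be built by counting tweets per 5 minute bin with data.table, drawing the counts with ggplot2 and passing the result to plotly. The toy data, axis labels and object names below are assumptions for illustration only.

```r
library(data.table)
library(ggplot2)

# Toy data: one row per tweet, already binned to 5 minutes.
dt <- data.table(
  obsDateTime5m = as.POSIXct("2017-10-29 20:00:00", tz = "UTC") +
    c(0, 0, 300, 300, 300, 600),
  isRetweetLab  = c("Tweet", "Re-tweet", "Tweet", "Tweet",
                    "Re-tweet", "Re-tweet")
)

# Count tweets and re-tweets in each 5 minute bin.
counts <- dt[, .(nTweets = .N), by = .(obsDateTime5m, isRetweetLab)]

p <- ggplot(counts,
            aes(x = obsDateTime5m, y = nTweets, colour = isRetweetLab)) +
  geom_line() +
  geom_point() +
  labs(x = "Time (5 minute bins)", y = "Number of tweets", colour = NULL)

# ggplotly() turns the static plot into a zoomable one (if plotly is installed).
if (requireNamespace("plotly", quietly = TRUE)) {
  interactive_plot <- plotly::ggplotly(p)
}
```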
Interesting that it picks up again around 08:30-09:00 as people tweet about the night before’s viewing (& extensively re-tweet).
If you see nothing, nothing happened yet :-)
We wanted to make a nice map but sadly we see that most tweets have no lat/long set: 58,349 of the 58,556 in the table below.
latitude | longitude | nTweets |
---|---|---|
NA | NA | 58349 |
18.2233 | -66.4289 | 23 |
21.0285 | 105.8048 | 28 |
53.4196 | -8.2406 | 65 |
51.5063 | -0.1271 | 68 |
51.50711486 | -0.12731805 | 1 |
53.7951864 | -1.56100815 | 1 |
53.7448691 | -1.9005455 | 1 |
53.7448736 | -1.9003858 | 1 |
52.3731 | 4.8932 | 1 |
51.46575334 | -0.37608365 | 2 |
54.59337297 | -5.81751796 | 1 |
51.53175946 | -0.05256008 | 1 |
53.7447977 | -1.9004963 | 1 |
53.7447533 | -1.9006219 | 1 |
51.46581737 | -0.37607814 | 1 |
52.483 | -1.89359 | 1 |
53.7449284 | -1.9002738 | 1 |
53.7448107 | -1.9003428 | 1 |
51.53202288 | -0.05261762 | 1 |
51.46577441 | -0.37605668 | 1 |
53.96097 | -1.08595 | 1 |
-36.16666667 | 175.38333333 | 1 |
-37.0167 | 175.85 | 2 |
-33.8671 | 151.207 | 2 |
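A count table like the one above could be produced as follows. The toy data is invented, and the coordinates are stored as character strings here on the assumption that this is how they arrive from the API.

```r
library(data.table)

# Toy data: three tweets without coordinates, two with the same coordinates.
dt <- data.table(
  latitude  = c(NA, NA, "51.5063", "51.5063", NA),
  longitude = c(NA, NA, "-0.1271", "-0.1271", NA)
)

# Number of tweets per distinct lat/long pair (NA forms its own group).
geoCounts <- dt[, .(nTweets = .N), by = .(latitude, longitude)]

# Share of tweets with no usable location.
shareMissing <- dt[, mean(is.na(latitude) | is.na(longitude))]
```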
One day we’ll draw a map.
NB: twitteR no longer returns twitter’s best guess at ‘location’ :-(
Next we’ll try by screen name.
Here’s a really bad visualisation of all tweeters tweeting over time. Each row of pixels is a tweeter (the names are illegible) and a green dot indicates a few tweets in the 5 minute period while a red dot indicates a lot of tweets.
Yeah, that worked well.
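For what it's worth, a plot of that kind could be sketched with `geom_tile()`, one row per tweeter and one column per 5 minute bin, shaded from green (few tweets) to red (many). The toy data and object names are assumptions, not the code behind the original figure.

```r
library(data.table)
library(ggplot2)

# Toy data: one row per tweet.
dt <- data.table(
  screenName    = c("a", "a", "a", "b", "c", "c"),
  obsDateTime5m = as.POSIXct("2017-10-29 20:00:00", tz = "UTC") +
    c(0, 0, 0, 0, 300, 300)
)

# Tweets per tweeter per 5 minute bin.
tileCounts <- dt[, .(nTweets = .N), by = .(screenName, obsDateTime5m)]

tilePlot <- ggplot(tileCounts,
                   aes(x = obsDateTime5m, y = screenName, fill = nTweets)) +
  geom_tile() +
  scale_fill_gradient(low = "green", high = "red") +
  labs(x = "Time (5 minute bins)", y = "Tweeter", fill = "Tweets")
```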
So let’s re-do that for the top 50 tweeters so we can see their tweetStreaks…
Top tweeters:
screenName | nTweets |
---|---|
MatthewJHorn | 216 |
worththelicence | 113 |
aimasters | 103 |
GrainneBlair | 73 |
ImAnitaSharma | 73 |
trendinaliaGB | 68 |
AnnSmith63 | 67 |
EvilAwkeye | 67 |
trendinaliaIE | 65 |
bunnygail1977 | 64 |
SerendipitySays | 62 |
_SeaGrassRoots | 61 |
LeeFergusson | 58 |
CRHClover | 56 |
BBCEarth | 54 |
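A ranking like this, and the subset of tweets needed to plot only the top tweeters, could be derived as below. The toy data and the names `tweeterCounts`, `topN`, `topTweeters` and `topDT` are invented for the example.

```r
library(data.table)

# Toy data: one row per tweet.
dt <- data.table(screenName = c("a", "b", "a", "c", "a", "b"))

# Count tweets per screen name, most prolific first.
tweeterCounts <- dt[, .(nTweets = .N), by = screenName][order(-nTweets)]

# Keep the top N tweeters (N = 2 here for the toy data).
topN <- 2
topTweeters <- head(tweeterCounts, topN)

# Tweets from the top tweeters only, ready for re-plotting.
topDT <- dt[screenName %in% topTweeters$screenName]
```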
And their tweetStreaks…
Spot the twitterBots…
Analysis completed in 219.767 seconds ( 3.66 minutes) using knitr in RStudio with R version 3.4.0 (2017-04-21) running on x86_64-apple-darwin15.6.0.
A special mention must go to twitteR (Gentry, n.d.) for the twitter API interaction functions and lubridate (Grolemund and Wickham 2011), which allows time-zone manipulation without too many tears.
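Other R packages used: data.table (Dowle et al. 2015), ggplot2 (Wickham 2009), knitr (Xie 2016), plotly (Sievert et al. 2016) and readr (Wickham, Hester, and Francois 2016), all running in R (R Core Team 2016).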
Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.
Gentry, Jeff. n.d. TwitteR: R Based Twitter Client. http://lists.hexdump.org/listinfo.cgi/twitter-users-hexdump.org.
Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.
R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.
Sievert, Carson, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, and Pedro Despouy. 2016. Plotly: Create Interactive Web Graphics via ’Plotly.js’. https://CRAN.R-project.org/package=plotly.
Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.
Wickham, Hadley, Jim Hester, and Romain Francois. 2016. Readr: Read Tabular Data. https://CRAN.R-project.org/package=readr.
Xie, Yihui. 2016. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.