1 TL;DR

Just a bit of dataknut fun woven around the day job.

You’ll be wanting Section 6 for the trending hashtags…

2 Terms of re-use

2.1 License

CC-BY unless otherwise noted.

2.2 Citation

3 Purpose

The idea is to extract and visualise tweets and re-tweets of #schoolstrike4climate (see https://www.schoolstrike4climate.com/).

Why? Err…. Just. Because.

4 How it works

Code borrows extensively from https://github.com/mkearney/rtweet

The analysis used rtweet to ask the Twitter search API for ‘all’ tweets containing the #schoolstrike4climate hashtag in the ‘recent’ twitterVerse.

The search API only looks back over roughly the last week and does not promise to return every matching tweet. It is therefore possible that not quite all tweets have been extracted, although it seems likely that we have captured most recent human tweeting, which was the main intention. Future work should instead use the Twitter streaming API.

## [1] "Found 7 files matching #schoolstrike4climate in ~/Data/twitter/"

The data has:

5 Analysis

5.1 Tweets and Tweeters over time

Figure 5.1: Number of tweets and tweeters

Figure 5.1 shows the number of tweets and tweeters in the data extract by day. The quotes, tweets and re-tweets have been separated.

If you are in New Zealand and you are wondering why there are no tweets today (2019-03-16) the answer is that twitter data (and these plots) are working in UTC and (y)our today() may not have started yet in UTC. Don’t worry, all the tweets are here - it’s just our old friend the timezone… :-)
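
To make the timezone point concrete, here is a minimal base R sketch. The timestamp is made up for illustration and the variable names are not taken from the report's code; it just shows the same instant landing on different calendar days in UTC and in New Zealand.

```r
# An illustrative tweet timestamp (made up): 8.30pm on 15 March 2019, UTC
createdAt <- as.POSIXct("2019-03-15 20:30:00", tz = "UTC")

# The calendar day depends on which timezone we ask for
dayUTC <- as.Date(createdAt, tz = "UTC")               # the day the plots use
dayNZ  <- as.Date(createdAt, tz = "Pacific/Auckland")  # the day a Kiwi tweeter experienced

# Show the same instant as New Zealand local time
print(format(createdAt, tz = "Pacific/Auckland", usetz = TRUE))
print(c(utc = format(dayUTC), nz = format(dayNZ)))
```

So a tweet sent on Saturday morning in Wellington is counted against Friday in the plots.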

5.2 Who’s tweeting?

Next we’ll try by screen name.

Figure 5.2: N tweets per day by screen name

Figure 5.2 is a really bad visualisation of all tweeters tweeting over time. Each row of pixels is a tweeter (the names are probably illegible) and a green dot indicates a few tweets in the given day while a red dot indicates a lot of tweets.

So let’s re-do that for the top 50 tweeters so we can see their tweetStreaks (tm)…

Top tweeters:

Table 5.1: Top 15 tweeters (all days)
screen_name      nTweets
NoahsArkCrew         157
D_Melissa2            84
pezmico               84
Glo_man               69
buoyancybackup        69
DawnRoseTurner        66
lin_nah               63
Beccabluesky          63
GreenpeaceNZ          58
NoAdaniOz             56
ClimateStrikeGL       51
Feenwald              48
heidi_k_edmonds       45
FibrodisKo            44
daniel_scholler       42

And their tweetStreaks are shown in Figure 5.3.

Figure 5.3: N tweets per day by screen name (top 50, reverse alphabetical)

Any twitterBots…?

5.3 Which hashtags are mentioned the most?

This is very quick and dirty but… to calculate this we have to do a bit of string processing first.

This is how I have tidied the hashtags (make other suggestions here):

library(data.table)  # htLongDT is assumed to be a data.table of hashtag mentions

# First we make everything lower case (:= adds the column by reference)
htLongDT[, htLower := tolower(htOrig)]

# Next we remove the macrons just in case h/t:
# https://twitter.com/Thoughtfulnz/status/1046685305569345536
# (dkUtils::deMacron is passed as the replacement function for each matched vowel)
htLongDT[, htClean := stringr::str_replace_all(htLower, "[āēīōū]", dkUtils::deMacron)]

# we might need to do other things here depending on the context
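
The code for dkUtils::deMacron is not shown here, so the following is only a rough base R sketch of the same idea. tidyHashtag is a hypothetical stand-in, not the dkUtils implementation: it lower-cases a hashtag and swaps each macronised vowel for its plain equivalent.

```r
# Hypothetical stand-in for the tidying step (not the dkUtils code)
tidyHashtag <- function(ht) {
  # \u escapes keep the script ASCII-only; these are the vowels a, e, i, o, u with macrons
  from <- c("\u0101", "\u0113", "\u012b", "\u014d", "\u016b")
  to   <- c("a", "e", "i", "o", "u")
  out <- tolower(ht)
  for (i in seq_along(from)) {
    # fixed = TRUE: treat each macronised vowel as a literal string, not a regex
    out <- gsub(from[i], to[i], out, fixed = TRUE)
  }
  out
}

# Two made-up hashtags, one with a macron
print(tidyHashtag(c("M\u0101ori", "ClimateStrike")))
```

Lower-casing comes first so that only the lower-case macrons need handling, which matches the order used above.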

Table 5.2 shows the total count of each #hashtag by (re)tweet type.

Table 5.2: Top 20 hashtags
hashTag               type      count
schoolstrike4climate  Re-tweet  15043
climatestrike         Re-tweet   8901
fridaysforfuture      Re-tweet   7084
schoolstrike4climate  Tweet      4013
fridayforfuture       Re-tweet   1699
fridaysforfurture     Re-tweet   1517
climatestrike         Tweet      1474
scientistsforfuture   Re-tweet   1338
fridaysforfuture      Tweet      1145
schoolstrike4climate  Quote      1042
ss4cnz                Re-tweet    980
climatechange         Re-tweet    685
climate               Re-tweet    584
climechange           Re-tweet    534
earthstrike           Re-tweet    517
fridays4future        Re-tweet    485
climateaction         Re-tweet    475
australia             Re-tweet    374
climatestrike         Quote       357
fridaysforfuture      Quote       352

Figure 5.4 plots the daily occurrence of these hashtags after removing variants of #schoolstrike4climate and selecting only those which have more than 100 mentions on any day. For clarity tweets and re-tweets are aggregated. See Section 7 for the problems with this #hashTag counting approach.
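
As a rough illustration of that filtering, here is a base R sketch on made-up data. The column names day and htClean, the substring match used to drop #schoolstrike4climate variants, and the toy threshold of 2 are all assumptions for the example, not the report's actual code.

```r
# Toy long-format data: one row per hashtag mention (values are made up)
ht <- data.frame(
  day = as.Date(c(rep("2019-03-14", 3), rep("2019-03-15", 5))),
  htClean = c("climatestrike", "climatestrike", "ss4cnz",
              "climatestrike", "climatestrike", "climatestrike",
              "ss4cnz", "schoolstrike4climate"),
  stringsAsFactors = FALSE
)

# Drop anything containing the search hashtag itself (assumed rule for 'variants')
ht <- ht[!grepl("schoolstrike4climate", ht$htClean, fixed = TRUE), ]

# Count mentions per hashtag per day
daily <- aggregate(count ~ day + htClean, data = transform(ht, count = 1L), FUN = sum)

# Keep hashtags that beat the threshold on at least one day
threshold <- 2  # the report uses 100
keep <- unique(daily$htClean[daily$count > threshold])
plotData <- daily[daily$htClean %in% keep, ]
print(plotData)
```

Note that a hashtag which passes the threshold on one day keeps all of its days in plotData, so its line in the plot is not broken on quieter days.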

Figure 5.4: Most mentioned #hashtags per day (only > 100 per day shown)

5.4 Location (lat/long)

We wanted to make a nice map but sadly we see that most tweets have no lat/long set.

Table 5.3: All logged lat/long values
geo_coords                 nTweets
|                            54702
-34.6089|-58.4397                1
-33.86751|151.20797              1
-33.8731575|151.2061157          1
-37.8|144.967                    1
4.60987|-74.082                  1
40.78100519|-73.97325538         1
-37.81328358|144.97403895        2
-41.2889|174.777                 1
19.4156206|-99.1913432           1

Table 5.3: All logged coord values
coords_coords              nTweets
|                            54702
-58.4397|-34.6089                1
151.20797|-33.86751              1
151.2061157|-33.8731575          1
144.967|-37.8                    1
-74.082|4.60987                  1
-73.97325538|40.78100519         1
144.97403895|-37.81328358        2
174.777|-41.2889                 1
-99.1913432|19.4156206           1
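
If we ever do get enough points for a map, the pipe-separated strings shown above would need splitting into numeric columns. This is a minimal base R sketch under that assumption (the values are copied from the geo_coords table, where latitude comes first; the variable names are illustrative only).

```r
# A few geo_coords strings as they appear in the table ("|" means nothing was logged)
geoCoords <- c("|", "-41.2889|174.777", "|", "-37.8|144.967")

# Split on the literal pipe; an empty pair yields a single empty string
parts <- strsplit(geoCoords, "|", fixed = TRUE)

# Empty or absent pieces become NA
lat <- vapply(parts, function(p) as.numeric(p[1]), numeric(1))
lon <- vapply(parts, function(p) as.numeric(p[2]), numeric(1))

# Which tweets could actually be mapped?
hasCoords <- !is.na(lat) & !is.na(lon)
print(data.frame(lat, lon, hasCoords))
```

For coords_coords the order is reversed (longitude first), so the two pieces would need swapping.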

5.5 Location (textual)

This appears to be pulled from the user’s profile although it may also be a ‘guesstimate’ of current location.

Top locations for tweets (a mix of countries and cities, exactly as typed by the user):

Table 5.4: Top 15 locations for tweeting
location                 nTweets
NA                         14050
Australia                   1250
New Zealand                  466
London                       443
Melbourne, Victoria          378
Melbourne, Australia         371
Sydney, New South Wales      305
Sydney                       305
United States                292
Auckland, New Zealand        285
Sydney, Australia            269
Earth                        257
Canada                       257
London, England              243
Melbourne                    239

Top locations for tweeters:

Table 5.5: Top 15 locations for tweeters
location                 nTweeters
NA                            8542
Australia                      493
London                         251
Melbourne, Victoria            209
United States                  194
New Zealand                    191
London, England                187
Sydney, New South Wales        176
Melbourne, Australia           167
Sydney, Australia              145
Canada                         143
Sydney                         134
Melbourne                      131
Earth                          113
United Kingdom                 113

Now try the full place name - rarely available.

Table 5.6: Top 15 full place names for tweeting
place_full_name                         nTweets
NA                                        54415
Auckland, New Zealand                        37
Sydney, New South Wales                      29
Melbourne, Victoria                          21
Wellington City, New Zealand                 14
Adelaide, South Australia                    13
Miami Beach, FL                              10
Manhattan, NY                                 8
Brisbane, Queensland                          7
Walthamstow, London                           7
Vancouver, British Columbia                   6
Viña del Mar, Chile                           6
Canberra, Australian Capital Territory        5
Old Treasury Building                         4
Newcastle, New South Wales                    4

6 Most popular hashtags over time

There are a lot of problems with this approach (see Section 7) but Figure 6.1 shows trends over time (watch for lines of apparently dissimilar hashtags where the macron fix has failed) and Figure 6.2 shows the totals to date.

Figure 6.1 uses plotly to avoid having to render a large legend - just hover over the lines to see who is who…

Figure 6.1: Cumulative hashtag counts over time (only total count > 100 shown)

Figure 6.2: Total hashtag counts to date (only total count > 100 shown)
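
For anyone wanting to reproduce the shape of these two figures, here is a base R sketch of the cumulative and total counts on made-up daily data. The column names and the toy threshold of 5 are assumptions for illustration; the report's own code may well differ.

```r
# Toy daily counts per hashtag (values are made up)
daily <- data.frame(
  day = as.Date(c("2019-03-14", "2019-03-15", "2019-03-16",
                  "2019-03-14", "2019-03-16")),
  htClean = c("climatestrike", "climatestrike", "climatestrike",
              "ss4cnz", "ss4cnz"),
  count = c(2L, 3L, 1L, 1L, 4L),
  stringsAsFactors = FALSE
)

# Cumulative counts need the rows in date order within each hashtag
daily <- daily[order(daily$htClean, daily$day), ]
daily$cumCount <- ave(daily$count, daily$htClean, FUN = cumsum)

# Totals to date, and the hashtags that clear the threshold
totals <- tapply(daily$count, daily$htClean, sum)
threshold <- 5  # the figures use 100
keep <- names(totals)[totals > threshold]

print(daily[daily$htClean %in% keep, ])
print(totals)
```

The cumCount column is what a plot like Figure 6.1 would draw as lines, while totals corresponds to the bars in a plot like Figure 6.2.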

7 Problems

Loads of them. But primarily:

8 About

As ever, #YMMV.

Analysis completed in 67.664 seconds ( 1.13 minutes) using knitr in RStudio with R version 3.5.1 (2018-07-02) running on x86_64-redhat-linux-gnu.

A special mention must go to https://github.com/mkearney/rtweet (Kearney 2018) for the twitter API interaction functions.

Other R packages used:

References

Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.

Kearney, Michael W. 2018. Rtweet: Collecting Twitter Data. https://cran.r-project.org/package=rtweet.

R Core Team. 2016. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Sievert, Carson, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, and Pedro Despouy. 2016. Plotly: Create Interactive Web Graphics via ’Plotly.js’. https://CRAN.R-project.org/package=plotly.

Wickham, Hadley. 2007. “Reshaping Data with the reshape Package.” Journal of Statistical Software 21 (12): 1–20. http://www.jstatsoft.org/v21/i12/.

———. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.

———. 2016. Stringr: Simple, Consistent Wrappers for Common String Operations. https://CRAN.R-project.org/package=stringr.

Wickham, Hadley, Jim Hester, and Romain Francois. 2016. Readr: Read Tabular Data. https://CRAN.R-project.org/package=readr.

Xie, Yihui. 2016. Knitr: A General-Purpose Package for Dynamic Report Generation in R. https://CRAN.R-project.org/package=knitr.

———. 2018. Bookdown: Authoring Books and Technical Documents with R Markdown. https://github.com/rstudio/bookdown.

Zhu, Hao. 2019. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.