@dataknut

Please note that authorship is alphabetical. Contributions are listed below - see github for details and who to blame for what :-).

If you wish to refer to any of the material from this report please cite as:
Report circulation:
Report purpose:
This work is (c) 2019 the University of Southampton.
dfW <- openair::importAURN(
site = "SA33",
year = 2019,
pollutant = "all",
hc = FALSE,
meta = TRUE,
  to_narrow = FALSE, # keep wide form; TRUE would produce long form data
verbose = TRUE # for now
)
# to_narrow = TRUE currently fails (it worked before), so the long-form call is commented out:
# dfL <- openair::importAURN(
# site = "SA33",
# year = 2019,
# pollutant = "all",
# hc = FALSE,
# meta = TRUE,
# to_narrow = TRUE, # produces long form data yay!
# verbose = TRUE
# )
dtW <- data.table::as.data.table(dfW) # we like data.tables
Data downloaded from http://uk-air.defra.gov.uk/openair/R_data/ using openair::importAURN().
Southampton City Council collects various forms of air quality data at the sites shown in 2.1. WHO publishes information on the health consequences and “acceptable” exposure levels for each of these.
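The threshold variables used later in this report (e.g. `hourlyno2Threshold_WHO`) are presumably set up from these WHO guideline values in an earlier chunk; a minimal sketch of how they could be defined, assuming the 2005 WHO guideline figures (only `hourlyno2Threshold_WHO` is actually referenced below, the other names are illustrative assumptions following the same pattern):

```r
# Reference thresholds from the WHO (2005) air quality guidelines.
# hourlyno2Threshold_WHO is referenced later in this report; the other
# variable names here are illustrative assumptions.
hourlyno2Threshold_WHO <- 200  # NO2, ug/m3, 1-hour mean
annualno2Threshold_WHO <- 40   # NO2, ug/m3, annual mean
dailypm10Threshold_WHO <- 50   # PM10, ug/m3, 24-hour mean
annualpm10Threshold_WHO <- 20  # PM10, ug/m3, annual mean
```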
lDT <- data.table::melt(dtW, id.vars = c("site", "date", "code", "latitude", "longitude",
"site_type"), measure.vars = c("no", "no2", "nox", "pm10", "nv10", "v10", "ws",
    "wd"), value.name = "value" # generic name for the melted measurements
)
# remove NA
lDT <- lDT[!is.na(value)]
t <- lDT[, .(from = min(date), to = max(date), nObs = .N), keyby = .(site, variable)]
kableExtra::kable(t, caption = "Dates data available by site and measure", digits = 2) %>%
kable_styling()
site | variable | from | to | nObs |
---|---|---|---|---|
Southampton A33 | no | 2019-01-01 | 2019-12-20 23:00:00 | 8211 |
Southampton A33 | no2 | 2019-01-01 | 2019-12-20 23:00:00 | 8211 |
Southampton A33 | nox | 2019-01-01 | 2019-12-20 23:00:00 | 8211 |
Southampton A33 | pm10 | 2019-01-01 | 2019-12-20 22:00:00 | 7724 |
Southampton A33 | nv10 | 2019-01-01 | 2019-12-20 22:00:00 | 7724 |
Southampton A33 | v10 | 2019-01-01 | 2019-12-20 22:00:00 | 7724 |
Southampton A33 | ws | 2019-01-01 | 2019-12-20 23:00:00 | 8064 |
Southampton A33 | wd | 2019-01-01 | 2019-12-20 23:00:00 | 8064 |
Summarise previously downloaded and processed data… Note that this may not be completely up to date.
skimr::skim(dfW)
Name | dfW |
Number of rows | 8760 |
Number of columns | 14 |
_______________________ | |
Column type frequency: | |
character | 3 |
numeric | 10 |
POSIXct | 1 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
code | 0 | 1 | 4 | 4 | 0 | 1 | 0 |
site | 0 | 1 | 15 | 15 | 0 | 1 | 0 |
site_type | 0 | 1 | 13 | 13 | 0 | 1 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
no | 549 | 0.94 | 27.95 | 38.75 | -0.22 | 4.12 | 14.20 | 35.60 | 444.91 | ▇▁▁▁▁ |
no2 | 549 | 0.94 | 32.74 | 22.47 | 0.00 | 15.11 | 28.03 | 45.55 | 158.31 | ▇▅▁▁▁ |
nox | 549 | 0.94 | 75.57 | 76.58 | 0.84 | 22.77 | 52.03 | 100.15 | 771.31 | ▇▁▁▁▁ |
pm10 | 1036 | 0.88 | 16.74 | 10.98 | -1.40 | 9.60 | 13.70 | 20.50 | 95.00 | ▇▃▁▁▁ |
nv10 | 1036 | 0.88 | 13.66 | 9.48 | -9.60 | 7.40 | 11.30 | 17.42 | 99.70 | ▇▆▁▁▁ |
v10 | 1036 | 0.88 | 3.07 | 3.09 | -14.00 | 1.30 | 2.70 | 4.30 | 22.80 | ▁▂▇▁▁ |
ws | 696 | 0.92 | 3.76 | 2.03 | 0.00 | 2.30 | 3.30 | 4.90 | 13.00 | ▆▇▃▁▁ |
wd | 696 | 0.92 | 201.12 | 104.40 | 0.00 | 116.77 | 229.75 | 284.10 | 360.00 | ▅▃▃▇▆ |
latitude | 0 | 1.00 | 50.92 | 0.00 | 50.92 | 50.92 | 50.92 | 50.92 | 50.92 | ▁▁▇▁▁ |
longitude | 0 | 1.00 | -1.46 | 0.00 | -1.46 | -1.46 | -1.46 | -1.46 | -1.46 | ▁▁▇▁▁ |
Variable type: POSIXct
skim_variable | n_missing | complete_rate | min | max | median | n_unique |
---|---|---|---|---|---|---|
date | 0 | 1 | 2019-01-01 | 2019-12-31 23:00:00 | 2019-07-02 11:30:00 | 8760 |
Table 3.1 gives an indication of the availability of the different measures.
In this section we present graphical analysis of the previously downloaded data. Note this is just a snapshot of the data available.
yLab <- "Nitrogen Dioxide (ug/m3)"
t <- lDT[variable == "no2", .(mean = mean(value, na.rm = TRUE), sd = sd(value, na.rm = TRUE),
min = min(value, na.rm = TRUE), max = max(value, na.rm = TRUE)), keyby = .(site)]
kableExtra::kable(t, caption = "Summary of no2 data") %>% kable_styling()
site | mean | sd | min | max |
---|---|---|---|---|
Southampton A33 | 32.73677 | 22.47269 | 0 | 158.3072 |
Table 4.1 suggests that there may be a few (here: 0) negative values. These are summarised in Table 4.2 while Figure 4.1 shows the availability and levels of the pollutant data over time.
t <- head(lDT[variable == "no2" & value < 0], 10)
kableExtra::kable(t, caption = "Negative no2 values (up to first 10)") %>% kable_styling()
site | date | code | latitude | longitude | site_type | variable | value |
---|---|---|---|---|---|---|---|
t <- table(lDT[variable == "no2" & value < 0, .(site)])
kableExtra::kable(t, caption = "Negative no2 values (count by site)") %>% kable_styling()
Freq |
---|
# dt,xvar, yvar,fillVar, yLab
p <- makeTilePlot(lDT[variable == "no2"], xVar = "date", yVar = "site", fillVar = "value",
yLab = yLab)
p
# p <- ggplot2::ggplot(dt, aes(x = obsDateTime, y = nox2, colour = site, alpha =
# 0.1)) + geom_point(shape=4, size = 1)
t <- lDT[variable == "no2" & value > hourlyno2Threshold_WHO][order(-value)] # WHO hourly guideline (200 ug/m3)
kableExtra::kable(caption = paste0("Values greater than WHO threshold (NO2 > ", hourlyno2Threshold_WHO,
")"), head(t, 10)) %>% kable_styling()
site | date | code | latitude | longitude | site_type | variable | value |
---|---|---|---|---|---|---|---|
p <- makeDotPlot(lDT[variable == "no2"], xVar = "date", yVar = "value", byVar = "site",
yLab = yLab)
p <- p + geom_hline(yintercept = hourlyno2Threshold_WHO) + labs(caption = "Reference line = WHO hourly guideline threshold")
if (doPlotly) {
p
plotly::ggplotly(p + xlim(xlimMinDateTime, xlimMaxDateTime)) # interactive, xlimited
} else {
p
}
Figure 4.2 shows hourly values for all sites. In the study period there were 0 hours when the hourly Nitrogen Dioxide level breached WHO guidelines. The worst 10 cases are shown in Table 4.3.
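The breach count quoted above can be reproduced directly from the long-form table; a minimal data.table sketch, using toy data standing in for the `lDT` built earlier:

```r
library(data.table)

hourlyno2Threshold_WHO <- 200  # ug/m3, WHO hourly NO2 guideline

# Toy long-form data standing in for the lDT built earlier
lDT <- data.table(
  site = "Southampton A33",
  variable = "no2",
  value = c(15, 28, 205, 45)  # one hour above the guideline
)

# Hours per site where hourly NO2 exceeded the guideline
breachesDT <- lDT[variable == "no2" & value > hourlyno2Threshold_WHO,
                  .(nBreachHours = .N), keyby = .(site)]
```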
lDT[, obsDate := lubridate::date(date)]
plotDT <- lDT[variable == "no2", .(mean = mean(value, na.rm = TRUE)),
keyby = .(obsDate, site)]
p <- makeDotPlot(plotDT,
xVar = "obsDate",
yVar = "mean",
byVar = "site",
yLab = yLab)
p <- p +
geom_smooth() + # add smoothed line
labs(caption = "Trend line = Generalized additive model (gam) with integrated smoothness estimation")
if(doPlotly){
p
    plotly::ggplotly(p + xlim(xlimMinDate, xlimMaxDate)) # interactive, xlimited
} else {
p
}
Figure 4.3 shows daily mean values for all sites over time and includes smoother trend lines for each site.
Clearly the mean daily values show less variance (and fewer extreme values) than the hourly data, and there has also been a decreasing trend over time.
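Both points (reduced variance and a downward drift) can be checked numerically by aggregating to monthly means; a minimal sketch on a toy hourly series with a built-in downward trend, standing in for `lDT[variable == "no2"]`:

```r
library(data.table)

# Toy hourly NO2 series with a built-in downward trend; the aggregation
# step is what matters here, not the simulated values.
set.seed(42)
dates <- seq(as.POSIXct("2019-01-01", tz = "UTC"),
             as.POSIXct("2019-12-31 23:00", tz = "UTC"), by = "hour")
dt <- data.table(
  date = dates,
  value = pmax(0, 40 - 2 * as.numeric(format(dates, "%m")) +
                  rnorm(length(dates), sd = 5))
)

# Monthly means (and sds) make the trend easy to inspect numerically
monthlyDT <- dt[, .(meanNo2 = mean(value), sdNo2 = sd(value)),
                keyby = .(month = format(date, "%Y-%m"))]
```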
Wind rose
openair::windRose(dfW)
Pollution rose
openair::pollutionRose(dfW, pollutant = "no2")
The pollution rose appears to show a slightly higher proportion of high NO2 measurements when the wind is from the SE.
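The south-easterly pattern suggested by the pollution rose could be checked by cross-tabulating wind sector against "high" NO2 hours; a base-R sketch, using toy vectors standing in for `dfW$wd` and `dfW$no2` (the 45-degree sector labels are illustrative):

```r
# Cross-tabulate wind sector against "high" NO2 hours.
# Toy vectors standing in for dfW$wd and dfW$no2.
wd  <- c(45, 135, 140, 150, 225, 315, 130, 10)  # wind direction, degrees
no2 <- c(10, 60, 55, 70, 12, 8, 65, 9)          # NO2, ug/m3

# Bin wind direction into eight 45-degree sectors
sector <- cut(wd, breaks = seq(0, 360, by = 45), include.lowest = TRUE,
              labels = c("N-NE", "NE-E", "E-SE", "SE-S",
                         "S-SW", "SW-W", "W-NW", "NW-N"))
high <- no2 > quantile(no2, 0.75)  # "high" = above the 75th percentile
tab <- table(sector, high)
```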
Report generated using knitr in RStudio with R version 3.6.2 (2019-12-12) running on x86_64-apple-darwin15.6.0 (Darwin Kernel Version 17.7.0: Sun Dec 1 19:19:56 PST 2019; root:xnu-4570.71.63~1/RELEASE_X86_64).
t <- proc.time() - startTime
elapsed <- t[[3]]
Analysis completed in 7.88 seconds (0.13 minutes).
R packages used:
Arino de la Rubia, Eduardo, Hao Zhu, Shannon Ellis, Elin Waring, and Michael Quinn. 2017. Skimr: Skimr. https://github.com/ropenscilabs/skimr.
Carslaw, David C., and Karl Ropkins. 2012. “Openair — an R Package for Air Quality Data Analysis.” Environmental Modelling & Software 27–28 (0): 52–61. https://doi.org/10.1016/j.envsoft.2011.09.008.
Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.
Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.
Müller, Kirill. 2017. Here: A Simpler Way to Find Your Files. https://CRAN.R-project.org/package=here.
Sievert, Carson, Chris Parmer, Toby Hocking, Scott Chamberlain, Karthik Ram, Marianne Corvellec, and Pedro Despouy. 2016. Plotly: Create Interactive Web Graphics via ’Plotly.js’. https://CRAN.R-project.org/package=plotly.
Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.
Zhu, Hao. 2018. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.