This vignette explores the mts_monitor data model used throughout the AirMonitor package to store and work with monitoring data.
The AirMonitor package is designed to provide a compact, full-featured suite of utilities for working with PM2.5 data. A uniform data model provides consistent data access across monitoring data available from different agencies. The core data model in this package is defined by the mts_monitor object used to store data associated with groups of individual monitors.
To work efficiently with the package, it is important that you understand the structure of this data object and the functions that operate on it. Package functions whose names begin with monitor_ expect objects of class mts_monitor as their first argument. (‘mts’ stands for ‘Multiple Time Series’.)
The AirMonitor package uses the mts data model defined in the MazamaTimeSeries package.
In this data model, each unique time series is referred to as a “device-deployment” – a time series collected by a particular device at a specific location. Multiple device-deployments are stored in memory as an mts_monitor object, typically called monitor. Each monitor is just a list with two dataframes:
monitor$meta – rows = unique device-deployments; cols = device/location metadata
monitor$data – rows = UTC times; cols = device-deployment data (plus an additional datetime column)
A key feature of this data model is the use of the deviceDeploymentID as a “foreign key” that allows data columns to be mapped onto the associated spatial and device metadata in a meta row. The following will always be true:
identical(names(monitor$data), c('datetime', monitor$meta$deviceDeploymentID))
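The foreign-key relationship can be illustrated without the package. The following is a minimal base R sketch, assuming a toy list that only mimics the meta/data layout. The IDs, site names and values are invented for illustration and are not real device-deployments.

```r
# Toy stand-in for an mts_monitor object (all IDs and values are invented)
toy <- list(
  meta = data.frame(
    deviceDeploymentID = c("loc1_devA", "loc2_devB"),
    locationName = c("Site One", "Site Two"),
    longitude = c(-120.1, -117.4),
    latitude = c(48.4, 47.7),
    stringsAsFactors = FALSE
  ),
  data = data.frame(
    datetime = as.POSIXct("2015-08-01 00:00:00", tz = "UTC") + 3600 * 0:2,
    loc1_devA = c(12.0, 15.5, NA),
    loc2_devB = c(8.2, 9.1, 10.4)
  )
)

# The invariant: 'datetime' followed by the IDs, in the same order as the meta rows
invariant_ok <- identical(
  names(toy$data),
  c("datetime", toy$meta$deviceDeploymentID)
)
print(invariant_ok)

# Use a data column name as a key to find its metadata row
id <- names(toy$data)[3]
site <- toy$meta[toy$meta$deviceDeploymentID == id, ]
print(site$locationName)
```

Because the column order of data matches the row order of meta, positional indexing also works, but looking rows up by ID is the safer habit.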
Each column of monitor$data represents a time series associated with a particular device-deployment while each row of monitor$data represents a synoptic snapshot of all measurements made at a particular time.
In this manner, software can create both time series plots and maps from a single monitor object in memory.
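The two views can be sketched in base R as follows. This assumes hypothetical meta and data dataframes with invented IDs and values. A column of data yields a time series, while a row of data joined to the coordinates in meta yields values ready for a map.

```r
# Hypothetical 'meta' and 'data' dataframes (IDs and values are invented)
meta <- data.frame(
  deviceDeploymentID = c("loc1_devA", "loc2_devB"),
  longitude = c(-120.1, -117.4),
  latitude = c(48.4, 47.7),
  stringsAsFactors = FALSE
)
data <- data.frame(
  datetime = as.POSIXct("2015-08-01 00:00:00", tz = "UTC") + 3600 * 0:2,
  loc1_devA = c(12.0, 15.5, NA),
  loc2_devB = c(8.2, 9.1, 10.4)
)

# Time series view: one column plus the time axis
timeseries <- data[, c("datetime", "loc1_devA")]

# Synoptic view: one row of values attached to the matching coordinates
values <- unlist(data[2, meta$deviceDeploymentID], use.names = FALSE)
snapshot <- data.frame(
  meta[, c("deviceDeploymentID", "longitude", "latitude")],
  value = values
)
print(snapshot)
```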
The data dataframe contains all hourly measurements organized with rows (the ‘unlimited’ dimension) as unique timesteps and columns as unique device-deployments. The very first column is always named datetime and contains the POSIXct datetime in Coordinated Universal Time (UTC). This time axis is guaranteed to be a regular hourly axis with no gaps.
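What “regular hourly axis with no gaps” means can be checked with a few lines of base R. This is only an illustrative sketch on an invented time axis, not the validation the package itself performs.

```r
# An invented 24-hour UTC time axis
datetime <- seq(
  as.POSIXct("2015-08-01 00:00:00", tz = "UTC"),
  by = "hour",
  length.out = 24
)

# Regular hourly axis: every step between consecutive times is exactly one hour
is_regular <- all(as.numeric(diff(datetime), units = "hours") == 1)

# Dropping one timestep creates a gap that the same test detects
gappy <- datetime[-5]
gappy_is_regular <- all(as.numeric(diff(gappy), units = "hours") == 1)

# A gap-free axis spanning the same range can be regenerated with seq()
full_axis <- seq(min(gappy), max(gappy), by = "hour")

print(c(is_regular, gappy_is_regular, length(full_axis)))
```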
The meta dataframe contains all metadata associated with device-deployments and is organized with rows as unique device-deployments and columns containing both location and device metadata. The following columns are guaranteed to exist in the meta dataframe. Those marked with “(optional)” may contain NAs. Additional columns may also be present depending on the data source.
deviceDeploymentID – unique ID associated with a time series
deviceID – unique device ID
deviceType – (optional) device type
deviceDescription – (optional) human readable device description
deviceExtra – (optional) additional human readable device information
pollutant – pollutant name from AirMonitor::pollutantNames
units – one of "PPM|PPB|UG/M3"
dataIngestSource – (optional) source of data
dataIngestURL – (optional) URL used to access data
dataIngestUnitID – (optional) instrument identifier used at dataIngestSource
dataIngestExtra – (optional) human readable data ingest information
dataIngestDescription – (optional) human readable data ingest instructions
locationID – unique location ID from MazamaLocationUtils::location_createID()
locationName – human readable location name
longitude – longitude
latitude – latitude
elevation – (optional) elevation
countryCode – ISO 3166-1 alpha-2 country code
stateCode – ISO 3166-2 alpha-2 state code
countyName – US county name
timezone – Olson time zone
houseNumber – (optional)
street – (optional)
city – (optional)
zip – (optional)
AQSID – (optional) EPA AQS unique identifier
fullAQSID – (optional) EPA AQS unique identifier
Example 1: Exploring mts_monitor objects
We will use the built-in “NW_Megafires” dataset and various monitor_filter~() functions to subset an mts_monitor object which we then examine.
library(AirMonitor)
# Recipe to select Washington state monitors in August of 2015:
monitor <-
  # 1) start with NW Megafires
  NW_Megafires %>%
  # 2) filter to only include Washington state
  monitor_filter(stateCode == "WA") %>%
  # 3) filter to only include August
  monitor_filterDate(20150801, 20150901) %>%
  # 4) remove monitors with all missing values
  monitor_dropEmpty()
# 'mts_monitor' objects can be identified by their class
class(monitor)
## [1] "mts_monitor" "mts" "list"
# They always have two elements called 'meta' and 'data'
names(monitor)
## [1] "meta" "data"
# Examine the 'meta' dataframe
dim(monitor$meta)
## [1] 67 82
names(monitor$meta)
## [1] "deviceDeploymentID" "deviceID" "deviceType"
## [4] "deviceDescription" "deviceExtra" "pollutant"
## [7] "units" "dataIngestSource" "dataIngestURL"
## [10] "dataIngestUnitID" "dataIngestExtra" "dataIngestDescription"
## [13] "locationID" "locationName" "longitude"
## [16] "latitude" "elevation" "countryCode"
## [19] "stateCode" "countyName" "timezone"
## [22] "houseNumber" "street" "city"
## [25] "zip" "AQSID" "fullAQSID"
## [28] "airnow_stationID" "airnow_parameterName" "airnow_monitorType"
## [31] "airnow_siteCode" "airnow_status" "airnow_agencyID"
## [34] "airnow_agencyName" "airnow_EPARegion" "airnow_GMTOffsetHours"
## [37] "airnow_CBSA_ID" "airnow_CBSA_Name" "airnow_stateAQSCode"
## [40] "airnow_countyAQSCode" "airnow_MSAName" "address"
## [43] "wrcc_type" "wrcc_serialNumber" "wrcc_monitorName"
## [46] "wrcc_monitorType" "deploymentType" "airnow_countryCode"
## [49] "airnow_stateCode" "airnow_timezone" "airnow_houseNumber"
## [52] "airnow_street" "airnow_city" "airnow_zip"
## [55] "airsis_Alias" "airsis_dataFormat" "airsis_provider"
## [58] "airsis_unitID" "aqs_address" "siteEstablishedDate"
## [61] "siteClosedDate" "GMTOffset" "owningAgency"
## [64] "cityName" "CBSAName" "tribeName"
## [67] "parameterCode" "parameterName" "POC"
## [70] "firstYearOfData" "lastSampleDate" "monitorType"
## [73] "reportingAgency" "PQAO" "collectingAgency"
## [76] "exclusions" "monitoringObjective" "lastMethodCode"
## [79] "lastMethod" "measurementScale" "NAAQSPrimaryMonitor"
## [82] "QAPrimaryMonitor"
# Examine the 'data' dataframe
dim(monitor$data)
## [1] 744 68
# This should always be true
identical(names(monitor$data), c('datetime', monitor$meta$deviceDeploymentID))
## [1] TRUE
Example 2: Basic manipulation of mts_monitor objects
The AirMonitor package has numerous functions that work with mts_monitor objects, all of which begin with monitor_. If you need to do something that the package functions do not provide, you can manipulate mts_monitor objects directly as long as you retain the structure of the data model.
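As a rough illustration of what “retaining the structure” involves, the base R sketch below subsets a toy object by hand. The object, its IDs and the longitude criterion are all invented. The point is that meta rows and data columns must be subset together so that the column names still match the deviceDeploymentID values.

```r
# Toy stand-in for an mts_monitor object (all IDs and values are invented)
toy <- list(
  meta = data.frame(
    deviceDeploymentID = c("loc1_devA", "loc2_devB", "loc3_devC"),
    longitude = c(-120.1, -117.4, -122.3),
    latitude = c(48.4, 47.7, 47.6),
    stringsAsFactors = FALSE
  ),
  data = data.frame(
    datetime = as.POSIXct("2015-08-01 00:00:00", tz = "UTC") + 3600 * 0:2,
    loc1_devA = c(12.0, 15.5, NA),
    loc2_devB = c(8.2, 9.1, 10.4),
    loc3_devC = c(3.3, 4.0, 4.4)
  )
)

# Hypothetical criterion: keep device-deployments west of 119 degrees W
keep <- toy$meta$longitude < -119
ids <- toy$meta$deviceDeploymentID[keep]

# Subset 'meta' rows and 'data' columns together, always keeping 'datetime'
west <- toy
west$meta <- toy$meta[keep, , drop = FALSE]
west$data <- toy$data[, c("datetime", ids), drop = FALSE]

# The data model invariant still holds after the manual edit
structure_ok <- identical(
  names(west$data),
  c("datetime", west$meta$deviceDeploymentID)
)
print(structure_ok)
```

Starting from a copy of the original object, as done here with west <- toy, means any class attributes on a real object would be carried along unchanged.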
Functions that accept and return mts_monitor objects include:
monitor_aqi()
monitor_collapse()
monitor_combine()
monitor_dailyStatistic()
monitor_dailyThreshold()
monitor_dropEmpty()
monitor_filter() (aka monitor_filterMeta())
monitor_filterByDistance()
monitor_filterDate()
monitor_filterDatetime()
monitor_mutate()
monitor_nowcast()
monitor_replaceValues()
monitor_select() (aka monitor_reorder())
monitor_selectWhere()
monitor_trimDate()
These functions can be used with the magrittr package pipe operator (%>%) as in the following example:
# First, obtain the monitor IDs by clicking on dots in the interactive map:
NW_Megafires %>% monitor_leaflet()
# Calculate daily means for the Methow Valley from monitors in Twisp and Winthrop
TwispID <- "99a6ee8e126ff8cf_530470009_04"
WinthropID <- "123035bbdc2bc702_530470010_04"
# Recipe to calculate Methow Valley August Means:
Methow_Valley_AugustMeans <-
  # 1) start with NW Megafires
  NW_Megafires %>%
  # 2) select monitors from Twisp and Winthrop
  monitor_select(c(TwispID, WinthropID)) %>%
  # 3) average them together hour-by-hour
  monitor_collapse(deviceID = 'MethowValley') %>%
  # 4) restrict data to August
  monitor_filterDate(20150801, 20150901) %>%
  # 5) calculate daily mean
  monitor_dailyStatistic(mean, minHours = 18) %>%
  # 6) round data to one decimal place
  monitor_mutate(round, 1)
# Look at the first week
Methow_Valley_AugustMeans$data[1:7,]
## datetime c2de3yc0jc_MethowValley
## 1 2015-08-01 20.3
## 2 2015-08-02 30.7
## 3 2015-08-03 12.1
## 4 2015-08-04 9.0
## 5 2015-08-05 3.7
## 6 2015-08-06 3.2
## 7 2015-08-07 11.0
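The arithmetic behind steps 3 and 5 of the recipe can be approximated in base R. The sketch below uses invented hourly values for two hypothetical monitors and groups hours by UTC date for simplicity. It is not the package's implementation, which may differ in details such as how days are defined and how missing values are handled.

```r
# Two days of invented hourly values for two hypothetical monitors
datetime <- seq(
  as.POSIXct("2015-08-01 00:00:00", tz = "UTC"),
  by = "hour",
  length.out = 48
)
a <- rep(c(10, 20), each = 24)
b <- rep(c(30, 40), each = 24)

# Ten hours on the second day are missing at both monitors
a[25:34] <- NA
b[25:34] <- NA

# Hour-by-hour average across monitors (NaN where both are missing)
hourly <- rowMeans(cbind(a, b), na.rm = TRUE)

# Daily mean, reported only when at least minHours valid hours exist
minHours <- 18
day <- format(datetime, "%Y-%m-%d", tz = "UTC")
daily <- tapply(hourly, day, function(x) {
  if (sum(!is.na(x)) >= minHours) mean(x, na.rm = TRUE) else NA_real_
})
print(round(daily, 1))
```

The first day has 24 valid hours and yields a mean of 20, while the second day has only 14 valid hours and is reported as NA.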
Example 3: Advanced manipulation of mts_monitor objects
The following code demonstrates user creation of a custom function to manipulate the data tibble from an mts_monitor object with monitor_mutate().
# Monitors within 100 km of Spokane, WA
Spokane <-
  NW_Megafires %>%
  monitor_filterByDistance(-117.42, 47.70, 100000) %>%
  monitor_filterDate(20150801, 20150901) %>%
  monitor_dropEmpty()
# Show the daily statistic for one week
Spokane %>%
  monitor_filterDate(20150801, 20150808) %>%
  monitor_dailyStatistic(mean) %>%
  monitor_getData()
## # A tibble: 7 × 11
## datetime `70de0a70970655a0_530630047_04` a79f97f86cb2a7d7_1605500…¹
## <dttm> <dbl> <dbl>
## 1 2015-08-01 00:00:00 18.2 9.83
## 2 2015-08-02 00:00:00 47.1 31.4
## 3 2015-08-03 00:00:00 37.1 33.7
## 4 2015-08-04 00:00:00 7.31 9.70
## 5 2015-08-05 00:00:00 5.82 9.25
## 6 2015-08-06 00:00:00 3.74 7.46
## 7 2015-08-07 00:00:00 4.50 5.79
## # ℹ abbreviated name: ¹a79f97f86cb2a7d7_160550003_03
## # ℹ 8 more variables: `7216002af320d683_530650002_04` <dbl>,
## # `8c0517d4b648fe54_530750006_04` <dbl>,
## # `345833eaf05eac18_160090011_03` <dbl>, `9b8e3d84ace997b6_wrcc.e925` <lgl>,
## # `7eb0c7f361adfacb_160090010_03` <dbl>, b31e89974a3db049_160790017_04 <dbl>,
## # e7dee084705d75eb_160170003_03 <dbl>, c891e750c3fc35ef_530010003_04 <dbl>
# Custom function to convert from metric ug/m3 to imperial grain/gallon
my_FUN <- function(x) { return( x * 15.43236 / 0.004546 ) }
Spokane %>%
  monitor_filterDate(20150801, 20150808) %>%
  monitor_mutate(my_FUN) %>%
  monitor_dailyStatistic(mean) %>%
  monitor_getData()
## # A tibble: 7 × 11
## datetime `70de0a70970655a0_530630047_04` a79f97f86cb2a7d7_1605500…¹
## <dttm> <dbl> <dbl>
## 1 2015-08-01 00:00:00 61812. 33381.
## 2 2015-08-02 00:00:00 159990. 106509.
## 3 2015-08-03 00:00:00 125944. 114289.
## 4 2015-08-04 00:00:00 24824. 32914.
## 5 2015-08-05 00:00:00 19746. 31401.
## 6 2015-08-06 00:00:00 12688. 25319.
## 7 2015-08-07 00:00:00 15262. 19661.
## # ℹ abbreviated name: ¹a79f97f86cb2a7d7_160550003_03
## # ℹ 8 more variables: `7216002af320d683_530650002_04` <dbl>,
## # `8c0517d4b648fe54_530750006_04` <dbl>,
## # `345833eaf05eac18_160090011_03` <dbl>, `9b8e3d84ace997b6_wrcc.e925` <lgl>,
## # `7eb0c7f361adfacb_160090010_03` <dbl>, b31e89974a3db049_160790017_04 <dbl>,
## # e7dee084705d75eb_160170003_03 <dbl>, c891e750c3fc35ef_530010003_04 <dbl>
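Conceptually, mutating the measurements means applying a function to every column of data except datetime. The base R sketch below shows that idea on an invented dataframe with a hypothetical doubling function. It is an assumption about the general approach, not a copy of how monitor_mutate() is written.

```r
# Invented 'data' dataframe with a datetime column and two measurement columns
data <- data.frame(
  datetime = as.POSIXct("2015-08-01 00:00:00", tz = "UTC") + 3600 * 0:2,
  loc1_devA = c(12.0, 15.5, NA),
  loc2_devB = c(8.2, 9.1, 10.4)
)

# Hypothetical custom function applied to each measurement column
double_it <- function(x) x * 2

# Apply the function to every column except the first ('datetime')
mutated <- data
mutated[-1] <- lapply(data[-1], double_it)

print(mutated)
```

The time axis and column names are untouched, so the result still fits the data model.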
Understanding that monitor$data is just a dataframe of measurements prepended with a datetime column, we can pull out the measurements and do analyses independent of the mts_monitor data model. Here we look for correlations among the PM2.5 time series.
# Pull out the time series data to calculate correlations
Spokane_data <-
  Spokane %>%
  monitor_getData() %>%
  dplyr::select(-1) # omit 'datetime' column
# Provide human readable names
names(Spokane_data) <- Spokane$meta$locationName
# Find correlation among monitors
cor(Spokane_data, use = "complete.obs")
## Spokane - Monroe St
## Spokane - Monroe St 1.0000000
## Coeur D'alene - Lancaster Rd. 0.6069221
## Spokane - Wellpinit Ford Rd (Spokane Tribe) 0.5697110
## Rosalia - Josephine St 0.5230965
## us.16_345833 0.5377632
## Mobile_White_Salmon 0.4789670
## St. Maries 0.5269798
## Pinehurst 0.3571274
## Sandpoint 0.6250559
## Ritzville - Alder St 0.2926409
## Coeur D'alene - Lancaster Rd.
## Spokane - Monroe St 0.6069221
## Coeur D'alene - Lancaster Rd. 1.0000000
## Spokane - Wellpinit Ford Rd (Spokane Tribe) 0.7166013
## Rosalia - Josephine St 0.6398958
## us.16_345833 0.6977911
## Mobile_White_Salmon 0.5974232
## St. Maries 0.8067682
## Pinehurst 0.7513249
## Sandpoint 0.7680706
## Ritzville - Alder St 0.3193411
## Spokane - Wellpinit Ford Rd (Spokane Tribe)
## Spokane - Monroe St 0.5697110
## Coeur D'alene - Lancaster Rd. 0.7166013
## Spokane - Wellpinit Ford Rd (Spokane Tribe) 1.0000000
## Rosalia - Josephine St 0.5669491
## us.16_345833 0.6838587
## Mobile_White_Salmon 0.5944653
## St. Maries 0.6516175
## Pinehurst 0.5916583
## Sandpoint 0.7859552
## Ritzville - Alder St 0.3089826
## Rosalia - Josephine St
## Spokane - Monroe St 0.5230965
## Coeur D'alene - Lancaster Rd. 0.6398958
## Spokane - Wellpinit Ford Rd (Spokane Tribe) 0.5669491
## Rosalia - Josephine St 1.0000000
## us.16_345833 0.8296218
## Mobile_White_Salmon 0.2921147
## St. Maries 0.6990201
## Pinehurst 0.5092533
## Sandpoint 0.5076184
## Ritzville - Alder St 0.7953586
## us.16_345833 Mobile_White_Salmon
## Spokane - Monroe St 0.5377632 0.4789670
## Coeur D'alene - Lancaster Rd. 0.6977911 0.5974232
## Spokane - Wellpinit Ford Rd (Spokane Tribe) 0.6838587 0.5944653
## Rosalia - Josephine St 0.8296218 0.2921147
## us.16_345833 1.0000000 0.4046114
## Mobile_White_Salmon 0.4046114 1.0000000
## St. Maries 0.7906070 0.5136138
## Pinehurst 0.7222093 0.5176399
## Sandpoint 0.5780285 0.7119107
## Ritzville - Alder St 0.5874713 0.1093694
## St. Maries Pinehurst Sandpoint
## Spokane - Monroe St 0.5269798 0.3571274 0.6250559
## Coeur D'alene - Lancaster Rd. 0.8067682 0.7513249 0.7680706
## Spokane - Wellpinit Ford Rd (Spokane Tribe) 0.6516175 0.5916583 0.7859552
## Rosalia - Josephine St 0.6990201 0.5092533 0.5076184
## us.16_345833 0.7906070 0.7222093 0.5780285
## Mobile_White_Salmon 0.5136138 0.5176399 0.7119107
## St. Maries 1.0000000 0.7679051 0.6232813
## Pinehurst 0.7679051 1.0000000 0.5626732
## Sandpoint 0.6232813 0.5626732 1.0000000
## Ritzville - Alder St 0.3896507 0.2514426 0.2851257
## Ritzville - Alder St
## Spokane - Monroe St 0.2926409
## Coeur D'alene - Lancaster Rd. 0.3193411
## Spokane - Wellpinit Ford Rd (Spokane Tribe) 0.3089826
## Rosalia - Josephine St 0.7953586
## us.16_345833 0.5874713
## Mobile_White_Salmon 0.1093694
## St. Maries 0.3896507
## Pinehurst 0.2514426
## Sandpoint 0.2851257
## Ritzville - Alder St 1.0000000
This introduction to the mts_monitor data model should be enough to get you started. Many more examples are available in the package documentation.
Best of luck exploring and understanding PM2.5 air quality data!