Welcome to the neonSoilFlux
package! This vignette will
guide you through the process of using this package to acquire and
compute soil CO\(_{2}\) fluxes at
different sites in the National Ecological Observatory Network.
You can think about this package as working in two primary phases: first, acquiring the environmental data (acquire_neon_data); second, computing the fluxes from those data (compute_neon_flux).
We split the workflow into these two functions to save time and because the two are fundamentally different processes. Acquiring the NEON data makes use of the neonUtilities package.
This package takes the guesswork out of which data products to collect, with the aim of shortening the workflow. We rely heavily on the tidyverse philosophy for computation and coding here.
Load up the relevant libraries:
Let’s say we want to acquire the NEON soil data at the SJER site during June 2021:
The output out_env_data from acquire_neon_data is a list of lists:
site_data, a nested data frame containing the measurements required by the flux gradient model during the given time period.
site_megapit, a nested data frame containing specific information about soils at the site (for bulk density calculations, etc.).
Two required inputs are needed to run the function acquire_neon_data:
time_frequency, which is either 30 minutes (the default) or 1 minute (currently untested).
provisional, which sets whether we download provisional NEON data.
As the data are acquired, various messages from the loadByProduct function in the neonUtilities package are shown; this is normal. Products are acquired for each spatial location (horizontalPosition) and vertical depth (verticalPosition) at a NEON site.
The outputs of acquire_neon_data are two nested data frames:
site_data: This contains three variables:
  the measurement name (one of soilCO2concentration, VSWC (soil water content), soilTemp (soil temperature), and staPres (atmospheric pressure)).
  monthly_mean, which contains the mean value of the measurement at each horizontal position and vertical depth. We compute the monthly mean using a bootstrapped technique.
  data, which contains the stacked variables acquired from neonUtilities: the horizontal and vertical positions, timestamp (in UTC), associated values, and the QF flag (0 = pass, 1 = fail, LINK).
site_megapit: the nested data frame of the soil sampling data, found here (LINK). This data table is essentially what is reported back from acquiring the data product from NEON.
For each data product, the acquire_neon_data function also performs two additional checks:
swc_correct: a correction applied through this function. Information regarding this correction is found here: LINK. Once updated sensors are installed in the future we will deprecate this function.
The monthly mean, which is utilized when a given measurement fails final QF checks. This function is provided by code from Zoey Werbin. For a given location (horizontalPosition) and depth, a monthly mean is computed when there are at least 15 days of measurements.
Assume you have a vector of measurements \(\vec{y}\), standard errors \(\vec{\sigma}\), and expanded uncertainties \(\vec{\epsilon}\) (all of length \(M\)) that pass the QF checks in a given month. The expanded uncertainty \(\vec{\epsilon}\) is generated by NEON to include the 95% confidence interval. We have that \(\vec{\sigma}_{i}\leq\vec{\epsilon}_{i}\).
We define the bias \(\vec{b}=\sqrt{\left(\vec{\epsilon}\right)^{2}-\left(\vec{\sigma}\right)^{2}}\) to be the quadrature difference between the expanded uncertainty and the standard error.
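The bias definition is applied element by element, which can be checked with a few lines of base R. The numbers below are made up for illustration and are not NEON data; the variable names are placeholders.

```r
# Toy standard errors and expanded uncertainties for M = 4 measurements
# (illustrative values only, not taken from NEON).
sigma   <- c(0.10, 0.12, 0.09, 0.11)
epsilon <- c(0.25, 0.30, 0.20, 0.22)

# The definition requires sigma_i <= epsilon_i for every entry.
stopifnot(all(sigma <= epsilon))

# Bias as the element-wise quadrature difference.
bias <- sqrt(epsilon^2 - sigma^2)
print(bias)
```

Because the difference is taken in quadrature, each bias entry is never larger than the corresponding expanded uncertainty.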
We generate a bootstrap sample of the mean \(\overline{y}\) and standard error \(\overline{s}\) in the following way. Here we set the number of bootstrap samples \(N\) to be 5000. Entries for \(\overline{y}_{i}\) and \(\overline{s}_{i}\) are determined by the following:
R will recycle the vector \(\vec{y}\) so that this sample is of length \(M\). We will call the sample of \(\vec{y}\) \(\vec{x}\).
Once that is complete, the reported monthly mean and standard deviation are \(\overline{\overline{y}}\) and \(\overline{s}\).
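To make the idea concrete, here is a minimal base R sketch of a bootstrapped monthly mean. It is not the package's implementation: the helper name bootstrap_monthly_mean is hypothetical, and the way the noise and the bias enter each replicate (normal noise with standard deviation \(\vec{\sigma}\), plus a uniform offset bounded by \(\vec{b}\)) is an assumption made only for illustration. The input values are made up.

```r
set.seed(2021)  # make the sketch reproducible

# Hypothetical helper for illustration; NOT a neonSoilFlux function.
bootstrap_monthly_mean <- function(y, sigma, epsilon, n_boot = 5000) {
  m <- length(y)
  stopifnot(m >= 2, length(sigma) == m, length(epsilon) == m,
            all(sigma <= epsilon))

  # Bias as the quadrature difference between expanded uncertainty
  # and standard error.
  bias <- sqrt(epsilon^2 - sigma^2)

  y_bar <- numeric(n_boot)  # mean of each bootstrap replicate
  s_bar <- numeric(n_boot)  # standard deviation of each replicate
  for (i in seq_len(n_boot)) {
    # Resample M measurement indices with replacement.
    idx <- sample.int(m, size = m, replace = TRUE)
    # ASSUMPTION: random error is normal with sd sigma, and the bias
    # enters as a uniform offset in [-bias, bias].
    x <- y[idx] + rnorm(m, mean = 0, sd = sigma[idx]) +
      runif(m, min = -bias[idx], max = bias[idx])
    y_bar[i] <- mean(x)
    s_bar[i] <- sd(x)
  }

  # Report the average of the replicate means and standard deviations.
  list(mean = mean(y_bar), sd = mean(s_bar))
}

# Toy measurements (illustrative values only).
y       <- c(410, 425, 398, 440, 415, 420)
sigma   <- rep(2, 6)
epsilon <- rep(5, 6)
monthly <- bootstrap_monthly_mean(y, sigma, epsilon)
print(monthly)
```

With these toy inputs the bootstrapped mean lands close to the plain mean of y, while the reported spread reflects both the scatter of the measurements and their stated uncertainties.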
With the resulting output from acquire_neon_data, you can then unnest the different data frames to make plots, for example:
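The unnesting step can be tried on a small stand-in for site_data. The sketch below builds a toy nested tibble rather than downloading anything; the column names measurement and value, and all the numbers, are assumptions for illustration and may differ from what acquire_neon_data actually returns.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Toy stand-in for site_data: one row per measurement, with the
# observations stored in a nested `data` column.
# Column names and values are assumptions, not real NEON output.
toy_site_data <- tibble(
  measurement = c("soilTemp", "VSWC"),
  data = list(
    tibble(horizontalPosition = c("001", "001"),
           verticalPosition   = c("501", "502"),
           value              = c(21.3, 19.8)),
    tibble(horizontalPosition = c("001", "001"),
           verticalPosition   = c("501", "502"),
           value              = c(0.12, 0.15))
  )
)

# Keep one measurement and expand its nested data frame into rows.
soil_temp <- toy_site_data %>%
  filter(measurement == "soilTemp") %>%
  unnest(cols = c(data))

print(soil_temp)
```

The unnested result is an ordinary data frame, so it can be passed directly to a plotting function.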
Once we have out_env_data from acquire_neon_data, we then compute the fluxes at this site:
# Compute fluxes from the two nested data frames returned by acquire_neon_data
out_fluxes <- compute_neon_flux(
  input_site_env = out_env_data$site_data,
  input_site_megapit = out_env_data$site_megapit
)
The resulting data frame out_fluxes has the following variables:
startDateTime: Time period of measurement (as POSIXct).
horizontalPosition: Sensor location where the flux is computed.
flux_compute: A nested tibble with variables flux, flux_err, and method (one of 4 implemented).
diffusivity: Computation of surface diffusivity.
VSWCMeanQF: QF flag for soil water content across all vertical depths at the given horizontal position: 0 = no issues, 1 = monthly mean used in measurement, 2 = QF fail.
soilTempMeanQF: QF flag for soil temperature across all vertical depths at the given horizontal position: 0 = no issues, 1 = monthly mean used in measurement, 2 = QF fail.
soilCO2concentrationMeanQF: QF flag for soil CO\(_{2}\) concentration across all vertical depths at the given horizontal position: 0 = no issues, 1 = monthly mean used in measurement, 2 = QF fail.
staPresMeanQF: QF flag for atmospheric pressure across all vertical depths at the given horizontal position: 0 = no issues, 1 = monthly mean used in measurement, 2 = QF fail.
A QF check fails when a monthly mean could not be computed for a measurement. Note that this causes all flux calculations to fail at that horizontal position.
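One way to work with the four QF columns is to reduce them to a single worst-case status per row. The base R sketch below uses a made-up data frame that borrows only the QF column names and flag meanings described above; it is an illustration, not package functionality.

```r
# Toy data frame with the QF columns described above
# (flag values are made up for illustration).
toy_fluxes <- data.frame(
  horizontalPosition         = c("001", "002", "003"),
  VSWCMeanQF                 = c(0, 1, 0),
  soilTempMeanQF             = c(0, 0, 2),
  soilCO2concentrationMeanQF = c(0, 0, 0),
  staPresMeanQF              = c(0, 0, 0)
)

qf_cols <- c("VSWCMeanQF", "soilTempMeanQF",
             "soilCO2concentrationMeanQF", "staPresMeanQF")

# Worst (largest) flag across the four measurements in each row.
worst_qf <- apply(toy_fluxes[qf_cols], 1, max)

# Translate the numeric flag into the meanings given in the text.
toy_fluxes$status <- factor(
  worst_qf,
  levels = 0:2,
  labels = c("no issues", "monthly mean used", "QF fail")
)

# Rows where no measurement failed outright.
usable <- toy_fluxes[worst_qf < 2, ]
print(toy_fluxes[c("horizontalPosition", "status")])
```

Taking the maximum flag mirrors the note that a single failed measurement makes the flux calculation fail at that horizontal position.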
You can see the distribution of the QF flags for each environmental measurement with env_fingerprint_plot:
Similarly, you can see the distribution of QF flags for each diffusivity and flux computation with flux_fingerprint_plot: