%\VignetteIndexEntry{An Introduction to MBHdesign}
%\VignettePackage{MBHdesign}
%\VignetteEngine{knitr::knitr}

\documentclass[article,shortnames,nojss]{jss} %% almost as usual
\author{Scott D. Foster\\Data61, CSIRO, Hobart, Tasmania, Australia}
\title{An Introduction to \pkg{MBHdesign}}
\date{\itshape\today}

%% for pretty printing and a nice hypersummary also set:
\Plainauthor{Scott D. Foster} %% comma-separated
\Plaintitle{An Introduction/Tutorial to MBHdesign} %% without formatting
\Shorttitle{MBHdesign} %% a short title (if necessary)

\Abstract{
A robust scientific conclusion is the result of a rigorous scientific process. In observational ecology, this process involves making inferences about a population from a sample. The sample is crucial, and is the result of implementing a \textit{survey design}. A good survey design ensures that the data from the survey are capable of answering the research question. Better designs, such as spatially balanced designs, will also be as precise as possible given the constraints of the budget.
The \pkg{MBHdesign} package is useful for creating spatially balanced designs. There are four tasks that it is intended to address: 1) designing spatially-balanced surveys using Balanced Adaptive Sampling \citep[BAS;][]{rob13,rob17b}; 2) designing and analysing spatially-balanced surveys that incorporate existing legacy sites, using the methods of \citet{fos17a}; 3) designing spatially-balanced transect-based surveys using the methods described in \citet{fos20}; and 4) designing spatially-balanced cluster designs using the methods of \citet{fos23}.
The first example in this tutorial generates a point-based design and shows how legacy sites can be incorporated. There are three steps to this process:
\begin{enumerate}
\item Altering inclusion probabilities for spatial balance, taking into account the location of legacy sites. This is done using the function \texttt{alterInclProbs};
\item Generating spatially balanced designs for a given set of inclusion probabilities, through the function \texttt{quasiSamp}; and
\item Analysing some (made up) data using model-based methods (using \texttt{modEsti}).
\end{enumerate}
The second example in this tutorial generates a transect-based design over the same inclusion probabilities. This consists of just one substantive step: calling the \texttt{transectSamp} function. The third example in this tutorial generates a clustered sample, where groups of sites are chosen to be spatially similar but these clusters are spatially segregated. This process relies on use of the \texttt{quasiSamp.cluster} function.
}
\Keywords{Spatially-Balanced Survey Design, Balanced Adaptive Sampling, Transect, Cluster, \proglang{R}}
\Plainkeywords{Spatially-Balanced Survey Design, Balanced Adaptive Sampling, Transect, Cluster, R} %% without formatting
%% at least one keyword must be supplied

\Address{
Scott D.
Foster\\
CSIRO\\
Marine Laboratories\\
GPO Box 1538\\
Hobart 7001\\
Australia\\
E-mail: \email{scott.foster@csiro.au}
}

%\usepackage{fullpage}
\usepackage{natbib}
%\usepackage{url}
\usepackage{hyperref}
%\usepackage{nicefrac}
\renewcommand{\thefootnote}{\fnsymbol{footnote}}

\begin{document}

%<>=
<>=
library( knitr)
opts_chunk$set(cache=TRUE, message = FALSE, comment = "", dev="pdf", dpi=300,
               fig.show = "hold", fig.align = "center")
@

%\begin{figure}[b]
% \centering
% \includegraphics[keepaspectratio=true, width=0.3\textwidth]{MB_Hub_MED.jpg}
%\end{figure}

\section*{First Things First}

Before starting with this introduction to \texttt{MBHdesign}, we need to make sure that everything is set up properly. Much of this will vary from computer to computer, but you must have a working version of R installed (preferably a recent release). This vignette was created using \Sexpr{sessionInfo()$R.version$version.string}. It does not matter whether you prefer to use R through a development environment (such as RStudio) or through the command line -- the results will be the same.

There are two ways to install \texttt{MBHdesign}. The first is through the \texttt{CRAN} repository. This version will not be updated very frequently, so it will not have the very latest developments (nor the associated bugs). This version can be installed by starting R and then submitting (at the command line):
<>=
install.packages( "MBHdesign")
## or
## devtools::install_github( repo="Scott-Foster/MBHdesign", build_vignettes=FALSE)
@
\noindent If you install from CRAN, you will be asked which repository you want to use. Just use one that is geographically close to where you are (or where your computer is). Installing from the GitHub repository will give you the most up-to-date version. However, do take note of the \texttt{build\_vignettes=FALSE} argument: the vignette can take a while to build (and has already been pre-built).

Whichever way the package is installed, you'll have to load it:
<>=
library( MBHdesign)
@
\noindent For illustration it is also good to fix the random number seed, so that this document is reproducible \textit{exactly}. However, due to R evolving, the results may differ between different versions of R. In particular, R/3.6 could be quite different from R/3.5.
<>=
set.seed( 747)  #a 747 is a big plane
@
\noindent A technical note about the version on CRAN: CRAN has some strict rules for packages that it hosts. One of these is that the package must not take long to build. This vignette breaks this rule... As a work-around, the version of this vignette in the CRAN package uses precompiled data (with the usePrecompiledData variable set to TRUE). However, on GitHub the package does not use precompiled data. You'll see locations in the code where the running of certain functions is conditional on this variable. Please don't be confused by it.
<>=
#TRUE if you want some of the examples shortened by using pre-saved output
usePrecompiledData <- FALSE
@
\noindent Now, we are good to go with the rest of the introduction.

\section*{A Point-Based Design}

Let's pretend that we want to generate $n=10$ samples on a grid of points (representing the centres of a tessellation). The grid of points consists of $N=100\times100=10000$ points in 2-dimensional space (spanning the interval $[0,1]$ in both dimensions). Let's also pretend that there are 3 legacy sites, that have been sampled in previous survey efforts, and we wish to revisit them in the current survey. The legacy sites are located at random throughout the study area.
Here, I have generated it all in R (painstakingly), but in a real application, most of this information could be read in from file.
<>=
#number of samples
n <- 10
#number of points to sample from
N <- 100^2
#the sampling grid (offset so that the edge locations have same area)
offsetX <- 1/(2*sqrt( N))
my.seq <- seq( from=offsetX, to=1-offsetX, length=sqrt(N))
X <- expand.grid( my.seq, my.seq)
#the legacy sites (three of them)
legacySites <- matrix( runif( 6), ncol=2, byrow=TRUE)
#names can be useful
colnames( X) <- colnames( legacySites) <- c("X1","X2")
@

\subsection*{Inclusion Probabilities}

Key to this whole design process is the concept of inclusion probabilities. Inclusion probabilities define the chance that any particular site will be part of the sample. So, if a site's inclusion probability is small, then the site is unlikely to be included in the sample.

Specifying inclusion probabilities can improve the efficiency of the sampling design. That is, standard errors can be reduced for a given number of samples. The `trick' is to specify inclusion probabilities so that the sites that should have highly variable observations are sampled more often \citep[e.g.][]{gra13}. In ecology, variance often increases with abundance \citep[due to Taylor's Power Law;][]{tay61}, so inclusion probabilities could be increased with abundance. If there is no knowledge about the area being sampled, then all sites should be given equal inclusion probabilities. Frequently, a formal requirement is to constrain the inclusion probabilities so that they sum to $n$. While \texttt{MBHdesign} does not require this constraint, it can be useful for communication purposes.

Here, we are going to pretend that there is some gradient in the variance of the population under study. We stress that this is illustrative only.
<>=
#non-uniform inclusion probabilities
inclProbs <- 1-exp(-X[,1])
#scaling to enforce summation to n
inclProbs <- n * inclProbs / sum( inclProbs)
#uniform inclusion probabilities would be
#inclProbs <- rep( n/N, times=N)
#visualise
image( x=unique( X[,1]), y=unique( X[,2]),
       z=matrix( inclProbs, nrow=sqrt(nrow(X)), ncol=sqrt(nrow( X))),
       main="(Unadjusted) Inclusion Probabilities",
       ylab=colnames( X)[2], xlab=colnames( X)[1])
#The legacy locations
points( legacySites, pch=21, bg=grey(0.75), cex=1.5)
@

\subsection*{Accommodating Legacy Sites}

To generate a design that is spatially balanced \textit{in both the $n$ new sample sites and the legacy sites}, we adjust the inclusion probabilities. The adjustment \citep[see][]{fos17a} reduces the inclusion probabilities so that sites near legacy sites are less likely to be chosen in the new sample.
<>=
#alter inclusion probabilities
# so that new samples should be well-spaced from legacy
altInclProbs <- alterInclProbs( legacy.sites=legacySites, potential.sites=X,
                                inclusion.probs = inclProbs)
#visualise
image( x=unique( X[,1]), y=unique( X[,2]),
       z=matrix( altInclProbs, nrow=sqrt(nrow(X)), ncol=sqrt(nrow( X))),
       main="Adjusted Inclusion Probabilities",
       ylab=colnames( X)[2], xlab=colnames( X)[1])
#The legacy locations
points( legacySites, pch=21, bg=grey(0.75), cex=1.5)
@
\noindent So, the inclusion probabilities have been reduced around the legacy sites. It is perhaps worth noting that the reduction in inclusion probabilities, due to the legacy sites, can be viewed as \textit{sequential}. This means that the reduction for any legacy site is in addition to the reduction for all of the other legacy sites -- there is no extra joint effect.
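As an informal (and purely optional) numerical summary of this reduction, we can look at the ratio of the adjusted to the unadjusted inclusion probabilities; the smallest ratios should occur at the grid points closest to the legacy sites. This sketch uses only the objects created above.
<>=
#optional check: ratio of adjusted to unadjusted inclusion probabilities
# (the smallest ratios should correspond to grid points near the legacy sites)
summary( altInclProbs / inclProbs)
@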
The adjustment is also proportional to the original inclusion probability, so that small and large inclusion probabilities are both reduced by the same proportion. There are some other arguments to the \texttt{alterInclProbs()} function (omitted here for clarity). These refine the call and/or speed up the computation. Type \texttt{?alterInclProbs} for more details.

\subsection*{Generating the Design}

Irrespective of how the inclusion probabilities were obtained, we can now use them to generate a spatially balanced design.
<>=
#generate the design according to the altered inclusion probabilities.
samp <- quasiSamp( n=n, dimension=2,
                   study.area=matrix( c(0,0, 0,1, 1,1, 1,0),ncol=2, byrow=TRUE),
                   potential.sites=X, inclusion.probs=altInclProbs)
#for faster sampling (large problems),
# consider using quasiSamp.raster, which utilises SpatRaster formats
#visualise
image( x=unique( X[,1]), y=unique( X[,2]),
       z=matrix( altInclProbs, nrow=sqrt(nrow(X)), ncol=sqrt(nrow( X))),
       main="Adjusted Inclusion Probabilities",
       ylab=colnames( X)[2], xlab=colnames( X)[1])
#The legacy locations
points( legacySites, pch=21, bg=grey(0.75), cex=1.5)
points( samp[,1:2], pch=21)
@
\noindent Voil\`{a}! A spatially balanced design that incorporates legacy sites. It is contained in the object \texttt{samp}, which looks like:
<>=
print( samp, row.names=FALSE)
@
\noindent The columns of \texttt{samp} are:
\begin{itemize}
\item The sample locations in the \texttt{X1} and \texttt{X2} dimensions;
\item The inclusion probability for that sampling location; and
\item The row number (ID) of the original list of potential sites (X).
\end{itemize}

\subsection*{Analysis}

After finalising the design, it is time to undertake the survey. For illustration, we do this \textit{in silico} and generate observations according to a pre-defined function \citep[following][amongst others]{fos17a}.
<>=
#generate some `observations' for the new sites
Z <- 3*( X[samp$ID,1]+X[samp$ID,2]) +
        sin( 6*( X[samp$ID,1]+X[samp$ID,2]))
#and some for the legacy sites
Zlegacy <- 3*( legacySites[,1]+legacySites[,2]) +
        sin( 6*( legacySites[,1]+legacySites[,2]))
@
\noindent These data can be analysed in two ways: 1) design-based, which uses minimal assumptions about the data; and 2) model-based, which attempts to describe more aspects of the data. See \citet{fos17a} for a more complete description. For design-based analysis we take a weighted average of the estimator for the legacy sites and the estimator for the new sites. In both cases the estimates follow \citet{hor52}; for the new sites, for example, the estimator is $\hat{\mu}_{\textrm{new}} = \frac{1}{N}\sum_{i} z_i/\pi_i$, where the sum is over the sampled sites and $z_i$ and $\pi_i$ are the observation and inclusion probability at site $i$. Please do read the section in \citet{fos17a} for comments on estimation; it could save you some grief.
<>=
#the proportion of legacy sites in the whole sample
fracLegacy <- nrow( legacySites) / (n+nrow( legacySites))
#inclusion probabilities for legacy sites
# (these are just made up, from uniform)
LegInclProbs <- rep( nrow( legacySites) / N, nrow( legacySites))
#estimator based on legacy sites only
legacyHT <- (1/N) * sum( Zlegacy / LegInclProbs)
#estimator based on new sites only
newHT <- (1/N) * sum( Z / inclProbs[samp$ID])
mean.estimator <- fracLegacy * legacyHT + (1-fracLegacy) * newHT
#print the mean
print( mean.estimator)
@
\noindent This is pretty close to the true value of $2.9994$.
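As an aside, the `true' value quoted above can be approximated by averaging the response function over the full grid of potential sites. This is purely a sanity check (it uses only the grid \texttt{X} defined earlier) and is not part of the design or analysis workflow.
<>=
#optional sanity check: average the response function over all N grid points
mean( 3*( X[,1]+X[,2]) + sin( 6*( X[,1]+X[,2])))
@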
To get a standard error for this estimate, we use the \texttt{cont\_analysis()} function from the \texttt{spsurvey} package\footnote{In versions of \texttt{spsurvey} prior to version 5, this was achieved through \texttt{total.est()}.} \citep{dum21}, which implements the neighbourhood estimator of \citet{ste03}.
<>=
#load the spsurvey package
library( spsurvey)
#rescale the inclusion probs
# (the sample frames are the same in legacy and new sites)
tmpInclProbs <- ( c( inclProbs[samp$ID], LegInclProbs) / n) * (n+nrow(legacySites))
#create a temporary data frame
tmpDat <- data.frame( siteID= c( samp$ID, paste0( "legacy", 1:nrow(legacySites))),
                      wgt=1/tmpInclProbs,
                      xcoord=c(samp$X1,legacySites[,1]),
                      ycoord=c(samp$X2,legacySites[,2]),
                      Z=c(Z,Zlegacy))
#calculate the standard error
se.estimator <- cont_analysis( tmpDat, vars="Z",
                               weight="wgt",
                               xcoord="xcoord", ycoord="ycoord")$Mean$StdError
#print it
print( se.estimator)
@
For model-based mean and standard errors we follow the `GAMdist' approach in \citet{fos17a}.
<>=
tmp <- modEsti( y=c( Z, Zlegacy), locations=rbind( X[samp$ID,], legacySites),
                includeLegacyLocation=TRUE, legacyIDs=n + 1:nrow( legacySites),
                predPts=X, control=list(B=1000))
print( tmp)
@
In this case, the standard error estimates are quite different. On average, they do tend to differ when there are only a few legacy sites, but even so, this level of difference is unusual.

\subsection*{Last Things Last}

The only remaining thing to do is to tidy up our workspace. First, to export our sample locations. Second, to remove all objects for this analysis from your workspace.
<>=
##write csv if wanted
#write.csv( samp, file="pointSample1.csv", row.names=FALSE)
#tidy
rm( list=c( "samp", "tmp", "se.estimator","tmpDat","tmpInclProbs",
            "mean.estimator","newHT","legacyHT","LegInclProbs",
            "fracLegacy","Zlegacy","Z","X","altInclProbs",
            "n", "N", "my.seq","inclProbs", "offsetX", "legacySites"))
@

\newpage

\section*{Transect-Based Surveys}

Let's now pretend that we want to generate $n=10$ \textbf{transect} locations over a grid of points (representing the centres of a tessellation). We will use the methods described in \citet{fos20}. The transects are linear and are each of length 0.15, and the grid of points consists of $N=100\times100=10000$ points in 2-dimensional space (spanning the interval $[0,1]$ in both dimensions). For this example, we generate all information about the survey area and inclusion probabilities, but in a real application most of this information is likely to come from file (e.g.\ a shapefile).

A small word of warning: this example will take a little while to run. Sorry. Whilst choices have been made to reduce the computation, at the expense of accuracy, the computation is still quite intensive. Another note of warning: while the random number seed is set, different versions of R may produce different results. This is due to R evolving. In particular, R/3.6 could be quite different from R/3.5.
<>=
set.seed( 747)  #I'm currently on a 787, so it *almost* seems appropriate
#number of transects
n <- 10
#number of points to sample from
N <- 100^2
#the sampling grid (offset so that the edge locations have same area)
offsetX <- 1/(2*sqrt( N))
my.seq <- seq( from=offsetX, to=1-offsetX, length=sqrt(N))
X <- expand.grid( my.seq, my.seq)
colnames( X) <- c("X1","X2")
@

\subsection*{Inclusion Probabilities}

Here, as in the previous example, we are going to pretend that there is some gradient in the variance of the population under study. We stress that this is illustrative only.
The transect probabilities are set up to reflect this.
<>=
#non-uniform inclusion probabilities
inclProbs <- 1-exp(-X[,1])
#scaling to enforce summation to n
inclProbs <- n * inclProbs / sum( inclProbs)
#uniform inclusion probabilities would be
#inclProbs <- rep( n/N, times=N)
#visualise
image( x=unique( X[,1]), y=unique( X[,2]),
       z=matrix( inclProbs, nrow=sqrt(nrow(X)), ncol=sqrt(nrow( X))),
       main="(Unadjusted) Inclusion Probabilities",
       ylab=colnames( X)[2], xlab=colnames( X)[1])
@
We will need to tell the transect sampling function some information about the size and shape of the transects. For this purpose, the transects should be thought of as a set of points (arranged on a line for a linear transect). For the particular case of linear transects the sampling function \texttt{transectSamp} arranges this for us, but we still need to give some information. The choices made here favour speed of execution over accuracy. In a real (and final) design, you may want to consider finer resolutions. These are the only control options needed in this example, but we note that there are others, which mostly control different aspects of the algorithm (not the transect setup itself).
<>=
#my.control is a list that contains
my.control <- list(
  #the type of transect
  transect.pattern="line",
  #the length of transect
  line.length=0.15,
  #the number of points that define the transect
  transect.nPts=15,
  #the number of putative directions that a transect can take
  nRotate=9
)
@

\subsection*{Take the Transect Sample}

The sample is obtained by a relatively straightforward call. The transects are also plotted over the inclusion probabilities.
<>=
#take the transect sample
if( !usePrecompiledData){
  samp <- transectSamp( n=n, potential.sites=X,
                        inclusion.probs=inclProbs, control=my.control)
} else{
  samp <- readRDS( system.file( "extdata", "transectSamp1.RDS",
                                package="MBHdesign"))}
image( x=unique( X[,1]), y=unique( X[,2]),
       z=matrix( inclProbs, nrow=sqrt(nrow(X)), ncol=sqrt(nrow( X))),
       main="(Unadjusted) Inclusion Probabilities", sub="10 Transects",
       ylab=colnames( X)[2], xlab=colnames( X)[1])
points( samp$points[,5:6], pch=20, cex=0.6)
@

\subsection*{Last Things Last}

The only remaining thing to do is to tidy up our workspace. First, to export our sample locations. Second, to remove all objects for this analysis from your workspace.
<>=
##write csv
#write.csv( samp$transect, file="transectSample1.csv", row.names=FALSE)
#tidy
rm( list=c( "X", "inclProbs", "samp", "my.control", "my.seq",
            "offsetX", "n", "N"))
@

\newpage

\section*{A Harder/Constrained Transect-Based Survey}

The design problem described in the previous section is pretty straightforward, as the sample space doesn't have any constraints upon it. Here, we make things more complicated in this respect, by forcing transects to go down-hill. This mimics towed platforms, such as underwater image tools, and was enforced in the seamount example in \citet{fos20}. As an illustration, we utilise the volcano altitude dataset from the excellent \texttt{MASS} package \citep{ven02}. The inclusion probabilities here are assumed to be uniform, so that the transects will be spatially-balanced, and even, throughout the study area.

Forcing down-hill transects isn't the only type of constraint; however, it is illustrative. Another type of constraint that we regularly encounter, when sampling certain types of seamounts, is the desire to run the transects from `peak-to-base', so that all transects must start at the highest point of the seamount.
This kind of constraint can be incorporated within \texttt{MBHdesign} using the function \texttt{findTransFromPoint}, which only allows transects from a specified set of points.
<>=
library( MASS)  #for the data
library( fields)  #for image.plot
#library( MBHdesign)  #for the spatial design and constraints
set.seed( 717)  #Last plane I was on
#number of transects
n <- 20
#load the altitude data
data( volcano)  #this is a matrix
n.x <- nrow( volcano)
n.y <- ncol( volcano)
image.plot( x=1:n.x, y=1:n.y, z=volcano, main="Mountain Height (m)", asp=1)
#format for MBHdesign functions
pot.sites <- expand.grid( x=1:n.x, y=1:n.y)
pot.sites$height <- as.vector( volcano)
#details of the transects (see Details section in ?transectSamp)
vol.control <- list( transect.pattern="line", transect.nPts=10,
                     line.length=7, nRotate=11, mc.cores=1)
#In a real application, transect.nPts and nRotate may need to be increased
#1 core has been used to ensure generality for all computers.
# Use more to speed things up
@
So, that is the data. Note that there are locations where transects could descend and then ascend. The `hole' (crater) in the middle of the mountain is the largest and most obvious example. To avoid transects that ascend and descend, we use an \texttt{MBHdesign} function:
<>=
if( !usePrecompiledData){
  vol.constraints <- findDescendingTrans(
                        potential.sites = pot.sites[,c("x","y")],
                        bathy=pot.sites$height,
                        in.area=rep( TRUE, nrow( pot.sites)),
                        control=vol.control)
} else{
  vol.constraints <- readRDS( system.file( "extdata", "transectConstraints1.RDS",
                                           package="MBHdesign"))}
#this is a matrix with nrow given by the number of sites and ncol by
# the number of rotations around each site
print( dim( vol.constraints))
#The contents describe how the transect lies over the landscape
#So, there are 15592 putative transects that ascend and descend
# (and can't be used in the sample)
table( as.vector( vol.constraints))
#convert to TRUE/FALSE
#Note that the final possible transect type ('descendAndNA') is
# not present in these data
#If present, we would have to decide to sample these or not
vol.constraints.bool <- matrix( FALSE, nrow=nrow( vol.constraints),
                                ncol=ncol( vol.constraints))
vol.constraints.bool[vol.constraints %in% c("descend")] <- TRUE
#Let's get a visual to see what has just been done.
tmpMat <- matrix( apply( vol.constraints.bool, 1, sum), nrow=n.x, ncol=n.y)
image.plot( x=1:n.x, y=1:n.y, z=tmpMat, main="Number of Transects",
            sub="Transects centered at cell (max 11)", asp=1)
#There aren't any transects that are centred on ridges or depressions.
@
The task now is to sample the reduced sampling frame. Fortunately, with the constraint matrix just produced, this is pretty easy.
<>=
#take the sample
if( !usePrecompiledData){
  volSamp <- transectSamp( n=n, potential.sites=pot.sites[,c("x","y")],
                           control=vol.control,
                           constrainedSet=vol.constraints.bool)
} else{
  volSamp <- readRDS( system.file( "extdata", "transectSamp2.RDS",
                                   package="MBHdesign"))}
#visualise the sample
image.plot( x=1:n.x, y=1:n.y, z=volcano,
            main="Uniform Probability Transect Sample", asp=1)
points( volSamp$points[,c("x","y")], pch=20)
@

\subsection*{Last Things Last}

The only remaining thing to do is to tidy up our workspace. First, to export our sample locations. Second, to remove all objects for this analysis from your workspace.
<>=
##write csv
#write.csv( volSamp$transect, file="volcanoSample1.csv", row.names=FALSE)
#tidy
rm( list=c( "volSamp", "tmpMat", "vol.constraints.bool", "vol.constraints",
            "vol.control", "pot.sites", "n", "n.x", "n.y", "volcano"))
@

\newpage

\section*{Clustered Surveys}

The very nature of spatially-balanced designs implies that the sampling locations will be well spread out in space. Whilst this is statistically appealing (as it is usually efficient), it can also create logistical difficulties -- for example, transit times between sampling locations can be large. To overcome this logistical constraint, the function \texttt{quasiSamp.cluster} has the ability to group sampling locations into clusters. Spatial balance is maintained at both the between-cluster and within-cluster levels \citep[see][for details]{fos23}.

A clustered sample, consisting of 10 clusters of 5 samples, will now be generated using an exaggerated version of data representing Soil Moisture at the Root Zone (SMRZ) over a 1km raster of the Australian Capital Territory \citep[see][]{fro18}.
<>=
#need raster functions
library( terra)
#import example data
egDat <- rast(system.file( "extdata", "ACT_DemoData.grd", package="MBHdesign"))$soilMoisture
values( egDat) <- ( values( egDat) - min( values( egDat), na.rm=TRUE)) * 5
@

\subsection*{Take the Cluster Sample}

The process of taking a clustered sample is almost as straightforward as taking a regular point-based sample, and it mostly follows the same steps. However, some extra information is needed about the clusters themselves. The function \texttt{quasiSamp.cluster} performs all the necessary computations, and its output is straightforward to interpret. We note that \texttt{quasiSamp.cluster}, like \texttt{quasiSamp.raster}, accepts only raster objects for the inclusion probabilities.
<>=
set.seed( 727)
#take the cluster sample
#increase mc.cores for faster processing
if( !usePrecompiledData){
  samp <- quasiSamp.cluster( nCluster=10, clusterSize=5, clusterRadius=5,
                             inclusion.probs = egDat, mc.cores=1)
} else{
  samp <- readRDS( system.file( "extdata", "clusterSamp1.RDS",
                                package="MBHdesign"))}
#plot it over the egDat data
plot( egDat)
#the sample points
points( samp$x, samp$y, pch=20, cex=0.5)
#the centres of the clusters
# (not sample points but potentially useful nevertheless)
points( attr( samp, "clusterDes")[,c("x","y")], pch=1, col='red', cex=0.5)
@

\subsection*{Take a Clustered Oversample}

If/when an over-sample is required \citep[e.g.][]{lar08,van18}, so that a survey can be expanded at future dates, including replacement of non-response sites (missing values), then an extra step is required. This is because the working inclusion probabilities, which are calculated within \texttt{quasiSamp.cluster}, are calculated for the number of \textit{planned observations} and not the size of the \textit{planned over-sample}. Within the framework of \texttt{MBHdesign} this is achieved by first calculating the working inclusion probabilities \citep[see][]{fos23} for the `correctly sized' design and then using these as the basis for an over-sample.

In the design that follows, an over-sample of clusters (15 drawn for an initial target of 10) is taken. The planned cluster size is 5, but 10 are taken for each cluster to allow for drop-out. A property of BAS is that any contiguous set of samples is spatially-balanced, and this applies within each cluster and between clusters too.
<>=
#Create the working probabilities for the correctly sized cluster design.
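# (note: the working probabilities use the planned cluster size of 5 sites,
#  not the over-sampled size of 10 used below -- see the discussion above)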
if( !usePrecompiledData){
  workProbs <- alterInclProbs.cluster( nCluster=15, clusterSize=5, mc.cores=1,
                                       clusterRadius=5, inclusion.probs=egDat)
} else{
  workProbs <- readRDS( system.file( "extdata", "clusterWorkProbs1.RDS",
                                     package="MBHdesign"))}
#take the (over-)sample
set.seed( 747)
overSamp <- quasiSamp.cluster( nCluster=15, clusterSize=10, clusterRadius=5,
                               working.inclusion.probs = workProbs)
#plot the results
par( mfrow=c(1,2))
plot( egDat, main="Planned and Spare points")
#the planned sample
points( overSamp[overSamp$cluster<=10 & overSamp$point<=5,c("x","y")], cex=0.5)
#the over-sample (within clusters 1:10)
points( overSamp[overSamp$cluster<=10 & overSamp$point>5,c("x","y")], cex=0.5, col='red')
plot( egDat, main="Over-sampled clusters")
#the over-sampled clusters (themselves over-sampled)
points( overSamp[overSamp$cluster>10 & overSamp$point<=5,c("x","y")], cex=0.5)
points( overSamp[overSamp$cluster>10 & overSamp$point>5,c("x","y")], cex=0.5, col='red')
@

\subsection*{Last Things Last}

The only remaining thing to do is to tidy up our workspace. First, to export our sample locations. Second, to remove objects for this analysis from your workspace.
<>=
##write csv
#write.csv( as.data.frame( overSamp),
#           file="clusterSamp1.csv", row.names=FALSE)
#tidy
rm( list=c("egDat","overSamp","workProbs","samp","usePrecompiledData"))
@

\section*{Acknowledgements}

This package started life as part of the Marine Biodiversity Hub, a collaborative partnership supported through funding from the Australian Government's National Environmental Science Program. Our goal is to assist decision-makers to understand, manage and conserve Australia's environment by funding world-class biodiversity science. NESP Marine Biodiversity Hub partners include the Institute for Marine and Antarctic Studies, University of Tasmania; CSIRO; Geoscience Australia; Australian Institute of Marine Science; Museum Victoria; Charles Darwin University; University of Western Australia; NSW Office of Environment and Heritage; NSW Department of Primary Industries; and the Integrated Marine Observing System. You can check out the project \href{https://www.nespmarine.edu.au/project/project-d2-standard-operating-procedures-survey-design-condition-assessment-and-trend}{here}. Since this initial project, its life has been moulded by the needs of other projects -- notably the \href{https://www.csiro.au/en/research/indigenous-science/managing-country/koala-monitoring-program}{National Koala Monitoring Program}.

%\bibliographystyle{authordate1}
%\bibliography{/home/fos085/zoteroFiles/JabRefPDF/SDFReferences}
\bibliography{./MBHdesign}

\section*{Appendix}

\subsection*{Computational details}

This vignette was created using the following R and add-on package versions:
<>=
toLatex(sessionInfo())
@

\end{document}