\documentclass[letterpaper]{article} %\documentclass[a4paper]{article} \usepackage{Sweave} \usepackage{amsmath} \usepackage[round]{natbib} %\usepackage{hyperref} %\usepackage{url} %\usepackage[left=2.5cm,right=2.5cm,top=2.5cm]{geometry} \usepackage{geometry} \newcommand{\code}[1]{\texttt{#1}} \newcommand{\pkg}[1]{\textsf{#1}} \newcommand{\R}{\pkg{R}} \newcommand{\subsectionF}[1]{ \subsection*{Function \code{#1}} \addcontentsline{toc}{subsection}{Function \code{#1}} } %%\VignetteIndexEntry{Vignette for the BHH2 package} %%\VignetteDepends{BHH2} %% headings ======================================================== %\pagestyle{myheadings} \markright{\hfill BHH2 \hfill} %% headings ======================================================== \begin{document} \SweaveOpts{engine=R,eps=TRUE,pdf=TRUE,width=5,height=3,strip.white=true} \setkeys{Gin}{width=\textwidth} \title{Some examples using the \pkg{BHH2} package\footnote{ \pkg{BHH2} current version: 0.2-1}} \author{Ernesto Barrios\\ University of Wisconsin-Madison} \date{March, 2005} \maketitle \tableofcontents <>= options(width=76) library(BHH2) @ \section{Introduction} Many of the calculations and figures in \emph{Statistics for Experimenters II} (\citet*{BHH2-2005}) can be done calling only a few number of \R\ functions. There are some plots and computations more complex though. The purpose of the functions in the \pkg{BHH2} package is to make easier some of these more elaborated procedures. We have left apart, however, a set of functions for Bayesian screening and follow-up designs, discussed in chapter 7 of the book, that we collected in the \pkg{BsMD} package (\citet{R:BsMD-2005}). Most of the functions in the package are still under construction and can be improved for computational efficiency and to extend their applicability. It is assumed that the reader knows the basics of \R, some elements of plotting and the use of the function \texttt{lm} for the fitting of linear models. Various documents for the understanding and exploitation of the \R\ language are available at CRAN (\code{http://cran.r-project.org}). In particular, \emph{An Introduction to R} by \citet{VS-2005}, is enough to understand all the functions in the \pkg{BHH2} package and if necessary to modify them to better fit your needs. In this document we introduce most of the functions included in the \pkg{BHH2} package: \code{dotPlot}, \code{anovaPlot}, \code{lambdaPlot}, \code{permtest} and \code{ddDesMatrix}. A list of all the functions and data sets can be found in the help pages. To illustrate the use of the functions just mentioned, we work out some of the examples in \emph{Statistics for Experimenters II}, referred here after as BHH2. \section{Permutation Test} \subsectionF{permtest} In the two-samples problem ($(x_1,\dots,x_n)$ and $(y_1,\dots,y_m)$), the statistics $t$ and $F$ are computed for location and dispersion comparisons, and their $p$-values determined. Spooled variance is used in the equal means test. Next, the observations are combined in different samples as \[ (x_1^{(i)},\dots,x_n^{(i)})\ \text{and} \ (y_1^{(i)},\dots,y_m^{(i)}),\ \ \ \ i=1,\dots,N \] where $N=\binom{n+m}{n}$ is the total number of possible combinations. For each samples pair, $t^{(i)}$ and $F^{(i)}$ statistics are computed and then it is determined how many $t^{(i)}\leq t$ and $F^{(i)}\leq F$, to calculate the corresponding percentiles from the permutation (randomization) distribution. In the one-sample case, all the $N=2^n$ samples $(\pm x_1,\dots,\pm x_n)$ are generated and later determine how many $t^{(i)} \leq t$, to calculate the sample's percentile from the permutation distribution. Function \code{permtest} computes the observed $t$ and $F$ statistics and reports the corresponding $p$-values from the theoretical and permutation distributions. The following example corresponds to the Modified Fertilizer Mixtures for Tomato Plants example (BHH2, Ch.~3) . Samples from 2 different fertilizer mixtures are compared. In addition to the statistics and percentiles, the function \code{permtest} reports \code{N}, the number of different samples generated to build the permutation distribution. <>= # Permutation test for Tomato Data #cat("Tomato Data (not paired):\n") data(tomato.data) attach(tomato.data) a <- pounds[fertilizer=="A"] b <- pounds[fertilizer=="B"] permtest(b,a) detach() @ The Boy's Shoes example is a randomized paired comparison of materials $A$ and $B$ (BHH2, Ch.~3). It is of interest to test whether there is a significant difference between materials or not. <>= # Permutation test for Boy's Shoes Example #cat("Shoes Data (paired):\n") data(shoes.data) permtest(shoes.data$matB-shoes.data$matA) @ The function is limited to small sample sizes ($N<20$). See also \pkg{exactRankTests} and \pkg{DAAG} packages for other functions for permutation tests. \section{Dot Diagrams} \subsectionF{dotPlot} The \code{dotPlot} function produces a scatter plot similar to a stem-and-leaf plot or a histogram. Stacked or partially overlapped dots are used to display the frequency distribution. The dot plot has the nice property of displaying all the individual observations for moderated size data sets, useful to visualize reference distributions. In this example we display the dot plot of the yield data in the Industrial Process Example. \begin{center} <>= par(mar=c(4,1,2,1),mgp=c(2,1,0),cex=0.7) data(tab03B1) #stem(tab03B1$yield) dotPlot(tab03B1$yield,main="Dot plot: Industrial Process Example",xlab="yield") @ \end{center} The following plot represents the reference distribution of 191 differences between averages of disjoint sets of 10 observations of the Industrial Process Example. See figures 3.3 and 3.5a and tables 3B of BHH2. \begin{center} <>= par(mar=c(4,1,2,1),mgp=c(2,1,0),cex=0.7) data(tab03B2) plt <- dotPlot(tab03B2$diff10,xlim=2.55*c(-1,+1),xlab="differences", main="Dot plot: Reference Distribution of Differences") segments(1.3,0,1.3,max(plt$y),lty=2) #vertical line at x=1.3 text(1.3,max(plt$y),labels=" 1.30",adj=0) @ \end{center} \subsectionF{dots} The function \code{dots} is used by the functions \code{dotPlot} and \code{anovaPlot} to display the dots. See the help pages for details. \section{Graphical Anova} \subsectionF{anovaPlot} The \emph{anova plots} display in the same plot scatter diagrams of the scaled factor effects and the residuals, as reference distribution, so visual inspection of the plot would suggest which factor effects may be different from the experimental and model error. To make a fair comparison between the factor effects and the residuals distribution, it is necessary to scale the effects properly. For example, for the factor $A$, the effect is multiplied by $\sqrt{n/k}$, where $k$ and $n$ are the degrees of freedom of factor $A$ and the residuals respectively, taken from the table of analysis of variance. The next example is the Penicillin Yield experiment, in BHH2, Ch.~4. A randomized block experiment is considered in the study of the process of manufacturing penicillin under different treatments ($A$, \dots, $D$). The yield of the process is the response and the different product blends are used as blocking factor. A anova model (\code{aov}) is fitted and its anova plot displayed. \begin{center} <>= par(mfrow=c(1,1),mar=c(3,1,2,1),cex=0.7) data(penicillin.data) penicillin.aov <- aov(yield~blend+treat,data=penicillin.data) anovaPlot(penicillin.aov,main="Anova plot: Penicillin Manufacturing Example", labels=TRUE,cex.lab=0.6) @ \end{center} The following example corresponds to the Toxic Agents example in BHH2, chapter 8, taken from \citet{Box&Cox-1964}. The response is the survival time of groups of animals. The kind of poison and the type of treatment used with the animals are the experimental factors. The anova plot, similar to figure 8.5a. in BHH2 is generated. \begin{center} <>= par(mfrow=c(1,1),mar=c(3,1,2,1),cex=0.7) data(poison.data) poison.lm <- lm(y~poison*treat,data=poison.data) anovaPlot(poison.lm,main="Anova plot: Toxic Agents Example",cex.lab=0.6) @ \end{center} The function \code{anovaPlot} can be used in simple cases of split-plot experiments. Consider for example the Corrosion Resistance Study in BHH2, Ch 9. The response is the resistance against corrosion of steel bars treated under different temperatures (\code{heats}) and with distinct coatings (\code{coating}) allocated in 6 castings (\code{run}). See table 9.1. The factor \code{heats} is allocated in the whole plot while the \code{coatings} factor and the interaction \code{heats:coating} in the sub-plot. The proper comparison of \code{heats} levels should be done with respect to the castings (\code{run}) as whole-plot errors. The factors \code{coating} and \code{heats:coating} should be compared against the residuals distribution (the sub-plot errors). Compare the plot below with figure 9.1 in BHH2. \begin{center} <>= par(mfrow=c(1,1),mar=c(3,1,2,1),cex=0.7) data(corrosion.data) corrosion.aov <- aov(resistance~heats+run+coating+heats:coating,data=corrosion.data) anovaPlot(corrosion.aov,main="Anova plot: Corrosion Resistance Example", cex.lab=0.6) @ \end{center} \section{Lambda Plot} \subsectionF{lambdaPlot} The one-parameter Box-Cox transformation of the response $y$ is defined as \[ Y=\frac{y^\lambda-1}{\lambda} \] An empirical determination of the parameter $\lambda$ is presented in chapter 8 of BHH2. \emph{Lambda plots} are also introduced as an alternative way for finding appropriate values of $\lambda$. Basically, a lambda plot of a linear model is the trace over various values of $\lambda$ of the relevant statistics: coefficients' $t$ or $F$ statistics, or the model's global $F$-ratio. In the next example it is constructed the lambda plot for the model fitted for the Toxic Agents example, introduced in the last section. In this case, the coefficients' $F$-statistic are traced for $-2 \leq \lambda \leq 1$. \begin{center} <>= par(mar=c(4,3,2,1),mgp=c(2,1,0),cex=0.7) # Lambda Plot tracing F values. data(poison.data) lambdaPlot(poison.lm,lambda=seq(-2,1,by=.1),stat="F",global=FALSE,cex=0.6, main="Lambda Plot: Toxic Agents Example") @ \end{center} To illustrate the lambda plots when tracing the coefficients' $t$-statistic, consider the Woolen Thread experiment, presented in chapter 12 and used previously in \citet{Box&Cox-1964}. A $3^3$ factorial design was run with lifetime as response (\code{y}) affected by 3 factors: length (\code{x1}), amplitude (\code{x2}) and load (\code{x3}). A second degree model was fitted to the original response and its Box-Cox transformation for various values of $\lambda$ ($-1 \leq \lambda \leq 1$). Compare the plot below with figure 12.7 in BHH2. The printed table will allow you to match the labels in the figure with the terms in the model. \begin{center} <>= # Lambda Plot tracing t values. par(mar=c(4,3,2,1),mgp=c(2,1,0),cex=0.7) data(woolen.data) woolen.lm <- lm(y~x1+x2+x3+I(x1^2)+I(x2^2)+I(x3^2)+I(x1*x2)+I(x1*x3)+I(x2*x3)+I(x1*x2*x3),data=woolen.data) lambdaPlot(woolen.lm,main="Lambda plot: Woolen Thread Example (2nd order model)", stat="t",cex=0.6) @ \end{center} \noindent The "global-$F$" lambda plot for the corresponding first order model in the Woolen Thread example is shown next. (See BHH2, figure 12.8.) \begin{center} <>= par(mar=c(4,3,2,1),mgp=c(2,1,0),cex=0.7) # Lambda Plot tracing F values. lambdaPlot(lm(y~x1+x2+x3,data=woolen.data),lambda=seq(-1,1,length=31), main="Lambda plot: Woolen Thread Example (1st order model)",stat="F",global=TRUE) @ \end{center} \section{Design Matrices} \subsectionF{ffDesMatrix} Function \code{ffDesMatrix} is used to build 2-level (-1,+1) factorial designs. If generators different from \code{NULL} are provided to the function, the corresponding fractional factorial is built. (See BHH2, Table 6.22, p 272.) For example, the design matrix of a $2^{5-1}$ fractional factorial design with defining relation $I=12345$, is built by <>= print(X <- ffDesMatrix(5,gen=list(c(-5,1,2,3,4)))) @ See the \code{ffDesMatrix} help pages for details. \subsectionF{ffFullMatrix} The function \code{ffFullMatrix} is a nice companion to \code{ffDesMatrix}. Provided what factors to consider and the desired interaction order, from a given design matrix, the function \code{ffFullMatrix} builds the corresponding model matrix including columns for the constant term and the blocking factors. For example, from the fractional design just built above, we construct the matrix for a model on factors 1, 2, 3 and 4, with main effects and second order interactions and the 5th factor as blocking factor. <>= ffFullMatrix(X,x=c(1,2,3,4),maxInt=2,blk=X[,5])$Xa @ \noindent The function \code{ffFullMatrix} is useful for identifying confounding patterns in a design. \section{Combinations} \subsectionF{subsets} The function \code{subsets}, by Bill Venables, generates all the different subsets of size $r$ chosen from $n$ different elements $v$. It is used by the \code{permtest} and \code{ffFullMatrix} functions. For example, <>= subsets(n=5,r=3,v=c("x","y","z","A","B")) @ \noindent Function \code{subsets} is a nice example of a recursive function and worth to understand. See also function \code{combinations} of the \pkg{gtools} package. \section{The \pkg{BsMD} package} The package \pkg{BsMD} for Bayesian screening and follow-up designs should be thought as part of \pkg{BHH2}. It includes functions for the generation of Daniel and Lenth's plots, presented in chapters 5 and 6 of BHH2. It also includes functions for the Bayesian analysis of factorial designs and the problem of adding extra runs to resolve unclear factor activities. These topics are presented in chapter 7, \emph{Additional Fractional Factorials and Analyses}. The main functions in \pkg{BsMD} are: \subsectionF{DanielPlot} Displays the normal plot of effects from a two level factorial experiment. \subsectionF{LenthPlot} Plots the factor effects with significance levels based on robust estimation of contrast standard errors. \subsectionF{BsProb} The marginal factor posterior probabilities and model posterior probabilities from designed screening experiments are calculated according to Box and Meyer's Bayesian procedure. Methods functions \code{plot}, \code{print} and \code{summary} are available for function \code{BsProb}. \subsectionF{MD} Best follow-up experiments based on the MD criterion are suggested to discriminate between competing models. Methods functions \code{print} and \code{summary} are available for function \code{MD}. \bigskip Please refer to the help pages of \pkg{BsMD} for a complete list of the functions and data sets included. The package also includes the document \emph{Using BsMD for Bayesian Screening and Model Discrimination} with various examples on the use of the functions. \section{Summary} Most of the computations in \citet*{BHH2-2005} can be done in \R\ with only a small number of function calls. For some of the more elaborated procedures, we have included in the \pkg{BHH2} package the plotting functions: \code{anovaPlot}, \code{dotPlot}, and \code{lambdaPlot}; the function \code{permtest} for randomization tests and \code{ddDesMatrix} for the construction of 2-levels factorial designs. The calculations involved in Bayesian screening and follow-up designs, presented in chapter 7, are more computing intensive and the dedicated package \pkg{BsMD} is available. For details of the functions in \pkg{BHH2} please see the help pages of the package. \bibliographystyle{plainnat} %\addcontentsline{toc}{section}{References} \bibliography{BHH2} \end{document}