--- title: "Data Analyses with Ceiling/Floor data" author: "Qimin Liu" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Data Analyses with Ceiling/Floor data} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Summary Ceiling and floor effects are common in data. Ceiling or floor effects occur when the tests or scales are relatively easy or difficult such that substantial proportions of individuals obtain either maximum or minimum scores and that the true extent of their abilities cannot be determined. Ceiling and floor effects, subsequently, causes problems in data analysis. For example, ceiling or floor effects alone would induce, respectively, attenuation or inflation in mean estimates. And both ceiling and floor effects would result in attenuation in variance estimates. This imposes challenges in mean and variance based data analytic methods. This package implements methods to deal with challenges associated with ceiling/floor effects in the data using paramtric methods that assume normality for the true scores. The current version is capable of mean and variance recovery given data with ceiling/floor effects and of mean comparison tests such as t-test and ANOVA for data with ceiling/floor effects. # Helper functions The package contains a helper function `threeganova.sim` that would generate a three-group anova data with a standard normal control group and positive/negative treatment groups of effect with same magnitudes. In addition, one can specify the standard deviation in positive treatment group. To see the specifics of the function, user can enter `?threeganova.sim` in the R console. Another helper function included in the package is `induce.cfe` where the user can manually induce ceiling and floor effects to healthy data. To see the specifics of the function, user can enter `?induce.cfe` in the R console. Moreover, the function `F.star.test` allows user to conduct a Brown-Forsythe F star test. This is a variant of the commonly used F test. F star test is robust against violations of homogeneity of variance (HOV) assumption for the F test. # Functions for data analyses The current version of the package includes three functions that can facilitate the user to conduct data analyses for data with ceiling/floor effects.`rec.mean.var` estimates the true mean and variance of the data with ceiling/floor effects. That is, as mentioned in the summary, the observed mean and variance of data with ceiling/floor effects are often biased. Thus, `rec.mean.var` aims to help the user to recover the mean and variance of the data were ceiling/floor effects absent. `lw.t.test` conducts a t test that adjusts for ceiling/floor effects in the data. As `lw.t.test` also uses Welch's t test, the adjusted t test is robust against HOV violation. `lw.f.star` conducts a F star test for one-way ANOVA that adjusts for ceiling/floor effects in the data. `lw.f.star` is also robust against HOV violation. For both `lw.f.star` and `lw.t.test`: method `a` is a liberal appraoch that yields accurate effect size estimates but has mildly inflated type I error rates, `b` is a conservative approach with well-controlled type I error rates that have good, but less accurate than `a`, effect estimates. # Example 1: an Aging Example Imagine a scenario where we wish to test the difference in cognitive ability for people of different age groups. In this toy example, we have 1000 participants for three age groups, the younger-aged group has true mean and variance of respectively 30 and 25, the middle-aged group 20 and 25 and the older-aged group 10 and 100. The higher the score, the higher the cognitive ability. We can check the mean and variance of the true mean and variance on the data composed of true scores, `ca.true`. ```{r echo=FALSE} library(DACF) r.1=rnorm(1000,20,5) r.2=rnorm(1000,30,5) r.3=rnorm(1000,10,10) ca.true=matrix(c(r.1,r.2,r.3,rep(c(2,1,3),each=1000)),ncol=2) r.2.cf=induce.cfe(0,.3,r.2) r.3.cf=induce.cfe(.3,0,r.3) ca.cf=matrix(c(r.1,r.2.cf,r.3.cf,rep(c(2,1,3),each=1000)),ncol=2) colnames(ca.cf)=c('score','group') ``` ```{r} # group sample mean aggregate(ca.true[,1],mean,by=list(ca.true[,2])) # group sample variance aggregate(ca.true[,1],var,by=list(ca.true[,2])) ``` Now consider the fact that a substantial proportion of the younger-aged group may score maximum at the cognitive ability test and a substantial proportion of the older-aged group may score minimum. Let both the ceiling and the floor proportions be 15%, we have the dataset `ca.cf`. ```{r} # group sample mean aggregate(ca.cf[,1],mean,by=list(ca.cf[,2])) # grouple sample variance aggregate(ca.cf[,1],var,by=list(ca.cf[,2])) ``` We can see that both the mean and the variance estimates from the younger-aged and the older-aged groups are biased. The function `rec.mean.var` can help recover the mean and variance. In the example of the younger-aged group, we first select all the scores of the younger-aged group and name it as a new variable `young` and then use our function `rec.mean.var` to recover the mean and variance. We can do the same for the older-aged group. ```{r} # younger-aged group young=ca.cf[ca.cf[,2]==1,1] rec.mean.var(young) # true mean and variance are 30 and 25 # the estimated floor and ceiling percentages and the recovered mean and variance estimates are displayed above # older-aged group old=ca.cf[ca.cf[,2]==3,1] rec.mean.var(old) # true mean and variance are 10 and 100 # the estimated floor and ceiling percentages and the recovered mean and variance estimates are displayed above ``` Now we wish to conduct an ANOVA in the data with floor and ceiling effects. We can use the function `lw.f.star`. We can also conduct a t-test between the older-aged and the younger-aged group by using the function `lw.t.test`. Both methods `a` and `b` are used for the illustration purposes. ```{r} # ANOVA lw.f.star(data.frame(ca.cf),score~group,"a") lw.f.star(data.frame(ca.cf),score~group,"b") # t-test lw.t.test(young,old,"a") lw.t.test(young,old,"b") ``` Both the ANOVA and the t-tests returned significant results. # Example 2: Simulation and Testing The following example provides an overview of the helper functions in the package that can aid in simulations and further demonstrates data analytic functions in the package. ```{r} # Simulate healthy data for two groups x.1=rnorm(300,2,4) x.2=rnorm(300,3,5) # check mean and variance for simulated healthy data mean(x.1);var(x.1) mean(x.2);var(x.2) # induce ceiling effects of 20% in group 1 x.1.cf=induce.cfe(.2,0,x.1) # induce floor effects of 10% in group 2 x.2.cf=induce.cfe(0,.1,x.2) # recover the mean and variance for ceiling/floor data rec.mean.var(x.1.cf) rec.mean.var(x.2.cf) # conduct a t test on healthy data t.test(x.1,x.2) t.test(x.1.cf,x.2.cf) # conduct an adjusted t test on ceiling/floor data lw.t.test(x.1.cf,x.2.cf,"a") lw.t.test(x.1.cf,x.2.cf,"b") # generate a dataframe for ANOVA demo testdat=threeganova.sim(10000,.0625,1) # induce ceiling/floor effects in the data testdat.cf=testdat testdat.cf[testdat.cf$group==2,]$y=induce.cfe(.2,0,testdat.cf[testdat.cf$group==2,]$y) # conduct an adjusted F star test on ceiling/floor data lw.f.star(testdat.cf,y~group,"a") lw.f.star(testdat.cf,y~group,"b") ```