title: "Confidence intervals with proportions"

Probably the most useful tool for data analysis is a plot with suitable error bars [@cgh21]. In this vignette, we show how to compute confidence intervals for proportions.

## Theory behind confidence intervals for proportions

For proportions, ANOPA is based on the Anscombe transform [@a48]. The transformed score has a known theoretical standard error, which depends only on the sample size $n$:

$$SE_{A}(n) = 1/\sqrt{4(n+1/2)}.$$
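To make these quantities concrete, here is a minimal base-R sketch. The helper names `anscombe()` and `se_anscombe()` are hypothetical (they are not ANOPA functions), and the transform is assumed to take its usual form $A = \arcsin\sqrt{(s+3/8)/(n+3/4)}$ for $s$ successes out of $n$ trials, which the text above does not spell out.

```{r}
# Hypothetical helper: Anscombe transform of s successes out of n trials
# (assumed usual form: arcsine of the square root of a corrected proportion)
anscombe <- function(s, n) asin(sqrt((s + 3/8) / (n + 3/4)))

# Hypothetical helper: theoretical standard error of the Anscombe score,
# which depends only on the sample size n
se_anscombe <- function(n) 1 / sqrt(4 * (n + 1/2))

anscombe(12, 30)   # transformed score for 12 successes out of 30
se_anscombe(30)    # the same for any number of successes out of 30
```

Note that the observed number of successes plays no role in `se_anscombe()`: two cells with the same $n$ always have the same standard error.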

Consequently, when the groups have similar sizes, homogeneity of variances holds. From this, we can decompose the total test statistic $F$ into a component for each cell of the design. We thus get, for a cell whose Anscombe score is $A$,

$$\left[ A + z_{0.5-\gamma/2} \times SE_{A}(n), \; A + z_{0.5+\gamma/2} \times SE_{A}(n) \right]$$

in which $SE_{A}(n)$ is the theoretical standard error based only on $n$, $z_p$ is the $p$ quantile of the standard normal distribution, and $\gamma$ is the desired confidence level (often .95).
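The interval above can be sketched in a few lines of base R, where `qnorm(p)` plays the role of $z_p$. The function name `ci_anscombe()` and the values $A = 0.70$, $n = 30$ are hypothetical and only illustrate the formula; this is not how ANOPA computes its intervals internally.

```{r}
# Hypothetical helper: theoretical standard error of an Anscombe score
se_anscombe <- function(n) 1 / sqrt(4 * (n + 1/2))

# Hypothetical helper: stand-alone confidence interval for one cell's score A
ci_anscombe <- function(A, n, gamma = 0.95) {
  # qnorm(p) is the p quantile of the standard normal, i.e., z_p
  c(lower = A + qnorm(0.5 - gamma / 2) * se_anscombe(n),
    upper = A + qnorm(0.5 + gamma / 2) * se_anscombe(n))
}

ci_anscombe(A = 0.70, n = 30)   # hypothetical cell
```

With $\gamma = .95$, `qnorm(0.025)` and `qnorm(0.975)` are about $-1.96$ and $+1.96$, so the interval is symmetric around $A$.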

This technique returns _stand-alone_ confidence intervals, that is, intervals that can be used to compare the proportion to a fixed value. However, such _stand-alone_ intervals cannot be used to compare one proportion to another proportion [@cgh21]. To compare an observed proportion to another observed proportion, the intervals must be adjusted for pairwise differences [@b12]. Because the standard error of a difference between two independent scores having the same standard error is $\sqrt{2}$ times that standard error, this is achieved by multiplying the width of the intervals by $\sqrt{2}$.
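As an illustration, the effect of the difference adjustment on the half-width of an interval can be sketched for a hypothetical cell of $n = 30$ observations. Here `se_anscombe()` is a hypothetical helper implementing the standard error formula above, not an ANOPA function.

```{r}
# Hypothetical helper: theoretical standard error of an Anscombe score
se_anscombe <- function(n) 1 / sqrt(4 * (n + 1/2))

gamma <- 0.95
n <- 30   # hypothetical sample size

# Half-width of the stand-alone interval
halfwidth_standalone <- qnorm(0.5 + gamma / 2) * se_anscombe(n)
# Difference-adjusted half-width: sqrt(2) times wider
halfwidth_adjusted <- sqrt(2) * halfwidth_standalone

c(standalone = halfwidth_standalone, adjusted = halfwidth_adjusted)
```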

Also, in repeated-measures designs, a positive correlation between the measures improves the precision of the estimates. Accordingly, when the correlation is positive, the interval width can be reduced by multiplying it by $\sqrt{1-\alpha_1}$, where $\alpha_1$ is a measure of the correlation among the repeated measures (based on the unitary alpha measure).
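The correlation adjustment is a simple rescaling, sketched below. The helper `adjust_for_correlation()` is hypothetical, and $\alpha_1$ is supplied by hand as an assumed value: computing the unitary alpha from the repeated-measures data is not shown here.

```{r}
# Hypothetical helper: shrink an interval half-width to account for a
# positive correlation among repeated measures. alpha1 is given by the user;
# the unitary alpha computation itself is not shown.
adjust_for_correlation <- function(halfwidth, alpha1) {
  stopifnot(alpha1 >= 0, alpha1 < 1)  # reduction applies to positive correlation
  halfwidth * sqrt(1 - alpha1)
}

adjust_for_correlation(halfwidth = 0.25, alpha1 = 0.40)  # hypothetical values
```

With $\alpha_1 = 0$ (no correlation), the half-width is unchanged; the larger $\alpha_1$, the narrower the interval.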

Finally, the above returns confidence intervals for the *transformed* scores. However, in a plot it is typically more convenient to show proportions (from 0 to 1) rather than Anscombe scores (from 0 to $\pi/2 \approx 1.57$). Thus, the vertical axis can be rescaled using the inverse Anscombe transform so that proportions are shown.
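A minimal sketch of this last step, assuming the simple back-transform $p = \sin^2(A)$. The exact inverse used by ANOPA may differ (for instance by also undoing the small-sample corrections inside the transform), so `inv_anscombe()` is a hypothetical helper, not the package's function.

```{r}
# Hypothetical helper: simple back-transform from the Anscombe scale
# (0 to pi/2) to a proportion (0 to 1); small-sample corrections are ignored
inv_anscombe <- function(A) sin(A)^2

ci_A <- c(lower = 0.55, upper = 0.85)  # hypothetical interval on the Anscombe scale
inv_anscombe(ci_A)                     # the same interval on the proportion scale
```

Because $\sin^2$ is increasing on $[0, \pi/2]$, each bound can be back-transformed separately and the interval keeps its order, although it is generally no longer symmetric around the point estimate.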

This is it.

## Complicated?

Well, not really:

```{r, message=FALSE, warning=FALSE, fig.width=5, fig.height=3, fig.cap="**Figure 1**. The proportions as a function of Class and Difficulty. Error bars show difference-adjusted 95% confidence intervals."}
library(ANOPA)
# {success;total}: number of successes and number of trials in each cell
w <- anopa( {success;total} ~ Class * Difficulty, twoWayExample)
anopaPlot(w)   # proportions with difference-adjusted 95% confidence intervals
```

Because the analysis reported by ``summary(w)`` suggests that only the factor `Difficulty` has a significant effect, you may select only that factor for plotting, e.g.,

```{r, message=FALSE, warning=FALSE, fig.width=4, fig.height=3, fig.cap="**Figure 2**. The proportions as a function of Difficulty only. Error bars show difference-adjusted 95% confidence intervals."}
anopaPlot(w, ~ Difficulty )
```

As is the case with any ``ggplot2`` figure, you can customize it at will. For example,

```{r, message=FALSE, warning=FALSE, fig.width=4, fig.height=3, fig.cap="**Figure 3**. Same as Figure 2 with some visual improvements."}
library(ggplot2)
anopaPlot(w, ~ Difficulty) +
  theme_bw() +                                                  # change theme
  scale_x_discrete(limits = c("Easy", "Moderate", "Difficult")) # change order
```

As you can see from this plot, the effect of Difficulty is marked, and the two conditions that differ the most are Easy and Difficult.

Here you go.

# References