--- title: "Measuring functional dependency model-free" author: "Joe Song" date: "Updated September 5, 2019; October 24, 2018; Created April 21, 2016" output: rmarkdown::html_vignette #pdf_document: #latex_engine: xelatex vignette: > %\VignetteIndexEntry{Measuring functional dependency model-free} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- Given an input contingency table, `fun.chisq.test()` offers three quantities to evaluate non-parametric functional dependency of the column variable $Y$ on the row variable $X$. They include the functional chi-squared test statistic ($\chi^2_f$), statistical significance ($p$-value), and effect size (function index $\xi_f$). We explain their differences in analogy to those statistics returned from `cor.test()`, the R function for the test of correlation, and the $t$-test. We chose both tests because they are widely used and well understood. Another choice could be the Pearson's chi-squared test plus a statistic called Cramer's V, analogous to correlation coefficient, but not as popularly used. The table below summarizes the differences among the quantities and their analogous counterparts in correlation and $t$ tests. Table: **Comparison of the three quantities returned from `fun.chisq.test()`.** ---------------------------------------------------------------------------------- Measure Affected Affected Measure Counterpart in Counterpart in functional by sample by table statistical correlation two-sample Quantity dependency? size? size? significance? test $t$-test ------------ ------------- ---------- ----------- --------------- --------------- --------------- $\chi^2_f$ Yes Yes Yes No $t$-statistic $t$-statistic $p$-value Yes Yes Yes Yes $p$-value $p$-value $\xi_f$ Yes No No No correlation coefficient mean difference ---------------------------------------------------------------------------------- The test statistic $\chi^2_f$ measures deviation of $Y$ from a uniform distribution contributed by $X$. It is maximized when there is a functional relationship from $X$ to $Y$. This statistic is also affected by sample size and the size of the contingency table. It summarizes the strength of both functional dependency and support from the sample. A strong function supported by few samples may have equal $\chi^2_f$ to a weak function supported by many samples. It is analogous to the test statistic (not to be confused with correlation coefficient) in `cor.test()`, or the $t$ statistic from the $t$-test. The $p$-value of $\chi^2_f$ overcomes the table size factor and making tables of different sizes or sample sizes comparable. However, its null distribution (chi-squared or normalized) is only asymptotically true. It is analogous to the role of the $p$-value of `cor.test()`. The function index $\xi_f$ measures *only* the strength of functional dependency normalized by sample and table sizes without considering statistical significance. When the sample size is small, the index can be unreliable; when the sample size is large, it is a direct measure of functional dependency and is comparable across tables. It is analogous to the role of correlation coefficient in `cor.test()`, or fold change in $t$-test for differential gene expression analysis.