--- title: "semhelpinghands" author: "Shu Fai Cheung" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{semhelpinghands} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` # Introduction At the time of writing, [`semhelpinghands`](https://sfcheung.github.io/semhelpinghands/) has these groups of functions: manipulating parameter estimates tables, comparing results from different methods, bootstrapping, and ... others. This article will only introduce very briefly the groups. Some of them will be introduced in details in forthcoming article dedicated to them. Let's load the package first, and also load `lavaan`. ```{r setup} library(semhelpinghands) library(lavaan) ``` # Manipulate Parameter Estimates Tables In using `lavaan`, I prefer reading the output of `parameterEstimates()`, which is compact and clear to me. I sometimes would like to organize the rows and columns in ways meaningful to a particular research questions. This can certainly be done using base R or `dplyr`. However, I am lazy and want to be able to do things using just one or two functions, with just one or two arguments. This is a sample dataset for illustration, `dvs_ivs`, with 3 predictors (`x1`, `x2`, and `x3`), 3 outcome variables (`y1` `y2`, and `y3`), and a group variable (`gp`). First a single sample model: ```{r} data(dvs_ivs) mod <- " y1 ~ x1 + x2 + x3 y2 ~ x1 + x3 y3 ~ y2 + x2 " fit <- sem(model = mod, data = dvs_ivs, fixed.x = FALSE) ``` The parameter estimates tables: ```{r} est <- parameterEstimates(fit) est ``` A two-sample model is also fitted to the dataset: ```{r} fit_gp <- sem(model = mod, data = dvs_ivs, group = "gp", fixed.x = FALSE) ``` This is the parameter estimates table: ```{r} est_gp <- parameterEstimates(fit_gp) est_gp ``` So, these are what `semhelpinghands` has for now: ## Add Significance Test Results: `add_sig()` Despite the controversy over null hypothesis significance testing, we still need to report them, for now. They are there, but I have to decode them mentally. Here comes `add_sig()`: ```{r} add_sig(est) ``` If bootstrapping was used, it also supports using the confidence intervals, which may yield results different from the *p*-values when bootstrapping confidence intervals are used (but same results in this example): ```{r} add_sig(est, use = c("pvalue", "ci")) ``` It also works on the standardized solution: ```{r} std <- standardizedSolution(fit) add_sig(std) ``` Note: But be careful about the interpretation of the *p*-values in the standardized solution, which are based on the delta method. See `vignette("standardizedSolution_boot_ci")`. See the help page of `add_sig()` for other options. ## Filter by Operators and Some Other Columns: `filter_by()` Its purpose is very simple, selecting rows by commonly used columns: operators (`op`), "dependent variables" (`lhs`), and "independent variables" (`rhs`). ```{r} filter_by(est, op = "~") ``` It also supports filtering by group using group labels: ```{r} filter_by(est_gp, op = "~", group = "gp1", fit = fit_gp) ``` See the help page of `filter_by()` for other options. ## Group By DVs or Group By IVs: `group_by_dvs()` and `group_by_ivs()` Sometimes I want the conventional iv-column-dv-row or dv-column-iv-row format. 
## Group By DVs or Group By IVs: `group_by_dvs()` and `group_by_ivs()`

Sometimes I want the conventional IV-column-DV-row or DV-column-IV-row format. That's what `group_by_dvs()` and `group_by_ivs()` are for:

```{r}
group_by_dvs(est)
group_by_ivs(est)
```

They also support extracting another column:

```{r}
group_by_dvs(est, col_name = "pvalue")
group_by_ivs(est, col_name = "pvalue")
```

See the help pages of `group_by_dvs()` and `group_by_ivs()` for other options.

## Group By Groups: `group_by_groups()`

In multiple-sample models, one common task is comparing results across groups. I wrote `group_by_groups()` for this task, to compare results side-by-side:

```{r}
group_by_groups(est_gp)
```

It also supports extracting several columns:

```{r}
group_by_groups(est_gp, col_names = c("est", "pvalue"))
```

If the fit object is used, it can print group labels:

```{r}
group_by_groups(fit_gp, col_names = c("est", "pvalue"))
```

See the help page of `group_by_groups()` for other options.

## Compare Models: `group_by_models()`

This is inspired by a proposal by Rönkkö in a GitHub [issue](https://github.com/simsem/semTools/issues/24#issue-235172313) for `semTools`. I wanted something simple for a quick overview, so I wrote `group_by_models()`.

Suppose this is the other model fitted:

```{r}
mod2 <-
"
y1 ~ x1 + x2 + x3
y2 ~ x1 + x2
y3 ~ y2 + x1
"
fit2 <- sem(model = mod2,
            data = dvs_ivs,
            fixed.x = FALSE)
est2 <- parameterEstimates(fit2)
est2
```

These two models are not nested. To compare the estimates, `group_by_models()` can be used:

```{r}
group_by_models(list(Model1 = est,
                     Model2 = est2))
```

It can also compare several columns:

```{r}
group_by_models(list(Model1 = est,
                     Model2 = est2),
                col_names = c("est", "pvalue"))
```

See the help page of `group_by_models()` for other options.

## Sort Rows: `sort_by()`

The function `sort_by()` can be used to sort rows by (a) common fields such as `"op"`, `"lhs"`, and `"rhs"`, and (b) operators such as `"~"` and `"~~"`. It can be used on the output of some other functions that manipulate a parameter estimates table.

```{r}
out <- group_by_groups(est_gp,
                       col_names = c("est", "pvalue"))
out <- filter_by(out, op = c("~", "~~"))
sort_by(out, by = c("op", "rhs"))
```

Some functions, such as `group_by_models()`, automatically call `sort_by()` to sort the results. The default order should be acceptable in most cases, but it can also be customized. See the help page of `sort_by()` for how to customize the order.

## Piping

Though not officially supported, piping using `|>` can be used with most of the functions that manipulate parameter estimates tables. For example:

```{r}
est_gp |>
  add_sig() |>
  group_by_groups(col_names = c("est", "pvalue", "sig"),
                  group_first = FALSE) |>
  filter_by(op = c("~"))
```

# Compare Methods

Though I believe the choice of the estimation method should be justified, suppose we want to assess how sensitive the parameter estimates are to the method used. `compare_estimators()` can be used as a quick way to compare results from different methods.

```{r}
out <- compare_estimators(fit,
                          estimator = c("ML", "GLS", "MLR"))
group_by_models(out,
                col_names = c("se", "pvalue"))
```

It simply refits the model with each estimator and returns the results. They can then be treated as different "models" and processed by `group_by_models()`.

`se_ratios()` is a wrapper of `group_by_models()` for comparing the standard errors from different estimators in the output of `compare_estimators()`:

```{r}
se_ratios(out, reference = "ML")
```

See the help page of `compare_estimators()` for other options.

# Bootstrapping

One issue with the standardized solution is the confidence intervals: they are based on the delta method even if `se = "boot"` is used. For indirect effects, for which bootstrap confidence intervals are commonly used, the confidence intervals in the standardized solution are therefore not what is usually reported by other tools for mediation.

There are other powerful tools on the Internet and in the Google Group for lavaan (see [this thread](https://groups.google.com/g/lavaan/c/0RSsh4M6zQg/m/I85YFAiLAAAJ)) that address this problem. I wrote `standardizedSolution_boot_ci()` not to replace them (it obviously can't), but to address a very specific case I usually encounter myself: generating bootstrap confidence intervals for the standardized estimates based on the bootstrap estimates already generated when `se = "boot"` is used.

Please see `vignette("standardizedSolution_boot_ci")` for an illustration of `standardizedSolution_boot_ci()`.
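A minimal sketch of this workflow follows. It is not evaluated here because bootstrapping can take some time, and the number of bootstrap samples is kept small purely for illustration; see the dedicated vignette for details and options.

```{r, eval = FALSE}
# Fit the model with bootstrapping; use many more bootstrap samples
# (e.g., 2000 or more) in a real analysis
fit_boot <- sem(model = mod,
                data = dvs_ivs,
                fixed.x = FALSE,
                se = "boot",
                bootstrap = 100)
# Form bootstrap confidence intervals for the standardized solution,
# based on the bootstrap estimates already generated by se = "boot"
standardizedSolution_boot_ci(fit_boot)
```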
Another issue is examining the bootstrap estimates themselves, such as the distribution of the estimates. The function `plot_boot()` and related functions can be used to compute bootstrap estimates and plot them. The bootstrap estimates of free parameters (stored by `lavaan`), user-defined parameters (computed by `store_boot_def()`), and the standardized solution (computed by `store_boot_est_std()`) can be plotted. Please see the [article](https://sfcheung.github.io/semhelpinghands/articles/plot_boot.html) on `plot_boot()` for how to use this function.

# Others

## Showing the Options in a Model

`lavaan::lavaan()` and its wrappers, such as `lavaan::sem()` and `lavaan::cfa()`, allow users to set several options using the `estimator` argument: `ML`, `GLS`, `WLSMV`, and others. However, it is not easy to remember which options are set by an estimator. Instead of finding them in the output of `summary()`, `show_more_options()` shows some of them in one table for a quick overview. This is an example:

```{r}
data(dvs_ivs)
mod <-
"
y1 ~ x1 + x2 + x3
y2 ~ x1 + x3
y3 ~ y2 + x2
"
fit_default <- sem(model = mod,
                   data = dvs_ivs)
show_more_options(fit_default)
fit_MLR <- sem(model = mod,
               data = dvs_ivs,
               estimator = "MLR")
show_more_options(fit_MLR)
fit_MLR_fiml <- sem(model = mod,
                    data = dvs_ivs,
                    estimator = "MLR",
                    missing = "fiml")
show_more_options(fit_MLR_fiml)
```

## Recording Minimization History

In structural equation modeling, a closed-form solution is rare, and optimization (minimization) is used to find the (or a) solution. Out of curiosity and for teaching, I wrote a function to capture the minimization history so that I can examine it or even plot the process. The function, `record_history()`, is still in early development but should work for common simple scenarios. Please refer to the help page of `record_history()` to learn more about it.

## Add Covariances Between "Exogenous" Variables

In `lavaan`, I rarely need to manually add covariances between exogenous variables (defined in a loose sense: variables that appear on the right-hand side but not on the left-hand side of `~`). However, I came across a situation in which `lavaan` will not add them (for good reasons): for example, when a covariance between a residual term and an exogenous variable is set free. I wrote two simple functions, `add_exo_cov()` and `auto_exo_cov()`, for this purpose. Please refer to their help pages for further information.
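The sketch below shows how they might be used. It is not evaluated here, and the signatures are assumptions on my part: both functions are assumed to accept the model syntax as their first argument, with `auto_exo_cov()` returning only the generated covariance lines and `add_exo_cov()` returning the full syntax with the covariances appended. Their help pages document the exact interfaces.

```{r, eval = FALSE}
# Assumed usage; see the help pages for the exact interface.
# Generate the covariance syntax for the exogenous variables only:
cat(auto_exo_cov(mod))
# Append the covariances to the model syntax and fit the model:
mod_cov <- add_exo_cov(mod)
fit_cov <- sem(model = mod_cov,
               data = dvs_ivs)
```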