Improvements to lavaanPlot

Alex Lishinski

2024-01-29

I’ve been working on some improvements to the lavaanPlot package to take advantage of updates to the DiagrammeR package.

I’ll spare you the details, but DiagrammeR has introduced a way of building graphs from data frames that define the nodes and edges, which makes customizing plots much more extensible. I am bringing that flexibility to the lavaanPlot package. The old design of the package is solid, but adding new features to it is difficult. The goal of the new approach is to unlock the full range of customization options that the Graphviz software and the DOT language have to offer.
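
To give a feel for the data frame interface mentioned above, here is a minimal sketch using DiagrammeR directly. It only illustrates the general idea of node and edge data frames; it is not how lavaanPlot builds its plots internally, and the variable names are picked from mtcars purely for illustration.

```r
library(DiagrammeR)

# Node data frame: one row per variable, with formatting attributes as columns
nodes <- create_node_df(
  n = 3,
  label = c("cyl", "disp", "mpg"),
  shape = rep("box", 3),
  fontname = rep("Helvetica", 3)
)

# Edge data frame: from/to refer to node ids (the row numbers of `nodes`)
edges <- create_edge_df(
  from = c(1, 2),
  to = c(3, 3),
  color = rep("grey", 2)
)

# Combine the two data frames into a graph and draw it
graph <- create_graph(nodes_df = nodes, edges_df = edges)
render_graph(graph)
```

Because every attribute is just a column, changing the look of a subset of nodes or edges comes down to editing rows of a data frame.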

I’ve tried to keep as much of the old approach as I could, but this new version of the package has some new elements in its user interface. I’m not finished with everything I set out to accomplish, but I’m writing this vignette to introduce the new version so people can give it a try and find issues that I can fix. I’m releasing this as version 0.7.0. The plan is to fix issues and flesh out the functionality over a couple more iterations, leading up to a mature 1.0.0 release, at which point I hope to fully deprecate the old code.

Here are some examples using the new code. The new function is called lavaanPlot2.

We start with a basic model using mtcars that contains only observed variable relationships and no latent variables.

library(lavaan)
library(lavaanPlot)

model <- 'mpg ~ cyl + disp + hp
          qsec ~ disp + hp + wt'

fit <- sem(model, data = mtcars)
summary(fit)
## lavaan 0.6.15 ended normally after 32 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                         9
## 
##   Number of observations                            32
## 
## Model Test User Model:
##                                                       
##   Test statistic                                18.266
##   Degrees of freedom                                 2
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Regressions:
##                    Estimate   Std.Err  z-value  P(>|z|)
##   mpg ~                                                
##     cyl               -0.987    0.738   -1.337    0.181
##     disp              -0.021    0.010   -2.178    0.029
##     hp                -0.017    0.014   -1.218    0.223
##   qsec ~                                               
##     disp              -0.008    0.004   -2.122    0.034
##     hp                -0.023    0.004   -5.229    0.000
##     wt                 1.695    0.398    4.256    0.000
## 
## Covariances:
##                    Estimate   Std.Err  z-value  P(>|z|)
##  .mpg ~~                                               
##    .qsec               0.447    0.511    0.874    0.382
## 
## Variances:
##                    Estimate   Std.Err  z-value  P(>|z|)
##    .mpg                8.194    2.049    4.000    0.000
##    .qsec               0.996    0.249    4.000    0.000
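
The second example is a confirmatory factor analysis with three latent variables, fit to the HolzingerSwineford1939 data:

HS.model <- ' visual  =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed   =~ x7 + x8 + x9
'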

fit2 <- cfa(HS.model, data=HolzingerSwineford1939)
summary(fit2)
## lavaan 0.6.15 ended normally after 35 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        21
## 
##   Number of observations                           301
## 
## Model Test User Model:
##                                                       
##   Test statistic                                85.306
##   Degrees of freedom                                24
##   P-value (Chi-square)                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                             Standard
##   Information                                 Expected
##   Information saturated (h1) model          Structured
## 
## Latent Variables:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   visual =~                                           
##     x1                1.000                           
##     x2                0.554    0.100    5.554    0.000
##     x3                0.729    0.109    6.685    0.000
##   textual =~                                          
##     x4                1.000                           
##     x5                1.113    0.065   17.014    0.000
##     x6                0.926    0.055   16.703    0.000
##   speed =~                                            
##     x7                1.000                           
##     x8                1.180    0.165    7.152    0.000
##     x9                1.082    0.151    7.155    0.000
## 
## Covariances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##   visual ~~                                           
##     textual           0.408    0.074    5.552    0.000
##     speed             0.262    0.056    4.660    0.000
##   textual ~~                                          
##     speed             0.173    0.049    3.518    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .x1                0.549    0.114    4.833    0.000
##    .x2                1.134    0.102   11.146    0.000
##    .x3                0.844    0.091    9.317    0.000
##    .x4                0.371    0.048    7.779    0.000
##    .x5                0.446    0.058    7.642    0.000
##    .x6                0.356    0.043    8.277    0.000
##    .x7                0.799    0.081    9.823    0.000
##    .x8                0.488    0.074    6.573    0.000
##    .x9                0.566    0.071    8.003    0.000
##     visual            0.809    0.145    5.564    0.000
##     textual           0.979    0.112    8.737    0.000
##     speed             0.384    0.086    4.451    0.000
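
The latent variables in this second model get their own labels, stored as a named character vector for use in the plots below:

labels2 <- c(visual = "Visual Ability", textual = "Textual Ability", speed = "Speed Ability")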

You can still plot the model using no additional options:

lavaanPlot2(fit)

You can still add labels to the plot with the labels argument, although it now uses a named character vector instead of a list:

labels <- c(mpg = "Miles Per Gallon", cyl = "Cylinders", disp = "Displacement", hp = "Horsepower", qsec = "Speed", wt = "Weight")

lavaanPlot2(fit, labels = labels)
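
If you already have labels stored as a list from the old interface, a short conversion along these lines should work. This is a sketch that assumes every list element is a single string; old_labels and new_labels are names made up for the example.

```r
# Old-style labels, stored as a named list
old_labels <- list(mpg = "Miles Per Gallon", cyl = "Cylinders")

# unlist() flattens a named list of single strings into a named character vector
new_labels <- unlist(old_labels)
new_labels
```

The result can then be passed to the labels argument of lavaanPlot2.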

Graph options, node options, and edge options are supplied via named lists, as previously:

lavaanPlot2(
  fit,
  labels = labels,
  graph_options = list(label = "my first graph", rankdir = "LR"),
  node_options = list(fontname = "Helvetica"),
  edge_options = list(color = "grey")
)

One change to the interface is how you indicate which model paths to include in the plot, which is now done with the include argument. By default, only regression and latent variable relationships are plotted. Setting include = "covs" adds model covariances, and include = "all" also adds error variances.

lavaanPlot2(
  fit,
  include = "covs",
  labels = labels,
  graph_options = list(label = "including covariances"),
  node_options = list(fontname = "Helvetica"),
  edge_options = list(color = "grey")
)

lavaanPlot2(
  fit,
  include = "all",
  labels = labels,
  graph_options = list(label = "including error variances"),
  node_options = list(fontname = "Helvetica"),
  edge_options = list(color = "grey")
)
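
To see which parameters these categories refer to, you can look at the fitted model's parameter table from lavaan. The sketch below splits it by operator. The mapping from each group to an include setting is my reading of the description above, not something taken from the package documentation, and the table also lists covariances among the exogenous predictors, which a plot may or may not draw.

```r
library(lavaan)

model <- 'mpg ~ cyl + disp + hp
          qsec ~ disp + hp + wt'
fit <- sem(model, data = mtcars)

# One row per model parameter; `op` identifies the kind of relationship
pe <- parameterEstimates(fit)

# Regressions: plotted by default
regressions <- pe[pe$op == "~", ]

# Covariances between different variables (assumed: added by include = "covs")
covariances <- pe[pe$op == "~~" & pe$lhs != pe$rhs, ]

# Variances of single variables (assumed: added by include = "all")
variances <- pe[pe$op == "~~" & pe$lhs == pe$rhs, ]

regressions[, c("lhs", "op", "rhs", "est")]
```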

Coefficient labels can still be included on the edges, selectively for the different parts of the plot, using the coef_labels argument:

lavaanPlot2(
  fit,
  include = "covs",
  coef_labels = TRUE,
  labels = labels,
  graph_options = list(label = "including coefficient labels"),
  node_options = list(fontname = "Helvetica"),
  edge_options = list(color = "grey")
)

And significance stars can be added to these coefficient labels using the stars argument, just as with lavaanPlot:

lavaanPlot2(
  fit,
  include = "covs",
  labels = labels,
  graph_options = list(label = "my first graph with significance stars"),
  node_options = list(fontname = "Helvetica"),
  edge_options = list(color = "grey"),
  stars = c("regress"),
  coef_labels = TRUE
)

lavaanPlot2(
  fit2,
  include = "covs",
  labels = labels2,
  graph_options = list(label = "my first graph with significance stars"),
  node_options = list(fontname = "Helvetica"),
  edge_options = list(color = "grey"),
  stars = c("latent"),
  coef_labels = TRUE
)

lavaanPlot2(
  fit2,
  include = "covs",
  labels = labels2,
  graph_options = list(label = "my first graph, which is being used to illustrate how to use the new code in the lavaanPlot package"),
  node_options = list(fontname = "Helvetica"),
  edge_options = list(color = "grey"),
  stars = c("covs"),
  coef_labels = TRUE
)
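
For readers unfamiliar with significance stars, the idea can be sketched in a few lines of base R. The cutoffs here are the conventional ones (0.05, 0.01, 0.001) and are an assumption on my part; this vignette does not state which thresholds the package uses. p_to_stars is a hypothetical helper, not a lavaanPlot function.

```r
# Hypothetical helper using conventional cutoffs (assumed, not taken from lavaanPlot)
p_to_stars <- function(p) {
  ifelse(is.na(p), "",
  ifelse(p < 0.001, "***",
  ifelse(p < 0.01, "**",
  ifelse(p < 0.05, "*", ""))))
}

# p-values of the mpg regressions, copied from the summary(fit) output above
p_mpg <- c(cyl = 0.181, disp = 0.029, hp = 0.223)
p_to_stars(p_mpg)
```

Under these cutoffs only the disp coefficient would get a star.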

The next stage of development is to allow for subset formatting, where different formatting is applied to different sets of nodes or edges. The most obvious cases are formatting latent and observed nodes differently, and formatting the regression, latent, covariance, and error variance edges differently. Ideally, though, I want users to be able to apply different formatting to arbitrary subsets of nodes or edges as they see fit.
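
As a rough illustration of what subset formatting could look like at the data level, the sketch below builds a table of nodes whose shape depends on whether the variable is latent or observed. This is hypothetical: node_format and its columns are made up for the example and are not the planned lavaanPlot interface. Only lavNames() is an existing lavaan function.

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
fit2 <- cfa(HS.model, data = HolzingerSwineford1939)

# Variable names by type, taken from the fitted model
latent   <- lavNames(fit2, type = "lv")
observed <- lavNames(fit2, type = "ov")

# Hypothetical node table: one row per variable, with a formatting
# column that differs between latent and observed nodes
node_format <- data.frame(
  label = c(latent, observed),
  shape = c(rep("oval", length(latent)), rep("box", length(observed))),
  stringsAsFactors = FALSE
)
node_format
```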