In this example, we will show how to use lslx
to conduct
semi-confirmatory structural equation modeling. The example uses data
PoliticalDemocracy
in the package lavaan
.
Hence, lavaan
must be installed.
In the following specification, x1
- x3
and
y1
- y8
is assumed to be measurements of 3
latent factors: ind60
, dem60
, and
dem65
.
<- "fix(1) * x1 + x2 + x3 <=: ind60
model_sem fix(1) * y1 + y2 + y3 + y4 <=: dem60
fix(1) * y5 + y6 + y7 + y8 <=: dem65
dem60 <= ind60
dem65 <= ind60 + dem60"
The operator <=:
means that the RHS latent factors is
defined by the LHS observed variables. In particular, the loadings are
freely estimated. In this model, ind60
is measured by
x1
- x3
, dem60
is mainly measured
by y1
- y4
, and dem65
is mainly
measured by y5
- y8
. The operator
<=
means that the regression coefficients from the RHS
variables to the LHS variables are freely estimated. In this model,
dem60
is influenced by ind60
and
dem65
is influenced by dem60
and
ind60
. Details of model syntax can be found in the section
of Model Syntax via ?lslx
.
lslx
is written as an R6
class. Everytime
we conduct analysis with lslx
, an lslx
object
must be initialized. The following code initializes an lslx
object named lslx_sem
.
library(lslx)
<- lslx$new(model = model_sem,
lslx_sem sample_cov = cov(lavaan::PoliticalDemocracy),
sample_size = nrow(lavaan::PoliticalDemocracy))
An 'lslx' R6 class is initialized via 'sample_cov' argument.
Response Variables: x1 x2 x3 y1 y2 y3 y4 y5 y6 y7 y8
Latent Factors: ind60 dem60 dem65
Here, lslx
is the object generator for lslx
object and $new()
is the build-in method of
lslx
to generate a new lslx
object. The
initialization of lslx
requires users to specify a model
for model specification (argument model
) and a sample
moments to be fitted (argument sample_cov
and
sample_size
). The sample moment must contain all the
observed variables specified in the given model.
After an lslx
object is initialized, model can be
respecified by $free_coefficient()
,
$fix_coefficient()
, and
$penalize_coefficient()
methods. The following code sets
y1<->y5
, y2<->y4
,
y2<->y6
, y3<->y7
,
y4<->y8
, and y6<->y8
as penalized
parameters.
$penalize_coefficient(name = c("y1<->y5",
lslx_sem"y2<->y4",
"y2<->y6",
"y3<->y7",
"y4<->y8",
"y6<->y8"))
The relation y5<->y1 under g is set as PENALIZED with starting value = 0.
The relation y4<->y2 under g is set as PENALIZED with starting value = 0.
The relation y6<->y2 under g is set as PENALIZED with starting value = 0.
The relation y7<->y3 under g is set as PENALIZED with starting value = 0.
The relation y8<->y4 under g is set as PENALIZED with starting value = 0.
The relation y8<->y6 under g is set as PENALIZED with starting value = 0.
To see more methods for respecifying model, please check the section
of Set-Related Method via ?lslx
.
After an lslx
object is initialized, method
$fit_lasso()
can be used to fit the specified model into
the given data with LASSO penalty function.
$fit_lasso() lslx_sem
CONGRATS: Algorithm converges under EVERY specified penalty level.
Specified Tolerance for Convergence: 0.001
Specified Maximal Number of Iterations: 100
The $fit_lasso()
requires users to specify the
considered penalty levels (argument lambda_grid
). In this
example, the lambda grid is automatically initialized by default. Note
that MCP with delta = Inf
is equivalent to the LASSO
penalty. All the fitting result will be stored in the
fitting
field of lslx_sem
.
Unlike traditional SEM analysis, lslx
fit the model into
data under all the penalty levels considered. To summarize the fitting
result, a selector to determine an optimal penalty level must be
specified. Available selectors can be found in the section of Penalty
Level Selection via ?lslx
. The following code summarize the
fitting result under the penalty level selected by adjusted Bayesian
information criterion (ABIC).
$summarize(selector = "abic") lslx_sem
General Information
number of observations 75
number of complete observations 75
number of missing patterns none
number of groups 1
number of responses 11
number of factors 3
number of free coefficients 36
number of penalized coefficients 6
Numerical Conditions
selected lambda 0.058
selected delta none
selected step none
objective value 0.806
objective gradient absolute maximum 0.001
objective Hessian convexity 0.027
number of iterations 15.000
loss value 0.593
number of non-zero coefficients 42.000
degrees of freedom 35.000
robust degrees of freedom NaN
scaling factor NaN
Fit Indices
root mean square error of approximation (rmsea) 0.060
comparative fit index (cfi) 0.986
non-normed fit index (nnfi) 0.978
standardized root mean of residual (srmr) 0.049
Likelihood Ratio Test
statistic df p-value
unadjusted 44.468 35.000 0.131
mean-adjusted - - -
Root Mean Square Error of Approximation Test
estimate lower upper
unadjusted 0.060 0.000 0.116
mean-adjusted NaN NaN NaN
Coefficient Test (Std.Error = "observed_information")
Factor Loading
type estimate std.error z-value P(>|z|) lower upper
x1<-ind60 fixed 1.000 - - - - -
x2<-ind60 free 2.181 0.139 15.679 0.000 1.908 2.454
x3<-ind60 free 1.819 0.152 11.945 0.000 1.520 2.117
y1<-dem60 fixed 1.000 - - - - -
y2<-dem60 free 1.318 0.181 7.278 0.000 0.963 1.673
y3<-dem60 free 1.062 0.148 7.155 0.000 0.771 1.353
y4<-dem60 free 1.302 0.149 8.763 0.000 1.011 1.593
y5<-dem65 fixed 1.000 - - - - -
y6<-dem65 free 1.223 0.169 7.250 0.000 0.892 1.553
y7<-dem65 free 1.293 0.159 8.123 0.000 0.981 1.605
y8<-dem65 free 1.301 0.162 8.023 0.000 0.983 1.619
Regression
type estimate std.error z-value P(>|z|) lower upper
dem60<-ind60 free 1.463 0.392 3.737 0.000 0.696 2.230
dem65<-ind60 free 0.537 0.225 2.385 0.017 0.096 0.978
dem65<-dem60 free 0.845 0.099 8.508 0.000 0.650 1.040
Covariance
type estimate std.error z-value P(>|z|) lower upper
y5<->y1 pen 0.534 0.331 1.614 0.107 -0.114 1.183
y4<->y2 pen 0.576 0.604 0.952 0.341 -0.609 1.760
y6<->y2 pen 1.311 0.607 2.160 0.031 0.122 2.501
y7<->y3 pen 0.300 0.590 0.508 0.611 -0.857 1.457
y8<->y4 pen 0.077 0.423 0.182 0.856 -0.753 0.906
y8<->y6 pen 0.896 0.489 1.832 0.067 -0.063 1.855
Variance
type estimate std.error z-value P(>|z|) lower upper
ind60<->ind60 free 0.454 0.088 5.169 0.000 0.282 0.627
dem60<->dem60 free 3.931 0.929 4.232 0.000 2.110 5.752
dem65<->dem65 free 0.158 0.213 0.743 0.458 -0.259 0.575
x1<->x1 free 0.083 0.020 4.137 0.000 0.044 0.122
x2<->x2 free 0.121 0.071 1.702 0.089 -0.018 0.260
x3<->x3 free 0.473 0.090 5.232 0.000 0.296 0.651
y1<->y1 free 1.947 0.425 4.582 0.000 1.114 2.779
y2<->y2 free 6.337 1.085 5.842 0.000 4.211 8.463
y3<->y3 free 5.134 0.933 5.500 0.000 3.305 6.964
y4<->y4 free 2.805 0.649 4.324 0.000 1.534 4.077
y5<->y5 free 2.373 0.454 5.231 0.000 1.484 3.263
y6<->y6 free 4.327 0.727 5.950 0.000 2.902 5.752
y7<->y7 free 3.393 0.681 4.980 0.000 2.058 4.729
y8<->y8 free 2.925 0.603 4.853 0.000 1.744 4.107
Intercept
type estimate std.error z-value P(>|z|) lower upper
x1<-1 free 0.000 0.085 0.000 1.000 -0.166 0.166
x2<-1 free 0.000 0.174 0.000 1.000 -0.342 0.342
x3<-1 free 0.000 0.162 0.000 1.000 -0.318 0.318
y1<-1 free 0.000 0.302 0.000 1.000 -0.592 0.592
y2<-1 free 0.000 0.445 0.000 1.000 -0.872 0.872
y3<-1 free 0.000 0.377 0.000 1.000 -0.739 0.739
y4<-1 free 0.000 0.385 0.000 1.000 -0.755 0.755
y5<-1 free 0.000 0.300 0.000 1.000 -0.589 0.589
y6<-1 free 0.000 0.381 0.000 1.000 -0.747 0.747
y7<-1 free 0.000 0.379 0.000 1.000 -0.742 0.742
y8<-1 free 0.000 0.372 0.000 1.000 -0.729 0.729
In this example, we can see that the PL estimate under the selected
penalty level doesn’t contain any zero value, which indicates that all
of the covariance of measurements are relevant. The
$summarize()
method also shows the result of significance
tests for the coefficients. In lslx
, the default standard
errors are calculated based on sandwich formula whenever raw data is
available. In this example, because raw data is not used for
lslx
object initialization, standard error is calculated by
using observed Fisher information matrix. It may not be valid when the
model is misspecified and the data are not normal. Also, it is generally
invalid after choosing a penalty level.
In lslx
, many quantities related to SEM can be extracted
by extract-related method. For example, the coefficient estimate and its
asymptotic variance can be obtained by
$extract_coefficient(selector = "abic", type = "effective") lslx_sem
x1<-1/g x2<-1/g x3<-1/g y1<-1/g y2<-1/g y3<-1/g
0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
y4<-1/g y5<-1/g y6<-1/g y7<-1/g y8<-1/g dem60<-ind60/g
0.0000 0.0000 0.0000 0.0000 0.0000 1.4631
dem65<-ind60/g dem65<-dem60/g x2<-ind60/g x3<-ind60/g y2<-dem60/g y3<-dem60/g
0.5368 0.8452 2.1809 1.8186 1.3182 1.0620
y4<-dem60/g y6<-dem65/g y7<-dem65/g y8<-dem65/g ind60<->ind60/g dem60<->dem60/g
1.3018 1.2229 1.2933 1.3011 0.4544 3.9313
dem65<->dem65/g x1<->x1/g x2<->x2/g x3<->x3/g y1<->y1/g y5<->y1/g
0.1579 0.0828 0.1209 0.4732 1.9467 0.5342
y2<->y2/g y4<->y2/g y6<->y2/g y3<->y3/g y7<->y3/g y4<->y4/g
6.3369 0.5756 1.3112 5.1343 0.2998 2.8054
y8<->y4/g y5<->y5/g y6<->y6/g y8<->y6/g y7<->y7/g y8<->y8/g
0.0769 2.3734 4.3267 0.8960 3.3933 2.9252
diag(lslx_sem$extract_coefficient_acov(selector = "abic", type = "effective"))
x1<-1/g x2<-1/g x3<-1/g y1<-1/g y2<-1/g y3<-1/g
0.00716 0.03043 0.02635 0.09134 0.19811 0.14220
y4<-1/g y5<-1/g y6<-1/g y7<-1/g y8<-1/g dem60<-ind60/g
0.14822 0.09025 0.14533 0.14327 0.13822 0.15327
dem65<-ind60/g dem65<-dem60/g x2<-ind60/g x3<-ind60/g y2<-dem60/g y3<-dem60/g
0.05066 0.00987 0.01935 0.02318 0.03280 0.02203
y4<-dem60/g y6<-dem65/g y7<-dem65/g y8<-dem65/g ind60<->ind60/g dem60<->dem60/g
0.02207 0.02845 0.02535 0.02630 0.00773 0.86307
dem65<->dem65/g x1<->x1/g x2<->x2/g x3<->x3/g y1<->y1/g y5<->y1/g
0.04524 0.00040 0.00504 0.00818 0.18047 0.10952
y2<->y2/g y4<->y2/g y6<->y2/g y3<->y3/g y7<->y3/g y4<->y4/g
1.17651 0.36540 0.36838 0.87136 0.34834 0.42092
y8<->y4/g y5<->y5/g y6<->y6/g y8<->y6/g y7<->y7/g y8<->y8/g
0.17914 0.20582 0.52871 0.23928 0.46422 0.36338
Here, the type
argument is used to specify which types
of parameters are used to calculate related quantities.
type = "effective"
indicates that only freely estimated and
penalized non-zero parameters are used. By default,
type = "all"