A paper that describes the variable importance measures in more detail should be available soon.
L0-penalization based modified variable importance is defined in the following way
\[mVI(i|X, y, \lambda) = \min_{\beta:\beta_i \neq 0} Q(\beta|X, y, \lambda) - \min_{\beta:\beta_i = 0} Q(\beta|X, y, \lambda) + \lambda |S_i|\] where \(Q(\beta|X, y, \lambda) = -2l(\beta|X, y) + \lambda ||\beta||_0\), \(||\beta||_0\) is the number of nonzero elements in \(\beta\), and \(||S_i||\) is the number of beta parameters associated with the ith set of variables. The number of parameters in the ith set of variables is 1 for continuous variables and is the number of levels minus 1 for categorical variables. \(\lambda\) is defined by the chosen metric where AIC results in \(\lambda = 2\), BIC results in \(\lambda = \log{(n)}\), and HQIC results in \(\lambda = 2\log{(\log{(n)})}\).
These variable importance values are equivalent to the traditional
likelihood ratio test for beta parameters when \(\lambda = 0\). However, when \(\lambda > 0\), the null distribution of
the variable importance values may not be chi-squared distributed.
P-values for the variable importance values may be obtained from the
VariableImportance.boot()
function which uses a parametric
bootstrap approach to approximate the null distribution. This process
entails performing best subset selection many times over, so it is quite
slow.
L0-penalization based variable importance values may be calculated
with the VariableImportance()
function. The
VariableImportance()
function requires an object returned
from calling the VariableSelection()
function. The exact
variable importance values are returned if a branch and bound algorithm
is used with the VariableSelection()
function. If a
heuristic method is used with the VariableSelection()
function, then approximate variable importance values based on the
specified heuristic method are returned.
# Loading BranchGLM package
library(BranchGLM)
# Using iris dataset to demonstrate usage of VI
Data <- iris
Fit <- BranchGLM(Sepal.Length ~ ., data = Data, family = "gaussian", link = "identity")
# Doing branch and bound selection
VS <- VariableSelection(Fit, type = "branch and bound", metric = "BIC",
showprogress = FALSE)
# Getting variable importance
VI <- VariableImportance(VS, showprogress = FALSE)
VI
#> Best Subset Selection VI(BIC)
#> ------------------------------
#> VI mVI df
#> (Intercept) NA NA NA
#> Sepal.Width 21.6472 26.66 1
#> Petal.Length 99.3510 104.36 1
#> Petal.Width -0.0572 4.95 1
#> Species 0.0572 10.08 2
We can visualize the variable importance values with the
barplot()
function.
We can get approximate p-values based on the L0-penalization based
variable importance values from the
VariableImportance.boot()
function. This function uses a
parametric bootstrap approach to create an approximate null distribution
for the variable importance values. This approach is very slow, so it is
not feasible to get these p-values when there are many sets of
variables.
# Getting approximate null distributions
set.seed(59903)
myBoot <- VariableImportance.boot(VI, nboot = 1000, showprogress = FALSE)
myBoot
#> Null Distributions of BSS mVI(BIC)
#> -----------------------------------
#> mVI p.values
#> (Intercept) NA NA
#> Sepal.Width 26.66 0.000
#> Petal.Length 104.36 0.000
#> Petal.Width 4.95 0.028
#> Species 10.08 0.004
We can visualize the results from
VariableImportance.boot()
with the hist()
function or the boxplot()
function. The
boxplot()
approach is convenient because we can look at all
of the results in one plot while the hist()
approach only
contains the results for one set of variables in each plot.