The Ordinary Least Squares (OLS) estimator is \(\hat{\beta} = (X^tX)^{-1}(X^tY)\) for a design matrix \(X\) and an outcome vector \(Y\) (Hansen 2022).
The following code shows how to compute the OLS estimator using Armadillo and sending data from R to C++ and viceversa using cpp11
and cpp11armadillo
(Sanderson and Curtin 2016):
#include <cpp11.hpp>
#include <cpp11armadillo.hpp>
using namespace arma;
using namespace cpp11;
cpp11::register]] doubles_matrix<> ols_mat_(const doubles_matrix<>& x) {
[[double> Y = as_Mat(x); // convert from R to C++
Mat<double> Yinv = inv(Y); // Y^(-1)
Mat<return as_doubles_matrix(Yinv); // convert from C++ to R
}
The previous code includes the cpp11
and cpp11armadillo
libraries (cpp11armadillo calls Armadillo) to allow interfacing C++ with R. It also loads the corresponding namespaces in order to simplify the notation (i.e., using Mat
instead of arma::Mat
), and the function as_Mat()
and as_doubles_mat()
are provided by cpp11armadillo
to pass a matrix
object from R to C++ and that Armadillo can read and then pass it back to R.
The use of const
and &
are specific to the C++ language and allow to pass data from R to C++ without copying the data, and therefore saving time and memory.
cpp11armadillo
provides flexibility and in the case of the resulting vector of OLS coefficients, it can be returned as a matrix or a vector. The following code shows how to create three functions to compute the OLS estimator and return the result as a matrix or a vector avoiding repeated code:
double> ols_(const doubles_matrix<>& y, const doubles_matrix<>& x) {
Mat<double> Y = as_Mat(y); // Col<double> Y = as_Col(y); also works
Mat<double> X = as_Mat(x);
Mat<
double> XtX = X.t() * X; // X'X
Mat<double> XtX_inv = inv(XtX); // (X'X)^(-1)
Mat<double> beta = XtX_inv * X.t() * Y; // (X'X)^(-1)(X'Y)
Mat<
return beta;
}
cpp11::register]] doubles_matrix<> ols_mat_(const doubles_matrix<>& y,
[[const doubles_matrix<>& x) {
double> beta = ols_(y, x);
Mat<return as_doubles_matrix(beta);
}
cpp11::register]] doubles ols_dbl_(const doubles_matrix<>& y,
[[const doubles_matrix<>& x) {
double> beta = ols_(y, x);
Mat<return as_doubles(beta);
}
In the previous code, the ols_mat_()
function receives inputs from R and calls ols_()
to do the computation on C++ side, and ols_dbl_()
does the same but it returns a vector instead of a matrix.
A proper benchmark is to compute eigenvalues for large matrices. Both cpp11armadillo
and RcppArmadillo
use Armadillo as a backend, and the marginal observed differences are because of how cpp11 and Rcpp pass data from R to C++ and viceversa. The computation times are identical.
Input | Median time cpp11armadillo | Median time RcppArmadillo |
---|---|---|
500x500 | 35.07ms | 36.4ms |
1000x1000 | 260.28ms | 263.21ms |
1500x1500 | 874.62ms | 857.31ms |
2000x2000 | 2.21s | 2.21s |
Input | Memory allocation cpp11armadillo | Memory allocation RcppArmadillo |
---|---|---|
500x500 | 17.1KB | 4.62MB |
1000x1000 | 21KB | 4.62MB |
1500x1500 | 24.9KB | 4.63MB |
2000x2000 | 28.8KB | 4.63MB |
The cpp11armadillo
computation was obtained with the following function:
cpp11::register]] doubles_matrix<> eigen_sym_mat(const doubles_matrix<>& x) {
[[double> X = as_Mat(x);
Mat<double> y = eig_sym(X);
Mat<return as_doubles_matrix(y);
}
The RcppArmadillo
computation was obtained with the following function:
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
using namespace Rcpp;
// [[Rcpp::export]]
const arma::mat& x) {
arma::mat eigen_sym_mat(
arma::mat y = eig_sym(x);return y;
}
In order to get the RcppArmadillo
function to work, we had to dedicate time to search online about the error function 'enterRNGScope' not provided by package 'Rcpp'
, which required to include // [[Rcpp::depends(RcppArmadillo)]]
for the function to work.
The package repository includes the directory cpp11armadillotest
, which contains an R package that uses Armadillo, and that provides additional examples for eigenvalues, Cholesky and QR decomposition, and linear models.
Hansen, Bruce. 2022. Econometrics. Princeton University Press.
Sanderson, Conrad, and Ryan Curtin. 2016. “Armadillo: A Template-Based C++ Library for Linear Algebra.” Journal of Open Source Software 1 (2): 26. https://doi.org/10.21105/joss.00026.