--- title: "actfts" output: rmarkdown::html_vignette: self_contained: true toc: true vignette: > %\VignetteIndexEntry{actfts} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} %\VignetteTheme{journal} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r, include = FALSE} library(actfts) ``` # actfts: Autocorrelation Tools Featured for Time Series # Motivation The actfts package consolidates various R time series functions related to econometric and financial analysis, integrating these tools into a single package. Educators and researchers can understand time series analysis without the burden of fragmented code spread across multiple resources, creating coherent and efficient results reading. The actfts package simplifies and manages scattered code in an educational environment, allowing students to focus on data analysis and result interpretation. Moreover, gain a deeper and more practical understanding of the concepts without being overwhelmed by navigating multiple tools and syntaxes. # Demo Data The actfts package provides three time series datasets that are automatically updated from the FRED (Federal Reserve Economic Data) database in the United States. These datasets are designed to help you practice using the package's functions. Below is a brief overview of each: **Gross Domestic Product**, This measure quantifies the total monetary value of all goods and services produced within a country over a specific period, typically a quarter or a year. It provides a comprehensive overview of a nation’s economic activity, reflecting its size and economic health. Economists often use GDP to compare economic performance across countries or regions and assess the impact of economic policies. Below is a practical code example to retrieve GDP data. ```{r} GDP_data <- actfts::GDPEEUU head(GDP_data) ``` **Personal Consumption Expenditures** This represents the total value of goods and services consumed by households and nonprofit institutions serving households within an economy. As a critical component of GDP, PCE reveals consumer behavior and spending patterns. It covers expenditures on durable goods, non durable goods, and services, providing insights into consumer confidence and living standards. Below is a handy code that allows you to retrieve PCE data ```{r} PCEC_data <- actfts::PCECEEUU head(PCEC_data) ``` **Disposable Personal Income** refers to the amount of money households have available for spending and saving after deducting taxes and other mandatory charges. It is an indicator of consumer purchasing power and financial health. DPI significantly impacts consumer spending and saving behaviors, affecting overall economic growth. Analysts frequently study DPI to identify trends in personal savings rates and consumption patterns. Below is a practical code snippet for obtaining DPI data. ```{r} DPI_data <- actfts::DPIEEUU head(DPI_data) ``` # Previews Concepts: Before beginning with the examples, define the concepts that the `acfinter()` is the function used to calculate essentials. ### ACF (Autocorrelation Function) ACF measures the correlation between a time series and its lags at different intervals. It helps identify patterns of temporal dependence (Box & Jenkins, 1976). ### PACF (Partial Autocorrelation Function) PACF measures the correlation between a time series and its lags, removing the influence of the intermediate lags. It is essential to determine the appropriate order in an AR(p) model (Box & Jenkins, 1976). ### Ljung-Box Test The Ljung-Box Test statistically evaluates whether the autocorrelations of a time series differ from zero. Analysts commonly use it to check the independence of residuals in ARIMA models (Ljung & Box, 1978) ### Box-Pierce Test Similar to the Ljung-Box test, the Box-Pierce Test is a statistical test that evaluates the independence of residuals in a time series model. It is a simpler and less refined version compared to the Ljung-Box test (Box & Pierce, 1970) ### KPSS Test The KPSS Test (Kwiatkowski-Phillips-Schmidt-Shin) is a statistical test that checks the stationarity of a time series, mainly whether a series is stationary around its mean or a deterministic trend (Kwiatkowski et al., 1992) ### Box-Cox Transformation The Box-Cox Transformation is a data transformation technique that applies a power function to stabilize variance and make the time series more closely resemble a normal distribution (Box & Cos, 1964) ### Kolmogorov-Smirnov Test The Kolmogorov-Smirnov Test is a nonparametric test that compares a sample distribution with a reference probability distribution or compares two sample distributions to assess whether they differ significantly (Kolmogorov, 1933) (Smirnov, 1948) ### Shapiro-Wilk Test The Shapiro-Wilk Test is a statistical test that evaluates the normality of a data set. It is particularly effective for small sample sizes and assesses whether a sample comes from a normally distributed population (Shapiro & Wilk, 1965) # Applied Example ## Example one: basics testing In this first example, we will analyze the United States GDP time series. We will use the `acfinter()` function to see the ACF, PACF, Box_Pierce, Pv_Box, Ljung_Box, and Pv_Ljung. We can also see the normality analysis and stationarity analysis. Finally, a plot with ACF, PACF, and Pv LJUNG BOX. Therefore, we will use the following code to analyze the first ten lags and explain each result of the `acfinter()` function. ```{r} result <- acfinter(GDP_data, lag = 10) print(result) ``` ```{r echo=FALSE, fig.align='center', message=FALSE, warning=FALSE, , fig.dim=c(7.6, 4)} library(dplyr) library(plotly) ldata <- nrow(GDP_data) result <- acfinter(GDP_data, lag = 10) table <- result$`ACF-PACF Test` get_clim1 <- function(x, ci=0.95, ci.type="white"){ if (!ci.type %in% c("white", "ma")) stop('`ci.type` must be "white" or "ma"') clim0 <- qnorm((1 + ci)/2) / sqrt(ldata) if (ci.type == "ma") { clim <- clim0 * sqrt(cumsum(c(1, 2 * table$acf^2))) return(clim[-length(clim)]) } else { lineci1 <- rep(clim0, NROW(table$acf)) return(lineci1) } } get_clim2 <- function(x, ci=0.95, ci.type="white"){ if (!ci.type %in% c("white", "ma")) stop('`ci.type` must be "white" or "ma"') clim0 <- qnorm((1 + ci)/2) / sqrt(ldata) if (ci.type == "ma") { clim <- clim0 * sqrt(cumsum(c(1, 2 * table$pacf^2))) return(clim[-length(clim)]) } else { lineci2 <- rep(clim0, NROW(table$pacf)) return(lineci2) } } saveci1 <- get_clim1(table$acf) saveci2 <- get_clim2(table$pacf) lag <- 10 fig1 <- plot_ly( x = table$lag, y = table$acf, type = "bar", name = "acf", color = I("slategray"), cliponaxis = FALSE, showlegend = FALSE ) %>% layout(bargap = 0.7, xaxis = list(range = c(1,lag))) fig1 <- fig1 %>% add_trace( y = saveci1, type = 'scatter', mode = 'lines', showlegend = FALSE, cliponaxis = FALSE, line = list(width = 0.8, dash = "dash", color="black") ) fig1 <- fig1 %>% add_trace( y = -saveci1, type = 'scatter', mode = 'lines', showlegend = FALSE, cliponaxis = FALSE, line = list(width = 0.8, dash = "dash", color="black") ) fig2 <- plot_ly( x = table$lag, y = table$pacf, type = "bar", name = "pacf", color = I("dimgrey"), cliponaxis = FALSE, showlegend = FALSE ) %>% layout(bargap = 0.7, xaxis = list(range = c(1,lag))) fig2 <- fig2 %>% add_trace( y = saveci2, type = 'scatter', mode = 'lines', showlegend = FALSE, cliponaxis = FALSE, line = list(width = 0.8, dash = "dash", color="black") ) fig2 <- fig2 %>% add_trace( y = -saveci2, type = 'scatter', mode = 'lines', showlegend = FALSE, cliponaxis = FALSE, line = list(width = 0.8, dash = "dash", color="black") ) hline <- function(y = 0, color = "black") { list(type = "line", x0 = 0, x1 = 1, xref = "paper", y0 = y, y1 = y, line = list(color = color, dash = "dash") ) } fig3 <- plot_ly( x = table$lag, y = table$Pv_Ljung, type = "scatter", mode = "markers", name = "Pv Ljung Box", color = I("lightslategrey"), cliponaxis = FALSE, showlegend = FALSE ) %>% layout(shapes = hline(0.05), xaxis = list(range = c(0.5,lag+0.5))) fig <- subplot(fig1, fig2, fig3, nrows = 3, shareX = TRUE, margin = 0.07) %>% layout( xaxis = list( title = "lags", dtick = 1, tick0 = 1, tickmode = "linear" ) ) #htmltools::tagList(fig) fig ``` The outcome variable encompasses three primary tests: ACF-PACF, Stationarity, and Normality. The ACF-PACF component shows multiple lags significant correlations tested with Box-Pierce and Ljung-Box tests. Regarding stationarity tests, the function `acfinter()` provides ADF, KPSS-Level, KPSS-Trend, and PP results. While each test has nuances, their common aim is to determine if the series is stationary. Both the ADF and PP tests yield p-values of 0.99. Since these values exceed the significance level of 0.95, we do not reject the null hypothesis of stationarity, confirming the presence of a unit root and a trend component in the GDP time series. In contrast, the KPSS-Level and KPSS-Trend tests both return p-values of 0.01, indicating the rejection of the null hypothesis in both cases. These tests show a non-stationary time series at any level or trend. For the third component, the Shapiro-Wilk and Kolmogorov-Smirnov normality tests are used to conclude a non-normal distribution time series. These results lead us to reject the null hypothesis of normality. Finally, a Box-Cox transformation test determine if a transformation was needed to stabilize the time series variance. Based on the test results, the transformations are: * 1: No transformation. * 0: Natural logarithm. * 0.5: Square root. * 2: Square. * -1: Inverse. The results in both tests are near 0, so the tests show we should apply to natural logarithm transformation. ## Example two: use the confidence interval The output of the function `acfinter()` has been discussed. The following examples will explore how to utilize each argument of the function and provide important considerations for its proper application. The argument `ci.method` allows for the selection of constant confidence intervals ("white") or dynamic confidence intervals ("ma"). The `ci` argument specifies any desired value between 0 and 0.99. When utilizing these arguments, one would set the ci.method argument to "ma" and the ci argument to a confidence interval of 0.98. Thus, the following code can be used: ```{r} result <- acfinter(GDP_data, lag = 10, ci.method = "ma", ci = 0.98) print(result) ``` ```{r echo=FALSE, fig.align='center', message=FALSE, warning=FALSE, , fig.dim=c(7.6, 4)} library(dplyr) library(plotly) ldata <- nrow(GDP_data) result <- acfinter(GDP_data, lag = 10, ci.method = "ma", ci = 0.98) table <- result$`ACF-PACF Test` get_clim1 <- function(x, ci=0.95, ci.type="ma"){ if (!ci.type %in% c("white", "ma")) stop('`ci.type` must be "white" or "ma"') clim0 <- qnorm((1 + ci)/2) / sqrt(ldata) if (ci.type == "ma") { clim <- clim0 * sqrt(cumsum(c(1, 2 * table$acf^2))) return(clim[-length(clim)]) } else { lineci1 <- rep(clim0, NROW(table$acf)) return(lineci1) } } get_clim2 <- function(x, ci=0.95, ci.type="ma"){ if (!ci.type %in% c("white", "ma")) stop('`ci.type` must be "white" or "ma"') clim0 <- qnorm((1 + ci)/2) / sqrt(ldata) if (ci.type == "ma") { clim <- clim0 * sqrt(cumsum(c(1, 2 * table$pacf^2))) return(clim[-length(clim)]) } else { lineci2 <- rep(clim0, NROW(table$pacf)) return(lineci2) } } saveci1 <- get_clim1(table$acf) saveci2 <- get_clim2(table$pacf) lag <- 10 fig1 <- plot_ly( x = table$lag, y = table$acf, type = "bar", name = "acf", color = I("slategray"), cliponaxis = FALSE, showlegend = FALSE ) %>% layout(bargap = 0.7, xaxis = list(range = c(1,lag))) fig1 <- fig1 %>% add_trace( y = saveci1, type = 'scatter', mode = 'lines', showlegend = FALSE, cliponaxis = FALSE, line = list(width = 0.8, dash = "dash", color="black") ) fig1 <- fig1 %>% add_trace( y = -saveci1, type = 'scatter', mode = 'lines', showlegend = FALSE, cliponaxis = FALSE, line = list(width = 0.8, dash = "dash", color="black") ) fig2 <- plot_ly( x = table$lag, y = table$pacf, type = "bar", name = "pacf", color = I("dimgrey"), cliponaxis = FALSE, showlegend = FALSE ) %>% layout(bargap = 0.7, xaxis = list(range = c(1,lag))) fig2 <- fig2 %>% add_trace( y = saveci2, type = 'scatter', mode = 'lines', showlegend = FALSE, cliponaxis = FALSE, line = list(width = 0.8, dash = "dash", color="black") ) fig2 <- fig2 %>% add_trace( y = -saveci2, type = 'scatter', mode = 'lines', showlegend = FALSE, cliponaxis = FALSE, line = list(width = 0.8, dash = "dash", color="black") ) hline <- function(y = 0, color = "black") { list(type = "line", x0 = 0, x1 = 1, xref = "paper", y0 = y, y1 = y, line = list(color = color, dash = "dash") ) } fig3 <- plot_ly( x = table$lag, y = table$Pv_Ljung, type = "scatter", mode = "markers", name = "Pv Ljung Box", color = I("lightslategrey"), cliponaxis = FALSE, showlegend = FALSE ) %>% layout(shapes = hline(0.05), xaxis = list(range = c(0.5,lag+0.5))) fig <- subplot(fig1, fig2, fig3, nrows = 3, shareX = TRUE, margin = 0.07) %>% layout( xaxis = list( title = "lags", dtick = 1, tick0 = 1, tickmode = "linear" ) ) #htmltools::tagList(fig) fig ``` As illustrated, the function `acfinter()` with the arguments ci.method and ci adjusts the results according to the specified confidence interval. Additionally, it can be observed that the confidence intervals in the plot are not constant but display dynamic behavior. ## Example 3: Use the differences Continuing GDP analysis, the "delta" argument can be employed to examine the first three differences in the time series. This analysis will focus on the first difference, and the following code will be utilized. ```{r} result <- acfinter(GDP_data, lag = 10, delta = "diff1") print(result) ``` ```{r echo=FALSE, fig.align='center', message=FALSE, warning=FALSE, , fig.dim=c(7.6, 4)} library(dplyr) library(plotly) ldata <- nrow(GDP_data) result <- acfinter(GDP_data, lag = 10, delta = "diff1") table <- result$`ACF-PACF Test` get_clim1 <- function(x, ci=0.95, ci.type="white"){ if (!ci.type %in% c("white", "ma")) stop('`ci.type` must be "white" or "ma"') clim0 <- qnorm((1 + ci)/2) / sqrt(ldata) if (ci.type == "ma") { clim <- clim0 * sqrt(cumsum(c(1, 2 * table$acf^2))) return(clim[-length(clim)]) } else { lineci1 <- rep(clim0, NROW(table$acf)) return(lineci1) } } get_clim2 <- function(x, ci=0.95, ci.type="white"){ if (!ci.type %in% c("white", "ma")) stop('`ci.type` must be "white" or "ma"') clim0 <- qnorm((1 + ci)/2) / sqrt(ldata) if (ci.type == "ma") { clim <- clim0 * sqrt(cumsum(c(1, 2 * table$pacf^2))) return(clim[-length(clim)]) } else { lineci2 <- rep(clim0, NROW(table$pacf)) return(lineci2) } } saveci1 <- get_clim1(table$acf) saveci2 <- get_clim2(table$pacf) lag <- 10 fig1 <- plot_ly( x = table$lag, y = table$acf, type = "bar", name = "acf", color = I("slategray"), cliponaxis = FALSE, showlegend = FALSE ) %>% layout(bargap = 0.7, xaxis = list(range = c(1,lag))) fig1 <- fig1 %>% add_trace( y = saveci1, type = 'scatter', mode = 'lines', showlegend = FALSE, cliponaxis = FALSE, line = list(width = 0.8, dash = "dash", color="black") ) fig1 <- fig1 %>% add_trace( y = -saveci1, type = 'scatter', mode = 'lines', showlegend = FALSE, cliponaxis = FALSE, line = list(width = 0.8, dash = "dash", color="black") ) fig2 <- plot_ly( x = table$lag, y = table$pacf, type = "bar", name = "pacf", color = I("dimgrey"), cliponaxis = FALSE, showlegend = FALSE ) %>% layout(bargap = 0.7, xaxis = list(range = c(1,lag))) fig2 <- fig2 %>% add_trace( y = saveci2, type = 'scatter', mode = 'lines', showlegend = FALSE, cliponaxis = FALSE, line = list(width = 0.8, dash = "dash", color="black") ) fig2 <- fig2 %>% add_trace( y = -saveci2, type = 'scatter', mode = 'lines', showlegend = FALSE, cliponaxis = FALSE, line = list(width = 0.8, dash = "dash", color="black") ) hline <- function(y = 0, color = "black") { list(type = "line", x0 = 0, x1 = 1, xref = "paper", y0 = y, y1 = y, line = list(color = color, dash = "dash") ) } fig3 <- plot_ly( x = table$lag, y = table$Pv_Ljung, type = "scatter", mode = "markers", name = "Pv Ljung Box", color = I("lightslategrey"), cliponaxis = FALSE, showlegend = FALSE ) %>% layout(shapes = hline(0.05), xaxis = list(range = c(0.5,lag+0.5))) fig <- subplot(fig1, fig2, fig3, nrows = 3, shareX = TRUE, margin = 0.07) %>% layout( xaxis = list( title = "lags", dtick = 1, tick0 = 1, tickmode = "linear" ) ) #htmltools::tagList(fig) fig ``` By setting the argument "delta" to "diff1", the function `acfinter()` will display the first difference of the time series. Firstly, the ACF-PACF test indicates significant autocorrelation at the 10th lag. The Box-Pierce and Ljung-Box tests confirm this finding supported by the plot. The stationarity tests confirm the first difference in stationarity for the GDP time series; the null hypothesis is rejected. However, the KPSS test results, both at the level and trend, suggest that the first difference of the GDP time series is not stationary. Conversely, the normality test indicates that the first difference of the GDP time series is not normally distributed, as the null hypothesis is rejected in both tests. It should be noted that the Box-Cox test for variance stabilization applies only to level time series and not to their differences. ## Example 4: Interactive mode Finally, the function `acfinter()` lets users view the results interactively. The interactive argument should be used, and `acftable` should be specified. Additionally, dynamic visualization of the results is possible. Moreover, results from a time series analysis can be downloaded by setting the download argument to TRUE, generating an Excel file containing the numerical results and a PNG image with a resolution of 300 dpi, which includes the ACF, PACF, and Ljung-Box Pv graphs. The files will be saved in the user's Documents folder. Below is an example of the code to use: ```r result <- acfinter(GDP_data, lag = 10, interactive = "acftable", download = TRUE) ``` # Final considerations You can analyze time series in xts, ts, integer, and vector (numeric) formats. So, if you use a different format, the `acfinter()` function won't work. In this case, you must convert your data to any format we initially showed you. Finally, the packages that `acfinter()` uses for its operation are: xts (Ryan & Ulrich, 2020), tseries (Trapletti & Hornik, 2020), reactable (Glaz, 2023), openxlsx (Walker, 2023), plotly (Sievert, 2020), forecast (Hyndman & Khandakar, 2008) and stats (R Core Team, 2023). # References 1. Box, G. E. P., & Cox, D. R. (1964). An Analysis of Transformations. Journal of the Royal Statistical Society: Series B (Methodological), 26(2), 211-243. 2. Box, G. E. P., & Jenkins, G. M. (1976). Time Series Analysis: Forecasting and Control. Revised Edition, Holden Day, San Francisco. 3. Box, G. E. P., & Pierce, D. A. (1970). Distribution of Residual Autocorrelations in Autoregressive-Integrated Moving Average Time Series Models. Journal of the American Statistical Association, 65(332), 1509-1526. 4. Glaz, R. (2023). reactable: Interactive Data Tables Based on 'React Table' (R package version 0.4.4) [Software]. Retrieved from https://CRAN.R-project.org/package=reactable 5. Hyndman, R. J., & Khandakar, Y. (2008). forecast: Forecasting Functions for Time Series and Linear Models (R package version 8.15) [Software]. Retrieved from https://CRAN.R-project.org/package=forecast 6. Kolmogorov, A. N. (1933). Sulla Determinazione Empirica di una Legge di Distribuzione. Giornale dell'Istituto Italiano degli Attuari, 4, 83-91. 7. Kwiatkowski, D., Phillips, P. C. B., Schmidt, P., & Shin, Y. (1992). Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root: How Sure Are We that Economic Time Series Have a Unit Root? Journal of Econometrics, 54(1-3), 159-178. 8. Ljung, G. M., & Box, G. E. P. (1978). On a Measure of Lack of Fit in Time Series Models. Biometrika, 65(2), 297-303. 9. R Core Team. (2023). stats: A package for statistical calculations in R (R package version 4.3.1) [Software]. Retrieved from https://stat.ethz.ch/R-manual/R-devel/library/stats/html/00Index.html 10. Ryan, J. A., & Ulrich, J. M. (2020). xts: eXtensible Time Series (R package version 0.12-0) [Software]. Retrieved from https://CRAN.R-project.org/package=xts 11. Shapiro, S. S., & Wilk, M. B. (1965). An Analysis of Variance Test for Normality (Complete Samples). Biometrika, 52(3/4), 591-611. 12. Sievert, C. (2020). plotly: Create Interactive Web Graphics via 'plotly.js' (R package version 4.9.2.1) [Software]. Retrieved from https://CRAN.R-project.org/package=plotly 13. Smirnov, N. V. (1948). Table for Estimating the Goodness of Fit of Empirical Distributions. The Annals of Mathematical Statistics, 19(2), 279-281. 14. Trapletti, A., & Hornik, K. (2020). tseries: Time Series Analysis and Computational Finance (R package version 0.10-48) [Software]. Retrieved from https://CRAN.R-project.org/package=tseries 15. U.S. Bureau of Economic Analysis, Disposable Personal Income (DPI). Retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/DPI 16. U.S. Bureau of Economic Analysis, Gross Domestic Product (GDP). Retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/GDP 17. U.S. Bureau of Economic Analysis, Personal Consumption Expenditures (PCEC). Retrieved from FRED, Federal Reserve Bank of St. Louis; https://fred.stlouisfed.org/series/PCEC 18. Walker, A. (2023). openxlsx: Read, Write and Edit xlsx Files (R package version 4.2.5) [Software]. Retrieved from https://CRAN.R-project.org/package=openxlsx