--- title: "Introduction to RRRR" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Introduction to RRRR} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup} library(RRRR) ``` ## Introduction The R package *RRRR* provides methods for estimating online Robust Reduced-Rank Regression. If you use any code from the `RRRR` package in a publication, please use the following citation: > Yangzhuoran Yang and Ziping Zhao (2020). RRRR: Online Robust Reduced-Rank Regression Estimation. R package version 1.1.0. https://pkg.yangzhuoranyang.com/RRRR/. This vignette aims to provide illustrations to estimate and update (Online) Reduced-Rank Regression using various methods contained in the package. ## Formulation The formulation of the reduced-rank regression is as follow: $$y = \mu +AB' x + D z+innov,$$ where for each realization * $y$ is a vector of dimension $P$ for the $P$ response variables, * $x$ is a vector of dimension $Q$ for the $Q$ explanatory variables that will be projected to reduce the rank, * $z$ is a vector of dimension $R$ for the $R$ explanatory variables that will not be projected, * $\mu$ is the constant vector of dimension $P$, * $innov$ is the innovation vector of dimension $P$, * $D$ is a coefficient matrix for $z$ with dimension $P*R$, * $A$ is the so called exposure matrix with dimension $P*r$, and * $B$ is the so called factor matrix with dimension $Q*r$. The matrix resulted from $AB'$ will be a reduced rank coefficient matrix with rank of $r$. The function estimates parameters $\mu$, $A$, $B$, $D$, and $Sigma$, the covariance matrix of the innovation's distribution. ## Simulation To simulate example data that can be used to estimate reduced-rank regression, use function `RRR_sim`. ```{r} data <- RRR_sim() data ``` A number of parameters can be specified. See `?RRR_sim`. The default arguments are set in such a way that the matrix resulted from $AB'$ will be a reduced rank coefficient matrix with rank of $r$. ```{r} str(data) ``` The returned list of `RRR_sim` contains the input specifications and the data points $y$, $x$ and $z$. ## Reduced-Rank Regression using Gaussian MLE : `RRR` The Gaussian Maximum Likelihood Estimation method is described in Johansen, S. ([1991](#ref)). This method is not robust in the sense that it assumes a Gaussian distribution for the innovations which does not take into account the heavy-tailedness of the true distribution and outliers. ```{r} res_gmle <- RRR(y=data$y, x=data$x, z = data$z) res_gmle ``` The matrix $z$ and the constant $\mu$ term are optional. ```{r} res_gmle <- RRR(y=data$y, x=data$x, z=data$z, mu = FALSE) res_gmle <- RRR(y=data$y, x=data$x, z=NULL, mu = TRUE) res_gmle <- RRR(y=data$y, x=data$x, z=NULL, mu = FALSE) ``` ## Robust Reduced-Rank Regression using Cauchy distribution and Majorisation-Minimisation: `RRRR` The Majorisation-Minimisation estimation method is partly described in Zhao, Z., & Palomar, D. P. ([2017](#ref)). This method is robust in the sense that it assumes a heavy-tailed Cauchy distribution for the innovations. As before the matrix $z$ and the constant term $\mu$ are optional. ```{r} res_mm <- RRRR(y=data$y, x=data$x, z = data$z, itr = 100, earlystop = 1e-4) res_mm ``` Additional arguments that are worth noticing are `itr`, which control the maximum number of iteration, and `earlystop`, which is the criteria to stop the algorithm early. The algorithm will stop if the improvement on objective value is small than `earlystop` $\times\ objective\_from\_last\_iteration$. This method is an iterative optimization algorithm so we can use the inbuilt `plot.RRRR` method to see the convergence plot of the algorithm. ```{r} plot(res_mm, aes_x = "iteration", xlog10 = TRUE) ``` Argument `aes_x` can set the x axis to be the number of iteration or the run time. Argument `xlog10` can indicate whether the scale of x axis is log 10 transformed. ## Online Robust Reduced-Rank Regression: `ORRRR` The description of the generic Stochastic Successive Upper-bound Minimisation method and the Sample Average Approximation can be found in Razaviyayn, M., Sanjabi, M., & Luo, Z. Q. ([2016](#ref)). There are two major estimation methods supported: * SMM: Stochastic Majorisation-Minimisation * SAA: Sample Average Approximation The algorithm is online in the sense that the data is continuously incorporated and the algorithm can update the parameters accordingly. As before the matrix $z$ and the constant term $\mu$ are optional. At each iteration of SAA, a new realisation of the parameters is achieved by solving the minimisation problem of the sample average of the desired objective function using the data currently incorporated. This can be computationally expensive when the objective function is highly nonconvex. The SMM method overcomes this difficulty by replacing the objective function by a well-chosen majorising surrogate function which can be much easier to optimise. ### SMM: Stochastic Majorisation-Minimisation By default the function `ORRRR` uses SMM. ```{r} res_smm <- ORRRR(y=data$y, x=data$x, z=data$z, initial_size = 100, addon = 10) res_smm ``` The simulated data set is of size 1000. In the above command, in the first iteration 100 realisations are used in the estimation with 10 more data points in each of the following iteration. Because of the increasing data size, the estimation will be slower the longer the algorithm run. Therefore, the estimated time left in the progress bar is not very accurate in this sense. The output from `ORRRR` can also plotted using `plot.RRRR`. ```{r} plot(res_smm) ``` ### SAA: Sample Average Approximation When using SAA, there are two sub solver supported in each iteration. * optim: the `optim` function from the `stats` package, and * MM: Majorisation-Minimisation method with ten subiterations by default. ```{r, eval = FALSE} res_saa_optim <- ORRRR(y=data$y, x=data$x, z=data$z, method = "SAA", SAAmethod = "optim") res_saa_mm <- ORRRR(y=data$y, x=data$x, z=data$z, method = "SAA", SAAmethod = "MM") ``` `optim` is a general purpose solver which means it will be quite slow for this specific problem, especially when the number of parameters is large. Embedding majorisation-minimisation into subiteration of SAA is more like a heuristic without solid theory backing up its efficiency. Due to the time constraint we do not show the estimated result here. ## Truly online: `update.RRRR` With the result from `ORRRR`, user can still update it with newly achieved data using function `update`. Note the result from `RRRR` can also be updated where it simply takes the result from `RRRR` as the starting point in online estimation. ```{r} newdata <- RRR_sim() res2_smm <- update(res_smm, newy=newdata$y, newx=newdata$x, newz=newdata$z) res2_smm ``` Without other arguments specified, `update` will just take the original specification in the model. If it applies to output of `RRRR`, then the default would be the default arguments in `ORRRR`, i.e., with `method` set to "SMM" and `addon` set to 10. ## References {#ref} [Johansen, S{\o}ren. 1991. “Estimation and Hypothesis Testing of Cointegration Vectors in Gaussian Vector Autoregressive Models.” Econometrica: Journal of the Econometric Society 59 (6): 1551.](https://www.jstor.org/stable/2938278?origin=crossref) [Z. Zhao and D. P. Palomar, "Robust maximum likelihood estimation of sparse vector error correction model," in2017 IEEE Global Conferenceon Signal and Information Processing (GlobalSIP), pp. 913–917,IEEE, 2017.](https://ieeexplore.ieee.org/abstract/document/8309093/) [Razaviyayn, Meisam, Maziar Sanjabi, and Zhi Quan Luo. 2016. “A Stochastic Successive Minimization Method for Nonsmooth Nonconvex Optimization with Applications to Transceiver Design in Wireless Communication Networks.” Mathematical Programming. A Publication of the Mathematical Programming Society 157 (2): 515–45.](https://link.springer.com/article/10.1007/s10107-016-1021-7) ## License This package is free and open source software, licensed under GPL-3