%% % @file fullpres.Rnw % @brief sweave file for the vignette % % @author Christophe Dutang % @author Diethelm Wuertz % % % The new BSD License is applied to this software. % Copyright (c) 2022 Christophe Dutang, Petr Savicky, Diethelm Wuertz. % All rights reserved. % % Redistribution and use in source and binary forms, with or without % modification, are permitted provided that the following conditions are % met: % % - Redistributions of source code must retain the above copyright % notice, this list of conditions and the following disclaimer. % - Redistributions in binary form must reproduce the above % copyright notice, this list of conditions and the following % disclaimer in the documentation and/or other materials provided % with the distribution. % - Neither the name of the ETH Zurich, nor the name of Czech academy % of sciences nor the names of its contributors may be used to endorse or % promote products derived from this software without specific prior written % permission. % % THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS % "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT % LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR % A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT % OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, % SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT % LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, % DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY % THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT % (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE % OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. % % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%% Torus algorithm and other random number generators %%% %%% Sweave vignette file, many sweave code ideas are taken from the 'expm' R-forge project %%% \documentclass[11pt, a4paper]{article} %\VignetteIndexEntry{A note on random number generation} %\VignettePackage{randtoolbox} %\VignetteKeyword{random} % package %American Mathematical Society (AMS) math symbols \usepackage{amsfonts,amssymb,amsmath} %english typography \usepackage[english]{babel} % 8 bit accents %\usepackage[applemac]{inputenc} %MAC encoding \usepackage[utf8]{inputenc} %UNIX encoding %\usepackage[ansinew]{inputenc} %WINDOWS encoding %graphique \usepackage{color, graphicx, wrapfig, subcaption} %citation \usepackage{natbib} %reference hypertext \usepackage[pagebackref=true, hyperfootnotes=false]{hyperref} %footnote %\usepackage[stable]{footmisc} %\usepackage{perpage} %\MakePerPage[1]{footnote} %sweave package \usepackage[noae]{Sweave} %\setkeys{Gin}{width=0.49\textwidth} %change size of figure %url \usepackage{url} %customized itemize \usepackage{mdwlist} %header \pagestyle{headings} % layout \usepackage[left=2cm, right=2cm, top=2.5cm, bottom=2.5cm]{geometry} % two column config from http://blog.robfelty.org/2007/03/20/latex-two-column-layouts-and-rubber-lengths/ % % % don't hyphenate so much - default = 200, max (never hyphenate) = 10,000 % \hyphenpenalty=2000 % % %two column float page must be 90% full % \renewcommand\dblfloatpagefraction{.90} % %two column top float can cover up to 80% of page % \renewcommand\dbltopfraction{.80} % %float page must be 90% full % \renewcommand\floatpagefraction{.90} % %top float can cover up to 80% of page % \renewcommand\topfraction{.80} % %bottom float can cover up to 80% of page % 
% \renewcommand\bottomfraction{.80}
% %at least 10% of a normal page must contain text
% \renewcommand\textfraction{.1}
%
% %separation between floats and text
% \setlength\dbltextfloatsep{9pt plus 5pt minus 3pt }
% %separation between two column floats and text
% \setlength\textfloatsep{10pt plus 4pt minus 3pt}
%
% %space between columns
% \setlength{\columnsep}{1.5cm}

% macros
\newcommand\HRule{\noindent\rule{\linewidth}{1pt}}
\newcommand{\II}{\mbox{\large 1\hskip -0,353em 1}}
\newcommand{\abs}[1]{\lvert#1\rvert}
\newcommand{\norm}[1]{\lVert#1\rVert}
\newcommand{\ligne}{\rule[2mm]{.3\textwidth}{0,5mm}\\}
\newcommand{\pkg}{\textbf}
\newcommand{\sigle}{\textsc}
\newcommand{\code}{\texttt}
\newcommand{\soft}{\textsf}
\newcommand{\myskip}{\vspace{\parskip}}
\newcommand{\txtm}[1]{\textrm{~~#1~~}}
\newcommand{\expo}{\textsuperscript}
\newcommand{\ind}{1\!\!1}
\newcommand{\mat}[1]{\mathbf{#1}}
\newcommand{\blank}{ \clearpage{\pagestyle{empty}\cleardoublepage} }
\newcommand{\card}{\textrm{Card}}

% footnote in table or legend
\newcounter{noteTabCap}
\newcommand{\initFmark}{\setcounter{noteTabCap}{0}} %init noteTabCap counter
\newcommand{\fmark}{\footnotemark \addtocounter{noteTabCap}{1}} %mark footnote
\newcommand{\updateFtext}{\addtocounter{footnote}{-\value{noteTabCap}}} %update footnote counter
\newcommand{\ftext}[1]{\addtocounter{footnote}{1} \footnotetext{#1}} %set the footnote text

\title{A note on random number generation}
\author{Christophe Dutang and Diethelm Wuertz}

\begin{document}
\SweaveOpts{concordance=TRUE}
\maketitle
%\twocolumn

{\raggedright
\vspace{-1cm}
\textit{
\hfill ``Nothing in Nature is random\dots\\
\hfill a thing appears random only through \\
\hfill the incompleteness of our knowledge.''\\
}
\hfill Spinoza, Ethics I\footnote{quote taken from \cite{niederreiter}.}.
\vspace{1cm}
}

\section{Introduction}

Random simulation has long been a very popular and well studied field of mathematics. There exists a wide range of applications in biology, finance, insurance, physics and many others. So simulations of random numbers are crucial. In this note, we describe the most widely used random number generation algorithms.

Let us recall that the only things that are truly random are measurements of physical phenomena, such as thermal noise of semiconductor chips or radioactive sources\footnote{for more details go to \url{http://www.random.org/randomness/}.}. Randomness can only be simulated on computers by deterministic algorithms. Excluding true randomness\footnote{For true random number generation in \soft{R}, use the \pkg{random} package of \cite{truerand}.}, there are two kinds of random generation: pseudo and quasi random number generators.

The package \pkg{randtoolbox} provides \soft{R} functions for pseudo and quasi random number generation, as well as statistical tests to quantify the quality of generated random numbers.

\section{Overview of random generation algorithms\label{rng}}

In this section, we present first the pseudo random number generation and second the quasi random number generation. By ``random numbers'', we mean random variates of the uniform $\mathcal U(0,1)$ distribution. More complex distributions can be generated with uniform variates and rejection or inversion methods (a small illustration of the inversion method is given below). Pseudo random number generation aims to seem random whereas quasi random number generation aims to be deterministic but well equidistributed.
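As a minimal illustration of the inversion method mentioned above (the target distribution and sample size are only examples), a uniform variate is mapped through the quantile function of the target distribution:
%%% R code
<<inversion, echo=TRUE, eval=FALSE>>=
u <- runif(1000)       #uniform variates on ]0,1[
x <- -log(1 - u) / 2   #exponential variates with rate 2 by inversion
#equivalently qexp(u, rate = 2); any distribution with a tractable
#quantile function can be simulated in the same way
@
%%%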
Those familiar with algorithms such as linear congruential generation, Mersenne-Twister type algorithms, and low discrepancy sequences should go directly to the next section.

\subsection{Pseudo random generation}

At the beginning of the nineties, there was no state-of-the-art algorithm to generate pseudo random numbers. And the article of \cite{parkmiller} entitled \emph{Random generators: good ones are hard to find} is clear proof of this. Despite this fact, most users thought the \code{rand} function they used was good, even though it had a short period and a term-to-term dependence. But in 1998, the Japanese mathematicians Matsumoto and Nishimura invented the first algorithm whose period ($2^{19937}-1$) exceeds the number of electron spin changes since the creation of the Universe ($10^{6000}$ against $10^{120}$). It was a \textbf{big} breakthrough.

As described in \cite{lecuyer90}, a (pseudo) random number generator (RNG) is defined by a structure $(S, \mu, f, U, g)$ where
\begin{itemize*}
\item $S$ a finite set of \emph{states},
\item $\mu$ a probability distribution on $S$, called the \emph{initial distribution},
\item a \emph{transition function} $f : S\mapsto S$,
\item a finite set of \emph{output} symbols $U$,
\item an \emph{output function} $g : S \mapsto U$.
\end{itemize*}
Then the generation of random numbers is as follows:
\begin{enumerate*}
\item generate the initial state (called the \emph{seed}) $s_0$ according to $\mu$ and compute $u_0 = g(s_0)$,
\item iterate for $i = 1,\dots$, $s_i = f(s_{i-1})$ and $u_i = g(s_i)$.
\end{enumerate*}
Generally, the seed $s_0$ is determined using the machine clock, and so the random variates $u_0, \dots, u_n, \dots$ seem like ``real'' i.i.d. uniform random variates. The period of an RNG, a key characteristic, is the smallest integer $p \in \mathbb N$, such that $\forall n \in \mathbb N, s_{p+n} = s_n$.

\subsubsection{Linear congruential generators}\label{congruRand}

There are many families of RNGs: linear congruential, multiple recursive, \dots\ and ``computer operation'' algorithms. Linear congruential generators have a \emph{transition function} of the following type
$$
f(x) = (a x + c) \mod m,
$$
where $a$ is the multiplier, $c$ the increment and $m$ the modulus, with $x, a, c, m \in \mathbb N$ (i.e. $S$ is the set of (positive) integers)\footnote{this representation can easily be generalized to matrices, see \cite{lecuyer90}.}. $f$ is such that
$$
x_n = (a x_{n-1} + c) \mod m.
$$
Typically, $c$ and $m$ are chosen to be relatively prime and $a$ such that $\forall x \in \mathbb N, a x \mod m \neq 0$. The cycle length of linear congruential generators will never exceed the modulus $m$, but it can be maximised with the three following conditions
\begin{itemize*}
\item the increment $c$ is relatively prime to $m$,
\item $a-1$ is a multiple of every prime dividing $m$,
\item $a-1$ is a multiple of 4 when $m$ is a multiple of 4,
\end{itemize*}
see \cite{knuth02} for a proof.

When $c=0$, we have the special case of the Park-Miller algorithm or Lehmer algorithm (see \cite{parkmiller}). Let us note that the $(n+j)$th term can be easily derived from the $n$th term with $a$ set to $a^j \mod m$ (still when $c=0$).

Finally, we generally use one of the three following types of \emph{output function}:
\begin{itemize*}
\item $g: \mathbb N \mapsto [0,1[$, and $g(x) = \frac{x}{m}$, % \{0,\dots, m-1\}
\item $g: \mathbb N \mapsto ]0,1]$, and $ g(x) = \frac{x}{m-1}$,
\item $g: \mathbb N \mapsto ]0,1[$, and $ g(x) = \frac{x+1/2}{m}$.
\end{itemize*}
%The last one has the good property to generate variates in $]0,1[$.
Linear congruential generators are implemented in the \soft{R} function \code{congruRand}.

\subsubsection{Multiple recursive generators}

A generalisation of linear congruential generators are multiple recursive generators. They are based on the following recurrence
$$
x_n = (a_1 x_{n-1} +\dots +a_k x_{n-k} + c) \mod m,
$$
where $k$ is a fixed integer. Hence the $n$th term of the sequence depends on the $k$ previous ones. A particular case of this type of generators is when
$$
x_n = (x_{n-37} + x_{n-100}) \mod 2^{30},
$$
which is a lagged Fibonacci generator\footnote{see \cite{lecuyer90}.}. The period is around $2^{129}$. This generator was invented by \cite{knuth02} and is generally called ``Knuth-TAOCP-2002'' or simply ``Knuth-TAOCP''\footnote{TAOCP stands for The Art Of Computer Programming, Knuth's famous book.}.

An integer version of this generator is implemented in the \soft{R} function \code{runif} (see \code{RNG}). We include in the package the latest double version, which corrects an undesirable deficiency. As described on Knuth's webpage\footnote{go to \url{http://www-cs-faculty.stanford.edu/\~knuth/news02.html\#rng}.}, the previous version of Knuth-TAOCP fails randomness tests when a few sequences are generated with several seeds. The cure for this problem is to discard the first 2000 numbers.

\subsubsection{Mersenne-Twister}

These two types of generators belong to the big family of matrix linear congruential generators (cf. \cite{lecuyer90}). But so far, no algorithm exploits the binary structure of computers (i.e. uses binary operations). In 1994, Matsumoto and Kurita invented the TT800 generator using binary operations. But \cite{matsumoto98} greatly improved the use of binary operations and proposed a new random number generator called Mersenne-Twister.

\cite{matsumoto98} work on the finite set $N_2 = \{0,1\}$, so a variable $x$ is represented by a vector of $\omega$ bits (e.g. 32 bits). They use the following linear recurrence for the $(n+i)$th term:
$$
x_{i+n} = x_{i+m} \oplus (x_i^{upp} | x_{i+1}^{low} ) A,
$$
where $n > m$ are constant integers, $x_i^{upp}$ (respectively $x_i^{low}$) denotes the upper (lower) $\omega -r$ ($r$) bits of $x_i$ and $A$ is an $\omega\times\omega$ matrix over $N_2$. $|$ is the concatenation operator, so $x_i^{upp} | x_{i+1}^{low}$ appends the upper $\omega-r$ bits of $x_i$ to the lower $r$ bits of $x_{i+1}$. After a right multiplication with the matrix $A$\footnote{%------------ footnote
Matrix $A$ is equal to $\left( \begin{array}{cc} 0 & I_{\omega-1}\\ \multicolumn{2}{c}{a} \end{array} \right)$, whose right multiplication can be done with a bitwise rightshift operation and an addition with the integer $a$. See section 2 of \cite{matsumoto98} for explanations.},
%------------ footnote
$\oplus$ adds the result to $x_{i+m}$ bit by bit modulo two (i.e. $\oplus$ denotes the exclusive-or, called xor).

Once provided an initial seed $x_0, \dots, x_{n-1}$, Mersenne Twister produces random integers in $0,\dots, 2^{\omega}-1$. All operations used in the recurrence are bitwise operations, thus the computation is very fast compared to the modulus operations used in previous algorithms.
To increase the equidistribution, \cite{matsumoto98} added a tempering step:
\begin{eqnarray*}
%Y_k &\leftarrow& X_{k+n}\\
y_i &\leftarrow& x_{i+n} \oplus (x_{i+n} >> u),\\
y_i &\leftarrow& y_i \oplus ( (y_i << s) ~\&~ b ),\\
y_i &\leftarrow& y_i \oplus ( (y_i << t) ~\&~ c ),\\
y_i &\leftarrow& y_i \oplus (y_i >> l),
\end{eqnarray*}
where $>> u$ (resp. $<< s$) denotes a rightshift (leftshift) of $u$ ($s$) bits and $\&$ denotes the bitwise AND. At last, we transform the random integers to reals with one of the output functions $g$ proposed above.

Details of the order of the successive operations used in the Mersenne-Twister (MT) algorithm can be found on page 7 of \cite{matsumoto98}. The key point to retain is that all these (bitwise) operations can be easily done in many computer languages (e.g. in \soft{C}), ensuring a very fast algorithm.

The set of parameters used is
\begin{itemize}
\item $(\omega, n, m, r) = (32, 624, 397, 31)$,
\item $a =$ \code{0x9908B0DF}, $b =$ \code{0x9D2C5680}, $c =$ \code{0xEFC60000},
\item $u=11$, $l=18$, $s=7$ and $t=15$.
\end{itemize}
These parameters ensure a good equidistribution and a period of $2^{n\omega -r}-1 = 2^{19937}-1$.

The great advantages of the MT algorithm are a far longer period than any previous generator (greater than the period of the \cite{parkmiller} sequence, $2^{31}-2$, or the period of \cite{knuth02}, around $2^{129}$), a far better equidistribution (it passed the DieHard test), as well as a \textbf{very} good computation time (since it uses binary operations and not the costly modulus operation).

The MT algorithm is already implemented in \soft{R} (function \code{runif}). However the package \code{randtoolbox} provides functions to compute a new version of Mersenne-Twister (the SIMD-oriented Fast Mersenne Twister algorithm) as well as the WELL (Well Equidistributed Long-period Linear) generator.

\subsubsection{Well Equidistributed Long-period Linear generators}

The MT recurrence can be rewritten as
$$
x_i = A x_{i-1},
$$
where the $x_k$'s are vectors of $N_2$ and $A$ a transition matrix. The characteristic polynomial of $A$ is
$$
\chi_A(z) \stackrel{\triangle}{=} \textrm{det}(A-zI) = z^k - \alpha_1 z^{k-1}-\dots- \alpha_{k-1} z -\alpha_k,
$$
with coefficients $\alpha_k$'s in $N_2$. Those coefficients are linked to the output integers by
$$
x_{i,j} = (\alpha_1 x_{i-1,j} +\dots + \alpha_k x_{i-k,j}) \mod 2
$$
for each component $j$.

From \cite{well}, the period length of the recurrence reaches the upper bound $2^k-1$ if and only if the polynomial $\chi_A$ is a primitive polynomial over $N_2$.

The more complex the matrix $A$ is, the slower the associated generator will be. Thus, we compromise between speed and quality (of equidistribution). Let us denote by $\psi_d$ the set of all $d$-dimensional vectors produced by the generator from all initial \emph{states}\footnote{The cardinality of $\psi_d$ is $2^k$.}. If we divide each dimension into $2^l$\footnote{with $l$ an integer such that $l\leq \lfloor k/d\rfloor$.} cells (i.e. the unit hypercube $[0,1[^d$ is divided into $2^{ld}$ cells), the set $\psi_d$ is said to be $(d,l)$-equidistributed if and only if each cell contains exactly $2^{k-dl}$ of its points. The largest dimension for which the set $\psi_d$ is $(d,l)$-equidistributed is denoted by $d_l$.

The great advantage of using this definition is that we are not forced to generate random points to know the uniformity of a generator. Indeed, thanks to the linear structure of the recurrence, this uniformity property can be expressed in terms of the bits of the current state.
From this we define a dimension gap for a resolution of $l$ bits as $\delta_l = \lfloor k/l \rfloor - d_l$.

A usual measure of uniformity is the sum of dimension gaps
$$
\Delta_1 = \sum_{l=1}^\omega \delta_l.
$$
\cite{well} try to find generators with a dimension gap sum $\Delta_1$ around zero and a number $Z_1$ of non-zero coefficients in $\chi_A$ around $k/2$. Generators with these two characteristics are called Well Equidistributed Long-period Linear generators. As a benchmark, the Mersenne Twister algorithm is characterized by $k=19937$, $\Delta_1=6750$ and $Z_1=135$.

The WELL generator is characterized by the following $A$ matrix
$$
\left(
\begin{array}{cccccc}
T_{5,7,0} & 0 & \dots & & & \\
T_0 & 0 & \dots & & & \\
0 & I & \ddots & & & \\
 & \ddots & \ddots & & &\\
 & & \ddots & I & 0 &\\
 & & & 0 & L & 0\\
\end{array}
\right),
$$
where the $T_.$'s are specific matrices, $I$ the identity matrix and $L$ has ones on its ``top left'' corner. The first two lines are not entirely sparse but ``filled'' with $T_{.}$ matrices. All the $T_.$ matrices are here to change the state in a very efficient way, while the subdiagonal (nearly full of ones) is used to shift the unmodified part of the current state to the next one. See \cite{well} for details.

The MT generator can be obtained with special values of the $T_.$ matrices. \cite{well} propose a set of parameters, for which they computed the dimension gap sum $\Delta_1$. The full table can be found in \cite{well}; we only sum up the parameters of those implemented in this package in table \ref{welltable}.

\begin{table}[!htb]
\center
\begin{tabular}{lccc}
\hline
name & $k$ & $Z_1$ & $\Delta_1$\\
\hline
WELL512a & 512 & 225 & 0\\
\hline
WELL1024a & 1024 & 407 & 0\\
\hline
WELL19937a & 19937 & 8585 & 4\\
\hline
WELL44497a & 44497 & 16883 & 7\\
\hline
\end{tabular}
\caption{Specific WELL generators}
\label{welltable}
\end{table}

Let us note that for the last two generators a tempering step is possible in order to have a maximally equidistributed generator (i.e. $(d,l)$-equidistributed for all $d$ and $l$). These generators are implemented in this package thanks to the \soft{C} code of L'Ecuyer and Panneton.

\subsubsection{SIMD-oriented Fast Mersenne Twister algorithms}\label{sfmt}

A decade after the invention of MT, \cite{matsumoto08} enhanced their algorithm to take advantage of today's computers, which have Single Instruction Multiple Data (SIMD) operations allowing one to work conceptually with 128-bit integers.

MT and its successor belong to the family of multiple-recursive matrix generators since they satisfy a multiple recursive equation with matrix constants. For MT, we have the following recurrence for the $(k+n)$th term:
\begin{multline*}
x_{k+n} = \underbrace{ x_k
\left(\begin{array}{cc}
I_{\omega-r} &0\\
0 & 0
\end{array}\right) A
\oplus x_{k+1}
\left(\begin{array}{cc}
0 & 0\\
0 & I_{r}
\end{array}\right)A
\oplus x_{k+m}}_{ h(x_k, x_{k+1}, \dots, x_{k+m},\dots,x_{k+n-1})}.
\end{multline*}
Thus the MT recursion is entirely characterized by
$$
h(\omega_0, \dots, \omega_{n-1}) = (\omega_0 | \omega_{1})A\oplus \omega_m ,
$$
where $\omega_i$ denotes the $i$th word integer (i.e. horizontal vectors of $N_2$).

The general recurrence for the SFMT algorithm extends the MT recursion to
$$
h(\omega_0, \dots, \omega_{n-1}) = \omega_0 A \oplus \omega_m B \oplus \omega_{n-2} C \oplus \omega_{n-1} D,
$$
where $A, B, C, D$ are sparse matrices over $N_2$, $\omega_i$ are 128-bit integers and the degree of recursion is $n=\left\lceil \frac{19937}{128}\right\rceil = 156$.
The matrices $A, B, C$ and $D$ for a word $w$ are defined as follows:
\begin{itemize*}
\item $wA = \left(w \stackrel{128}{<<} 8\right) \oplus w$,
\item $wB = \left(w \stackrel{32}{>>} 11\right) \otimes c$, where $c$ is a 128-bit constant
%$0\times BFFFFFF6 BFFAFFFF DDFECB7F DFFFFFEF$ in hexadecimal form
and $\otimes$ the bitwise AND operator,
\item $wC = w \stackrel{128}{>>} 8$,
\item $wD = w \stackrel{32}{<<} 18$,
\end{itemize*}
where $\stackrel{128}{<<}$ denotes a 128-bit operation while $\stackrel{32}{>>}$ a 32-bit operation, i.e. an operation on the four 32-bit parts of the 128-bit word $w$.

Hence the \emph{transition function} of SFMT is given by
\begin{eqnarray*}
f : \left( N_2^\omega \right)^n &\mapsto & \left( N_2^\omega \right)^n \\
(\omega_0, \dots, \omega_{n-1}) &\mapsto & (\omega_1, \dots, \omega_{n-1}, h(\omega_0, \dots, \omega_{n-1}) ),
\end{eqnarray*}
where $ \left( N_2^\omega \right)^n$ is the \emph{state} space.

The selection of the recursion and of the parameters was carried out to find a good dimension of equidistribution for a given period. This step is done by studying the characteristic polynomial of $f$. SFMT allows periods of $2^{p}-1$ with $p$ a (prime) Mersenne exponent\footnote{a Mersenne exponent is a prime number $p$ such that $2^p-1$ is prime. Prime numbers of the form $2^p-1$ have the special designation of Mersenne numbers.}. \cite{matsumoto08} propose the following set of exponents: 607, 1279, 2281, 4253, 11213, 19937, 44497, 86243, 132049 and 216091.

The advantage of SFMT over MT is computation speed: SFMT is twice as fast without SIMD operations and nearly four times faster with SIMD operations. SFMT also has a better equidistribution\footnote{See the linear algebra arguments of \cite{matsumoto98}.} and a better recovery time from zero-excess states\footnote{states with too many zeros.}. The function \code{SFMT} provides an interface to the \soft{C} code of Matsumoto and Saito.

\subsection{Quasi random generation}

Before explaining and detailing quasi random generation, we must (quickly) explain Monte-Carlo\footnote{according to wikipedia the name comes from a famous casino in Monaco.} methods, which were introduced in the forties. In this section, we follow the approach of \cite{niederreiter}.

Let us work on the $d$-dimensional unit cube $I^d =[0,1]^d$ and with a (multivariate) bounded (Lebesgue) integrable function $f$ on $I^d$. Then we define the Monte Carlo approximation of the integral of $f$ over $I^d$ by
$$
\int_{I^d} f(x)dx \approx \frac{1}{n} \sum_{i=1}^n f(X_i),
$$
where $(X_i)_{1\leq i\leq n}$ are independent random points from $I^d$.

The strong law of large numbers ensures the almost sure convergence of the approximation. Furthermore, the expected integration error is bounded by $O(\frac{1}{\sqrt n})$, with the interesting fact that it does not depend on the dimension $d$. Thus Monte Carlo methods have a wide range of applications.

The main difference between (pseudo) Monte-Carlo methods and quasi Monte-Carlo methods is that we no longer use random points $(x_i)_{1\leq i\leq n}$ but deterministic points. Unlike statistical tests, numerical integration does not rely on true randomness. Let us note that quasi Monte-Carlo methods date from the fifties, and have also been used for interpolation problems and for solving integral equations.
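As a minimal sketch of the Monte Carlo approximation above (the integrand, dimension and sample size are only illustrative), one may estimate an integral with a known closed form and look at the error:
%%% R code
<<mcint, echo=TRUE, eval=FALSE>>=
d <- 5; n <- 10^4
f <- function(x) prod(cos(x))        #integrand on the unit cube I^d
X <- matrix(runif(n * d), nrow = n)  #n (pseudo) random points in I^d
estim <- mean(apply(X, 1, f))
exact <- sin(1)^d                    #closed form of the integral of f over I^d
c(estimate = estim, error = estim - exact)
@
%%%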
In the following, we consider a sequence of deterministic points $(u_i)_{1\leq i\leq n}$ in the unit cube $I^d$. The convergence condition on the sequence $(u_i)_i$ is that it must be uniformly distributed in the unit cube $I^d$ in the following sense:
$$
\forall J\subset I^d, \underset{n\rightarrow +\infty}{\lim} \frac{1}{n} \sum_{i=1}^n \ind_J(u_i) = \lambda_d(J),
$$
where $\lambda_d$ stands for the $d$-dimensional volume (i.e. the $d$-dimensional Lebesgue measure) and $\ind_J$ the indicator function of the subset $J$.

The problem is that our discrete sequence will never constitute a ``fair'' distribution in $I^d$, since there will always be a small subset with no points. Therefore, we need to consider a more flexible definition of uniform distribution of a sequence. Before introducing the discrepancy, we define $\card_E(u_1,\dots, u_n)$ as $\sum_{i=1}^n \ind_E(u_i)$, i.e. the number of points falling in the subset $E$. Then the discrepancy $D_n$ of the $n$ points $(u_i)_{1\leq i\leq n}$ in $I^d$ is given by
$$
D_n = \underset{J\in \mathcal J}{\sup}\left| \frac{\card_J(u_1,\dots, u_n)}{n} - \lambda_d(J) \right|
$$
where $\mathcal J$ denotes the family of all subintervals of $I^d$ of the form $\prod_{i=1}^d [a_i,b_i ]$. If we take the family of all subintervals of $I^d$ of the form $\prod_{i=1}^d [0,b_i ]$, $D_n$ is called the star discrepancy (cf. \cite{niederreiter92}).

Let us note that the discrepancy $D_n$ is nothing else than the $L_{\infty}$-norm over the unit cube of the difference between the empirical proportion of points $(u_i)_{1\leq i\leq n}$ falling in a subset $J$ and its theoretical value $\lambda_d(J)$. An $L_{2}$-norm can be defined as well, see \cite{niederreiter92} or \cite{jackel}.

The integration error is bounded by
$$
\left| \frac{1}{n} \sum_{i=1}^n f(u_i) - \int_{I^d} f(x)dx \right| \leq V_d(f) D_n,
$$
where $V_d(f)$ is the $d$-dimensional Hardy and Krause variation\footnote{Interested readers can find the definition on page 966 of \cite{niederreiter}. In a sentence, the Hardy and Krause variation of $f$ is the supremum of sums of $d$-dimensional delta operators applied to the function $f$.} of $f$ on $I^d$ (supposed to be finite).

Actually the integration error bound is the product of two independent quantities: the variability of the function $f$ through $V_d(f)$ and the regularity of the sequence through $D_n$. So, we want to minimize the discrepancy $D_n$, since we generally do not have a choice in the ``problem'' function $f$. We will not detail it, but this concept can be extended to a subset $J$ of the unit cube $I^d$ in order to have a similar bound for $\int_J f(x)dx$.

In the literature, there are many ways to find sequences with small discrepancy, generally called low-discrepancy sequences or quasi-random points. A first approach tries to find bounds for these sequences and to search for good parameters reaching the lower bound or decreasing the upper bound. Another school tries to exploit the regularity of the function $f$ to decrease the discrepancy. Sequences coming from the first school are called quasi-random points while those of the second school are called good lattice points.

\subsubsection{Quasi-random points and discrepancy}\label{quasi}

So far, we have not given any example of quasi-random points. In the unidimensional case, an easy example of quasi-random points is the sequence of $n$ terms given by $(\frac{1}{2n}, \frac{3}{2n}, \dots, \frac{2n-1}{2n})$. This sequence has a discrepancy of $\frac{1}{n}$, see \cite{niederreiter} for details. The problem with this finite sequence is that it depends on $n$.
If we want a different number of points, we need to recompute the whole sequence. In the following, we will work on the first $n$ points of an infinite sequence, in order to reuse previous computations if we increase $n$. Moreover we keep the notion of discrepancy for the finite sequence $(u_i)_{1\leq i\leq n}$. In the above example, we are able to calculate the discrepancy exactly. With an infinite sequence, this is no longer possible. Thus, we will only try to estimate asymptotic equivalents of the discrepancy.

The discrepancy of an average sequence of points is governed by the law of the iterated logarithm:
$$
\underset{n\rightarrow +\infty}{\limsup} \frac{\sqrt{n} D_n}{ \sqrt{\log \log n} } = 1,
$$
which leads to the following asymptotic equivalent for $D_n$:
$$
D_n = O\left( \sqrt{\frac{\log \log n}{n}} \right).
$$

\subsubsection{Van der Corput sequences}

An example of quasi-random points with a low discrepancy is the (unidimensional) Van der Corput sequence. Let $p$ be a prime number. Every integer $n$ can be decomposed in the $p$-basis, i.e. there exist some integer $k$ and digits $a_1, \dots, a_k \in \{0,\dots, p-1\}$ such that
$$
n = \sum_{j=1}^k a_j p^{j-1}.
$$
Then, we can define the radical-inverse function of the integer $n$ as
$$
\phi_p(n) = \sum_{j=1}^k \frac{a_j}{ p^{j}}.
$$
And finally, the Van der Corput sequence is given by $(\phi_p(0), \phi_p(1), \dots, \phi_p(n),\dots ) \in [0,1[$. The first terms of this sequence for the prime numbers 2, 3 and 5 are given in table \ref{vandercorput}.

\begin{table}[!htb]
\center
\begin{tabular}{lcccccc}
\hline
 & \multicolumn{3}{c}{$n$ in $p$-basis} & \multicolumn{3}{c}{$\phi_p(n)$}\\
$n$ & $p=2$ & $p=3$ & $p=5$ & $p=2$& $p=3$& $p=5$\\
\hline
0 & 0 & 0 & 0 & 0 & 0 & 0\\
\hline
1 & 1 & 1 & 1 & 0.5 & 0.333& 0.2\\
\hline
2 & 10 & 2 & 2 & 0.25 & 0.666& 0.4\\
\hline
3 & 11 & 10 & 3 & 0.75 & 0.111& 0.6\\
\hline
4 & 100 & 11 & 4 & 0.125 & 0.444& 0.8\\
\hline
5 & 101 & 12 & 10 & 0.625 & 0.777& 0.04\\
\hline
6 & 110 & 20 & 11 & 0.375 & 0.222&0.24\\
\hline
7 & 111 & 21 & 12 & 0.875 & 0.555&0.44\\
\hline
8 & 1000 & 22 & 13 & 0.0625 & 0.888&0.64\\
\hline
\end{tabular}
\caption{Van der Corput first terms}
\label{vandercorput}
\end{table}

The big advantage of Van der Corput sequences is that they use $p$-adic fractions easily computable on the binary structure of computers.
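The radical-inverse function is straightforward to code; the following sketch (the helper \code{radInv} is ours, not a package function) reproduces the $\phi_2$ column of table \ref{vandercorput}:
%%% R code
<<radinv, echo=TRUE, eval=FALSE>>=
radInv <- function(n, p)      #radical-inverse function phi_p(n)
{
    res <- 0
    base <- 1 / p
    while (n > 0) {
        res <- res + (n %% p) * base
        n <- n %/% p
        base <- base / p
    }
    res
}
sapply(0:8, radInv, p = 2)    #phi_2(0), ..., phi_2(8), the p=2 column above
@
%%%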
\subsubsection{Halton sequences}

The $d$-dimensional version of the Van der Corput sequence is known as the Halton sequence. The $n$th term of the sequence is defined as
$$
(\phi_{p_1}(n), \dots, \phi_{p_d}(n)) \in I^d,
$$
where $p_1,\dots, p_d$ are pairwise relatively prime bases. The discrepancy of the Halton sequence is asymptotically $O\left( \frac{\log(n)^d}{n} \right)$.

The following Halton theorem gives us a better discrepancy estimate for finite sequences. For any dimension $d\geq 1$, there exists a finite sequence of points in $I^d$ such that the discrepancy satisfies\footnote{if the sequence has at least two points, cf. \cite{niederreiter}.}
$$
D_n = O\left( \frac{\log(n)^{d-1}}{n} \right).
$$
Therefore, we have a significant guarantee that there exist quasi-random points which outperform traditional Monte-Carlo methods.

\subsubsection{Faure sequences}

The Faure sequence is also based on the decomposition of integers in a prime basis, but it has two differences: it uses only one prime number as the basis and it permutes the vector elements from one dimension to another. The basis prime number is chosen as the smallest prime number greater than the dimension $d$, i.e. 3 when $d=2$, 5 when $d=3\txtm{or}4$, etc\dots

As in the Van der Corput sequence, we decompose the integer $n$ in the $p$-basis:
$$
n = \sum_{j=1}^k a_j p^{j-1}.
$$
Let $a_{1,j}$ be the integer $a_j$ used for the decomposition of $n$. Now we define a recursive permutation of the $a_j$'s:
$$
\forall\, 2\leq D \leq d,\quad a_{D,j} = \sum_{i=j}^k C_i^j\, a_{D-1,i} \mod p,
$$
where $C_i^j$ denotes the binomial coefficient $\frac{i!}{j!(i-j)!}$. Then we take the radical-inversion $\phi_p(a_{D,1}, \dots, a_{D,k})$ defined as
$$
\phi_p(a_{1}, \dots, a_{k}) = \sum_{j=1}^k \frac{a_{j}}{ p^{j}},
$$
which is the same as above for $n$ defined by the $a_{D,i}$'s. Finally the ($d$-dimensional) Faure sequence is defined by
$$
(\phi_p(a_{1,1}, \dots, a_{1,k}) , \dots, \phi_p(a_{d,1}, \dots, a_{d,k}) ) \in I^d.
$$
In the bidimensional case, we work in base 3; the first terms of the sequence are listed in table \ref{faure}.

\initFmark
\begin{table}[!htb]
\center
\begin{tabular}{lcccc}
\hline
$n$ & $a_{13}a_{12}a_{11}\fmark$ & $a_{23}a_{22}a_{21}$ & $\phi(a_{13}..)$ & $\phi(a_{23}..)$\\
%$\phi_p(a_{13}a_{12}a_{11})$& $\phi_p(a_{13}a_{12}a_{11})$\\
\hline
0 & 000 & 000 & 0 & 0\\
1 & 001 & 001 & 1/3 & 1/3\\
2 & 002 & 002 & 2/3 & 2/3\\
\hline
3 & 010 & 012 & 1/9 & 7/9\\
4 & 011 & 010 & 4/9 & 1/9\\
5 & 012 & 011 & 7/9 & 4/9\\
\hline
6 & 020 & 021 & 2/9 & 5/9\\
7 & 021 & 022 & 5/9 & 8/9\\
8 & 022 & 020 & 8/9 & 2/9\\
\hline
9 & 100 & 100 & 1/27 & 1/27\\
10 & 101 & 101 & 10/27 & 10/27\\
11 & 102 & 102 & 19/27 & 19/27\\
\hline
12 & 110 & 112 & 4/27 & 22/27\\
13 & 111 & 110 & 12/27 & 4/27\\
14 & 112 & 111 & 22/27 & 12/27\\
\hline
\end{tabular}
\caption{Faure first terms}
\label{faure}
\end{table}
\updateFtext
\ftext{we omit commas for simplicity.}
%TODO add the generalized Faure sequence

\subsubsection{Sobol sequences}

This sub-section is taken from unpublished work of Diethelm Wuertz.

The Sobol sequence $x_n=(x_{n,1}, \dots, x_{n,d})$ is generated from a set of binary functions of length $\omega$ bits ($v_{i,j}$ with $i=1,\dots,\omega$ and $j=1,\dots,d$). The $v_{i,j}$'s, generally called direction numbers, are numbers related to primitive (irreducible) polynomials over the field $\{0,1\}$.

In order to generate the $j$th dimension, we suppose that the primitive polynomial in dimension $j$ is
$$
p_j(x) = x^q+a_1x^{q-1}+\dots +a_{q-1}x+1.
$$
Then we define the following $q$-term recurrence relation on the integers $(M_{i,j})_i$
\begin{multline*}
M_{i,j} = 2a_1 M_{i-1,j} \oplus 2^2a_2 M_{i-2,j} \oplus \dots \\
\oplus 2^{q-1}a_{q-1} M_{i-q+1,j} \oplus 2^qa_q M_{i-q,j} \oplus M_{i-q,j},
\end{multline*}
where $i>q$. This allows us to compute the direction numbers as
$$
v_{i,j} = M_{i,j}/2^i.
$$
This recurrence is initialized by a set of arbitrary odd integers $v_{1,j} 2^{\omega}, \dots, v_{q,j} 2^{q\omega}$, which are smaller than $2, \dots, 2^q$ respectively.
Finally the $j$th dimension of the $n$th term of the Sobol sequence is given by
$$
x_{n,j} = b_1 v_{1,j} \oplus b_2 v_{2,j} \oplus \dots \oplus b_{\omega} v_{\omega, j},
$$
where the $b_k$'s are the bits of the integer $n = \sum_{k=1}^{\omega}b_k 2^{k-1}$. The requirement is to use a different primitive polynomial in each dimension. An efficient variant to implement the generation of Sobol sequences was proposed by \cite{antonov}. The use of this approach is demonstrated in \cite{bratley} and \cite{pressetal}.

\subsubsection{Scrambled Sobol sequences}

Randomized QMC methods are the basis for error estimation. A generic recipe is the following: let $A_1 , \dots , A_n$ be a QMC point set and $X_i$ a scrambled version of $A_i$. Then we are searching for randomizations which have the following properties:
\begin{itemize}
\item Uniformity: the $X_i$'s make the approximator $\hat I = \frac{1}{N} \sum_{i=1}^N f (X_i )$ an unbiased estimate of $I = \int_{[0,1]^d}f (x)dx$.
\item Equidistribution: the $X_i$'s form a point set with probability 1; i.e. the randomization process has preserved whatever special properties the underlying point set had.
\end{itemize}
The Sobol sequences can be scrambled by Owen's type of scrambling, by the Faure-Tezuka type of scrambling, and by a combination of both.

The program we have interfaced to \soft{R} is based on the ACM Algorithm 659 described by \cite{bratley} and \cite{bratleyfoxniederreiter}. Modifications by \cite{hong} allow for a randomization of the sequences. Furthermore, in the case of the Sobol sequence we followed the implementation of \cite{joekuo}, which can handle up to 1111 dimensions. \cite{joekuo08} propose implementations up to 21201 dimensions.

To interface the Fortran routines to the \soft{R} environment some modifications had to be performed. One important point was to make it possible to re-initialize a sequence and to recall a sequence without re-initialization from \soft{R}. This required removing the BLOCKDATA, COMMON and SAVE statements from the original code and passing the initialization variables through the argument lists of the subroutines, so that these variables can be accessed from \soft{R}.

\subsubsection{Kronecker sequences}

Another kind of low-discrepancy sequence uses irrational numbers and fractional parts. The fractional part of a real $x$ is denoted by $\{x\} = x-\lfloor x \rfloor$. The infinite sequence $(\{n\alpha\})_{n\geq 0}$ has a bound for its discrepancy
$$
D_n \leq C\frac{1+\log n}{n}.
$$
This family of infinite sequences $(\{n\alpha\})_{n\geq 0}$ is called Kronecker sequences.

A special case of the Kronecker sequence is the Torus algorithm, where the irrational number $\alpha$ is the square root of a prime number. The $n$th term of the $d$-dimensional Torus algorithm is defined by
$$
\left( \{n\sqrt{p_1}\} , \dots, \{n\sqrt{p_d}\} \right) \in I^d,
$$
where $(p_1, \dots,p_d)$ are prime numbers, generally the first $d$ prime numbers. With the previous inequality, we can derive an estimate of the Torus algorithm discrepancy:
%Thus the discrepancy of the Torus algorithm is at most
$$
O\left( \frac{1+\log n}{n} \right).
$$

\subsubsection{Mixed pseudo quasi random sequences}
\label{mixtorus}

Sometimes we want to use quasi-random sequences as pseudo random ones, i.e. we want to keep the good equidistribution of quasi-random points but without the term-to-term dependence. One way to solve this problem is to use pseudo random generation to mix the outputs of a quasi-random sequence.
For example, in the case of the Torus sequence, we repeat, for $1\leq i \leq n$:
\begin{itemize*}
\item draw an integer $n_i$ from Mersenne-Twister in $\{0, \dots, 2^\omega-1\}$,
\item then set $u_i = \{n_i \sqrt{p}\}$.
\end{itemize*}

\subsubsection{Good lattice points}

In the above methods, we do not take advantage of any regularity of the integrand function $f$ beyond being of bounded variation in the sense of Hardy and Krause. Good lattice point sequences try to use the possible extra regularity of $f$.

If $f$ is 1-periodic in each variable, then the approximation with good lattice points is
$$
\int_{I^d} f(x)dx \approx \frac{1}{n} \sum_{i=1}^n f\left(\frac{i}{n}g\right),
$$
where $g\in \mathbb Z^d$ is a suitable $d$-dimensional lattice point. Imposing $f$ to be $1$-periodic may seem too brutal, but there exists a method to transform $f$ into a $1$-periodic function while preserving the regularity and the value of the integral (see \citealp[page 983]{niederreiter}).

We have the following theorem for good lattice points. For every dimension $d\geq 2$ and integer $n\geq 2$, there exists a lattice point $g\in \mathbb Z^d$ whose coordinates are relatively prime to $n$, such that the discrepancy $D_n$ of the points $\{\frac{1}{n}g \}, \dots, \{\frac{n}{n}g \}$ satisfies
$$
D_n < \frac{d}{n}+\frac{1}{2n} \left(\frac{7}{5}+2\log n\right)^d.
$$

Numerous studies of good lattice points try to find a point $g$ which minimizes the discrepancy. Korobov tests $g$ of the form $\left(1, m, \dots, m^{d-1}\right)$ with $m\in \mathbb N$. Bahvalov tries Fibonacci numbers $(F_1,\dots, F_d)$. Other studies look directly for the point $\alpha=\frac{g}{n}$, e.g. $\alpha = \left(p^{\frac{1}{d+1}}, \dots, p^{\frac{d}{d+1}}\right)$ or some cosine functions. We refer interested readers to \cite{niederreiter} for detailed information.

\section{Examples of distinguishing from truly random numbers}\label{distinguishing}

For a good generator, it is not computationally easy to distinguish the output of the generator from truly random numbers, if the seed or the index in the sequence is not known. In this section, we present examples of generators whose output may be easily distinguished from truly random numbers.

An example of such a generator is the older version of Wichmann-Hill from 1982. For this generator, we can even predict the next number in the sequence, if we know the last already generated one. Verifying such a prediction is easy and it is, of course, not valid for truly random numbers. Hence, we can easily distinguish the output of the generator from truly random numbers. An implementation of this test in \soft{R} derived from \cite{mccullough08} is as follows.
%%% R code
<<echo=TRUE>>=
wh.predict <- function(x)
{
    M1 <- 30269
    M2 <- 30307
    M3 <- 30323
    y <- round(M1*M2*M3*x)
    s1 <- y %% M1
    s2 <- y %% M2
    s3 <- y %% M3
    s1 <- (171*26478*s1) %% M1
    s2 <- (172*26070*s2) %% M2
    s3 <- (170*8037*s3) %% M3
    (s1/M1 + s2/M2 + s3/M3) %% 1
}
RNGkind("Wichmann-Hill")
xnew <- runif(1)
maxerr <- 0
for (i in 1:1000) {
    xold <- xnew
    xnew <- runif(1)
    err <- abs(wh.predict(xold) - xnew)
    maxerr <- max(err, maxerr)
}
print(maxerr)
@
%%%
The printed error is $0$ on some machines and less than $5 \cdot 10^{-16}$ on other machines. This is clearly different from the error obtained for truly random numbers, which is close to $1$.
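For comparison, one may repeat the same loop with \soft{R}'s default generator; this is only a sketch (not evaluated here), assuming the function \code{wh.predict} defined above is available:
%%% R code
<<whmt, echo=TRUE, eval=FALSE>>=
RNGkind("Mersenne-Twister")  #default R generator, not predictable by wh.predict
xnew <- runif(1)
maxerr <- 0
for (i in 1:1000) {
    xold <- xnew
    xnew <- runif(1)
    maxerr <- max(abs(wh.predict(xold) - xnew), maxerr)
}
print(maxerr)                #close to 1, as for truly random numbers
@
%%%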
The requirement that the output of a random number generator should not be distinguishable from truly random numbers by a simple computation is directly related to the way a generator is used. Typically, we use the generated numbers as an input to a computation and we expect that the distribution of the output (for different seeds or for different starting indices in the sequence) is the same as if the input were truly random numbers. A failure of this assumption implies, besides a wrong result of our simulation, that observing the output of the computation allows one to distinguish the output of the generator from truly random numbers. Hence, we want to use a generator for which we may expect that the calculations used in the intended application cannot distinguish its output from truly random numbers.

In this context, it has to be noted that many of the currently used generators for simulations can be distinguished from truly random numbers using arithmetic mod 2 (the XOR operation) applied to individual bits of the output numbers. This is true for Mersenne Twister, SFMT and also all WELL generators. The reason for tolerating this is based on two facts. First, arithmetic mod 2 and extracting individual bits of real numbers are not directly used in typical simulation problems, and the real valued functions which represent these operations are extremely discontinuous; such functions also do not typically occur in simulation problems. Another reason is that we need to observe quite a long history of the output to detect the difference from true randomness. For example, for Mersenne Twister, we need $624$ consecutive numbers.

On the other hand, if we use a cryptographically strong pseudorandom number generator, we may avoid being distinguished from truly random numbers under any known efficient procedure. Such generators are typically slower than Mersenne Twister type generators. The slow-down factor is, for example for AES, about 5. However, note that for simulation problems which require intensive computation besides generating random numbers, using a slower, but better, generator implies only a negligible slow down of the computation as a whole.

\section{Description of the random generation functions\label{rngfunc}}

In this section, we detail the \soft{R} functions implemented in \code{randtoolbox} and give examples.

\subsection{Pseudo random generation}

For pseudo random generation, \soft{R} provides many algorithms through the function \code{runif} parametrized with \code{.Random.seed}. We encourage readers to look in the corresponding help pages for examples and usage of those functions. Let us just say that \code{runif} uses the Mersenne-Twister algorithm by default; other generators such as Wichmann-Hill, Marsaglia-Multicarry or Knuth-TAOCP-2002\footnote{see \cite{wichmann}, \cite{marsaglia} and \cite{knuth02} for details.} are also available.

\subsubsection{\code{congruRand}}

The \code{randtoolbox} package provides two pseudo-random generator functions: \code{congruRand} and \code{SFMT}. \code{congruRand} computes linear congruential generators, see sub-section \ref{congruRand}. By default, it computes the \cite{parkmiller} sequence, so it needs only the observation number argument. If we want to generate 10 random numbers, we type
%%% R code
<<echo=TRUE, eval=FALSE>>=
congruRand(10)
@
<<echo=FALSE>>=
library(randtoolbox)
options(width = 40)
congruRand(10)
@
%%%
One will quickly note that two calls to \code{congruRand} will not produce the same output. This is due to the fact that we use the machine time to initiate the sequence at each call.
But the user can set the \emph{seed} with the function \code{setSeed}:
%%% R code
<<echo=TRUE, eval=FALSE>>=
setSeed(1)
congruRand(10)
@
<<echo=FALSE>>=
options(width = 40)
setSeed(1)
congruRand(10)
@
%%%
One can follow the evolution of the $n$th integer generated with the option \code{echo=TRUE}.
%%% R code
<<echo=TRUE, eval=FALSE>>=
setSeed(1)
congruRand(10, echo=TRUE)
@
<<echo=FALSE>>=
options(width = 40)
setSeed(1)
congruRand(10, echo=TRUE)
@
%%%
We can check that those integers are the first 10 terms of the \cite{parkmiller} sequence, listed in table \ref{parkmiller} and coming from \url{http://www.firstpr.com.au/dsp/rand31/}.

\begin{table}[!htb]
\center
\begin{tabular}{llll}
\hline
$n$&$x_n$ & $n$ &$x_n$\\
\hline
1 & 16807 &6 & 470211272 \\
2 & 282475249 &7 & 101027544 \\
3 & 1622650073 &8 & 1457850878 \\
4 & 984943658 &9 & 1458777923 \\
5 & 1144108930 &10 & 2007237709 \\
\hline
\end{tabular}
\caption{First 10 integers of the \cite{parkmiller} sequence}
\label{parkmiller}
\end{table}

We can also check around the 10000th term. From the site \url{http://www.firstpr.com.au/dsp/rand31/}, we know that the 9998th to 10002nd terms of the Park-Miller sequence are 925166085, 1484786315, 1043618065, 1589873406, 2010798668. Then \code{congruRand} generates
%%% R code
<<echo=TRUE, eval=FALSE>>=
setSeed(1614852353)
congruRand(5, echo=TRUE)
@
<<echo=FALSE>>=
options(width = 40)
setSeed(1614852353)
congruRand(5, echo=TRUE)
@
%%%
with 1614852353 being the 9997th term of the Park-Miller sequence.

However, we are not limited to the Park-Miller sequence. If we change the modulus, the increment and the multiplier, we get other random sequences. For example,
%%% R code
<<echo=TRUE, eval=FALSE>>=
setSeed(12)
congruRand(5, mod = 2^8, mult = 25, incr = 16, echo = TRUE)
@
<<echo=FALSE>>=
options(width = 30)
setSeed(12)
congruRand(5, mod = 2^8, mult = 25, incr = 16, echo = TRUE)
@
%%%
Those values are correct according to \citealp[page 119]{planchetMFAbook}.

\initFmark
Here is an example list of RNGs computable with \code{congruRand}:
\begin{table}[!htb]
\center
\begin{tabular}{llll}
\hline
RNG &\code{mod} & \code{mult} &\code{incr}\\
\hline
Knuth - Lewis & $2^{32}$ & $1664525$ & $1.01e9\fmark $\\
Lavaux - Jenssens & $2^{48}$ & $31167285$ & $1$\\
Haynes & $2^{64}$ & $6.36e17\fmark $ & $1$\\
Marsaglia & $2^{32}$ & $69069$ & $0$\\
Park - Miller & $2^{31}-1$ & $16807$ & $0$\\
\hline
\end{tabular}
\caption{Some linear RNGs}
%\label{parkmiller}
\end{table}
%initialize the counter with \initFmark before use
%place \fmark where the footnote mark should appear in the table/caption
%outside the table/caption, update the footnote counter with \updateFtext
%place \ftext{...footnote text...} outside the table or the caption
\updateFtext
\ftext{1013904223.}
\ftext{636412233846793005.}

One may wonder why we implement such a short-period algorithm since we know the Mersenne-Twister algorithm. It is provided to make comparisons with other algorithms. The Park-Miller RNG should \textbf{not} be viewed as a ``good'' random generator.

Finally, the \code{congruRand} function has a \code{dim} argument to generate \code{dim}-dimensional vectors of random numbers. The $n$th vector is built with $d$ consecutive numbers of the RNG sequence (i.e. the $n$th term is $(u_{n+1}, \dots, u_{n+d})$).
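As a small check of the recurrence of sub-section \ref{congruRand}, the Park-Miller sequence can also be reproduced directly in \soft{R}; this sketch is only an illustration, not how \code{congruRand} is implemented (which uses \soft{C} code):
%%% R code
<<pmhand, echo=TRUE, eval=FALSE>>=
m <- 2^31 - 1          #Park-Miller modulus
a <- 16807             #Park-Miller multiplier, increment c = 0
x <- 1                 #seed, as in setSeed(1)
pm <- numeric(10)
for (i in 1:10) {
    x <- (a * x) %% m  #the linear congruential recurrence
    pm[i] <- x
}
pm                     #matches the integers of the table above
@
%%%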
\subsubsection{\code{SFMT}}

The SF-Mersenne Twister algorithm is described in sub-section \ref{sfmt}. The usage of the \code{SFMT} function implementing the SF-Mersenne Twister algorithm is the same. The first argument \code{n} is the number of random variates, the second argument \code{dim} the dimension.
%%% R code
<<echo=TRUE, eval=FALSE>>=
SFMT(10)
SFMT(5, 2) #bidimensional variates
@
<<echo=FALSE>>=
options(width = 40)
SFMT(10)
SFMT(5, 2)
@
%%%
A third argument is \code{mexp} for the Mersenne exponent, with possible values (607, 1279, 2281, 4253, 11213, 19937, 44497, 86243, 132049 and 216091). Below is an example with a period of $2^{607}-1$:
%%% R code
<<echo=TRUE, eval=FALSE>>=
SFMT(10, mexp = 607)
@
<<echo=FALSE>>=
options(width = 40)
SFMT(10, mexp = 607)
@
%%%
Furthermore, following the advice of \cite{matsumoto08}, for each exponent below 19937, \code{SFMT} uses a different set of parameters\footnote{this can be avoided by setting the \code{usepset} argument to \code{FALSE}.} in order to increase the independence of random generated variates between two calls. Otherwise (for exponents greater than 19937), we use one set of parameters\footnote{These parameter sets can be found in the \soft{C} function \code{initSFMT} in the \code{SFMT.c} source file.}.

We must stress that we do \textbf{not} implement the SFMT algorithm ourselves, we ``just'' use the \soft{C} code of \cite{matsumoto08}. For the moment, we do not fully use the strength of their code. For example, we do not use block generation and SSE2 SIMD operations.

\subsection{Quasi-random generation}

\subsubsection{Halton sequences}

The function \code{halton} implements both the Van Der Corput (unidimensional) and Halton sequences. The usage is similar to the pseudo-RNG functions:
%%% R code
<<echo=TRUE, eval=FALSE>>=
halton(10)
halton(10, 2)
@
<<echo=FALSE>>=
options(width = 40)
halton(10)
halton(10, 2)
@
%%%
You can set the \code{init} argument to \code{FALSE} (the default is \code{TRUE}) if you do not want two calls to the \code{halton} function to produce the same sequence; the second call then continues the sequence from the first call.
%%% R code
<<echo=TRUE, eval=FALSE>>=
halton(5)
halton(5, init=FALSE)
@
<<echo=FALSE>>=
options(width = 40)
halton(5)
halton(5, init=FALSE)
@
%%%
The \code{init} argument is also available for the other quasi-RNG functions.

\subsubsection{Sobol sequences}

The function \code{sobol} implements the Sobol sequences with optional scrambling (Owen, Faure-Tezuka or both types of scrambling). This sub-section also comes from an unpublished work of Diethelm Wuertz. To use the different scrambling options, you just have to use the \code{scrambling} argument: 0 for (the default) no scrambling, 1 for Owen, 2 for Faure-Tezuka and 3 for both types of scrambling.
%%% R code
<<echo=TRUE, eval=FALSE>>=
sobol(10)
@
<<echo=FALSE>>=
options(width = 40)
sobol(10)
@
%%%
It is easier to see the impact of scrambling by plotting a two-dimensional sequence in the unit square. Below we plot the default Sobol sequence and the Sobol sequence scrambled by Owen's algorithm, see figure \ref{sobol01}.
%%% R code
<<echo=TRUE, eval=FALSE>>=
par(mfrow = c(1,2))
plot(sobol(1000, 2))
plot(sobol(10^3, 2, scrambling=1))
@
\begin{figure}[!htb]
\begin{center}
<<fig=TRUE, echo=FALSE>>=
par(mfrow = c(1,2))
plot(sobol(1000, 2), xlab ="u", ylab="v", main="Sobol (no scrambling)")
plot(sobol(10^3, 2, scrambling=1), xlab ="u", ylab="v", main="Sobol (Owen)")
@
\caption{Sobol (two scrambling types)}
\label{sobol01}
\end{center}
\end{figure}
%%%

\subsubsection{Faure sequences}

In the near future, the \code{randtoolbox} package will have an implementation of Faure sequences. For the moment, there is no function \code{faure}.

\subsubsection{Torus algorithm (or Kronecker sequence)}

The function \code{torus} implements the Torus algorithm.
%%% R code
<<echo=TRUE, eval=FALSE>>=
torus(10)
@
<<echo=FALSE>>=
options(width = 40)
torus(10)
@
%%%
These numbers are the fractional parts of $\sqrt{2}, 2\sqrt{2}, 3\sqrt{2},\dots$, see sub-section \ref{quasi} for details.
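As a quick check of this property (assuming the default call starts the sequence at $i=1$ and does not use the machine time, see below), one may compare the output with fractional parts computed by hand:
%%% R code
<<toruscheck, echo=TRUE, eval=FALSE>>=
i <- 1:10
frac <- i * sqrt(2) - floor(i * sqrt(2))  #fractional parts of i*sqrt(2)
max(abs(torus(10) - frac))                #essentially zero
@
%%%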
%%% R code
<<echo=TRUE, eval=FALSE>>=
torus(5, use =TRUE)
@
<<echo=FALSE>>=
options(width = 40)
torus(5, use =TRUE)
@
%%%
The optional argument \code{useTime} can be used to initiate the seed with the machine time or not. If we do not use the machine time, two calls of \code{torus} obviously produce the same output.

If we want the random sequence with the prime number 7, we just type:
%%% R code
<<echo=TRUE, eval=FALSE>>=
torus(5, p =7)
@
<<echo=FALSE>>=
options(width = 40)
torus(5, p =7)
@
%%%
The \code{dim} argument is exactly the same as for \code{congruRand} or \code{SFMT}. By default, we use the first prime numbers, e.g. 2, 3 and 5 for a call like \code{torus(10, 3)}. But the user can specify a set of prime numbers, e.g. \code{torus(10, 3, c(7, 11, 13))}. The dimension argument is limited to 100 000\footnote{the first 100 000 prime numbers are taken from \url{http://primes.utm.edu/lists/small/millions/}.}.

As described in sub-section \ref{mixtorus}, one way to deal with serial dependence is to mix the Torus algorithm with a pseudo random generator. The \code{torus} function offers this operation thanks to the argument \code{mixed} (the Torus algorithm is mixed with SFMT).
%%% R code
<<echo=TRUE, eval=FALSE>>=
torus(5, mixed =TRUE)
@
<<echo=FALSE>>=
options(width = 40)
torus(5, mixed =TRUE)
@
%%%
In order to see the difference between them, we can plot the empirical autocorrelation function (\code{acf} in \soft{R}), see figure \ref{acfplot}.
%%% R code
<<echo=TRUE, eval=FALSE>>=
par(mfrow = c(1,2))
acf(torus(10^5))
acf(torus(10^5, mix=TRUE))
@
\begin{figure}[!htb]
\begin{center}
<<fig=TRUE, echo=FALSE>>=
par(mfrow = c(1,2))
acf(torus(10^5))
acf(torus(10^5, mix=TRUE))
@
\caption{Auto-correlograms}
\label{acfplot}
\end{center}
\end{figure}
%%%
%TODO : mix argument in SFMT

\subsection{Visual comparisons}

To understand the difference between pseudo and quasi RNGs, we can make visual comparisons of how random numbers fill the unit square.

First we compare the SFMT algorithm with the Torus algorithm on figure \ref{SFMTTorus}.
%%% R code
<<echo=TRUE, eval=FALSE>>=
par(mfrow = c(1,2))
plot(SFMT(1000, 2))
plot(torus(10^3, 2))
@
\begin{figure}[!htb]
\begin{center}
<<fig=TRUE, echo=FALSE>>=
par(mfrow = c(1,2))
plot(SFMT(1000, 2), xlab ="u", ylab="v", main="SFMT")
plot(torus(1000, 2), xlab ="u", ylab="v", main="Torus")
@
\caption{SFMT vs. Torus algorithm}
\label{SFMTTorus}
\end{center}
\end{figure}
%%%
Secondly we compare the WELL generator with the Faure-Tezuka-scrambled Sobol sequence on figure \ref{WELLSobol}.
%%% R code
<<echo=TRUE, eval=FALSE>>=
par(mfrow = c(1,2))
plot(WELL(1000, 2))
plot(sobol(10^3, 2, scrambling=2))
@
\begin{figure}[!htb]
\begin{center}
<<fig=TRUE, echo=FALSE>>=
par(mfrow = c(1,2))
plot(WELL(1000, 2), xlab ="u", ylab="v", main="WELL 512a")
plot(sobol(10^3, 2, scrambling=2), xlab ="u", ylab="v", main="Sobol (Faure-Tezuka)")
@
\caption{WELL vs. Sobol}
\label{WELLSobol}
\end{center}
\end{figure}
%%%
%problem of high dimensions of qmc sequence
%3D graphs??
%\newpage

\subsection{Applications of QMC methods}

\subsubsection{$d$ dimensional integration}

Now we will show how to use low-discrepancy sequences to compute a $d$-dimensional integral defined on the unit hypercube. We want to compute
\begin{multline*}
I_{cos}(d) = \int_{\mathbb R^d} \cos(|| x ||) e^{-|| x ||^2 } dx \\
\approx \frac{\pi^{d/2}}{n} \sum_{i=1}^n \cos\left(\sqrt{ \sum\limits_{j=1}^d \frac{\left(\Phi^{-1} \right)^2 (t_{ij})}{2} } \right),
\end{multline*}
where $\Phi^{-1}$ denotes the quantile function of the standard normal distribution.

We simply use the following code to compute the $I_{cos}(25)$ integral, whose exact value is $-1356914$.
%%% R code
<<echo=TRUE, eval=FALSE>>=
I25 <- -1356914
nb <- c(1200, 14500, 214000)
ans <- NULL
for(i in 1:3)
{
    tij <- sobol(nb[i], dim=25, scrambling=0, normal=TRUE)
    Icos <- sqrt(rowSums(tij^2/2))
    Icos <- mean(cos(Icos)) * pi^(25/2)
    ans <- rbind(ans, c(n=nb[i], I25=Icos, Delta=(Icos-I25)/I25))
}
data.frame(ans)
@
<<echo=FALSE>>=
options(width = 40)
I25 <- -1356914
nb <- c(1200, 14500, 214000)
ans <- NULL
for(i in 1:3)
{
    tij <- sobol(nb[i], dim=25, scramb=0, norm=TRUE)
    Icos <- mean(cos(sqrt( apply( tij^2/2, 1, sum ) ))) * pi^(25/2)
    ans <- rbind(ans, c(n=nb[i], I25=Icos, Delta=(Icos-I25)/I25 ))
}
data.frame(ans)
@
%%%
The results obtained with the Sobol Monte Carlo method, in comparison to those obtained by \cite{papageorgiou} with a generalized Faure sequence and in comparison with the quadrature rules of \cite{mcnamee}, \cite{genz} and \cite{paterson}, are listed in the following table.

\begin{table}[!htb]
\center
\begin{tabular}{lccc}
\hline
n & 1200 & 14500 & 214000\\
\hline
Faure (P\&T) & 0.001 & 0.0005 & 0.00005\\
\hline
Sobol (s=0) & 0.02 & 0.003 & 0.00006\\
s=1 & 0.004 & 0.0002 & 0.00005\\
s=2 & 0.001 & 0.0002 & 0.000002\\
s=3 & 0.002 & 0.0009 & 0.00003\\
\hline
Quadrature (McN\&S) & 2 & 0.75 & 0.07\\
G\&P & 2 & 0.4 & 0.06\\
\hline
\end{tabular}
\caption{List of errors}
\end{table}

\subsubsection{Pricing of a Vanilla Call}

In this sub-section, we present a financial application of QMC methods. We want to price a vanilla European call in the framework of a geometric Brownian motion for the underlying asset. Such options are already implemented in the package \code{fOptions} of the \code{Rmetrics} bundle\footnote{created by \cite{foptions}.}.

The payoff of this classical option is
$$
f(S_T) = (S_T-K)_+,
$$
where $K$ is the strike price. A closed formula for this call was derived by \cite{blackscholes73}.

The Monte Carlo method to price this option is quite simple:
\begin{enumerate*}
\item simulate $s_{T,i}$ for $i=1\dots n$ from the starting point $s_0$,
\item compute the mean of the discounted payoff $\frac{1}{n}\sum_{i=1}^n e^{-rT}(s_{T,i}-K)_+ $.
\end{enumerate*}
With parameters ($S_0=100$, $T=1$, $r=5\%$, $K=60$, $\sigma=20\%$), we plot the relative error as a function of the number of simulations $n$ on figure \ref{vanilla}. We test two pseudo-random generators (namely Park Miller and SF-Mersenne Twister) and one quasi-random generator (the Torus algorithm). The full code is not shown here, see the file \code{qmc.R} in the package source; only a simplified sketch is given below. In \code{qmc.R}, we use a step-by-step simulation for the Brownian motion and the inversion function method for the Gaussian distribution simulation (default in \soft{R}).

\begin{figure}[!htb]
\begin{center}
\includegraphics{vanilla.pdf}
\caption{Error function for Vanilla call}
\label{vanilla}
\end{center}
\end{figure}

As shown on figure \ref{vanilla}, the convergence of the Monte Carlo price is extremely fast for the Torus algorithm, whereas it is very slow for the SF-Mersenne Twister and Park Miller generators.
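The following minimal sketch (our own illustration, not the code of \code{qmc.R}; it uses a single time step instead of a step-by-step Brownian path, and the helpers \code{bs} and \code{mcprice} are ours) compares a pseudo and a quasi random generator with the closed formula:
%%% R code
<<vanillasketch, echo=TRUE, eval=FALSE>>=
S0 <- 100; K <- 60; Tm <- 1; r <- 0.05; sigma <- 0.2
bs <- function(S0, K, Tm, r, sigma)    #Black-Scholes reference price
{
    d1 <- (log(S0/K) + (r + sigma^2/2)*Tm) / (sigma*sqrt(Tm))
    S0*pnorm(d1) - K*exp(-r*Tm)*pnorm(d1 - sigma*sqrt(Tm))
}
mcprice <- function(u)                 #u: uniform variates in ]0,1[
{
    ST <- S0 * exp((r - sigma^2/2)*Tm + sigma*sqrt(Tm)*qnorm(u))
    mean(exp(-r*Tm) * pmax(ST - K, 0))
}
n <- 10^4
c(exact = bs(S0, K, Tm, r, sigma),
  SFMT = mcprice(SFMT(n)), torus = mcprice(torus(n)))
@
%%%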
\subsubsection{Pricing of a DOC}
Now, we want to price a barrier option: a down-and-out call, i.e.\ a call which is knocked out if the underlying asset goes below a barrier\footnote{the DOC is deactivated when the underlying asset hits the barrier.}. This kind of option belongs to the path-dependent option family, i.e.\ we need to simulate whole trajectories of the underlying asset $S$ on $[0,T]$. In the same framework of a geometric Brownian motion, there exists a closed formula for DOCs (see \cite{rubinstein}). Such options are already implemented in the package \code{fExoticOptions} of the \code{Rmetrics} bundle\footnote{created by \cite{fexoptions}.}. The payoff of a DOC option is
$$
f(S_T) = (S_T-K)_+ \ind_{(\tau_H > T)},
$$
where $K$ is the strike price, $T$ the maturity, $\tau_H$ the stopping time associated with the barrier $H$ and $S_t$ the underlying asset at time $t$.
%The price of a such DOC is
%the expected payoff under risk neutral probability:
%$$
%P = E_{\mathbb Q}\left[ e^{-rT} f(S_T) \right],
%$$
%where $r$ denotes the risk-free rate.
As the asset value is needed on the whole period $[0,T]$, we proceed as follows (a simplified sketch is given at the end of this sub-section):
\begin{enumerate*}
\item start from the point $s_{t_0}$,
\item for simulation $i=1,\dots, n$ and time index $j=1,\dots, d$
\begin{itemize*}
\item simulate $s_{t_j, i}$,
\item update the deactivation boolean $D_i$,
\end{itemize*}
\item compute the mean of the discounted payoffs $\frac{1}{n}\sum_{i=1}^n e^{-rT} (s_{T,i}-K)_+ \overline D_i$,
\end{enumerate*}
where $n$ is the number of simulations, $d$ the number of points of the time grid and $\overline D_i$ the negation of the boolean $D_i$. In the following, we set $T=1$, $r=5\%$, $s_{t_0}=100$, $H=K=50$, $d=250$ and $\sigma=20\%$. We test crude Monte Carlo methods with the Park Miller and SF-Mersenne Twister generators and a quasi-Monte Carlo method with the (multidimensional) Torus algorithm in figure \ref{doc}.

\begin{figure}[!htb]
\begin{center}
\includegraphics{DOC1e5.pdf}
\caption{Error function for Down-and-Out Call}
\label{doc}
\end{center}
\end{figure}

One may wonder why the Torus algorithm is still the best (in this example). We use the $d$-dimensional Torus sequence, so for time $t_j$ the simulated underlying assets $(s_{t_j,i})_i$ are computed with the sequence $(i \{\sqrt{p_j}\})_i$. Thanks to the linear independence of the Torus sequences over the rationals\footnote{i.e. for $k\neq j$, $\forall i$, $(i \{\sqrt{p_j}\})_i$ and $(i \{\sqrt{p_k}\})_i$ are linearly independent over $\mathbb Q$.}, the Torus quasi-random numbers are uncorrelated across dimensions. However, these results do \textbf{not} prove that the Torus algorithm is always better than traditional Monte Carlo. The results are sensitive to the barrier level $H$, the strike price $K$ (being in or out of the money has a strong impact), the asset volatility $\sigma$ and the number of time points $d$.
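As for the vanilla call, the full code used for figure \ref{doc} is in the file \code{qmc.R} of the package source. The following non-evaluated chunk is only a minimal sketch of the scheme above, where the deactivation booleans are obtained by checking whether a path ever falls below the barrier.
%%% R code
<<docSketch, eval=FALSE>>=
# parameters of the example
S0 <- 100; K <- 50; H <- 50; T <- 1; r <- 0.05; sigma <- 0.2
d <- 250; n <- 10^4; dt <- T/d
# one d-dimensional quasi-random point per simulated path
z <- qnorm(torus(n, dim = d))
# step-by-step simulation of the geometric Brownian motion
increments <- (r - sigma^2/2)*dt + sigma*sqrt(dt)*z
S <- S0 * exp(t(apply(increments, 1, cumsum)))
# paths that never hit the barrier remain activated
notKnocked <- apply(S > H, 1, all)
# discounted mean payoff
mean(exp(-r*T) * pmax(S[, d] - K, 0) * notKnocked)
@
%%%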
Actuaries and readers with an actuarial background can find an example of an actuarial application of QMC methods in \cite{albrecher}. This article focuses on simulation methods in ruin models with non-linear dividend barriers.

\section{Random generation tests\label{rngtest}}
Tests of random generators aim to check whether the output $u_1,\dots, u_n,\dots$ could be considered as independent and identically distributed (i.i.d.) uniform variates for a given confidence level. There are two kinds of tests of the uniform distribution: first on the interval $]0,1[$, second on the binary set $\{0,1\}$. In this note, we only describe tests for $]0,1[$ outputs (see \cite{lecuyer07} for details about these two kinds of tests). Some RNG tests can be two-level tests, i.e. we do not work directly on the RNG output $u_i$'s but on a function of the output such as the spacings (coordinate differences of the sorted sample).

\subsection{Test on one sequence of $n$ numbers}

\subsubsection{Goodness of Fit}
Goodness of Fit tests compare the empirical cumulative distribution function (cdf) $\mathbb F_n$ of the $u_i$'s with a specific distribution ($\mathcal U(0,1)$ here). The best known tests are the Kolmogorov-Smirnov, Cram\'er-von Mises and Anderson-Darling tests. They use different norms to quantify the difference between the empirical cdf $\mathbb F_n$ and the true cdf $F_{\mathcal U_{(0,1)}}$.
\begin{itemize*}
\item The Kolmogorov-Smirnov statistic is
$$K_n = \sqrt n ~\underset{x\in \mathbb R}{\sup} \left |\mathbb F_n(x) - F_{\mathcal U_{(0,1)}}(x) \right |,$$
\item the Cram\'er-von Mises statistic is
$$W_n^2 = n\int_{-\infty}^{+\infty} \left( \mathbb F_n(x) - F_{\mathcal U_{(0,1)}}(x) \right)^2 dF_{\mathcal U_{(0,1)}}(x),$$
\item and the Anderson-Darling statistic is
$$A_n^2 = n\int_{-\infty}^{+\infty} \frac{\left( \mathbb F_n(x) - F_{\mathcal U_{(0,1)}}(x) \right)^2 dF_{\mathcal U_{(0,1)}}(x) }{ F_{\mathcal U_{(0,1)}}(x)(1-F_{\mathcal U_{(0,1)}}(x))}.$$
\end{itemize*}
Those statistics can be evaluated empirically thanks to the sorted sequence of the $u_i$'s. But we will not detail those tests any further, because according to \cite{lecuyer07} they are not powerful for random generation testing.
%todo clustering test see lecuyer 97b

\subsubsection{The gap test\label{gaptest}}
The gap test investigates special patterns in the sequence $(u_i)_{1\leq i\leq n}$. We take a subset $[l, u] \subset [0,1]$ and compute the `gap' variables
$$
G_i = \left\{
\begin{array}{cl}
1 & \txtm{if} l \leq U_i \leq u,\\
0 & \txtm{otherwise.}\\
\end{array}
\right.
$$
The probability $p$ that $G_i$ equals $1$ is just $u-l$ (the Lebesgue measure of the subset). The test examines the lengths of the runs of zeros (the `gaps'). If we denote by $n_j$ the number of zero gaps of length $j$, the chi-squared statistic of such a test is given by
$$
S = \sum_{j=1}^m \frac{(n_j - n p_j)^2}{n p_j},
$$
where $p_j=(1-p)^2 p^j$ is the probability that a gap has length $j$, and $m$ is the maximum length considered. In theory $m$ equals $+\infty$, but in practice it is a large integer. We fix $m$ to be at least
$$\left\lfloor \frac{ \log( 10^{-1} ) - 2\log(1- p)-\log(n) }{ \log( p )} \right\rfloor,$$
so that the expected number of gaps of each length up to $m$ is at least $0.1$.

\subsubsection{The order test\label{ordertest}}
The order test looks for another kind of pattern. We test whether the possible orderings of the components of a $d$-tuple appear equiprobably. For example with $d=3$, we should have an equal number of vectors $(u_i, u_{i+1}, u_{i+2})_i$ such that
\begin{itemize*}
\item $u_i < u_{i+1} < u_{i+2}$,
\item $u_i < u_{i+2} < u_{i+1}$,
\item $u_{i+1} < u_i < u_{i+2}$,
\item $u_{i+1} < u_{i+2} < u_i$,
\item $u_{i+2} < u_i < u_{i+1}$,
\item $u_{i+2} < u_{i+1} < u_i$.
\end{itemize*}
For a given $d$, the $d!$ possible orderings are equiprobable under the null hypothesis, and a chi-squared statistic compares the observed counts of each ordering with their common expectation.

\subsubsection{The frequency test\label{freqtest}}
The frequency test is a basic equidistribution test of the generator on $[0,1]$: a sequence of integers is used to partition the unit interval into sub-intervals, and a chi-squared statistic compares the observed counts in each sub-interval with their common expectation.

\subsection{Tests based on multiple sequences}
We now group successive outputs into vectors $v_i=(u_i, \dots, u_{i+t-1})$ of the unit hypercube $[0,1]^t$ and test their equidistribution.

\subsubsection{The serial test\label{serialtest}}
The serial test splits the unit hypercube $[0,1]^t$ into $k=d^t$ cells, i.e. each dimension of the unit cube is split into $d>1$ pieces. The volume (i.e. a probability) of each cell is just $\frac{1}{k}$. The associated chi-square statistic is defined as
$$
S = \sum_{j=1}^{k} \frac{(N_j - \lambda)^2}{\lambda},
$$
where $N_j$ denotes the counts and $\lambda=\frac{n}{k}$ their expectation.
%todo serial test

\subsubsection{The collision test\label{colltest}}
The philosophy is still the same: we want to detect some pathological behavior on the unit hypercube $[0,1]^t$. A collision occurs when a point $v_i=(u_i, \dots, u_{i+t-1})$ falls in a cell already containing one or more points $v_j$. Let us denote by $C$ the number of collisions. The distribution of the collision number $C$ is given by
$$
P(C = c) = \prod_{i=0}^{n-c-1}\frac{k-i}{k} \frac{1}{k^c} \,_2S_n^{n-c},
$$
where $\,_2S_n^k$ denotes the Stirling number of the second kind\footnote{they are defined by $\,_2S_n^k = k \times \,_2S_{n-1}^k + \,_2S_{n-1}^{k-1}$ and $\,_2S_n^1 = \,_2S_n^n = 1$; see for example Wikipedia.} and $c=0,\dots,n-1$. But we cannot use this formula for large $n$, since the Stirling numbers need $O(n\log(n))$ time to be computed.
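For small $n$ and $k$, however, the exact distribution is easy to evaluate numerically. The following non-evaluated chunk is only a minimal sketch; the helper \code{stirling2} is a hypothetical name and not a function of the package.
%%% R code
<<collExact, eval=FALSE>>=
# second-kind Stirling numbers computed with the recurrence above
stirling2 <- function(n, m)
{
    S <- matrix(0, n + 1, m + 1)
    S[1, 1] <- 1    # Stirling number for n = 0, m = 0
    for(i in 1:n)
        for(j in 1:min(i, m))
            S[i + 1, j + 1] <- j * S[i, j + 1] + S[i, j]
    S[n + 1, m + 1]
}
# exact P(C = c) for n points thrown into k cells
collProb <- function(c, n, k)
    prod((k - 0:(n - c - 1)) / k) / k^c * stirling2(n, n - c)
# for example, the distribution of the collision number for n = 10, k = 100
sapply(0:5, collProb, n = 10, k = 100)
@
%%%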
Following \cite{lecuyer02}, we use a Gaussian approximation if $\lambda=\frac{n}{k}>\frac{1}{32}$ and $n\geq 2^8$, a Poisson approximation if $\lambda < \frac{1}{32}$, and the exact formula otherwise. The normal approximation assumes that $C$ follows a normal distribution with mean $m=n-k+k\left(\frac{k-1}{k} \right)^n$ and a rather complicated variance (see \cite{lecuyer07}), whereas the Poisson approximation assumes that $C$ follows a Poisson distribution with parameter $\frac{n^2}{2k}$.

\subsubsection{The $\phi$-divergence test}
There exist generalizations of these tests where we take a function of the counts $N_j$; we call them $\phi$-divergence tests. Let $f$ be a real valued function. The test statistic is given by
$$
\sum_{j=0}^{k-1} f(N_j).
$$
We retrieve the collision test with $f(x) = (x-1)_+$ and the serial test with $f(x)=\frac{(x-\lambda)^2}{\lambda}$. Plenty of statistics can be derived: for example, if we want to test the number of cells with at least $b$ points, we take $f(x)=\ind_{(x\geq b)}$. For other statistics, see \cite{lecuyer02}.

\subsubsection{The poker test\label{pokertest}}
The poker test is a test where the cells of the unit cube $[0,1]^t$ do not all have the same volume. If we split the unit cube into $d^t$ cells, then by regrouping cells whose left-hand corners have the same number of distinct coordinates we get the poker test. In a more intuitive way, let us consider a hand of $k$ cards drawn with replacement from $k$ different cards. The probability of getting exactly $c$ different cards is
$$
P(C=c) = \frac{1}{k^k} \frac{k!}{(k-c)!} \,_2S_{k}^c,
$$
where $C$ is the random number of different cards and $\,_2S_{n}^d$ denotes the second-kind Stirling numbers. For a proof, see \cite{knuth02}.

\section{Description of RNG test functions\label{rngtestfunc}}
In this section, the two first sub-sections give usage examples of the RNG test functions, in the same way as section \ref{rngfunc} illustrates section \ref{rng}. The last sub-section focuses on detecting a particular RNG.
%todo add comparison of mean, standard deviation kurtosis... from http://www.puc-rio.br/macro.ind/quasi_mc.html
%%% R code
<<eval=FALSE>>=
par(mfrow = c(1, 2))
hist(SFMT(10^3), 100)
hist(torus(10^3), 100)
@
\begin{figure}[!htb]
\begin{center}
<<fig=TRUE, echo=FALSE>>=
par(mfrow = c(1, 2))
hist(SFMT(10^3), 100)
hist(torus(10^3), 100)
@
\end{center}
\end{figure}
%%%

\subsection{Test on one sequence of $n$ numbers}
Goodness of Fit tests are already implemented in \soft{R} with the function \code{ks.test} for the Kolmogorov-Smirnov test and in the package \code{adk} for the Anderson-Darling test. In the following, we focus on the one-sequence tests implemented in \code{randtoolbox}.

\subsubsection{The gap test}
The function \code{gap.test} implements the gap test as described in sub-section \ref{gaptest}. By default, the lower and upper bounds are $l=0$ and $u=0.5$, just as below.
%%% R code
<<eval=FALSE>>=
gap.test(runif(1000))
@
<<echo=FALSE>>=
options( width = 40 )
res <- gap.test(runif(1000), echo = FALSE)
stat <- res$statistic
pvalue <- res$p.value
df <- res$parameter
obsnum <- res$observed
expnum <- res$expected
cat("\n\t Gap test\n")
cat("\nchisq stat = ", stat, ", df = ", df, "\n, p-value = ", pvalue, "\n", sep = "")
cat("\n(sample size : ", 1000, ")\n\n", sep = "")
cat("length observed freq theoretical freq\n")
for(i in 1:(df + 1))
    cat(i, "\t", obsnum[i], "\t", expnum[i], "\n")
@
%%%
If you want $l=1/3$ and $u=2/3$ with an SFMT sequence, you just type
%%% R code
<<eval=FALSE>>=
gap.test(SFMT(1000), 1/3, 2/3)
@
%%%

\subsubsection{The order test}
The order test is implemented in the function \code{order.test} for $d$-tuples with $d=2,3,4,5$.
A typical call is as follows.
%%% R code
<<eval=FALSE>>=
order.test(runif(4000), d = 4)
@
<<echo=FALSE>>=
options( width = 40 )
res <- order.test(runif(4000), d = 4, echo = FALSE)
stat <- res$statistic
pvalue <- res$p.value
df <- res$parameter
obsnum <- res$observed
expnum <- res$expected
cat("\n\t Order test\n")
cat("\nchisq stat = ", stat, ", df = ", df, "\n, p-value = ", pvalue, "\n", sep = "")
cat("\n (sample size : ", 4000, ")\n\n", sep = "")
cat("observed number ", obsnum[1:6], "\n", obsnum[7:18], "\n", obsnum[19:24], "\n")
cat("expected number ", expnum, "\n")
@
%%%
Note that the sample length must be a multiple of the dimension $d$, see sub-section \ref{ordertest}.

\subsubsection{The frequency test}
The frequency test described in sub-section \ref{freqtest} is just a basic equidistribution test of the generator on $[0,1]$. We use a sequence of integers to partition the unit interval and test the counts in each sub-interval.
%%% R code
<<eval=FALSE>>=
freq.test(runif(1000), 1:4)
@
<<echo=FALSE>>=
options( width = 40 )
res <- freq.test(runif(1000), 1:4, echo = FALSE)
stat <- res$statistic
pvalue <- res$p.value
df <- res$parameter
obsnum <- res$observed
expnum <- res$expected
cat("\n\t Frequency test\n")
cat("\nchisq stat = ", stat, ", df = ", df, "\n, p-value = ", pvalue, "\n", sep = "")
cat("\n (sample size : ", 1000, ")\n\n", sep = "")
cat("observed number ", obsnum, "\n")
cat("expected number ", expnum, "\n")
@
%%%

\subsection{Tests based on multiple sequences}
Let us study the serial test, the collision test and the poker test.

\subsubsection{The serial test}
Defined in sub-section \ref{serialtest}, the serial test focuses on the equidistribution of random numbers in the unit hypercube $[0,1]^t$. We split each dimension of the unit cube into $d$ equal pieces. Currently, the function \code{serial.test} implements $t=2$ with $d$ fixed by the user.
%%% R code
<<eval=FALSE>>=
serial.test(runif(3000), 3)
@
<<echo=FALSE>>=
options( width = 40 )
res <- serial.test(runif(3000), 3, echo = FALSE)
stat <- res$statistic
pvalue <- res$p.value
df <- res$parameter
obsnum <- res$observed
expnum <- res$expected
cat("\n\t Serial test\n")
cat("\nchisq stat = ", stat, ", df = ", df, "\n, p-value = ", pvalue, "\n", sep = "")
cat("\n (sample size : ", 3000, ")\n\n", sep = "")
cat("observed number ", obsnum[1:4], "\n", obsnum[5:9], "\n")
cat("expected number ", expnum, "\n")
@
%%%
In a future version, we will add an argument $t$ for the dimension.

\subsubsection{The collision test}
The exact distribution of the collision number is costly to compute when the sample size and the cell number are large (see sub-section \ref{colltest}). The function \code{coll.test} does not yet implement the normal approximation. The following example tests the Mersenne-Twister algorithm (the default in \soft{R}) with parameters implying the use of the exact distribution (i.e. $n<2^8$ and $\lambda>1/32$).
%%% R code
<<eval=FALSE>>=
coll.test(runif, 2^7, 2^10, 1)
@
<<echo=FALSE>>=
options( width = 40 )
res <- coll.test(runif, 2^7, 2^10, 1, echo = FALSE)
stat <- res$statistic
pvalue <- res$p.value
df <- res$parameter
obsnum <- res$observed
expnum <- res$expected
cat("\n\t Collision test\n")
cat("\nchisq stat = ", stat, ", df = ", df, "\n, p-value = ", pvalue, "\n", sep = "")
cat("\n exact distribution \n(sample number : ", 1000, "/sample size : ", 128, "\n / cell number : ", 1024, ")\n\n", sep = "")
cat("collision observed expected\n", "number count count\n", sep = "")
for(i in 1:(df + 1))
    cat(" ", i, " ", obsnum[i], " ", expnum[i], "\n")
@
%%%
When the cell number is far greater than the sample length, we use the Poisson approximation (i.e. $\lambda<1/32$).
For example, with the \code{congruRand} generator we have
%%% R code
<<eval=FALSE>>=
coll.test(congruRand, 2^8, 2^14, 1)
@
<<echo=FALSE>>=
options( width = 40 )
res <- coll.test(congruRand, 2^8, 2^14, 1, echo = FALSE)
stat <- res$statistic
pvalue <- res$p.value
df <- res$parameter
obsnum <- res$observed
expnum <- res$expected
cat("\n\t Collision test\n")
cat("\nchisq stat = ", stat, ", df = ", df, "\n, p-value = ", pvalue, "\n", sep = "")
cat("\n Poisson approximation \n(sample number : ", 1000, "/sample size : ", 256, "\n / cell number : ", 16384, ")\n\n", sep = "")
cat("collision observed expected\n", "number count count\n", sep = "")
for(i in 1:(df + 1))
    cat(" ", i - 1, " ", obsnum[i], " ", expnum[i], "\n")
@
%%%
Note that the normal approximation is not yet implemented and that these two approximations are not valid when some expected collision counts are below 5.

\subsubsection{The poker test}
Finally, the function \code{poker.test} implements the poker test as described in sub-section \ref{pokertest}. It is implemented for any ``card number'', denoted by $k$. A typical example follows.
%%% R code
<<eval=FALSE>>=
poker.test(SFMT(10000))
@
<<echo=FALSE>>=
options( width = 40 )
res <- poker.test(SFMT(10000), echo = FALSE)
stat <- res$statistic
pvalue <- res$p.value
df <- res$parameter
obsnum <- res$observed
expnum <- res$expected
cat("\n\t Poker test\n")
cat("\nchisq stat = ", stat, ", df = ", df, "\n, p-value = ", pvalue, "\n", sep = "")
cat("\n (sample size : ", 10000, ")\n\n", sep = "")
cat("observed number ", obsnum, "\n")
cat("expected number ", expnum, "\n")
@
%%%

\subsection{Hardness of detecting a difference from truly random numbers}
Random number generators typically have an internal memory of fixed size, whose content is called the internal state. Since the number of possible internal states is finite, the output sequence is periodic. The length of this period is an important parameter of the random number generator. For example, the Mersenne-Twister generator, which is the default in \soft{R}, has its internal state stored in $624$ unsigned integers of $32$ bits each. So the internal state consists of $19968$ bits, but only $19937$ of them are used. The period length is $2^{19937}-1$, which is a Mersenne prime.
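The state of the generator currently used by \soft{R} can be inspected through the variable \code{.Random.seed}; the following small, non-evaluated chunk merely illustrates the $624$-integer state mentioned above.
%%% R code
<<mtState, eval=FALSE>>=
RNGkind("Mersenne-Twister")
set.seed(1)
# .Random.seed stores an identifier of the RNG kind, the current
# position in the state and the 624 words of the internal state
length(.Random.seed)
@
%%%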
A large period is not the only important property of a generator. For a good generator, it should not be computationally easy to distinguish its output from truly random numbers if the seed or the index in the sequence is not known. Generators that are good from this point of view are used for cryptographic purposes. They are chosen so that no known procedure can distinguish their output from truly random numbers within a practically available computation time. For simulations, this requirement is usually relaxed. However, even for simulation purposes, the hardness of distinguishing the generated numbers from truly random ones is a good measure of the quality of the generator. In particular, the well-known empirical tests of random number generators such as Diehard\footnote{The Marsaglia Random Number CDROM including the Diehard Battery of Tests of Randomness, Research Sponsored by the National Science Foundation (Grants DMS-8807976 and DMS-9206972), copyright 1995 George Marsaglia.} or TestU01 \cite{lecuyer07} are based on comparing statistics computed for the generator with the values expected for truly random numbers. Consequently, if a generator fails an empirical test, then the output of the test provides a way to distinguish the generator from truly random numbers.

Besides general-purpose empirical tests constructed without knowledge of a concrete generator, there are tests specific to a given generator, which allow one to distinguish this particular generator from truly random numbers.

An example of a generator whose output may easily be distinguished from truly random numbers is the older version of Wichmann-Hill from 1982. For this generator, we can even predict the next number in the sequence if we know the last generated one. Verifying such a prediction is easy and it is, of course, not valid for truly random numbers. Hence, we can easily distinguish the output of the generator from truly random numbers. An implementation of this test in \soft{R} derived from \cite{mccullough08} is as follows.
%%% R code
<<>>=
# predict the next Wichmann-Hill (1982) output from the current output x
wh.predict <- function(x)
{
    M1 <- 30269
    M2 <- 30307
    M3 <- 30323
    y <- round(M1*M2*M3*x)
    s1 <- y %% M1
    s2 <- y %% M2
    s3 <- y %% M3
    s1 <- (171*26478*s1) %% M1
    s2 <- (172*26070*s2) %% M2
    s3 <- (170*8037*s3) %% M3
    (s1/M1 + s2/M2 + s3/M3) %% 1
}
RNGkind("Wichmann-Hill")
xnew <- runif(1)
err <- 0
# compare the prediction with the actually generated next number
for (i in 1:1000)
{
    xold <- xnew
    xnew <- runif(1)
    err <- max(err, abs(wh.predict(xold) - xnew))
}
print(err)
@
%%%
The printed error is $0$ on some machines and less than $5 \cdot 10^{-16}$ on others. This is clearly different from the error obtained for truly random numbers, which is close to $1$.

The requirement that the output of a random number generator should not be distinguishable from truly random numbers by a simple computation is directly related to the way a generator is used. Typically, we use the generated numbers as an input to a computation and we expect that the distribution of the output (for different seeds or for different starting indices in the sequence) is the same as if the input were truly random numbers. A failure of this assumption implies that observing the output of the computation allows one to distinguish the generator output from truly random numbers. Hence, we want to use a generator for which we may expect that the calculations used in the intended application cannot distinguish its output from truly random numbers.

In this context, it has to be noted that many of the generators currently used for simulation can be distinguished from truly random numbers by applying arithmetic modulo 2 to individual bits of the output numbers. This is true for Mersenne Twister, SFMT and also all WELL generators. Tolerating this rests on two facts. First, arithmetic modulo 2 and the extraction of individual bits of real numbers are not directly used in typical simulation problems, and the real valued functions which represent these operations are extremely discontinuous; such functions do not typically occur in simulation problems either. Second, we need to observe quite a long history of the output to detect the difference from true randomness: for Mersenne Twister, for example, we need $624$ consecutive numbers.

On the other hand, if we use a cryptographically strong pseudorandom number generator, we may avoid a detectable difference from truly random numbers under any known efficient procedure. Such generators are typically slower than Mersenne Twister type generators; the slow-down factor may be, for example, 5. If the simulation problem requires intensive computation besides generating the random numbers, using a slower but better generator may imply only a negligible slow down of the computation as a whole.
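As a rough illustration of such speed comparisons, one can simply time the generators on a large sample; the following non-evaluated chunk is only a sketch, and the actual timings depend on the machine and the package version.
%%% R code
<<speedSketch, eval=FALSE>>=
n <- 10^7
# base R Mersenne-Twister
system.time(runif(n))
# SIMD-oriented Fast Mersenne Twister from randtoolbox
system.time(SFMT(n))
# quasi-random Torus sequence, for comparison
system.time(torus(n))
@
%%%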
\section{Calling the functions from other packages}
In this section, we briefly present what to do if you want to use this package from within your own package. This section is mainly taken from the package \code{expm} available on \soft{R-forge}.

Package authors can use the facilities of \pkg{randtoolbox} in two ways:
\begin{itemize}
\item call the \soft{R} level functions (e.g. \code{torus}) in \soft{R} code;
\item if random number generators are needed in \soft{C}, call the \soft{C} routine \code{torus},\dots
\end{itemize}

Using \soft{R} level functions in a package simply requires the following two import directives:
\begin{verbatim}
Imports: randtoolbox
\end{verbatim}
in the file \code{DESCRIPTION} and
\begin{verbatim}
import(randtoolbox)
\end{verbatim}
in the file \code{NAMESPACE}.

Accessing the \soft{C} level routines further requires prototyping the function and retrieving its pointer in the package initialization function \code{R\_init\_\textit{pkg}}, where \code{\textit{pkg}} is the name of the package. For example, if you want to use the \code{torus} \soft{C} function, you need
\begin{verbatim}
void (*torus)(double *u, int nb, int dim, int *prime,
              int ismixed, int usetime);

void R_init_pkg(DllInfo *dll)
{
    torus = (void (*) (double *, int, int, int *, int, int))
        R_GetCCallable("randtoolbox", "torus");
}
\end{verbatim}
See the file \code{randtoolbox.h} to find the headers of the RNGs. Examples of \soft{C} calls to other functions can be found in this package with the WELL RNG functions. The definitive reference for these matters remains the \emph{Writing R Extensions} manual, in particular the sub-sections on specifying imports and exports and on registering native routines.

\newpage
\bibliographystyle{agsm}
\bibliography{randtoolbox}

\end{document}

%%% Local Variables:
%%% mode: latex
%%% TeX-master: t
%%% coding: utf-8
%%% End: