--- title: "Bayes_Rule" author: "Peiyuan Zhu" date: "2023-11-02" output: rmarkdown::html_vignette # output: word_document vignette: > %\VignetteIndexEntry{Bayes_Rule} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` ```{r setup, include=FALSE} # devtools::load_all(".") # only used in place of dst when testing with R-devel library(dst) # knitr::opts_chunk$set(echo = TRUE) ``` # Introduction In Mathematical Theory of Evidence Glenn Shafer talked about how Dempster's rule of combination generalizes Bayesian conditioning. In this document we investigate numerically how a simple Bayesian model can be encoded into the language of belief function. Recall the Bayes Rule of conditioning in simple terms: $$P(H|E) = \dfrac{P(H) \cdot P(E|H)} {P(E)}$$ Let's see how this is translated in the belief functions setup. # 1. Simple Bayes Example In particular, the Bayesian belief functions concentrates their masses on the singletons only, unlike more general basic mass assignment functions. For instance, in a frame $\Theta=\{a,b,c\}$, basic mass assignment $m(\{a\})=0.2$, $m(\{b\})=0.3$ and $m(\{c\})=0.5$ defines a Bayesian belief function. In the Bayesian language, this is the prior distribution $P(H)$. Function *bca* is used to set the distribution of *H*. ```{r bpa1 definition, echo = FALSE, warning=FALSE} Theta<-matrix(c(1,0,0,0,1,0,0,0,1,1,1,1), nrow = 4, byrow = TRUE) H <- bca(tt=matrix(c(1,0,0,0,1,0,0,0,1), nrow = 3, byrow = TRUE), m = c(0.2, 0.3, 0.5), cnames = c("a", "b", "c"), idvar = 1) cat("The prior distribution H","\n") bcaPrint(H) # round(belplau(H, h=Theta), digits = 3) ``` The law of conditional probability is a special case of Dempster's rule of combination that all the masses focus on the event is conditioned. For instance, basic mass assignment focuses all the masses on subset $E =\{b,c\}$. Hence, using function *bca*, we set $m(\{b,c\})=1$. ```{r bpa2 definition, echo = FALSE, warning=FALSE} bpa2 <- bca(tt=rbind(diag(x=1, nrow=3), matrix(c(0,1,1,1,1,1), nrow=2, byrow = TRUE)), m = c(0,0,0,1,0), cnames = c("a", "b", "c"), idvar = 1) Event <- addTobca(bpa2, tt = diag(x=1, nrow = 3)) cat("Setting an Event E = {b,c} with mass = 1","\n") bcaPrint(Event) ``` Now we set the computation of Bayes's Theorem in motion. In a first step, we use function *dsrwon* to combine our two basic mass assignments H and Event. The non-normalized Dempster Rule of combination gives a mass distribution *H_Event* composed of two parts: 1. the distribution of the product $P(H) \cdot P(E|H)$ on $\Theta$; 2. a mass allotted to the empty set $m(\varnothing)$. ```{r H_Event Dempster_rule1, echo = FALSE, warning=FALSE} H_Event <- dsrwon(H, bpa2) cat("The combination of H and Event E","\n") bcaPrint(H_Event) ``` It turns out that we can obtain the marginal $P(E)$ from $m(\varnothing)$: $$P(E) = 1 - m(\varnothing)$$. Hence, $P(E)$ is nothing else than the normalization constant of Dempster's rule of combination. In our second step of computation we us function *nzdsr*, to apply the normalization constant to distribution *H_Event*, which gives the posterior distribution $P(H|E)$ ```{r H_Event Dempster_rule2, echo = FALSE, warning=FALSE} H_given_E <- nzdsr(H_Event) cat("The posterior distribution P(H|E)","\n") bcaPrint(H_given_E) ``` Note that *H_given_E* is defined only on singletons and the mass allocated to $\Theta$ is zero. Hence $bel(\cdot) = P(\cdot) = Pl(\cdot)$, as shown by the following table. 

Note that *H_given_E* is defined only on singletons and the mass allocated to $\Theta$ is zero. Hence $bel(\cdot) = P(\cdot) = Pl(\cdot)$, as shown by the following table.

```{r H_Event Dempster_rule3, echo = FALSE, warning=FALSE}
round(belplau(H_given_E, h = Theta), digits = 3)
```

# 2. Example with two variables

In the first example, the conditioning event was a subset of the frame $\Theta$ of variable *H*. We now show the computation of Bayes' rule of conditioning by Dempster's rule in the case of two variables. This time the variable is called *X*; it is defined on $\Theta = \{a, b, c\}$ with the same prior as before.

```{r bpa1_copy, echo = FALSE, warning=FALSE}
Theta <- matrix(c(1,0,0,0,1,0,0,0,1,1,1,1), nrow = 4, byrow = TRUE)
X <- bca(tt = matrix(c(1,0,0,0,1,0,0,0,1), nrow = 3, byrow = TRUE), m = c(0.2, 0.3, 0.5), cnames = c("a", "b", "c"), idvar = 1, varnames = "x")
cat("The prior distribution","\n")
bcaPrint(X)
```

Let's add a second variable *E* with three outcomes, $\Lambda = \{d, e, f\}$, and conditional probabilities $P(d|a) = 0.1$, $P(d|b) = 0.2$ and $P(d|c) = 0.7$. This distribution is encoded in the product space $\Theta \times \Lambda$ by setting $m(\{(a, d)\}) = 0.1$, $m(\{(b, d)\}) = 0.2$ and $m(\{(c, d)\}) = 0.7$. We do this using function *bcaRel*.

```{r relation, echo = FALSE, warning=FALSE}
# bpa4 <- bca(tt=matrix(c(1,0,0,0,1,0,0,0,1), nrow = 3, byrow = TRUE), m = c(1, 0, 0), cnames = c("d", "e", "f"), idvar = 4, varnames = "y")
# bcaPrint(bpa4)
# cat("Specify information on variables, description matrix and mass vector","\n")
inforvar_EX <- matrix(c(1,4,3,3), ncol = 2, dimnames = list(NULL, c("varnb", "size")))
cat("Identifying variables and frames","\n")
inforvar_EX
cat("Note that variable numbers must be in increasing order","\n")
#
tt_EX <- matrix(c(1,0,0,1,0,0, 0,1,0,1,0,0, 0,0,1,1,0,0, 1,1,1,1,1,1), ncol = 6, byrow = TRUE, dimnames = list(NULL, c("a", "b", "c", "d", "e", "f")))
cat("The description matrix of the relation between X and E","\n")
tt_EX
cat("Note that columns of the matrix must follow the variables ordering.","\n")
#
spec_EX <- matrix(c(1:4, 0.1, 0.2, 0.7, 0), ncol = 2, dimnames = list(NULL, c("specnb", "mass")))
cat("Mass specifications","\n")
spec_EX
#
rel_EX <- bcaRel(tt = tt_EX, spec = spec_EX, infovar = inforvar_EX, varnames = c("x", "y"), relnb = 1)
cat("The relation between Evidence E and X","\n")
bcaPrint(rel_EX)
```

Now we combine the prior $P(X)$ with *rel_EX*. But first, we need to extend *X* to the space $\Theta \times \Lambda$.

```{r X_xtnd, echo = FALSE, warning=FALSE}
X_xtnd <- extmin(X, relRef = rel_EX)
cat("Prior X extended to the product space of (X, E)","\n")
bcaPrint(X_xtnd)
```

We then combine the extended *X* and *rel_EX* in the product space $\Theta \times \Lambda$.

```{r relation2, echo = FALSE, warning=FALSE}
comb_X_EX <- dsrwon(X_xtnd, rel_EX)
cat("Mass distribution of the combination of X extended and E_X","\n")
bcaPrint(comb_X_EX)
```

As we can see, we have:

1. the distribution of the product $P(X) \cdot P(E|X)$ on $\Theta \times \Lambda$;
2. a mass allotted to the empty set, $m(\varnothing)$, which is $1 - P(E)$.

Using function *nzdsr*, we apply the normalization constant to obtain the desired result. Then, using function *elim*, we obtain the marginal on *X*, which turns out to be $P(X | E = d)$.

```{r relation3, echo = FALSE, warning=FALSE}
norm_comb_X_EX <- nzdsr(comb_X_EX)
cat("The normalized mass distribution of the combination of X extended and E_X","\n")
bcaPrint(norm_comb_X_EX)
dist_XgE <- elim(norm_comb_X_EX, xnb = 4)
cat("The posterior distribution P(X|E) for (a,d), (b,d), (c,d), after eliminating variable E","\n")
bcaPrint(dist_XgE)
```
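
As with the first example, the result can be checked against a direct Bayesian computation. The chunk below is a minimal base R sketch, independent of the *dst* functions; it should reproduce the posterior masses on $a$, $b$ and $c$ obtained above.

```{r bayes_check2, echo = TRUE}
# Direct computation of P(X | E = d) with base R, for comparison
prior <- c(a = 0.2, b = 0.3, c = 0.5)  # P(X)
lik_d <- c(a = 0.1, b = 0.2, c = 0.7)  # P(E = d | X)
joint <- prior * lik_d                 # P(X) * P(E = d | X)
p_d <- sum(joint)                      # P(E = d) = 0.43, i.e. 1 - m(empty set)
round(joint / p_d, 4)                  # P(X | E = d)
```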