--- title: Graphical Independence Networks subtitle: A vignette for the gRain package author: Søren Højsgaard output: bookdown::html_document2: base_format: rmarkdown::html_vignette toc: true number_sections: true pdf_document: keep_tex: true toc_depth: 3 number_sections: true toc: true vignette: > %\VignetteIndexEntry{Graphical Independence Networks} %\VignetteKeyword{Graphical Models} %\VignetteKeyword{Bayesian networks} %\VignetteKeyword{Graphical models} %\VignettePackage{gRain} %\VignetteEngine{knitr::knitr} %\VignetteEncoding{UTF-8} bibliography: grain.bib --- \tableofcontents ```{r include=FALSE,echo=FALSE,warning=FALSE} dir.create("figures") knitr::opts_chunk$set(fig.height=3, fig.width=5, fig.path='figures/grain-', warning=FALSE, message=FALSE ) options("prompt"="> ","width"=85, "digits"=4) library(gRain) library(igraph) ``` ```{r echo=FALSE} require(gRain) prettyVersion <- packageDescription("gRain")$Version prettyDate <- format(Sys.Date()) ``` # Bayesian networks ## Introduction The gRain package implements Bayesian Networks (hereafter often abbreviated BNs). The name gRain is an acronym for [gra]phical [i]ndependence [n]etworks. The main reference for gRain is @hoj:12, see also `citation("gRain")`. Moreover, @hoj:edw:lau:12 gives a broad treatment of graphical models (including Bayesian networks) More information about the package, other graphical modelling packages and development versions is available from \begin{quote} (http://people.math.aau.dk/~sorenh/software/gR) \end{quote} ### Example: Chest clinic {#sec:chest-clinic} ```{r echo=F, results='hide'} yn <- c("yes","no") a <- cpt(~asia, values=c(1,99),levels=yn) t.a <- cpt(~tub|asia, values=c(5,95,1,99),levels=yn) s <- cpt(~smoke, values=c(5,5), levels=yn) l.s <- cpt(~lung|smoke, values=c(1,9,1,99), levels=yn) b.s <- cpt(~bronc|smoke, values=c(6,4,3,7), levels=yn) e.lt <- cpt(~either|lung:tub,values=c(1,0,1,0,1,0,0,1),levels=yn) x.e <- cpt(~xray|either, values=c(98,2,5,95), levels=yn) d.be <- cpt(~dysp|bronc:either, values=c(9,1,7,3,8,2,1,9), levels=yn) cpt_list <- list(a, t.a, s, l.s, b.s, e.lt, x.e, d.be) plist <- compile_cpt(cpt_list) plist chest_bn <- grain(plist, compile=FALSE) chest_bn ``` This section reviews the chest clinic example of @lau/spieg:88 (illustrated in Figure \@ref(fig:chest-LS)) and shows one way of specifying the model in gRain. @lau/spieg:88 motivate the chest clinic example with the following narrative: *``Shortness--of--breath (dyspnoea) may be due to tuberculosis, lung cancer or bronchitis, or none of them, or more than one of them. A recent visit to Asia increases the chances of tuberculosis, while smoking is known to be a risk factor for both lung cancer and bronchitis. The results of a single chest X--ray do not discriminate between lung cancer and tuberculosis, as neither does the presence or absence of dyspnoea.''* ```{r} chest_dag <- dag(list("asia", c("tub", "asia"), "smoke", c("lung", "smoke"), c("bronc", "smoke"), c("either", "tub", "lung"), c("xray", "either"), c("dysp", "bronc", "either"))) ``` ```{r chest-LS, echo=F, fig.height=3, fig.cap="Chest clinic example from Lauritzen and Spiegelhalter (1988)."} par(mar=c(0,0,0,0)) plot(chest_bn) ``` ## Building a Bayesian network The description above involves the following binary variables: $a=\mbox{asia}$, $s=\mbox{smoker}$, $t=\mbox{tuberculosis}$, $l=\mbox{lung cancer}$, $b=\mbox{bronchitis}$, $e=\mbox{either tuberculosis or lung cancer}$, $d=\mbox{dyspnoea}$ and $x=\mbox{xray}$. Each variable is binary and can take the values "yes" and "no": Note that $e$ is a logical variable which is true (yes) if either $t$ or $l$ are true (yes) and false (no) otherwise. The connection between the variables is displayed by the DAG (directed acyclic graph) in Figure~\@ref(fig:chest-LS). A joint probability density factorizing according to a DAG with nodes $V$ can be constructed as follows: Each node $v\in V$ has a set $pa(v)$ of parents and each node $v\in V$ has a finite set of states. A joint distribution over the variables $V$ can be given as a product of conditional distributions \begin{align} (\#eq:dagfact1) p(V) = \prod_{v\in V} p(v|pa(v)) \end{align} where $p(v|pa(v))$ is a function defined on $(v,pa(v))$. This function satisfies that $$ \sum_{v^*} p(v=v^*|pa(v))=1, $$ i.e.\ that for each configuration of the parents $pa(v)$, the sum over the levels of $v$ equals one. Hence $p(v|pa(v))$ becomes the conditional distribution of $v$ given $pa(v)$. In practice $p(v|pa(v))$ is specified as a table called a conditional probability table or a CPT for short. Thus, a Bayesian network can be regarded as a complex stochastic model built up by putting together simple components (conditional probability distributions). A joint probability density for all eight variables in Figure~\@ref(fig:chest-LS) can be constructed as \begin{align} (\#eq:chestfact1) p(V) = p(a)p(s)p(t|a)p(l|s)p(b|s)p(e|t,l) p(d|e, b)p(x|e). \end{align} ## Queries to networks Suppose we are given the evidence (sometimes also called "finding") that a set of variables $E\subset V$ have a specific value $e^*$. With this evidence, we are often interested in the conditional distribution $p(v|E=e^*)$ for some of the variables $v \in V \setminus E$ or in $p(U|E=e^*)$ for a set $U\subset V \setminus E$. Interest might also be in calculating the probability of a specific event, e.g.\ the probability of seeing a specific evidence, i.e.\ $p(E=e^*)$. Other types of evidence (called soft evidence, virtual evidence or likelihood evidence) are discussed in Section \@ref(sec:hard-soft). For example that a person has recently visited Asia and suffers from dyspnoea, i.e.\ $a=\mbox{yes}$ and $d=\mbox{yes}$. In the chest clinic example, interest might be in $p(l|e^*)$, $p(t|e^*)$ and $p(b|e^*)$, or possibly in the joint (conditional) distribution $p(l,t,b|e^*)$. ## A one--minute version of gRain ### Specifying a Bayesian network A simple way of specifying the model for the chest clinic example is as follows. Specify conditional probability tables (with values as given in @lau/spieg:88) (there are other ways of specifying conditional probability tables, see the package documentation): ```{r } yn <- c("yes","no") a <- cpt(~asia, values=c(1, 99),levels=yn) t.a <- cpt(~tub|asia, values=c(5, 95, 1, 99),levels=yn) s <- cpt(~smoke, values=c(5, 5), levels=yn) l.s <- cpt(~lung|smoke, values=c(1, 9, 1, 99), levels=yn) b.s <- cpt(~bronc|smoke, values=c(6, 4, 3, 7), levels=yn) e.lt <- cpt(~either|lung:tub,values=c(1, 0, 1, 0, 1, 0, 0, 1),levels=yn) x.e <- cpt(~xray|either, values=c(98, 2, 5, 95), levels=yn) d.be <- cpt(~dysp|bronc:either, values=c(9, 1, 7, 3, 8, 2, 1, 9), levels=yn) ``` Compile list of conditional probability tables. ```{r } chest_cpt <- compile_cpt(a, t.a, s, l.s, b.s, e.lt, x.e, d.be) chest_cpt ``` The components are arrays, but coercion into dataframes sometimes makes it easier to digest the components. ```{r } chest_cpt$tub chest_cpt$tub |> as.data.frame.table() ``` Create the network: ```{r } chest_bn <- grain(chest_cpt) chest_bn ``` Default is that the network is compiled at creation time, but if one chooses not to do so, compilation can be done with: ```{r } chest_bn <- compile(chest_bn) ``` ### Setting evidence and querying a network {#sec:querying-network} In the chest clinic example, there are three disease variables, two background variables and two symptoms. Following the narrative, we can set evidence that a person has recently visited Asia and has dyspnoea: ```{r} disease <- c("tub", "lung", "bronc") asia_dysp <- list(asia="yes", dysp="yes") ``` Initially, the network can be queried without evidence: ```{r } chest_bn |> querygrain(nodes=disease, type="marginal") chest_bn |> querygrain(nodes=disease, type="marginal", simplify = TRUE) chest_bn |> querygrain(nodes=disease, type="joint") chest_bn |> querygrain(nodes=disease, type="joint", simplify = TRUE) chest_bn |> querygrain(nodes=disease, type="conditional") chest_bn |> querygrain(nodes=disease, type="conditional", simplify = TRUE) ``` Above we obtain the marginal, joint distributions and conditional distribution for the disease variables. The output can in some cases be simplified to dataframes. For the conditional distribution, we obtain the conditional distribution of the first node given the rest of the nodes. ## Setting evidence Evidence can be entered in different ways: ```{r} asia_dysp <- list(asia="yes", dysp="yes") chest_ev <- chest_bn |> evidence_add(evidence=asia_dysp) ``` The evidence is a list and can conveniently be displayed as a dataframe: ```{r } chest_ev |> evidence_get() ``` ```{r } chest_ev |> querygrain(nodes=disease, simplify = TRUE) chest_ev |> evidence_prob() ``` The network can be queried again, and we can also obtain the probability of observing this evidence: ```{r } chest_ev |> querygrain(nodes=disease, simplify=TRUE) chest_ev |> evidence_prob() ``` ```{r } chest_bn |> evidence_prob(evidence=list(asia="yes", dysp="yes")) ``` The probability of observing a specific evidence can be found by setting the evidence as a vector of weights. This is useful in connection with soft evidence (also called likelihood evidence), see Section~\@ref(sec:hard-soft). ## Hints and shortcuts {#sec:small-shortcuts} An alternative way of specifying a network is by first defining CPTs and then entering values afterwards. Programmatically, this can be done as: ```{r } yn <- c("yes", "no") node_parents_list <- list("asia", c("tub", "asia"), "smoke", c("lung", "smoke"), c("bronc", "smoke"), c("either", "tub", "lung"), c("xray", "either"), c("dysp", "bronc", "either")) chest_dummy_cpt2 <- lapply(node_parents_list, function(f){ cpt(f, levels=yn) }) bn_temp <- compile_cpt(chest_dummy_cpt2) |> grain() ``` Above the network has ones in all potentials. Next update values in (some of the) potentials: ```{r} cpt_values <- list(asia=c(1, 99), tub=c(5, 95, 1, 99), smoke=c(5, 5), lung=c(1, 9, 1, 99), bronc=c(6, 4, 3, 7), either=c(1, 0, 1, 0, 1, 0, 0, 1), xray=c(98, 2, 5, 95), dysp=c(9, 1, 7, 3, 8, 2, 1, 9)) bn_real <- replace_cpt(bn_temp, cpt_values) ``` ```{r} bn_temp |> querygrain(evi=asia_dysp, nodes=disease, simplify = TRUE) bn_real |> querygrain(evi=asia_dysp, nodes=disease, simplify = TRUE) ``` Consider querying a network where focus is on marginal distributions (the default). If all variables have the same levels (as the case is here), the output can be coerced to a dataframe: ```{r } querygrain(chest_bn, nodes=disease, simplify = TRUE) ``` In the more general case the output can be coerced to a list of dataframes ```{r } querygrain(chest_bn, nodes=disease, result="data.frame") ``` A typical use of gRain involves setting evidence and then querying the network afterwards. This can all be done in one call of `querygrain()` (notice that this does not alter the network object): ```{r } chest_bn |> querygrain(evidence=asia_dysp, nodes=disease, simplify = TRUE) ``` Evidence can be also be given as a vector of weights. ```{r } chest_bn |> querygrain(evidence=list(asia=c(1, 0), dysp=c(1, 0)), nodes=disease, simplify = TRUE) ``` The weights must be non-negative but need not sum to one. This is important in connection with soft evidence (also called likelihood evidence), see Section~\@ref(sec:hard-soft). Above, the weights could also have been set as c(.1, 0). The important part is that the zero excludes certain states as being impossible. Nodes on which evidence is given are not reported unless `exclude=FALSE` ```{r } querygrain(chest_bn, evidence=list(asia=c(1, 0), dysp=c(1, 0)), nodes=c("lung", "bronc", "asia", "dysp"), exclude=FALSE, simplify = TRUE) ``` If `nodes`are not specified, queries for all nodes without evidence are reported. ```{r } querygrain(chest_bn, evidence=asia_dysp, simplify = TRUE) ``` If `nodes` are not specified and `exclude=FALSE`, then queries for all nodes are reported. ```{r } querygrain(chest_bn, evidence=asia_dysp, exclude = FALSE, simplify = TRUE) ``` ## Conditioning on evidence with zero probability {#sec:zero-probabilities} Consider setting the evidence ```{r } chest_bn3 <- evidence_add(chest_bn, evidence=list(either="no", tub="yes")) ``` Under the model, this specific evidence has zero probability: `either` is true if `tub` is true or `lung` is true (or both). Hence the specific evidence is impossible and therefore, all conditional probabilities are (under the model) undefined: ```{r } evidence_prob(chest_bn3) querygrain(chest_bn3, nodes=disease, type="joint") ``` ## Hard and soft (likelihood) evidence {#sec:hard-soft} Below we describe how to work with virtual evidence (also known as soft evidence or likelihood evidence) in gRain. This is done via the function evidence_add(). The clique potential representation in a Bayesian network gives \begin{align} p(x) \propto \psi(x) = \prod_{C} \psi_C(x_C). \end{align} Here we recall that the whole idea in computations with Bayesian networks is to avoid calculation the product on the right hand side above. Instead computations are based on propagation (multiplying, dividing and summing clique potentials $\psi_C$ in an appropriate order, and such an appropriate order comes from a junction tree). The normalizing constant, say $c=\sum_x \psi(x)$, comes out of propagation as a "by-product". Suppose a set of nodes $E$ are known to have a specific value, i.e. $x_E=x^*_E$. This is called hard evidence. The probability of the event $x_E=x^*_E$ is \begin{align} p(x_E=x^*_E)=E_p\{I(x_E=x^*_E)\} = \sum_x I(x_E=x^*_E) p(x) = \frac{1}{c} \sum_x I(x_E=x^*_E) \psi(x) \end{align} The computations are based on modifying the clique potentials $\psi_C$ by giving value zero to states in $\psi_C$ which are not consistent with $x_E=x^*_E$. This can be achieved with an indicator function, say $L_C(x_C)$ such that we obtain a set of new potentials $\tilde \psi_C(x) = L_C(x_C) \psi_C(x_C)$. Propagation with these new potentials gives, as a by product, $\tilde c=\sum \tilde \psi(x)$ where $\tilde\psi(x)= \prod_C \tilde\psi_C(x_C)$. Consequently, we have $p(x_E=x^*_E)=\tilde c / c$. In a more general setting we may have non--negative weights $L(x)$ for each value of $x$. We may calculate \begin{align} E_p\{L(X)\} = \sum_x L(x)p(x) \end{align} If $L(X)$ factorizes as $L(X)= \prod_C L_C(X_C)$ then the computations are carried out as outlined above, i.e.\ by the message passing scheme. ### Hard evidence {#sec:hard-evidence} Suppose we want to make a diagnosis about tuberculosis given the evidence that a person is a smoker. We call such evidence hard evidence because the state of the variables are known with certainty. The function setEvidence() can be used for this purpose. The following forms are equivalent (the reason will be explained below): ```{r } chest_ev1 <- chest_bn |> evidence_add(list(smoke="yes")) chest_ev2 <- chest_bn |> evidence_add(list(smoke=c(1, 0))) ``` ```{r } chest_bn |> querygrain(nodes=disease, simplify = TRUE) chest_ev1 |> querygrain(nodes=disease, simplify = TRUE) ``` ### Soft evidence (also called virtual evidence or likelihood evidence) {#sec:virt-evid-likel} Suppose we are not certain if a patient is a smoker or not. We shall assume that for patients who are smokers, we would (correctly) guess so in 80\% of the cases, whereas for patients who are not smokers we would (erroneously) guess that they are smokers in 10\% of the cases. In gRain this, situation can be handled in two different ways. One way is to introduce a new variable `smoke_guess` with `smoke` as its only parent and then we enter evidence for this node. ```{r} g.s <- cpt(~ smoke_guess|smoke, levels=yn, values=c(.8, .2, .1, .9)) g.s ``` ```{r} chest_ext <- c(cpt_list, list(g.s)) |> compile_cpt() |> grain() chest_ext |> querygrain(nodes=disease, simplify = TRUE) chest_ext |> querygrain(nodes=disease, evidence=list(smoke="yes"), simplify = TRUE) chest_ext |> querygrain(nodes=disease, evidence=list(smoke_guess="yes"), simplify = TRUE) ``` Another approach is to enter virtual evidence in the original network as shown below. ```{r } chest_ve <- chest_bn |> evidence_add(evidence = list(smoke = c(.8, .1))) chest_ve |> querygrain(nodes=disease, simplify = TRUE) evidence_get(chest_ve) ``` Consequently, specifying the hard evidence that `smoke="yes"` can be done as ```{r } chest_bn |> setEvidence(evidence=list(smoke=c(1, 0))) ``` ## Building a network from data {#sec:using-data} When building a network from data, there are two situations to be distinguished between. 1) The network is known and the data is used to estimate the parameters in the network. 2) The network is unknown and the data is used to estimate the network structure and the parameters in the network. In both cases it is possible to have a network specified as a dag or as an undirected (chordal) graph. The following code illustrates how to build a network from data. The data is simulated data from the chest clinic example. ### Building a network from a known network {#sec:building-known} The following list defines a dag for the chest clinic example: Each component is a list with two elements: the first element is the node and the second element is a vector of parents. ```{r} node_parents_list <- list("asia", c("tub", "asia"), "smoke", c("lung", "smoke"), c("bronc", "smoke"), c("either", "tub", "lung"), c("xray", "either"), c("dysp", "bronc", "either")) g1 <- dag(node_parents_list) par(mar=c(0,0,0,0)) plot(g1) ``` The following list defines an undirected graph for the chest clinic example: Each component is a list with the nodes in the clique. ```{r} cliq <- list(c("xray", "either"), c("asia", "tub"), c("smoke", "lung", "bronc"), c("lung", "either", "tub"), c("lung", "either", "bronc" ), c("bronc", "either", "dysp")) g2 <- ug(cliq) par(mar=c(0,0,0,0)) plot(g2) ``` A netowrk can be built from data using: ```{r} bn1 <- grain(g1, data=gRbase::chestSim100000) bn2 <- grain(g2, data=gRbase::chestSim100000) ``` It is explained in the references that the two networks specify the same joint distribution because internally a dag is converted to an undirected chordal graph which provides the basis for the computations. In practice the two joint distributions differ slightly, and this is because the way in which information is extracted from data, see examples below. ```{r} j1 <- querygrain(bn1, type="joint") j2 <- querygrain(bn2, type="joint") d <- j1 %a-% j2 max(abs(d)) ``` ## Building networks from data {#sec:building-networks} The following two graphs specify the same model stating that A and C are conditionally independent given B. The first graph is a directed acyclic graph (DAG) and the second graph is an undirected graph (UG). ```{r } g1 <- dag(~A:C + B:C, result="igraph") g2 <- ug(~A:C + B:C, result="igraph") par(mfrow=c(1,2), mar=c(0,0,0,0)) plot(g1); plot(g2) ``` Given such a graph, we can build a network from data. Suppose data is as follows: ```{r } tab <- tabNew(~A:B:C, levels=c("+", "-"), values=c(1, 2, 3, 5, 1, 2, 1, 4)) ``` A network can be built from data using: ```{r } bn1 <- grain(g1, data=tab) bn2 <- grain(g2, data=tab) ``` The two networks specify the same joint distribution: ```{r} j1 <- querygrain(bn1, type="joint") j2 <- querygrain(bn2, type="joint") d <- j1 %a-% j2 max(abs(d)) ``` In the process of creating networks, conditional probability tables are extracted when the graph is a dag and clique potentials are extracted when the graph is a chordal (i.e.\ triangulated) undirected graph. This takes place as follows (internally): Neet to estimate $p(A|B)$, $p(C|B)$ and $p(B)$ from the data. This can be done as follows: ```{r} p <- extract_cpt(tab, g1) |> c() p ``` Likewise, we can extract the clique potentials from the data, call these $q(AC)$ and $q(BC)$: ```{r} q <- extract_pot(tab, g2) |> c() q ``` Clique potentials are not uniquely defined, but the product of clique potentials is. In the implementation in gRain, $q(AC)$ is the relative frequency in the $AC$-marginal table, i.e. $p(AB)$. Moreover and $q(BC)$ is the frequency in the $BC$-marginal table divided by the frequency in the $C$-marginal table. Hence, the product of the clique potentials and the product of the cpts is the same as the joint distribution: ```{r} tabProd(p) |> ftable() tabProd(q) |> ftable() ``` ### Handling zeros in the data {#sec:handling-zeros} Next suppose data is as follows ```{r} tab0 <- tab tab0[1:4] <- 0 tab0 ``` In the process of creating networks above, the following quantities are computed: ```{r} n_BC <- tab0 |> tabMarg(~B:C) n_AC <- tab0 |> tabMarg(~A:C) n_C <- tab0 |> tabMarg(~C) p.B_C <- n_BC %a/% n_C p.A_C <- n_AC %a/% n_C p.C <- n_C / sum(n_C) p.C p.B_C p.A_C ``` ```{r } bn01 <- grain(g1, data=tab0) bn02 <- grain(g2, data=tab0) ``` ```{r} p <- extract_cpt(tab0, g1) |> c() q <- extract_pot(tab0, g2) |> c() p q tabProd(p) |> ftable() tabProd(q) |> ftable() ``` ```{r } eps <- 0.01 bn01e <- grain(g1, data=tab0, smooth=eps) bn02e <- grain(g2, data=tab0, smooth=eps) ``` ```{r} j1 <- querygrain(bn01e, type="joint") j2 <- querygrain(bn02e, type="joint") j1 j2 d <- j1 %a-% j2 max(abs(d)) ``` ### Building a network from an unknown network {#sec:building-unknown} This is a more complex task, sometimes called structural learning. The following code illustrates how to build a network from data. The data is simulated data from the chest clinic example. We illustrate two approaches: 1) based on the stepwise algorithm in the gRim package and 2) based on the hill climbing algorithm in the bnlearn package. ```{r,eval=T} dat <- gRbase::chestSim10000 ## Using gRim and stepwise selection sat_model <- gRim::dmod(~.^., data=dat) mm1 <- stepwise(sat_model, criterion="aic", type="decomposable") ##bn1 <- grain(mm1$modelinfo$ug, data=dat) bn1 <- grain(mm1$modelinfo$ug, data=dat, smooth=0.01) ## Using bnlearn and hill climbing sat_graph <- bnlearn::random.graph(names(dat), prob = 1) # complete graph mm2 <- bnlearn::hc(dat, start=sat_graph) bn2 <- bnlearn::as.grain(bnlearn::bn.fit(mm2, dat)) bn2$dag ``` An important detail is that gRim works with decomposable undirected graphical models (see Figure \@ref(fig:bn1)) while bnlearn works with dags (see Figure \@ref(fig:bn2), left). However, even for models specified as dags, all comptational structures are based on a corresponding undirected graph (see Figure \@ref(fig:bn2), left). For details, see the references. ```{r bn1, echo=F, fig.cap="XXXX."} par(mfrow=c(1, 1), mar=c(0,0,0,0)) coords <- layout_(bn1$ug, nicely()) plot(bn1$ug, layout=coords) ``` ```{r bn2, echo=F, fig.cap="XXXX."} par(mfrow=c(1,2), mar=c(0,0,0,0)) new_coords <- function(coords, src, dst) { nms1 <- nodes(src) nms2 <- nodes(dst) coords[match(nms2, nms1),] } plot(bn2$dag, layout=new_coords(coords, bn1$ug, bn2$dag)) plot(bn2$ug, layout=new_coords(coords, bn1$ug, bn2$ug)) ``` ## Extending networks to include other types of variables {#sec:mixture} gRain only handles discrete variables with a finite state space, but using likelihood evidence it is possible to work with networks with both discrete and continuous variables (or other types of variables). This is possible only when he networks have a specific structure. This is possible when no discrete variable has non--discrete parents. Take a simple example: Form a network for variables $x=(x_1, x_2)$. Conceptually augment this network with additional variables $y=(y_1, y_2)$ where $y_1|x_1=k \sim N(\mu_k, v)$ and $y_2|x_2=k \sim Poi(l_k)$ for $k=1,2$. Also we make the assumption that $y_1$ and $y_2$ are independent given $x=(x_1, x_2)$. This gives the DAG below: ```{r } par(mar=c(0,0,0,0)) plot(dag(~y1:x1 + x2:x1 + y2:x2)) ``` A network for $x$ can be constructed as: ```{r } u <- list(x1=yn, x2=yn) x1 <- cpt(~x1, values=c(1, 3), levels=u) x2 <- cpt(~x2|x1, values=c(3, 1, 1, 7), levels=u) bn <- grain(compile_cpt(x1, x2)) querygrain(bn, simplify=TRUE) ``` The augmentation of $y|x$ can go along these lines: The parameters describing $y|x$ are set to be: ```{r } s <- 2 mu <- c(mu1=2, mu2=5) lambda <- c(lambda1=1, lambda2=5) ``` Suppose we observe $y_1 = y_1^*$. Then \begin{align} p(x|y_1= y_1^*)\propto p(x_1)p(x_2|x_1) p(y_1=y_1^*|x_1) = p(x_1)p(x_2|x_1) L_1(x_1) \end{align} where $L_1(x_1)$ denotes the likelihood. In a network setting this corresponds to changing $p(x_1)$ as \begin{align} p(x_1) \leftarrow p(x_1)L_1(x_1) \end{align} and then carry on with propagation. This can be achieved in different ways. The likelihood is: ```{r } y1_obs <- 14 # Observed value for y1_obs lik1 <- dnorm(y1_obs, mean=mu, sd=s) lik1 ``` One is by setting the likelihood as evidence. An alternative is to explicitly modify the CPT which specifies $p(x_1)$: ```{r } querygrain(bn, simplify=TRUE, exclude=FALSE) bn1 <- setEvidence(bn, evidence=list(x1=lik1)) querygrain(bn1, simplify=TRUE, exclude=FALSE) x1_upd <- getgrain(bn, "cptlist")$x1 * lik1 bn2 <- replace_cpt(bn, list(x1=x1_upd)) querygrain(bn2, simplify=TRUE) ``` A final remark: The conditional distribution of $y_1$ is normal, but the unconditional distribution is a mixture of normals. Likewise, the conditional distribution of $y_2$ is poisson, but the unconditional distribution is a mixture of two poisson distributions. Evidence on, say $y_1$ changes the belief in $x_1$ and $x_2$ and this in turn changes the distribution of $y_2$ (evidence changes the mixture weights.) ```{r } nsim <- 10000 xsim_marg <- simulate(bn, nsim, seed=2022) xsim_cond <- simulate(bn1, nsim, seed=2022) y2marg <- rpois(n=nsim, lambda=lambda[xsim_marg$x2]) y2cond <- rpois(n=nsim, lambda=lambda[xsim_cond$x2]) summary(y2marg) summary(y2cond) par(mfrow=c(1,2)) y2marg |> hist(prob=T, ylim=c(0, .4), breaks=20, main="marginal p(y2)") y2cond |> hist(prob=T, ylim=c(0, .4), breaks=20, main="conditional p(y2|y1*)") ``` ## Brute force computations and why they fail {#sec:brute-force-comp} The gRain package makes computations as those outlined above in a very efficient way; please see the references. However, it is in this small example also possible to make the computations directly: We can construct the joint distribution (an array with $2^8=256$ entries) directly as: ```{r } joint <- tabProd(chest_cpt) dim(joint) joint |> as.data.frame.table() |> head() ``` This will clearly fail even moderate size problems: For example, a model with $80$ nodes each with $10$ levels will give a joint state space with $10^{80}$ states; that is about the number of atoms in the universe. Similarly, $265$ binary variables will result in a joint state space of about the same size. Yet, gRain has been used successfully in models with tens of thousand variables. The ``trick'' in gRain is to make all computations without ever forming the joint distribution. However, we can do all the computations by brute force methods as we will illustrate here: Marginal distributions are ```{r } tabMarg(joint, "lung") tabMarg(joint, "bronc") ``` Conditioning on evidence can be done in different ways: The conditional density is a $6$--way slice of the original $8$--way joint distribution: ```{r } asia_dysp cond1 <- tabSlice(joint, slice=asia_dysp) cond1 <- cond1 / sum(cond1) dim(cond1) tabMarg(cond1, "lung") tabMarg(cond1, "bronc") ``` Alternatively, multiply all entries not consistent by zero and all other entries by one and then marginalize: ```{r } cond2 <- tabSliceMult(joint, slice=asia_dysp) cond2 <- cond2 / sum(cond2) dim(cond2) tabMarg(cond2, "lung") tabMarg(cond2, "bronc") ```