## Pages

### Bayesian Network in R

A Bayesian Network (BN) is a probabilistic model based on directed a cyclic graphs that describe a set of variables and their conditional dependencies to each other. It is a graphical model, and we can easily check the conditional dependencies of the variables and their directions in a graph.

`> library(bnlearn)`

The bnlearn package implements the following learning algorithms.
• Constrain-based: PC, GS, IAMB, MMPC, Hilton-PC
• Score-based: Hill-Climbing, Tabu Search
• Pairwise: ARACNE, Chow-Liu
• Hybrid: MMHC, RSMAX2

We use score based learning algorithm, Hill-Climbing with hc() function. First, we'll generate simple dataset for this tutorial.

```> print(simd)
```
```     state element type  color accept
1   Target    long    A    red    yes
2   Target    long    A    red    yes
3   Target  normal    C yellow    yes
4   Target  normal    B  green    yes
5  Outlier   short    A    red     no
6   Target    long    B  green    yes
7  Outlier   short    A    red     no
8  Outlier    long    C yellow     no
9   Target    long    C yellow    yes
10 Outlier  normal    A    red     no ```

In this dataset, the 'state' has relationships with 'element' and 'accept' columns. And 'type' with 'color' column. When you create a data frame with categorical data, column should be a factor type. Otherwise, the data frame cannot be used in BN structure creation.

```> str(simd)
```
```'data.frame': 10 obs. of  5 variables:
\$ state  : Factor w/ 2 levels "Outlier","Target": 2 2 2 2 1 2 1 1 2 1
\$ element: Factor w/ 3 levels "long","normal",..: 1 1 2 2 3 1 3 1 1 2
\$ type   : Factor w/ 3 levels "A","B","C": 1 1 3 2 1 2 1 3 3 1
\$ color  : Factor w/ 3 levels "green","red",..: 2 2 3 1 2 1 2 3 3 2
\$ accept : Factor w/ 2 levels "no","yes": 2 2 2 2 1 2 1 1 2 1```

Next, we'll create learning structure with hc() function.

`> hc_simd <- hc(simd)`
```> class(hc_simd)
```
` "bn"`
` `
```> hc_simd

Bayesian network learned via Score-based methods

model:
[state][type][element|state][color|type][accept|state]
nodes:                                 5
arcs:                                  3
undirected arcs:                     0
directed arcs:                       3
average markov blanket size:           1.20
average neighbourhood size:            1.20
average branching factor:              0.60

learning algorithm:                    Hill-Climbing
score:                                 BIC (disc.)
penalization coefficient:              1.151292546
tests used in the learning procedure:  22
optimized:                             TRUE ```

We can see the structure in a plot.

`> plot(hc_simd)`

In this plot, state, element, accept, type, and color are called nodes. Directions between nodes are described in arcs that is a matrix containing a data of from-to elements direction.

```> hc_simd\$arcs
```
```      from    to
[1,] "type"  "color"
[2,] "state" "accept"
[3,] "state" "element"```

As above arcs show that 'type' to 'color', and 'state' to 'accept' and 'element'  relationships exist in our data. 'type' and 'state' are two independent groups that are not dependent on each other.

Next, we'll fit the model with data.

```> simd_fitted <- bn.fit(hc_simd, data=simd)
> simd_fitted

Bayesian network parameters

Parameters of node state (multinomial distribution)

Conditional probability table:
Outlier  Target
0.4     0.6

Parameters of node element (multinomial distribution)

Conditional probability table:

state
element       Outlier       Target
long   0.2500000000 0.6666666667
normal 0.2500000000 0.3333333333
short  0.5000000000 0.0000000000

Parameters of node type (multinomial distribution)

Conditional probability table:
A   B   C
0.5 0.2 0.3

Parameters of node color (multinomial distribution)

Conditional probability table:

type
color    A B C
green  0 1 0
red    1 0 0
yellow 0 0 1

Parameters of node accept (multinomial distribution)

Conditional probability table:

state
accept Outlier Target
no        1      0
yes       0      1
```

Based on the above train data, we can perform conditional probability queries.

Let's say element="short", and accept="no", and in this condition we check state probability of "Outlier" and "Target". To query this condition we use cpquery() function.

`> cpquery(simd_fitted,`
`           event=((element=="short")&(accept=="no")),`
```           evidence=(state=="Outlier"))
 0.4856262834```

state's probability of becoming "Outlier" is 48%.

`> cpquery(simd_fitted,`
`            event=((element=="short")&(accept=="no")),`
```            evidence=(state=="Target"))
 0```

state's probability of becoming "Target" is 0%.

In this post, we've briefly learned how to apply Bayesian networks with bnlearn package in R.