In this post, I we'll briefly learn how to use Bayesian networks with the bnlearn package in R. You may find more information about the package please refer to this page.
We'll start by loading the bnlearn package.
> library(bnlearn)
The bnlearn package implements the following learning algorithms.
- Constrain-based: PC, GS, IAMB, MMPC, Hilton-PC
- Score-based: Hill-Climbing, Tabu Search
- Pairwise: ARACNE, Chow-Liu
- Hybrid: MMHC, RSMAX2
We use score based learning algorithm, Hill-Climbing with hc() function. First, we'll generate simple dataset for this tutorial.
> print(simd) state element type color accept 1 Target long A red yes 2 Target long A red yes 3 Target normal C yellow yes 4 Target normal B green yes 5 Outlier short A red no 6 Target long B green yes 7 Outlier short A red no 8 Outlier long C yellow no 9 Target long C yellow yes 10 Outlier normal A red no | |
> str(simd) 'data.frame': 10 obs. of 5 variables: $ state : Factor w/ 2 levels "Outlier","Target": 2 2 2 2 1 2 1 1 2 1 $ element: Factor w/ 3 levels "long","normal",..: 1 1 2 2 3 1 3 1 1 2 $ type : Factor w/ 3 levels "A","B","C": 1 1 3 2 1 2 1 3 3 1 $ color : Factor w/ 3 levels "green","red",..: 2 2 3 1 2 1 2 3 3 2 $ accept : Factor w/ 2 levels "no","yes": 2 2 2 2 1 2 1 1 2 1
Next, we'll create learning structure with hc() function.
> hc_simd <- hc(simd)
> class(hc_simd) [1] "bn"
> hc_simd Bayesian network learned via Score-based methods model: [state][type][element|state][color|type][accept|state] nodes: 5 arcs: 3 undirected arcs: 0 directed arcs: 3 average markov blanket size: 1.20 average neighbourhood size: 1.20 average branching factor: 0.60 learning algorithm: Hill-Climbing score: BIC (disc.) penalization coefficient: 1.151292546 tests used in the learning procedure: 22 optimized: TRUE
We can see the structure in a plot.
> plot(hc_simd)
In this plot, state, element, accept, type, and color are called nodes. Directions between nodes are described in arcs that is a matrix containing a data of from-to elements direction.
> hc_simd$arcs from to [1,] "type" "color" [2,] "state" "accept" [3,] "state" "element"
As above arcs show that 'type' to 'color', and 'state' to 'accept' and 'element' relationships exist in our data. 'type' and 'state' are two independent groups that are not dependent on each other.
Next, we'll fit the model with data.
> simd_fitted <- bn.fit(hc_simd, data=simd) > simd_fitted Bayesian network parameters Parameters of node state (multinomial distribution) Conditional probability table: Outlier Target 0.4 0.6 Parameters of node element (multinomial distribution) Conditional probability table: state element Outlier Target long 0.2500000000 0.6666666667 normal 0.2500000000 0.3333333333 short 0.5000000000 0.0000000000 Parameters of node type (multinomial distribution) Conditional probability table: A B C 0.5 0.2 0.3 Parameters of node color (multinomial distribution) Conditional probability table: type color A B C green 0 1 0 red 1 0 0 yellow 0 0 1 Parameters of node accept (multinomial distribution) Conditional probability table: state accept Outlier Target no 1 0 yes 0 1
Based on the above train data, we can perform conditional probability queries.
Let's say element="short", and accept="no", and in this condition we check state probability of "Outlier" and "Target". To query this condition we use cpquery() function.
> cpquery(simd_fitted,
event=((element=="short")&(accept=="no")),
evidence=(state=="Outlier"))
[1] 0.4856262834
state's probability of becoming "Outlier" is 48%.
> cpquery(simd_fitted,
event=((element=="short")&(accept=="no")),
evidence=(state=="Target"))
[1] 0
state's probability of becoming "Target" is 0%.
In this post, we've briefly learned how to apply Bayesian networks with bnlearn package in R.
Thank you for reading!
No comments:
Post a Comment