Hidden Markov Model example in R


   Hidden Markov Model (HMM) is a method for representing most likely corresponding sequences of observation data. HMM is used in speech and pattern recognition, computational biology, and other areas of data modeling.
   In this post, I will try to explain HMM, and its usage in R. HMM package provides HMM related functions in R.

> library(HMM)
 
   The transition and emission matrix are the main parameters to build HMM.
  • The transition matrix is a probability of switching from one state to another.
  • Emission matrix is a selection probability of the element in a list.

   Let's see an example. There are two possible states called "Target" and "Outlier" in a test data, and their selecting probabilities are as below,

> states <- c("Target","Outlier")
> targetProb <- c(0.4, 0.6)
> outlierProb <- c(0.6, 0.4)

Based on those selection probabilities, we build a transition probability matrix.

> transProb <- matrix(c(targetProb, outlierProb), 2)
> transProb
     [,1] [,2]
[1,]  0.4  0.6
[2,]  0.6  0.4

   A state element can be a "short" or "long" or "normal". In each state, the selection probability of an element is different. For a "Target" state, the elements probabilities are Pshort=0.1, Pnormal=0.3, Plong=0.6 and for an "Outlier" state probabilities are Pshort=0.6, Pnormal=0.3, Plong=0.1 percent. We can define them as below.

> elements <- c("short","normal","long")
> targetStateProb <- c(0.1, 0.3, 0.6)
> outlierStateProb <- c(0.6, 0.3, 0.1)

We create an emission probability matrix.

> emissProb <- matrix(c(targetStateProb,outlierStateProb), 2, byrow = T) 
> emissProb
     [,1] [,2] [,3]
[1,]  0.1  0.3  0.6
[2,]  0.6  0.3  0.1

Now, we can build a model with the above inputs. The initHMM function creates HMM.

> hmm <- initHMM(States = states, 
                 Symbols = elements,
                 transProbs=transProb,
                 emissionProbs = emissProb)

We can check the summary of an hmm model

> print(hmm)
$States
[1] "Target"  "Outlier"

$Symbols
[1] "short"  "normal" "long"  

$startProbs
 Target Outlier 
    0.5     0.5 

$transProbs
         to
from      Target Outlier
  Target     0.4     0.6
  Outlier    0.6     0.4

$emissionProbs
         symbols
states    short normal long
  Target    0.1    0.3  0.6
  Outlier   0.6    0.3  0.1

Next, we simulate 10 observation elements with a simHMM function using our hmm model.

> simhmm <- simHMM(hmm, 10)
> simulated <- data.frame(state=simhmm$states, element=simhmm$observation)

Printing the result of simulated data.

> print(simulated)
     state element
1   Target  normal
2  Outlier  normal
3   Target  normal
4   Target   short
5   Target    long
6  Outlier   short
7  Outlier   short
8  Outlier   short
9   Target    long
10 Outlier   short


Predicting possible state of a test data

   Another interesting function in HMM package is a viterbi function. The Viterbi algorithm calculates the possible state for a sequence of observations for a given HMM.
   We create test data and find out possible states for those elements with the hmm model.

> testElements <- c("long","normal","normal","short",
                   "normal","normal","short","long")
> stateViterbi <- viterbi(hmm, testElements)

The result is listed below.

> predState <- data.frame(Element=testElements, State=stateViterbi)
> print(predState)
  Element   State
1    long  Target
2  normal Outlier
3  normal  Target
4   short Outlier
5  normal  Target
6  normal  Target
7   short Outlier
8    long  Target


Based on our hmm model, possible states of test elements are predicted.
Thank you for reading! I hope you have found it useful!

2 comments:

  1. Hey!
    I think there are some problems with the matrices in this post (maybe it was written against an earlier version of the HMM library?

    The transProbs-matrix needs to be transposed, so that each of the rows sum to 1. In general, this matrix needs to have the same amount of rows and columns.

    The emissionProbs-matrix also needs to have the same amount of rows/columns.

    These conclusions I have drawn from the documentation of initHMM(..).

    ReplyDelete
    Replies
    1. Yes, you are right, the rows sum must be equal to 1.
      I updated matrix values. Thanks!

      Delete