
POS Tagging with HMMs

An introduction to Part-of-Speech (POS) tagging using Hidden Markov Models (HMMs).

Markov chains

Consider a sequence of state variables $q_1, q_2, \ldots, q_i$. A first-order hidden Markov model instantiates two simplifying assumptions.

  1. The probability of a state depends only on the previous state (the Markov assumption):

$$P(q_i \mid q_1, \ldots, q_{i-1}) = P(q_i \mid q_{i-1})$$

  2. The probability of an output observation $o_i$ depends only on the state $q_i$ that produced the observation, and not on any other states or observations (output independence):

$$P(o_i \mid q_1, \ldots, q_i, \ldots, q_T, o_1, \ldots, o_i, \ldots, o_T) = P(o_i \mid q_i)$$
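
Together, the two assumptions factorize the joint probability of a state sequence and an observation sequence (writing $P(q_1 \mid q_0) := \pi_{q_1}$ for the initial step):

$$P(o_1, \ldots, o_T, q_1, \ldots, q_T) = \prod_{t=1}^{T} P(q_t \mid q_{t-1})\, P(o_t \mid q_t)$$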

Hidden Markov Model

  • A set of $N$ states: $Q = q_1 q_2 \ldots q_N$
  • A transition probability matrix $A = a_{11} \ldots a_{ij} \ldots a_{NN}$, each element $a_{ij}$ representing the probability of moving from state $i$ to state $j$, s.t. $\sum_{j=1}^{N} a_{ij} = 1 \; \forall i$
  • A sequence of $T$ observations, each one drawn from a vocabulary $V = v_1, v_2, \ldots, v_{|V|}$: $O = o_1 o_2 \ldots o_T$
  • A sequence of observation likelihoods, a.k.a. emission probabilities $B = b_i(o_t)$, each expressing the probability of an observation $o_t$ being generated from a state $i$
  • An initial probability distribution over states $\pi = \pi_1, \pi_2, \ldots, \pi_N$, where $\pi_i$ is the probability that the Markov chain will start in state $i$. Some states $j$ may have $\pi_j = 0$, meaning that they cannot be initial states. Also, $\sum_{i=1}^{N} \pi_i = 1$
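
As a concrete illustration, these quantities can be written down directly for a toy tagger. The three-tag tagset, the vocabulary, and all probability values below are invented for the example, not estimated from data:

```python
import numpy as np

# Toy tagset and vocabulary (illustrative values only)
tags  = ["DET", "NOUN", "VERB"]   # N = 3 states
vocab = ["the", "dog", "barks"]   # |V| = 3 word types

pi = np.array([0.8, 0.1, 0.1])    # initial distribution, sums to 1

A = np.array([                    # A[i, j] = P(tag j | tag i), rows sum to 1
    [0.0, 0.9, 0.1],              # DET  -> mostly NOUN
    [0.1, 0.2, 0.7],              # NOUN -> mostly VERB
    [0.4, 0.4, 0.2],              # VERB
])

B = np.array([                    # B[i, o] = P(word o | tag i), rows sum to 1
    [0.9, 0.05, 0.05],            # DET  mostly emits "the"
    [0.1, 0.8,  0.1],             # NOUN mostly emits "dog"
    [0.1, 0.1,  0.8],             # VERB mostly emits "barks"
])
```

Each row of $A$ and $B$ sums to 1 and $\pi$ sums to 1, matching the constraints above.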

Training with MLE

  • The initial probability $\pi$:

$$\pi_i = \frac{C(q_1 = i)}{\sum_{j=1}^{N} C(q_1 = j)},$$

    i.e., the fraction of training sentences whose first tag is $i$.

  • The transition probability matrix $A$ contains $P(t_i \mid t_{i-1})$:

$$P(t_i \mid t_{i-1}) = \frac{C(t_{i-1}, t_i)}{C(t_{i-1})},$$

    where $C(t_{i-1})$ means the count of the first word's POS tag in bigram tuples.

  • The emission probability matrix $B$ contains $P(w_i \mid t_i)$:

$$P(w_i \mid t_i) = \frac{C(t_i, w_i)}{C(t_i)}$$
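
These counts translate directly into code. A minimal sketch, assuming the training corpus is given as a list of sentences, each a list of `(word, tag)` pairs (the function and variable names are illustrative):

```python
from collections import Counter

def mle_estimate(corpus):
    """Estimate pi, A, B by relative-frequency counts.

    corpus: list of sentences, each a list of (word, tag) pairs.
    Returns dict tables: pi[t], A[(t_prev, t)], B[(t, w)].
    """
    init, trans, emit = Counter(), Counter(), Counter()
    tag_count, bigram_first = Counter(), Counter()

    for sent in corpus:
        sent_tags = [t for _, t in sent]
        init[sent_tags[0]] += 1                 # C(q_1 = i)
        for w, t in sent:
            emit[(t, w)] += 1                   # C(t_i, w_i)
            tag_count[t] += 1                   # C(t_i)
        for prev, cur in zip(sent_tags, sent_tags[1:]):
            trans[(prev, cur)] += 1             # C(t_{i-1}, t_i)
            bigram_first[prev] += 1             # C(t_{i-1}) over bigram positions

    n_sents = len(corpus)
    pi = {t: c / n_sents for t, c in init.items()}
    A = {bg: c / bigram_first[bg[0]] for bg, c in trans.items()}
    B = {tw: c / tag_count[tw[0]] for tw, c in emit.items()}
    return pi, A, B
```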

Handling OOV problems
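
Pure MLE assigns zero emission probability to any word unseen in training, which zeroes out every candidate tag sequence containing that word. Common remedies include mapping rare training words to a reserved UNK token, smoothing the emission counts, or guessing an unknown word's tag distribution from surface features such as suffixes and capitalization. With add-one (Laplace) smoothing on the emissions, for example:

$$P(w \mid t) = \frac{C(t, w) + 1}{C(t) + |V|},$$

where $|V|$ includes the reserved UNK type.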

Decoding

Given as input an HMM $\lambda = (A, B)$ and a sequence of observations $O = o_1, o_2, \ldots, o_T$, find the most probable sequence of states $Q = q_1 q_2 \ldots q_T$.
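
In POS-tagging terms, written with the word/tag notation of the MLE section, the decoder applies Bayes' rule, drops the constant denominator $P(w_{1:n})$, and then plugs in the two HMM assumptions:

$$\hat{t}_{1:n} = \operatorname*{argmax}_{t_{1:n}} P(t_{1:n} \mid w_{1:n}) = \operatorname*{argmax}_{t_{1:n}} \frac{P(w_{1:n} \mid t_{1:n})\, P(t_{1:n})}{P(w_{1:n})} \approx \operatorname*{argmax}_{t_{1:n}} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$$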

Viterbi algorithm
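
The algorithm fills a lattice $v_t(j)$, the probability of the most likely state path ending in state $j$ after the first $t$ observations, keeping a backpointer at each cell:

$$v_1(j) = \pi_j\, b_j(o_1), \qquad v_t(j) = \max_{i=1}^{N} v_{t-1}(i)\, a_{ij}\, b_j(o_t)$$

The best sequence is read off by backtracing from $\operatorname*{argmax}_j v_T(j)$. A minimal NumPy sketch under the matrix conventions used above (names are illustrative):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most probable state sequence for a list of observation indices.

    pi : (N,)    initial state distribution
    A  : (N, N)  A[i, j] = P(state j | state i)
    B  : (N, V)  B[j, o] = P(observation o | state j)
    obs: list of T observation indices into the vocabulary
    """
    N, T = len(pi), len(obs)
    v = np.zeros((T, N))                  # v[t, j]: best path probability ending in j at t
    bp = np.zeros((T, N), dtype=int)      # backpointers

    v[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        for j in range(N):
            scores = v[t - 1] * A[:, j] * B[j, obs[t]]
            bp[t, j] = np.argmax(scores)
            v[t, j] = scores[bp[t, j]]

    path = [int(np.argmax(v[-1]))]        # best final state
    for t in range(T - 1, 0, -1):         # backtrace
        path.append(int(bp[t, path[-1]]))
    return path[::-1], float(v[-1].max())
```

With the toy parameters sketched earlier, `viterbi(pi, A, B, [0, 1, 2])` tags "the dog barks" as DET NOUN VERB. In practice the products underflow on long sentences, so implementations typically work in log space, replacing products with sums.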