An introduction of Part-of-Speech tagging using `Hidden Markov Model`

(HMMs).

## Markov chains

Consider a sequence of state variables . A first-order Markov model instantiate two simplifying assumptions.

- The probability of a state depends only the previous state.
- The probability of an output observation depends only on the current state that produced the observation :

## Hidden Markov Model

- A set of $N$
**states**: - A
**transition probability matrix**, each representing the probability of moving from state $i$ to state $j$, s.t. : - A sequence of $T$
**observations**, each one drawn from a vocabulary : - A sequence of
**observation likelihoods**, a.k.a.**emission probabilities**, each expressing the probability of an observation , being generated from a state $i$: - An
**initial probability distribution**over states. is the prob. that the Markov chain will start in state $i$. Some state $j$ may have , meaning that they cannot be initial states. Also, :

## Training with MLE

The initial probability $\pi$:

The transition probability matrix $A$ contains :

where means the count of the first word’s pos tag in bigram tuples.

The emission probability matrix $B$ contains :

### Handling OOV problems

## Decoding

Given as input an HMM $\lambda = (A, B)$ and a sequence of observations , find the most probable sequence of states