**Dynamic Programming** (DP) is ubiquitous in NLP, appearing in algorithms such as Minimum Edit Distance, Viterbi decoding, the forward/backward algorithm, and the CKY algorithm.

# Minimum Edit Distance

Minimum Edit Distance (Levenshtein distance) is a string metric for measuring the difference between two sequences.

Given two strings $a$ and $b$ (with lengths $|a|$ and $|b|$ respectively), define $D(i, j)$ as the edit distance between $a[1 \cdots i]$ and $b[1 \cdots j]$, i.e. the first $i$ chars of $a$ and the first $j$ chars of $b$. The edit distance between the two strings is then $D(|a|, |b|)$.

Let the costs of insertion, deletion, and substitution be 1, 1, and 2, respectively.

That is, with base cases $D(i, 0) = i$ and $D(0, j) = j$,

$$
D(i, j) = \min
\begin{cases}
D(i-1, j) + 1 & \text{(deletion)} \\
D(i, j-1) + 1 & \text{(insertion)} \\
D(i-1, j-1) + 2 \cdot \mathbb{1}[a_i \neq b_j] & \text{(substitution / match)}
\end{cases}
$$

Algorithm in Python (LeetCode-style, using the 1/1/2 costs above):

```python
class Solution:
    def minDistance(self, a: str, b: str) -> int:
        """Minimum edit distance with costs: insert = 1, delete = 1, substitute = 2."""
        m, n = len(a), len(b)
        # D[i][j] = edit distance between a[:i] and b[:j]
        D = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):
            D[i][0] = i  # delete all i chars of a
        for j in range(1, n + 1):
            D[0][j] = j  # insert all j chars of b
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                sub = 0 if a[i - 1] == b[j - 1] else 2
                D[i][j] = min(D[i - 1][j] + 1,        # deletion
                              D[i][j - 1] + 1,        # insertion
                              D[i - 1][j - 1] + sub)  # substitution / match
        return D[m][n]


# e.g. Solution().minDistance("intention", "execution") == 8
```

# DP in HMMs

Hidden Markov Models (HMMs) describe the joint probability of their hidden (unobserved) and observed discrete random variables. They rely on the Markov assumption and output independence.

- The observation sequence $Y = (y_1, y_2, \cdots, y_T)$, where each $y_t$ takes one of $N$ possible values $o_1, \cdots, o_N$.
- Let $X_t$ be a discrete random variable with $K$ possible states. The transition matrix $A \in \mathbb{R}^{K \times K}$ stores $A_{ij} = P(X_t = j \mid X_{t-1} = i)$.
- The initial state distribution ($t=1$) is $\pi_i = P(X_1 = i)$.
- The output/emission matrix $B \in \mathbb{R}^{K \times N}$ stores $B_{ij} = P(y_t = o_j \mid X_t = i)$.

Thus the hidden Markov chain can be defined by $\theta = (A, B, \pi)$.
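As a concrete toy instance, such a $\theta = (A, B, \pi)$ can be written down directly with NumPy (the 2-state weather setup and all numbers below are illustrative assumptions, not part of the definition):

```python
import numpy as np

# Hypothetical HMM with K = 2 hidden states and N = 2 observation symbols.
# States: 0 = Rainy, 1 = Sunny; observations: 0 = umbrella, 1 = no umbrella.
A = np.array([[0.7, 0.3],      # A[i, j] = P(X_t = j | X_{t-1} = i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],      # B[i, j] = P(y_t = o_j | X_t = i)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])      # pi[i] = P(X_1 = i)
```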

## Forward algorithm

The forward algorithm is used to calculate the "belief state": the probability of the hidden state at time $t$ given all evidence so far, $P(X_t \mid y_{1:t})$.^{[1]} Such a process is also called **filtering**.

- The goal is to compute the joint probability $P(X_t, y_{1:t})$, which naively requires marginalizing over all possible state sequences $x_{1:t-1}$, the number of which grows exponentially with $t$.
- The *conditional independence* of the HMM makes it possible to perform the calculation recursively, using the forward variable $\alpha_t(i) = P(y_{1:t}, X_t = i)$.
- Time complexity: $O(K^2 T)$

**Forward algorithm**:

- Init: transition probabilities $A$, emission probabilities $B$, observed sequence $y_{1:T}$; set $\alpha_1(i) = \pi_i B_{i y_1}$
- for $t \leftarrow 2, \cdots, T$: $\quad \alpha_t(j) = B_{j y_t} \sum_{i=1}^{K} \alpha_{t-1}(i) \, A_{ij}$
- return $P(y_{1:T}) = \sum_{i=1}^{K} \alpha_T(i)$
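A minimal NumPy sketch of this recursion (the function name and array layout are my own choices; it assumes the toy `A`, `B`, `pi` defined above):

```python
import numpy as np

def forward(Y, A, B, pi):
    """Forward pass; row t (0-indexed) holds alpha_{t+1}(i) = P(y_{1:t+1}, X_{t+1} = i)."""
    T, K = len(Y), A.shape[0]
    alpha = np.zeros((T, K))
    alpha[0] = pi * B[:, Y[0]]                      # init: alpha_1(i) = pi_i * B_{i,y_1}
    for t in range(1, T):
        alpha[t] = B[:, Y[t]] * (alpha[t - 1] @ A)  # O(K^2) recursion step
    return alpha, alpha[-1].sum()                   # alpha table and P(y_{1:T})

alpha, likelihood = forward([0, 0, 1], A, B, pi)    # obs: umbrella, umbrella, none
```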

## Viterbi Algorithm

The **Viterbi algorithm** finds the most probable sequence of hidden states, i.e. the Viterbi path.^{[2]}

**Input**:

- The observation space $O = \{o_1, o_2, \cdots, o_N\}$
- The state space $S = \{s_1, s_2, \cdots, s_K\}$
- The initial probability array $\pi = (\pi_1, \cdots, \pi_K)$ such that $\pi_i = P(x_1 = s_i)$
- The observation sequence $Y = (y_1, y_2, \cdots, y_T)$
- The transition matrix $A$ of size $K \times K$, where $A_{ij}$ stores the probability of the transition from state $s_i$ to state $s_j$
- The emission matrix $B$ of size $K \times N$, such that $B_{ij}$ stores the probability of observing $o_j$ from state $s_i$

**Output**:

- The most likely hidden state sequence $X = (x_1, x_2, \cdots, x_T)$
- Time complexity: $O(|S|^2 T)$

**function** Viterbi($O$, $S$, $\pi$, $Y$, $A$, $B$) **returns** $X$

- for state $i = 1, 2, \cdots, K$ do
    - Viterbi init step: $T_1[i, 1] \leftarrow \pi_i B_{i y_1}$
    - Backtrack init step: $T_2[i, 1] \leftarrow 0$
- for observation $j = 2, 3, \cdots, T$:
    - for state $i = 1, 2, \cdots, K$:
        - Viterbi records the best prob: $T_1[i, j] \leftarrow \max_k \left( T_1[k, j-1] \, A_{ki} \, B_{i y_j} \right)$
        - Backpointer saves the best backpointer: $T_2[i, j] \leftarrow \operatorname{argmax}_k \left( T_1[k, j-1] \, A_{ki} \right)$
- Best path prob: $z_T \leftarrow \operatorname{argmax}_k T_1[k, T]$
- Best path pointer: $x_T \leftarrow s_{z_T}$
- for $j = T, T-1, \cdots, 2$:
    - Do back tracking: $z_{j-1} \leftarrow T_2[z_j, j]$
    - Prev hidden state: $x_{j-1} \leftarrow s_{z_{j-1}}$
- return $X = (x_1, x_2, \cdots, x_T)$

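A NumPy sketch of the same procedure (naming is mine; `Y` holds observation indices, and the two tables correspond to $T_1$, $T_2$ above):

```python
import numpy as np

def viterbi(Y, A, B, pi):
    """Return the most likely hidden state sequence (as state indices) for Y."""
    T, K = len(Y), A.shape[0]
    T1 = np.zeros((T, K))              # T1[t, i]: best prob of a path ending in state i at t
    T2 = np.zeros((T, K), dtype=int)   # T2[t, i]: backpointer to the best previous state
    T1[0] = pi * B[:, Y[0]]
    for t in range(1, T):
        scores = T1[t - 1][:, None] * A        # scores[k, i] = T1[t-1, k] * A_{ki}
        T2[t] = scores.argmax(axis=0)          # best previous state k for each state i
        T1[t] = scores.max(axis=0) * B[:, Y[t]]
    X = np.zeros(T, dtype=int)
    X[-1] = T1[-1].argmax()                    # best final state
    for t in range(T - 1, 0, -1):              # follow backpointers right to left
        X[t - 1] = T2[t, X[t]]
    return X

print(viterbi([0, 0, 1], A, B, pi))            # e.g. [0 0 1] for the toy HMM above
```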

## Baum-Welch algorithm

The Baum-Welch algorithm, a special case of the EM algorithm, uses the forward-backward algorithm to find a maximum likelihood estimate of the unknown parameters of an HMM given a set of observed feature vectors.

As an EM method, it converges only to a local maximum of the likelihood, so the result depends on the initialization of $\theta$.

### Forward process

Let $\alpha_i(t) = P(y_1, \cdots, y_t, X_t = i \mid \theta)$ represent the probability of observing $y_1, \cdots, y_t$ and being in state $i$ at time $t$; it is computed recursively:

$$
\alpha_i(1) = \pi_i B_{i y_1}, \qquad \alpha_j(t+1) = B_{j y_{t+1}} \sum_{i=1}^{K} \alpha_i(t) \, A_{ij}
$$

### Backward process

Let $\beta_i(t) = P(y_{t+1}, \cdots, y_T \mid X_t = i, \theta)$ denote the probability of the ending partial sequence $y_{t+1}, \cdots, y_T$ given the state $i$ at time $t$; it is computed recursively:

$$
\beta_i(T) = 1, \qquad \beta_i(t) = \sum_{j=1}^{K} \beta_j(t+1) \, A_{ij} \, B_{j y_{t+1}}
$$
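A matching NumPy sketch of the backward pass (again my own naming, consistent with the `forward` sketch above):

```python
import numpy as np

def backward(Y, A, B):
    """Backward pass; row t (0-indexed) holds beta_{t+1}(i) = P(y_{t+2:T} | X_{t+1} = i)."""
    T, K = len(Y), A.shape[0]
    beta = np.ones((T, K))                            # init: beta_i(T) = 1
    for t in range(T - 2, -1, -1):                    # fill right to left
        beta[t] = A @ (B[:, Y[t + 1]] * beta[t + 1])  # sum_j A_ij B_{j,y_{t+1}} beta_j(t+1)
    return beta
```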

### Update

The probability of being in state $i$ at time $t$ given the observation $Y$ and the parameters $\theta$:

$$
\gamma_i(t) = P(X_t = i \mid Y, \theta) = \frac{\alpha_i(t) \, \beta_i(t)}{\sum_{j=1}^{K} \alpha_j(t) \, \beta_j(t)}
$$

The probability of being in states $i$ and $j$ at times $t$ and $t+1$ respectively, given the observation $Y$ and the parameters $\theta$:

$$
\xi_{ij}(t) = P(X_t = i, X_{t+1} = j \mid Y, \theta) = \frac{\alpha_i(t) \, A_{ij} \, \beta_j(t+1) \, B_{j y_{t+1}}}{\sum_{k=1}^{K} \sum_{w=1}^{K} \alpha_k(t) \, A_{kw} \, \beta_w(t+1) \, B_{w y_{t+1}}}
$$

Update the HMM parameters $\theta$:

- $\pi$ at state $i$ at time 1: $\pi_i^* = \gamma_i(1)$
- The transition matrix $A$: $A_{ij}^* = \dfrac{\sum_{t=1}^{T-1} \xi_{ij}(t)}{\sum_{t=1}^{T-1} \gamma_i(t)}$
- The emission matrix $B$: $B_i^*(o_k) = \dfrac{\sum_{t=1}^{T} \mathbb{1}[y_t = o_k] \, \gamma_i(t)}{\sum_{t=1}^{T} \gamma_i(t)}$, where $\mathbb{1}[y_t = o_k]$ is an indicator function that equals $1$ if $y_t = o_k$ and $0$ otherwise
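Putting the pieces together, one EM iteration can be sketched as follows (reusing the `forward` and `backward` sketches above; the vectorization details are my own):

```python
import numpy as np

def baum_welch_step(Y, A, B, pi):
    """One Baum-Welch (EM) iteration: returns re-estimated (A, B, pi)."""
    Y = np.asarray(Y)
    T, K, N = len(Y), A.shape[0], B.shape[1]
    alpha, lik = forward(Y, A, B, pi)
    beta = backward(Y, A, B)
    # E-step: state posteriors gamma and pairwise posteriors xi.
    gamma = alpha * beta / lik                             # gamma[t, i] = P(X_t = i | Y)
    xi = (alpha[:-1, :, None] * A[None, :, :]              # xi[t, i, j] =
          * (B[:, Y[1:]].T * beta[1:])[:, None, :]) / lik  #   P(X_t = i, X_{t+1} = j | Y)
    # M-step: re-estimate theta = (A, B, pi).
    pi_new = gamma[0]
    A_new = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.stack([gamma[Y == k].sum(axis=0) for k in range(N)], axis=1)
    B_new /= gamma.sum(axis=0)[:, None]
    return A_new, B_new, pi_new
```

In practice the forward/backward passes are run in log space or with per-step normalization, since the raw probabilities underflow on long sequences.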

# CKY algorithm

TBD