Yekun's ML Notes

Some machine learning notes and writeups.

A short introduction to machine learning.

Types of machine learning (ML)

Predictive / supervised learning

• Goal: learn a mapping from inputs $\mathbf{x}$ to outputs $y$, given a labeled set of input-output pairs $\mathscr{D} = \{ (\mathbf{x}_i, y_i) \}_{i=1}^N$, a.k.a. the training set.
• Conditional density estimation, i.e. build models for $p(y_i |\mathbf{x}_i, \Theta)$

Classification

a.k.a. pattern recognition.
• Binary classification
• Multi-class classification
• Multi-label classification (viewed as doing multiple binary predictions)
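The view of multi-label classification as several independent binary predictions can be sketched as below. The label names, the per-label probabilities, and the 0.5 threshold are all made up for illustration; a real model would supply the probabilities.

```python
def predict_labels(label_probs, threshold=0.5):
    """Treat each label as its own binary decision.

    label_probs maps a label name to a (hypothetical) model's
    estimate of p(label is present | x).
    """
    return sorted(label for label, p in label_probs.items() if p >= threshold)


# Hypothetical per-label probabilities for a single input.
probs = {"cat": 0.91, "outdoor": 0.62, "night": 0.08}
predicted = predict_labels(probs)
print(predicted)
```

Unlike multi-class classification, any number of labels (including none) can be switched on for the same input.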

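• Given a probabilistic output, compute the “best guess” as to the “true label” as the mode of the distribution $p(y|\mathbf{x}, \mathscr{D})$, a.k.a. the MAP (maximum a posteriori) estimate: $\hat{y} = \arg\max_{c} p(y = c|\mathbf{x}, \mathscr{D})$, where $c$ ranges over the possible class labels.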
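A minimal sketch of taking the mode of a predictive distribution over classes. The class names and probabilities are invented for the example and stand in for the output of some probabilistic classifier.

```python
def map_estimate(class_probs):
    """Return the most probable class (the mode) and its probability."""
    label = max(class_probs, key=class_probs.get)
    return label, class_probs[label]


# Hypothetical predictive distribution p(y | x, D) for one input.
posterior = {"setosa": 0.05, "versicolor": 0.70, "virginica": 0.25}
label, prob = map_estimate(posterior)
print(label, prob)
```

Returning the probability alongside the label is a design choice: when the top probability is well below 1, the guess is uncertain and a system may prefer to abstain.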

Descriptive / unsupervised learning

• Goal: given only inputs $\mathscr{D} = \{\mathbf{x}_i\}_{i=1}^N$, find “interesting patterns” in the data (a.k.a. knowledge discovery).
• Unconditional density estimation, i.e. build models for $p(\mathbf{x}_i|\Theta)$
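As a small illustration of unconditional density estimation, the sketch below fits a univariate Gaussian, so that $\Theta$ is just a mean and a variance, by maximum likelihood. The choice of a Gaussian and the toy data are assumptions made for the example.

```python
import math


def fit_gaussian(xs):
    """Maximum-likelihood estimate of Theta = (mean, variance)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return mean, var


def gaussian_pdf(x, mean, var):
    """Density p(x | Theta) of a univariate Gaussian."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


# Toy, unlabeled data: there is no y, only inputs x_i.
data = [1.0, 2.0, 3.0, 4.0, 5.0]
mean, var = fit_gaussian(data)
density_at_mean = gaussian_pdf(mean, mean, var)
print(mean, var, density_at_mean)
```

The fitted model can then score how typical a new point is, which is the basic use of a density model.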

Popular deep unsupervised generative models:

• GANs
• VAEs
• Fully visible belief networks (FVBN)

Discovering clusters

Cluster the data into groups. Let $K$ denote the number of clusters. We first estimate the distribution over the number of clusters, $p(K|\mathscr{D})$, which tells us whether there are subpopulations within the data.
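The notes do not name a clustering algorithm, so the sketch below uses k-means (Lloyd's algorithm) purely as an illustration. It takes $K$ as fixed rather than estimating $p(K|\mathscr{D})$, and the toy points and the deterministic initialization are assumptions of the example.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iters=20):
    """Lloyd's algorithm on a list of equal-length tuples.

    Centroids start at the first k points, an arbitrary but
    deterministic choice made for this sketch.
    """
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(n_iters):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments


# Two visually separated groups of 2-D points (toy data).
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.1, 0.3), (4.8, 5.2)]
centroids, assignments = kmeans(points, k=2)
print(assignments)
```

Each entry of `assignments` is the index of the cluster that the corresponding point ends up in.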

Discovering latent factors

Reduce the dimensionality by projecting the data to a lower dimensional subspace which captures the “essence” of the data.
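One common way to do such a projection is principal component analysis (PCA). The notes do not specify a method, so PCA is an assumption here, and the sketch below only recovers the first principal direction, using power iteration on the sample covariance matrix. The toy data lie exactly on a line so that the result is easy to check.

```python
import math


def first_principal_component(rows, n_iters=100):
    """Estimate the top principal direction with power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    # Arbitrary starting vector, assumed not orthogonal to the top direction.
    v = [1.0] * d
    for _ in range(n_iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # 1-D coordinate of each centered point along the direction v.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores


# Toy 2-D data lying on the line y = 2x.
rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, scores = first_principal_component(rows)
print(direction, scores)
```

Here each 2-D point is summarized by a single number in `scores`, its position along the recovered direction.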

Discovering graph structure

Given measurements of a set of correlated variables, discover which ones are most correlated with which others. We learn the graph structure from the data, i.e. compute $\hat{\mathscr{G}} = \arg\max_{\mathscr{G}} p(\mathscr{G} | \mathscr{D})$.

Matrix completion

Fill in the missing entries of a data matrix (often recorded as NaN, “not a number”) with plausible values.
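The simplest baseline for filling in missing entries is column-mean imputation, sketched below. It is not a probabilistic model and is used here only to make the task concrete; the toy matrix is invented for the example.

```python
import math


def impute_column_means(matrix):
    """Replace each NaN with the mean of the observed values in its column."""
    n_cols = len(matrix[0])
    col_means = []
    for j in range(n_cols):
        observed = [row[j] for row in matrix if not math.isnan(row[j])]
        # Assumes every column has at least one observed value.
        col_means.append(sum(observed) / len(observed))
    return [
        [col_means[j] if math.isnan(v) else v for j, v in enumerate(row)]
        for row in matrix
    ]


nan = float("nan")
data = [
    [1.0, nan, 3.0],
    [4.0, 5.0, nan],
    [nan, 7.0, 9.0],
]
filled = impute_column_means(data)
print(filled)
```

Observed entries are left untouched; only the NaN cells are replaced.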

Image inpainting

“Fill in” holes in an image with realistic texture. This can be tackled by building a joint probability model of the pixels, given a set of clean images, and then inferring the unknown variables (pixels) given the known variables (pixels).

Collaborative filtering

Key idea: the prediction is not based on features of the movie or user (although it could be), but merely on a ratings matrix $\mathbf{X}$, where $\mathbf{X}(m,u)$ is the rating given by user $u$ to movie $m$.
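A rough sketch of predicting a missing rating from the ratings matrix alone. It uses a user-based neighborhood scheme with cosine similarity, which is one of several possible approaches and is not specified in the notes. The tiny matrix (rows are movies, columns are users, `None` marks a missing rating) is made up.

```python
import math


def cosine_similarity(u, v, ratings):
    """Cosine similarity between users u and v over movies both have rated."""
    shared = [
        (row[u], row[v])
        for row in ratings
        if row[u] is not None and row[v] is not None
    ]
    if not shared:
        return 0.0
    dot = sum(a * b for a, b in shared)
    norm_u = math.sqrt(sum(a * a for a, _ in shared))
    norm_v = math.sqrt(sum(b * b for _, b in shared))
    return dot / (norm_u * norm_v)


def predict_rating(m, u, ratings):
    """Predict X(m, u) as a similarity-weighted mean of other users' ratings of movie m."""
    num = den = 0.0
    for v, r in enumerate(ratings[m]):
        if v == u or r is None:
            continue
        w = cosine_similarity(u, v, ratings)
        num += w * r
        den += w
    return num / den if den > 0 else None


# ratings[m][u]: rating of movie m by user u on a 1-5 scale (toy data).
ratings = [
    [5, 4, 1],
    [4, 5, 2],
    [None, 5, 1],  # user 0 has not rated movie 2
]
prediction = predict_rating(2, 0, ratings)
print(prediction)
```

No movie or user features appear anywhere: the prediction depends only on how similarly users have rated the movies they share. With strictly positive ratings, raw cosine similarity discriminates weakly between users, so practical systems often mean-center the ratings first.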

Reinforcement learning

• Goal: learn how to act / behave when given occasional reward or punishment signals (e.g. how a baby learns to walk).

Basic ML concepts

Parametric vs. non-parametric models

Parametric models

• Models have a fixed number of parameters.
• Cons: strong assumptions about the nature of the data distribution.

Non-parametric models

• The number of model parameters grows with the amount of training data.
• Example: $K$-nearest neighbor classifier
• Cons: the curse of dimensionality, e.g. KNN works poorly with high-dimensional inputs.
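A minimal sketch of the $K$-nearest neighbor classifier mentioned above, using squared Euclidean distance and a majority vote. The training points, labels, and $K = 3$ are illustrative assumptions.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(train_x, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy training set: two well-separated classes in 2-D.
train_x = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
train_y = ["a", "a", "a", "b", "b", "b"]

near_a = knn_predict(train_x, train_y, (0.15, 0.15))
near_b = knn_predict(train_x, train_y, (0.95, 1.0))
print(near_a, near_b)
```

There is no training step: the “model” is the stored training set itself, which is why its effective size grows with the amount of data.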