Aug 23: Overview

Unsupervised Learning: the data has no labels. The goal is to recover structure in, or explain, the data.

Two high-level approaches:

  1. Model fitting: hypothesize a model for the data and estimate its parameters, or find the best-fit model. Methods: PCA, tensor factorization, isotropic transformation.
  2. Clustering: choose an objective function or clustering criterion, then find a clustering that optimizes the objective or satisfies the criterion. Methods: spectral clustering, k-means.
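As a concrete instance of the clustering approach, here is a minimal sketch of Lloyd's algorithm for k-means (an illustration, not the lecture's exact formulation; the naive first-k initialization is an assumption for simplicity):

```python
def kmeans(points, k, iters=20):
    # Lloyd's algorithm: alternate between assigning points to their
    # nearest center and moving each center to its cluster's mean.
    centers = points[:k]  # naive deterministic initialization (assumption)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each point goes to the nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        # Update step: each center becomes the mean of its cluster.
        centers = [tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centers[j]
                   for j, cl in enumerate(clusters)]
    return centers, clusters
```

Each iteration only decreases the k-means objective (sum of squared distances to the nearest center), but the algorithm can get stuck in local optima, which is why initialization matters in practice.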

Supervised Learning: data consists of labeled examples. The goal is to find the labeling function, or an approximation to it. Topics: the PAC (Probably Approximately Correct) model; VC-dimension; halfspaces; decision trees; statistical query learning; Perceptron, Winnow, SVM, kernel-based methods, Fourier learning, lower bounds for parity.
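To make the supervised setting concrete, a minimal sketch of the Perceptron algorithm for learning a halfspace through the origin (a standard formulation, with labels in {-1, +1}):

```python
def perceptron(examples, epochs=100):
    # examples: list of (x, y) with x a tuple of features and y in {-1, +1}.
    # Mistake-driven update: whenever sign(w . x) disagrees with y, set w += y * x.
    n = len(examples[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # data is correctly separated; stop early
            break
    return w
```

If the data is linearly separable with margin gamma and all examples have norm at most R, the classic bound says the total number of updates is at most (R/gamma)^2.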

Online Learning: data arrives one example at a time. A decision must be made, or a label chosen, after which the cost or the true label is revealed. The goal is to minimize total regret or the number of mistakes. Methods: weighted majority; follow-the-perturbed-leader.
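A minimal sketch of the deterministic weighted majority algorithm for binary prediction with expert advice (the halving factor beta = 1/2 is one common choice, assumed here):

```python
def weighted_majority(expert_preds, outcomes, beta=0.5):
    # expert_preds: one tuple of {0,1} predictions (one per expert) per round.
    # outcomes: the true {0,1} label revealed after each round.
    # Predict with the weighted vote; multiply by beta the weight of
    # every expert that erred this round.
    n = len(expert_preds[0])
    weights = [1.0] * n
    mistakes = 0
    for preds, y in zip(expert_preds, outcomes):
        vote_for_one = sum(w for w, p in zip(weights, preds) if p == 1)
        guess = 1 if vote_for_one >= sum(weights) / 2 else 0
        if guess != y:
            mistakes += 1
        weights = [w * beta if p != y else w for w, p in zip(weights, preds)]
    return mistakes, weights
```

The standard guarantee: the algorithm's mistake count is O(log n + m), where m is the number of mistakes of the best expert in hindsight.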

Agnostic Learning: Any of the above settings, with noise in the data and/or labels.

Contemporary models: Active/Interactive learning, reward-based learning, cortical learning,…

The Gaussian distribution in one dimension, N(\mu, \sigma^2), has mean \mu, variance \sigma^2, and density f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}.

The Gaussian distribution in {\bf R}^n, N(\mu, \Sigma), with mean \mu and covariance matrix \Sigma, has density f(x) = \frac{1}{(2\pi)^{n/2}(\det\Sigma)^{1/2}}e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1}(x-\mu)}. (Note the square root on \det\Sigma; since \Sigma is positive definite, \det\Sigma > 0.)
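A quick numerical sanity check of the two density formulas (the diagonal-covariance case is chosen so the n-dimensional density factors into a product of one-dimensional ones):

```python
import math

def gauss1d(x, mu, sigma):
    # Density of N(mu, sigma^2) at x.
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def gauss2d_diag(x, mu, var):
    # Density of N(mu, Sigma) in R^2 with diagonal Sigma = diag(var);
    # the normalizer is (2*pi)^(n/2) * (det Sigma)^(1/2) with n = 2.
    det = var[0] * var[1]
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mu, var))
    return math.exp(-quad / 2) / (2 * math.pi * math.sqrt(det))

# Riemann sum over [-10, 10]: the 1D density should integrate to ~1.
step = 20 / 100000
total = sum(gauss1d(-10 + step * i, 0.0, 1.0) for i in range(100000)) * step
```

With a diagonal covariance, gauss2d_diag(x) equals gauss1d(x_1) * gauss1d(x_2) with the matching per-coordinate variances, which is one way to confirm the (det Sigma)^{1/2} normalizer is right.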