# Homework #1

Due Sept 27th in class.

1. Show that the projection of a Gaussian distribution to any lower dimensional subspace is also a Gaussian distribution.
2. Given a mixture of k distributions, each of which is the uniform distribution over an unknown cube (of arbitrary side length and orientation), give a separation condition that allows efficient clustering of a sample with high probability. Bound the time and sample complexity of your algorithm.
3. Give an algorithm to estimate the variances and mixing weights of a mixture of k Gaussians, given that the Gaussians (1) are spherical and (2) all have mean zero.
4. Consider the following algorithm for ICA. For a vector $u$, let $M(u)$ be the covariance matrix of the sample with each point weighted by $e^{u^T x}$. Pick random Gaussian vectors $u, v$; compute $M(0)$, $M(u)$, and $M(v)$; and output $(M(u)-M(0))(M(v)-M(0))^{-1}$. Show that this algorithm works if $x$ is generated as $As+y$, where $A$ is full-rank and square, $s$ is a product distribution, and $y$ has an unknown Gaussian distribution.
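The procedure in problem 4 can be tried out numerically. Below is a minimal numpy sketch, assuming the sample is given as rows of a matrix `X`; centering the weighted covariance at the weighted mean is a choice the problem statement leaves open:

```python
import numpy as np

def weighted_cov(X, u):
    """Covariance of the rows of X, each point weighted by exp(u^T x)."""
    w = np.exp(X @ u)
    w = w / w.sum()                      # normalize the weights
    mu = w @ X                           # weighted mean (a modeling choice)
    Xc = X - mu
    return (Xc * w[:, None]).T @ Xc

def ica_candidate(X, seed=0):
    """Output (M(u)-M(0)) (M(v)-M(0))^{-1} for random Gaussian u, v."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # The problem says "random Gaussian vectors"; in practice the exp
    # weights can be large, so one might rescale u, v for stability.
    u = rng.standard_normal(d)
    v = rng.standard_normal(d)
    M0 = weighted_cov(X, np.zeros(d))
    Mu = weighted_cov(X, u)
    Mv = weighted_cov(X, v)
    return (Mu - M0) @ np.linalg.inv(Mv - M0)
```

For a quick experiment, one can generate `X = s @ A.T` with `s` drawn coordinate-wise from a product distribution and inspect the eigenvectors of the output.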
5. Suppose $a=(1,\dots,1,0,\dots,0)\in\mathbb{R}^{n}$ is a vector whose first $s$ coordinates are equal to 1 and the remaining coordinates are 0. Let $r\in\mathbb{R}^{n}$ be a vector whose entries are i.i.d. from $N(0,1)$. Show that when $s$ is sufficiently small compared to $n$, with probability at least $1-1/n$, the unique solution to the program below is $x_1=1, x_2=0$:
$$\begin{aligned}\min_{x_{1},x_{2}\in\mathbb{R}}\quad & \|x_{1}a+x_{2}r\|_{1}\\ \text{s.t.}\quad & x_{1}a_{1}+x_{2}r_{1}=1\end{aligned}$$

What is the highest $s$ (as a function of $n$) for which your proof works?
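Before proving the claim in problem 5, it can be sanity-checked numerically: the $\ell_1$ objective makes this a linear program after introducing slack variables $t_i \ge |x_1 a_i + x_2 r_i|$. A sketch using `scipy.optimize.linprog` (the choices of $n$, $s$, and the random seed below are arbitrary):

```python
import numpy as np
from scipy.optimize import linprog

def l1_solution(a, r):
    """Solve min ||x1*a + x2*r||_1  s.t.  x1*a[0] + x2*r[0] = 1 as an LP."""
    n = len(a)
    # Variables: [x1, x2, t_1, ..., t_n]; minimize sum(t).
    c = np.concatenate([[0.0, 0.0], np.ones(n)])
    M = np.column_stack([a, r])                       # n x 2
    # Encode |x1*a_i + x2*r_i| <= t_i as two one-sided inequalities.
    A_ub = np.block([[M, -np.eye(n)], [-M, -np.eye(n)]])
    b_ub = np.zeros(2 * n)
    A_eq = np.concatenate([[a[0], r[0]], np.zeros(n)]).reshape(1, -1)
    b_eq = [1.0]
    bounds = [(None, None), (None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[0], res.x[1]

rng = np.random.default_rng(0)
n, s = 1000, 5                                        # example values
a = np.concatenate([np.ones(s), np.zeros(n - s)])
r = rng.standard_normal(n)
x1, x2 = l1_solution(a, r)
```

For small $s$ the solver should return $(x_1, x_2) \approx (1, 0)$, matching the claim; pushing $s$ upward experimentally can suggest where the threshold lies.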

6. Consider the following k-means/EM algorithm for learning a mixture of k Gaussians from data:
    1. Start with a random set of k Gaussians, i.e., each with a random data point as its mean and the identity matrix as its covariance.
    2. For each data point, find the Gaussian among the current k that is most likely to have generated it (i.e., the one assigning it the highest of the k densities).
    3. This gives a partition of the data. For each part, estimate the mean and covariance of that subset of the data, yielding k new Gaussians and k new mixing weights.
    4. Repeat steps 2 and 3 until an equilibrium is reached.
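Steps 1–4 above can be sketched as follows. This is a minimal numpy implementation; the small ridge added to each estimated covariance (to keep it invertible) and the handling of empty or singleton clusters are practical details the problem statement omits:

```python
import numpy as np

def log_gaussian(X, mean, cov):
    """Log-density of N(mean, cov) at each row of X."""
    d = X.shape[1]
    diff = X - mean
    _, logdet = np.linalg.slogdet(cov)
    sol = np.linalg.solve(cov, diff.T).T
    return -0.5 * (np.sum(diff * sol, axis=1) + logdet + d * np.log(2 * np.pi))

def hard_em(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, size=k, replace=False)].copy()  # step 1: random data points
    covs = np.array([np.eye(d)] * k)                        # step 1: identity covariances
    labels = None
    for _ in range(iters):
        # Step 2: assign each point to its most likely Gaussian.
        ll = np.stack([log_gaussian(X, means[j], covs[j]) for j in range(k)], axis=1)
        new_labels = ll.argmax(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                           # step 4: equilibrium
        labels = new_labels
        # Step 3: re-estimate each Gaussian from its part of the partition.
        for j in range(k):
            pts = X[labels == j]
            if len(pts) >= 2:
                means[j] = pts.mean(axis=0)
                covs[j] = np.cov(pts.T) + 1e-6 * np.eye(d)  # ridge for invertibility
            elif len(pts) == 1:
                means[j] = pts[0]
                covs[j] = np.eye(d)
            # empty part: keep the old parameters (a choice)
    weights = np.bincount(labels, minlength=k) / n
    return means, covs, weights
```

Running this on samples from a known mixture and comparing the returned parameters to the true ones is a convenient way to hunt for the failure example the problem asks for.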

Give an example for which this algorithm fails to estimate the true mixture, reaching an equilibrium that is significantly different from it.