Homework #3

Due Nov 29th in class. You can work in teams of two.
For this assignment, you will be experimenting with the agnostic mean estimation algorithm from class. The algorithm receives a noisy sample (with 1-\eta fraction coming from some true distribution D and with an \eta fraction being adversarial noise) and outputs an estimate \hat{\mu} for the true mean \mu . Let the error of the algorithm be ||\mu - \hat{\mu}||_2. Your task will be to choose the true distribution and the locations of the adversarial noise to cause the algorithm’s error to be as large as possible.
For the true distribution, you should try 1) a single Gaussian 2) a mixture of two Gaussians 3) the uniform distribution in the simplex 4) another distribution of your choice. Try these distributions in 100 dimensions, 200 dimensions, and 500 dimensions. You will decide where to place the noise (you can also try picking the noise from some other distribution). For your experiments, use \eta = 0.1, and let the number of total samples (noise and true samples) be 10 times the dimension.
At the end of your experiments, give the average error of the mean estimation algorithm over 10 runs (i.e. 10 independent sets of true samples) for each distribution for dimensions 100, 200, and 500. Describe how you chose your noise distributions and explain why the algorithm does better or worse for the different true distributions and noise placements that you chose. Justify your choice of noise distribution and also why you chose your fourth distribution. For each true distribution, you will want to compare different noise distributions to explain why your final noise distribution makes the algorithm perform the worst.
You can access the code of the algorithm at