Homework #2

Due Nov 3rd in class.

1. Consider a classifier on \mathbb{R}^2 that assigns the value 1 to a point if and only if the point lies inside a certain axis-aligned rectangle. Such a classifier is defined precisely as
h_{(a_1,b_1,a_2,b_2)} (x_1,x_2) = \left\{ \begin{array}{ll} 1 & \text{if } a_1 \leq x_1 \leq b_1 \text{ and } a_2 \leq x_2\leq b_2\\ 0 & \text{ otherwise } \end{array}\right.
Consider the class of all axis-aligned rectangles in the plane
\mathcal{H}_{rec}^{2} = \{h_{(a_1, b_1, a_2, b_2)}\text{ : }a_1 \leq b_1 \text{ and } a_2 \leq b_2\}

  • Consider the algorithm that returns the smallest (in area) axis-aligned rectangle enclosing all positive examples in the training set (see the sketch after this list). Show that this algorithm (\epsilon, \delta)-PAC learns \mathcal{H}_{rec}^2.
  • Let \mathcal{H}_{rec}^d be the class of axis-aligned rectangles in \mathbb{R}^d. Prove that VCdim(\mathcal{H}_{rec}^d) = 2d.
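
For concreteness, here is a minimal sketch (in Python) of the tightest-fit learner described in the first bullet; the function and variable names are illustrative, and the sketch assumes the training set contains at least one positive example.

```python
import numpy as np

def fit_tightest_rectangle(X, y):
    """Smallest axis-aligned rectangle enclosing all positive examples.

    X: (m, 2) array of points, y: (m,) array of 0/1 labels.
    Assumes at least one positive example; returns (a1, b1, a2, b2).
    """
    pos = X[y == 1]
    a1, a2 = pos.min(axis=0)   # lower-left corner
    b1, b2 = pos.max(axis=0)   # upper-right corner
    return a1, b1, a2, b2

def h_rect(rect, x):
    """The classifier h_{(a1,b1,a2,b2)}: 1 inside the rectangle, 0 otherwise."""
    a1, b1, a2, b2 = rect
    return int(a1 <= x[0] <= b1 and a2 <= x[1] <= b2)

# Toy usage.
X = np.array([[0.1, 0.2], [0.4, 0.9], [0.8, 0.5], [2.0, 2.0]])
y = np.array([1, 1, 1, 0])
rect = fit_tightest_rectangle(X, y)
print(rect, h_rect(rect, np.array([0.5, 0.5])))   # rectangle [0.1,0.8]x[0.2,0.9]; prediction 1
```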

2. Let \mathcal{H} be a set of classifiers with VC-dimension d. Let \mathcal{F}_t be the set of classifiers obtained by taking a weighted majority vote of t classifiers from \mathcal{H}. Prove that VCdim(\mathcal{F}_t) = O(td\log(td)).
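
As a point of reference, a weighted majority vote of t base classifiers can be written as follows (a minimal Python sketch; the ±1 label convention, the tie-breaking rule, and all names are assumptions made here for illustration):

```python
def weighted_majority_vote(classifiers, weights, x):
    """Return the sign of the weighted sum of the t base predictions.

    Base classifiers output labels in {-1, +1}; ties are broken toward +1.
    """
    s = sum(w * h(x) for h, w in zip(classifiers, weights))
    return 1 if s >= 0 else -1

# Toy usage: three threshold classifiers on the real line.
hs = [lambda x, c=c: 1 if x > c else -1 for c in (0.0, 0.5, 1.0)]
ws = [0.5, 0.3, 0.2]
print(weighted_majority_vote(hs, ws, 0.7))   # 0.5 + 0.3 - 0.2 = 0.6 >= 0, so prints 1
```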

3. Let \mathcal{H}_1,\ldots, \mathcal{H}_r be hypothesis classes over domain set \mathcal{X}. Let d=\max_{i} VCdim(\mathcal{H}_i). Prove that

  • VCdim(\cup_{i=1}^{r}\mathcal{H}_i)\leq 4d\log(2d)+2\log(r)
  • For r=2, VCdim(\mathcal{H}_1 \cup \mathcal{H}_2) \leq 2d+1

For simplicity, you can assume that d\geq 3.

4. Show that any decision list on n Boolean variables is equivalent to a halfspace. For a decision list of length k, give a bound on the margin of the resulting halfspace, and thereby bound the worst-case number of mistakes made by Perceptron and by Winnow.
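
For reference, a decision list over Boolean variables can be evaluated as below; representing it as a sequence of (literal, output) pairs with a default output is just one convention, and every name in the sketch is illustrative:

```python
def eval_decision_list(rules, default, x):
    """Evaluate a decision list on a Boolean vector x (entries 0/1).

    rules: list of ((index, sign), output) pairs, read top to bottom;
    the literal (i, s) is satisfied when x[i] == s.  The first satisfied
    rule determines the output; if none fires, the default is returned.
    """
    for (i, s), out in rules:
        if x[i] == s:
            return out
    return default

# Toy usage on n = 3 variables: "if x1 then 1, else if not x3 then 0, else 1".
rules = [((0, 1), 1), ((2, 0), 0)]
print(eval_decision_list(rules, 1, [0, 1, 1]))   # no rule fires, so the default 1 is returned
```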

5. Give an algorithm to PAC-learn the parity of a set of d linear threshold functions in \mathbb{R}^n; the time and sample complexity of the algorithm should be polynomial for any fixed d.
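
To make the target class in this question concrete, a parity (XOR) of d linear threshold functions can be written as below; the specific weight vectors, thresholds, and names are illustrative only:

```python
import numpy as np

def ltf(w, theta, x):
    """A linear threshold function: 1 if <w, x> >= theta, else 0."""
    return int(np.dot(w, x) >= theta)

def parity_of_ltfs(ws, thetas, x):
    """The target concept: the XOR of d linear threshold functions."""
    return sum(ltf(w, t, x) for w, t in zip(ws, thetas)) % 2

# Toy usage with d = 2 halfspaces in R^2 (an XOR-like region of the plane).
ws = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
thetas = [0.0, 0.0]
print(parity_of_ltfs(ws, thetas, np.array([0.5, -0.5])))   # exactly one halfspace fires, so prints 1
```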

6. Let S = \{ (x_1,y_1),\ldots, (x_n,y_n)\} be a labeled sample of n points in \mathbb{R}^n with
x_i = \big(\underbrace{(-1)^i, \ldots, (-1)^i, (-1)^{i+1}}_{\text{first } i \text{ components}}, 0, \ldots, 0\big) \quad\text{and}\quad y_i = (-1)^{i+1}
Show that the Perceptron algorithm makes \Omega(2^n) updates before finding a separating hyperplane, regardless of the order in which it receives the points.
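
A small simulation of this setup may help fix intuition. The sketch below builds the sample from the formula above and runs a standard Perceptron with cyclic presentation order, zero initialization, and updates on ties; these conventions are chosen here for illustration and are not dictated by the problem.

```python
import numpy as np

def build_sample(n):
    """The sample from this question: the first i components of x_i are
    (-1)^i, ..., (-1)^i, (-1)^{i+1}, the remaining components are 0,
    and y_i = (-1)^{i+1}, for i = 1, ..., n."""
    X, y = [], []
    for i in range(1, n + 1):
        x = np.zeros(n)
        x[: i - 1] = (-1) ** i        # components 1, ..., i-1
        x[i - 1] = (-1) ** (i + 1)    # component i
        X.append(x)
        y.append((-1) ** (i + 1))
    return np.array(X), np.array(y)

def perceptron_update_count(X, y, max_passes=1_000_000):
    """Cycle through the sample with the Perceptron (w initialized to 0,
    update whenever y_i <w, x_i> <= 0) and count updates until a full
    pass is mistake-free."""
    w = np.zeros(X.shape[1])
    updates = 0
    for _ in range(max_passes):
        clean_pass = True
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) <= 0:
                w += yi * xi
                updates += 1
                clean_pass = False
        if clean_pass:
            break
    return updates

# Toy usage: if the Omega(2^n) claim holds, these counts should grow exponentially in n.
for n in (3, 4, 5, 6):
    X, y = build_sample(n)
    print(n, perceptron_update_count(X, y))
```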