Homework #2

Due Nov 3rd in class.

1. Consider a classifier in $\mathbb{R}^2$ that assigns the value $1$ to a point if and only if the point lies inside a certain axis-aligned rectangle. Such a classifier is defined precisely as
$h_{(a_1,b_1,a_2,b_2)} (x_1,x_2) = \left\{ \begin{array}{ll} 1 & \text{if } a_1 \leq x_1 \leq b_1 \text{ and } a_2 \leq x_2\leq b_2\\ 0 & \text{ otherwise } \end{array}\right.$
Consider the class of all axis-aligned rectangles in the plane
$\mathcal{H}_{rec}^{2} = \{h_{(a_1, b_1, a_2, b_2)}\text{ : }a_1 \leq b_1 \text{ and } a_2 \leq b_2\}$

• Consider the algorithm that returns the smallest (in area) axis-aligned rectangle enclosing all positive examples in the training set (a code sketch of this learner appears after this list). Show that this algorithm $(\epsilon, \delta)$-PAC learns $\mathcal{H}_{rec}^2$.
• Let $\mathcal{H}_{rec}^d$ be the class of axis-aligned rectangles in $\mathbb{R}^d$. Prove that VCdim$(\mathcal{H}_{rec}^d) = 2d$.
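For concreteness, here is a minimal sketch of the tightest-fit learner from the first part, assuming the training set is given as a NumPy array `X` of points and a label vector `y` with entries in $\{0,1\}$ (the names and interface are illustrative, not prescribed):

```python
import numpy as np

def fit_tightest_rectangle(X, y):
    """Smallest axis-aligned rectangle enclosing all positive examples."""
    pos = X[y == 1]
    if len(pos) == 0:
        return None                      # no positives: predict 0 everywhere
    (a1, a2), (b1, b2) = pos.min(axis=0), pos.max(axis=0)
    return a1, b1, a2, b2

def predict(rect, x):
    """Evaluate h_{(a1,b1,a2,b2)} at a point x = (x1, x2)."""
    if rect is None:
        return 0
    a1, b1, a2, b2 = rect
    return int(a1 <= x[0] <= b1 and a2 <= x[1] <= b2)
```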

2. Let $\mathcal{H}$ be a set of classifiers with VC-dimension $d$. Let $\mathcal{F}_t$ be the set of classifiers obtained by taking a weighted majority vote of $t$ classifiers from $\mathcal{H}$. Prove that the VC-dimension of $\mathcal{F}_t$ is at most $O(td \log td)$.
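To fix notation, a weighted majority vote of base classifiers $h_1,\ldots,h_t \in \mathcal{H}$ with weights $w_1,\ldots,w_t$ predicts $\mathrm{sign}\left(\sum_{i=1}^{t} w_i h_i(x)\right)$. A minimal sketch, assuming each base classifier is a callable returning $\pm 1$ (this interface is an assumption for illustration):

```python
def weighted_majority_vote(classifiers, weights, x):
    """Predict the sign of the weighted vote of t base classifiers,
    each a callable returning +1 or -1."""
    vote = sum(w * h(x) for h, w in zip(classifiers, weights))
    return 1 if vote >= 0 else -1
```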

3. Let $\mathcal{H}_1,\ldots, \mathcal{H}_r$ be hypothesis classes over domain set $\mathcal{X}$. Let $d=\max_{i} \mathrm{VCdim}(\mathcal{H}_i)$. Prove that

• VCdim$(\cup_{i=1}^{r}\mathcal{H}_i)\leq 4d\log(2d)+2\log(r)$
• For $r=2$, VCdim$(\mathcal{H}_1 \cup \mathcal{H}_2) \leq 2d+1$

For simplicity, you can assume that $d\geq 3$.
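A possible first step (a hint, not the required write-up; the last inequality holds for $m \geq d$): if the union shatters a set $C$ of size $m$, then Sauer's lemma applied to each class gives

$2^m = \left|\{h_{|C} : h\in \cup_{i=1}^{r}\mathcal{H}_i\}\right| \leq \sum_{i=1}^{r}\left|\{h_{|C} : h\in \mathcal{H}_i\}\right| \leq r\left(\frac{em}{d}\right)^{d},$

and it remains to determine the largest $m$ consistent with this inequality.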

4. Show that any decision list on $n$ Boolean variables is equivalent to a halfspace. For a decision list of length $k$, bound the margin of the corresponding halfspace, and use this to bound the worst-case number of mistakes made by Perceptron and by Winnow.
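For concreteness, here is one standard encoding of a decision list as a halfspace (a sketch, not an answer to the margin question); the weights shrink geometrically so that the first rule that fires determines the sign, and the triple format for rules is an assumed representation:

```python
def decision_list_to_halfspace(rules, default, n):
    """Encode a decision list over {0,1}^n as (w, bias) with
    sign(w . x + bias) reproducing the list's output.

    rules:   ordered list of (var_index, negated, output), output in {-1,+1}
             (this triple format is an assumed representation);
    default: output in {-1,+1} when no rule fires.
    """
    k = len(rules)
    w = [0.0] * n
    bias = float(default)          # default rule gets the smallest weight, 1
    for i, (j, negated, b) in enumerate(rules):
        mag = 2.0 ** (k - i)       # strictly exceeds the sum of all later weights
        if negated:                # literal value is (1 - x_j)
            w[j] -= b * mag
            bias += b * mag
        else:                      # literal value is x_j
            w[j] += b * mag
    return w, bias
```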

5. Give an algorithm to PAC-learn the parity of a set of $d$ linear threshold functions in ${\mathbb{R}}^n$; the time and sample complexity of the algorithm should be polynomial in $n$, $1/\epsilon$, and $1/\delta$ for any fixed $d$.
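A possible starting point (a hint, not the full algorithm): with $\pm 1$ outputs, the parity of $d$ threshold functions $\mathrm{sign}(w_i \cdot x - \theta_i)$ equals $\mathrm{sign}\left(\prod_{i=1}^{d}(w_i \cdot x - \theta_i)\right)$, the sign of a degree-$d$ polynomial, so a degree-$d$ monomial lift turns the target into a halfspace over $O(n^d)$ features. A sketch of the lift (the helper name is illustrative):

```python
from itertools import combinations_with_replacement
import numpy as np

def monomial_lift(x, d):
    """Map x in R^n to the vector of all monomials of degree <= d."""
    feats = [1.0]                                   # degree-0 term
    for deg in range(1, d + 1):
        for idx in combinations_with_replacement(range(len(x)), deg):
            feats.append(np.prod([x[j] for j in idx]))
    return np.array(feats)
```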

6. Let $S = \{ (x_1,y_1),\ldots, (x_n,y_n)\}$ be a labeled sample of $n$ points in $\mathbb{R}^n$ with
$x_i = (\underbrace{(-1)^i,\ldots, (-1)^i, (-1)^{i+1}}_{\text{first } i \text{ components}}, 0, \ldots, 0) \text{ and } y_i = (-1)^{i+1}$
Show that the Perceptron algorithm makes $\Omega (2^n)$ updates before finding a separating hyperplane, regardless of the order in which it receives the points.
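The claim can be checked empirically for small $n$. A minimal simulation of Perceptron on this sample, with a zero initial vector and a fixed cyclic order (one possible order among many):

```python
import numpy as np

def perceptron_updates(n):
    """Count Perceptron updates on the sample from Problem 6
    until a separating hyperplane is found (cyclic presentation order)."""
    X = np.zeros((n, n))
    y = np.empty(n)
    for i in range(1, n + 1):           # build x_i and y_i as defined above
        X[i - 1, :i] = (-1) ** i
        X[i - 1, i - 1] = (-1) ** (i + 1)
        y[i - 1] = (-1) ** (i + 1)
    w = np.zeros(n)
    updates = 0
    while True:
        mistakes = 0
        for i in range(n):
            if y[i] * (w @ X[i]) <= 0:  # mistake: update
                w += y[i] * X[i]
                updates += 1
                mistakes += 1
        if mistakes == 0:               # a full pass with no mistakes: done
            return updates
```

For small $n$, the count returned should grow roughly like $2^n$, consistent with the bound to be proved.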