A halfspace is defined by a vector and a threshold as follows:

if then , else

In order to use the sample complexity bound we derived last time, which depends on , we need to show that the number of effective halfspaces is finite.

Two halfspaces are effectively the same if their predictions on all points are the same.

A halfspace can be moved till it touches points without changing any labels. There are different points in , so at most different halfspaces.

This is a loose bound. We will show a better bound next time.

disjunction (and conjunction) is a special case of halfspace

E.g. <=>

decision list is a special case of halfspace

E.g. “if then 1, else if then 0, else 1” <=>

A non-linear threshold function can be written as a halfspace by introducing new variables , ie.

We can write a halfspace as , by extending and . We can also assume and .

Perceptron

On each mistake, update

Theorem: Let . The Perceptron algorithm makes at most mistakes

Proof:

on a mistake, .

So after mistakes,

On the other hand,

So after mistakes,

Combining the two inequalities

Therefore,

A modified Perceptron

random unit vector

On each mistake and if , update

Theorem: The above algorithm updates at most times (w.h.p.)

Proof:

w.l.o.g, assume all labels are positive (because we can negate the feature part)

(w.p. )

After steps, , thus is the maximum possible number of mistakes.

2 Comments

Hi,
I have a quick question about the Perceptron notes. In developing the bound for ||w’||^2, we have:
||w’||^2 = ||w||^2 + ||x||^2 + 2*l(x)*(w \cdot x)
= ||w||^2 + 1 – 2|w \cdot x|

In moving from the first equality to the second, we seem to assume that x is a unit vector. Could you mention how we justify this?

We assume here that each example is from the unit L2 ball. In general, without ths assumption, the bound would go up by the maximum length squared of an example, as scaling to fit within the unit ball would reduce the margin proportionately; see next set of notes.

Hi,

I have a quick question about the Perceptron notes. In developing the bound for ||w’||^2, we have:

||w’||^2 = ||w||^2 + ||x||^2 + 2*l(x)*(w \cdot x)

= ||w||^2 + 1 – 2|w \cdot x|

In moving from the first equality to the second, we seem to assume that x is a unit vector. Could you mention how we justify this?

We assume here that each example is from the unit L2 ball. In general, without ths assumption, the bound would go up by the maximum length squared of an example, as scaling to fit within the unit ball would reduce the margin proportionately; see next set of notes.