(based on Avrim Blum’s notes)

**Predicting from expert advice**

- In each round, each of the $n$ experts makes a prediction in $\{0, 1\}$
- The learner makes a prediction based on the experts’ advice
- The learner observes the actual outcome

- The goal is to predict nearly as well as the best expert in hindsight
- $M$ = number of mistakes by the learner
- $m$ = number of mistakes by the best expert in hindsight

- If there is a perfect expert, then by predicting the majority vote and removing the erring experts in each round, $M \le \log_2 n$ (each mistake removes at least half of the surviving experts).
- If there is no perfect expert, we can restart the above process whenever all the experts are removed. Every expert makes at least one mistake by the time all of them are removed, so the best expert is charged at least one mistake per completed restart, thus $M \le (m+1)(\log_2 n + 1) = O(m \log n)$.
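The halving-with-restarts scheme above can be sketched in code. This is a minimal illustration; the function name and input format are my own, not from the notes:

```python
def halving_with_restarts(expert_predictions, outcomes):
    """expert_predictions: one list per round, each a list of n 0/1 predictions.
    outcomes: list of 0/1 actual outcomes. Returns the learner's mistake count."""
    n = len(expert_predictions[0])
    alive = set(range(n))
    mistakes = 0
    for preds, y in zip(expert_predictions, outcomes):
        # Predict the majority vote of the surviving experts.
        votes = sum(preds[i] for i in alive)
        guess = 1 if 2 * votes >= len(alive) else 0
        if guess != y:
            mistakes += 1
        # Remove every surviving expert that erred this round.
        alive = {i for i in alive if preds[i] == y}
        if not alive:              # all experts eliminated:
            alive = set(range(n))  # restart with the full pool
    return mistakes
```

With a perfect expert the mistake count stays at most $\log_2 n$; without one, each restart charges at least one mistake to the best expert.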

**Weighted Majority**

- Set $w_i = 1$ for all $i$, let $W = \sum_i w_i$, and predict the weighted majority of the experts' advice
- On a mistake, for every expert who predicted incorrectly, set $w_i \leftarrow w_i / 2$
- On a mistake, at least half of the total weight is reduced by half, so $W$ goes down by a factor of $3/4$.
- After $M$ mistakes, $W \le n\,(3/4)^M$
- On the other hand, the best expert has weight $(1/2)^m$.
- Since the total weight is larger than any single weight, by combining the above two inequalities, we get $(1/2)^m \le n\,(3/4)^M$, or $M \le \frac{m + \log_2 n}{\log_2(4/3)} \approx 2.41\,(m + \log_2 n)$.
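A short sketch of Weighted Majority, halving the weights of erring experts only on rounds where the learner itself errs, as in the notes (names and input format are illustrative):

```python
def weighted_majority(expert_predictions, outcomes):
    """expert_predictions: one list per round of n 0/1 expert predictions.
    outcomes: list of 0/1 actual outcomes. Returns the learner's mistake count."""
    n = len(expert_predictions[0])
    w = [1.0] * n
    mistakes = 0
    for preds, y in zip(expert_predictions, outcomes):
        # Predict 1 iff the weight voting for 1 is at least half the total.
        vote_1 = sum(w[i] for i in range(n) if preds[i] == 1)
        guess = 1 if vote_1 >= sum(w) / 2 else 0
        if guess != y:
            mistakes += 1
            # Halve the weight of every expert that predicted incorrectly.
            for i in range(n):
                if preds[i] != y:
                    w[i] /= 2
    return mistakes
```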

**Randomized Weighted Majority**

- Instead of predicting the majority, pick expert $i$ randomly with probability proportional to $w_i$ and follow its advice
- Fix $\varepsilon \le 1/2$. In each time step, for each erring expert, set $w_i \leftarrow (1-\varepsilon)\, w_i$
- At time $t$, let $F_t$ be the fraction of the total weight on experts who make a mistake. Note $F_t$ is exactly the probability that the learner errs at time $t$, so the expected number of mistakes is $M = \sum_t F_t$.
- The total weight at time $t+1$ is $W_{t+1} = W_t (1 - \varepsilon F_t)$, so $W_T = n \prod_t (1 - \varepsilon F_t) \le n\, e^{-\varepsilon \sum_t F_t} = n\, e^{-\varepsilon M}$, where we use the fact that $1 - x \le e^{-x}$ and $W_1 = n$
- On the other hand, the weight of the best expert is $(1-\varepsilon)^m$. Therefore, $(1-\varepsilon)^m \le n\, e^{-\varepsilon M}$. Rearranging gives $M \le \frac{m \ln\frac{1}{1-\varepsilon} + \ln n}{\varepsilon} \le (1+\varepsilon)\, m + \frac{\ln n}{\varepsilon}$
- Choose $\varepsilon = \sqrt{\ln n / m}$, then $M \le m + 2\sqrt{m \ln n}$
- Since $m \le T$, where $T$ is the total number of rounds, we have $M/T \le m/T + 2\sqrt{\ln n / T}$. This means the expected number of mistakes per step ("regret") approaches that of the best expert as $T$ increases.
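One round of Randomized Weighted Majority can be sketched as follows (a minimal illustration; the function name, the explicit RNG argument, and the return format are assumptions of mine):

```python
import random

def rwm_step(w, preds, y, eps, rng):
    """One round: sample an expert with probability proportional to its weight,
    follow its advice, then shrink the weight of every expert that erred.
    Returns (loss, updated_weights)."""
    total = sum(w)
    r = rng.random() * total
    acc, choice = 0.0, 0
    for i, wi in enumerate(w):          # sample proportionally to w
        acc += wi
        if r <= acc:
            choice = i
            break
    loss = 1 if preds[choice] != y else 0
    # Multiply every erring expert's weight by (1 - eps).
    w = [wi * (1 - eps) if preds[i] != y else wi for i, wi in enumerate(w)]
    return loss, w
```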

**Winnow**

Learn an OR of $r$ out of $n$ variables

- Initialize $w_i = 1$ for all $i$. Predict $+$ if $\sum_i w_i x_i \ge n$, otherwise $-$.
- Mistake on a positive example: for each $x_i = 1$, set $w_i \leftarrow 2 w_i$
- Mistake on a negative example: for each $x_i = 1$, set $w_i \leftarrow w_i / 2$ (could set $w_i = 0$).
- The weights of relevant variables never decrease (they are $0$ in every negative example), and each of them can only be doubled $\log_2 n + 1$ times before it alone crosses the threshold $n$, so #mistakes on positive examples is at most $r(\log_2 n + 1)$
- For each mistake on a positive example, the total weight $W$ increases by at most $n$ (since $\sum_i w_i x_i < n$ before the update), and for each mistake on a negative example, $W$ decreases by at least $n/2$ (since $\sum_i w_i x_i \ge n$). Since $W$ starts at $n$ and never drops below zero, we have (#mistakes on negative examples) $\le 2 + 2\,\cdot$ (#mistakes on positive examples).
- Therefore, total #mistakes $\le 3r(\log_2 n + 1) + 2 = O(r \log n)$.
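The update rule above can be sketched in a few lines (a single-pass illustration; the function name, the $\pm 1$ label encoding, and the input format are my own):

```python
def winnow_fit(examples, n):
    """examples: list of (x, y) with x a 0/1 list of length n and y in {+1, -1}.
    Runs one pass of Winnow; returns (weights, mistake_count)."""
    w = [1.0] * n
    mistakes = 0
    for x, y in examples:
        score = sum(wi * xi for wi, xi in zip(w, x))
        guess = 1 if score >= n else -1     # threshold at n
        if guess != y:
            mistakes += 1
            for i in range(n):
                if x[i] == 1:
                    # Double on a false negative, halve on a false positive.
                    w[i] = w[i] * 2 if y == 1 else w[i] / 2
    return w, mistakes
```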

k-out-of-r majority-vote function

- Initialize $w_i = 1$ for all $i$. Predict $+$ if $\sum_i w_i x_i \ge n$, otherwise $-$.
- Mistake on a positive example: for each $x_i = 1$, set $w_i \leftarrow (1+\varepsilon)\, w_i$
- Mistake on a negative example: for each $x_i = 1$, set $w_i \leftarrow w_i / (1+\varepsilon)$
- Let $M_p$ be the number of mistakes on positives and $M_n$ be the number of mistakes on negatives. Then,
- $k M_p - (k-1) M_n \le r \log_{1+\varepsilon} n$, since each mistake on a positive puts at least $k$ factors of $(1+\varepsilon)$ on the relevant variables, each mistake on a negative removes at most $k-1$ of them, and the total number of factors that can go on the relevant variables is at most $r \log_{1+\varepsilon} n$. Also,
- $n + \varepsilon n M_p - \frac{\varepsilon}{1+\varepsilon} n M_n \ge 0$ since the total weight remains positive. Thus, $M_n \le (1+\varepsilon)(M_p + 1/\varepsilon)$. Using this in the first inequality, $\bigl(k - (k-1)(1+\varepsilon)\bigr) M_p \le (k-1)\frac{1+\varepsilon}{\varepsilon} + r \log_{1+\varepsilon} n$
- Setting $\varepsilon = \frac{1}{2k}$, the coefficient $k - (k-1)(1+\varepsilon) \ge \frac{1}{2}$, so $M_p = O(k^2 + rk \log n) = O(rk \log n)$ (using $k \le r$), and $M_n$ has the same bound.
- Total #mistakes $= O(rk \log n)$.
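A sketch of the $(1+\varepsilon)$ update used in this analysis (one mistake-driven step; the function name and return convention are illustrative):

```python
def winnow_eps_update(w, x, y, n, eps):
    """One step of the (1+eps)-Winnow variant; y in {+1, -1}.
    Returns (updated_weights, made_mistake)."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    guess = 1 if score >= n else -1
    if guess == y:
        return w, False
    # Scale up on a false negative, scale down on a false positive.
    factor = (1 + eps) if y == 1 else 1 / (1 + eps)
    return [wi * factor if xi == 1 else wi for wi, xi in zip(w, x)], True
```

The analysis picks $\varepsilon = 1/(2k)$; smaller $\varepsilon$ makes each negative mistake cancel less of the progress made on positives.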

Halfspaces

- OR function is a halfspace: $x_1 + x_2 + \cdots + x_r \ge 1$
- k-out-of-r majority-vote function is a halfspace: $x_1 + x_2 + \cdots + x_r \ge k$
- Consider a general halfspace $w_1 x_1 + w_2 x_2 + \cdots + w_n x_n \ge w_0$
- We can scale so that all weights are integers, and assume all are non-negative (by introducing new variables $x_i' = 1 - x_i$).
- Duplicate each variable $x_i$, $w_i$ times; then it becomes the $w_0$-out-of-$W$ problem with $W = \sum_i w_i$ total number of variables. Therefore, the mistake bound is $O(w_0 W \log W) = O(W^2 \log W)$.
- Can be generalized to any $w \in \mathbb{R}^n$, $x \in [0,1]^n$. The general mistake bound is $O\left(\frac{W^2}{\delta^2} \log n\right)$, where $W = \sum_i |w_i|$ and $\delta = \min_x |w \cdot x - w_0|$ is the margin.
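The duplication step can be sketched directly (a tiny illustration; the function name is mine, and it assumes the weights have already been made non-negative integers):

```python
def duplicate_features(x, weights):
    """Expand x so that coordinate i appears weights[i] times. The halfspace
    w . x >= w0 then becomes the w0-out-of-W majority vote on the expanded
    input, where W = sum(weights)."""
    expanded = []
    for xi, wi in zip(x, weights):
        expanded.extend([xi] * wi)
    return expanded
```

The plain sum over the expanded input equals the weighted sum over the original one, which is exactly why the reduction works.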

Winnow vs. Perceptron

- E.g., $x \in \{0,1\}^n$ and $w = (1, \ldots, 1, 0, \ldots, 0)$ ($k$ ones), then $W = k$ and $\delta = 1/2$ (with threshold $w_0 = 1/2$).
- Winnow: $O(k^2 \log n)$ mistakes
- Perceptron: the mistake bound $\|w\|_2^2 \|x\|_2^2 / \delta^2$ gives $O(kn)$ mistakes

- E.g., $w = (1, 1, \ldots, 1)$ (all $n$ ones), then $W = n$ and $\|w\|_2 = \sqrt{n}$. Assume each example has at most $k$ ones and the margin is $\delta$.
- Winnow: $O\left(\frac{n^2}{\delta^2} \log n\right)$ mistakes
- Perceptron: $O\left(\frac{nk}{\delta^2}\right)$ mistakes, which is much smaller when $k \ll n \log n$
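The contrast can be made concrete with a tiny numeric illustration of the two examples above (constants and the margin are dropped, so these are orders of growth, not exact mistake counts; the function names are mine):

```python
import math

def sparse_target(k, n):
    """Target is an OR of k out of n variables: Winnow's bound wins."""
    return {"winnow": k**2 * math.log2(n), "perceptron": k * n}

def dense_target(k, n):
    """Target is the all-ones vector, examples have <= k ones: Perceptron wins."""
    return {"winnow": n**2 * math.log2(n), "perceptron": n * k}

sparse_target(5, 1024)   # winnow grows like k^2 log n, perceptron like k n
```

Roughly: Winnow's bound scales with $L_1(w)$ and $L_\infty(x)$, Perceptron's with $L_2(w)$ and $L_2(x)$, so each wins when the target (resp. the examples) are sparse in the matching norm.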

Advertisements