The online optimization problem is defined as follows:

- For each time $t = 1, \ldots, T$:
- Learner makes a decision $x_t$
- Observes the cost function $c_t$ and incurs cost $c_t(x_t)$

- The goal is to minimize the regret: the difference between the algorithm's total cost and the cost of always using the single best decision in hindsight, $\text{regret} = \sum_{t=1}^T c_t(x_t) - \min_x \sum_{t=1}^T c_t(x)$

- We now consider online linear optimization
- The decision space is $\mathcal{K} \subseteq \mathbb{R}^n$
- The cost function is a linear function: $c_t(x) = c_t \cdot x$, where $c_t \in \mathbb{R}^n$
- Assume we have access to an oracle $M$, such that $M(c) = \arg\min_{x \in \mathcal{K}} c \cdot x$

- A naive approach would be “follow the leader” (FTL)
- $x_t = M(c_{1:t-1})$, where $c_{1:t-1} = c_1 + \cdots + c_{t-1}$
- The algorithm fails even for a simple 2-dimensional case: with two decisions and alternating cost vectors, FTL always chases the currently better decision and gets hit nearly every round, while the best fixed decision pays only about half the time
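To make the failure concrete, here is a small simulation of this 2-dimensional case (the specific cost sequence is my own instantiation of the standard counterexample, not taken from the notes):

```python
import numpy as np

def ftl_sim(T=100):
    """Follow the Leader on two decisions with alternating adversarial costs."""
    # Round 1 cost (0.5, 0) breaks the tie; afterwards the costs alternate
    # (0, 1), (1, 0), (0, 1), ..., so FTL always picks the decision about to be hit.
    costs = [np.array([0.5, 0.0])]
    for t in range(2, T + 1):
        costs.append(np.array([0.0, 1.0]) if t % 2 == 0 else np.array([1.0, 0.0]))
    cum = np.zeros(2)
    ftl_cost = 0.0
    for c in costs:
        leader = int(np.argmin(cum))   # x_t = argmin of past cumulative cost
        ftl_cost += c[leader]
        cum += c
    return ftl_cost, float(cum.min())
```

For $T = 100$ this returns total cost 99.5 for FTL versus 49.5 for the best fixed decision, i.e. regret linear in $T$.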

- Follow the Perturbed Leader (FPL):
- Pick $p_t$ uniformly from $[0, 1/\epsilon]^n$
- Choose $x_t = M(c_{1:t-1} + p_t)$
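A minimal sketch of FPL in the experts setting (the one-hot oracle and all names are my own illustration; the notes only assume an abstract oracle $M$):

```python
import numpy as np

rng = np.random.default_rng(0)

def one_hot_oracle(c):
    # M(c): vertex of the simplex minimizing the linear cost c . x
    x = np.zeros(len(c))
    x[np.argmin(c)] = 1.0
    return x

def fpl_total_cost(costs, eps):
    """Follow the Perturbed Leader with a fresh uniform perturbation each round."""
    T, n = costs.shape
    cum = np.zeros(n)                         # c_{1:t-1}
    total = 0.0
    for c in costs:
        p = rng.uniform(0, 1 / eps, size=n)   # p_t ~ Uniform([0, 1/eps]^n)
        x = one_hot_oracle(cum + p)           # x_t = M(c_{1:t-1} + p_t)
        total += float(c @ x)
        cum += c
    return total
```

On the alternating sequence that defeats FTL, the perturbation keeps FPL's expected cost within roughly $2\sqrt{DRAT}$ of the best fixed decision.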

- Another variant (FPL*):
- Pick $p_t$ with density proportional to $e^{-\epsilon \|p\|_1}$
- Choose $x_t = M(c_{1:t-1} + p_t)$

- Assume
- $\|x - x'\|_1 \le D$, $\forall x, x' \in \mathcal{K}$.
- $|c \cdot x| \le R$, $\forall x \in \mathcal{K}$ and all cost vectors $c$.
- $\|c\|_1 \le A$, $\forall$ cost vectors $c$

- Theorem 1:
- Set $\epsilon = \sqrt{D/(RAT)}$, then FPL has expected cost $\le \text{OPT} + 2\sqrt{DRAT}$
- Set $\epsilon$ suitably small, then FPL* has expected cost $\le (1+\epsilon)\,\text{OPT} + O\!\big(D(1+\ln n)/\epsilon\big)$

- Lemma 1:
- Consider the “Be the leader” algorithm (impossible in practice, only for analysis): use $x_{t+1} = M(c_{1:t})$ at time $t$. Then $\sum_{t=1}^T c_t \cdot M(c_{1:t}) \le \min_{x \in \mathcal{K}} \sum_{t=1}^T c_t \cdot x = \text{OPT}$
- We prove this by induction on $T$: for $T = 1$, $c_1 \cdot M(c_1) = \min_x c_1 \cdot x$ by definition of the oracle. For $T > 1$, the inductive hypothesis gives $\sum_{t=1}^{T-1} c_t \cdot M(c_{1:t}) \le \min_x \sum_{t=1}^{T-1} c_t \cdot x \le \sum_{t=1}^{T-1} c_t \cdot M(c_{1:T})$; adding $c_T \cdot M(c_{1:T})$ to both sides yields $\sum_{t=1}^{T} c_t \cdot M(c_{1:t}) \le c_{1:T} \cdot M(c_{1:T}) = \min_x c_{1:T} \cdot x$

- Lemma 2: “Be the perturbed leader” has a small regret
- $\sum_{t=1}^T c_t \cdot M(c_{1:t} + p) \le \text{OPT} + D/\epsilon$, where $p$ is drawn from $[0, 1/\epsilon]^n$.
- pf: Suppose the cost at a fictitious time $0$ is $p$. Applying Lemma 1 to the sequence $p, c_1, \ldots, c_T$ gives $p \cdot M(p) + \sum_{t=1}^T c_t \cdot M(p + c_{1:t}) \le \min_x (p + c_{1:T}) \cdot x \le p \cdot x^* + \text{OPT}$, where $x^* = \arg\min_x c_{1:T} \cdot x$. Hence $\sum_t c_t \cdot M(p + c_{1:t}) \le \text{OPT} + p \cdot (x^* - M(p)) \le \text{OPT} + D/\epsilon$, since every coordinate of $p$ is at most $1/\epsilon$ and $\|x^* - M(p)\|_1 \le D$

- Lemma 3:
- The distributions of $c_{1:t-1} + p$ and $c_{1:t} + p$ are similar: they are both uniform distributions over cubes of side $1/\epsilon$. If the two cubes overlap on a $1-\delta$ fraction of their volume, then $E[c_t \cdot M(c_{1:t-1} + p)] \le E[c_t \cdot M(c_{1:t} + p)] + \delta R$, since the two can be coupled to coincide except with probability $\delta$, and each period's cost is bounded by $R$

- Lemma 4:
- For any $c_t$, the cubes $c_{1:t-1} + [0, 1/\epsilon]^n$ and $c_{1:t} + [0, 1/\epsilon]^n$ overlap in at least a $1 - \epsilon \|c_t\|_1 \ge 1 - \epsilon A$ fraction.
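A quick Monte Carlo sanity check of Lemma 4 (my own illustration; the exact overlap of the two cubes is $\prod_i (1 - \epsilon |c_i|)$, which is at least $1 - \epsilon \|c_t\|_1$ by a union bound):

```python
import numpy as np

rng = np.random.default_rng(1)

def overlap_fraction(c, eps, samples=200_000):
    """Estimate the fraction of [0, 1/eps]^n that also lies in c + [0, 1/eps]^n."""
    n = len(c)
    pts = rng.uniform(0, 1 / eps, size=(samples, n))
    inside = np.all((pts >= c) & (pts <= c + 1 / eps), axis=1)
    return float(inside.mean())

c = np.array([0.3, -0.2, 0.1])      # a single cost vector c_t
eps = 0.5
frac = overlap_fraction(c, eps)
lower = 1 - eps * np.abs(c).sum()   # Lemma 4's bound: 1 - eps * ||c_t||_1
```

Here the exact overlap is $0.85 \cdot 0.90 \cdot 0.95 \approx 0.727$, above the bound $1 - 0.5 \cdot 0.6 = 0.7$.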

- Proof of Theorem 1:
- $E[\text{FPL cost}] \le E[\text{BPL cost}] + \epsilon RAT$ (using Lemma 3 with the overlap bound of Lemma 4, summed over $T$ rounds)
- $\le \text{OPT} + D/\epsilon + \epsilon RAT$ (using Lemma 2); setting $\epsilon = \sqrt{D/(RAT)}$ gives $\le \text{OPT} + 2\sqrt{DRAT}$

- Online Convex Optimization
- We now consider the case where the cost functions $c_t$ are convex
- At each time $t$, choose the decision $x_t \in K$, where $K$ is a convex body of diameter $D$.
- Let $\nabla_t = \nabla c_t(x_t)$ be the gradient of the cost function at the chosen point.

- Online Gradient Descent: $x_{t+1} = \Pi_K(x_t - \eta_t \nabla_t)$, where $\Pi_K$ is the Euclidean projection onto $K$
- Facts:
- For a convex function $f$, $f(y) \ge f(x) + \nabla f(x) \cdot (y - x)$
- $\|\Pi_K(y) - x\| \le \|y - x\|$, $\forall y \in \mathbb{R}^n,\ x \in K$

- Theorem:
- Let $x^* = \arg\min_{x \in K} \sum_{t=1}^T c_t(x)$. Assume $\|\nabla c_t(x)\| \le G$ for all $x \in K$.
- If set $\eta = \frac{D}{G\sqrt{T}}$, then $\text{regret} \le GD\sqrt{T}$.
- If set $\eta_t = \frac{D}{G\sqrt{t}}$, then $\text{regret} \le \frac{3}{2} GD\sqrt{T}$.

- Proof:
- Let $\nabla_t = \nabla c_t(x_t)$ and consider the potential $\|x_t - x^*\|^2$.
- By Fact 1, we have $c_t(x_t) - c_t(x^*) \le \nabla_t \cdot (x_t - x^*)$. By Fact 2, $\|x_{t+1} - x^*\|^2 \le \|x_t - \eta_t \nabla_t - x^*\|^2 = \|x_t - x^*\|^2 - 2\eta_t\,\nabla_t \cdot (x_t - x^*) + \eta_t^2 \|\nabla_t\|^2$, so $\nabla_t \cdot (x_t - x^*) \le \frac{\|x_t - x^*\|^2 - \|x_{t+1} - x^*\|^2}{2\eta_t} + \frac{\eta_t G^2}{2}$
- Set $\eta_t = \eta = \frac{D}{G\sqrt{T}}$, then summing and telescoping gives $\text{regret} \le \frac{D^2}{2\eta} + \frac{\eta G^2 T}{2} = GD\sqrt{T}$
- Set $\eta_t = \frac{D}{G\sqrt{t}}$, then bounding the telescoping sum with $\|x_t - x^*\| \le D$ and $\sum_{t=1}^T \frac{1}{\sqrt{t}} \le 2\sqrt{T}$ gives $\text{regret} \le \frac{3}{2} GD\sqrt{T}$
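A minimal sketch of OGD with the fixed step size from the theorem, on a toy sequence where every round's loss is the same quadratic over the unit ball (the setup and all names are my own illustration, not from the notes):

```python
import numpy as np

def project_ball(y, radius=1.0):
    # Euclidean projection onto the ball {x : ||x|| <= radius}
    norm = np.linalg.norm(y)
    return y if norm <= radius else y * (radius / norm)

def ogd(grad, T, n, eta):
    """Online gradient descent: x_{t+1} = Proj_K(x_t - eta(t) * grad(t, x_t))."""
    x = np.zeros(n)
    for t in range(1, T + 1):
        x = project_ball(x - eta(t) * grad(t, x))
    return x

# Toy instance: every c_t(x) = ||x - z||^2 with z inside the unit ball,
# so grad(t, x) = 2(x - z) and the gradient norm is bounded by G.
z = np.array([0.6, 0.0])
G = 2 * (1 + np.linalg.norm(z))   # = 3.2
D = 2.0                           # diameter of the unit ball
x_final = ogd(lambda t, x: 2 * (x - z), T=50, n=2,
              eta=lambda t: D / (G * np.sqrt(T := 50) ** 0 * np.sqrt(t)))
```

Since the minimizer $z$ is the same every round, the iterate converges to it and the per-round regret vanishes.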

- In HW4, we prove that FPL also works for online convex optimization, by using the gradients as the linear cost functions.
