PAC learning under arbitrary distributions is hard,
e.g., learning DNF formulas, learning Decision Trees.
We don't have efficient algorithms for arbitrary distributions.
We can make the problem easier by only considering special distributions:
uniform distribution
product distribution: each coordinate $x_i$ is drawn independently, with its own bias $\Pr[x_i = 1] = p_i$
log-concave distribution (e.g., Gaussian)
 We can also allow “membership queries”
Instead of drawing a labeled example from the distribution, we can ask for the label of any point in the feature space.
 Below we describe how to learn Decision Trees under the uniform distribution
Assume $f : \{-1,1\}^n \to \{-1,1\}$ and $x$ is uniform over $\{-1,1\}^n$. Since $f$ is defined by its outputs on the $2^n$ possible inputs $x$, we can also represent $f$ as a vector in $\mathbb{R}^{2^n}$.
Define $\langle f, g \rangle = \mathbb{E}_{x \sim U}[f(x)g(x)] = \frac{1}{2^n}\sum_x f(x)g(x)$.
So $\langle f, f \rangle = \mathbb{E}[f(x)^2] = 1$ for Boolean $f$.
$f$ is a vector with $2^n$ coordinates: $(f(x^{(1)}), f(x^{(2)}), \dots, f(x^{(2^n)}))$.
standard basis: the indicator vectors $\{e_x\}$, one per input $x$
Suppose $\{\phi_1, \dots, \phi_{2^n}\}$ is an orthonormal basis ($\langle \phi_i, \phi_j \rangle = \mathbf{1}[i = j]$), then $f$ can be represented as $f = \sum_i \langle f, \phi_i \rangle \phi_i$.
Parity basis:
For $S \subseteq [n]$, the parity function $\chi_S$ is the parity function over the variables in $S$: $\chi_S(x) = \prod_{i \in S} x_i$.
Since $\langle \chi_S, \chi_T \rangle = \mathbf{1}[S = T]$, $\{\chi_S\}_{S \subseteq [n]}$ is an orthonormal basis.
$f = \sum_{S \subseteq [n]} \hat{f}(S)\chi_S$, where $\hat{f}(S) = \langle f, \chi_S \rangle$ are called discrete Fourier coefficients. The following identities are immediate.
Lemma 1 (Parseval): $\langle f, f \rangle = \sum_S \hat{f}(S)^2$; for Boolean $f$ this equals 1.
Lemma 2 (Plancherel): $\langle f, g \rangle = \sum_S \hat{f}(S)\hat{g}(S)$.
pf: Expand $f = \sum_S \hat{f}(S)\chi_S$ and $g = \sum_T \hat{g}(T)\chi_T$ inside the inner product; orthonormality kills all cross terms.
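As a sanity check of these identities, here is a minimal Python sketch (the helper names are our own, not from the notes) that computes all Fourier coefficients by brute force and verifies Parseval on majority-of-3:

```python
from itertools import product

def chi(S, x):
    """Parity chi_S(x) = prod_{i in S} x_i, for S a tuple of indices."""
    p = 1
    for i in S:
        p *= x[i]
    return p

def fourier_coefficients(f, n):
    """All 2^n coefficients f_hat(S) = E_x[f(x) chi_S(x)], by brute force
    over the uniform distribution on {-1,1}^n (feasible only for small n)."""
    points = list(product([-1, 1], repeat=n))
    subsets = [tuple(i for i in range(n) if b[i])
               for b in product([0, 1], repeat=n)]
    return {S: sum(f(x) * chi(S, x) for x in points) / len(points)
            for S in subsets}

# Example: majority of 3 bits; its nonzero coefficients are
# f_hat({i}) = 1/2 for each i and f_hat({0,1,2}) = -1/2.
maj = lambda x: 1 if sum(x) > 0 else -1
fc = fourier_coefficients(maj, 3)
print(fc[(0,)], fc[(0, 1, 2)])          # 0.5 -0.5
print(sum(c * c for c in fc.values()))  # Parseval: 1.0
```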
We want to weak-learn $f$ by approximately finding the Fourier coefficients.
Claim: If $|\hat{f}(S)| \ge \gamma$ for some $S$, then $\chi_S$ or $-\chi_S$ agrees with $f$ with probability at least $\frac{1}{2} + \frac{\gamma}{2}$, i.e., it is a weak learner.
pf: $\Pr[f(x) = \chi_S(x)] - \Pr[f(x) \neq \chi_S(x)] = \mathbb{E}[f(x)\chi_S(x)] = \hat{f}(S)$.
It suffices to learn all "large" coefficients:
Claim: If $h = \sum_{S \in T} \hat{f}(S)\chi_S$ with $\sum_{S \notin T} \hat{f}(S)^2 \le \epsilon$, then $\Pr_x[\mathrm{sign}(h(x)) \neq f(x)] \le \epsilon$, since $\mathrm{sign}(h(x)) \neq f(x)$ implies $(f(x) - h(x))^2 \ge 1$, and $\mathbb{E}[(f-h)^2] = \sum_{S \notin T} \hat{f}(S)^2$ by Parseval.
Goal: Find all $S$ such that $\hat{f}(S)^2 \ge \theta$, in time poly$(n, 1/\theta)$.
If we know the subsets $S$ with large Fourier coefficients, then we can estimate $\hat{f}(S) = \mathbb{E}[f(x)\chi_S(x)]$ by sampling.
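A sketch of that sampling estimator (the function name and sample count are our own choices; by Hoeffding, $O(\log(1/\delta)/\lambda^2)$ samples give accuracy $\lambda$ with probability $1-\delta$):

```python
import random

def estimate_coefficient(f, S, n, m=20000, rng=random):
    """Estimate f_hat(S) = E_x[f(x) chi_S(x)] by averaging over
    m labeled examples drawn from the uniform distribution."""
    total = 0
    for _ in range(m):
        x = [rng.choice([-1, 1]) for _ in range(n)]
        p = 1
        for i in S:
            p *= x[i]  # chi_S(x)
        total += f(x) * p
    return total / m

# Example: for f = chi_{0,1}, the coefficient on S = {0,1} is exactly 1,
# and every other coefficient is 0.
f = lambda x: x[0] * x[1]
print(estimate_coefficient(f, (0, 1), n=4))  # 1.0
```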
Kushilevitz–Mansour (KM) algorithm:
We build up a binary tree as follows.
Initially, the root contains the set of all possible $S$, where each $S$ is represented as its indicator vector, a binary vector of length $n$. From Lemma 1, we know $\sum_S \hat{f}(S)^2 = 1$.
Split the node into two nodes: the $S$ whose next undetermined bit is 0, and the $S$ whose next bit is 1.
Recursively split all the nodes whose sum of $\hat{f}(S)^2$ over that set is larger than $\theta$.
All $S$ in a level-$i$ node will have the same prefix of length $i$.
Max depth: $n$. Max width: $1/\theta$, since the nodes at one level are disjoint and their sums of $\hat{f}(S)^2$ total at most 1.
Only $O(n/\theta)$ times do we need to ask for the sum of $\hat{f}(S)^2$ over a set.
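The tree-growing step can be sketched as follows, assuming a `weight(prefix)` oracle for the sum of squared coefficients under a prefix. Here the oracle is computed exactly from a brute-force Fourier table (only feasible for tiny $n$, and not the query-efficient estimator the real algorithm uses); all names are illustrative.

```python
from itertools import product

def km_tree(weight, n, theta):
    """KM tree search: extend prefixes bit by bit, keeping a node only if
    the squared Fourier weight under its prefix is at least theta.
    `weight(prefix)` = sum of f_hat(S)^2 over all S extending prefix."""
    frontier = [()]  # root: empty prefix
    for _ in range(n):
        frontier = [p + (b,) for p in frontier for b in (0, 1)
                    if weight(p + (b,)) >= theta]
    return frontier  # surviving full-length indicator vectors = heavy S

def make_exact_weight(f, n):
    """Exact prefix-weight oracle from a brute-force Fourier table
    (2^n work; KM replaces this with membership-query estimates)."""
    pts = list(product([-1, 1], repeat=n))

    def parity(S, x):  # chi_S(x) with S a 0/1 indicator vector
        p = 1
        for i in range(n):
            if S[i]:
                p *= x[i]
        return p

    table = {S: sum(f(x) * parity(S, x) for x in pts) / len(pts)
             for S in product([0, 1], repeat=n)}

    def weight(prefix):
        k = len(prefix)
        return sum(c * c for S, c in table.items() if S[:k] == prefix)

    return weight

# Majority of 3 bits: the heavy sets (each with squared weight 1/4)
# are the three singletons and the full set.
maj = lambda x: 1 if sum(x) > 0 else -1
heavy = km_tree(make_exact_weight(maj, 3), 3, theta=0.2)
print(sorted(heavy))  # [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
```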
The only question now is how we can ask for the sum of squares of $\hat{f}(S)$ over these sets.
Consider a set with prefix $0^k$ ($k$ zeros), i.e., all $S$ containing none of the first $k$ variables.
Claim: $\sum_{S \text{ with prefix } 0^k} \hat{f}(S)^2 = \mathbb{E}_{x,y,y'}[f(yx)f(y'x)]$, where $y, y' \in \{-1,1\}^k$ and $x \in \{-1,1\}^{n-k}$ are uniform random vectors. ($yx$ is the concatenation of $y$ and $x$.)
Suppose $f = \chi_S$ is a parity function.
If $S$ agrees with the prefix (doesn't contain the first $k$ variables), then $\chi_S(yx) = \chi_S(y'x)$, so $\mathbb{E}[f(yx)f(y'x)] = 1$.
If $S$ doesn't agree with the prefix, then $\mathbb{E}_y[\chi_S(yx)] = 0$, so $\mathbb{E}[f(yx)f(y'x)] = \mathbb{E}_y[\chi_S(yx)]\,\mathbb{E}_{y'}[\chi_S(y'x)] = 0$.
We know that any $f$ can be written as a sum of parities, i.e., $f = \sum_S \hat{f}(S)\chi_S$.
Then $\mathbb{E}[f(yx)f(y'x)] = \sum_{S,S'} \hat{f}(S)\hat{f}(S')\,\mathbb{E}[\chi_S(yx)\chi_{S'}(y'x)]$; by the two cases above (and orthogonality of distinct parities over $x$), only the diagonal terms with $S = S'$ agreeing with the prefix survive, proving the claim.
For an arbitrary prefix $\alpha \in \{0,1\}^k$, we can instead ask for $\mathbb{E}[f(yx)\chi_\alpha(y)\,f(y'x)\chi_\alpha(y')]$, where $\chi_\alpha$ is the parity of the subset of variables that are set to 1 in the string $\alpha$.
Lemma: $\sum_{S \text{ with prefix } \alpha} \hat{f}(S)^2 = \mathbb{E}_x\big[\big(\mathbb{E}_y[f(yx)\chi_\alpha(y)]\big)^2\big]$.
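A sketch of estimating this quantity by sampling (the function name is our own). Note that $f$ is evaluated at chosen points $yx$ and $y'x$, which is exactly where membership queries are needed:

```python
import random

def estimate_prefix_weight(f, alpha, n, m=5000, rng=random):
    """Estimate sum_{S with prefix alpha} f_hat(S)^2 via the lemma:
    it equals E_{x,y,y'}[ f(yx) chi_alpha(y) * f(y'x) chi_alpha(y') ],
    with y, y' uniform over {-1,1}^k and x uniform over {-1,1}^{n-k}."""
    k = len(alpha)

    def chi_alpha(y):  # parity over the coordinates where alpha is 1
        p = 1
        for i in range(k):
            if alpha[i]:
                p *= y[i]
        return p

    total = 0.0
    for _ in range(m):
        x = [rng.choice([-1, 1]) for _ in range(n - k)]
        y = [rng.choice([-1, 1]) for _ in range(k)]
        y2 = [rng.choice([-1, 1]) for _ in range(k)]
        # membership queries: f is evaluated at the chosen points yx, y2x
        total += f(y + x) * chi_alpha(y) * f(y2 + x) * chi_alpha(y2)
    return total / m

# Example: f = chi_{0,1} on n = 4 bits puts all its Fourier weight on
# S = (1,1,0,0), so prefix (1,1) should get weight 1 and (0,) weight 0.
f = lambda x: x[0] * x[1]
print(estimate_prefix_weight(f, (1, 1), 4))  # exactly 1.0 here
print(estimate_prefix_weight(f, (0,), 4))    # small, near 0
```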
It suffices to learn all $S$ such that $|\hat{f}(S)| \ge \epsilon/L$.
For Decision Trees with at most $L$ leaves, $\sum_S |\hat{f}(S)| \le L$, so the discarded coefficients contribute $\sum_{|\hat{f}(S)| < \epsilon/L} \hat{f}(S)^2 \le \frac{\epsilon}{L}\sum_S |\hat{f}(S)| \le \epsilon$.
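A quick numeric check of the $L_1$ bound on a toy tree (the 3-leaf tree below and all names are illustrative, not from the notes):

```python
from itertools import product
from math import prod

def tree(x):
    """A 3-leaf decision tree: test x0; if x0 = -1, test x1."""
    if x[0] == 1:
        return 1
    return 1 if x[1] == 1 else -1

n, L = 2, 3
pts = list(product([-1, 1], repeat=n))
# L1 norm of the spectrum: sum over all S of |f_hat(S)|, brute force.
l1 = sum(abs(sum(tree(x) * prod(x[i] for i in range(n) if S[i])
                 for x in pts) / len(pts))
         for S in product([0, 1], repeat=n))
print(l1, l1 <= L)  # 2.0 True — the spectrum's L1 norm is within L
```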