PAGES: 36 POSTED ON: 3/29/2011
An Efficient Membership-Query Algorithm for Learning DNF with Respect to the Uniform Distribution
Jeffrey C. Jackson
Presented by: Eitan Yaakobi, Tamar Aizikowitz

Presentation Outline
Introduction
Algorithms We Use
  Estimating Expected Values
  Hypothesis Boosting
  Finding Weak-Approximating Parity Functions
Learning DNF with Respect to Uniform
  Existence of Weak Approximating Parity Functions for every f, D
  Nonuniform Weak DNF Learning
  Strongly Learning DNF

Introduction
DNF is weakly learnable with respect to the uniform distribution, as shown by Kushilevitz and Mansour. We show that DNF is weakly learnable with respect to a certain class of nonuniform distributions. We then use a method based on Freund's boosting algorithm to produce a strong learner with respect to the uniform distribution.

Algorithms We Use
Our learning algorithm makes use of several previous algorithms. What follows is a short reminder of each of them.

Estimating Expected Values
The AMEAN algorithm efficiently estimates the expected value of a random variable. It is based on Hoeffding's inequality. Let $X_1, \dots, X_m$ be independent random variables such that $X_i \in [a, b]$ and $E[X_i] = \mu$ for all $i$. Then
$\Pr\left[\left|\frac{1}{m}\sum_{i=1}^{m} X_i - \mu\right| \ge \lambda\right] \le 2\exp\left(-\frac{2m\lambda^2}{(b-a)^2}\right)$.

The AMEAN Algorithm
Input: a random variable $X \in [a, b]$; $b - a$; $\lambda, \delta > 0$
Output: $\mu'$ such that $\Pr[|E[X] - \mu'| \le \lambda] \ge 1 - \delta$
Running time: $O((b-a)^2 \log(\delta^{-1}) / \lambda^2)$

Hypothesis Boosting
Our algorithm is based on boosting weak hypotheses into a final strong hypothesis. We use a boosting method very similar to Freund's boosting algorithm. We refer to Freund's original algorithm as F1.

The F1 Boosting Algorithm
Input: positive $\epsilon$, $\delta$ and $\gamma$; a $(\frac{1}{2} - \gamma)$-approximate PAC learner for a representation class $\mathcal{F}$; $EX(f, D)$ for some $f \in \mathcal{F}$ and any distribution $D$
Output: an $\epsilon$-approximation for $f$ with respect to $D$, with probability at least $1 - \delta$
Running time: polynomial in $n$, $s$, $\gamma^{-1}$, $\epsilon^{-1}$ and $\log(\delta^{-1})$

The Idea Behind F1 (1)
The algorithm generates a series of weak hypotheses $h_i$. $h_0$ is a weak approximator for $f$ with respect to the distribution $D$.
Each subsequent $h_i$ is a weak approximator for $f$ with respect to the distribution $D_i$.

The Idea Behind F1 (2)
Each distribution $D_i$ focuses weight on those areas where slightly more than half of the hypotheses already generated were incorrect. The final hypothesis $h$ is a majority vote over all the $h_i$.

The Idea Behind F1 (3)
If a sufficient number of weak hypotheses is generated, then $h$ will be an $\epsilon$-approximator for $f$ with respect to the distribution $D$. Freund showed that $\frac{1}{2}\gamma^{-2}\ln(\epsilon^{-1})$ weak hypotheses suffice.

Finding Weak-Approximating Parity Functions
To use the boosting algorithm, we need to be able to generate weak approximators for our DNF $f$ with respect to the distributions $D_i$. Our algorithm is based on the Weak Parity algorithm (WP) of Kushilevitz and Mansour.

The WP Algorithm
WP finds the large Fourier coefficients of a Boolean function $f$ on $\{0,1\}^n$ using a membership oracle for $f$.

The WP' Algorithm (1)
Our learning algorithm will need to find the large coefficients of a non-Boolean function. The basic WP algorithm can be extended to an algorithm WP' that works for non-Boolean $f$ as well. WP' gives us a weak approximator for a non-Boolean $f$ with respect to the uniform distribution.

The WP' Algorithm (2)
Input: $MEM(f)$ for $f: \{0,1\}^n \to \mathbb{R}$; $\theta, \delta > 0$; $n$; $L_\infty(f)$
Output: with probability at least $1 - \delta$, WP' outputs a set $S$ such that for all $A$: $|\hat{f}(A)| \ge \theta \Rightarrow A \in S$, and $A \in S \Rightarrow |\hat{f}(A)| \ge \theta/2$
Running time: polynomial in $n$, $\theta^{-1}$, $\log(\delta^{-1})$ and $L_\infty(f)$

Learning DNF with Respect to Uniform
We now show the main result: DNF is learnable with respect to the uniform distribution. We begin by showing that for every DNF $f$ and every distribution $D$ there exists a parity function that weakly approximates $f$ with respect to $D$. We use this to produce an algorithm for weakly learning DNF with respect to certain nonuniform distributions. Finally, we show that this weak learner can be boosted into a strong learner with respect to the uniform distribution.
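The AMEAN estimator from the toolkit above can be sketched by inverting Hoeffding's inequality. Choose the smallest $m$ with $2\exp(-2m\lambda^2/(b-a)^2) \le \delta$, then average $m$ samples. This is a minimal sketch that assumes a sampler `draw()` returning independent samples of $X$. The names `amean_sample_size` and `amean` are hypothetical and are not taken from the paper.

```python
import math
import random
from typing import Callable


def amean_sample_size(width: float, lam: float, delta: float) -> int:
    """Smallest integer m with 2*exp(-2*m*lam^2 / width^2) <= delta (Hoeffding)."""
    if width <= 0 or lam <= 0 or not 0 < delta < 1:
        raise ValueError("need width > 0, lam > 0 and 0 < delta < 1")
    return math.ceil(width ** 2 * math.log(2.0 / delta) / (2.0 * lam ** 2))


def amean(draw: Callable[[], float], width: float, lam: float, delta: float) -> float:
    """Estimate E[X] to within lam with probability >= 1 - delta.

    draw() must return independent samples of X, all lying in an
    interval of length width (the b - a of the slides).
    """
    m = amean_sample_size(width, lam, delta)
    return sum(draw() for _ in range(m)) / m


# Example: estimate the mean of a fair +/-1 coin (true mean 0).
rng = random.Random(0)
estimate = amean(lambda: rng.choice((-1.0, 1.0)), width=2.0, lam=0.05, delta=0.01)
```

The sample count grows as $(b-a)^2 \log(\delta^{-1})/\lambda^2$, which matches the running time quoted for AMEAN.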
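The two quantitative facts quoted for F1 are the number of weak hypotheses and the majority-vote final hypothesis. Both can be sketched in a few lines. The sketch assumes hypotheses are Python callables returning $\pm 1$. It deliberately omits F1's reweighting of the distributions $D_i$, whose exact form is not given here. `f1_rounds` and `majority_vote` are hypothetical names.

```python
import math
from typing import Callable, Sequence

# A hypothesis maps an input x to -1 or +1.
Hypothesis = Callable[[object], int]


def f1_rounds(gamma: float, eps: float) -> int:
    """Number of weak hypotheses quoted as sufficient: (1/2) * gamma^-2 * ln(1/eps), rounded up."""
    if not 0 < gamma <= 0.5 or not 0 < eps < 1:
        raise ValueError("need 0 < gamma <= 1/2 and 0 < eps < 1")
    return math.ceil(0.5 * gamma ** -2 * math.log(1.0 / eps))


def majority_vote(hyps: Sequence[Hypothesis]) -> Hypothesis:
    """Final hypothesis: the sign of sum_i h_i(x), with ties sent to +1 (an arbitrary choice)."""
    def h(x: object) -> int:
        return 1 if sum(hi(x) for hi in hyps) >= 0 else -1
    return h


# Example: the number of rounds for gamma = 0.1 and eps = 0.01.
k = f1_rounds(0.1, 0.01)
```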
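The phrase "large Fourier coefficients" can be made concrete with a brute-force computation. The sketch below computes every coefficient $\hat{f}(A) = E_x[f(x)\chi_A(x)]$ under the uniform distribution and keeps those with $|\hat{f}(A)| \ge \theta$, which is the set WP and WP' are designed to recover. It is not the Kushilevitz and Mansour procedure. It enumerates all $2^n$ inputs and all $2^n$ subsets, so it is exponential in $n$ and only usable on tiny examples. The convention $\chi_A(x) = (-1)^{\sum_{i \in A} x_i}$ and all function names are assumptions made for illustration.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Set, Tuple

Point = Tuple[int, ...]


def chi(A: FrozenSet[int], x: Point) -> int:
    """Parity function chi_A(x) = (-1)^(sum of x_i over i in A)."""
    return -1 if sum(x[i] for i in A) % 2 else 1


def fourier_coefficients(f: Callable[[Point], float], n: int) -> Dict[FrozenSet[int], float]:
    """Brute-force f_hat(A) = E_x[f(x) chi_A(x)], x uniform on {0,1}^n (exponential in n)."""
    points = list(product((0, 1), repeat=n))
    coeffs = {}
    for mask in points:  # each 0/1 vector doubles as the indicator of a subset A
        A = frozenset(i for i in range(n) if mask[i])
        coeffs[A] = sum(f(x) * chi(A, x) for x in points) / len(points)
    return coeffs


def large_coefficients(coeffs: Dict[FrozenSet[int], float], theta: float) -> Set[FrozenSet[int]]:
    """The sets A with |f_hat(A)| >= theta."""
    return {A for A, c in coeffs.items() if abs(c) >= theta}


def term(x: Point) -> int:
    """The term x0 AND x1 as a +/-1 function (+1 = true)."""
    return 1 if x[0] == 1 and x[1] == 1 else -1


coeffs = fourier_coefficients(term, 3)
```

For this term on $n = 3$ variables, the nonzero coefficients are $-\frac{1}{2}$ on $\emptyset$, $\{0\}$ and $\{1\}$, and $+\frac{1}{2}$ on $\{0,1\}$. All coefficients involving the irrelevant variable $x_2$ vanish.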
Existence of Weak Approximating Parity Functions for every f, D (1)
For every DNF $f$ and every distribution $D$ there exists a parity function that weakly approximates $f$ with respect to $D$. If $|E_D[f]|$ is noticeably large, the constant parity $\chi_\emptyset \equiv 1$ (or its negation) already does so, since $E_D[f\chi_\emptyset] = E_D[f]$. The more difficult case is when $E_D[f] \approx 0$.

Existence of Weak Approximating Parity Functions for every f, D (2)
Let $f$ be a DNF such that $E_D[f] \approx 0$. Let $s$ be the number of terms in $f$. Let $T(x)$ be the $\{-1,+1\}$-valued function equivalent to the term of $f$ best correlated with $f$ with respect to $D$.

Existence of Weak Approximating Parity Functions for every f, D (3)

Existence of Weak Approximating Parity Functions for every f, D (4)
Since $T$ is a term of $f$: $\Pr_D[T(x) = f(x) \mid f(x) = -1] = 1$.
Since there are $s$ terms in $f$ and $T$ is the best correlated with $f$: $\Pr_D[T(x) = f(x) \mid f(x) = 1] \ge 1/s$.
Since $\Pr_D[f(x) = 1] \approx \Pr_D[f(x) = -1] \approx \frac{1}{2}$: $\Pr_D[T(x) = f(x)] \ge \frac{1}{2}(1 + 1/s)$.
Hence $E_D[fT] = 2\Pr_D[T(x) = f(x)] - 1 \ge 1/s$.

Existence of Weak Approximating Parity Functions for every f, D (5)
$T$ can be represented using the Fourier transform. Define:

Nonuniform Weak DNF Learning (1)
We have shown that for every DNF $f$ and every distribution $D$ there exists a parity function that is a weak approximator for $f$ with respect to $D$. How can we find such a parity function? We want an algorithm that, when given a threshold $\theta$ and a distribution $D$, finds a parity such that, say:

Nonuniform Weak DNF Learning (2)

Nonuniform Weak DNF Learning (3)
We have reduced the problem of finding a well-correlated parity to finding a large Fourier coefficient of $g(x) = 2^n f(x) D(x)$. Since $g$ is not Boolean, we use WP'.
Invocation: $WP'(n, MEM(g), \theta, L_\infty(g), \delta)$, where $MEM(g)(x) = 2^n \cdot MEM(f)(x) \cdot D(x)$.

The WDNF Algorithm (1)
We define a new algorithm, Weak DNF (WDNF). WDNF finds the large Fourier coefficients of $g(x) = 2^n f(x) D(x)$, thereby finding a parity that is well correlated with $f$ with respect to the distribution $D$. WDNF uses the WP' algorithm to find the Fourier coefficients of the non-Boolean $g$.

The WDNF Algorithm (2)
Proof of existence: let $g(x) = 2^n f(x) D(x)$.
Output, with probability $1 - \delta$:
Running time: polynomial in $n$, $s$, $\log(\delta^{-1})$ and $L_\infty(2^n D)$

The WDNF Algorithm (3)
Input: $EX(f, D)$; $MEM(f)$; $D$; $\delta > 0$
Output: with probability at least $1 - \delta$, a parity function $h$ (possibly negated) such that $E_D[fh] = \Omega(s^{-1})$
Running time: polynomial in $n$, $s$, $\log(\delta^{-1})$ and $L_\infty(2^n D)$

The WDNF Algorithm (4)
WDNF is polynomial in $L_\infty(g) = L_\infty(2^n D)$ (the two are equal because $|f(x)| = 1$). If $D(x) \le \mathrm{poly}(n, s, \epsilon^{-1}) / 2^n$ for all $x$, then WDNF runs in time polynomial in the usual parameters. Such a $D$ is called polynomially-near uniform. WDNF therefore weakly learns DNF with respect to any polynomially-near uniform distribution $D$.

Strongly Learning DNF
We define the Harmonic Sieve algorithm (HS). HS applies the F1 boosting algorithm to the weak learner WDNF. The main difference between HS and F1 is the need to supply WDNF with an oracle for the distribution $D_i$ at each stage of boosting.

The HS Algorithm (1)
Input: $EX(f, D)$; $MEM(f)$; $D$; $s$; $\epsilon, \delta > 0$
Output: with probability $1 - \delta$, a hypothesis $h$ that is an $\epsilon$-approximator of $f$ with respect to $D$
Running time: polynomial in $n$, $s$, $\epsilon^{-1}$, $\log(\delta^{-1})$ and $L_\infty(2^n D)$

The HS Algorithm (2)
For WDNF to work, and to work efficiently, two requirements must be met: an oracle for the distribution must be provided to the learner, and the distribution must be polynomially-near uniform. We show how to simulate an approximate oracle $D_i'$ that can be provided to the weak learner instead of an exact one. We then show that the distributions $D_i$ are in fact polynomially-near uniform.

Simulating $D_i$ (1)
Define:
To provide an exact oracle we would need to compute the denominator, which could take exponentially long. Instead, we estimate the value of the denominator using AMEAN.

Simulating $D_i$ (2)

Implications of Using $D_i'$
Note that $g_i' = 2^n f D_i' = 2^n f c_i D_i = c_i g_i$. Multiplying the distribution oracle by a constant $c_i$ multiplies all the Fourier coefficients of $g_i$ by the same constant, so the relative sizes of the coefficients stay the same. WDNF will therefore still find the large coefficients, and its running time is not adversely affected.
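The Fourier representation of the term $T$ used in the existence argument can be written out explicitly. What follows is a sketch of the standard calculation under assumptions introduced here: $T$ is a conjunction over a variable set $V$ with $|V| = k$, $+1$ means true (as in the conditional probabilities of that argument), $\chi_A(x) = (-1)^{\sum_{i \in A} x_i}$, and the signs $\tau_i$, $\sigma_A$ are illustrative notation.

```latex
% Assumptions: T is a conjunction over V with |V| = k; +1 = true;
% \chi_A(x) = (-1)^{\sum_{i \in A} x_i}; \tau_i = -1 for a positive literal x_i,
% \tau_i = +1 for a negated literal \bar{x}_i.
\mathbb{1}_T(x) = \prod_{i \in V} \frac{1 + \tau_i \chi_{\{i\}}(x)}{2}
  = 2^{-k} \sum_{A \subseteq V} \sigma_A \chi_A(x),
  \qquad \sigma_A = \prod_{i \in A} \tau_i

T(x) = 2\,\mathbb{1}_T(x) - 1 = -1 + 2^{1-k} \sum_{A \subseteq V} \sigma_A \chi_A(x)

E_D[fT] = -E_D[f] + 2^{1-k} \sum_{A \subseteq V} \sigma_A \, E_D[f\chi_A]
  \le -E_D[f] + 2 \max_{A \subseteq V} \lvert E_D[f\chi_A] \rvert
```

Combining the last line with $E_D[fT] \ge 1/s$ and $E_D[f] \approx 0$ gives some $A \subseteq V$ with $|E_D[f\chi_A]| \gtrsim 1/(2s)$. So $\chi_A$ or $-\chi_A$ is a weak approximator for $f$ with respect to $D$.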
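The reduction behind WDNF rests on a single identity. For $g(x) = 2^n f(x) D(x)$, we have $E_D[f\chi_A] = \sum_x f(x) D(x) \chi_A(x) = E_{x \sim U}[g(x)\chi_A(x)] = \hat{g}(A)$. The sketch below checks this on a toy DNF under a nonuniform $D$. A brute-force search over all subsets stands in for WP', so the sketch is exponential in $n$ and only illustrates what WDNF computes. `dnf`, `best_parity_via_g` and the dictionary representation of $D$ are hypothetical. Exact `Fraction` arithmetic is used so the identity holds with `==`.

```python
from fractions import Fraction
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Tuple

Point = Tuple[int, ...]


def chi(A: FrozenSet[int], x: Point) -> int:
    """Parity chi_A(x) = (-1)^(sum of x_i over i in A)."""
    return -1 if sum(x[i] for i in A) % 2 else 1


def dnf(terms: List[Dict[int, int]]) -> Callable[[Point], int]:
    """f(x) = +1 if some term is satisfied, else -1; a term maps variable index -> required bit."""
    def f(x: Point) -> int:
        return 1 if any(all(x[i] == b for i, b in t.items()) for t in terms) else -1
    return f


def best_parity_via_g(f: Callable[[Point], int], D: Dict[Point, Fraction],
                      n: int) -> Tuple[FrozenSet[int], Fraction]:
    """Brute-force stand-in for WP' on g(x) = 2^n f(x) D(x): return (A, g_hat(A)) maximizing |g_hat(A)|."""
    points = list(product((0, 1), repeat=n))
    g = {x: 2 ** n * f(x) * D[x] for x in points}
    best = None
    for mask in points:  # each 0/1 vector doubles as the indicator of a subset A
        A = frozenset(i for i in range(n) if mask[i])
        g_hat = sum(g[x] * chi(A, x) for x in points) / 2 ** n  # E_uniform[g chi_A]
        if best is None or abs(g_hat) > abs(best[1]):
            best = (A, g_hat)
    return best


# Example: f = (x0 AND x1) OR (NOT x2), with D(x) proportional to the rank of x (weights 1..8).
n = 3
f = dnf([{0: 1, 1: 1}, {2: 0}])
points = list(product((0, 1), repeat=n))
D = {x: Fraction(k + 1, 36) for k, x in enumerate(points)}
A, g_hat = best_parity_via_g(f, D, n)


def h(x: Point) -> int:
    """The returned parity, negated when its coefficient is negative."""
    return (1 if g_hat > 0 else -1) * chi(A, x)
```

Negating $\chi_A$ when $\hat{g}(A) < 0$ corresponds to the "(possibly negated)" parity in WDNF's output, and makes $E_D[fh] = |\hat{g}(A)|$.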
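The approximate oracle $D_i'$ can be sketched under clearly stated assumptions. First, $D_i(x) = w(x) D(x) / E_D[w]$ for a weight function $w$ with values in $[0, 1]$; `w` is a hypothetical stand-in, and F1's actual weights are not reproduced here. Second, samples from $D$ are available, as $EX(f, D)$ provides. Third, the denominator $E_D[w]$ is estimated with an AMEAN-style Hoeffding average. The resulting oracle equals $c_i D_i$ with $c_i = E_D[w]/\mu'$. `simulated_di_oracle` and `sample_from_D` are hypothetical names.

```python
import math
import random
from itertools import product
from typing import Callable, Tuple

Point = Tuple[int, ...]


def amean(draw: Callable[[], float], width: float, lam: float, delta: float) -> float:
    """Hoeffding-based mean estimate: within lam of E[X] with probability >= 1 - delta."""
    m = math.ceil(width ** 2 * math.log(2.0 / delta) / (2.0 * lam ** 2))
    return sum(draw() for _ in range(m)) / m


def simulated_di_oracle(w: Callable[[Point], float], D: Callable[[Point], float],
                        sample_from_D: Callable[[], Point],
                        lam: float, delta: float) -> Callable[[Point], float]:
    """Approximate oracle D_i'(x) = w(x) D(x) / mu', where mu' estimates E_D[w].

    w is a hypothetical weight function with values in [0, 1], so width = 1.
    The exact D_i(x) = w(x) D(x) / E_D[w] would need a sum over all 2^n inputs.
    """
    mu_prime = amean(lambda: w(sample_from_D()), 1.0, lam, delta)
    if mu_prime <= 0:
        raise ValueError("estimated normalizer is not positive")
    return lambda x: w(x) * D(x) / mu_prime


# Example on n = 3 with D uniform and w(x) = 0.25 + 0.5 * x0, so E_D[w] = 0.5.
def D(x: Point) -> float:
    return 1 / 8


def w(x: Point) -> float:
    return 0.25 + 0.5 * x[0]


rng = random.Random(1)
points = list(product((0, 1), repeat=3))
Di_prime = simulated_di_oracle(w, D, lambda: tuple(rng.randint(0, 1) for _ in range(3)), 0.02, 0.001)
ratios = [Di_prime(x) / (w(x) * D(x) / 0.5) for x in points]  # every entry equals c_i = 0.5 / mu'
```

Every value of the simulated oracle is off from the exact $D_i$ by the same factor $c_i$, which is close to 1 when $\lambda$ is small relative to $E_D[w]$. Consequently $g_i' = c_i g_i$, and the ranking of the Fourier coefficients is unchanged.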
Bound on Distributions $D_i$
It can be shown that for each $i$:
Thus $D_i$ is bounded by a polynomial in $L_\infty(D)$ and $\epsilon^{-1}$. If $D$ is polynomially-near uniform, then $D_i$ is also polynomially-near uniform. Hence HS strongly learns DNF with respect to the uniform distribution.

Summary
DNF can be weakly learned with respect to polynomially-near uniform distributions using the WDNF algorithm. The HS algorithm strongly learns DNF with respect to the uniform distribution by boosting the WDNF weak learner.