An Efficient Membership-Query
Algorithm for Learning DNF with
Respect to the Uniform Distribution


               Jeffrey C. Jackson
                 Presented By:
                       Eitan Yaakobi
                       Tamar Aizikowitz
Presentation Outline
   Introduction

   Algorithms We Use
       Estimating Expected Values
       Hypothesis Boosting
       Finding Weak-approximating Parity Functions

   Learning DNF With Respect to Uniform
       Existence of Weak Approximating Parity Functions for
        every f, D
       Nonuniform Weak DNF Learning
       Strongly Learning DNF

                                                               2
Introduction
   DNF is weakly learnable with respect to the
    uniform distribution, as shown by Kushilevitz
    and Mansour.
   We show that DNF is weakly learnable with
    respect to a certain class of nonuniform
    distributions.
   We then use a method based on Freund’s
    boosting algorithm to produce a strong
    learner with respect to uniform.
                                                   3
Algorithms We Use
   Our learning algorithm makes use of several
    previous algorithms.

   Following is a short reminder of these
    algorithms.




                                                  4
Estimating Expected Values
   The AMEAN Algorithm:
       Efficiently estimates the expected value of a random
        variable.
       Based on Hoeffding’s inequality:
           Let X1, …, Xm be independent random variables
            such that Xi ∈ [a,b] and E[Xi] = μ for all i. Then:

              Pr[ | (1/m)·Σi Xi − μ | ≥ λ ] ≤ 2·exp( −2mλ² / (b−a)² )
                                                                5
The AMEAN Algorithm
   Input:
       a random variable X ∈ [a,b]
       the range width b – a
       λ, δ > 0
   Output:
       μ’ such that Pr[ |E[X] – μ’| ≤ λ ] ≥ 1 – δ
   Running time:
       O( (b–a)²·log(δ⁻¹) / λ² )
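
    The paper does not spell out AMEAN's code; the following is only a minimal
    sketch of a Hoeffding-based sample-mean estimator with this running time.
    The names amean and sample_x, and the exact constant in m, are assumptions
    for illustration.

    import math, random

    def amean(sample_x, a, b, lam, delta):
        # Hoeffding: m = ceil((b-a)^2 * ln(2/delta) / (2*lam^2)) samples make the
        # empirical mean lam-close to E[X] with probability at least 1 - delta.
        m = math.ceil((b - a) ** 2 * math.log(2.0 / delta) / (2.0 * lam ** 2))
        return sum(sample_x() for _ in range(m)) / m

    # Example: estimate the mean of a 0/1 variable that is 1 with probability 0.3.
    est = amean(lambda: 1.0 if random.random() < 0.3 else 0.0, 0.0, 1.0, 0.05, 0.01)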

                                                   6
Hypothesis Boosting
   Our algorithm is based on boosting weak
    hypotheses into a final strong hypothesis.

   We use a boosting method very similar to
    Freund’s boosting algorithm.

   We refer to Freund’s original algorithm as F1.



                                                     7
The F1 Boosting Algorithm
   Input:
       positive ε, δ and γ
       a (½ – γ)-approximate PAC learner for the
        representation class being learned
       EX( f,D) for some f in the class and any distribution D
   Output:
       ε-approximation for f with respect to D with
        probability at least 1 – δ
   Running time:
       polynomial in n, s, γ⁻¹, ε⁻¹, and log(δ⁻¹)

                                                             8
The Idea Behind F1 (1)
   The algorithm generates a series of weak
    hypotheses hi.

   h0 is a weak approximator for f with respect to
    the distribution D.

   Each subsequent hi is a weak approximator
    for f with respect to the distribution Di.

                                                  9
The Idea Behind F1 (2)
   Each distribution Di focuses weight on those
    areas where slightly more than half the
    hypotheses already generated were
    incorrect.

   The final hypothesis h is a majority vote on all
    the hi-s.



                                                   10
The Idea Behind F1 (3)
   If a sufficient number of weak hypotheses is
    generated then h will be an ε-approximator for
    f with respect to the distribution D.

   Freund showed that ½·γ⁻²·ln(ε⁻¹) weak
    hypotheses suffice.
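
    As a schematic illustration only (this is not Freund's exact weighting
    scheme), a majority-vote boosting loop with that number of rounds could be
    sketched as follows; weak_learn is an assumed callable returning a ±1-valued
    hypothesis for a weighted sample, and examples is a list of (x, label) pairs.

    import math

    def boost_by_majority(weak_learn, examples, gamma, eps):
        # Schematic boosting loop: each round trains a weak hypothesis against
        # the current weighting; the result is a majority vote over all rounds.
        rounds = math.ceil(0.5 * gamma ** -2 * math.log(1.0 / eps))
        weights = [1.0 / len(examples)] * len(examples)
        hyps = []
        for _ in range(rounds):
            hyps.append(weak_learn(examples, weights))
            # Concentrate weight on points where roughly half of the hypotheses
            # generated so far are wrong (a stand-in for Freund's exact scheme).
            for j, (x, y) in enumerate(examples):
                errs = sum(1 for h in hyps if h(x) != y)
                weights[j] = 1.0 / (1.0 + abs(errs - len(hyps) / 2.0))
            total = sum(weights)
            weights = [w / total for w in weights]
        return lambda x: 1 if sum(h(x) for h in hyps) >= 0 else -1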




                                                 11
Finding Weak-approximating
Parity Functions
   In order to use the boosting algorithm, we
    need to be able to generate weak-
    approximators for our DNF f with respect to
    the distributions Di.

   Our algorithm is based on the Weak Parity
    algorithm (WP) by Kushilevitz and Mansour.



                                                  12
The WP Algorithm
   Finds the large Fourier coefficients of a
    Boolean function f on {0,1}ⁿ using a
    Membership Oracle for f.
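
    WP's recursive search for the heavy coefficients (Kushilevitz–Mansour style)
    is not reproduced here; the sketch below only illustrates the basic primitive
    it builds on: estimating a single Fourier coefficient f̂(A) = E[ f·χA ] by
    sampling the membership oracle. The function names are illustrative
    assumptions, not from the paper.

    import random

    def chi(A, x):
        # Parity chi_A(x) = (-1)^(sum of x_i for i in A), as a +/-1 value.
        return -1 if sum(x[i] for i in A) % 2 else 1

    def estimate_coefficient(mem_f, n, A, samples=20000):
        # f_hat(A) = E_{x uniform}[ f(x) * chi_A(x) ]; estimated by drawing random
        # inputs and querying the membership oracle mem_f (which returns +/-1).
        total = 0
        for _ in range(samples):
            x = [random.randint(0, 1) for _ in range(n)]
            total += mem_f(x) * chi(A, x)
        return total / samples

    # Example: if f is itself the parity of x0 and x1, then f_hat({0,1}) = 1.
    f = lambda x: chi({0, 1}, x)
    print(estimate_coefficient(f, 4, {0, 1}))   # close to 1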









                                                13
The WP’ Algorithm (1)
   Our learning algorithm will need to find the
    large coefficients of a non-Boolean function.
   The basic WP algorithm can be extended to
    the WP’ algorithm which works for non-
    Boolean f as well.
   WP’ gives us a weak approximator for a non-
    Boolean f with respect to the uniform
    distribution.

                                                    14
The WP’ Algorithm (2)
   Input:
       MEM( f ) for f : {0,1}ⁿ → ℝ
       θ, δ, n, L( f ) > 0
   Output:
       With probability at least 1 – δ, WP’ outputs a set S such that
        for all A:
           if | f̂(A) | ≥ θ then A ∈ S
           if A ∈ S then | f̂(A) | ≥ θ/2
   Running time:
       polynomial in n, θ⁻¹, log(δ⁻¹), and L( f )
                                                                     15
Learning DNF with Respect to
Uniform
   We now show the main result: DNF is learnable
    with respect to uniform.
   We begin by showing that for every DNF f and
    distribution D there exists a parity function that
    weakly approximates f with respect to D.
   We use this to produce an algorithm for weakly
    learning DNF with respect to certain nonuniform
    distributions.
   Finally we show that this weak learner can be
    boosted into a strong learner with respect to the
    uniform distribution.
                                                         16
Existence of Weak Approximating
Parity Functions for every f, D (1)
   For every DNF f and every distribution D
    there exists a parity function that weakly
    approximates f with respect to D.





   If |ED[ f ]| is bounded away from 0, then a constant function
    (the empty parity χ∅ or its negation) is already a weak
    approximator for f with respect to D.
   The more difficult case is therefore when ED[ f ] ≈ 0.


                                                   17
Existence of Weak Approximating
Parity Functions for every f, D (2)

   Let f be a DNF such that ED[ f ] ≈ 0.
   Let s be the number of terms in f.
   Let T(x) be the {-1,+1} valued function
    equivalent to the term in f best correlated with
    f with respect to D.



                                                   18
Existence of Weak Approximating
Parity Functions for every f, D (3)




                                      19
Existence of Weak Approximating
Parity Functions for every f, D (4)
   T is a term of f  ⟹  PrD[ T(x) = f(x) | f(x) = −1 ] = 1
   There are s terms in f and T is the best correlated
    with f  ⟹  PrD[ T(x) = f(x) | f(x) = 1 ] ≥ 1/s

    ⟹  PrD[ T(x) = f(x) ] ≥ ½·(1 + 1/s)
    ⟹  ED[ fT ] ≥ 1/s
                                                       20
Existence of Weak Approximating
Parity Functions for every f, D (5)
   T can be represented using the Fourier
    transform.
   Define the Fourier expansion T = ΣA T̂(A)·χA. The sum of the
    |T̂(A)| is at most a small constant (below 3 for a {−1,+1}-valued
    term), so

       1/s ≤ ED[ fT ] = ΣA T̂(A)·ED[ fχA ]

    implies that some parity χA (or its negation) satisfies
    |ED[ fχA ]| = Ω(1/s), i.e. it weakly approximates f with
    respect to D.
                                             21
Nonuniform Weak DNF
Learning (1)
   We have shown that for every DNF f and
    every distribution D there exists a parity
    function that is a weak approximator for f with
    respect to D.
   How can we find such a parity function?
   We want an algorithm that, when given a
    threshold θ and a distribution D, finds a parity χA
    such that, say, |ED[ fχA ]| ≥ θ/2 whenever some parity
    has correlation at least θ with f under D.
                                                  22
Nonuniform Weak DNF
Learning (2)
   Define g(x) = 2ⁿ·f(x)·D(x). Then for every parity χA:

       ED[ fχA ] = Σx D(x)·f(x)·χA(x) = Ex~U[ g(x)·χA(x) ] = ĝ(A)

    A parity well correlated with f under D therefore corresponds
    to a large Fourier coefficient of g.
                      23
Nonuniform Weak DNF
Learning (3)
   We have reduced the problem of finding a
    well correlated parity to finding a large
    Fourier coefficient of g.
   g is not Boolean, therefore we use WP’.
   Invocation: WP’( n, MEM(g), θ, L(g), δ )

    The membership oracle for g is simulated from MEM( f ) and
    the distribution oracle D:

              MEM(g)(x) = 2ⁿ · MEM( f )(x) · D(x)
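
    A minimal sketch of how this oracle simulation can be wrapped in code,
    assuming mem_f returns f(x) ∈ {−1,+1} and dist returns D(x); the function
    and variable names are illustrative, not from the paper.

    def make_mem_g(mem_f, dist, n):
        # MEM(g)(x) = 2^n * MEM(f)(x) * D(x): one query to MEM(f) plus one
        # evaluation of the distribution oracle per query to g.
        def mem_g(x):
            return (2 ** n) * mem_f(x) * dist(x)
        return mem_g

    # Example with the uniform distribution, where g coincides with f.
    n = 3
    uniform = lambda x: 1.0 / 2 ** n
    f = lambda x: 1 if (x[0] and x[1]) or x[2] else -1   # a tiny 2-term DNF, as +/-1
    mem_g = make_mem_g(f, uniform, n)
    print(mem_g([1, 1, 0]))   # equals f([1,1,0]) = 1, since D is uniform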
                                                 24
The WDNF Algorithm (1)
   We define a new algorithm: Weak DNF
    (WDNF).
   WDNF finds the large Fourier coefficients of
    g(x) = 2ⁿ·f(x)·D(x), thereby finding a parity that is
    well correlated with f with respect to the
    distribution D.
   WDNF makes use of the WP’ algorithm for
    finding the Fourier coefficients of the non-
    Boolean g.
                                                     25
The WDNF Algorithm (2)
   Proof of existence:
       Let g(x) = 2ⁿ·f(x)·D(x)
       Run WP’ on MEM(g) with threshold θ = Ω(1/s)

           Output with prob. 1 – δ:
            a parity (possibly negated) that is well correlated
            with f with respect to D

           Running time:
                          polynomial in n, s, log(δ⁻¹), and L(2ⁿD)

                                                                 26
The WDNF Algorithm (3)
   Input:
       EX(f,D)
       MEM( f )
       D
       δ>0
   Output:
       With probability at least 1 – δ :
        parity function h (possibly negated) s.t.:
         ED[ fh ] = Ω(s⁻¹)
   Running time:
       polynomial in n, s, log(δ⁻¹), and L(2ⁿD)
                                                     27
The WDNF Algorithm (4)
   WDNF is polynomial in L(g) = L(2ⁿD).
    If D is at most poly(n, s, ε⁻¹, δ⁻¹) / 2ⁿ then WDNF
    runs in time polynomial in the usual parameters.
   Such a D is referred to as polynomially-near
    uniform.

   ⟹ WDNF weakly learns DNF with respect to
    any polynomially-near uniform distribution D.

                                                    28
Strongly Learning DNF
   We define the Harmonic Sieve Algorithm (HS).
   HS applies the F1 boosting algorithm to the
    WDNF weak learner.
   The main difference between HS and F1 is
    the need to supply WDNF with an oracle for
    distribution Di at each stage of boosting.

                                                   29
The HS Algorithm (1)
   Input:
       EX( f,D)
       MEM( f )
       D
       s
       ε, > 0
   Output:
       With probability 1 –  :
        h s.t. h is an ε-approximator of f with respect to D.
   Running Time:
       polynomial in n, s, ε-1, log(-1), and L(2nD)
                                                                30
The HS Algorithm (2)
   For WDNF to work, and work efficiently, two
    requirements must be met:
       An oracle for the distribution must be provided for the
        learner.
       The distribution must be polynomially-near uniform.
   We show how to simulate an approximate oracle Di’
    that can be provided to the weak learner instead of
    an exact one.
   We then show that the distributions Di are in fact
    polynomially-near uniform.
                                                                  31
Simulating Di (1)
   Define:



    To provide an exact oracle we would need to
     compute the denominator, which could
     potentially take exponentially long.
    Instead we estimate the value of this
     denominator using AMEAN.
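
    Since the exact form of Di is not reproduced on this slide, the following is
    only a loose sketch of the idea, assuming Di is proportional to some
    nonnegative weight function and that the underlying distribution is uniform;
    all names (make_approx_oracle, weight, draw_uniform) are assumptions.

    import random

    def make_approx_oracle(weight, draw_uniform, n, samples=20000):
        # The boosting distribution is proportional to a nonnegative weight w(x);
        # its exact oracle needs the normalizer E_x[ w(x) ] over all 2^n inputs.
        # Here the normalizer is estimated from random samples (AMEAN-style), so
        # the returned oracle equals the true one up to a constant close to 1.
        z_est = sum(weight(draw_uniform()) for _ in range(samples)) / samples
        def approx_oracle(x):
            return weight(x) / (z_est * 2 ** n)
        return approx_oracle

    # Purely illustrative usage on a toy 4-bit domain.
    n = 4
    draw = lambda: [random.randint(0, 1) for _ in range(n)]
    w = lambda x: 2.0 if x[0] != x[1] else 1.0
    d_i_approx = make_approx_oracle(w, draw, n)
    print(d_i_approx([0, 1, 0, 0]))   # roughly 2 / (1.5 * 16)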
                                                   32
Simulating Di (2)
   .






















                    33
Implications of Using Di’
   Note that:
       gi’ = 2ⁿ·f·Di’ = 2ⁿ·f·ci·Di = ci·gi

    ⟹ Multiplying the distribution oracle by a constant
     multiplies all the Fourier coefficients of gi by that
     same constant.
    ⟹ The relative sizes of the coefficients stay the same.
    ⟹ WDNF will still be able to find the large coefficients.
   The running time is not adversely affected.
                                                          34
Bound on Distributions Di
   It can be shown that for each i:


   Thus Di is bounded by a polynomial in L(D)
    and ε⁻¹.
   ⟹ If D is polynomially-near uniform then Di is
    also polynomially-near uniform.
   ⟹ HS strongly learns DNF with respect to the
    uniform distribution.
                                                     35
Summary
   DNF can be weakly learned with respect to
    polynomially-near uniform distributions using
    the WDNF algorithm.

   The HS algorithm strongly learns DNF with
    respect to the uniform distribution by boosting
    the WDNF weak learner.



                                                  36

				