# An Efficient Membership-Query Algorithm for Learning DNF with Respect to the Uniform Distribution

Jeffrey C. Jackson

Presented by: Eitan Yaakobi, Tamar Aizikowitz
## Presentation Outline

- Introduction
- Algorithms We Use
  - Estimating Expected Values
  - Hypothesis Boosting
  - Finding Weak-Approximating Parity Functions
- Learning DNF with Respect to Uniform
  - Existence of Weak-Approximating Parity Functions for Every f, D
  - Nonuniform Weak DNF Learning
  - Strongly Learning DNF
## Introduction

- DNF is weakly learnable with respect to the uniform distribution, as shown by Kushilevitz and Mansour.
- We show that DNF is weakly learnable with respect to a certain class of nonuniform distributions.
- We then use a method based on Freund's boosting algorithm to produce a strong learner with respect to the uniform distribution.
## Algorithms We Use

- Our learning algorithm makes use of several previous algorithms.
- The next few slides give a short reminder of these algorithms.
## Estimating Expected Values

- The AMEAN algorithm efficiently estimates the expectation of a bounded random variable.
- It is based on Hoeffding's inequality: let X_1, ..., X_m be independent random variables such that for all i, X_i ∈ [a,b] and E[X_i] = μ. Then:

  Pr[ |(1/m)·Σ_i X_i − μ| ≥ λ ] ≤ 2·exp(−2λ²m / (b−a)²)
## The AMEAN Algorithm

- Input:
  - a random variable X ∈ [a,b]
  - the width b − a
  - λ, δ > 0
- Output:
  - μ′ such that Pr[ |E[X] − μ′| ≤ λ ] ≥ 1 − δ
- Running time:
  - O((b−a)²·log(δ⁻¹) / λ²)
- A minimal sketch of this sampling scheme is given below.
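A rough Python rendering, assuming a black-box `sample()` that draws independent values of X; the function name and interface are illustrative, not Jackson's:

```python
import math
import random

def amean(sample, a, b, lam, delta):
    """Estimate E[X] within +/- lam with probability >= 1 - delta.

    `sample` is assumed to draw independent values of X in [a, b].
    The sample size m follows from Hoeffding's inequality:
    Pr[|mean - mu| >= lam] <= 2 exp(-2 m lam^2 / (b - a)^2).
    """
    m = math.ceil((b - a) ** 2 * math.log(2 / delta) / (2 * lam ** 2))
    return sum(sample() for _ in range(m)) / m

# Illustrative use: estimate the mean of a fair coin flip valued in {0, 1}.
est = amean(lambda: random.randint(0, 1), 0, 1, lam=0.05, delta=0.01)
```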
## Hypothesis Boosting

- Our algorithm is based on boosting weak hypotheses into a final strong hypothesis.
- We use a boosting method very similar to Freund's boosting algorithm.
- We refer to Freund's original algorithm as F1.
## The F1 Boosting Algorithm

- Input:
  - positive ε, δ, and γ
  - a (½ − γ)-approximate PAC learner for a representation class 𝒞
  - EX(f,D) for some f ∈ 𝒞 and any distribution D
- Output:
  - an ε-approximation for f with respect to D, with probability at least 1 − δ
- Running time:
  - polynomial in n, s, γ⁻¹, ε⁻¹, and log(δ⁻¹)
## The Idea Behind F1 (1)

- The algorithm generates a series of weak hypotheses h_i.
- h_0 is a weak approximator for f with respect to the distribution D.
- Each subsequent h_i is a weak approximator for f with respect to a modified distribution D_i.
## The Idea Behind F1 (2)

- Each distribution D_i focuses weight on those areas where slightly more than half of the hypotheses already generated were incorrect.
- The final hypothesis h is a majority vote over all the h_i's.
## The Idea Behind F1 (3)

- If a sufficient number of weak hypotheses is generated, then h will be an ε-approximator for f with respect to the distribution D.
- Freund showed that ½·γ⁻²·ln(ε⁻¹) weak hypotheses suffice, as sketched below.
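The sketch below shows the shape of such a booster; `weak_learn` and `make_distribution` are hypothetical stand-ins for the weak learner and for Freund's reweighting, hypotheses are {−1,+1}-valued, and this is not Freund's exact algorithm:

```python
import math

def boost_by_majority(weak_learn, make_distribution, gamma, eps):
    """Schematic outer loop of an F1-style booster.

    `weak_learn(D)` is assumed to return a (1/2 - gamma)-error hypothesis
    with respect to distribution oracle D; `make_distribution(hyps)` is
    assumed to return the reweighted oracle D_i for the next stage.
    Freund's bound: k = (1/2) * gamma**-2 * ln(1/eps) stages suffice.
    """
    k = math.ceil(0.5 * gamma ** -2 * math.log(1 / eps))
    hyps = []
    for _ in range(k):
        hyps.append(weak_learn(make_distribution(hyps)))

    # Final hypothesis: a majority vote over the weak hypotheses.
    def h(x):
        return 1 if sum(hi(x) for hi in hyps) >= 0 else -1
    return h
```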
## Finding Weak-Approximating Parity Functions

- In order to use the boosting algorithm, we need to be able to generate weak approximators for our DNF f with respect to the distributions D_i.
- Our algorithm is based on the Weak Parity algorithm (WP) of Kushilevitz and Mansour.
## The WP Algorithm

- WP finds the large Fourier coefficients of a Boolean function f on {0,1}ⁿ using a membership oracle for f.
- Given θ, δ > 0 and MEM(f), it outputs, with probability at least 1 − δ, every A such that |f̂(A)| ≥ θ, in time polynomial in n, θ⁻¹, and log(δ⁻¹); a sketch of its basic estimation primitive follows.
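WP locates large coefficients by recursively searching prefixes of coefficient indices; the primitive beneath that search is estimating a single coefficient f̂(A) = E_x[f(x)·χ_A(x)] from uniformly random membership queries. A minimal sketch with illustrative names (the recursive search itself is omitted):

```python
import random

def chi(A, x):
    """Parity chi_A(x) = (-1)^(sum of x_j over j in A), valued in {-1, +1}."""
    return -1 if sum(x[j] for j in A) % 2 else 1

def estimate_coefficient(mem_f, A, n, m=10000):
    """Estimate f_hat(A) = E_x[f(x) * chi_A(x)] for a {-1,+1}-valued f
    by drawing uniform x and querying the membership oracle mem_f."""
    total = 0
    for _ in range(m):
        x = [random.randint(0, 1) for _ in range(n)]
        total += mem_f(x) * chi(A, x)
    return total / m
```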
## The WP′ Algorithm (1)

- Our learning algorithm will need to find the large coefficients of a non-Boolean function.
- The basic WP algorithm can be extended to the WP′ algorithm, which works for non-Boolean f as well.
- WP′ gives us a weak approximator for a non-Boolean f with respect to the uniform distribution.
## The WP′ Algorithm (2)

- Input:
  - MEM(f) for f : {0,1}ⁿ → ℝ
  - θ, δ > 0, n, and a bound L∞(f)
- Output:
  - With probability at least 1 − δ, WP′ outputs a set S such that for all A:
    - if |f̂(A)| ≥ θ then A ∈ S, and
    - if A ∈ S then |f̂(A)| ≥ θ/2
- Running time:
  - polynomial in n, θ⁻¹, log(δ⁻¹), and L∞(f)
## Learning DNF with Respect to Uniform

- We now show the main result: DNF is learnable with respect to the uniform distribution.
- We begin by showing that for every DNF f and every distribution D there exists a parity function that weakly approximates f with respect to D.
- We use this to produce an algorithm for weakly learning DNF with respect to certain nonuniform distributions.
- Finally, we show that this weak learner can be boosted into a strong learner with respect to the uniform distribution.
## Existence of Weak-Approximating Parity Functions for Every f, D (1)

- Claim: for every DNF f and every distribution D there exists a parity function that weakly approximates f with respect to D.
- If |E_D[f]| is bounded away from 0, the constant parity χ_∅ ≡ 1 (or its negation) is already a weak approximator.
- The more difficult case is when E_D[f] ≈ 0.
## Existence of Weak-Approximating Parity Functions for Every f, D (2)

- Let f be a DNF such that E_D[f] ≈ 0.
- Let s be the number of terms in f.
- Let T(x) be the {−1,+1}-valued function equivalent to the term in f best correlated with f with respect to D.
## Existence of Weak-Approximating Parity Functions for Every f, D (3)

- T is a term of f ⇒ Pr_D[T(x) = f(x) | f(x) = −1] = 1 (if f is false, every term of f is false).
- There are s terms in f, and T is the one best correlated with f ⇒ Pr_D[T(x) = f(x) | f(x) = 1] ≥ 1/s.
- Since E_D[f] ≈ 0, the two cases carry roughly equal weight:
  ⇒ Pr_D[T(x) = f(x)] ≥ ½(1 + 1/s)
  ⇒ E_D[fT] = 2·Pr_D[T(x) = f(x)] − 1 ≥ 1/s
## Existence of Weak-Approximating Parity Functions for Every f, D (4)

- T can be represented using the Fourier transform: T = Σ_A T̂(A)·χ_A, where Σ_A |T̂(A)| < 3 for any term.
- Therefore 1/s ≤ E_D[fT] = Σ_A T̂(A)·E_D[f·χ_A], so some parity χ_A satisfies |E_D[f·χ_A]| ≥ 1/(3s) = Ω(s⁻¹); a small brute-force check appears below.
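The claim is easy to probe by brute force on toy instances. The sketch below builds a hypothetical 2-term DNF over {0,1}⁴, draws a random distribution D, and prints the best parity correlation alongside the 1/(3s) threshold from the argument above for comparison; the DNF, the distribution, and all names are made up for illustration:

```python
import itertools
import random

n = 4
terms = [{0: 1, 1: 1}, {2: 1, 3: 0}]    # hypothetical DNF: (x0 and x1) or (x2 and not x3)
s = len(terms)

def f(x):
    # {-1,+1}-valued DNF: +1 if some term is satisfied, else -1
    return 1 if any(all(x[i] == v for i, v in t.items()) for t in terms) else -1

def chi(A, x):
    # parity chi_A(x) = (-1)^(sum of x_j over j in A)
    return -1 if sum(x[j] for j in A) % 2 else 1

xs = list(itertools.product([0, 1], repeat=n))
w = [random.random() for _ in xs]
D = [wi / sum(w) for wi in w]            # a random distribution on {0,1}^4

best = max(abs(sum(D[i] * f(x) * chi(A, x) for i, x in enumerate(xs)))
           for r in range(n + 1) for A in itertools.combinations(range(n), r))
print(best, "vs.", 1 / (3 * s))          # best parity correlation vs. 1/(3s)
```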
## Nonuniform Weak DNF Learning (1)

- We have shown that for every DNF f and every distribution D there exists a parity function that is a weak approximator for f with respect to D.
- How can we find such a parity function?
- We want an algorithm that, when given a threshold θ and a distribution D, finds a parity χ_A such that, say, |E_D[f·χ_A]| ≥ θ/2 whenever some parity satisfies |E_D[f·χ_A]| ≥ θ.
## Nonuniform Weak DNF Learning (2)

- Define g(x) = 2ⁿ·f(x)·D(x). Then for every parity χ_A:

  E_D[f·χ_A] = Σ_x D(x)·f(x)·χ_A(x) = E_{x~U}[2ⁿ·D(x)·f(x)·χ_A(x)] = ĝ(A)

- That is, the correlation of χ_A with f under D is exactly the Fourier coefficient ĝ(A) of g under the uniform distribution.
## Nonuniform Weak DNF Learning (3)

- We have reduced the problem of finding a well-correlated parity to finding a large Fourier coefficient of g.
- g is not Boolean, therefore we use WP′.
- Invocation: WP′(n, MEM(g), θ, L∞(g), δ), where the oracle for g is simulated (a one-line wrapper, sketched below) by

  MEM(g)(x) = 2ⁿ · MEM(f)(x) · D(x)
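In code, the simulated oracle is just a wrapper; this sketch assumes callables `mem_f` (the {−1,+1}-valued membership oracle) and `D` (returning the weight D(x)), with illustrative names:

```python
def make_mem_g(mem_f, D, n):
    """Simulated membership oracle for g(x) = 2^n * f(x) * D(x),
    composed from MEM(f) and an oracle for the distribution's weights."""
    def mem_g(x):
        return (2 ** n) * mem_f(x) * D(x)
    return mem_g
```

WP′ could then be run directly on `mem_g` with a threshold of order s⁻¹.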
## The WDNF Algorithm (1)

- We define a new algorithm: Weak DNF (WDNF).
- WDNF finds the large Fourier coefficients of g(x) = 2ⁿ·f(x)·D(x), thereby finding a parity that is well correlated with f with respect to the distribution D.
- WDNF makes use of the WP′ algorithm for finding the Fourier coefficients of the non-Boolean g.
## The WDNF Algorithm (2)

- Proof of existence:
  - Let g(x) = 2ⁿ·f(x)·D(x); then ĝ(A) = E_D[f·χ_A] for every A.
  - By the existence result, some parity satisfies |ĝ(A)| = Ω(s⁻¹), so WP′ run with threshold θ = Ω(s⁻¹) finds one.
- Output, with probability 1 − δ: a parity function h (possibly negated) such that E_D[fh] = Ω(s⁻¹).
- Running time: polynomial in n, s, log(δ⁻¹), and L∞(2ⁿD).
## The WDNF Algorithm (3)

- Input:
  - EX(f,D)
  - MEM(f)
  - an oracle for D
  - δ > 0
- Output:
  - With probability at least 1 − δ: a parity function h (possibly negated) such that E_D[fh] = Ω(s⁻¹)
- Running time:
  - polynomial in n, s, log(δ⁻¹), and L∞(2ⁿD)
## The WDNF Algorithm (4)

- WDNF is polynomial in L∞(g) = L∞(2ⁿD).
- ⇒ If D(x) is at most poly(n, s, ε⁻¹, δ⁻¹)/2ⁿ for every x, then WDNF runs in time polynomial in the normal parameters.
- Such a D is referred to as polynomially-near uniform.
- ⇒ WDNF weakly learns DNF with respect to any polynomially-near uniform distribution D.
## Strongly Learning DNF

- We define the Harmonic Sieve algorithm (HS).
- HS is an application of the F1 boosting algorithm to the WDNF weak learner.
- The main difference between HS and F1 is the need to supply WDNF with an oracle for the distribution D_i at each stage of boosting.
## The HS Algorithm (1)

- Input:
  - EX(f,D)
  - MEM(f)
  - an oracle for D
  - s (the number of terms)
  - ε, δ > 0
- Output:
  - With probability at least 1 − δ: an h such that h is an ε-approximator of f with respect to D
- Running time:
  - polynomial in n, s, ε⁻¹, log(δ⁻¹), and L∞(2ⁿD)
## The HS Algorithm (2)

- For WDNF to work, and work efficiently, two requirements must be met:
  - An oracle for the distribution must be provided to the learner.
  - The distribution must be polynomially-near uniform.
- We show how to simulate an approximate oracle D_i′ that can be provided to the weak learner instead of an exact one.
- We then show that the distributions D_i are in fact polynomially-near uniform.
## Simulating D_i (1)

- In F1, each boosting distribution has the form

  D_i(x) = w_i(x)·D(x) / E_{y~D}[w_i(y)]

  where w_i(x) ∈ [0,1] is a weighting factor computable from h_0(x), ..., h_{i−1}(x).
- ⇒ To provide an exact oracle we would need to compute the denominator E_{y~D}[w_i(y)], which could potentially take exponentially long.
- Instead, we estimate the value of E_{y~D}[w_i(y)] using AMEAN.
## Simulating D_i (2)

- Run AMEAN on samples w_i(y), with y drawn from EX(f,D), to obtain an estimate Ê of E_{y~D}[w_i(y)].
- Define the simulated oracle:

  D_i′(x) = w_i(x)·D(x) / Ê

- Then D_i′ = c_i·D_i, where c_i = E_{y~D}[w_i(y)] / Ê is a constant that, with high probability, is close to 1.
- A sketch of this simulation follows.
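A sketch of the simulation, reusing the `amean` sketch from the AMEAN slide; `weight` stands for the F1 weighting factor w_i (assumed to take values in [0,1]) and `sample_D` for a draw from the example oracle, and all names are illustrative:

```python
def make_D_i_prime(weight, D, sample_D, lam, delta):
    """Simulated oracle D_i'(x) = weight(x) * D(x) / Z_hat, where Z_hat
    is an AMEAN estimate of Z = E_{y~D}[weight(y)].  Because Z_hat is a
    single fixed number, D_i' = (Z / Z_hat) * D_i = c_i * D_i."""
    z_hat = amean(lambda: weight(sample_D()), 0, 1, lam, delta)
    def D_i_prime(x):
        return weight(x) * D(x) / z_hat
    return D_i_prime
```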
## Implications of Using D_i′

- Note that: g_i′ = 2ⁿ·f·D_i′ = 2ⁿ·f·c_i·D_i = c_i·g_i
- ⇒ Multiplying the distribution oracle by a constant is like multiplying all the Fourier coefficients of g_i by the same constant.
- ⇒ The relative sizes of the coefficients stay the same.
- ⇒ WDNF will still be able to find the large coefficients.
- The running time is not adversely affected.
## Bound on the Distributions D_i

- It can be shown that for each i:

  L∞(D_i) ≤ poly(ε⁻¹) · L∞(D)

- Thus D_i is bounded by a polynomial in L∞(D) and ε⁻¹.
- ⇒ If D is polynomially-near uniform, then each D_i is also polynomially-near uniform.
- ⇒ HS strongly learns DNF with respect to the uniform distribution.
## Summary

- DNF can be weakly learned with respect to polynomially-near uniform distributions using the WDNF algorithm.
- The HS algorithm strongly learns DNF with respect to the uniform distribution by boosting the WDNF weak learner.
