The Monotonicity of Information


```
              The Monotonicity of Information in the Central
               Limit Theorem and Entropy Power Inequalities

                         Department of Statistics
                             Yale University
Abstract— We provide a simple proof of the monotonicity of information in the
Central Limit Theorem for i.i.d. summands. Extensions to the more general case
of independent, not identically distributed summands are also presented. New
families of Fisher information and entropy power inequalities are discussed.

                            I. INTRODUCTION

Let X_1, X_2, ..., X_n be independent random variables with densities and
finite variances, and let H denote the (differential) entropy. The classical
entropy power inequality of Shannon [1] and Stam [2] states

    e^{2H(X_1 + \cdots + X_n)} \ge \sum_{j=1}^{n} e^{2H(X_j)}.             (1)

Recently, Artstein, Ball, Barthe and Naor [3] proved a new entropy power
inequality

    e^{2H(X_1 + \cdots + X_n)} \ge \frac{1}{n-1} \sum_{i=1}^{n}
        e^{2H(\sum_{j \ne i} X_j)},                                        (2)

where each term involves the entropy of the sum of n-1 of the variables
excluding the i-th, which is an improvement over (1). Indeed, repeated
application of (2) for a succession of values of n yields not only (1) but
also a whole family of intermediate inequalities

    e^{2H(X_1 + \cdots + X_n)} \ge \frac{1}{\binom{n-1}{m-1}}
        \sum_{s \in \Omega_m} e^{2H(\sum_{j \in s} X_j)},                  (3)

where we write \Omega_m for the collection of all subsets of {1, 2, ..., n} of
size m. Below, we give a simplified and direct proof of (3) (and, in
particular, of (2)), and also show that equality holds if and only if the X_i
are normally distributed (in which case it becomes an identity for sums of
variances).

In fact, all these inequalities are particular cases of a generalized entropy
power inequality, which we develop in [4]. Let S be an arbitrary collection of
subsets of {1, ..., n} and let r = r(S, n) be the maximum number of subsets in
S in which any one index i can appear, for i = 1, ..., n. Then

    e^{2H(X_1 + \cdots + X_n)} \ge \frac{1}{r} \sum_{s \in S}
        e^{2H(\sum_{j \in s} X_j)}.                                        (4)

For example, if S consists of subsets s whose elements are m consecutive
indices in {1, ..., n}, then r = m, whereas if S = \Omega_m, then
r = \binom{n-1}{m-1}. So (4) extends (3). Likewise, for general collections we
have a corresponding inequality for inverse Fisher information. Details of
these results can be found in [4].

These inequalities are relevant for the examination of monotonicity in central
limit theorems. Indeed, if X_1 and X_2 are independent and identically
distributed (i.i.d.), then (1) is equivalent to

    H\left( \frac{X_1 + X_2}{\sqrt{2}} \right) \ge H(X_1).                 (5)

This fact implies that the entropy of the standardized sums
Y_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} X_i increases along the powers-of-2
subsequence, i.e., H(Y_{2^k}) is non-decreasing in k. Characterization of the
increase in entropy in (5) was used in proofs of central limit theorems by
Shimizu [5], Barron [6] and Johnson and Barron [7]. In particular, Barron [6]
showed that the sequence {H(Y_n)} of entropies of the normalized sums
converges to the entropy of the normal; this, incidentally, is equivalent to
the convergence to 0 of the relative entropy (Kullback divergence) from a
normal distribution when the X_i have zero mean.

In 2004, Artstein, Ball, Barthe and Naor [3] (hereafter denoted by ABBN [3])
showed that H(Y_n) is in fact a non-decreasing sequence for every n, solving a
long-standing conjecture. In fact, (2) is equivalent in the i.i.d. case to the
monotonicity property

    H\left( \frac{X_1 + \cdots + X_n}{\sqrt{n}} \right) \ge
    H\left( \frac{X_1 + \cdots + X_{n-1}}{\sqrt{n-1}} \right).             (6)

Note that the presence of the factor n-1 (rather than n) in the denominator of
(2) is crucial for this monotonicity.

Likewise, for sums of independent random variables, our inequality (3) is
equivalent to "monotonicity on average" properties for certain
standardizations; for instance,

    \exp\left\{ 2H\left( \frac{X_1 + \cdots + X_n}{\sqrt{n}} \right) \right\}
    \ge \frac{1}{\binom{n}{m}} \sum_{s \in \Omega_m}
        \exp\left\{ 2H\left( \frac{\sum_{i \in s} X_i}{\sqrt{m}} \right) \right\}.

A similar monotonicity also holds, as we shall show, when the sums are
standardized by their variances. Here again the \binom{n-1}{m-1} rather than
\binom{n}{m} in the denominator of (3) for the unstandardized version is
crucial.

We find that all the above inequalities (including (2)) as well as
corresponding inequalities for Fisher information can be proved by simple
tools. Two of these tools, a convolution identity for score functions and the
relationship between Fisher
information and entropy (discussed in Section II), are familiar in past work
on entropy power inequalities. An additional trick is needed to obtain the
denominators of n-1 and \binom{n-1}{m-1} in (2) and (3) respectively. This is
a simple variance drop inequality for statistics expressible via sums of
functions of m out of n variables, which is familiar in other statistical
contexts (as we shall discuss). It was first used for information inequality
development in ABBN [3]. The variational characterization of Fisher
information that is an essential ingredient of ABBN [3] is not needed in our
proofs.

For clarity of presentation, we find it convenient to first outline the proof
of (6) for i.i.d. random variables. Thus, in Section III, we establish the
monotonicity result (6) in a simple and revealing manner that boils down to
the geometry of projections (conditional expectations). Whereas ABBN [3]
requires that X_1 have a C^2 density for monotonicity of Fisher divergence,
absolute continuity of the density suffices in our approach. Furthermore,
whereas the recent preprint of Shlyakhtenko [8] proves the analogue of the
monotonicity fact for non-commutative or "free" probability theory, his method
implies a proof for the classical case only assuming finiteness of all
moments, while our direct proof requires only finite variance assumptions. Our
proof also reveals in a simple manner the cases of equality in (6) (cf.
Schultz [9]). Although we do not write it out for brevity, the monotonicity of
entropy for standardized sums of d-dimensional random vectors has an identical
proof.

We recall that for a random variable X with density f, the entropy is
H(X) = -E[\log f(X)]. For a differentiable density, the score function is
\rho_X(x) = \frac{\partial}{\partial x} \log f(x), and the Fisher information
is I(X) = E[\rho_X^2(X)]. They are linked by an integral form of the de Bruijn
identity due to Barron [6], which permits certain convolution inequalities for
I to translate into corresponding inequalities for H.

Underlying our inequalities is the demonstration for independent, not
necessarily identically distributed (i.n.i.d.) random variables with
absolutely continuous densities that

    I(X_1 + \cdots + X_n) \le \binom{n-1}{m-1} \sum_{s \in \Omega_m}
        w_s^2 \, I\left( \sum_{i \in s} X_i \right)                        (7)

for any non-negative weights w_s that add to 1 over all subsets
s \subset {1, ..., n} of size m. Optimizing over w yields an inequality for
inverse Fisher information that extends the original inequality of Stam:

    \frac{1}{I(X_1 + \cdots + X_n)} \ge \frac{1}{\binom{n-1}{m-1}}
        \sum_{s \in \Omega_m} \frac{1}{I(\sum_{i \in s} X_i)}.             (8)

Alternatively, using a scaling property of Fisher information to re-express
our core inequality (7), we see that the Fisher information of the sum is
bounded by a convex combination of Fisher informations of scaled partial sums:

    I(X_1 + \cdots + X_n) \le \sum_{s \in \Omega_m} w_s \,
        I\left( \frac{\sum_{i \in s} X_i}
                     {\sqrt{w_s \binom{n-1}{m-1}}} \right).                (9)

This integrates to give an inequality for entropy that is an extension of the
"linear form of the entropy power inequality" developed by Dembo et al. [10].
Specifically, we obtain

    H(X_1 + \cdots + X_n) \ge \sum_{s \in \Omega_m} w_s \,
        H\left( \frac{\sum_{i \in s} X_i}
                     {\sqrt{w_s \binom{n-1}{m-1}}} \right).                (10)

Likewise, using the scaling property of entropy on (10) and optimizing over w
yields our extension of the entropy power inequality

    \exp\{2H(X_1 + \cdots + X_n)\} \ge \frac{1}{\binom{n-1}{m-1}}
        \sum_{s \in \Omega_m} \exp\left\{ 2H\left( \sum_{j \in s} X_j \right) \right\}.

Thus both inverse Fisher information and entropy power satisfy an inequality
of the form

    \binom{n-1}{m-1} \, \psi(X_1 + \cdots + X_n) \ge
        \sum_{s \in \Omega_m} \psi\left( \sum_{i \in s} X_i \right).       (11)

We motivate the form (11) using the following almost trivial fact, which is
proved in the Appendix for the reader's convenience. Let [n] = {1, 2, ..., n}.

Fact I: For arbitrary non-negative numbers {a_i^2 : i \in [n]},

    \sum_{s \in \Omega_m} \sum_{i \in s} a_i^2 =
        \binom{n-1}{m-1} \sum_{i \in [n]} a_i^2,                           (12)

where the first sum on the left is taken over the collection
\Omega_m = {s \subset [n] : |s| = m} of sets containing m indices.

If Fact I is thought of as (m, n)-additivity, then (8) and (3) represent the
(m, n)-superadditivity of inverse Fisher information and entropy power
respectively. In the case of normal random variables, the inverse Fisher
information and the entropy power equal the variance. Thus in that case (8)
and (3) become Fact I with a_i^2 equal to the variance of X_i.

                   II. SCORE FUNCTIONS AND PROJECTIONS

The first tool we need is a projection property of score functions of sums of
independent random variables, which is well-known for smooth densities (cf.
Blachman [11]). For completeness, we give the proof. As shown by Johnson and
Barron [7], it is sufficient that the densities are absolutely continuous; see
[7, Appendix 1] for an explanation of why this is so.

Lemma I: [CONVOLUTION IDENTITY FOR SCORE FUNCTIONS] If V_1 and V_2 are
independent random variables, and V_1 has an absolutely continuous density
with score \rho_{V_1}, then V_1 + V_2 has the score

    \rho_{V_1 + V_2}(v) = E[\rho_{V_1}(V_1) \mid V_1 + V_2 = v].           (13)

Proof: Let f_{V_1} and f_V be the densities of V_1 and V = V_1 + V_2
respectively. Then, either bringing the derivative
inside the integral for the smooth case, or via the more general formalism in
[7],

    f_V'(v) = \frac{\partial}{\partial v} E[f_{V_1}(v - V_2)]
            = E[f_{V_1}'(v - V_2)]
            = E[f_{V_1}(v - V_2) \, \rho_{V_1}(v - V_2)],                  (14)

so that

    \rho_V(v) = \frac{f_V'(v)}{f_V(v)}
              = E\left[ \frac{f_{V_1}(v - V_2)}{f_V(v)} \,
                        \rho_{V_1}(v - V_2) \right]
              = E[\rho_{V_1}(V_1) \mid V_1 + V_2 = v].                     (15)

The second tool we need is a "variance drop lemma", which goes back at least
to Hoeffding's seminal work [12] on U-statistics (see his Theorem 5.2). An
equivalent statement of the variance drop lemma was formulated in ABBN [3]. In
[4], we prove and use a more general result to study the i.n.i.d. case.

First we need to recall a decomposition of functions in L^2(R^n), which is
nothing but the Analysis of Variance (ANOVA) decomposition of a statistic. The
following conventions are useful. [n] is the index set {1, 2, ..., n}. For any
s \subset [n], X_s stands for the collection of random variables
{X_i : i \in s}. For any j \in [n], E_j \psi denotes the conditional
expectation of \psi, given all random variables other than X_j, i.e.,

    E_j \psi(x_1, ..., x_n) =
        E[\psi(X_1, ..., X_n) \mid X_i = x_i \ \forall i \ne j]            (16)

averages out the dependence on the j-th coordinate.

Fact II: [ANOVA DECOMPOSITION] Suppose \psi : R^n \to R satisfies
E\psi^2(X_1, ..., X_n) < \infty, i.e., \psi \in L^2, for independent random
variables X_1, X_2, ..., X_n. For s \subset [n], define the orthogonal linear
subspaces

    H_s = \{ \psi \in L^2 : E_j \psi = \psi \, 1_{\{j \notin s\}}
             \ \forall j \in [n] \}                                        (17)

of functions depending only on the variables indexed by s. Then L^2 is the
orthogonal direct sum of this family of subspaces, i.e., any \psi \in L^2 can
be written in the form

    \psi = \sum_{s \subset [n]} \psi_s,                                    (18)

where \psi_s \in H_s.

Remark: In the language of ANOVA familiar to statisticians,
\psi_\emptyset (the component for the empty set) is the mean;
\psi_{\{1\}}, \psi_{\{2\}}, ..., \psi_{\{n\}} are the main effects;
{\psi_s : |s| = 2} are the pairwise interactions, and so on. Fact II implies
that for any subset s \subset [n], the function \sum_{R \subset s} \psi_R is
the best approximation (in mean square) to \psi that depends only on the
collection X_s of random variables.

Remark: The historical roots of this decomposition lie in the work of von
Mises [13] and Hoeffding [12]. For various refinements and interpretations,
see Kurkjian and Zelen [14], Jacobsen [15], Rubin and Vitale [16], Efron and
Stein [17], and Takemura [18]; these works include applications of such
decompositions to experimental design, linear models, U-statistics, and
jackknife theory. The Appendix contains a brief proof of Fact II for the
convenience of the reader.

We say that a function f : R^d \to R is an additive function if there exist
functions f_i : R \to R such that f(x_1, ..., x_d) = \sum_{i \in [d]} f_i(x_i).

Lemma II: [VARIANCE DROP] Let \psi : R^{n-1} \to R. Suppose, for each
j \in [n], \psi_j = \psi(X_1, ..., X_{j-1}, X_{j+1}, ..., X_n) has mean 0.
Then

    E\left[ \left( \sum_{j=1}^{n} \psi_j \right)^2 \right] \le
        (n-1) \sum_{j \in [n]} E[\psi_j^2].                                (19)

Equality can hold only if \psi is an additive function.

Proof: By the Cauchy-Schwarz inequality, for any V_j,

    \left( \frac{1}{n} \sum_{j \in [n]} V_j \right)^2 \le
        \frac{1}{n} \sum_{j \in [n]} V_j^2,                                (20)

so that

    E\left[ \left( \sum_{j \in [n]} V_j \right)^2 \right] \le
        n \sum_{j \in [n]} E[V_j^2].                                       (21)

Let \bar{E}_s be the operation that produces the component
\bar{E}_s \psi = \psi_s (see the Appendix for a further characterization of
it); then

    E\left[ \left( \sum_{j \in [n]} \psi_j \right)^2 \right]
        = E\left[ \left( \sum_{s \subset [n]} \sum_{j \in [n]}
              \bar{E}_s \psi_j \right)^2 \right]
    \overset{(a)}{=} \sum_{s \subset [n]} E\left[ \left( \sum_{j \notin s}
              \bar{E}_s \psi_j \right)^2 \right]
    \overset{(b)}{\le} \sum_{s \subset [n]} (n-1) \sum_{j \notin s}
              E\left[ (\bar{E}_s \psi_j)^2 \right]
    \overset{(c)}{=} (n-1) \sum_{j \in [n]} E[\psi_j^2].                   (22)

Here, (a) and (c) employ the orthogonal decomposition of Fact II and
Parseval's theorem. The inequality (b) is based on two facts: firstly,
E_j \psi_j = \psi_j since \psi_j is independent of X_j, and hence
\bar{E}_s \psi_j = 0 if j \in s; secondly, we can ignore the empty set in the
outer sum since each \psi_j has mean 0, and therefore {j : j \notin s} in the
inner sum has at most n-1 elements. For equality to hold, \bar{E}_s \psi_j can
only be non-zero when s has exactly 1 element, i.e., each \psi_j must consist
only of main effects and no interactions, so that it must be additive.

                   III. MONOTONICITY IN THE IID CASE

For i.i.d. random variables, inequalities (2) and (3) reduce to the
monotonicity H(Y_n) \ge H(Y_m) for n > m, where

    Y_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} X_i.                           (23)
For clarity of presentation of ideas, we focus first on the i.i.d. case,
beginning with Fisher information.

Proposition I: [MONOTONICITY OF FISHER INFORMATION] For i.i.d. random
variables with absolutely continuous densities,

    I(Y_n) \le I(Y_{n-1}),                                                 (24)

with equality iff X_1 is normal or I(Y_n) = \infty.

Proof: We use the following notational conventions: the (unnormalized) sum is
V_n = \sum_{i \in [n]} X_i, the leave-one-out sum leaving out X_j is
V^{(j)} = \sum_{i \ne j} X_i, and the normalized leave-one-out sum is
Y^{(j)} = \frac{1}{\sqrt{n-1}} \sum_{i \ne j} X_i.

If X' = aX, then \rho_{X'}(X') = \frac{1}{a} \rho_X(X); hence

    \rho_{Y_n}(Y_n) = \sqrt{n} \, \rho_{V_n}(V_n)
    \overset{(a)}{=} \sqrt{n} \, E[\rho_{V^{(j)}}(V^{(j)}) \mid V_n]
    = \sqrt{\frac{n}{n-1}} \, E[\rho_{Y^{(j)}}(Y^{(j)}) \mid V_n]
    \overset{(b)}{=} \frac{1}{\sqrt{n(n-1)}} \sum_{j=1}^{n}
        E[\rho_{Y^{(j)}}(Y^{(j)}) \mid Y_n].                               (25)

Here, (a) follows from application of Lemma I to V_n = V^{(j)} + X_j, keeping
in mind that Y_{n-1} (hence V^{(j)}) has an absolutely continuous density,
while (b) follows from symmetry. Set \rho_j = \rho_{Y^{(j)}}(Y^{(j)}); then we
have

    \rho_{Y_n}(Y_n) = \frac{1}{\sqrt{n(n-1)}} \,
        E\left[ \sum_{j=1}^{n} \rho_j \,\Big|\, Y_n \right].               (26)

Since the length of a vector is not less than the length of its projection
(i.e., by the Cauchy-Schwarz inequality),

    I(Y_n) = E[\rho_{Y_n}(Y_n)^2] \le \frac{1}{n(n-1)} \,
        E\left[ \left( \sum_{j=1}^{n} \rho_j \right)^2 \right].            (27)

Lemma II yields

    E\left[ \left( \sum_{j=1}^{n} \rho_j \right)^2 \right] \le
        (n-1) \sum_{j \in [n]} E[\rho_j^2] = (n-1) \, n \, I(Y_{n-1}),     (28)

which gives the inequality of Proposition I on substitution into (27). The
inequality implied by Lemma II can be tight only if each \rho_j is an additive
function, but we already know that \rho_j is a function of the sum. The only
functions that are both additive and functions of the sum are linear functions
of the sum; hence the two sides of (24) can be finite and equal only if the
score \rho_j is linear, i.e., if all the X_i are normal. It is trivial to
check that X_1 normal or I(Y_n) = \infty imply equality.

We can now prove the monotonicity result for entropy in the i.i.d. case.

Theorem I: Suppose the X_i are i.i.d. random variables with densities. Suppose
X_1 has mean 0 and finite variance, and

    Y_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} X_i.                           (29)

Then

    H(Y_n) \ge H(Y_{n-1}).                                                 (30)

The two sides are finite and equal iff X_1 is normal.

Proof: Recall the integral form of the de Bruijn identity, which is now a
standard method to "lift" results from Fisher divergence to relative entropy.
This identity was first stated in its differential form by Stam [2] (and
attributed by him to de Bruijn), and proved in its integral form by Barron
[6]: if X_t is equal in distribution to X + \sqrt{t} Z, where Z is normally
distributed independent of X, then

    H(X) = \frac{1}{2} \log(2\pi e v) - \frac{1}{2} \int_0^{\infty}
        \left[ I(X_t) - \frac{1}{v+t} \right] dt                           (31)

is valid in the case that the variances of Z and X are both v. This has the
advantage of positivity of the integrand but the disadvantage that it seems to
depend on v. One can use

    \log v = \int_0^{\infty} \left[ \frac{1}{1+t} - \frac{1}{v+t} \right] dt  (32)

to re-express it in the form

    H(X) = \frac{1}{2} \log(2\pi e) - \frac{1}{2} \int_0^{\infty}
        \left[ I(X_t) - \frac{1}{1+t} \right] dt.                          (33)

Combining this with Proposition I, the proof is finished.

                            IV. EXTENSIONS

For the case of independent, non-identically distributed (i.n.i.d.) summands,
we need a general version of the "variance drop" lemma.

Lemma III: [VARIANCE DROP: GENERAL VERSION] Suppose we are given a class of
functions \psi^{(s)} : R^{|s|} \to R for each s \in \Omega_m, with
E\psi^{(s)}(X_1, ..., X_m) = 0 for each s. Let w be any probability
distribution on \Omega_m. Define

    U(X_1, ..., X_n) = \sum_{s \in \Omega_m} w_s \, \psi^{(s)}(X_s),       (34)

where we write \psi^{(s)}(X_s) for a function of X_s. Then

    E[U^2] \le \binom{n-1}{m-1} \sum_{s \in \Omega_m}
        w_s^2 \, E[\psi^{(s)}(X_s)]^2,                                     (35)

and equality can hold only if each \psi^{(s)} is an additive function (in the
sense defined earlier).

Remark: When \psi^{(s)} = \psi (i.e., all the \psi^{(s)} are the same), \psi
is symmetric in its arguments, and w is uniform, then U defined above is a
U-statistic of degree m with symmetric, mean-zero kernel \psi. Lemma III then
becomes the well-known bound for
the variance of a U-statistic shown by Hoeffding [12], namely
E[U^2] \le \frac{m}{n} E[\psi^2].

This gives our core inequality (7).

Proposition II: Let {X_i} be independent random variables with densities and
finite variances. Define

    T_n = \sum_{i \in [n]} X_i  \quad and \quad
    T_m^{(s)} = T^{(s)} = \sum_{i \in s} X_i,                              (36)

where s \in \Omega_m = {s \subset [n] : |s| = m}. Let w be any probability
distribution on \Omega_m. If each T_m^{(s)} has an absolutely continuous
density, then

    I(T_n) \le \binom{n-1}{m-1} \sum_{s \in \Omega_m}
        w_s^2 \, I(T_m^{(s)}),                                             (37)

where w_s = w({s}). Both sides can be finite and equal only if each X_i is
normal.

Proof: In the sequel, for convenience, we abuse notation by using \rho to
denote several different score functions; \rho(Y) always means \rho_Y(Y). For
each s, Lemma I and the fact that T_m^{(s)} has an absolutely continuous
density imply

    \rho(T_n) = E\left[ \rho\left( \sum_{i \in s} X_i \right)
        \Big| \, T_n \right].                                              (38)

Taking a convex combination of these identities gives, for any {w_s} such that
\sum_{s \in \Omega_m} w_s = 1,

    \rho(T_n) = \sum_{s \in \Omega_m} w_s \,
        E\left[ \rho\left( \sum_{i \in s} X_i \right) \Big| \, T_n \right]
      = E\left[ \sum_{s \in \Omega_m} w_s \, \rho(T^{(s)})
        \Big| \, T_n \right].                                              (39)

By applying the Cauchy-Schwarz inequality and Lemma III in succession, we get

    I(T_n) \le E\left[ \left( \sum_{s \in \Omega_m} w_s \,
                   \rho(T^{(s)}) \right)^2 \right]
            \le \binom{n-1}{m-1} \sum_{s \in \Omega_m}
                   E[w_s \, \rho(T^{(s)})]^2
            = \binom{n-1}{m-1} \sum_{s \in \Omega_m}
                   w_s^2 \, I(T^{(s)}).                                    (40)

The application of Lemma III can yield equality only if each \rho(T^{(s)}) is
additive; since the score \rho(T^{(s)}) is already a function of the sum
T^{(s)}, it must in fact be a linear function, so that each X_i must be
normal.

                              APPENDIX

A. Proof of Fact I

Writing \bar{a}_s^2 = \sum_{i \in s} a_i^2, and noting that each index i
belongs to exactly \binom{n-1}{m-1} of the sets in \Omega_m,

    \sum_{s \in \Omega_m} \bar{a}_s^2
        = \sum_{s \in \Omega_m} \sum_{i \in s} a_i^2
        = \sum_{i \in [n]} \sum_{s \ni i, |s| = m} a_i^2
        = \binom{n-1}{m-1} \sum_{i \in [n]} a_i^2.                         (41)

B. Proof of Fact II

Let E_s denote the integrating out of the variables in s, so that
E_j = E_{\{j\}}. Keeping in mind that the order of integrating out independent
variables does not matter (i.e., the E_j are commuting projection operators in
L^2), we can write

    \phi = \prod_{j=1}^{n} [E_j + (I - E_j)] \phi
         = \sum_{s \subset [n]} \prod_{j \notin s} E_j
               \prod_{j \in s} (I - E_j) \phi
         = \sum_{s \subset [n]} \phi_s,                                    (42)

where

    \phi_s = \bar{E}_s \phi \equiv E_{s^c} \prod_{j \in s} (I - E_j) \phi.  (43)

In order to show that the subspaces H_s are orthogonal, observe that for any
distinct s_1 and s_2, there is at least one j such that H_{s_1} is contained
in the image of E_j and H_{s_2} is contained in the image of (I - E_j); hence
every vector in H_{s_1} is orthogonal to every vector in H_{s_2}.

                             REFERENCES

[1]  C. Shannon, "A mathematical theory of communication," Bell System Tech.
     J., vol. 27, pp. 379-423, 623-656, 1948.
[2]  A. Stam, "Some inequalities satisfied by the quantities of information of
     Fisher and Shannon," Information and Control, vol. 2, pp. 101-112, 1959.
[3]  S. Artstein, K. M. Ball, F. Barthe, and A. Naor, "Solution of Shannon's
     problem on the monotonicity of entropy," J. Amer. Math. Soc., vol. 17,
     no. 4, pp. 975-982 (electronic), 2004.
[4]  M. Madiman and A. Barron, "Generalized entropy power inequalities and
     monotonicity properties of information," Submitted, 2006.
[5]  R. Shimizu, "On Fisher's amount of information for location family," in
     Statistical Distributions in Scientific Work, G. et al, Ed. Reidel, 1975,
     vol. 3, pp. 305-312.
[6]  A. Barron, "Entropy and the central limit theorem," Ann. Probab.,
     vol. 14, pp. 336-342, 1986.
[7]  O. Johnson and A. Barron, "Fisher information inequalities and the
     central limit theorem," Probab. Theory Related Fields, vol. 129, no. 3,
     pp. 391-409, 2004.
[8]  D. Shlyakhtenko, "A free analogue of Shannon's problem on monotonicity of
     entropy," Preprint, 2005. [Online]. Available: arxiv:math.OA/0510103
[9]  H. Schultz, "Semicircularity, gaussianity and monotonicity of entropy,"
     Preprint, 2005. [Online]. Available: arxiv:math.OA/0512492
[10] A. Dembo, T. Cover, and J. Thomas, "Information-theoretic inequalities,"
     IEEE Trans. Inform. Theory, vol. 37, no. 6, pp. 1501-1518, 1991.
[11] N. Blachman, "The convolution inequality for entropy powers," IEEE Trans.
     Inform. Theory, vol. IT-11, pp. 267-271, 1965.
[12] W. Hoeffding, "A class of statistics with asymptotically normal
     distribution," Ann. Math. Statist., vol. 19, no. 3, pp. 293-325, 1948.
[13] R. von Mises, "On the asymptotic distribution of differentiable
     statistical functions," Ann. Math. Statist., vol. 18, no. 3,
     pp. 309-348, 1947.
[14] B. Kurkjian and M. Zelen, "A calculus for factorial arrangements," Ann.
     Math. Statist., vol. 33, pp. 600-619, 1962.
[15] R. L. Jacobsen, "Linear algebra and ANOVA," Ph.D. dissertation, Cornell
     University, 1968.
[16] H. Rubin and R. A. Vitale, "Asymptotic distribution of symmetric
     statistics," Ann. Statist., vol. 8, no. 1, pp. 165-170, 1980.
[17] B. Efron and C. Stein, "The jackknife estimate of variance," Ann.
     Statist., vol. 9, no. 3, pp. 586-596, 1981.
[18] A. Takemura, "Tensor analysis of ANOVA decomposition," J. Amer. Statist.
     Assoc., vol. 78, no. 384, pp. 894-900, 1983.

```
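A numerical footnote to the monotonicity statement of Section III (a sketch of ours, not part of the paper): for Exp(1) summands, the sum X_1 + ... + X_n is Gamma(n, 1), so the standardized sum Y_n is a shifted and scaled Gamma variable and H(Y_n) has a closed form via H(Gamma(n, 1)) = n + log Γ(n) + (1 - n)ψ(n) together with H(aX + b) = H(X) + log a. One can then watch H(Y_n) increase toward the Gaussian entropy (1/2) log(2πe), as Theorem I asserts.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def entropy_Yn(n: int) -> float:
    """Differential entropy of Y_n = (X_1 + ... + X_n)/sqrt(n) for X_i ~ Exp(1).

    The sum is Gamma(n, 1), whose entropy is n + lgamma(n) + (1 - n) * psi(n);
    scaling by 1/sqrt(n) subtracts (1/2) log n (shifts do not change entropy).
    For integer n, psi(n) = -EULER_GAMMA + sum_{k=1}^{n-1} 1/k exactly.
    """
    psi_n = -EULER_GAMMA + sum(1.0 / k for k in range(1, n))
    return n + math.lgamma(n) + (1 - n) * psi_n - 0.5 * math.log(n)

ents = [entropy_Yn(n) for n in range(1, 11)]
# Non-decreasing in n, consistent with H(Y_n) >= H(Y_{n-1}) in (30),
# and bounded above by the Gaussian entropy (1/2) log(2*pi*e) ~ 1.4189.
assert all(b >= a for a, b in zip(ents, ents[1:]))
print([round(h, 4) for h in ents])
```

The gaps H(Y_n) - H(Y_{n-1}) shrink as n grows, which matches the convergence of H(Y_n) to the normal entropy noted in the Introduction.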
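The counting identity behind Fact I, namely that each index i belongs to exactly C(n-1, m-1) of the m-element subsets of [n], is easy to sanity-check numerically. The snippet below is an illustrative sketch of ours (not part of the paper); the values playing the role of the a_i^2 are arbitrary.

```python
import random
from itertools import combinations
from math import comb

random.seed(1)
n, m = 7, 3
a2 = [random.random() for _ in range(n)]  # arbitrary non-negative a_i^2

# Left side of (12): sum over all m-subsets s of sum_{i in s} a_i^2.
lhs = sum(sum(a2[i] for i in s) for s in combinations(range(n), m))
# Right side of (12): each index appears in C(n-1, m-1) subsets.
rhs = comb(n - 1, m - 1) * sum(a2)

assert abs(lhs - rhs) < 1e-9
print(round(lhs, 6), round(rhs, 6))
```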