An interpretation of identification entropy

Rudolf Ahlswede and Ning Cai
Abstract— After Ahlswede introduced identification for source coding he discovered identification entropy and demonstrated that it plays a role analogous to that of classical entropy in Shannon's Noiseless Source Coding. We now give further insight into this functional by interpreting its two factors.

Index Terms— Source coding for identification, identification entropy, operational justification

(Both authors are with the University of Bielefeld.)

I. INTRODUCTION

A. Terminology

Identification in Source Coding started in [3]. Then identification entropy was discovered and its operational significance in noiseless source coding was demonstrated in [4]. Familiarity with that paper is helpful, but not necessary here. As far as possible we use its notation.

Differences come from the fact that we now use a q-ary coding alphabet $\mathcal{X} = \{0, 1, \dots, q-1\}$, whereas earlier only the case $q = 2$ was considered and it was only remarked that all results generalize to arbitrary $q$. In particular the identification entropy, abbreviated as ID-entropy, for the source $(\mathcal{U}, P, U)$ has the form

$$H_{I,q}(P) = \frac{q}{q-1}\Bigl(1 - \sum_{u \in \mathcal{U}} P_u^2\Bigr). \tag{1.1}$$

Shannon (1948) has shown that a source $(\mathcal{U}, P, U)$ with output $U$ satisfying $\mathrm{Prob}(U = u) = P_u$ can be encoded in a prefix code $C = \{c_u : u \in \mathcal{U}\} \subset \{0, 1, \dots, q-1\}^*$ such that for the q-ary entropy

$$H_q(P) = \sum_{u \in \mathcal{U}} -P_u \log_q P_u \le \sum_{u \in \mathcal{U}} P_u \|c_u\| \le H_q(P) + 1,$$

where $\|c_u\|$ is the length of $c_u$.

We use a prefix code $C$ for another purpose, namely noiseless identification: every user who wants to know whether a $v$ ($v \in \mathcal{U}$) of his interest is the actual source output or not can consider the RV $C$ with $C = c_u = (c_{u1}, \dots, c_{u\|c_u\|})$ if $U = u$, check whether $C = (C_1, C_2, \dots)$ coincides with $c_v$ in the first, second, etc. letter, and stop when the first different letter occurs or when $C = c_u$. Let $L_C(P, u)$ be the expected number of checkings, if code $C$ is used.

Related quantities are

$$L_C(P) = \max_{v \in \mathcal{U}} L_C(P, v), \tag{1.2}$$

that is, the expected number of checkings for a person in the worst case, if code $C$ is used,

$$L(P) = \min_C L_C(P), \tag{1.3}$$

the expected number of checkings in the worst case for a best code, and finally, if $v$'s are chosen by a RV $V$ independent of $U$ and defined by $\mathrm{Prob}(V = v) = Q_v$ for $v \in \mathcal{V} = \mathcal{U}$, we consider

$$L_C(P, Q) = \sum_{v \in \mathcal{U}} Q_v L_C(P, v), \tag{1.4}$$

the average number of expected checkings, if code $C$ is used, and also

$$L(P, Q) = \min_C L_C(P, Q), \tag{1.5}$$

the average number of expected checkings for a best code.

A natural special case is the mean number of expected checkings

$$\bar{L}_C(P) = \frac{1}{N} \sum_{u=1}^N L_C(P, u), \quad \text{if } \mathcal{U} = [N], \tag{1.6}$$

which equals $L_C(P, Q)$ for $Q = \bigl(\frac{1}{N}, \dots, \frac{1}{N}\bigr)$, and

$$\bar{L}(P) = \min_C \bar{L}_C(P). \tag{1.7}$$

Another special case of some "intuitive appeal" is the case $Q = P$. Here we write

$$L(P, P) = \min_C L_C(P, P). \tag{1.8}$$

It is known that Huffman codes minimize the expected code length for a prefix code (PC). This is not always the case for $L(P)$ and the other quantities in identification.

In this paper an important incentive comes from Theorem 4 of [4]: for $P^N = (2^{-\ell_1}, \dots, 2^{-\ell_N})$, that is, with 2-powers as probabilities, $L(P^N, P^N) = H_I(P^N)$. Here the assumption means that there is a complete prefix code (i.e. equality holds in Kraft's inequality).

B. A terminology involving proper common prefixes

The quantity $L_C(P, Q)$ is defined below also for the case of not necessarily independent $U$ and $V$. It is conveniently described in a terminology involving proper common prefixes.

For an encoding $c: \mathcal{U} \to \mathcal{X}^*$ we define for two words $w, w' \in \mathcal{X}^*$ the quantity $\mathrm{cp}(w, w')$ as the number of proper common prefixes including the empty word, which equals the length of the maximal proper common prefix plus 1.

For example $\mathrm{cp}(11, 000) = 1$, $\mathrm{cp}(0110, 0100) = 3$ and $\mathrm{cp}(1001, 1000) = 4$ (since the proper common prefixes are $\emptyset$, $1$, $10$, $100$).

Now with encoding $c$ for PC $C$ and RV's $U$ and $V$, $\mathrm{cp}(c_U, c_V)$ measures the time steps it takes to decide whether $U$ and $V$ are equal, that is, the checking time or waiting time, which we denote by
$$W_C(U, V) = \mathrm{cp}(c_U, c_V). \tag{1.9}$$

Clearly, we can write the expected waiting time as

$$E W_C(U, V) = E\,\mathrm{cp}(c_U, c_V). \tag{1.10}$$

It is readily verified that for independent $U$, $V$, that is, $\Pr(U = u, V = v) = P_u Q_v$,

$$E W_C(U, V) = L_C(P, Q) = E\,\mathrm{cp}(c_U, c_V). \tag{1.11}$$

We now give another description for $E W_C(U, V)$. For a word $w \in \mathcal{X}^*$ and a code $C$ define the subset of $\mathcal{U}$

$$\mathcal{U}(C, w) = \{u \in \mathcal{U} : c_u \text{ has proper prefix } w\} \tag{1.12}$$

and its indicator function $1_{\mathcal{U}(C,w)}$. Now

$$E\,\mathrm{cp}(c_U, c_V) = \sum_{u,v \in \mathcal{U}} \Pr(U = u, V = v)\,\mathrm{cp}(c_u, c_v) = \sum_{u,v \in \mathcal{U}} \Pr(U = u, V = v) \sum_w 1_{\mathcal{U}(C,w)}(u)\, 1_{\mathcal{U}(C,w)}(v) = \sum_w \Pr\bigl(U \in \mathcal{U}(C, w), V \in \mathcal{U}(C, w)\bigr)$$

and by (1.11)

$$E W_C(U, V) = \sum_w \Pr\bigl(U \in \mathcal{U}(C, w), V \in \mathcal{U}(C, w)\bigr). \tag{1.13}$$

C. A matrix notation

Next we look at the doubly infinite matrix

$$\Lambda = \bigl(\mathrm{cp}(w, w')\bigr)_{w \in \mathcal{X}^*,\, w' \in \mathcal{X}^*} \tag{1.14}$$

and its minor $\Lambda^{(L)}$ labelled by sequences in $\mathcal{X}^{\le L}$.

Henceforth we assume that $U$ and $V$ are independent and have distributions $P$ and $Q$. We can then use (1.11).

For a prefix code $C$, $P$ induces the distribution $P_C$ and $Q$ induces the distribution $Q_C$, where for $u, v \in \mathcal{U}$

$$P_C(c_u) = P_u, \quad Q_C(c_v) = Q_v \tag{1.15}$$

and

$$P_C(x) = Q_C(x) = 0 \quad \text{for } x \in \mathcal{X}^* \setminus C. \tag{1.16}$$

Viewing both $P_C$ and $Q_C$ as row vectors, equation (1.11) can then be written with the corresponding column vector $Q_C^T$ in the form

$$L_C(P, Q) = P_C \Lambda Q_C^T. \tag{1.17}$$

It is clear from (1.10) that a non-complete prefix code, that is, one for which the Kraft sum is smaller than 1, can be improved for identification by shortening a suitable codeword. Hence an optimal ID source code is necessarily complete. In such a code

$$\max_{u \in \mathcal{U}} \|c_u\| \le |\mathcal{U}| - 1 \tag{1.19}$$

and one can replace $\Lambda$ by its submatrix $\Lambda^{(L)}$ for $L = |\mathcal{U}| - 1$. This implies

$$L_C(P, Q) = P_C^{(L)} \Lambda^{(L)} (Q_C^{(L)})^T, \tag{1.20}$$

where $P_C^{(L)}$ and $Q_C^{(L)}$ are the row vectors obtained by deleting the components $y \notin \mathcal{X}^{\le L}$.

Sometimes the expressions (1.17) or (1.20) are more convenient for the investigation of $L_C(P, Q)$. For example it is easy to see that $\Lambda$ and therefore also $\Lambda^{(L)}$ are positive semidefinite. Indeed, let $\Delta$ (resp. $\Delta^{(L)}$) be a matrix whose rows are labelled by sequences in $\mathcal{X}^*$ (resp. $\mathcal{X}^{\le L}$) and whose columns are labelled by sequences in $\mathcal{X}^*$ (resp. $\mathcal{X}^{\le L-1} \cup \{\text{empty sequence}\}$) such that its $(x, y)$-entry is

$$\delta_y^*(x) = \begin{cases} 1 & \text{if } y \text{ is a proper prefix of } x \\ 0 & \text{otherwise.} \end{cases}$$

Then

$$\Delta \Delta^T = \Lambda \quad \text{and} \quad \Delta^{(L)} (\Delta^{(L)})^T = \Lambda^{(L)} \tag{1.21}$$

and hence $\Lambda$ and $\Lambda^{(L)}$ are positive semidefinite. Therefore, by (1.20), $L_C(P, P)$ is $\cup$-convex in $P$.

Furthermore, for sources $(\mathcal{U}, P)$ with $|\mathcal{U}| = 2^k$ and the block code $C = \{0, 1\}^k$ the uniform distribution on $\mathcal{U}$ achieves $\min_P L_C(P, P)$. (A proof is given in the forthcoming Ph.D. thesis "L-identification for sources" written by C. Heup at the Department of Mathematics of the University of Bielefeld.)

Another interesting observation on (1.20) is that, as the $w$-th component of $P_C^{(L)} \Delta^{(L)}$ (resp. $Q_C^{(L)} \Delta^{(L)}$) is $P\bigl(\mathcal{U}(C, w)\bigr)$ (resp. $Q\bigl(\mathcal{U}(C, w)\bigr)$), application of the Cauchy-Schwarz inequality to (1.20) yields

$$\Bigl(P_C^{(L)} \Lambda^{(L)} (Q_C^{(L)})^T\Bigr)^2 \le P_C^{(L)} \Lambda^{(L)} (P_C^{(L)})^T \cdot Q_C^{(L)} \Lambda^{(L)} (Q_C^{(L)})^T \tag{1.22}$$

and equality holds iff for all $w$

$$P\bigl(\mathcal{U}(C, w)\bigr) = Q\bigl(\mathcal{U}(C, w)\bigr).$$

We state this in equivalent form as

Lemma 1:

$$L_C(P, Q)^2 \le L_C(P, P)\,L_C(Q, Q) \tag{1.23}$$

and equality holds iff for all $w$

$$P\bigl(\mathcal{U}(C, w)\bigr) = Q\bigl(\mathcal{U}(C, w)\bigr),$$

which implies $L_C(P, Q) = L_C(P, P) = L_C(Q, Q)$.

This suggests introducing

$$\mu_C(P, Q) = \frac{L_C(P, Q)^2}{L_C(P, P)\,L_C(Q, Q)} \le 1$$

as a measure of similarity of the sources $P$ and $Q$ with respect to the code $C$.

Intuitively we feel that for a good code for source $P$ with user distribution $Q$, $P$ and $Q$ should be very dissimilar, because then the user waits less time until he knows that the output of $U$ is not what he wants.

This idea will be used later for code construction. Actually it is clear even in the general case where $U$ and $V$ are not necessarily independent.

To simplify the discussion we assume here that the alphabet $\mathcal{X}$ is binary, i.e. $q = 2$.
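The quantities above are easy to check numerically. The following sketch (our own illustration; the four-word binary code and the distributions P and Q are hypothetical examples, not taken from the paper) computes cp, evaluates $L_C(P, Q)$ both directly as $E\,\mathrm{cp}(c_U, c_V)$ per (1.11) and via the proper-common-prefix decomposition (1.13), and checks the Cauchy-Schwarz bound of Lemma 1:

```python
def cp(w, wp):
    """cp(w, w'): number of proper common prefixes, including the empty word."""
    m = 0                                   # length of the longest common prefix
    for a, b in zip(w, wp):
        if a != b:
            break
        m += 1
    # proper prefixes are strictly shorter than both words
    return min(m, len(w) - 1, len(wp) - 1) + 1

def L_direct(code, P, Q):
    """L_C(P, Q) = E cp(c_U, c_V) for independent U ~ P, V ~ Q, cf. (1.11)."""
    return sum(P[u] * Q[v] * cp(code[u], code[v]) for u in code for v in code)

def L_prefix(code, P, Q):
    """Same quantity via (1.13): sum over words w of P(U(C,w)) * Q(U(C,w)),
    where U(C, w) = {u : w is a proper prefix of c_u}."""
    words = {c[:k] for c in code.values() for k in range(len(c))}  # incl. empty word
    total = 0.0
    for w in words:
        block = [u for u, c in code.items() if len(c) > len(w) and c.startswith(w)]
        total += sum(P[u] for u in block) * sum(Q[u] for u in block)
    return total

# hypothetical example: a complete binary prefix code and two distributions
code = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
P = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}   # dyadic probabilities
Q = {'a': 0.25, 'b': 0.25, 'c': 0.25, 'd': 0.25}    # uniform user distribution

assert abs(L_direct(code, P, Q) - L_prefix(code, P, Q)) < 1e-12
# Lemma 1: L_C(P,Q)^2 <= L_C(P,P) * L_C(Q,Q), i.e. mu_C(P,Q) <= 1
mu = L_prefix(code, P, Q) ** 2 / (L_prefix(code, P, P) * L_prefix(code, Q, Q))
assert mu <= 1.0
```

For the dyadic $P$ used here the code is complete and $L_C(P, P) = 1.3125 = H_I(P)$, in line with Theorem 4 of [4] quoted in Section I-A; the similarity measure $\mu_C(P, Q)$ stays strictly below 1 because the two induced prefix probabilities differ.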
Then the first bit of a codeword partitions the source $\mathcal{U}$ into two parts $\bar{\mathcal{U}}(i_1)$, $i_1 = 0, 1$, where $\bar{\mathcal{U}}(i_1) = \{u \in \mathcal{U} : c_{u1} = i_1\}$.

By (1.13), to minimize $E W_C(U, V)$ one has to choose a partition such that the $\Pr\bigl(U \in \bar{\mathcal{U}}(i_1), V \in \bar{\mathcal{U}}(i_1)\bigr)$'s are small simultaneously for $i_1 = 0, 1$. To construct a good code one can continue this line: partition $\bar{\mathcal{U}}(i_1)$ into $\bar{\mathcal{U}}(i_1, i_2)$'s such that the $\Pr\bigl(U \in \bar{\mathcal{U}}(i_1, i_2), V \in \bar{\mathcal{U}}(i_1, i_2) \mid U \in \bar{\mathcal{U}}(i_1), V \in \bar{\mathcal{U}}(i_1)\bigr)$'s are as small as possible for $i_1, i_2 = 0, 1$, and so on.

When $U$ and $V$ are independent the requirement for a good code is that the difference between $P\bigl(\bar{\mathcal{U}}(i_1, \dots, i_k)\bigr)$ and $Q\bigl(\bar{\mathcal{U}}(i_1, \dots, i_k)\bigr)$ is large.

We call this the LOCAL UNBALANCE PRINCIPLE, in contrast to the GLOBAL BALANCE PRINCIPLE below.

Another extremal case is that $U$ and $V$ are equal with probability one, and in this case one may never use the unbalance principle. However, in this case identification for the source makes no sense: the user knows that his output definitely comes! But still we can investigate the problem by assuming that $U = V$ with high probability. More specifically, we consider the limit of $E W_C(U_k, V_k)$ for a sequence of random variables $(U_k, V_k)_{k=1}^\infty$ such that $U_k$ converges to $V_k$ in probability. Then it follows from Proposition 1 that $E W_C(U_k, V_k)$ converges to the average length of codewords, the classical object in source coding! In this sense identification for sources is a generalization of source coding (data compression).

One of the discoveries of [4] is that ID-entropy is a lower bound to $L_C(P, P)$. In Section 2 we repeat the original proof, and in Section 3 we give another proof of this fact via two basic tools, Lemma 3 and Lemma 4 for $L_C(P^n, P^n)$, where $P^n$ is the distribution of a memoryless source. It provides a clear information-theoretical meaning of the two factors $\frac{q}{q-1}$ and $\bigl(1 - \sum_{u \in \mathcal{U}} P_u^2\bigr)$ of ID-entropy.

Next we consider in Section 4 sufficient and necessary conditions for a prefix code $C$ to achieve the ID-entropy lower bound for $L_C(P, P)$. Quite surprisingly it turns out that the ID-entropy bound for ID-time is achieved by a variable-length code iff the Shannon entropy bound for the average length of codewords is achieved by the same code (Theorem 2).

Finally we end the paper in Section 5 with a global balance principle to find good codes (Theorem 3).

II. AN OPERATIONAL JUSTIFICATION OF ID-ENTROPY AS LOWER BOUND FOR $L_C(P, P)$

Recall from the Introduction that for $q = 2$

$$H_I(P) = 2\Bigl(1 - \sum_{u=1}^N P_u^2\Bigr) \quad \text{for } P = (P_1, \dots, P_N).$$

We repeat the first main result for $L(P, P)$ from [4]. Central in our derivation is a proof by induction based on a decomposition formula for trees.

Starting from the root, a binary tree $T$ goes via 0 to the subtree $T_0$ and via 1 to the subtree $T_1$, with sets of leaves $\mathcal{U}_0$ and $\mathcal{U}_1$, respectively. A code $C$ for $(\mathcal{U}, P)$ can be viewed as a tree $T$, where $\mathcal{U}_i$ corresponds to the set of codewords $C_i$, $\mathcal{U}_0 \cup \mathcal{U}_1 = \mathcal{U}$.

The leaves are labelled so that $\mathcal{U}_0 = \{1, 2, \dots, N_0\}$ and $\mathcal{U}_1 = \{N_0 + 1, \dots, N_0 + N_1\}$, $N_0 + N_1 = N$. Using the probabilities

$$Q_i = \sum_{u \in \mathcal{U}_i} P_u, \quad i = 0, 1,$$

we can give the decomposition in

Lemma 2: [4] For a code $C$ for $(\mathcal{U}, P^N)$

$$
\begin{aligned}
L_C\bigl((P_1, \dots, P_N), (P_1, \dots, P_N)\bigr)
&= 1 + L_{C_0}\Bigl(\tfrac{P_1}{Q_0}, \dots, \tfrac{P_{N_0}}{Q_0};\ \tfrac{P_1}{Q_0}, \dots, \tfrac{P_{N_0}}{Q_0}\Bigr) Q_0^2 \\
&\quad + L_{C_1}\Bigl(\tfrac{P_{N_0+1}}{Q_1}, \dots, \tfrac{P_{N_0+N_1}}{Q_1};\ \tfrac{P_{N_0+1}}{Q_1}, \dots, \tfrac{P_{N_0+N_1}}{Q_1}\Bigr) Q_1^2.
\end{aligned}
$$

This readily yields

Theorem 1: [4] For every source $(\mathcal{U}, P^N)$

$$L(P^N) \ge L(P^N, P^N) \ge H_I(P^N).$$

Proof: We proceed by induction on $N$. The base case $N = 2$ can be established as follows. For $N = 2$ and any $C$, $L_C(P^2, P^2) \ge P_1 + P_2 = 1$, but

$$H_I(P^2) = 2\bigl(1 - P_1^2 - (1 - P_1)^2\bigr) = 2(2P_1 - 2P_1^2) = 4P_1(1 - P_1) \le 1.$$

For the induction step use, for any code $C$, the decomposition formula in Lemma 2 above and of course the desired inequality for $N_0$ and $N_1$ as induction hypothesis:

$$
\begin{aligned}
L_C\bigl((P_1, \dots, P_N), (P_1, \dots, P_N)\bigr)
&\ge 1 + 2\Bigl(1 - \sum_{u \in \mathcal{U}_0} \Bigl(\frac{P_u}{Q_0}\Bigr)^2\Bigr) Q_0^2 + 2\Bigl(1 - \sum_{u \in \mathcal{U}_1} \Bigl(\frac{P_u}{Q_1}\Bigr)^2\Bigr) Q_1^2 \\
&\ge H_I(Q) + Q_0^2 H_I(P^{(0)}) + Q_1^2 H_I(P^{(1)}) = H_I(P^N),
\end{aligned}
$$

where $Q = (Q_0, Q_1)$, $1 \ge H_I(Q)$, $P^{(i)} = \bigl(\frac{P_u}{Q_i}\bigr)_{u \in \mathcal{U}_i}$, and the grouping identity is used for the equality. This holds for every $C$ and therefore also for $\min_C L_C(P^N)$.

The approach readily extends also to the q-ary case.

III. AN ALTERNATIVE PROOF OF THE ID-ENTROPY LOWER BOUND FOR $L_C(P, P)$

First we establish Lemma 3 below, which holds for the more general case $E W_C(U, V)$. Let $(U^n, V^n)_{n=1}^\infty$ be a discrete memoryless correlated source with generic pair of variables $(U, V)$. Again $U^n$ serves as (random) source and $V^n$ serves as random user. For a given code $C$ for $(U, V)$ let $C^n$ be the code obtained by encoding the components of a sequence $u^n \in \mathcal{U}^n$ iteratively. That is, for all $u^n \in \mathcal{U}^n$

$$c^n_{u^n} = (c_{u_1}, c_{u_2}, \dots, c_{u_n}). \tag{3.1}$$
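The decomposition formula of Lemma 2 and the bound of Theorem 1 can be checked on a small instance. The sketch below (our own example; the code and distribution are hypothetical) computes $L_C(P, P)$ directly as $E\,\mathrm{cp}(c_U, c_V)$ and compares it with the right-hand side of Lemma 2 after splitting at the root:

```python
def cp(w, wp):
    """Number of proper common prefixes of w and w', including the empty word."""
    m = 0
    for a, b in zip(w, wp):
        if a != b:
            break
        m += 1
    return min(m, len(w) - 1, len(wp) - 1) + 1

def L(code, P):
    """L_C(P, P) = E cp(c_U, c_V) for U, V i.i.d. with distribution P."""
    return sum(P[u] * P[v] * cp(code[u], code[v]) for u in code for v in code)

def branch(code, P, bit):
    """Subcode C_i and conditional distribution P^(i) of the leaves below bit i."""
    sub = {u: c[1:] for u, c in code.items() if c[0] == bit}
    Qi = sum(P[u] for u in sub)
    return sub, {u: P[u] / Qi for u in sub}, Qi

code = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}   # hypothetical example
P = {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}

C0, P0, Q0 = branch(code, P, '0')
C1, P1, Q1 = branch(code, P, '1')
lhs = L(code, P)                                   # left-hand side of Lemma 2
rhs = 1 + L(C0, P0) * Q0**2 + L(C1, P1) * Q1**2    # right-hand side of Lemma 2
assert abs(lhs - rhs) < 1e-12

# Theorem 1: L_C(P, P) >= H_I(P) = 2 (1 - sum_u P_u^2)
HI = 2 * (1 - sum(p * p for p in P.values()))
assert lhs >= HI - 1e-12
```

For this dyadic $P$ both sides of Lemma 2 equal $1.3125 = H_I(P)$, so the ID-entropy bound of Theorem 1 is met with equality, as the complete-code case promises.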
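The iterated encoding (3.1) can also be explored numerically. The following sketch (again with a hypothetical toy code; exact enumeration, no sampling) computes $E W_{C^n}(U^n, V^n)$ for independent, identically distributed blocks and exhibits its geometric-series growth in $n$:

```python
from itertools import product

def cp(w, wp):
    """Number of proper common prefixes of w and w', including the empty word."""
    m = 0
    for a, b in zip(w, wp):
        if a != b:
            break
        m += 1
    return min(m, len(w) - 1, len(wp) - 1) + 1

code = {'a': '0', 'b': '10', 'c': '11'}     # hypothetical complete binary code
P = {'a': 0.5, 'b': 0.25, 'c': 0.25}

def EW_iterated(n):
    """Exact E W_{C^n}(U^n, V^n) for i.i.d. blocks, with C^n as in (3.1)."""
    total = 0.0
    for un in product(code, repeat=n):
        for vn in product(code, repeat=n):
            prob = 1.0
            for a, b in zip(un, vn):
                prob *= P[a] * P[b]
            total += prob * cp(''.join(code[a] for a in un),
                               ''.join(code[b] for b in vn))
    return total

EW1 = EW_iterated(1)                  # = E W_C(U, V) = L_C(P, P)
r = sum(p * p for p in P.values())    # Pr(U = V) for independent U, V ~ P
for n in (1, 2, 3):
    geometric = EW1 * sum(r**t for t in range(n))
    assert abs(EW_iterated(n) - geometric) < 1e-12
```

With $r = \Pr(U = V) = \sum_u P_u^2$ the computed values match $E W_C(U, V) \sum_{t=0}^{n-1} r^t$, so as $n \to \infty$ they approach $E W_C(U, V)/(1 - r)$ (here $1.25/0.625 = 2$); this is exactly the behaviour quantified in Lemma 3.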
Lemma 3:

$$E W_{C^n}(U^n, V^n) = E W_C(U, V)\Bigl(1 + \sum_{t=1}^{n-1} \Pr(U^t = V^t)\Bigr) \tag{3.2}$$

and therefore

$$\lim_{n \to \infty} E W_{C^n}(U^n, V^n) = \frac{E W_C(U, V)}{1 - \Pr(U = V)}. \tag{3.3}$$

Proof: Since $\Pr(U^n = V^n) = \prod_{t=1}^n \Pr(U_t = V_t) = \Pr(U = V)^n$, (3.3) follows from (3.2) immediately by the summation formula for geometric series.

To show (3.2) we first define for all $t \ge 2$ the random variables

$$Z_t = \begin{cases} 0 & \text{if } U^{t-1} \ne V^{t-1} \\ 1 & \text{otherwise,} \end{cases} \tag{3.4}$$

and for $t = 1$ we let $Z_1$ be a constant for convenience of notation. Further we let $W_t$ be the waiting time for the random user $V^n$ in the $t$-th block.

Conditional on $Z_t = 1$ it is defined like $W_C(U, V)$ in (1.9), and conditional on $Z_t = 0$ obviously $\Pr(W_t = 0 \mid Z_t = 0) = 1$, because the random user has made his decision before the $t$-th step. Moreover, by the definition of $C^n$,

$$E[W_t \mid Z_t = 1] = E W_C(U, V) \tag{3.5}$$

and consequently

$$E[E(W_t \mid Z_t)] = \begin{cases} \Pr(U^{t-1} = V^{t-1})\,E W_C(U, V) & \text{for } t = 2, 3, \dots, n \\ E W_C(U, V) & \text{for } t = 1, \end{cases} \tag{3.6}$$

where (3.6) holds in case $t = 1$ because the random user has to wait for the first outcome. Therefore it follows that

$$E W_{C^n}(U^n, V^n) = E W^n = \sum_{t=1}^n E W_t = \sum_{t=1}^n E[E(W_t \mid Z_t)] = E W_C(U, V) + \sum_{t=1}^{n-1} \Pr(U^t = V^t)\,E W_C(U, V),$$

as we wanted to show.

Next we consider the case where $U$ and $V$ are independent and identically distributed with distribution $P$, so that

$$\Pr(U^n = u^n, V^n = v^n) = \prod_{t=1}^n P_{u_t} \cdot P_{v_t}. \tag{3.7}$$

More specifically we are looking for a lower bound on $L_C(P^n, P^n)$ for all prefix codes $C$ over $\mathcal{U}^n$.

Lemma 4: For all $\varepsilon > 0$ there exists an $\eta > 0$ such that for

Proof: For given $\varepsilon > 0$ we choose $\delta > 0$ such that for a $\tau > 0$ and sufficiently large $n$, for the familiar sets $\mathcal{T}^n_{P,\delta}$ of typical sequences,

$$P^n(\mathcal{T}^n_{P,\delta}) > 1 - 2^{-n\tau}$$

and for all $u^n \in \mathcal{T}^n_{P,\delta}$

$$P(u^n) < 2^{-n(H(P) - \frac{\varepsilon}{2})}.$$

Since for a prefix code $C$

$$|\{u^n \in \mathcal{U}^n : \|c_{u^n}\| \le L_n\}| \le q^{L_n}, \tag{3.10}$$

$$
\begin{aligned}
\Pr(\|c_{U^n}\| \le L_n) = \Pr(\|c_{V^n}\| \le L_n)
&\le \Pr(V^n \notin \mathcal{T}^n_{P,\delta}) + \Pr(V^n \in \mathcal{T}^n_{P,\delta},\ \|c_{V^n}\| \le L_n) \\
&< 2^{-n\tau} + |\{u^n : \|c_{u^n}\| < L_n\}| \cdot 2^{-n(H(P) - \frac{\varepsilon}{2})} \\
&\le 2^{-n\tau} + q^{L_n}\, 2^{-n(H(P) - \frac{\varepsilon}{2})}.
\end{aligned} \tag{3.11}
$$

However, (3.8) implies that

$$q^{L_n} \le 2^{n(H(P) - \varepsilon)}.$$

This together with (3.11) yields

$$\Pr(\|c_{U^n}\| \le L_n) < 2^{-n\tau} + 2^{-n\frac{\varepsilon}{2}} < 2^{-n\delta} \tag{3.12}$$

for $\delta \le \min\bigl(\frac{\tau}{2}, \frac{\varepsilon}{4}\bigr)$.

Next, for the distribution $P$ and the code $C$ over $\mathcal{U}^n$ we construct a related source $(\tilde{\mathcal{U}}, \tilde{P})$ and a code $\tilde{C}$ over $\tilde{\mathcal{U}}$ as follows.

The new set $\tilde{\mathcal{U}}$ contains $\{u^n \in \mathcal{U}^n : \|c_{u^n}\| \le L_n\}$, and for its elements $\tilde{P}(u^n) = P^n(u^n)$ and the new coding is $\tilde{c}_{u^n} = c_{u^n}$.

Now we define the additional elements in $\tilde{\mathcal{U}}$ with their $\tilde{P}$ and $\tilde{c}$. We partition $\{u^n \in \mathcal{U}^n : \|c_{u^n}\| > L_n\}$ into subsets $S_j$ ($1 \le j \le J$) according to the $L_n$-th prefix, use a letter $g_j$ to represent $S_j$, and put the set $\overset{\approx}{\mathcal{U}} = \{g_j : 1 \le j \le J\}$ into $\tilde{\mathcal{U}}$, so that

$$\tilde{\mathcal{U}} = \{u^n \in \mathcal{U}^n : \|c_{u^n}\| \le L_n\} \cup \overset{\approx}{\mathcal{U}}.$$

Then we define $\tilde{P}(g_j) = \sum_{u^n \in S_j} P(u^n)$ and let $\tilde{c}_{g_j}$ be the common $L_n$-th prefix of the $c_{u^n}$'s for the $u^n$'s in $S_j$. That is, we consider all $u^n$ sharing the same $L_n$-th prefix in $c_{u^n}$ as a single element. Obviously,

$$L_C(P^n, P^n) \ge L_{\tilde{C}}(\tilde{P}, \tilde{P}). \tag{3.13}$$

Finally let $\tilde{U}_n$ and $\tilde{V}_n$ be random variables for the new source and the new random user with distribution $\tilde{P}$, and let $Z$ be a random variable such that

$$Z = 0 \quad \text{if both } \|c_{U^n}\| \text{ and } \|c_{V^n}\| \text{ are larger than } L_n$$
sufficiently large n and all positive integers                                        Z=
                                                                                            1   otherwise.
                Ln = ⌊n H(P ) − ε (log q)−1 ⌋                             (3.8)   Then
for all prefix codes C over U n                                                        ˜ ˜
                                                                                  LC (P , P ) = EW = E(W | Z) ≥ P r(Z = 0)E(W | Z = 0)
                                                                                   ˜
                                                   Ln −1                                                                                       ≈ ≈
                                                                                             = P r( cU n ≥ Ln )P r( cV n ≥ Ln ) · L≈ (P , P )
             LC (P n , P n ) > (1 − 2−nη )                    q −t .      (3.9)                                                    C
                                                       t=0                                                                             (3.14)
where $W$ is the random waiting time, $\tilde{\tilde{P}}$ is the common conditional distribution of $\tilde{U}_n$ given $\tilde{U}_n \in \tilde{\tilde{\mathcal{U}}}$, and of $\tilde{V}_n$ given $\tilde{V}_n \in \tilde{\tilde{\mathcal{U}}}$, i.e. $\tilde{\tilde{P}}(g_j) = \tilde{P}(g_j) / \tilde{P}(\tilde{\tilde{\mathcal{U}}})$ for $g_j \in \tilde{\tilde{\mathcal{U}}}$, and $\tilde{\tilde{C}}$ is the restriction of $\tilde{C}$ to $\tilde{\tilde{\mathcal{U}}}$.

Notice that $\tilde{\tilde{C}}$ is a block code of length $L_n$. In order to bound $L_{\tilde{\tilde{C}}}(\tilde{\tilde{P}}, \tilde{\tilde{P}})$ we extend $\tilde{\tilde{\mathcal{U}}}$, in case of necessity, to a set of cardinality $q^{L_n}$ and assign zero probabilities and codewords of length $L_n$ not in $\tilde{\tilde{C}}$. This little modification obviously does not change the value of $L_{\tilde{\tilde{C}}}(\tilde{\tilde{P}}, \tilde{\tilde{P}})$. Thus, if we denote the uniform distribution over the extended set $\bar{\mathcal{U}}$ by $\bar{P}$, we have
$$L_{\tilde{\tilde{C}}}(\tilde{\tilde{P}}, \tilde{\tilde{P}}) \ge L_{\bar{C}}(\bar{P}, \bar{P}) \qquad (3.15)$$
where $\bar{C}$ is a bijective block code $\bar{\mathcal{U}} \to \mathcal{X}^{L_n}$.

It is clear that $\bar{\mathcal{U}}(\bar{C}, \omega) \ne \emptyset$ iff the length of $\omega$ is at most $L_n - 1$, and $|\bar{\mathcal{U}}(\bar{C}, \omega)| = q^{L_n - \ell}$ if $\|\omega\| = \ell \le L_n - 1$. Then it follows from (1.13) that
$$L_{\bar{C}}(\bar{P}, \bar{P}) = \sum_{t=0}^{L_n - 1} q^t \big[ q^{L_n - t} \cdot q^{-L_n} \big]^2 = \sum_{t=0}^{L_n - 1} q^{-t}. \qquad (3.16)$$
Finally we combine (3.12), (3.13), (3.14), (3.15) and (3.16) and Lemma 4 follows.

An immediate consequence is

Corollary 1:
$$\lim_{n \to \infty} L(P^n, P^n) \ge \sum_{t=0}^{\infty} q^{-t} = \frac{q}{q-1}. \qquad (3.17)$$

Furthermore for independent, identically distributed random variables $U, V$ with distribution $P$ we have
$$\Pr(U = V) = \sum_{u \in \mathcal{U}} P_u^2$$
and from (3.3) and (3.17) follows the ID-entropy bound.

Corollary 2: (See Theorem 2 of [4])
$$L_C(P, P) \ge \frac{q}{q-1} \Big( 1 - \sum_{u \in \mathcal{U}} P_u^2 \Big). \qquad (3.18)$$

This derivation provides a clear information theoretical meaning to the two factors in ID-entropy: $\frac{q}{q-1}$ is a universal lower bound on the ID-waiting time for a discrete memoryless source with an independent user having the same distribution $P$; $\frac{1}{1 - \sum_{u \in \mathcal{U}} P_u^2}$ is the cost paid for coding the source componentwise and is the leaving time for the random user in the following sense.

Let us imagine the following procedure: at a unit of time the random source $U^n$ outputs a symbol $U_t$ and the random user $V^n$, who wants to know whether $U^n = V^n$, checks whether $U_t$ coincides with his own symbol $V_t$. He will end if not. Then the waiting time for him is $\ell$ with probability
$$\Pr(U^{\ell-1} = V^{\ell-1}) \Pr(U_\ell \ne V_\ell) = \Pr(U = V)^{\ell-1} \big( 1 - \Pr(U = V) \big) \quad \text{for } \ell \le n.$$

Letting $n \to \infty$ we obtain a geometric distribution. The expected waiting time is
$$E W = \sum_{\ell=0}^{\infty} \ell \Pr(U = V)^{\ell-1} \big( 1 - \Pr(U = V) \big) = \sum_{\ell=0}^{\infty} (\ell+1) \Pr(U = V)^{\ell} - \sum_{\ell=0}^{\infty} \ell \Pr(U = V)^{\ell} = \sum_{\ell=0}^{\infty} \Pr(U = V)^{\ell} = \frac{1}{1 - \Pr(U = V)} \qquad (3.19)$$
which equals $\frac{1}{1 - \sum_{u} P_u^2}$ in the case of independent, identically distributed random variables.

(Actually (3.2) holds for all stationary sources and we choose a memoryless source for simplicity.) In general (3.3) has the form
$$\lim_{n \to \infty} E W_{C^n}(U^n, V^n) = E W_C(U, V) \cdot \lim_{n \to \infty} \Big( 1 + \sum_{t=1}^{n-1} \Pr(U^t = V^t) \Big). \qquad (3.20)$$

By monotonicity the limit at the right hand side, and therefore also at the left hand side, exists and equals a positive finite or infinite value.

When it is finite one may replace $\Pr(U = V)^{t-1}$, $1 - \Pr(U = V)$ and $\Pr(U = V)^t$ in the first lines of (3.19) by $\Pr(U^{t-1} = V^{t-1})$, $\Pr(U_t \ne V_t \mid U^{t-1} = V^{t-1})$ and $\Pr(U^t = V^t)$, respectively, and obtain
$$\lim_{n \to \infty} \Big( 1 + \sum_{t=1}^{n-1} \Pr(U^t = V^t) \Big) = \sum_{t=0}^{\infty} t \Pr(U^{t-1} = V^{t-1}) \cdot \Pr(U_t \ne V_t \mid U^{t-1} = V^{t-1}) = E L, \qquad (3.21)$$
the expectation of the random leaving time $L$ for a stationary source. Thus (3.20) is rewritten as
$$\lim_{n \to \infty} E W_{C^n}(U^n, V^n) = E W_C(U, V) \cdot E L. \qquad (3.22)$$

Now the information theoretical meaning of (3.22) is quite clear. One encodes a source $(U^n, V^n)_{n=1}^{\infty}$ with alphabet $\mathcal{U}$ component by component by a variable length code $C$. The first factor at the right hand side of (3.22) is the expected waiting time in a block and the second factor is the expected waiting time for different $U_t$ and $V_t$.

IV. SUFFICIENT AND NECESSARY CONDITIONS FOR A PREFIX CODE $C$ TO ACHIEVE THE ID-ENTROPY LOWER BOUND OF $L_C(P, P)$

Quite surprisingly, the ID-entropy bound on the ID-waiting time is achieved by a variable length code iff the Shannon entropy bound on the average length of codewords is achieved by the same code.
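As a numerical plausibility check of this equivalence, the following sketch evaluates both bounds for an assumed complete binary prefix code and two assumed distributions (the code and the numbers are illustrative choices, not taken from the paper): with $P_u = q^{-\|c_u\|}$ both the Shannon bound and the ID-entropy bound are met with equality, while for a second distribution not of this form both are strict.

```python
import math
from itertools import product

# Hypothetical toy setup: binary case q = 2 with a complete prefix code.
q = 2
code = {"a": "0", "b": "10", "c": "11"}

def waiting_time(u, v):
    """Digits the user checks: he stops after the first mismatching digit,
    or after the last digit when both codewords agree."""
    cu, cv = code[u], code[v]
    if cu == cv:
        return len(cu)
    k = 0
    while cu[k] == cv[k]:
        k += 1
    return k + 1

def L(P):  # expected ID-waiting time L_C(P, P) for i.i.d. U, V ~ P
    return sum(P[u] * P[v] * waiting_time(u, v) for u, v in product(P, repeat=2))

def avg_len(P):  # expected codeword length
    return sum(P[u] * len(code[u]) for u in P)

def H(P):  # entropy to base q
    return -sum(p * math.log(p, q) for p in P.values())

def id_bound(P):  # ID-entropy q/(q-1) * (1 - sum of P_u^2)
    return q / (q - 1) * (1 - sum(p * p for p in P.values()))

P1 = {"a": 0.5, "b": 0.25, "c": 0.25}  # P_u = q^{-||c_u||}: both bounds tight
assert abs(avg_len(P1) - H(P1)) < 1e-12
assert abs(L(P1) - id_bound(P1)) < 1e-12

P2 = {"a": 0.5, "b": 0.3, "c": 0.2}    # not of that form: both bounds strict
assert avg_len(P2) > H(P2) + 1e-6
assert L(P2) > id_bound(P2) + 1e-6
```

Here the waiting time counts checked digits exactly as in the procedure described above, so the assertions mirror the claimed "tight together, strict together" behaviour on this one example.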
For the proof we use a simple consequence of the Cauchy-Schwarz inequality, which states for two sequences of real numbers $(a_1, a_2, \ldots, a_k)$ and $(b_1, b_2, \ldots, b_k)$ that
$$\Big( \sum_{i=1}^{k} a_i b_i \Big)^2 \le \Big( \sum_{i=1}^{k} a_i^2 \Big) \Big( \sum_{i=1}^{k} b_i^2 \Big) \qquad (4.1)$$
with equality iff for some constant, say $\gamma$, $a_i = \gamma b_i$ for all $i$ or $b_i = \gamma a_i$ for all $i$. Choosing $b_i = 1$ for all $i$ one has
$$\Big( \sum_{i=1}^{k} a_i \Big)^2 \le k \sum_{i=1}^{k} a_i^2 \qquad (4.2)$$
with equality iff $a_1 = a_2 = \cdots = a_k$.

Theorem 2: Let $C$ be a prefix code. Then the following statements are equivalent:
(i) $\sum_{u \in \mathcal{U}} P_u \|c_u\| = H(P)$
(ii) For all $\omega \in \mathcal{X}^*$ with $\mathcal{U}(C, \omega) \ne \emptyset$
$$P\big(\mathcal{U}(C, \omega)\big) = q^{-\|\omega\|} \qquad (4.3)$$
and for all $u, u' \in \mathcal{U}$ such that $\|c_u\| = \|c_{u'}\|$ and such that $c_u$ and $c_{u'}$ share the same prefix of length $\|c_u\| - 1$,
$$P_u = P_{u'}. \qquad (4.4)$$
(iii)
$$L_C(P, P) = \frac{q}{q-1} \Big( 1 - \sum_{u \in \mathcal{U}} P_u^2 \Big). \qquad (4.5)$$

Proof: It is well-known that (i) is equivalent to
(i') For all $u \in \mathcal{U}$
$$\|c_u\| = -(\log q)^{-1} \log P_u, \quad \text{or} \quad P_u = q^{-\|c_u\|}. \qquad (4.6)$$

Notice that for (i) the code $C$ is necessarily complete. We shall show that
$$(i') \Rightarrow (ii) \Rightarrow (iii) \Rightarrow (i').$$

Ad (i') $\Rightarrow$ (ii): For all $\omega$ with $\mathcal{U}(C, \omega) \ne \emptyset$ the code $C_\omega$, obtained by deleting the common prefix $\omega$ from all the codewords $c_u$, $u \in \mathcal{U}(C, \omega)$, is a complete code on $\mathcal{U}(C, \omega)$, because $C$ is a complete code. That is,
$$\sum_{u \in \mathcal{U}(C, \omega)} q^{-[\|c_u\| - \|\omega\|]} = 1$$
and consequently by (4.6)
$$P\big(\mathcal{U}(C, \omega)\big) = \sum_{u \in \mathcal{U}(C, \omega)} P_u = \sum_{u \in \mathcal{U}(C, \omega)} q^{-\|c_u\|} = q^{-\|\omega\|} \sum_{u \in \mathcal{U}(C, \omega)} q^{-(\|c_u\| - \|\omega\|)} = q^{-\|\omega\|}.$$

Ad (ii) $\Rightarrow$ (iii): Suppose (4.3) holds for all $\omega$; we prove (iii) by induction on $\ell_{\max}(C) = \max_{u \in \mathcal{U}} \|c_u\|$.

In case $\ell_{\max}(C) = 1$ both sides of (4.5) are one. Assume (iii) holds for all codes $C'$ with $\ell_{\max}(C') \le L - 1$ and let $\ell_{\max}(C) = L$. Let $\mathcal{U}_1(C)$ and $\mathcal{U}_{(\alpha)}(C)$ be as in the proof of (1.11) and let $C_{(\alpha)}$ be the prefix code for the source with alphabet $\mathcal{U}_{(\alpha)}(C)$ and distribution $P_{(\alpha)}$ such that for all $u \in \mathcal{U}_{(\alpha)}(C)$ and $\mathcal{X}' = \{c_u : u \in \mathcal{U}_1(C)\}$
$$P_{(\alpha)}(u) = P^{-1}\big(\mathcal{U}_{(\alpha)}(C)\big) P_u.$$
Then (4.3) and (4.4) imply that (ii) holds for all $C_{(\alpha)}$, $\alpha \in \mathcal{X} \setminus \mathcal{X}'$, and for all $\beta \in \mathcal{U}_1(C)$
$$P_\beta = |\mathcal{U}_1(C)|^{-1} P\big(\mathcal{U}_1(C)\big). \qquad (4.7)$$

Next we apply (4.3) to all $\omega$ with $\mathcal{U}(C, \omega) \ne \emptyset$ and $\|\omega\| = 1$ and obtain
$$\Pr\big( U \notin \mathcal{U}_1(C) \big) = \big( q - |\mathcal{U}_1(C)| \big) q^{-1}, \qquad (4.8)$$
which with (4.7) yields for all $\beta \in \mathcal{U}_1(C)$
$$P_\beta = q^{-1}. \qquad (4.9)$$

Moreover, by the induction hypothesis for all $C_{(\alpha)}$ and $P_{(\alpha)}$, $\alpha \in \mathcal{X} \setminus \mathcal{X}'$,
$$L_{C_{(\alpha)}}(P_{(\alpha)}, P_{(\alpha)}) = \frac{q}{q-1} \Big( 1 - q^2 \sum_{u \in \mathcal{U}_{(\alpha)}(C)} P_u^2 \Big) \qquad (4.10)$$
as by (4.3)
$$P\big(\mathcal{U}_{(\alpha)}(C)\big) = q^{-1} \qquad (4.11)$$
for all $\alpha \in \mathcal{X}^\Delta = \mathcal{X} \setminus \{c_u : u \in \mathcal{U}_1(C)\}$ (say).

Finally, like in the proof of (1.11) we have
$$L_C(P, P) = 1 + \sum_{\alpha \in \mathcal{X}^\Delta} P^2\big(\mathcal{U}_{(\alpha)}(C)\big) L_{C_{(\alpha)}}(P_{(\alpha)}, P_{(\alpha)}) = 1 + \sum_{\alpha \in \mathcal{X}^\Delta} \frac{1}{q(q-1)} \Big( 1 - q^2 \sum_{u \in \mathcal{U}_{(\alpha)}(C)} P_u^2 \Big)$$
$$= 1 + \frac{|\mathcal{X}^\Delta|}{q(q-1)} - \frac{q}{q-1} \sum_{u \notin \mathcal{U}_1(C)} P_u^2 = 1 + \frac{q - |\mathcal{U}_1(C)|}{q(q-1)} - \frac{q}{q-1} \sum_{u \in \mathcal{U}} P_u^2 + \frac{q}{q-1} |\mathcal{U}_1(C)| q^{-2} = \frac{q}{q-1} \Big( 1 - \sum_{u \in \mathcal{U}} P_u^2 \Big),$$
that is (4.5), where the second equality holds by (4.10), the third equality holds because $\{\mathcal{U}_1(C)\} \cup \{\mathcal{U}_{(\alpha)}(C) : \alpha \in \mathcal{X}^\Delta\}$ is a partition of $\mathcal{U}$, and the fourth equality follows from (4.9) and the definition of $\mathcal{X}^\Delta$.

Ad (iii) $\Rightarrow$ (i'): Again we proceed by induction on the maximum length of codewords.

Suppose first that for a code $C$, $\ell_{\max}(C) = 1$. Then $L_C(P, P) = 1$ and $|\mathcal{U}| \le q$. Applying (4.2) to the ID-entropy we get
$$\frac{q}{q-1} \Big( 1 - \sum_{u \in \mathcal{U}} P_u^2 \Big) \le \frac{q}{q-1} \big( 1 - |\mathcal{U}|^{-1} \big)$$
with equality iff $P$ is the uniform distribution. On the other hand, since $|\mathcal{U}| \le q$, $\frac{q}{q-1}(1 - |\mathcal{U}|^{-1}) \le \frac{q}{q-1}\big(1 - \frac{1}{q}\big) = 1$ and the equality holds iff $|\mathcal{U}| = q$. Then (4.5) holds iff $P$ is uniform and $|\mathcal{U}| = q$, i.e. (4.6).
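The base case of (iii) $\Rightarrow$ (i') can be verified directly; the sketch below (alphabet size and distributions are assumed toy values, not from the paper) confirms that for $\ell_{\max}(C) = 1$ the ID-entropy $\frac{q}{q-1}\big(1 - \sum_u P_u^2\big)$ never exceeds the waiting time $L_C(P, P) = 1$, with equality exactly for the uniform distribution on $q$ symbols.

```python
# Base case l_max(C) = 1: every codeword is a single q-ary digit, so
# L_C(P, P) = 1, while the ID-entropy q/(q-1) * (1 - sum of P_u^2) is
# at most 1, with equality iff P is uniform on exactly q symbols.
q = 3  # assumed ternary alphabet

def id_entropy(P):
    """ID-entropy of a distribution P, given as a list of probabilities."""
    return q / (q - 1) * (1 - sum(p * p for p in P))

uniform = [1 / q] * q                 # uniform on q symbols: equality
assert abs(id_entropy(uniform) - 1.0) < 1e-12

skewed = [0.6, 0.3, 0.1]              # |U| = q but P not uniform: strict
assert id_entropy(skewed) < 1.0

smaller = [0.5, 0.5]                  # uniform but |U| = 2 < q: strict
assert id_entropy(smaller) < 1.0
```

This is exactly the two-step estimate used above: (4.2) handles non-uniform $P$, and $|\mathcal{U}| \le q$ handles alphabets smaller than $q$.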
Assume now that the implication (iii) $\Rightarrow$ (i') holds for all codes with maximum lengths $\le L - 1$ and that $C$ is a prefix code of maximum length $\ell_{\max}(C) = L$.

Without loss of generality we can assume that $C$ is complete, because otherwise we can add "dummy" symbols with 0 probability to $\mathcal{U}$ and assign to them suitable codewords so that the Kraft sum equals 1, but this does not change equality (4.5).

Having completeness we can assume that for a $k \le q^{L-1}$ there are $kq$ symbols $u(i, j)$ $(1 \le i \le k,\ 0 \le j \le q-1)$ in $\mathcal{U}$ with $\|c_{u(i,j)}\| = L$ and such that $c_{u(i,0)}, c_{u(i,1)}, \ldots, c_{u(i,q-1)}$ share a prefix $\omega_i$ of length $L - 1$ for $i = 1, 2, \ldots, k$.

Let $u(1), \ldots, u(k)$ be $k$ "new symbols" not in the original $\mathcal{U}$ and consider
$$\mathcal{U}' = \mathcal{U} \setminus \{u(i, j) : 1 \le i \le k,\ 0 \le j \le q-1\} \cup \{u(i) : 1 \le i \le k\}$$
and the probability distribution $P'$ defined by
$$P'_{u'} = \begin{cases} P_{u'} & \text{if } u' \in \mathcal{U} \cap \mathcal{U}' \\ \sum_{j=0}^{q-1} P_{u(i,j)} & \text{if } u' = u(i) \text{ for some } i. \end{cases} \qquad (4.12)$$

Next we define a prefix code $C'$ for the source $(\mathcal{U}', P')$ by using $C$ as follows:
$$c'_{u'} = \begin{cases} c_{u'} & \text{if } u' \in \mathcal{U} \cap \mathcal{U}' \\ \omega_i & \text{if } u' = u(i) \text{ for some } i. \end{cases} \qquad (4.13)$$
Then for $u' \in \mathcal{U} \cap \mathcal{U}'$, $\|c'_{u'}\| = \|c_{u'}\|$, and $\|c'_{u(1)}\| = \|c'_{u(2)}\| = \cdots = \|c'_{u(k)}\| = L - 1$.

Therefore by induction hypothesis
$$L_{C'}(P', P') \ge \frac{q}{q-1} \Big( 1 - \sum_{u' \in \mathcal{U}'} P'^2_{u'} \Big) \qquad (4.14)$$
and equality holds iff $P_u = q^{-\|c_u\|}$ for $u \in \mathcal{U} \cap \mathcal{U}'$ and $\sum_{j=0}^{q-1} P_{u(i,j)} = P'_{u(i)} = q^{-(L-1)}$ for $i = 1, 2, \ldots, k$. Furthermore, it follows from (4.2) and the definition of $L_C(P, P)$ and $L_{C'}(P', P')$ that
$$L_C(P, P) = L_{C'}(P', P') + \sum_{i=1}^{k} \Big( \sum_{j=0}^{q-1} P_{u(i,j)} \Big)^2 = L_{C'}(P', P') + \sum_{i=1}^{k} P'^2_{u(i)} \ge \frac{q}{q-1} \Big( 1 - \sum_{u' \in \mathcal{U}'} P'^2_{u'} \Big) + \sum_{i=1}^{k} P'^2_{u(i)} \ge \frac{q}{q-1} \Big( 1 - \sum_{u \in \mathcal{U}} P_u^2 \Big). \qquad (4.15)$$

By (4.13) the first inequality holds with equality iff $P_u = q^{-\|c_u\|}$ for $u \in \mathcal{U} \cap \mathcal{U}'$ and $\sum_{j=0}^{q-1} P_{u(i,j)} = q^{-(L-1)}$ for $i = 1, 2, \ldots, k$; it follows from (4.2) that the last inequality holds, with equality iff
$$P_{u(i,0)} = P_{u(i,1)} = \cdots = P_{u(i,q-1)} \quad \text{for } i = 1, 2, \ldots, k.$$
In order to have
$$L_C(P, P) = \frac{q}{q-1} \Big( 1 - \sum_{u \in \mathcal{U}} P_u^2 \Big)$$
the two inequalities in (4.15) must be equalities. However, this is equivalent with (4.6), i.e. (i').

V. A GLOBAL BALANCE PRINCIPLE TO FIND GOOD CODES

In case $U$ and $V$ are independent and identically distributed there is no gain in using the local unbalance principle (LUP). But in this case Corollary 1 and (4.2) provide a way to find a good code. We first rewrite Corollary 1 as
$$E W_C(U, V) = \sum_{n} \sum_{\omega \in \mathcal{X}^n} \Pr\big( U \in \mathcal{U}(C, \omega),\ V \in \mathcal{U}(C, \omega) \big).$$
By the assumptions on $U$ and $V$ with their distribution $P$
$$L_C(P, P) = \sum_{n} \sum_{\omega \in \mathcal{X}^n} P^2\big(\mathcal{U}(C, \omega)\big). \qquad (5.1)$$

Notice that in case $P_{n,C} = \sum_{\omega \in \mathcal{X}^n} P\big(\mathcal{U}(C, \omega)\big)$ is a constant, $\sum_{\omega \in \mathcal{X}^n} P^2\big(\mathcal{U}(C, \omega)\big)$ is minimized by choosing the $P(\mathcal{U}(C, \omega))$'s uniformly. This gives us a global balance principle (GBP) for finding good codes.

We shall see the roles of both the LUP and the GBP in the proof of the following coding theorem for DMS's.

Theorem 3: For a DMS $(U^n, V^n)_{n=1}^{\infty}$ with generic distribution $P_{UV} = PQ$, i.e. the generic random variables $U$ and $V$ are independent and $P_U = P$, $P_V = Q$,
$$\lim_{n \to \infty} L(P^n, Q^n) = \begin{cases} 1 & \text{if } P \ne Q \\ \frac{q}{q-1} & \text{if } P = Q. \end{cases} \qquad (5.2)$$

Proof: Trivially $L_C(P, Q) \ge 1$ and by Corollary 2 $\frac{q}{q-1}$ is a lower bound to $\lim_{n \to \infty} L(P^n, P^n)$. Hence we only have to construct codes to achieve asymptotically the bounds in (5.2).

Case $P \ne Q$: We choose a $\delta > 0$ so that for sufficiently large $n$
$$\mathcal{T}^n_{P,\delta} \cap \mathcal{T}^n_{Q,\delta} = \emptyset \qquad (5.3)$$
and for a $\theta > 0$
                                                             k
          q                              2                              q               ′2
       =              1−                Pu           +              1−                 Pu(i)                 n                      n
                                                                                                         P (TP,δ ) > 1 − 2nθ and Q(TQ,δ ) > 1 − 2nθ .               (5.4)
         q−1                                              i=1
                                                                       q−1
                           u∈U ∩U ′
                                                                                     2       Partition U n into two parts U0 and U1 such that U0 ⊃ TP,δ  n
                                                      k                  q−1                                  n
            q                           2
                                                                                                 and U1 ⊃ TQ,δ .
       =       1 −                     Pu −                 q −1             Pu(i,j)  
                                                                                          
           q−1                                                                                     To simplify matters we assume q = 2. This does not loose
                           u∈U ∩U ′                  i=1                 j=0
                                                                                                 generality since enlarging the alphabet cannot make things
            q                                                                                    worse.
                                   2
       ≥       1−                 Pu .                                                  (4.15)     Let ℓi = ⌈log |Ui |⌉ and ψi : Ui → 2[ℓi ] for i = 1, 2. Then
           q−1
                           u∈U                                                                   we define a code C by cun = i, ψi (un ) if un ∈ Ui and show
                                                                                                                                                                                8



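Two elementary facts drive the chain ending at (4.15): by Cauchy-Schwarz, $q^{-1}\bigl(\sum_{j} P_{u(i,j)}\bigr)^2 \le \sum_{j} P_{u(i,j)}^2$ with equality iff the $q$ sibling probabilities are equal, and consequently merging $q$ sibling leaves into their parent can only decrease the functional $\frac{q}{q-1}\bigl(1-\sum_u P_u^2\bigr)$. A minimal numerical sketch (our own illustration, not code from the paper; the helper names `id_entropy` and `merge_last_block` are ours):

```python
import random

def id_entropy(p, q):
    # identification entropy functional  q/(q-1) * (1 - sum_u P_u^2)
    return q / (q - 1) * (1.0 - sum(x * x for x in p))

def merge_last_block(p, q):
    # replace the last q probabilities (siblings) by their sum (parent)
    return p[:-q] + [sum(p[-q:])]

random.seed(0)
q = 3
w = [random.random() for _ in range(4 * q)]
p = [x / sum(w) for x in w]          # random distribution on 12 leaves

siblings = p[-q:]
# Cauchy-Schwarz step used for (4.15):
#   q^{-1} (sum_j P_{u(i,j)})^2  <=  sum_j P_{u(i,j)}^2
assert (sum(siblings) ** 2) / q <= sum(x * x for x in siblings) + 1e-12

# equality holds exactly when the q sibling probabilities coincide
eq = [0.1] * q
assert abs((sum(eq) ** 2) / q - sum(x * x for x in eq)) < 1e-12

# hence the functional can only decrease when siblings are merged
assert id_entropy(merge_last_block(p, q), q) <= id_entropy(p, q) + 1e-12
```

The last assertion is the monotonicity that makes $\frac{q}{q-1}\bigl(1-\sum_u P_u^2\bigr)$ a lower bound along any refinement of the code tree.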
Actually it immediately follows from Proposition 1 that
\begin{align*}
L_C(P^n,Q^n) &= \sum_{u^n,u'^n\in\mathcal U^n} P^n(c_{u^n})\,Q^n(c_{u'^n})\,cp(c_{u^n},c_{u'^n})\\
&= \sum_{u^n\in\mathcal U_0}\sum_{u'^n\in\mathcal U_0} P^n(c_{u^n})\,Q^n(c_{u'^n})\,cp(c_{u^n},c_{u'^n})
+ \sum_{u^n\in\mathcal U_0}\sum_{u'^n\in\mathcal U_1} P^n(c_{u^n})\,Q^n(c_{u'^n})\,cp(c_{u^n},c_{u'^n})\\
&\quad+ \sum_{u^n\in\mathcal U_1}\sum_{u'^n\in\mathcal U_0} P^n(c_{u^n})\,Q^n(c_{u'^n})\,cp(c_{u^n},c_{u'^n})
+ \sum_{u^n\in\mathcal U_1}\sum_{u'^n\in\mathcal U_1} P^n(c_{u^n})\,Q^n(c_{u'^n})\,cp(c_{u^n},c_{u'^n})\\
&< \ell_0 \sum_{u^n\in\mathcal U_0} P^n(c_{u^n})\sum_{u'^n\in\mathcal U_0} Q^n(c_{u'^n})
+ \sum_{u^n\in\mathcal U_0} P^n(c_{u^n})\sum_{u'^n\in\mathcal U_1} Q^n(c_{u'^n})\\
&\quad+ \sum_{u^n\in\mathcal U_1} P^n(c_{u^n})\sum_{u'^n\in\mathcal U_0} Q^n(c_{u'^n})
+ \ell_1 \sum_{u^n\in\mathcal U_1} P^n(c_{u^n})\sum_{u'^n\in\mathcal U_1} Q^n(c_{u'^n})\\
&\le \sum_{u^n\in\mathcal U_0} P^n(c_{u^n})\sum_{u'^n\in\mathcal U_1} Q^n(c_{u'^n})
+ \sum_{u^n\in\mathcal U_1} P^n(c_{u^n})\sum_{u'^n\in\mathcal U_0} Q^n(c_{u'^n})\\
&\quad+ \lceil n\log|\mathcal U|\rceil\Bigl[\sum_{u^n\in\mathcal U_0} P^n(c_{u^n})\sum_{u'^n\in\mathcal U_0} Q^n(c_{u'^n})
+ \sum_{u^n\in\mathcal U_1} P^n(c_{u^n})\sum_{u'^n\in\mathcal U_1} Q^n(c_{u'^n})\Bigr]\\
&\le 1 + \lceil n\log|\mathcal U|\rceil\Bigl[\sum_{u'^n\in\mathcal U_0} Q^n(c_{u'^n}) + \sum_{u^n\in\mathcal U_1} P^n(c_{u^n})\Bigr]
\end{align*}
and therefore
\[
L_C(P^n,Q^n) < 1+\lceil n\log|\mathcal U|\rceil\,2^{-n\theta+1} \to 1 \quad\text{as } n\to\infty, \tag{5.5}
\]
where the second inequality holds because
\[
\ell_i = \lceil\log|\mathcal U_i|\rceil \le \lceil\log|\mathcal U^n|\rceil = \lceil n\log|\mathcal U|\rceil \quad\text{for } i = 0,1
\]
and the last inequality follows from (5.4).

Case $P = Q$: Now we let $P = Q$. For $0 < \alpha < H(P)$ let $\mathcal P_n(>\alpha)$ be the set of $n$-types ($n$-empirical distributions) $\tilde P$ on $\mathcal U$ with $|T^n_{\tilde P}| > 2^{n\alpha}$. Then there is a positive $\theta$ such that the empirical distribution of the output of $U^n$ (resp. $V^n$) is in $\mathcal P_n(>\alpha)$ with probability larger than $1-2^{-n\theta}$.

Next we choose an integer $\ell_n$ such that for $\beta \triangleq \frac14\min(\theta,\alpha)$
\[
2^{\frac{n\beta}{2}} < q^{\ell_n} \le 2^{n\beta}. \tag{5.6}
\]
Label the sequences in $T^n_{\tilde P}$ for $\tilde P \in \mathcal P_n(>\alpha)$ by $0,1,\dots,|T^n_{\tilde P}|-1$ and let $\Psi_1$ be a mapping from $\mathcal U^n$ to $\mathcal X^{\ell_n}$, where $\mathcal X = \{0,1,\dots,q-1\}$, as follows.

If $u^n$ has type $\tilde P$ in $\mathcal P_n(>\alpha)$ and got an index $ind(u^n)$ with $q$-ary representation $(x_k,x_{k-1},\dots,x_2,x_1)$, i.e. $ind(u^n) = \sum_{i=1}^{k} x_i q^{i-1}$ for $0\le x_i\le q-1$, $k = \lceil\log|T^n_{\tilde P}|\rceil$, then let
\[
\Psi_1(u^n) = (x_1,x_2,\dots,x_{\ell_n}). \tag{5.7}
\]
If the type of $u^n$ is not in $\mathcal P_n(>\alpha)$, we arbitrarily choose a sequence in $\mathcal X^{\ell_n}$ as $\Psi_1(u^n)$.

For any fixed $t \le \ell_n$, $\tilde P \in \mathcal P_n(>\alpha)$, and $x^t \in \mathcal X^t$ let $\mathcal U(\tilde P,x^t)$ be the set of sequences $u^n$ in $T^n_{\tilde P}$ such that $x^t$ is a prefix of $\Psi_1(u^n)$. Then it is not hard to see that for all $x^t, x'^t$ with $t\le\ell_n$
\[
\bigl|\,|\mathcal U(\tilde P,x^t)| - |\mathcal U(\tilde P,x'^t)|\,\bigr| \le 1.
\]
More specifically, for all $t\le\ell_n$ and $x^t\in\mathcal X^t$
\[
|\mathcal U(\tilde P,x^t)| = \sum_{j=t+1}^{k} a_j q^{j-1-t} \quad\text{or}\quad \sum_{j=t+1}^{k} a_j q^{j-1-t} + 1,
\]
if $|T^n_{\tilde P}| = \sum_{j=1}^{k} a_j q^{j-1}$ with $a_k \ne 0$ and $0\le a_j\le q-1$ for $j = 1,2,\dots,k-1$.

Let $\mathcal U(x^t) = \bigcup_{\tilde P}\mathcal U(\tilde P,x^t)$ (here it does not matter whether the union runs over all $\tilde P\in\mathcal P_n(>\alpha)$ or not).

Thus we partition $\mathcal U^n$ into $q^t$ parts as $\{\mathcal U(x^t) : x^t\in\mathcal X^t\}$ for $t\le\ell_n$.

By the AEP (the asymptotic equipartition property) the difference between the conditional probability of the event that the output of $U^n$ is in $\mathcal U(x^t)$, given that the type of $U^n$ is in $\mathcal P_n(>\alpha)$, and $q^{-t}$ is not larger than
\[
\min_{\tilde P\in\mathcal P_n(>\alpha)} |T^n_{\tilde P}|^{-1} < 2^{-n\alpha}.
\]
Recalling that with probability $1-2^{-n\theta}$ $U^n$ has type in $\mathcal P_n(>\alpha)$, and the assumption that $V^n$ has the same distribution as $U^n$, we obtain that
\[
Pr\bigl(U^n\in\mathcal U(x^t)\bigr) = Pr\bigl(V^n\in\mathcal U(x^t)\bigr) = P^n\bigl(\mathcal U(x^t)\bigr)
\]
and for all $x^t\in\mathcal X^t$
\[
(1-2^{-n\theta})(q^{-t}-2^{-n\alpha}) \le P^n\bigl(\mathcal U(x^t)\bigr) \le (1-2^{-n\theta})(q^{-t}+2^{-n\alpha}) + 2^{-n\theta},
\]
which implies that for all $x^t\in\mathcal X^t$
\[
\bigl|P^n\bigl(\mathcal U(x^t)\bigr) - q^{-t}\bigr| \le 2^{-n\theta} + 2^{-n\alpha} < 2^{-2n\beta}, \tag{5.8}
\]
when $\beta \triangleq \frac14\min(\theta,\alpha)$.

Recall that $\Psi_1$ is a function from $\mathcal U^n$ to $\mathcal X^{\ell_n}$ and that, by the definition of $\mathcal U(x^t)$, $\mathcal U(x^{\ell_n})$ is actually the inverse image of $x^{\ell_n}$ under $\Psi_1$, i.e. $\mathcal U(x^{\ell_n}) = \Psi_1^{-1}(x^{\ell_n})$.

Let furthermore $\ell^*(x^{\ell_n}) \triangleq \Bigl\lceil \frac{\log|\mathcal U(x^{\ell_n})|}{\log q} \Bigr\rceil$ and let $\Psi_2$ be a function on $\mathcal U^n$ such that its restriction to $\mathcal U(x^{\ell_n})$ is an injection into $\mathcal X^{\ell^*(x^{\ell_n})}$ for all $x^{\ell_n}$. Then our encoding function is defined as
\[
c = (\Psi_1,\Psi_2). \tag{5.9}
\]
To estimate $L_C(P^n,P^n)$ we introduce an auxiliary source with alphabet $\mathcal X^{\ell_n}$ and probability distribution $P^*$ such that for all $x^{\ell_n}\in\mathcal X^{\ell_n}$
\[
P^*(x^{\ell_n}) = P^n\bigl(\mathcal U(x^{\ell_n})\bigr).
\]
We divide the waiting time for identification with code $C$ into two parts according to the two components $\Psi_1$ and $\Psi_2$ in (5.9), and we let $W_1$ and $W_2$ be the random waiting times of the two parts, respectively.
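The "differ by at most one" balance property of $\Psi_1$ has a concrete finite illustration (our own sketch, with hypothetical helper names): since (5.7) takes the $q$-ary digits of the index in least-significant-first order, the labels whose $\Psi_1$-image begins with a fixed $x^t$ form exactly a residue class modulo $q^t$, and the residue classes of $\{0,\dots,M-1\}$ have sizes differing by at most one.

```python
def q_ary_lsb(m, q, length):
    # q-ary digits of m, least significant digit first, padded to `length`
    digits = []
    for _ in range(length):
        digits.append(m % q)
        m //= q
    return tuple(digits)

def prefix_class_sizes(M, q, t):
    # size of U(x^t) for every prefix x^t when labels 0..M-1 are mapped
    # by Psi_1 to their least-significant-first q-ary digits
    sizes = {}
    for m in range(M):
        x_t = q_ary_lsb(m, q, t)
        sizes[x_t] = sizes.get(x_t, 0) + 1
    return sizes

q, t, M = 3, 2, 100                  # 100 labeled sequences, ternary digits
sizes = prefix_class_sizes(M, q, t)
assert len(sizes) == q ** t          # every prefix of length t occurs
# the local balance claim behind Psi_1:
assert max(sizes.values()) - min(sizes.values()) <= 1
# fixing a length-t prefix is the same as fixing the label modulo q^t
assert all(s == sum(1 for m in range(M)
                    if m % q ** t == sum(d * q ** i for i, d in enumerate(x)))
           for x, s in sizes.items())
```

This is the local unbalance/balance mechanism in miniature: no prefix class can be more than one element larger than another, which is what makes the $P^n(\mathcal U(x^t))$ nearly uniform in (5.8).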
Now let $Z$ be a binary random variable such that
\[
Z = \begin{cases} 0 & \text{if } \Psi_1(U^n) \ne \Psi_1(V^n)\\ 1 & \text{otherwise.}\end{cases}
\]
Then
\begin{align*}
L_C(P^n,P^n) &= E(W_1+W_2) = E W_1 + E\bigl(E(W_2\mid Z)\bigr)\\
&= E W_1 + Pr(Z=1)\,E(W_2\mid Z=1)\\
&= E W_1 + \sum_{x^{\ell_n}} P^n\bigl(\Psi_1(U^n)=x^{\ell_n}\bigr)\, P^n\bigl(\Psi_1(V^n)=x^{\ell_n}\bigr)\, E(W_2\mid Z=1)\\
&= E W_1 + \sum_{x^{\ell_n}} P^n\bigl(\mathcal U(x^{\ell_n})\bigr)^{2}\, E(W_2\mid Z=1). \tag{5.10}
\end{align*}
Let $C^*$ be the code for the auxiliary source with encoding function $c^* = \Psi_1$. Then we have that
\[
E W_1 = L_{C^*}(P^*,P^*) \tag{5.11}
\]
and, with the notation in Corollary 1, $\mathcal U(C^*,x^t) = \mathcal U(x^t)$ and $P^*\bigl(\mathcal U(C^*,x^t)\bigr) = P^n\bigl(\mathcal U(x^t)\bigr)$ for $x^t\in\mathcal X^t$ with $t\le\ell_n$. For all $x^t\in\mathcal X^t$, $t\le\ell_n$, we denote
\[
\delta(x^t) = q^{-t} - P^n\bigl(\mathcal U(x^t)\bigr).
\]
Then we have $\sum_{x^t\in\mathcal X^t}\delta(x^t) = 0$ for all $t\le\ell_n$ and, by (5.8), $|\delta(x^t)| < 2^{-2n\beta}$.

Now we apply Corollary 1 to estimate
\begin{align*}
L_{C^*}(P^*,P^*) &= \sum_{t=0}^{\ell_n}\sum_{x^t\in\mathcal X^t} P^*\bigl(\mathcal U(C^*,x^t)\bigr)^{2}\\
&= \sum_{t=0}^{\ell_n}\sum_{x^t\in\mathcal X^t} P^n\bigl(\mathcal U(x^t)\bigr)^{2}
= \sum_{t=0}^{\ell_n}\sum_{x^t\in\mathcal X^t}\bigl(q^{-t}-\delta(x^t)\bigr)^{2}\\
&= \sum_{t=0}^{\ell_n}\Bigl[q^t\cdot q^{-2t} - 2q^{-t}\sum_{x^t\in\mathcal X^t}\delta(x^t) + \sum_{x^t\in\mathcal X^t}\delta(x^t)^{2}\Bigr]\\
&\le \sum_{t=0}^{\ell_n} q^{-t} + \sum_{t=0}^{\ell_n} q^t\cdot 2^{-4n\beta}
< \sum_{t=0}^{\infty} q^{-t} + \frac{q^{\ell_n+1}-1}{q-1}\,2^{-4n\beta}\\
&< \frac{q}{q-1} + \frac{1}{q-1}\,q^{\ell_n+1}\,2^{-4n\beta}. \tag{5.12}
\end{align*}
Moreover, by the definition of $\Psi_2$ and $W_2$,
\[
E(W_2\mid Z=1) \le \frac{n\log|\mathcal U|}{\log q}
\]
and in (5.12) we have shown that
\[
\sum_{x^{\ell_n}} P^n\bigl(\mathcal U(x^{\ell_n})\bigr)^{2} \le q^{-\ell_n} + q^{\ell_n}\cdot 2^{-4n\beta}.
\]
Consequently
\[
\sum_{x^{\ell_n}\in\mathcal X^{\ell_n}} P^n\bigl(\mathcal U(x^{\ell_n})\bigr)^{2}\, E(W_2\mid Z=1)
\le \bigl[q^{-\ell_n} + q^{\ell_n} 2^{-4n\beta}\bigr]\,\frac{n\log|\mathcal U|}{\log q}. \tag{5.13}
\]
Finally, by combining (5.10), (5.11), (5.12), and (5.13) with the choice of $\beta$ in (5.6), we have that
\[
\lim_{n\to\infty} L_C(P^n,P^n) \le \frac{q}{q-1},
\]
the desired inequality.

It is interesting that the limits of the waiting times of ID-codes on the left hand side of (5.2) are independent of the generic distributions $P$ and $Q$ and depend only on whether they are equal.

In the case that they are not equal, the limit is even independent of the alphabet size. In particular, in the case $P \ne Q$ we have seen in the proof that the key step is how to distribute the first symbol, and the local unbalance principle (LUP) is applied in the second step. Moreover, for a good code the random user needs to wait for the second symbol only with exponentially vanishing probability. So the remaining parts of the codewords are not so important.

Similarly, in the case $P = Q$, where we use the GBP instead of the LUP, the key part of the codewords is a relatively small prefix (in the proof it is the $\ell_n$-th prefix), and after that the user has to wait only with exponentially small probability. Thus again the remaining part of the codewords is less important.

APPENDIX I
COMMENTS ON GENERALIZED ENTROPIES

After the discovery of ID-entropy in [4], the work of Tsallis [13] and also [14] was brought to our attention. The equalities (1) and (2) in [14] are here (A.1) and (A.2). The letter $q$ used there corresponds to our letter $\alpha$, because for us $q$ gives the alphabet size. The generalization of Boltzmann's entropy
\[
H(P) = -k\sum_u P_u\ln P_u
\]
is
\[
S_\alpha(P) = k\,\frac{1}{\alpha-1}\Bigl(1-\sum_{u=1}^{N} P_u^{\alpha}\Bigr) \tag{A.1}
\]
for any real $\alpha\ne1$. Notice that $\lim_{\alpha\to1} S_\alpha(P) = H(P)$, which can be named $S_1(P)$.

One readily verifies that for product-distributions $P\times Q$ of independent random variables
\[
S_\alpha(P\times Q) = S_\alpha(P) + S_\alpha(Q) - \frac{\alpha-1}{k}\,S_\alpha(P)\,S_\alpha(Q). \tag{A.2}
\]
Since in all cases $S_\alpha\ge0$, the cases $\alpha<1$, $\alpha=1$ and $\alpha>1$ respectively correspond to superadditivity, additivity and subadditivity (also called, for the purposes of statistical physics, superextensivity, extensivity, and subextensivity).

We recall the grouping identity of [4].

For a partition $(\mathcal U_1,\mathcal U_2)$ of $\mathcal U=\{1,2,\dots,N\}$, $Q_i = \sum_{u\in\mathcal U_i} P_u$ and $P_u^{(i)} = \frac{P_u}{Q_i}$ for $u\in\mathcal U_i$ $(i=1,2)$,
\[
H_{I,q}(P) = H_{I,q}(Q) + \sum_i Q_i^{2}\, H_{I,q}\bigl(P^{(i)}\bigr) \tag{A.3}
\]
where $Q = (Q_1,Q_2)$.
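Both the grouping identity (A.3) and the product identity (A.4) are easy to confirm numerically. The sketch below is our own sanity check, not code from the paper; `h_id` is our naming for $H_{I,q}$:

```python
import random

def h_id(p, q):
    # identification entropy  H_{I,q}(P) = q/(q-1) * (1 - sum_u P_u^2)
    return q / (q - 1) * (1.0 - sum(x * x for x in p))

random.seed(1)
q = 4
w = [random.random() for _ in range(6)]
p = [x / sum(w) for x in w]           # random distribution on 6 points

# grouping identity (A.3) for the partition {1,2,3} | {4,5,6}
q1, q2 = sum(p[:3]), sum(p[3:])
p1 = [x / q1 for x in p[:3]]          # conditional distribution P^(1)
p2 = [x / q2 for x in p[3:]]          # conditional distribution P^(2)
assert abs(h_id(p, q)
           - (h_id([q1, q2], q)
              + q1 ** 2 * h_id(p1, q)
              + q2 ** 2 * h_id(p2, q))) < 1e-9

# product identity (A.4) for an independent pair of distributions
v = [random.random() for _ in range(5)]
r = [x / sum(v) for x in v]
prod = [a * b for a in p for b in r]  # the product distribution
lhs = h_id(prod, q)
rhs = h_id(p, q) + h_id(r, q) - (q - 1) / q * h_id(p, q) * h_id(r, q)
assert abs(lhs - rhs) < 1e-9
```

Writing $a = \sum_u P_u^2$ and $b = \sum_v Q_v^2$, both sides of (A.4) reduce to $\frac{q}{q-1}(1-ab)$, which is what the second check exercises on random data.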
This implies
\[
H_{I,q}(P\times Q) = H_{I,q}(Q) + \sum_j Q_j^{2}\, H_{I,q}(P)
\]
and since
\[
1-\sum_j Q_j^{2} = \frac{q-1}{q}\cdot\frac{q}{q-1}\Bigl(1-\sum_j Q_j^{2}\Bigr) = \frac{q-1}{q}\,H_{I,q}(Q)
\]
or
\[
\sum_j Q_j^{2} = 1 - \frac{q-1}{q}\,H_{I,q}(Q),
\]
we get
\[
H_{I,q}(P\times Q) = H_{I,q}(Q) + H_{I,q}(P) - \frac{q-1}{q}\,H_{I,q}(Q)\,H_{I,q}(P), \tag{A.4}
\]
which is (A.2) for $\alpha=2$ and $k=\frac{q}{q-1}$.

We have been told by several experts in physics that the operational significance of the quantities $S_\alpha$ (for $\alpha\ne1$) in statistical physics seems not to be undisputed.

In contrast, the significance of identification entropy was demonstrated in [4] (see Section 2). It is formally close to, but essentially different from, $S_\alpha$ for two reasons: always $\alpha=2$, and $k=\frac{q}{q-1}$ is uniquely determined and depends on the alphabet size $q$!

We also have discussed the coding-theoretical meanings of the factors $\frac{q}{q-1}$ and $1-\sum_{u=1}^{N}P_u^{2}$.

More recently we learned from referees that already in 1967 Havrda and Charvát [7] introduced the entropies $\{H_N^\alpha\}$ of type $\alpha$:
\[
H_N^{\alpha}(P_1,P_2,\dots,P_N) = (2^{1-\alpha}-1)^{-1}\Bigl(\sum_{i=1}^{N} P_i^{\alpha} - 1\Bigr) \tag{A.5}
\]
[$(P_1,P_2,\dots,P_N)\in\mathcal P([N])$, $N=2,3,\dots$, $0^{\alpha}\triangleq0$], with
\[
\lim_{\alpha\to1} H_N^{\alpha}(P_1,P_2,\dots,P_N) = H_N(P_1,P_2,\dots,P_N),
\]
the Boltzmann/Gibbs/Shannon entropy. So it is reasonable to define
\[
H_N^{1}(P_1,P_2,\dots,P_N) = H_N(P_1,P_2,\dots,P_N).
\]
This is a generalization of the BGS-entropy different from the Rényi entropies of order $\alpha\ne1$ (which according to [2] were introduced by Schützenberger [9]), given by
\[
{}_\alpha H_N(P_1,P_2,\dots,P_N) = \frac{1}{1-\alpha}\log_2\sum_{i=1}^{N} P_i^{\alpha}
\]
[$(P_1,P_2,\dots,P_N)\in\mathcal P([N])$, $N=2,3,\dots$].

Comparison shows that
\[
{}_\alpha H_N(P_1,P_2,\dots,P_N) = \frac{1}{1-\alpha}\log_2\bigl[(2^{1-\alpha}-1)\,H_N^{\alpha}(P_1,P_2,\dots,P_N)+1\bigr]
\]
[$(P_1,P_2,\dots,P_N)\in\mathcal P([N])$, $N=2,3,\dots$].

So, while the entropies of order $\alpha$ and the entropies of type $\alpha$ are different for $\alpha\ne1$, we see that the bijection
\[
t \to \frac{1}{1-\alpha}\log_2\bigl[(2^{1-\alpha}-1)t+1\bigr]
\]
connects them. Therefore we may ask what the advantage is in dealing with entropies of type $\alpha$. We meanwhile also learned that the book [2] gives a comprehensive discussion. Also Daróczy's contribution [6], where "type $\alpha$" is named "degree $\alpha$", gives an enlightening analysis.

Note that Rényi entropies ($\alpha\ne1$) are additive, but not subadditive (except for $\alpha=0$) and not recursive, and they have neither the branching property nor the sum property, that is, the existence of a measurable function $g$ on $(0,1)$ such that
\[
H_N(P_1,P_2,\dots,P_N) = \sum_{i=1}^{N} g(P_i).
\]
Entropies of type $\alpha$, on the other hand, are not additive but do have the subadditivity property and the sum property, and furthermore they are additive of degree $\alpha$:
\begin{align*}
H^{\alpha}_{MN}(P_1Q_1,\dots,P_1Q_N,&\;P_2Q_1,\dots,P_2Q_N,\dots,P_MQ_1,\dots,P_MQ_N)\\
&= H^{\alpha}_M(P_1,\dots,P_M) + H^{\alpha}_N(Q_1,\dots,Q_N)\\
&\quad+ (2^{1-\alpha}-1)\,H^{\alpha}_M(P_1,\dots,P_M)\,H^{\alpha}_N(Q_1,\dots,Q_N)
\end{align*}
[$(P_1,\dots,P_M)\in\mathcal P([M])$, $(Q_1,\dots,Q_N)\in\mathcal P([N])$; $M=2,3,\dots$; $N=2,3,\dots$];

strongly additive of degree $\alpha$:
\begin{align*}
H^{\alpha}_{MN}(P_1Q_{11},\dots,P_1Q_{1N},&\;P_2Q_{21},\dots,P_2Q_{2N},\dots,P_MQ_{M1},\dots,P_MQ_{MN})\\
&= H^{\alpha}_M(P_1,\dots,P_M) + \sum_{j=1}^{M} P_j^{\alpha}\, H^{\alpha}_N(Q_{j1},\dots,Q_{jN})
\end{align*}
[$(P_1,\dots,P_M)\in\mathcal P([M])$, $(Q_{j1},\dots,Q_{jN})\in\mathcal P([N])$; $j=1,2,\dots,M$; $M=2,3,\dots$; $N=2,3,\dots$];

and recursive of degree $\alpha$:
\[
H^{\alpha}_N(P_1,P_2,\dots,P_N) = H^{\alpha}_{N-1}(P_1+P_2,P_3,\dots,P_N)
+ (P_1+P_2)^{\alpha}\, H^{\alpha}_2\Bigl(\frac{P_1}{P_1+P_2},\frac{P_2}{P_1+P_2}\Bigr)
\]
[$(P_1,\dots,P_N)\in\mathcal P([N])$, $N=3,4,\dots$ with $P_1+P_2>0$].

(In consequence entropies of type $\alpha$ also have the branching property.)

It is clear now that for a binary alphabet the ID-entropy is exactly the entropy of type $\alpha=2$.

However, prior to [13] there were hardly any applications or operational justifications of the entropy of type $\alpha$.

Moreover, the $q$-ary case did not exist at all and therefore
                  1
            =                         α
                     log2 [(21−α − 1)HN (P1 , P2 , . . . , PN ) + 1]             the name ID-entropy is well justified.
                 1−α
                                                                                   We feel that it must be said that in many papers (with
and                                                                              several coauthors) Tsallis at least developed ideas to promote
          α
         HN (P1 , P2 , . . . , PN )                                              non standard equilibrium theory in Statistical Physics using
                                                                                 generalized entropies Sα and generalized concepts of inner
                 = (21−α − 1)−1 [2(1−α)α HN (P1 ,P2 ,...,PN ) − 1]               energy.
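The identities and degree-$\alpha$ properties above are easy to check numerically. The following sketch is ours, not from the paper (the function names `entropy_type_alpha`, `renyi`, `shannon` are our own); it verifies Eq. (A.5) against the $\alpha \to 1$ limit, both conversion formulas between $H_N^\alpha$ and ${}_\alpha H_N$, strong additivity, recursivity, and the claim that the binary ID-entropy, written here as $2(1 - \sum_i P_i^2)$, coincides with the entropy of type $\alpha = 2$:

```python
import math

def entropy_type_alpha(p, alpha):
    # Entropy of type (degree) alpha, Eq. (A.5), with the convention 0^alpha = 0.
    return (sum(pi ** alpha for pi in p if pi > 0) - 1) / (2 ** (1 - alpha) - 1)

def shannon(p):
    # Boltzmann/Gibbs/Shannon entropy in bits.
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def renyi(p, alpha):
    # Rényi entropy of order alpha != 1.
    return math.log2(sum(pi ** alpha for pi in p if pi > 0)) / (1 - alpha)

P = [0.5, 0.3, 0.2]
a = 2.0

# alpha -> 1 recovers the BGS entropy (checked numerically just above 1).
assert abs(entropy_type_alpha(P, 1 + 1e-6) - shannon(P)) < 1e-3

# _alpha H_N = (1 - alpha)^(-1) log2[(2^(1-alpha) - 1) H^alpha_N + 1]
assert abs(renyi(P, a)
           - math.log2((2 ** (1 - a) - 1) * entropy_type_alpha(P, a) + 1) / (1 - a)) < 1e-12

# H^alpha_N = (2^(1-alpha) - 1)^(-1) [2^((1-alpha) _alpha H_N) - 1]
assert abs(entropy_type_alpha(P, a)
           - (2 ** ((1 - a) * renyi(P, a)) - 1) / (2 ** (1 - a) - 1)) < 1e-12

# Strong additivity of degree alpha for the joint distribution (P_j * Q_ji).
Q = [[0.6, 0.4], [0.9, 0.1], [0.5, 0.5]]
joint = [P[j] * Q[j][i] for j in range(3) for i in range(2)]
assert abs(entropy_type_alpha(joint, a)
           - entropy_type_alpha(P, a)
           - sum(P[j] ** a * entropy_type_alpha(Q[j], a) for j in range(3))) < 1e-12

# Recursivity of degree alpha: merge the first two atoms, then split them again.
s = P[0] + P[1]
assert abs(entropy_type_alpha(P, a)
           - entropy_type_alpha([s, P[2]], a)
           - s ** a * entropy_type_alpha([P[0] / s, P[1] / s], a)) < 1e-12

# Binary ID-entropy 2(1 - sum P_i^2) equals the entropy of type alpha = 2.
B = [0.7, 0.3]
assert abs(2 * (1 - sum(pi ** 2 for pi in B)) - entropy_type_alpha(B, 2)) < 1e-12
```

All checks hold up to floating-point rounding; the $\alpha \to 1$ limit is verified numerically at $\alpha = 1 + 10^{-6}$ rather than symbolically.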
Our attention has also been drawn to the papers [5], [11], [12], which have possible connections to our work.

Recently, clear-cut progress was made by C. Heup in his forthcoming thesis with a generalization of ID-entropy motivated by L-identification.

                             REFERENCES
 [1] S. Abe, Axioms and uniqueness theorem for Tsallis entropy, Phys. Lett.
     A 271, No. 1–2, 74–79, 2000.
 [2] J. Aczél and Z. Daróczy, On Measures of Information and their
     Characterizations, Mathematics in Science and Engineering, Vol. 115,
     Academic Press, New York–London, 1975.
 [3] R. Ahlswede, General Theory of Information Transfer, in a special
     issue "General Theory of Information Transfer and Combinatorics" of
     Discrete Applied Mathematics, to appear.
 [4] R. Ahlswede, Identification entropy, General Theory of Information
     Transfer and Combinatorics, Report on a Research Project at the ZIF
     (Center of Interdisciplinary Studies) in Bielefeld, Oct. 1, 2002 – August
     31, 2004, edited by R. Ahlswede with the assistance of L. Bäumer and N.
     Cai, to appear.
 [5] L.L. Campbell, A coding theorem and Rényi's entropy, Information and
     Control 8, 423–429, 1965.
 [6] Z. Daróczy, Generalized information functions, Information and Control
     16, 36–51, 1970.
 [7] J. Havrda and F. Charvát, Quantification method of classification processes,
     concept of structural a-entropy, Kybernetika (Prague) 3, 30–35, 1967.
 [8] A. Rényi, On measures of entropy and information, Proc. 4th Berkeley
     Sympos. Math. Statist. and Prob., Vol. I, pp. 547–561, Univ. California
     Press, Berkeley, 1961.
 [9] M.P. Schützenberger, Contribution aux applications statistiques de la
     théorie de l'information, Publ. Inst. Statist. Univ. Paris 3, No. 1–2,
     3–117, 1954.
[10] C.E. Shannon, A mathematical theory of communication, Bell Syst.
     Techn. J. 27, 379–423, 623–656, 1948.
[11] B.D. Sharma and H.C. Gupta, Entropy as an optimal measure, Infor-
     mation theory (Proc. Internat. CNRS Colloq., Cachan, 1977) (French),
     151–159, Colloq. Internat. CNRS, 276, CNRS, Paris, 1978.
[12] F. Topsøe, Game-theoretical equilibrium, maximum entropy and min-
     imum information discrimination, Maximum Entropy and Bayesian
     Methods (Paris, 1992), 15–23, Fund. Theories Phys., 53, Kluwer Acad.
     Publ., 1993.
[13] C. Tsallis, Possible generalization of Boltzmann–Gibbs statistics, J.
     Statist. Phys. 52, No. 1–2, 479–487, 1988.
[14] C. Tsallis, R.S. Mendes, and A.R. Plastino, The role of constraints within
     generalized nonextensive statistics, Physica A 261, 534–554, 1998.

								