An interpretation of identification entropy

Rudolf Ahlswede and Ning Cai
(Both authors are with the University of Bielefeld.)

Abstract — After Ahlswede introduced identification for source coding he discovered identification entropy and demonstrated that it plays a role analogous to classical entropy in Shannon's Noiseless Source Coding. We now give even more insight into this functional by interpreting its two factors.

Index Terms — Source coding for identification, identification entropy, operational justification

I. INTRODUCTION

A. Terminology

Identification in Source Coding started in [3]. Then identification entropy was discovered and its operational significance in noiseless source coding was demonstrated in [4]. Familiarity with that paper is helpful, but not necessary here. As far as possible we use its notation.

Differences come from the fact that we now use a q-ary coding alphabet X = {0, 1, ..., q-1}, whereas earlier only the case q = 2 was considered and it was only remarked that all results generalize to arbitrary q. In particular the identification entropy, abbreviated as ID-entropy, for the source (U, P, U) has the form

    H_{I,q}(P) = \frac{q}{q-1} \Big(1 - \sum_{u \in U} P_u^2\Big).    (1.1)

Shannon (1948) has shown that a source (U, P, U) with output U satisfying Prob(U = u) = P_u can be encoded in a prefix code C = {c_u : u ∈ U} ⊂ {0, 1, ..., q-1}^* such that for the q-ary entropy

    H_q(P) = \sum_{u \in U} -P_u \log_q P_u \le \sum_{u \in U} P_u ||c_u|| \le H_q(P) + 1,

where ||c_u|| is the length of c_u.

We use a prefix code C for another purpose, namely noiseless identification, that is, every user who wants to know whether a v (v ∈ U) of his interest is the actual source output or not can consider the RV C with C = c_u = (c_{u1}, ..., c_{u||c_u||}) if U = u, and check whether C = (C_1, C_2, ...) coincides with c_v in the first, second, etc. letter, stopping when the first different letter occurs or when C = c_v. Let L_C(P, u) be the expected number of checkings, if code C is used.

Related quantities are

    L_C(P) = \max_{v \in U} L_C(P, v),    (1.2)

that is, the expected number of checkings for a person in the worst case, if code C is used,

    L(P) = \min_C L_C(P),    (1.3)

the expected number of checkings in the worst case for a best code, and finally, if the v's are chosen by a RV V independent of U and defined by Prob(V = v) = Q_v for v ∈ V = U, we consider

    L_C(P, Q) = \sum_{v \in U} Q_v L_C(P, v),    (1.4)

the average number of expected checkings, if code C is used, and also

    L(P, Q) = \min_C L_C(P, Q),    (1.5)

the average number of expected checkings for a best code.

A natural special case is the mean number of expected checkings

    \bar{L}_C(P) = \frac{1}{N} \sum_{u=1}^{N} L_C(P, u), \quad \text{if } U = [N],    (1.6)

which equals L_C(P, Q) for Q = (1/N, ..., 1/N), and

    \bar{L}(P) = \min_C \bar{L}_C(P).    (1.7)

Another special case of some "intuitive appeal" is the case Q = P. Here we write

    L(P, P) = \min_C L_C(P, P).    (1.8)

It is known that Huffman codes minimize the expected code length for a PC. This is not always the case for L(P) and the other quantities in identification.

In this paper an important incentive comes from Theorem 4 of [4]: For P^N = (2^{-ℓ_1}, ..., 2^{-ℓ_N}), that is, with 2-powers as probabilities, L(P^N, P^N) = H_I(P^N). Here the assumption means that there is a complete prefix code (i.e. equality holds in Kraft's inequality).

B. A terminology involving proper common prefixes

The quantity L_C(P, Q) is defined below also for the case of not necessarily independent U and V. It is conveniently described in a terminology involving proper common prefixes.

For an encoding c : U → X^* we define for two words w, w' ∈ X^* the quantity cp(w, w') as the number of proper common prefixes including the empty word, which equals the length of the maximal proper common prefix plus 1.

For example cp(11, 000) = 1, cp(0110, 0100) = 3 and cp(1001, 1000) = 4 (since the proper common prefixes are ∅, 1, 10, 100).

Now with encoding c for the PC C and RV's U and V, cp(c_U, c_V) measures the time steps it takes to decide whether U and V are equal, that is, the checking time or waiting time, which we denote by

    W_C(U, V) = cp(c_U, c_V).    (1.9)
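The quantity cp is easy to compute; a minimal Python sketch (the function name is ours, and the test values follow the examples in the text):

```python
def cp(w, wp):
    """Number of proper common prefixes of w and wp, counting the empty
    word; this equals the length of the maximal proper common prefix
    plus 1."""
    # Length of the longest common prefix of the two words.
    k = 0
    while k < min(len(w), len(wp)) and w[k] == wp[k]:
        k += 1
    # A *proper* prefix must be strictly shorter than both words.
    k = min(k, len(w) - 1, len(wp) - 1)
    return k + 1  # the proper common prefixes have lengths 0, 1, ..., k

# The examples from the text:
assert cp("11", "000") == 1      # only the empty word
assert cp("0110", "0100") == 3   # empty word, 0, 01
assert cp("1001", "1000") == 4   # empty word, 1, 10, 100
```

Note that for equal words cp(c_u, c_u) = ||c_u||, so the waiting time of a user who actually receives his own codeword is the full codeword length.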
Clearly, we can write the expected waiting time as

    E W_C(U, V) = E cp(c_U, c_V).    (1.10)

It is readily verified that for independent U, V, that is, Pr(U = u, V = v) = P_u Q_v,

    E W_C(U, V) = L_C(P, Q) = E cp(c_U, c_V).    (1.11)

We now give another description for E W_C(U, V). For a word w ∈ X^* and a code C define the subset of U

    U(C, w) = \{u \in U : c_u \text{ has proper prefix } w\}    (1.12)

and its indicator function 1_{U(C,w)}. Now

    E cp(c_U, c_V) = \sum_{u,v \in U} Pr(U = u, V = v) cp(c_u, c_v)
                   = \sum_{u,v \in U} Pr(U = u, V = v) \sum_w 1_{U(C,w)}(u) 1_{U(C,w)}(v)
                   = \sum_w Pr(U \in U(C, w), V \in U(C, w)),

and by (1.11)

    E W_C(U, V) = \sum_w Pr(U \in U(C, w), V \in U(C, w)).    (1.13)

C. A matrix notation

Next we look at the doubly infinite matrix

    \Lambda = \big(cp(w, w')\big)_{w \in X^*, w' \in X^*}    (1.14)

and its minor Λ^{(L)} labelled by sequences in X^{≤L}.

Henceforth we assume that U and V are independent and have distributions P and Q. We can then use (1.11).

For a prefix code C, P induces the distribution P_C and Q induces the distribution Q_C, when for u, v ∈ U

    P_C(c_u) = P_u, \quad Q_C(c_v) = Q_v    (1.15)

and

    P_C(x) = Q_C(x) = 0 \quad \text{for } x \in X^* \setminus C.    (1.16)

Viewing both P_C and Q_C as row vectors, then for the corresponding column vector Q_C^T equation (1.11) can be written in the form

    L_C(P, Q) = P_C \Lambda Q_C^T.    (1.17)

It is clear from (1.10) that a non-complete prefix code, that is, one for which the Kraft sum is smaller than 1, can be improved for identification by shortening a suitable codeword. Hence an optimal ID source code is necessarily complete. In such a code

    \max_{u \in U} ||c_u|| \le |U| - 1    (1.19)

and one can replace Λ by its submatrix Λ^{(L)} for L = |U| − 1. This implies

    L_C(P, Q) = P_C^{(L)} \Lambda^{(L)} (Q_C^{(L)})^T,    (1.20)

where P_C^{(L)} and Q_C^{(L)} are the row vectors obtained by deleting the components y ∉ X^{≤L}.

Sometimes the expressions (1.17) or (1.20) are more convenient for the investigation of L_C(P, Q). For example it is easy to see that Λ and therefore also Λ^{(L)} are positive semidefinite. Indeed, let ∆ (resp. ∆^{(L)}) be a matrix whose rows are labelled by sequences in X^* (resp. X^{≤L}) and whose columns are labelled by sequences in X^* (resp. X^{≤L−1} ∪ {empty sequence}) such that its (x, y)-entry is

    \delta_y^*(x) = \begin{cases} 1 & \text{if } y \text{ is a proper prefix of } x \\ 0 & \text{otherwise.} \end{cases}

Then

    \Delta \Delta^T = \Lambda \quad \text{and} \quad \Delta^{(L)} (\Delta^{(L)})^T = \Lambda^{(L)}    (1.21)

and hence Λ and Λ^{(L)} are positive semidefinite. Therefore by (1.20) L_C(P, P) is (∪)-convex in P.

Furthermore, for sources (U, P) with |U| = 2^k and the block code C = {0, 1}^k, the uniform distribution on U achieves min_P L_C(P, P).¹

Another interesting observation on (1.20) is that, as the w-th component of P_C^{(L)} ∆^{(L)} (resp. Q_C^{(L)} ∆^{(L)}) is P(U(C, w)) (resp. Q(U(C, w))), application of the Cauchy-Schwarz inequality to (1.20) yields

    \big(P_C^{(L)} \Lambda^{(L)} (Q_C^{(L)})^T\big)^2 \le P_C^{(L)} \Lambda^{(L)} (P_C^{(L)})^T \cdot Q_C^{(L)} \Lambda^{(L)} (Q_C^{(L)})^T    (1.22)

and equality holds iff for all w

    P(U(C, w)) = Q(U(C, w)).

We state this in equivalent form as

Lemma 1:

    L_C(P, Q)^2 \le L_C(P, P) L_C(Q, Q)    (1.23)

and equality holds iff for all w

    P(U(C, w)) = Q(U(C, w)),

which implies L_C(P, Q) = L_C(P, P) = L_C(Q, Q).

This suggests to introduce

    \mu_C(P, Q) = \frac{L_C(P, Q)^2}{L_C(P, P) L_C(Q, Q)} \le 1

as a measure of similarity of the sources P and Q with respect to the code C.

Intuitively we feel that for a good code for source P with user distribution Q, P and Q should be very dissimilar, because then the user waits less time until he knows that the output of U is not what he wants. This idea will be used later for code construction. Actually it is clear even in the general case where U and V are not necessarily independent. To simplify the discussion we assume here that the alphabet X is binary, i.e. q = 2.

¹ A proof is given in the forthcoming Ph.D. thesis "L-identification for sources" written by C. Heup at the Department of Mathematics of the University of Bielefeld.
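The identities of this section can be sanity-checked numerically. A short Python sketch, using a hypothetical binary prefix code C = {0, 10, 11} and illustrative distributions (none of these values are from the paper), compares the direct expectation (1.11) with the prefix-set form (1.13) and checks the bound of Lemma 1:

```python
from itertools import product

code = {0: "0", 1: "10", 2: "11"}   # hypothetical binary prefix code
P = [0.5, 0.3, 0.2]                 # source distribution (made up)
Q = [0.2, 0.3, 0.5]                 # user distribution (made up)

def cp(w, wp):
    # number of proper common prefixes, counting the empty word
    k = 0
    while k < min(len(w), len(wp)) and w[k] == wp[k]:
        k += 1
    return min(k, len(w) - 1, len(wp) - 1) + 1

def L(code, P, Q):
    # (1.11): L_C(P,Q) = E cp(c_U, c_V) for independent U ~ P, V ~ Q
    return sum(P[u] * Q[v] * cp(code[u], code[v])
               for u, v in product(code, repeat=2))

def L_prefix_sets(code, P, Q):
    # (1.13): sum over words w of P(U(C,w)) * Q(U(C,w)), where
    # U(C,w) = {u : c_u has proper prefix w}
    words = {c[:k] for c in code.values() for k in range(len(c))}
    total = 0.0
    for w in words:
        Ucw = [u for u, c in code.items()
               if len(c) > len(w) and c.startswith(w)]
        total += sum(P[u] for u in Ucw) * sum(Q[u] for u in Ucw)
    return total

assert abs(L(code, P, Q) - L_prefix_sets(code, P, Q)) < 1e-12

# Lemma 1 and the similarity measure mu_C(P,Q) <= 1:
mu = L(code, P, Q) ** 2 / (L(code, P, P) * L(code, Q, Q))
assert 0.0 < mu <= 1.0
```

For these values mu is strictly below 1, i.e. the two distributions are not perfectly "similar" with respect to this code.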
Then the first bit of a codeword partitions the source U into two parts Ū(i_1), i_1 = 0, 1, where Ū(i_1) = {u ∈ U : c_{u1} = i_1}. By (1.13), to minimize E W_C(U, V) one has to choose a partition such that the Pr(U ∈ Ū(i_1), V ∈ Ū(i_1))'s are small simultaneously for i_1 = 0, 1. To construct a good code one can continue this line: partition Ū(i_1) into Ū(i_1, i_2)'s such that the Pr(U ∈ Ū(i_1, i_2), V ∈ Ū(i_1, i_2) | U ∈ Ū(i_1), V ∈ Ū(i_1))'s are as small as possible for i_1, i_2 = 0, 1, and so on.

When U and V are independent the requirement for a good code is that the difference between P(Ū(i_1, ..., i_k)) and Q(Ū(i_1, ..., i_k)) is large. We call this the LOCAL UNBALANCE PRINCIPLE, in contrast to the GLOBAL BALANCE PRINCIPLE below.

Another extremal case is that U and V are equal with probability one, and in this case one may never use the unbalance principle. However, in this case identification for the source makes no sense: the user knows that his output definitely comes! But still we can investigate the problem by assuming that with high probability U = V. More specifically, we consider the limit of E W_C(U_k, V_k) for a sequence of random variables (U_k, V_k)_{k=1}^∞ such that U_k converges to V_k in probability. Then it follows from Proposition 1 that E W_C(U_k, V_k) converges to the average length of codewords, the classical object in source coding! In this sense identification for sources is a generalization of source coding (data compression).

One of the discoveries of [4] is that ID-entropy is a lower bound to L_C(P, P). In Section 2 we repeat the original proof, and in Section 3 we give another proof of this fact via two basic tools, Lemma 3 and Lemma 4, for L_C(P^n, P^n), where P^n is the distribution of a memoryless source. It provides a clear information theoretical meaning of the two factors \frac{q}{q-1} and \big(1 - \sum_{u \in U} P_u^2\big) of ID-entropy.

Next we consider in Section 4 sufficient and necessary conditions for a prefix code C to achieve the ID-entropy lower bound for L_C(P, P). Quite surprisingly it turns out that the ID-entropy bound for ID-time is achieved by a variable length code iff the Shannon entropy bound for the average length of codewords is achieved by the same code (Theorem 2).

Finally we end the paper in Section 5 with a global balance principle to find good codes (Theorem 3).

II. AN OPERATIONAL JUSTIFICATION OF ID-ENTROPY AS LOWER BOUND FOR L_C(P, P)

Recall from the Introduction that for q = 2

    H_I(P) = 2 \Big(1 - \sum_{u=1}^{N} P_u^2\Big) \quad \text{for } P = (P_1, ..., P_N).

We repeat the first main result for L(P, P) from [4]. Central in our derivation is a proof by induction based on a decomposition formula for trees.

Starting from the root, a binary tree T goes via 0 to the subtree T_0 and via 1 to the subtree T_1, with sets of leaves U_0 and U_1, respectively. A code C for (U, P) can be viewed as a tree T, where U_i corresponds to the set of codewords C_i, U_0 ∪ U_1 = U. The leaves are labelled so that U_0 = {1, 2, ..., N_0} and U_1 = {N_0 + 1, ..., N_0 + N_1}, N_0 + N_1 = N. Using the probabilities

    Q_i = \sum_{u \in U_i} P_u, \quad i = 0, 1,

we can give the decomposition in

Lemma 2: [4] For a code C for (U, P^N)

    L_C((P_1, ..., P_N), (P_1, ..., P_N)) =
        1 + L_{C_0}\Big(\Big(\frac{P_1}{Q_0}, ..., \frac{P_{N_0}}{Q_0}\Big), \Big(\frac{P_1}{Q_0}, ..., \frac{P_{N_0}}{Q_0}\Big)\Big) Q_0^2
          + L_{C_1}\Big(\Big(\frac{P_{N_0+1}}{Q_1}, ..., \frac{P_{N_0+N_1}}{Q_1}\Big), \Big(\frac{P_{N_0+1}}{Q_1}, ..., \frac{P_{N_0+N_1}}{Q_1}\Big)\Big) Q_1^2.

This readily yields

Theorem 1: [4] For every source (U, P^N)

    L(P^N) \ge L(P^N, P^N) \ge H_I(P^N).

Proof: We proceed by induction on N. The base case N = 2 can be established as follows. For N = 2 and any C, L_C(P^2, P^2) ≥ P_1 + P_2 = 1, but

    H_I(P^2) = 2\big(1 - P_1^2 - (1 - P_1)^2\big) = 2\big(2P_1 - 2P_1^2\big) = 4P_1(1 - P_1) \le 1.

For the induction step use for any code C the decomposition formula in Lemma 2 above and of course the desired inequality for N_0 and N_1 as induction hypothesis:

    L_C((P_1, ..., P_N), (P_1, ..., P_N))
        \ge 1 + 2\Big(1 - \sum_{u \in U_0} \Big(\frac{P_u}{Q_0}\Big)^2\Big) Q_0^2 + 2\Big(1 - \sum_{u \in U_1} \Big(\frac{P_u}{Q_1}\Big)^2\Big) Q_1^2
        \ge H_I(Q) + Q_0^2 H_I(P^{(0)}) + Q_1^2 H_I(P^{(1)}) = H_I(P^N),

where Q = (Q_0, Q_1), 1 ≥ H_I(Q), P^{(i)} = (P_u / Q_i)_{u ∈ U_i}, and the grouping identity is used for the equality. This holds for every C and therefore also for min_C L_C(P^N). The approach readily extends also to the q-ary case.

III. AN ALTERNATIVE PROOF OF THE ID-ENTROPY LOWER BOUND FOR L_C(P, P)

First we establish Lemma 3 below, which holds for the more general case E W_C(U, V). Let (U^n, V^n)_{n=1}^∞ be a discrete memoryless correlated source with generic pair of variables (U, V). Again U^n serves as (random) source and V^n serves as random user. For a given code C for (U, V) let C^n be the code obtained by encoding the components of the sequence u^n ∈ U^n iteratively. That is, for all u^n ∈ U^n

    c^n_{u^n} = (c_{u_1}, c_{u_2}, ..., c_{u_n}).    (3.1)
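Theorem 1 in Section II can be illustrated numerically: for any binary prefix code, L_C(P, P) computed via (1.13) dominates H_I(P). A sketch with a hypothetical 4-leaf code and made-up probabilities:

```python
code = {0: "0", 1: "10", 2: "110", 3: "111"}  # hypothetical prefix code
P = [0.4, 0.3, 0.2, 0.1]                      # illustrative distribution

def L_PP(code, P):
    # (1.13) with Q = P: sum over proper-prefix words w of P(U(C,w))^2
    words = {c[:k] for c in code.values() for k in range(len(c))}
    return sum(sum(P[u] for u, c in code.items()
                   if len(c) > len(w) and c.startswith(w)) ** 2
               for w in words)

H_I = 2 * (1 - sum(p * p for p in P))  # ID-entropy for q = 2
assert L_PP(code, P) >= H_I            # Theorem 1: here 1.45 >= 1.40
```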
Lemma 3:

    E W_{C^n}(U^n, V^n) = E W_C(U, V) \Big(1 + \sum_{t=1}^{n-1} Pr(U^t = V^t)\Big)    (3.2)

and therefore

    \lim_{n \to \infty} E W_{C^n}(U^n, V^n) = \frac{E W_C(U, V)}{1 - Pr(U = V)}.    (3.3)

Proof: Since Pr(U^n = V^n) = \prod_{t=1}^{n} Pr(U_t = V_t) = Pr^n(U = V), (3.3) follows from (3.2) immediately by the summation formula for geometric series.

To show (3.2) we first define for all t ≥ 2 random variables

    Z_t = \begin{cases} 0 & \text{if } U^{t-1} \ne V^{t-1} \\ 1 & \text{otherwise,} \end{cases}    (3.4)

and for t = 1 we let Z_1 be a constant for convenience of notation. Further we let W_t be the waiting time for the random user V^n in the t-th block. Conditional on Z_t = 1 it is defined like W_C(U, V) in (1.9), and conditional on Z_t = 0 obviously Pr(W_t = 0 | Z_t = 0) = 1, because the random user has made his decision before the t-th step. Moreover, by the definition of C^n

    E[W_t | Z_t = 1] = E W_C(U, V)    (3.5)

and consequently

    E\big[E(W_t | Z_t)\big] = \begin{cases} Pr(U^{t-1} = V^{t-1}) \, E W_C(U, V) & \text{for } t = 2, 3, ..., n \\ E W_C(U, V) & \text{for } t = 1, \end{cases}    (3.6)

where (3.6) holds in case t = 1 because the random user has to wait for the first outcome. Therefore it follows that

    E W_{C^n}(U^n, V^n) = E W^n = \sum_{t=1}^{n} E W_t = \sum_{t=1}^{n} E\big[E(W_t | Z_t)\big]
        = E W_C(U, V) + \sum_{t=1}^{n-1} Pr(U^t = V^t) \, E W_C(U, V),

as we wanted to show.

Next we consider the case where U and V are independent and identically distributed with distribution P, so that

    Pr(U^n = u^n, V^n = v^n) = \prod_{t=1}^{n} P_{u_t} \cdot P_{v_t}.    (3.7)

More specifically we are looking for a lower bound on L_C(P^n, P^n) for all prefix codes C over U^n.

Lemma 4: For all ε > 0 there exists an η > 0 such that for sufficiently large n and all positive integers

    L_n = \lfloor n \big(H(P) - \varepsilon\big) (\log q)^{-1} \rfloor    (3.8)

for all prefix codes C over U^n

    L_C(P^n, P^n) > (1 - 2^{-n\eta}) \sum_{t=0}^{L_n - 1} q^{-t}.    (3.9)

Proof: For given ε > 0 we choose δ > 0 such that for a τ > 0 and sufficiently large n, for the familiar sets T^n_{P,δ} of typical sequences,

    P^n(T^n_{P,\delta}) > 1 - 2^{-n\tau}

and for all u^n ∈ T^n_{P,δ}

    P(u^n) < 2^{-n(H(P) - \frac{\varepsilon}{2})}.

Since for a prefix code C

    |\{u^n \in U^n : ||c_{u^n}|| \le L_n\}| \le q^{L_n},    (3.10)

we have

    Pr(||c_{U^n}|| \le L_n) = Pr(||c_{V^n}|| \le L_n)
        \le Pr(V^n \notin T^n_{P,\delta}) + Pr(V^n \in T^n_{P,\delta}, ||c_{V^n}|| \le L_n)
        < 2^{-n\tau} + |\{u^n : ||c_{u^n}|| \le L_n\}| \cdot 2^{-n(H(P) - \frac{\varepsilon}{2})}
        \le 2^{-n\tau} + q^{L_n} 2^{-n(H(P) - \frac{\varepsilon}{2})}.    (3.11)

However, (3.8) implies that

    q^{L_n} \le 2^{n(H(P) - \varepsilon)}.

This together with (3.11) yields

    Pr(||c_{U^n}|| \le L_n) < 2^{-n\tau} + 2^{-n\frac{\varepsilon}{2}} < 2^{-n\delta}    (3.12)

for δ < min(τ, ε/4).

Next, for the distribution P and the code C over U^n we construct a related source (Ũ, P̃) and a code C̃ over Ũ as follows. The new set Ũ contains {u^n ∈ U^n : ||c_{u^n}|| ≤ L_n}, for its elements P̃(u^n) = P^n(u^n), and the new ∼-coding is c̃_{u^n} = c_{u^n}.

Now we define the additional elements in Ũ with its P̃ and c̃. We partition {u^n ∈ U^n : ||c_{u^n}|| > L_n} into subsets S_j (1 ≤ j ≤ J) according to the L_n-th prefix, use a letter g_j to represent S_j, and put the set U^≈ = {g_j : 1 ≤ j ≤ J} into Ũ so that

    Ũ = \{u^n \in U^n : ||c_{u^n}|| \le L_n\} \cup U^≈.

Then we define P̃(g_j) = \sum_{u^n \in S_j} P(u^n) and let c̃_{g_j} be the common L_n-th prefix of the c_{u^n}'s for the u^n's in S_j. That is, we consider all u^n sharing the same L_n-th prefix in c_{u^n} as a single element. Obviously,

    L_C(P^n, P^n) \ge L_{C̃}(P̃, P̃).    (3.13)

Finally, let Ũ_n and Ṽ_n be random variables for the new source and new random user with distribution P̃, and let Z be a random variable such that

    Z = \begin{cases} 0 & \text{if both } ||c_{U^n}|| \text{ and } ||c_{V^n}|| \text{ are larger than } L_n \\ 1 & \text{otherwise.} \end{cases}

Then

    L_{C̃}(P̃, P̃) = E W = E\big[E(W | Z)\big] \ge Pr(Z = 0) E(W | Z = 0)
        = Pr(||c_{U^n}|| > L_n) \, Pr(||c_{V^n}|| > L_n) \cdot L_{C^≈}(P^≈, P^≈),    (3.14)
where W is the random waiting time, P^≈ is the common conditional distribution of Ũ_n given Ũ_n ∈ U^≈ and of Ṽ_n given Ṽ_n ∈ U^≈, i.e. P^≈(g_j) = \frac{P̃(g_j)}{P̃(U^≈)} for g_j ∈ U^≈, and C^≈ is the restriction of C̃ to U^≈.

Notice that C^≈ is a block code of length L_n. In order to bound L_{C^≈}(P^≈, P^≈) we extend U^≈ to a set of cardinality q^{L_n} in case of necessity, and assign zero probabilities and codewords of length L_n not in C^≈. This little modification obviously does not change the value of L_{C^≈}(P^≈, P^≈). Thus, if we denote the uniform distribution over the extended set Ū by P̄, we have

    L_{C^≈}(P^≈, P^≈) \ge L_{C̄}(P̄, P̄)    (3.15)

where C̄ is a bijective block code Ū → X^{L_n}.

It is clear that Ū(C̄, ω) ≠ ∅ iff the length of ω is at most L_n − 1, and

    |Ū(C̄, ω)| = q^{L_n - \ell}, \quad \text{if } ||\omega|| = \ell \le L_n - 1.

Then it follows from (1.13) that

    L_{C̄}(P̄, P̄) = \sum_{t=0}^{L_n - 1} q^t \big[q^{L_n - t} \cdot q^{-L_n}\big]^2 = \sum_{t=0}^{L_n - 1} q^{-t}.    (3.16)

Finally we combine (3.12), (3.13), (3.14), (3.15) and (3.16), and Lemma 4 follows.

An immediate consequence is

Corollary 1:

    \lim_{n \to \infty} L(P^n, P^n) \ge \sum_{t=0}^{\infty} q^{-t} = \frac{q}{q-1}.    (3.17)

Furthermore, for independent, identically distributed random variables U, V with distribution P we have

    Pr(U = V) = \sum_{u \in U} P_u^2

and from (3.3) and (3.17) follows the ID-entropy bound.

Corollary 2: (See Theorem 2 of [4])

    L_C(P, P) \ge \frac{q}{q-1} \Big(1 - \sum_{u \in U} P_u^2\Big).    (3.18)

This derivation provides a clear information theoretical meaning to the two factors in ID-entropy: \frac{q}{q-1} is a universal lower bound on the ID-waiting time for a discrete memoryless source with an independent user having the same distribution P, and \frac{1}{1 - \sum_u P_u^2} is the cost paid for coding the source componentwise, the leaving time for the random user in the following sense.

Let us imagine the following procedure: At a unit of time the random source U^n outputs a symbol U_t and the random user V^n, who wants to know whether U^n = V^n, checks whether U_t coincides with his own symbol V_t. He will end if not. Then the waiting time for him is ℓ with probability

    Pr(U^{\ell-1} = V^{\ell-1}) Pr(U_\ell \ne V_\ell) = Pr(U = V)^{\ell-1} \big(1 - Pr(U = V)\big) \quad \text{for } \ell \le n.

Letting n → ∞ we obtain a geometric distribution. The expected waiting time is

    E W = \sum_{\ell=0}^{\infty} \ell \, Pr(U = V)^{\ell-1} \big(1 - Pr(U = V)\big)
        = \sum_{\ell=0}^{\infty} (\ell + 1) Pr(U = V)^{\ell} - \sum_{\ell=0}^{\infty} \ell \, Pr(U = V)^{\ell}
        = \sum_{\ell=0}^{\infty} Pr(U = V)^{\ell} = \frac{1}{1 - Pr(U = V)},    (3.19)

which equals \frac{1}{1 - \sum_u P_u^2} in the case of independent, identically distributed random variables.

(Actually (3.2) holds for all stationary sources and we choose a memoryless source for simplicity.) In general (3.3) has the form

    \lim_{n \to \infty} E W_{C^n}(U^n, V^n) = E W_C(U, V) \cdot \lim_{n \to \infty} \Big(1 + \sum_{t=1}^{n-1} Pr(U^t = V^t)\Big).    (3.20)

By monotonicity the limit at the right hand side, and therefore also at the left hand side, exists and equals a positive finite or infinite value.

When it is finite one may replace Pr(U = V)^{t-1}, Pr(U = V) and Pr(U = V)^t in the first lines of (3.19) by Pr(U^{t-1} = V^{t-1}), Pr(U_t = V_t | U^{t-1} = V^{t-1}) and Pr(U^t = V^t), respectively, and obtain

    \lim_{n \to \infty} \Big(1 + \sum_{t=1}^{n-1} Pr(U^t = V^t)\Big) = \sum_{t=0}^{\infty} t \, Pr(U^{t-1} = V^{t-1}) \cdot Pr(U_t \ne V_t | U^{t-1} = V^{t-1}) = E L,    (3.21)

the expectation of the random leaving time L for a stationary source. Thus (3.20) is rewritten as

    \lim_{n \to \infty} E W_{C^n}(U^n, V^n) = E W_C(U, V) \cdot E L.    (3.22)

Now the information theoretical meaning of (3.22) is quite clear. One encodes a source (U^n, V^n)_{n=1}^∞ with alphabet U component by component by a variable length code C. The first term at the right hand side of (3.22) is the expected waiting time in a block, and the second term is the expected waiting time for different U_t and V_t.

IV. SUFFICIENT AND NECESSARY CONDITIONS FOR A PREFIX CODE C TO ACHIEVE THE ID-ENTROPY LOWER BOUND OF L_C(P, P)

Quite surprisingly, the ID-entropy bound to ID-waiting time is achieved by a variable length code iff the Shannon entropy bound to the average length of codewords is achieved by the same code.
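This equivalence, stated formally as Theorem 2 below, can be checked on a dyadic source, for which the Shannon bound is met with equality. A Python sketch for q = 2, with the hypothetical code {0, 10, 11} for P = (1/2, 1/4, 1/4):

```python
import math

code = {0: "0", 1: "10", 2: "11"}  # hypothetical binary prefix code
P = [0.5, 0.25, 0.25]              # dyadic probabilities

# Shannon entropy bound on the average codeword length is met:
avg_len = sum(P[u] * len(c) for u, c in code.items())
H = -sum(p * math.log2(p) for p in P)
assert abs(avg_len - H) < 1e-12    # both equal 1.5

def L_PP(code, P):
    # L_C(P,P) via (1.13): sum over proper-prefix words w of P(U(C,w))^2
    words = {c[:k] for c in code.values() for k in range(len(c))}
    return sum(sum(P[u] for u, c in code.items()
                   if len(c) > len(w) and c.startswith(w)) ** 2
               for w in words)

# ...and the ID-entropy bound of Corollary 2 is then met as well:
H_I = 2 * (1 - sum(p * p for p in P))
assert abs(L_PP(code, P) - H_I) < 1e-12  # both equal 1.25
```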
For the proof we use a simple consequence of the Cauchy-Schwarz inequality, which states for two sequences of real numbers (a_1, a_2, ..., a_k) and (b_1, b_2, ..., b_k) that

    \Big(\sum_{i=1}^{k} a_i b_i\Big)^2 \le \Big(\sum_{i=1}^{k} a_i^2\Big) \Big(\sum_{i=1}^{k} b_i^2\Big)    (4.1)

with equality iff for some constant, say γ, a_i = γ b_i for all i or b_i = γ a_i for all i. Choosing b_i = 1 for all i one has

    \Big(\sum_{i=1}^{k} a_i\Big)^2 \le k \sum_{i=1}^{k} a_i^2    (4.2)

with equality iff a_1 = a_2 = ... = a_k.

Theorem 2: Let C be a prefix code. Then the following statements are equivalent.

(i) \sum_{u \in U} P_u ||c_u|| = H(P).

(ii) For all ω ∈ X^* with U(C, ω) ≠ ∅

    P(U(C, \omega)) = q^{-||\omega||}    (4.3)

and for all u, u′ ∈ U such that ||c_u|| = ||c_{u′}|| and such that c_u and c_{u′} share the same prefix of length ||c_u|| − 1, it holds that

    P_u = P_{u'}.    (4.4)

(iii)

    L_C(P, P) = \frac{q}{q-1} \Big(1 - \sum_{u \in U} P_u^2\Big).    (4.5)

Proof: It is well known that (i) is equivalent to

(i′) For all u ∈ U

    ||c_u|| = -(\log q)^{-1} \log P_u, \quad \text{or} \quad P_u = q^{-||c_u||}.    (4.6)

Notice that for (i) the code C is necessarily complete. We shall show that (i′) ⇒ (ii) ⇒ (iii) ⇒ (i′).

Ad (i′) ⇒ (ii): For all ω with U(C, ω) ≠ ∅ the code

alphabet U_(α)(C) and distribution P_(α) such that for all u ∈ U_(α)(C) and X′ = {c_u : u ∈ U_1(C)}

    P_{(\alpha)}(u) = P^{-1}\big(U_{(\alpha)}(C)\big) P_u.

Then (4.3) and (4.4) imply that (ii) holds for all C_(α), α ∈ U_1(C), and for all β ∈ U_1(C)

    P_\beta = |U_1(C)|^{-1} P\big(U_1(C)\big).    (4.7)

Next we apply (4.3) to all ω with U(C, ω) ≠ ∅ and ||ω|| = 1 and obtain

    Pr\big(U \notin U_1(C)\big) = \big(q - |U_1(C)|\big) q^{-1},    (4.8)

which with (4.7) yields for all β ∈ U_1(C)

    P_\beta = q^{-1}.    (4.9)

Moreover, by the induction hypothesis for all C_(α) and P_(α), α ∈ U_1(C),

    L_{C_{(\alpha)}}(P_{(\alpha)}, P_{(\alpha)}) = \frac{q}{q-1} \Big(1 - q^2 \sum_{u \in U_{(\alpha)}(C)} P_u^2\Big)    (4.10)

as by (4.3)

    P\big(U_{(\alpha)}(C)\big) = q^{-1}    (4.11)

for all α ∈ X_∆ = X \setminus {c_u : u ∈ U_1(C)} (say).

Finally, like in the proof of (1.11), we have

    L_C(P, P) = 1 + \sum_{\alpha \in X_\Delta} P^2\big(U_{(\alpha)}(C)\big) L_{C_{(\alpha)}}(P_{(\alpha)}, P_{(\alpha)})
        = 1 + \sum_{\alpha \in X_\Delta} \frac{1}{q(q-1)} \Big(1 - q^2 \sum_{u \in U_{(\alpha)}(C)} P_u^2\Big)
        = 1 + \frac{|X_\Delta|}{q(q-1)} - \frac{q}{q-1} \sum_{u \notin U_1(C)} P_u^2
        = 1 + \frac{q - |U_1(C)|}{q(q-1)} - \frac{q}{q-1} \sum_{u \in U} P_u^2 + \frac{q}{q-1} |U_1(C)| q^{-2}
        = \frac{q}{q-1} \Big(1 - \sum_{u \in U} P_u^2\Big).
Cω obtained by deleting the common preﬁx ω from all the                                               =            1−           Pu , that is (4.5),
q−1
codewords cu , u ∈ U(C, ω), is a complete code on U(C, ω),                                                                u∈U

because C is a complete code. That is,                                                            where the second equality holds by (4.10), the third equality
holds, because {U1 (C), U(α) (C), α ∈ X ′ } is a partition of U,
q −[    cu − ω ]
=1
and the fourth equality follows from (4.9) and the deﬁnition
u∈U (C,ω)
of X ∆ .
and consequently by (4.6)                                                                           Ad (iii) ⇒ (i’): Again we proceed by induction on the
maximum length of codewords.
P U(C, ω) =                         Pu =                         q−    cu
Suppose ﬁrst that for a code C         ℓmax (C) = 1. Then
u∈U (C,ω)                     u∈U (C,ω)                                   LC (P, P ) = 1 and |U| ≤ q. Applying (4.2) to the ID-entropy
− ω
=q                               q(       cu − ω )
= q−   ω
.       we get
u∈U (C,ω)
q                      2         q
Ad (ii) ⇒ (iii): Suppose (4.3) holds for all ω and we prove                                                          1−         Pu    ≤       (1 − |U|−1 )
q−1                              q−1
u∈U
(iii) by induction on ℓmax (C) = max cu .
u∈U
In case ℓmax (C) = 1 both sides of (4.5) are one. Assume                                       with equality iff P is the uniform distribution. On the other
q                   q       1
(iii) holds for all codes C ′ with ℓmax (C ′ ) ≤ L − 1 and let                                    hand, since |U| ≤ q, q−1 (1 − |U|−1 ) ≤ q−1 1 − q = 1
ℓmax (C) = L. Let U1 (C) and U(α) (C), be as in the proof of                                      and the equality holds iff |U| = q. Then (4.5) holds iff P is
(1.11) and let C(α) be the preﬁx code for the source with                                         uniform and |U| = q, i.e. (4.6).
7

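Condition (iii) of Theorem 2 can be checked numerically for a small code satisfying (i′). Below is a minimal Python sketch; the checking model assumed here is that user v compares the symbols of the output codeword with its own codeword one by one, stopping at the first mismatch or when its own codeword is exhausted, and the code {0, 10, 11} with symbol names a, b, c is arbitrary test data.

```python
# Check Theorem 2: if P_u = q^{-||c_u||} (condition (i')), then the expected
# number of checkings L_C(P, P) equals q/(q-1)*(1 - sum_u P_u^2) (condition (iii)).
q = 2
code = {"a": "0", "b": "10", "c": "11"}          # binary prefix code
P = {u: q ** -len(c) for u, c in code.items()}   # (4.6): P_u = q^{-||c_u||}

def checkings(cu, cv):
    """Symbols user v checks on output codeword cu: stop at the first
    mismatch or when v's own codeword cv is exhausted."""
    n = 0
    for x, y in zip(cu, cv):
        n += 1
        if x != y:
            break
    return n

L = sum(P[u] * P[v] * checkings(code[u], code[v]) for u in code for v in code)
rhs = q / (q - 1) * (1 - sum(p * p for p in P.values()))
assert abs(L - rhs) < 1e-12   # both sides equal 1.25 for this code
```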
Assume now that the implication (iii) ⇒ (i′) holds for all codes with maximum lengths ≤ L − 1 and that C is a prefix code of maximum length ℓ_max(C) = L.
Without loss of generality we can assume that C is complete, because otherwise we can add "dummy" symbols with 0 probability to U and assign to them suitable codewords so that the Kraft sum equals 1; this does not change equality (4.5).
Having completeness we can assume that for a k ≤ q^{L−1} there are kq symbols u(i, j) (1 ≤ i ≤ k, 0 ≤ j ≤ q − 1) in U with ‖c_{u(i,j)}‖ = L and such that c_{u(i,0)}, c_{u(i,1)}, ..., c_{u(i,q−1)} share a prefix ω_i of length L − 1 for i = 1, 2, ..., k.
Let u(1), ..., u(k) be k "new symbols" not in the original U and consider

    U′ = U \ {u(i, j) : 1 ≤ i ≤ k, 0 ≤ j ≤ q − 1} ∪ {u(i) : 1 ≤ i ≤ k}

and the probability distribution P′ defined by

    P′_{u′} = P_{u′} if u′ ∈ U ∩ U′,  and  P′_{u′} = Σ_{j=0}^{q−1} P_{u(i,j)} if u′ = u(i) for some i.    (4.12)

Next we define a prefix code C′ for the source (U′, P′) by using C as follows:

    c′_{u′} = c_{u′} if u′ ∈ U ∩ U′,  and  c′_{u′} = ω_i if u′ = u(i) for some i.    (4.13)

Then for u′ ∈ U ∩ U′, ‖c′_{u′}‖ = ‖c_{u′}‖, and ‖c′_{u(1)}‖ = ‖c′_{u(2)}‖ = · · · = ‖c′_{u(k)}‖ = L − 1.
Therefore by induction hypothesis

    L_{C′}(P′, P′) ≥ q/(q−1) · (1 − Σ_{u′∈U′} P′^2_{u′})    (4.14)

and equality holds iff P_u = q^{−‖c_u‖} for u ∈ U ∩ U′ and P′_{u(i)} = Σ_{j=0}^{q−1} P_{u(i,j)} = q^{−(L−1)} for i = 1, 2, ..., k. Furthermore, it follows from (4.2) and the definitions of L_C(P, P) and L_{C′}(P′, P′) that

    L_C(P, P) = L_{C′}(P′, P′) + Σ_{i=1}^k (Σ_{j=0}^{q−1} P_{u(i,j)})^2
              = L_{C′}(P′, P′) + Σ_{i=1}^k P′^2_{u(i)}
              ≥ q/(q−1) · (1 − Σ_{u′∈U′} P′^2_{u′}) + Σ_{i=1}^k P′^2_{u(i)}
              = q/(q−1) · (1 − Σ_{u∈U∩U′} P_u^2) + Σ_{i=1}^k (1 − q/(q−1)) P′^2_{u(i)}
              = q/(q−1) · (1 − Σ_{u∈U∩U′} P_u^2 − Σ_{i=1}^k q^{−1} (Σ_{j=0}^{q−1} P_{u(i,j)})^2)
              ≥ q/(q−1) · (1 − Σ_{u∈U} P_u^2).    (4.15)

By (4.13) the first inequality holds iff P_u = q^{−‖c_u‖} for u ∈ U ∩ U′ and Σ_{j=0}^{q−1} P_{u(i,j)} = q^{−(L−1)} for i = 1, 2, ..., k; it follows from (4.2) that the last inequality holds, with equality iff

    P_{u(i,0)} = P_{u(i,1)} = · · · = P_{u(i,q−1)} for i = 1, 2, ..., k.

In order to have

    L_C(P, P) = q/(q−1) · (1 − Σ_{u∈U} P_u^2)

the two inequalities in (4.15) must be equalities. However, this is equivalent with (4.6), i.e. (i′).

V. A GLOBAL BALANCE PRINCIPLE TO FIND GOOD CODES

In case U and V are independent and identically distributed there is no gain in using the local unbalance principle (LUP). But in this case Corollary 1 and (4.2) provide a way to find a good code. We first rewrite Corollary 1 as

    E W_C(U, V) = Σ_n Σ_{ω∈X^n} Pr(U ∈ U(C, ω), V ∈ U(C, ω)).

By the assumptions on U and V with their distribution P

    L_C(P, P) = Σ_n Σ_{ω∈X^n} P^2(U(C, ω)).    (5.1)

Notice that in case P_{n,C} = Σ_{ω∈X^n} P(U(C, ω)) is a constant, Σ_{ω∈X^n} P^2(U(C, ω)) is minimized by choosing the P(U(C, ω))'s uniformly. This gives us a global balance principle (GBP) for finding good codes.
We shall see the roles of both the LUP and the GBP in the proof of the following coding theorem for DMS's.

Theorem 3: For a DMS (U^n, V^n)_{n=1}^∞ with generic distribution P_{UV} = P × Q, i.e. the generic random variables U and V are independent and P_U = P, P_V = Q,

    lim_{n→∞} L(P^n, Q^n) = 1 if P ≠ Q,  and  lim_{n→∞} L(P^n, Q^n) = q/(q−1) if P = Q.    (5.2)

Proof: Trivially L_C(P, Q) ≥ 1, and by Corollary 2, q/(q−1) is a lower bound to lim_{n→∞} L(P^n, P^n). Hence we only have to construct codes that achieve asymptotically the bounds in (5.2).
Case P ≠ Q: We choose a δ > 0 so that for sufficiently large n

    T^n_{P,δ} ∩ T^n_{Q,δ} = ∅    (5.3)

and for a θ > 0

    P^n(T^n_{P,δ}) > 1 − 2^{−nθ}  and  Q^n(T^n_{Q,δ}) > 1 − 2^{−nθ}.    (5.4)

Partition U^n into two parts U_0 and U_1 such that U_0 ⊃ T^n_{P,δ} and U_1 ⊃ T^n_{Q,δ}.
To simplify matters we assume q = 2. This does not lose generality, since enlarging the alphabet cannot make things worse.
Let ℓ_i = ⌈log |U_i|⌉ and ψ_i : U_i → {0, 1}^{ℓ_i} for i = 0, 1. Then we define a code C by c_{u^n} = (i, ψ_i(u^n)) if u^n ∈ U_i, and show
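The global balance principle rests on inequality (4.2): for a fixed total mass, a sum of squares is minimized by equal summands. A quick numeric illustration in Python (the cell masses are arbitrary random test data, not derived from a particular code):

```python
import random

# Illustrate the balance principle behind (5.1): for prefix-cell masses with
# a fixed total s, the sum of squares is minimized when all masses are equal,
# by inequality (4.2) with a_i = P(U(C, omega)).
random.seed(0)
k = 8                                  # number of cells omega in X^n
masses = [random.random() for _ in range(k)]
s = sum(masses)

unbalanced = sum(m * m for m in masses)
balanced = k * (s / k) ** 2            # every cell carries mass s/k

assert balanced <= unbalanced          # (4.2): (sum a_i)^2 <= k * sum a_i^2
```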
that L_C(P^n, Q^n) is arbitrarily close to one if n is sufficiently large. Actually it immediately follows from Proposition 1 that

    L_C(P^n, Q^n) = Σ_{u^n, u′^n ∈ U^n} P^n(c_{u^n}) Q^n(c_{u′^n}) cp(c_{u^n}, c_{u′^n})

and splitting the sum over the four products U_0 × U_0, U_0 × U_1, U_1 × U_0, and U_1 × U_1 gives

    L_C(P^n, Q^n) < ℓ_0 P^n(U_0) Q^n(U_0) + P^n(U_0) Q^n(U_1) + P^n(U_1) Q^n(U_0) + ℓ_1 P^n(U_1) Q^n(U_1)
                  ≤ P^n(U_0) Q^n(U_1) + P^n(U_1) Q^n(U_0) + ⌈n log |U|⌉ [P^n(U_0) Q^n(U_0) + P^n(U_1) Q^n(U_1)]
                  ≤ 1 + ⌈n log |U|⌉ [Q^n(U_0) + P^n(U_1)]

and therefore

    L_C(P^n, Q^n) < 1 + ⌈n log |U|⌉ 2^{−nθ+1} → 1 as n → ∞,    (5.5)

where the second inequality holds because ℓ_i = ⌈log |U_i|⌉ ≤ ⌈log |U^n|⌉ = ⌈n log |U|⌉ for i = 0, 1, and the last inequality follows from (5.4).

Case P = Q: Now let P = Q. For 0 < α < H(P) let P_n(> α) be the set of n-types (n-empirical distributions) P̃ on U with |T^n_{P̃}| > 2^{nα}. Then there is a positive θ such that the empirical distribution of the output U^n (resp. V^n) is in P_n(> α) with probability larger than 1 − 2^{−nθ}.
Next we choose an integer ℓ_n such that for β = (1/4) min(θ, α)

    2^{(n/2)β} < q^{ℓ_n} ≤ 2^{nβ}.    (5.6)

Label the sequences in T^n_{P̃} for P̃ ∈ P_n(> α) by 0, 1, ..., |T^n_{P̃}| − 1 and let Ψ_1 be a mapping from U^n to X^{ℓ_n}, where X = {0, 1, ..., q − 1}, as follows.
If u^n has type P̃ in P_n(> α) and got an index ind(u^n) with q-ary representation (x_k, x_{k−1}, ..., x_2, x_1), i.e. ind(u^n) = Σ_{i=1}^k x_i q^{i−1} for 0 ≤ x_i ≤ q − 1, k = ⌈log |T^n_{P̃}|⌉, then let

    Ψ_1(u^n) = (x_1, x_2, ..., x_{ℓ_n}).    (5.7)

If the type of u^n is not in P_n(> α), we arbitrarily choose a sequence in X^{ℓ_n} as Ψ_1(u^n).
For any fixed t ≤ ℓ_n, P̃ ∈ P_n(> α), and x^t ∈ X^t let U(P̃, x^t) be the set of sequences in T^n_{P̃} such that x^t is a prefix of Ψ_1(u^n). Then it is not hard to see that for all x^t, x′^t with t ≤ ℓ_n

    | |U(P̃, x^t)| − |U(P̃, x′^t)| | ≤ 1.

More specifically, for all t ≤ ℓ_n and x^t ∈ X^t

    |U(P̃, x^t)| = Σ_{j=t+1}^k a_j q^{j−1−t}  or  Σ_{j=t+1}^k a_j q^{j−1−t} + 1,

if |T^n_{P̃}| = Σ_{j=1}^k a_j q^{j−1} with a_k ≠ 0 and 0 ≤ a_j ≤ q − 1 for j = 1, 2, ..., k − 1.
Let U(x^t) = ∪_{P̃} U(P̃, x^t) (here it does not matter whether P̃ ∈ P_n(> α) or not).
Thus we partition U^n into q^t parts as {U(x^t) : x^t ∈ X^t} for t ≤ ℓ_n.
By the AEP (the asymptotic equipartition property) the difference of the conditional probability of the event that the output of U^n is in U(x^t), given that the type of U^n is in P_n(> α), and q^{−t} is not larger than

    min_{P̃∈P_n(>α)} |T^n_{P̃}|^{−1} < 2^{−nα}.

Recalling that with probability 1 − 2^{−nθ} U^n has type in P_n(> α) and the assumption that V^n has the same distribution as U^n, we obtain that

    Pr(U^n ∈ U(x^t)) = Pr(V^n ∈ U(x^t)) = P^n(U(x^t))

and for all x^t ∈ X^t

    (1 − 2^{−nθ})(q^{−t} − 2^{−nα}) ≤ P^n(U(x^t)) ≤ (1 − 2^{−nθ})(q^{−t} + 2^{−nα}) + 2^{−nθ},

which implies that for all x^t ∈ X^t

    |P^n(U(x^t)) − q^{−t}| ≤ 2^{−nθ} + 2^{−nα} < 2^{−2nβ},    (5.8)

since β = (1/4) min(θ, α).
Recall that Ψ_1 is a function from U^n to X^{ℓ_n} and that by the definition of U(x^t), U(x^{ℓ_n}) is actually the inverse image of x^{ℓ_n} under Ψ_1, i.e. U(x^{ℓ_n}) = Ψ_1^{−1}(x^{ℓ_n}).
Let furthermore ℓ^*(x^{ℓ_n}) = ⌈log |U(x^{ℓ_n})| / log q⌉ and let Ψ_2 be a function on U^n such that its restriction to U(x^{ℓ_n}) is an injection into X^{ℓ^*(x^{ℓ_n})} for all x^{ℓ_n}. Then our encoding function is defined as

    c = (Ψ_1, Ψ_2).    (5.9)

To estimate L_C(P^n, P^n) we introduce an auxiliary source with alphabet X^{ℓ_n} and probability distribution P^* such that for all x^{ℓ_n} ∈ X^{ℓ_n}

    P^*(x^{ℓ_n}) = P^n(U(x^{ℓ_n})).

We divide the waiting time for identification with code C into two parts according to the two components Ψ_1 and Ψ_2 in (5.9), and we let W_1 and W_2 be the random waiting times
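The balancing effect of the digit map (5.7) can be illustrated numerically: since the first t digits (x_1, ..., x_t) of the q-ary representation of ind(u^n) are determined by ind(u^n) mod q^t, mapping indices 0, ..., N − 1 to these digits spreads them over the q^t cells with sizes differing by at most 1. A small Python sketch (q, t, and N are arbitrary test values; N plays the role of |T^n_{P̃}|):

```python
# The map (5.7) sends index ind(u^n) to its low-order q-ary digits
# (x_1, ..., x_t), so the cell of an index is ind mod q^t and the q^t
# cell sizes differ by at most 1.
q, t, N = 3, 2, 100

cells = [0] * q ** t
for ind in range(N):
    cells[ind % q ** t] += 1           # digits x_1...x_t determine ind mod q^t

assert max(cells) - min(cells) <= 1    # near-uniform cell occupancy
```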
of the two parts, respectively. Now let Z be a binary random variable such that Z = 0 if Ψ_1(U^n) ≠ Ψ_1(V^n) and Z = 1 otherwise.
Then

    L_C(P^n, P^n) = E(W_1 + W_2) = E W_1 + E E(W_2 | Z)
                  = E W_1 + Pr(Z = 1) E(W_2 | Z = 1)
                  = E W_1 + Σ_{x^{ℓ_n}} P^n(Ψ_1(U^n) = x^{ℓ_n}) P^n(Ψ_1(V^n) = x^{ℓ_n}) E(W_2 | Z = 1)
                  = E W_1 + Σ_{x^{ℓ_n}} P^n(U(x^{ℓ_n}))^2 E(W_2 | Z = 1).    (5.10)

Let C^* be the code for the auxiliary source with encoding function c^* = Ψ_1. Then we have that

    E W_1 = L_{C^*}(P^*, P^*)    (5.11)

and with the notation in Corollary 1, U(C^*, x^t) = U(x^t) and P^*(U(C^*, x^t)) = P^n(U(x^t)) for x^t ∈ X^t with t ≤ ℓ_n. For all x^t ∈ X^t, t ≤ ℓ_n, we denote

    δ(x^t) = q^{−t} − P^n(U(x^t)).

Then we have for all t ≤ ℓ_n, Σ_{x^t∈X^t} δ(x^t) = 0 and by (5.8) |δ(x^t)| < 2^{−2nβ}.
Now we apply Corollary 1 to estimate

    L_{C^*}(P^*, P^*) = Σ_{t=0}^{ℓ_n} Σ_{x^t∈X^t} P^*(U(C^*, x^t))^2
                      = Σ_{t=0}^{ℓ_n} Σ_{x^t∈X^t} P^n(U(x^t))^2 = Σ_{t=0}^{ℓ_n} Σ_{x^t∈X^t} (q^{−t} − δ(x^t))^2
                      = Σ_{t=0}^{ℓ_n} [q^t · q^{−2t} − 2 q^{−t} Σ_{x^t∈X^t} δ(x^t) + Σ_{x^t∈X^t} δ(x^t)^2]
                      ≤ Σ_{t=0}^{ℓ_n} q^{−t} + Σ_{t=0}^{ℓ_n} q^t · 2^{−4nβ} < Σ_{t=0}^∞ q^{−t} + [(q^{ℓ_n+1} − 1)/(q − 1)] 2^{−4nβ}
                      < q/(q−1) + [1/(q−1)] q^{ℓ_n+1} 2^{−4nβ}.    (5.12)

Moreover, by the definition of Ψ_2 and W_2

    E(W_2 | Z = 1) ≤ (n log |U|)/(log q)

and in (5.12) we have shown that

    Σ_{x^{ℓ_n}} P^n(U(x^{ℓ_n}))^2 ≤ q^{−ℓ_n} + q^{ℓ_n} · 2^{−4nβ}.

Consequently

    Σ_{x^{ℓ_n}∈X^{ℓ_n}} P^n(U(x^{ℓ_n}))^2 E(W_2 | Z = 1) ≤ [q^{−ℓ_n} + q^{ℓ_n} 2^{−4nβ}] (n log |U|)/(log q).    (5.13)

Finally, by combining (5.10), (5.11), (5.12), and (5.13) with the choice of β in (5.6), we have that

    lim_{n→∞} L_C(P^n, P^n) ≤ q/(q−1),

the desired inequality.
It is interesting that the limits of the waiting time of ID-codes in the left hand side of (5.2) are independent of the generic distributions P and Q and only depend on whether they are equal.
In the case that they are not equal the limit is even independent of the alphabet size. In particular, in case P ≠ Q we have seen in the proof that the key step is how to distribute the first symbol, and the local unbalance principle (LUP) is applied in the second step. Moreover, for a good code the random user needs to wait for the second symbol only with exponentially vanishing probability. So the remaining parts of the codewords are not so important.
Similarly, in the case P = Q, where we use the GBP instead of the LUP, the key part of the codewords is a relatively small prefix (in the proof it is the ℓ_n-th prefix) and after that the user has to wait only with exponentially small probability. Thus again the remaining part of the codewords is less important.

APPENDIX I
COMMENTS ON GENERALIZED ENTROPIES

After the discovery of ID-entropies in [4] work of Tsallis [13] and also [14] was brought to our attention. The equalities (1) and (2) in [14] are here (A.1) and (A.2). The letter q used there corresponds to our letter α, because for us q gives the alphabet size. The generalization of Boltzmann's entropy

    H(P) = −k Σ_u P_u ln P_u

is

    S_α(P) = k [1/(α − 1)] (1 − Σ_{u=1}^N P_u^α)    (A.1)

for any real α ≠ 1. Notice that lim_{α→1} S_α(P) = H(P), which can be named S_1(P).
One readily verifies that for product-distributions P × Q of independent random variables

    S_α(P × Q) = S_α(P) + S_α(Q) − [(α − 1)/k] S_α(P) S_α(Q).    (A.2)

Since in all cases S_α ≥ 0, the cases α < 1, α = 1, and α > 1 go with superadditivity, additivity, and subadditivity (also called, for the purposes in statistical physics, superextensivity, extensivity, and subextensivity).
We recall the grouping identity of [4]. For a partition (U_1, U_2) of U = {1, 2, ..., N}, Q_i = Σ_{u∈U_i} P_u and P_u^{(i)} = P_u / Q_i for u ∈ U_i (i = 1, 2),

    H_{I,q}(P) = H_{I,q}(Q) + Σ_i Q_i^2 H_{I,q}(P^{(i)}),    (A.3)
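Equality (A.2), in its instance (A.4) with α = 2 and k = q/(q−1), where S_α coincides with the ID-entropy H_{I,q}, can be verified numerically. A small Python sketch (the distributions P, Q and the alphabet size are arbitrary test data):

```python
# Check the pseudo-additivity (A.2) for independent P and Q, with alpha = 2
# and k = q/(q-1), the instance (A.4) in which S_alpha is the ID-entropy.
def S(dist, alpha, k):
    # (A.1): S_alpha(P) = k/(alpha - 1) * (1 - sum_u P_u^alpha)
    return k / (alpha - 1) * (1 - sum(p ** alpha for p in dist))

P = [0.5, 0.3, 0.2]                    # arbitrary test distributions
Q = [0.6, 0.4]
PQ = [p * q for p in P for q in Q]     # product distribution (independence)

alpha, q_ary = 2.0, 3
k = q_ary / (q_ary - 1)

lhs = S(PQ, alpha, k)
rhs = S(P, alpha, k) + S(Q, alpha, k) - (alpha - 1) / k * S(P, alpha, k) * S(Q, alpha, k)
assert abs(lhs - rhs) < 1e-12          # (A.2), here in the form (A.4)
```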
where Q = (Q_1, Q_2). This implies

    H_{I,q}(P × Q) = H_{I,q}(Q) + Σ_j Q_j^2 H_{I,q}(P)

and since

    Σ_j Q_j^2 = 1 − [(q − 1)/q] H_{I,q}(Q)

we get

    H_{I,q}(P × Q) = H_{I,q}(Q) + H_{I,q}(P) − [(q − 1)/q] H_{I,q}(Q) H_{I,q}(P),    (A.4)

which is (A.2) for α = 2 and k = q/(q−1).
We have been told by several experts in physics that the operational significance of the quantities S_α (for α ≠ 1) in statistical physics seems not to be undisputed.
In contrast, the significance of identification entropy was demonstrated in [4] (see Section 2). It is formally close to, but essentially different from, S_α for two reasons: always α = 2, and k = q/(q−1) is uniquely determined and depends on the alphabet size q!
We also have discussed the coding theoretical meanings of the factors q/(q−1) and 1 − Σ_{u=1}^N P_u^2.
More recently we learned from referees that already in 1967 Havrda and Charvát [7] introduced the entropies {H_N^α} of type α:

    H_N^α(P_1, P_2, ..., P_N) = (2^{1−α} − 1)^{−1} (Σ_{i=1}^N P_i^α − 1)    (A.5)

[(P_1, P_2, ..., P_N) ∈ P([N]), N = 2, 3, ..., 0^α = 0] with

    lim_{α→1} H_N^α(P_1, P_2, ..., P_N) = H_N(P_1, P_2, ..., P_N),

the Boltzmann/Gibbs/Shannon entropy. So it is reasonable to define

    H_N^1(P_1, P_2, ..., P_N) = H_N(P_1, P_2, ..., P_N).

This is a generalization of the BGS-entropy different from the Rényi entropies of order α ≠ 1 (which according to [2] were introduced by Schützenberger [9]) given by

    _αH_N(P_1, P_2, ..., P_N) = [1/(1 − α)] log_2 Σ_{i=1}^N P_i^α

[(P_1, P_2, ..., P_N) ∈ P([N]), N = 2, 3, ...].
Comparison shows that

    _αH_N(P_1, P_2, ..., P_N) = [1/(1 − α)] log_2 [(2^{1−α} − 1) H_N^α(P_1, P_2, ..., P_N) + 1]

and

    H_N^α(P_1, P_2, ..., P_N) = (2^{1−α} − 1)^{−1} [2^{(1−α) _αH_N(P_1, P_2, ..., P_N)} − 1]

[(P_1, P_2, ..., P_N) ∈ P([N]), N = 2, 3, ...].
So, while the entropies of order α and the entropies of type α are different for α ≠ 1, we see that the bijection

    t → [1/(1 − α)] log_2 [(2^{1−α} − 1) t + 1]

connects them. Therefore, we may ask what the advantage is in dealing with entropies of type α. We meanwhile also learned that the book [2] gives a comprehensive discussion. Also Daróczy's contribution [6], where "type α" is named "degree α", gives an enlightening analysis.
Note that Rényi entropies (α ≠ 1) are additive, but not subadditive (except for α = 0) and not recursive, and they have neither the branching property nor the sum property, that is, the property that there exists a measurable function g on (0, 1) such that

    H_N(P_1, P_2, ..., P_N) = Σ_{i=1}^N g(P_i).

Entropies of type α, on the other hand, are not additive but do have the subadditivity property and the sum property, and furthermore are additive of degree α:

    H_{MN}^α(P_1 Q_1, P_1 Q_2, ..., P_1 Q_N, P_2 Q_1, P_2 Q_2, ..., P_2 Q_N, ..., P_M Q_1, P_M Q_2, ..., P_M Q_N)
    = H_M^α(P_1, P_2, ..., P_M) + H_N^α(Q_1, Q_2, ..., Q_N) + (2^{1−α} − 1) H_M^α(P_1, P_2, ..., P_M) H_N^α(Q_1, Q_2, ..., Q_N)

[(P_1, P_2, ..., P_M) ∈ P([M]), (Q_1, Q_2, ..., Q_N) ∈ P([N]); M = 2, 3, ...; N = 2, 3, ...];
strong additive of degree α:

    H_{MN}^α(P_1 Q_{11}, P_1 Q_{12}, ..., P_1 Q_{1N}, P_2 Q_{21}, P_2 Q_{22}, ..., P_2 Q_{2N}, ..., P_M Q_{M1}, P_M Q_{M2}, ..., P_M Q_{MN})
    = H_M^α(P_1, P_2, ..., P_M) + Σ_{j=1}^M P_j^α H_N^α(Q_{j1}, Q_{j2}, ..., Q_{jN})

[(P_1, P_2, ..., P_M) ∈ P([M]), (Q_{j1}, Q_{j2}, ..., Q_{jN}) ∈ P([N]); j = 1, 2, ..., M; M = 2, 3, ...; N = 2, 3, ...];
and recursive of degree α:

    H_N^α(P_1, P_2, ..., P_N) = H_{N−1}^α(P_1 + P_2, P_3, ..., P_N) + (P_1 + P_2)^α H_2^α(P_1/(P_1 + P_2), P_2/(P_1 + P_2))

[(P_1, P_2, ..., P_N) ∈ P([N]), N = 3, 4, ... with P_1 + P_2 > 0].
(In consequence entropies of type α also have the branching property.)
It is clear now that for binary alphabet the ID-entropy is exactly the entropy of type α = 2.
However, prior to [13] there are hardly any applications or operational justifications of the entropy of type α.
Moreover, the q-ary case did not exist at all, and therefore the name ID-entropy is well justified.
We feel that it must be said that in many papers (with several coauthors) Tsallis at least developed ideas to promote non-standard equilibrium theory in Statistical Physics using generalized entropies S_α and generalized concepts of inner energy.
Our attention has been drawn also to the papers [5], [11], [12] with possibilities of connections to our work.
Recently a clear-cut progress was made by C. Heup in his forthcoming thesis with a generalization of ID-entropy motivated by L-identification.
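The stated bijection between entropies of order α and entropies of type α can be verified numerically from (A.5) and the definition of the Rényi entropy. A small Python sketch (the distribution and α are arbitrary test data):

```python
from math import log2

# Check the bijection between entropies of type alpha (A.5) and entropies of
# order alpha: _alpha H_N = (1/(1-alpha)) * log2[(2^{1-alpha}-1)*H^alpha_N + 1].
def type_alpha(dist, alpha):           # Havrda-Charvat, (A.5)
    return (2 ** (1 - alpha) - 1) ** -1 * (sum(p ** alpha for p in dist) - 1)

def order_alpha(dist, alpha):          # Renyi entropy, base 2
    return 1 / (1 - alpha) * log2(sum(p ** alpha for p in dist))

P = [0.4, 0.35, 0.15, 0.1]             # arbitrary test distribution
alpha = 2.0

renyi = order_alpha(P, alpha)
via_bijection = 1 / (1 - alpha) * log2((2 ** (1 - alpha) - 1) * type_alpha(P, alpha) + 1)
assert abs(renyi - via_bijection) < 1e-12
```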
REFERENCES

[1] S. Abe, Axioms and uniqueness theorem for Tsallis entropy, Phys. Lett. A 271, No. 1-2, 74–79, 2000.
[2] J. Aczél and Z. Daróczy, On Measures of Information and their Characterizations, Mathematics in Science and Engineering, Vol. 115, Academic Press, New York, 1975.
[3] R. Ahlswede, General theory of information transfer, in the special issue "General Theory of Information Transfer and Combinatorics" of Discrete Applied Mathematics, to appear.
[4] R. Ahlswede, Identification entropy, in General Theory of Information Transfer and Combinatorics, Report on a Research Project at the ZIF (Center of Interdisciplinary Studies) in Bielefeld, Oct. 1, 2002 – August 31, 2004, edited by R. Ahlswede with the assistance of L. Bäumer and N. Cai, to appear.
[5] L.L. Campbell, A coding theorem and Rényi's entropy, Information and Control 8, 423–429, 1965.
[6] Z. Daróczy, Generalized information functions, Information and Control 16, 36–51, 1970.
[7] J. Havrda and F. Charvát, Quantification method of classification processes, concept of structural a-entropy, Kybernetika (Prague) 3, 30–35, 1967.
[8] A. Rényi, On measures of entropy and information, Proc. 4th Berkeley Sympos. Math. Statist. and Prob., Vol. I, pp. 547–561, Univ. California Press, Berkeley, 1961.
[9] M.P. Schützenberger, Contribution aux applications statistiques de la théorie de l'information, Publ. Inst. Statist. Univ. Paris 3, No. 1-2, 3–117, 1954.
[10] C.E. Shannon, A mathematical theory of communication, Bell Syst. Techn. J. 27, 379–423, 623–656, 1948.
[11] B.D. Sharma and H.C. Gupta, Entropy as an optimal measure, in Information Theory (Proc. Internat. CNRS Colloq., Cachan, 1977), 151–159, Colloq. Internat. CNRS 276, CNRS, Paris, 1978.
[12] F. Topsøe, Game-theoretical equilibrium, maximum entropy and minimum information discrimination, in Maximum Entropy and Bayesian Methods (Paris, 1992), 15–23, Fund. Theories Phys. 53, Kluwer Acad. Publ., 1993.
[13] C. Tsallis, Possible generalization of Boltzmann-Gibbs statistics, J. Statist. Phys. 52, No. 1-2, 479–487, 1988.
[14] C. Tsallis, R.S. Mendes, and A.R. Plastino, The role of constraints within generalized nonextensive statistics, Physica A 261, 534–554, 1998.
```