An interpretation of identification entropy

Rudolf Ahlswede and Ning Cai
(Both authors are with the University of Bielefeld.)

Abstract— After Ahlswede introduced identification for source coding he discovered identification entropy and demonstrated that it plays a role analogous to classical entropy in Shannon's Noiseless Source Coding. We now give even more insight into this functional by interpreting its two factors.

Index Terms— Source coding for identification, identification entropy, operational justification

I. INTRODUCTION

A. Terminology

Identification in source coding started in [3]. Then identification entropy was discovered and its operational significance in noiseless source coding was demonstrated in [4]. Familiarity with that paper is helpful, but not necessary here. As far as possible we use its notation.

Differences come from the fact that we now use a q-ary coding alphabet X = {0, 1, ..., q-1}, whereas earlier only the case q = 2 was considered and it was merely remarked that all results generalize to arbitrary q. In particular the identification entropy, abbreviated as ID-entropy, for the source (U, P, U) has the form

  H_{I,q}(P) = \frac{q}{q-1} \Bigl( 1 - \sum_{u \in U} P_u^2 \Bigr).   (1.1)

Shannon (1948) has shown that a source (U, P, U) with output U satisfying Prob(U = u) = P_u can be encoded in a prefix code C = {c_u : u \in U} \subset {0, 1, ..., q-1}^* such that for the q-ary entropy

  H_q(P) = -\sum_{u \in U} P_u \log_q P_u \le \sum_{u \in U} P_u \|c_u\| \le H_q(P) + 1,

where \|c_u\| is the length of c_u.

We use a prefix code C for another purpose, namely noiseless identification: every user who wants to know whether some v (v \in U) of his interest is the actual source output or not can consider the RV C with C = c_u = (c_{u1}, ..., c_{u\|c_u\|}) if U = u, check whether C = (C_1, C_2, ...) coincides with c_v in the first, second, etc. letter, and stop when the first different letter occurs or when C = c_v. Let L_C(P, v) be the expected number of checkings, if code C is used.

Related quantities are

  L_C(P) = \max_{v \in U} L_C(P, v),   (1.2)

that is, the expected number of checkings for a person in the worst case, if code C is used,

  L(P) = \min_C L_C(P),   (1.3)

the expected number of checkings in the worst case for a best code, and finally, if the v's are chosen by a RV V independent of U and defined by Prob(V = v) = Q_v for v \in V = U, we consider

  L_C(P, Q) = \sum_{v \in U} Q_v L_C(P, v),   (1.4)

the average number of expected checkings, if code C is used, and also

  L(P, Q) = \min_C L_C(P, Q),   (1.5)

the average number of expected checkings for a best code.

A natural special case is the mean number of expected checkings

  \bar{L}_C(P) = \frac{1}{N} \sum_{u=1}^{N} L_C(P, u), if U = [N],   (1.6)

which equals L_C(P, Q) for Q = (1/N, ..., 1/N), and

  \bar{L}(P) = \min_C \bar{L}_C(P).   (1.7)

Another special case of some "intuitive appeal" is the case Q = P. Here we write

  L(P, P) = \min_C L_C(P, P).   (1.8)

It is known that Huffman codes minimize the expected codeword length for P. This is not always the case for L(P) and the other quantities in identification.

In this paper an important incentive comes from Theorem 4 of [4]: For P^N = (2^{-\ell_1}, ..., 2^{-\ell_N}), that is, with 2-powers as probabilities, L(P^N, P^N) = H_I(P^N). Here the assumption means that there is a complete prefix code (i.e. equality holds in Kraft's inequality).

B. A terminology involving proper common prefixes

The quantity L_C(P, Q) is defined below also for the case of not necessarily independent U and V. It is conveniently described in a terminology involving proper common prefixes. For an encoding c : U \to X^* we define for two words w, w' \in X^* the quantity cp(w, w') as the number of proper common prefixes including the empty word, which equals the length of the maximal proper common prefix plus 1.

For example cp(11, 000) = 1, cp(0110, 0100) = 3, and cp(1001, 1000) = 4 (since the proper common prefixes are \emptyset, 1, 10, 100).
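The definitions above are easy to check mechanically. The following Python sketch is our own illustration (the prefix code and the distribution are hypothetical toy choices, not taken from the paper); it implements cp and the checking quantities L_C(P, v), L_C(P) of (1.2) and L_C(P, Q) of (1.4):

```python
def cp(w, x):
    """Number of proper common prefixes of w and x, counting the empty word.

    Equals the length of the maximal common prefix that is proper
    (i.e. strictly shorter) for both words, plus 1."""
    k = 0
    while k < min(len(w), len(x)) and w[k] == x[k]:
        k += 1
    return min(k, len(w) - 1, len(x) - 1) + 1

# Hypothetical toy source: a complete binary prefix code and a distribution.
code = {"a": "0", "b": "10", "c": "110", "d": "111"}
P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

def L_user(P, v):
    """L_C(P, v): expected number of checkings for the user interested in v."""
    return sum(P[u] * cp(code[u], code[v]) for u in code)

def L_worst(P):
    """L_C(P), eq. (1.2): worst case over users v."""
    return max(L_user(P, v) for v in code)

def L_avg(P, Q):
    """L_C(P, Q), eq. (1.4): user v drawn with probability Q_v."""
    return sum(Q[v] * L_user(P, v) for v in code)

print(cp("11", "000"), cp("0110", "0100"), cp("1001", "1000"))  # 1 3 4
print(L_user(P, "a"), L_worst(P), L_avg(P, P))                  # 1.0 1.75 1.3125
```

The user interested in the one-letter codeword 0 is always done after a single check, while users interested in the longest codewords wait longest; this is the worst case in (1.2).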
Now, with the encoding c of a prefix code C and RVs U and V, cp(c_U, c_V) measures the time steps it takes to decide whether U and V are equal, that is, the checking time or waiting time, which we denote by

  W_C(U, V) = cp(c_U, c_V).   (1.9)

Clearly, we can write the expected waiting time as

  E W_C(U, V) = E cp(c_U, c_V).   (1.10)

It is readily verified that for independent U, V, that is, Pr(U = u, V = v) = P_u Q_v,

  E W_C(U, V) = L_C(P, Q) = E cp(c_U, c_V).   (1.11)

We now give another description for E W_C(U, V). For a word w \in X^* and a code C define as subset of U

  U(C, w) = {u \in U : c_u has proper prefix w}   (1.12)

and its indicator function 1_{U(C,w)}. Now

  E cp(c_U, c_V) = \sum_{u,v \in U} Pr(U = u, V = v) cp(c_u, c_v)
                 = \sum_{u,v \in U} Pr(U = u, V = v) \sum_w 1_{U(C,w)}(u) 1_{U(C,w)}(v)
                 = \sum_w Pr(U \in U(C, w), V \in U(C, w)),

and by (1.11)

  E W_C(U, V) = \sum_w Pr(U \in U(C, w), V \in U(C, w)).   (1.13)

C. A matrix notation

Next we look at the doubly infinite matrix

  \Lambda = (cp(w, w'))_{w \in X^*, w' \in X^*}   (1.14)

and its minor \Lambda^{(L)} labelled by the sequences in X^{\le L}. Henceforth we assume that U and V are independent and have distributions P and Q. We can then use (1.11).

For a prefix code C, P induces the distribution P_C and Q induces the distribution Q_C, where for u, v \in U

  P_C(c_u) = P_u,  Q_C(c_v) = Q_v   (1.15)

and

  P_C(x) = Q_C(x) = 0 for x \in X^* \setminus C.   (1.16)

Viewing both P_C and Q_C as row vectors, for the corresponding column vector Q_C^T equation (1.11) can be written in the form

  L_C(P, Q) = P_C \Lambda Q_C^T.   (1.17)

It is clear from (1.10) that a non-complete prefix code, that is, one for which the Kraft sum is smaller than 1, can be improved for identification by shortening a suitable codeword. Hence an optimal ID source code is necessarily complete. In such a code

  \max_{u \in U} \|c_u\| \le |U| - 1   (1.19)

and one can replace \Lambda by its submatrix \Lambda^{(L)} for L = |U| - 1. This implies

  L_C(P, Q) = P_C^{(L)} \Lambda^{(L)} (Q_C^{(L)})^T,   (1.20)

where P_C^{(L)} and Q_C^{(L)} are the row vectors obtained by deleting the components y \notin X^{\le L}.

Sometimes the expressions (1.17) or (1.20) are more convenient for the investigation of L_C(P, Q). For example it is easy to see that \Lambda and therefore also \Lambda^{(L)} are positive semidefinite. Indeed, let \Delta (resp. \Delta^{(L)}) be the matrix whose rows are labelled by the sequences in X^* (resp. X^{\le L}) and whose columns are labelled by the sequences in X^* (resp. X^{\le L-1} \cup {empty sequence}) such that its (x, y)-entry is

  \delta_y(x) = 1 if y is a proper prefix of x, and 0 otherwise.

Then

  \Delta \Delta^T = \Lambda and \Delta^{(L)} (\Delta^{(L)})^T = \Lambda^{(L)},   (1.21)

and hence \Lambda and \Lambda^{(L)} are positive semidefinite. Therefore by (1.20) L_C(P, P) is (\cup)-convex in P. Furthermore, for sources (U, P) with |U| = 2^k and the block code C = {0, 1}^k the uniform distribution on U achieves min L_C(P, P).^1

Another interesting observation on (1.20) is that, as the w-th component of P_C^{(L)} \Delta^{(L)} (resp. Q_C^{(L)} \Delta^{(L)}) is P(U(C, w)) (resp. Q(U(C, w))), application of the Cauchy-Schwarz inequality to (1.20) yields

  (P_C^{(L)} \Lambda^{(L)} (Q_C^{(L)})^T)^2 \le (P_C^{(L)} \Lambda^{(L)} (P_C^{(L)})^T) \cdot (Q_C^{(L)} \Lambda^{(L)} (Q_C^{(L)})^T)   (1.22)

and equality holds iff for all w

  P(U(C, w)) = Q(U(C, w)).

We state this in equivalent form as

Lemma 1:

  L_C(P, Q)^2 \le L_C(P, P) L_C(Q, Q)   (1.23)

and equality holds iff for all w

  P(U(C, w)) = Q(U(C, w)),

which implies L_C(P, Q) = L_C(P, P) = L_C(Q, Q).

This suggests introducing

  \mu_C(P, Q) = \frac{L_C(P, Q)^2}{L_C(P, P) L_C(Q, Q)} \le 1

as a measure of similarity of the sources P and Q with respect to the code C.

Intuitively we feel that for a good code for source distribution P and user distribution Q, P and Q should be very dissimilar, because then the user waits less time until he knows that the output of U is not what he wants. This idea will be used later for code construction. Actually it is clear even in the general case where U and V are not necessarily independent.

To simplify the discussion we assume here that the alphabet X is binary, i.e. q = 2.

^1 A proof is given in the forthcoming Ph.D. thesis "L-identification for sources" written by C. Heup at the Department of Mathematics of the University of Bielefeld.
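The matrix identity (1.20) and the factorization (1.21) can be verified numerically. The Python/NumPy sketch below is our own illustration (reusing a hypothetical toy code, q = 2, L = 3); it builds \Lambda^{(L)} and \Delta^{(L)}, checks \Delta^{(L)}(\Delta^{(L)})^T = \Lambda^{(L)}, and compares the matrix form of L_C(P, Q) with the direct double sum:

```python
import numpy as np
from itertools import product

def cp(w, x):
    # number of proper common prefixes, counting the empty word
    k = 0
    while k < min(len(w), len(x)) and w[k] == x[k]:
        k += 1
    return min(k, len(w) - 1, len(x) - 1) + 1

def words_up_to(L):
    # all binary words of length <= L, the empty word first
    out = [""]
    for t in range(1, L + 1):
        out += ["".join(p) for p in product("01", repeat=t)]
    return out

L = 3
rows = words_up_to(L)        # row/column labels X^{<=L} of Lambda^{(L)}
cols = words_up_to(L - 1)    # column labels of Delta^{(L)}: X^{<=L-1} incl. empty word

Lam = np.array([[cp(w, x) for x in rows] for w in rows])
Delta = np.array([[1 if (len(y) < len(x) and x.startswith(y)) else 0 for y in cols]
                  for x in rows])

# Factorization (1.21): Delta Delta^T = Lambda, hence Lambda^{(L)} is PSD.
ok_factorization = bool((Delta @ Delta.T == Lam).all())

# Matrix form (1.20) versus the direct double sum, for a toy code.
code = {"a": "0", "b": "10", "c": "110", "d": "111"}
P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
Q = {u: 0.25 for u in code}  # uniform user distribution

idx = {w: i for i, w in enumerate(rows)}
p_vec = np.zeros(len(rows))
q_vec = np.zeros(len(rows))
for u in code:
    p_vec[idx[code[u]]] = P[u]
    q_vec[idx[code[u]]] = Q[u]

matrix_form = float(p_vec @ Lam @ q_vec)
direct = sum(P[u] * Q[v] * cp(code[u], code[v]) for u in code for v in code)
print(ok_factorization, matrix_form, direct)   # True 1.5 1.5
```

The factorization makes positive semidefiniteness, and hence the convexity of L_C(P, P) in P, immediate.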
Then the first bit of a codeword partitions the source U into two parts \bar{U}(i_1), i_1 = 0, 1, where \bar{U}(i_1) = {u \in U : c_{u1} = i_1}. By (1.13), to minimize E W_C(U, V) one has to choose a partition such that the probabilities Pr(U \in \bar{U}(i_1), V \in \bar{U}(i_1)) are small simultaneously for i_1 = 0, 1. To construct a good code one can continue along this line: partition \bar{U}(i_1) into sets \bar{U}(i_1, i_2) such that the conditional probabilities Pr(U \in \bar{U}(i_1, i_2), V \in \bar{U}(i_1, i_2) | U \in \bar{U}(i_1), V \in \bar{U}(i_1)) are as small as possible for i_1, i_2 = 0, 1, and so on. When U and V are independent the requirement for a good code is that the difference between P(\bar{U}(i_1, ..., i_k)) and Q(\bar{U}(i_1, ..., i_k)) is large. We call this the LOCAL UNBALANCE PRINCIPLE, in contrast to the GLOBAL BALANCE PRINCIPLE below.

Another extremal case is that U and V are equal with probability one, and in this case one may never use the unbalance principle. However, in this case identification for the source makes no sense: the user knows that his output definitely comes! But still we can investigate the problem by assuming that U = V holds with high probability. More specifically, we consider the limit of E W_C(U_k, V_k) for a sequence of random variables (U_k, V_k)_{k=1}^\infty such that U_k converges to V_k in probability. Then it follows from Proposition 1 that E W_C(U_k, V_k) converges to the average length of the codewords, the classical object in source coding! In this sense identification for sources is a generalization of source coding (data compression).

One of the discoveries of [4] is that ID-entropy is a lower bound to L_C(P, P). In Section 2 we repeat the original proof, and in Section 3 we give another proof of this fact via two basic tools, Lemma 3 and Lemma 4, for L_C(P^n, P^n), where P^n is the distribution of a memoryless source. It provides a clear information theoretical meaning of the two factors q/(q-1) and 1 - \sum_{u \in U} P_u^2 of ID-entropy.

Next we consider in Section 4 sufficient and necessary conditions for a prefix code C to achieve the ID-entropy lower bound for L_C(P, P). Quite surprisingly it turns out that the ID-entropy bound for ID-time is achieved by a variable length code iff the Shannon entropy bound for the average length of codewords is achieved by the same code (Theorem 2).

Finally we end the paper in Section 5 with a global balance principle for finding good codes (Theorem 3).

II. AN OPERATIONAL JUSTIFICATION OF ID-ENTROPY AS LOWER BOUND FOR L_C(P, P)

Recall from the Introduction that for q = 2

  H_I(P) = 2 \Bigl( 1 - \sum_{u=1}^{N} P_u^2 \Bigr) for P = (P_1, ..., P_N).

We repeat the first main result for L(P, P) from [4]. Central in our derivation is a proof by induction based on a decomposition formula for trees.

Starting from the root, a binary tree T goes via 0 to the subtree T_0 and via 1 to the subtree T_1 with sets of leaves U_0 and U_1, respectively. A code C for (U, P) can be viewed as a tree T, where U_i corresponds to the set of codewords C_i and U_0 \cup U_1 = U. The leaves are labelled so that U_0 = {1, 2, ..., N_0} and U_1 = {N_0 + 1, ..., N_0 + N_1}, N_0 + N_1 = N. Using the probabilities

  Q_i = \sum_{u \in U_i} P_u, i = 0, 1,

we can give the decomposition in

Lemma 2: [4] For a code C for (U, P^N)

  L_C((P_1, ..., P_N), (P_1, ..., P_N)) = 1 + L_{C_0}\Bigl( \bigl( \frac{P_1}{Q_0}, ..., \frac{P_{N_0}}{Q_0} \bigr), \bigl( \frac{P_1}{Q_0}, ..., \frac{P_{N_0}}{Q_0} \bigr) \Bigr) Q_0^2 + L_{C_1}\Bigl( \bigl( \frac{P_{N_0+1}}{Q_1}, ..., \frac{P_{N_0+N_1}}{Q_1} \bigr), \bigl( \frac{P_{N_0+1}}{Q_1}, ..., \frac{P_{N_0+N_1}}{Q_1} \bigr) \Bigr) Q_1^2.

This readily yields

Theorem 1: [4] For every source (U, P^N)

  L(P^N) \ge L(P^N, P^N) \ge H_I(P^N).

Proof: We proceed by induction on N. The base case N = 2 can be established as follows. For N = 2 and any C, L_C(P^2, P^2) \ge P_1 + P_2 = 1, but

  H_I(P^2) = 2(1 - P_1^2 - (1 - P_1)^2) = 2(2P_1 - 2P_1^2) = 4P_1(1 - P_1) \le 1.

For the induction step use for any code C the decomposition formula in Lemma 2 above and of course the desired inequality for N_0 and N_1 as induction hypothesis:

  L_C((P_1, ..., P_N), (P_1, ..., P_N))
    \ge 1 + 2 \Bigl( 1 - \sum_{u \in U_0} \bigl( \frac{P_u}{Q_0} \bigr)^2 \Bigr) Q_0^2 + 2 \Bigl( 1 - \sum_{u \in U_1} \bigl( \frac{P_u}{Q_1} \bigr)^2 \Bigr) Q_1^2
    \ge H_I(Q) + Q_0^2 H_I(P^{(0)}) + Q_1^2 H_I(P^{(1)}) = H_I(P^N),

where Q = (Q_0, Q_1), 1 \ge H_I(Q), P^{(i)} = \bigl( \frac{P_u}{Q_i} \bigr)_{u \in U_i}, and the grouping identity is used for the last equality. This holds for every C and therefore also for L(P^N, P^N) = \min_C L_C(P^N, P^N).

The approach readily extends to the q-ary case.

III. AN ALTERNATIVE PROOF OF THE ID-ENTROPY LOWER BOUND FOR L_C(P, P)

First we establish Lemma 3 below, which holds for the more general case E W_C(U, V). Let (U^n, V^n)_{n=1}^\infty be a discrete memoryless correlated source with generic pair of variables (U, V). Again U^n serves as (random) source and V^n serves as random user. For a given code C for (U, V) let C^n be the code obtained by encoding the components of a sequence u^n \in U^n iteratively. That is, for all u^n \in U^n

  c^n_{u^n} = (c_{u_1}, c_{u_2}, ..., c_{u_n}).   (3.1)

Lemma 3:

  E W_{C^n}(U^n, V^n) = E W_C(U, V) \Bigl( 1 + \sum_{t=1}^{n-1} Pr(U^t = V^t) \Bigr)   (3.2)

and therefore, for a memoryless source,

  \lim_{n \to \infty} E W_{C^n}(U^n, V^n) = \frac{E W_C(U, V)}{1 - Pr(U = V)}.   (3.3)

Proof: Since Pr(U^n = V^n) = \prod_{t=1}^n Pr(U_t = V_t) = Pr(U = V)^n, (3.3) follows from (3.2) immediately by the summation formula for geometric series.

To show (3.2) we first define for all t \ge 2 the random variables

  Z_t = 0 if U^{t-1} \ne V^{t-1}, and Z_t = 1 otherwise,   (3.4)

and for t = 1 we let Z_1 be a constant, for convenience of notation. Further we let W_t be the waiting time for the random user V^n in the t-th block. Conditional on Z_t = 1 it is defined like W_C(U, V) in (1.9), and conditional on Z_t = 0 obviously Pr(W_t = 0 | Z_t = 0) = 1, because the random user has made his decision before the t-th step. Moreover, by the definition of C^n,

  E[W_t | Z_t = 1] = E W_C(U, V)   (3.5)

and consequently

  E[E(W_t | Z_t)] = Pr(U^{t-1} = V^{t-1}) E W_C(U, V) for t = 2, 3, ..., n, and E[E(W_1 | Z_1)] = E W_C(U, V),   (3.6)

where (3.6) holds in the case t = 1 because the random user has to wait for the first outcome. Therefore it follows that

  E W_{C^n}(U^n, V^n) = E W^n = \sum_{t=1}^n E W_t = \sum_{t=1}^n E[E(W_t | Z_t)] = E W_C(U, V) + \sum_{t=1}^{n-1} Pr(U^t = V^t) E W_C(U, V),

as we wanted to show.

Next we consider the case where U and V are independent and identically distributed with distribution P, so that

  Pr(U^n = u^n, V^n = v^n) = \prod_{t=1}^n P_{u_t} \cdot P_{v_t}.   (3.7)

More specifically, we are looking for a lower bound on L_C(P^n, P^n) for all prefix codes C over U^n.

Lemma 4: For all \epsilon > 0 there exists an \eta > 0 such that, for sufficiently large n and the positive integer

  L_n = \lfloor n (H(P) - \epsilon) (\log q)^{-1} \rfloor,   (3.8)

we have for all prefix codes C over U^n

  L_C(P^n, P^n) > (1 - 2^{-n\eta}) \sum_{t=0}^{L_n - 1} q^{-t}.   (3.9)

Proof: For given \epsilon > 0 we choose \delta > 0 such that, for a \tau > 0 and sufficiently large n, for the familiar sets T^n_{P,\delta} of typical sequences

  P^n(T^n_{P,\delta}) > 1 - 2^{-n\tau}

and for all u^n \in T^n_{P,\delta}

  P(u^n) < 2^{-n(H(P) - \epsilon/2)}.

Since for a prefix code C

  |\{u^n \in U^n : \|c_{u^n}\| \le L_n\}| \le q^{L_n},   (3.10)

we obtain

  Pr(\|c_{U^n}\| \le L_n) = Pr(\|c_{V^n}\| \le L_n)
    \le Pr(V^n \notin T^n_{P,\delta}) + Pr(V^n \in T^n_{P,\delta}, \|c_{V^n}\| \le L_n)
    < 2^{-n\tau} + |\{u^n : \|c_{u^n}\| \le L_n\}| \cdot 2^{-n(H(P) - \epsilon/2)}
    \le 2^{-n\tau} + q^{L_n} 2^{-n(H(P) - \epsilon/2)}.   (3.11)

However, (3.8) implies that

  q^{L_n} \le 2^{n(H(P) - \epsilon)}.

This together with (3.11) yields

  Pr(\|c_{U^n}\| \le L_n) < 2^{-n\tau} + 2^{-n\epsilon/2} < 2^{-n\delta}   (3.12)

for \delta \le \min(\tau/2, \epsilon/4).
Next, for the distribution P^n and the code C over U^n we construct a related source (\tilde{U}, \tilde{P}) and a code \tilde{C} over \tilde{U} as follows.

The new set \tilde{U} contains {u^n \in U^n : \|c_{u^n}\| \le L_n}, for these elements \tilde{P}(u^n) = P^n(u^n), and the new encoding is \tilde{c}_{u^n} = c_{u^n}. Now we define the additional elements in \tilde{U} with their \tilde{P} and \tilde{c}. We partition {u^n \in U^n : \|c_{u^n}\| > L_n} into subsets S_j (1 \le j \le J) according to the L_n-th prefix, use a letter g_j to represent S_j, and put the set U^{\approx} = {g_j : 1 \le j \le J} into \tilde{U}, so that

  \tilde{U} = {u^n \in U^n : \|c_{u^n}\| \le L_n} \cup U^{\approx}.

Then we define \tilde{P}(g_j) = \sum_{u^n \in S_j} P(u^n) and let \tilde{c}_{g_j} be the common L_n-th prefix of the c_{u^n} for the u^n in S_j. That is, we consider all u^n sharing the same L_n-th prefix of c_{u^n} as a single element. Obviously,

  L_C(P^n, P^n) \ge L_{\tilde{C}}(\tilde{P}, \tilde{P}).   (3.13)

Finally let \tilde{U}_n and \tilde{V}_n be random variables for the new source and the new random user with distribution \tilde{P}, and let Z be a random variable such that Z = 0 if both \|c_{U^n}\| and \|c_{V^n}\| are larger than L_n, and Z = 1 otherwise. Then for all prefix codes C over U^n

  L_{\tilde{C}}(\tilde{P}, \tilde{P}) = E W = E E(W | Z) \ge Pr(Z = 0) E(W | Z = 0) = Pr(\|c_{U^n}\| > L_n) Pr(\|c_{V^n}\| > L_n) \cdot L_{C^{\approx}}(P^{\approx}, P^{\approx}),   (3.14)

where W is the random waiting time, P^{\approx} is the common conditional distribution of \tilde{U}_n given \tilde{U}_n \in U^{\approx} and of \tilde{V}_n given \tilde{V}_n \in U^{\approx}, i.e. P^{\approx}(g_j) = \tilde{P}(g_j) / \tilde{P}(U^{\approx}) for g_j \in U^{\approx}, and C^{\approx} is the restriction of \tilde{C} to U^{\approx}.

Notice that C^{\approx} is a block code of length L_n. In order to bound L_{C^{\approx}}(P^{\approx}, P^{\approx}) we extend U^{\approx}, if necessary, to a set of cardinality q^{L_n}, assigning zero probabilities and codewords of length L_n not in C^{\approx}. This little modification obviously does not change the value of L_{C^{\approx}}(P^{\approx}, P^{\approx}). Thus, if we denote the extended set by \bar{U} and the uniform distribution over it by \bar{P}, we have

  L_{C^{\approx}}(P^{\approx}, P^{\approx}) \ge L_{\bar{C}}(\bar{P}, \bar{P}),   (3.15)

where \bar{C} is a bijective block code \bar{U} \to X^{L_n}.

It is clear that U(\bar{C}, \omega) \ne \emptyset iff the length of \omega is at most L_n - 1, and that |U(\bar{C}, \omega)| = q^{L_n - \ell} if \|\omega\| = \ell \le L_n - 1. Then it follows from (1.13) that

  L_{\bar{C}}(\bar{P}, \bar{P}) = \sum_{t=0}^{L_n - 1} q^t [q^{L_n - t} \cdot q^{-L_n}]^2 = \sum_{t=0}^{L_n - 1} q^{-t}.   (3.16)

Finally we combine (3.12), (3.13), (3.14), (3.15) and (3.16), and Lemma 4 follows.

An immediate consequence is

Corollary 1:

  \lim_{n \to \infty} L(P^n, P^n) \ge \sum_{t=0}^{\infty} q^{-t} = \frac{q}{q-1}.   (3.17)

Furthermore, for independent, identically distributed random variables U, V with distribution P we have

  Pr(U = V) = \sum_{u \in U} P_u^2,

and from (3.3) and (3.17) the ID-entropy bound follows.

Corollary 2: (See Theorem 2 of [4])

  L_C(P, P) \ge \frac{q}{q-1} \Bigl( 1 - \sum_{u \in U} P_u^2 \Bigr).   (3.18)

This derivation provides a clear information theoretical meaning to the two factors in ID-entropy: q/(q-1) is a universal lower bound on the ID-waiting time for a discrete memoryless source with an independent user having the same distribution P, and 1/(1 - \sum_{u \in U} P_u^2) is the cost paid for coding the source componentwise; it is the expected leaving time for the random user in the following sense.

Let us imagine the following procedure: at each unit of time the random source U^n outputs a symbol U_t, and the random user V^n, who wants to know whether U^n = V^n, checks whether U_t coincides with his own symbol V_t. He will end if not. Then the waiting time for him is \ell with probability

  Pr(U^{\ell-1} = V^{\ell-1}) Pr(U_\ell \ne V_\ell) = Pr(U = V)^{\ell-1} (1 - Pr(U = V)) for \ell \le n.

Letting n \to \infty we obtain a geometric distribution. The expected waiting time is

  E W = \sum_{\ell=0}^{\infty} \ell Pr(U = V)^{\ell-1} (1 - Pr(U = V))
      = \sum_{\ell=0}^{\infty} (\ell + 1) Pr(U = V)^{\ell} - \sum_{\ell=0}^{\infty} \ell Pr(U = V)^{\ell}
      = \sum_{\ell=0}^{\infty} Pr(U = V)^{\ell} = \frac{1}{1 - Pr(U = V)},   (3.19)

which equals 1/(1 - \sum_u P_u^2) in the case of independent, identically distributed random variables.

(Actually (3.2) holds for all stationary sources; we chose a memoryless source for simplicity.) In general (3.3) has the form

  \lim_{n \to \infty} E W_{C^n}(U^n, V^n) = E W_C(U, V) \cdot \lim_{n \to \infty} \Bigl( 1 + \sum_{t=1}^{n-1} Pr(U^t = V^t) \Bigr).   (3.20)

By monotonicity the limit on the right hand side, and therefore also on the left hand side, exists and equals a positive finite or infinite value. When it is finite one may replace Pr(U = V)^{t-1}, 1 - Pr(U = V), and Pr(U = V)^t in the first lines of (3.19) by Pr(U^{t-1} = V^{t-1}), Pr(U_t \ne V_t | U^{t-1} = V^{t-1}), and Pr(U^t = V^t), respectively, and obtain

  \lim_{n \to \infty} \Bigl( 1 + \sum_{t=1}^{n-1} Pr(U^t = V^t) \Bigr) = \sum_{t=1}^{\infty} t Pr(U^{t-1} = V^{t-1}) \cdot Pr(U_t \ne V_t | U^{t-1} = V^{t-1}) = E L,   (3.21)

the expectation of the random leaving time L for a stationary source. Thus (3.20) can be rewritten as

  \lim_{n \to \infty} E W_{C^n}(U^n, V^n) = E W_C(U, V) \cdot E L.   (3.22)

Now the information theoretical meaning of (3.22) is quite clear. One encodes a source (U^n, V^n)_{n=1}^\infty with alphabet U component by component by a variable length code C. The first term on the right hand side of (3.22) is the expected waiting time within a block, and the second term is the expected waiting time until U_t and V_t differ.
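Lemma 3 can be checked exactly on a small example by enumerating all pairs (u^n, v^n). The Python sketch below is our own illustration (the prefix code and distribution are hypothetical toy choices); it compares the direct computation of E W_{C^n}(U^n, V^n) with the right hand side of (3.2) for i.i.d. U, V:

```python
from itertools import product

def cp(w, x):
    # number of proper common prefixes, counting the empty word
    k = 0
    while k < min(len(w), len(x)) and w[k] == x[k]:
        k += 1
    return min(k, len(w) - 1, len(x) - 1) + 1

code = {"a": "0", "b": "10", "c": "110", "d": "111"}   # hypothetical toy code
P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

EW1 = sum(P[u] * P[v] * cp(code[u], code[v]) for u in code for v in code)
r = sum(p * p for p in P.values())        # Pr(U = V) for i.i.d. U, V

def prob(seq):
    out = 1.0
    for s in seq:
        out *= P[s]
    return out

def enc(seq):
    return "".join(code[s] for s in seq)  # the block code C^n of (3.1)

results = []
for n in (1, 2, 3):
    direct = sum(prob(un) * prob(vn) * cp(enc(un), enc(vn))
                 for un in product(code, repeat=n)
                 for vn in product(code, repeat=n))
    predicted = EW1 * sum(r ** t for t in range(n))   # right hand side of (3.2)
    results.append((direct, predicted))
    print(n, direct, predicted)

print(EW1 / (1 - r))   # the limit (3.3)
```

For this particular dyadic P the limit (3.3) equals 2 = q/(q-1), the universal bound of Corollary 1, since the code attains the ID-entropy bound.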
IV. SUFFICIENT AND NECESSARY CONDITIONS FOR A PREFIX CODE C TO ACHIEVE THE ID-ENTROPY LOWER BOUND OF L_C(P, P)

Quite surprisingly, the ID-entropy bound on the ID-waiting time is achieved by a variable length code iff the Shannon entropy bound on the average length of the codewords is achieved by the same code.

For the proof we use a simple consequence of the Cauchy-Schwarz inequality, which states for two sequences of real numbers (a_1, a_2, ..., a_k) and (b_1, b_2, ..., b_k) that

  \Bigl( \sum_{i=1}^k a_i b_i \Bigr)^2 \le \Bigl( \sum_{i=1}^k a_i^2 \Bigr) \Bigl( \sum_{i=1}^k b_i^2 \Bigr),   (4.1)

with equality iff for some constant, say \gamma, a_i = \gamma b_i for all i or b_i = \gamma a_i for all i. Choosing b_i = 1 for all i one has

  \Bigl( \sum_{i=1}^k a_i \Bigr)^2 \le k \sum_{i=1}^k a_i^2,   (4.2)

with equality iff a_1 = a_2 = ... = a_k.

Theorem 2: Let C be a prefix code. Then the following statements are equivalent.

(i) \sum_{u \in U} P_u \|c_u\| = H(P).

(ii) For all \omega \in X^* with U(C, \omega) \ne \emptyset

  P(U(C, \omega)) = q^{-\|\omega\|}   (4.3)

and for all u, u' \in U such that \|c_u\| = \|c_{u'}\| and such that c_u and c_{u'} share the same prefix of length \|c_u\| - 1,

  P_u = P_{u'}.   (4.4)

(iii)

  L_C(P, P) = \frac{q}{q-1} \Bigl( 1 - \sum_{u \in U} P_u^2 \Bigr).   (4.5)

Proof: It is well known that (i) is equivalent to

(i') For all u \in U

  \|c_u\| = -(\log q)^{-1} \log P_u, or equivalently P_u = q^{-\|c_u\|}.   (4.6)

Notice that for (i) the code C is necessarily complete. We shall show that

  (i') \Rightarrow (ii) \Rightarrow (iii) \Rightarrow (i').

Ad (i') \Rightarrow (ii): For all \omega with U(C, \omega) \ne \emptyset the code C_\omega obtained by deleting the common prefix \omega from all the codewords c_u, u \in U(C, \omega), is a complete code on U(C, \omega), because C is a complete code. That is,

  \sum_{u \in U(C, \omega)} q^{-(\|c_u\| - \|\omega\|)} = 1,

and consequently by (4.6)

  P(U(C, \omega)) = \sum_{u \in U(C, \omega)} P_u = \sum_{u \in U(C, \omega)} q^{-\|c_u\|} = q^{-\|\omega\|} \sum_{u \in U(C, \omega)} q^{-(\|c_u\| - \|\omega\|)} = q^{-\|\omega\|}.

Ad (ii) \Rightarrow (iii): Suppose (4.3) holds for all \omega; we prove (iii) by induction on \ell_{max}(C) = \max_{u \in U} \|c_u\|.

In the case \ell_{max}(C) = 1 both sides of (4.5) are one. Assume that (iii) holds for all codes C' with \ell_{max}(C') \le L - 1, and let \ell_{max}(C) = L. Let U_1(C) and the U_{(\alpha)}(C) be as in the proof of (1.11), let X' = {c_u : u \in U_1(C)}, and let C_{(\alpha)} be the prefix code for the source with alphabet U_{(\alpha)}(C) and distribution P_{(\alpha)} such that for all u \in U_{(\alpha)}(C)

  P_{(\alpha)}(u) = P(U_{(\alpha)}(C))^{-1} P_u.

Then (4.3) and (4.4) imply that (ii) holds for all C_{(\alpha)}, \alpha \notin X', and that for all \beta \in U_1(C)

  P_\beta = |U_1(C)|^{-1} P(U_1(C)).   (4.7)

Next we apply (4.3) to all \omega with U(C, \omega) \ne \emptyset and \|\omega\| = 1 and obtain

  Pr(U \notin U_1(C)) = (q - |U_1(C)|) q^{-1},   (4.8)

which together with (4.7) yields for all \beta \in U_1(C)

  P_\beta = q^{-1}.   (4.9)

Moreover, by the induction hypothesis, for all C_{(\alpha)} and P_{(\alpha)}

  L_{C_{(\alpha)}}(P_{(\alpha)}, P_{(\alpha)}) = \frac{q}{q-1} \Bigl( 1 - q^2 \sum_{u \in U_{(\alpha)}(C)} P_u^2 \Bigr),   (4.10)

since by (4.3)

  P(U_{(\alpha)}(C)) = q^{-1}   (4.11)

for all \alpha \in X^\Delta = X \setminus {c_u : u \in U_1(C)} (say).

Finally, as in the proof of (1.11), we have

  L_C(P, P) = 1 + \sum_{\alpha \in X^\Delta} P^2(U_{(\alpha)}(C)) L_{C_{(\alpha)}}(P_{(\alpha)}, P_{(\alpha)})
            = 1 + \frac{1}{q(q-1)} \sum_{\alpha \in X^\Delta} \Bigl( 1 - q^2 \sum_{u \in U_{(\alpha)}(C)} P_u^2 \Bigr)
            = 1 + \frac{|X^\Delta|}{q(q-1)} - \frac{q}{q-1} \sum_{u \notin U_1(C)} P_u^2
            = 1 + \frac{q - |U_1(C)|}{q(q-1)} - \frac{q}{q-1} \sum_{u \in U} P_u^2 + \frac{q}{q-1} |U_1(C)| q^{-2}
            = \frac{q}{q-1} \Bigl( 1 - \sum_{u \in U} P_u^2 \Bigr),

that is (4.5), where the second equality holds by (4.10) and (4.11), the third equality holds because {U_1(C), U_{(\alpha)}(C), \alpha \in X^\Delta} is a partition of U, and the fourth equality follows from (4.9) and the definition of X^\Delta.

Ad (iii) \Rightarrow (i'): Again we proceed by induction on the maximum length of the codewords.

Suppose first that for a code C, \ell_{max}(C) = 1. Then L_C(P, P) = 1 and |U| \le q. Applying (4.2) to the ID-entropy we get

  \frac{q}{q-1} \Bigl( 1 - \sum_{u \in U} P_u^2 \Bigr) \le \frac{q}{q-1} (1 - |U|^{-1}),

with equality iff P is the uniform distribution. On the other hand, since |U| \le q,

  \frac{q}{q-1} (1 - |U|^{-1}) \le \frac{q}{q-1} \Bigl( 1 - \frac{1}{q} \Bigr) = 1,

and equality holds iff |U| = q. Then (4.5) holds iff P is uniform and |U| = q, i.e. (4.6).

Assume now that the implication (iii) \Rightarrow (i') holds for all codes with maximum length \le L - 1 and that C is a prefix code of maximum length \ell_{max}(C) = L. Without loss of generality we can assume that C is complete, because otherwise we can add "dummy" symbols with probability 0 to U and assign to them suitable codewords so that the Kraft sum equals 1; this does not change equality in (4.5).

Having completeness, we can assume that for a k \le q^{L-1} there are kq symbols u(i, j) (1 \le i \le k, 0 \le j \le q-1) in U with \|c_{u(i,j)}\| = L and such that c_{u(i,0)}, c_{u(i,1)}, ..., c_{u(i,q-1)} share a prefix \omega_i of length L - 1, for i = 1, 2, ..., k.

Let u(1), ..., u(k) be k "new symbols" not in the original U and consider

  U' = (U \setminus {u(i, j) : 1 \le i \le k, 0 \le j \le q-1}) \cup {u(i) : 1 \le i \le k}

and the probability distribution P' defined by

  P'_{u'} = P_{u'} if u' \in U \cap U', and P'_{u'} = \sum_{j=0}^{q-1} P_{u(i,j)} if u' = u(i) for some i.   (4.12)

Next we define a prefix code C' for the source (U', P') by using C as follows:

  c'_{u'} = c_{u'} if u' \in U \cap U', and c'_{u'} = \omega_i if u' = u(i) for some i.   (4.13)

Then for u' \in U \cap U' we have \|c'_{u'}\| = \|c_{u'}\|, and \|c'_{u(1)}\| = \|c'_{u(2)}\| = ... = \|c'_{u(k)}\| = L - 1. Therefore by the induction hypothesis

  L_{C'}(P', P') \ge \frac{q}{q-1} \Bigl( 1 - \sum_{u' \in U'} P'^2_{u'} \Bigr)   (4.14)

and equality holds iff P_u = q^{-\|c_u\|} for u \in U \cap U' and P'_{u(i)} = \sum_{j=0}^{q-1} P_{u(i,j)} = q^{-(L-1)} for i = 1, 2, ..., k. Furthermore, it follows from (4.2) and the definitions of L_C(P, P) and L_{C'}(P', P') that

  L_C(P, P) = L_{C'}(P', P') + \sum_{i=1}^k \Bigl( \sum_{j=0}^{q-1} P_{u(i,j)} \Bigr)^2
            = L_{C'}(P', P') + \sum_{i=1}^k P'^2_{u(i)}
            \ge \frac{q}{q-1} \Bigl( 1 - \sum_{u' \in U'} P'^2_{u'} \Bigr) + \sum_{i=1}^k P'^2_{u(i)}
            = \frac{q}{q-1} \Bigl( 1 - \sum_{u \in U \cap U'} P_u^2 - q^{-1} \sum_{i=1}^k \bigl( \sum_{j=0}^{q-1} P_{u(i,j)} \bigr)^2 \Bigr)
            \ge \frac{q}{q-1} \Bigl( 1 - \sum_{u \in U} P_u^2 \Bigr).   (4.15)

By (4.14) the first inequality holds with equality iff P_u = q^{-\|c_u\|} for u \in U \cap U' and \sum_{j=0}^{q-1} P_{u(i,j)} = q^{-(L-1)} for i = 1, 2, ..., k; it follows from (4.2) that the last inequality holds, with equality iff

  P_{u(i,0)} = P_{u(i,1)} = ... = P_{u(i,q-1)} for i = 1, 2, ..., k.

In order to have

  L_C(P, P) = \frac{q}{q-1} \Bigl( 1 - \sum_{u \in U} P_u^2 \Bigr),

the two inequalities in (4.15) must be equalities. However, this is equivalent to (4.6), i.e. (i').
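Theorem 2 can be illustrated numerically. In the Python sketch below (our own toy illustration for q = 2; the code and distributions are hypothetical) a dyadic distribution matched to a complete code attains both the Shannon bound on the expected codeword length and the ID-entropy bound (4.5), while a non-dyadic distribution on the same code attains neither:

```python
from math import log2

def cp(w, x):
    # number of proper common prefixes, counting the empty word
    k = 0
    while k < min(len(w), len(x)) and w[k] == x[k]:
        k += 1
    return min(k, len(w) - 1, len(x) - 1) + 1

code = {"a": "0", "b": "10", "c": "110", "d": "111"}   # complete binary prefix code

def avg_length(P):
    return sum(P[u] * len(code[u]) for u in P)

def shannon_entropy(P):
    return -sum(p * log2(p) for p in P.values())

def L_id(P):
    # L_C(P, P) as the double sum E cp(c_U, c_V), cf. (1.11)
    return sum(P[u] * P[v] * cp(code[u], code[v]) for u in P for v in P)

def H_id(P, q=2):
    # ID-entropy (1.1)
    return q / (q - 1) * (1 - sum(p * p for p in P.values()))

P_dyadic = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
P_other = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}

print(avg_length(P_dyadic), shannon_entropy(P_dyadic))  # 1.75 1.75: (i) holds
print(L_id(P_dyadic), H_id(P_dyadic))                   # 1.3125 1.3125: (iii) holds
print(avg_length(P_other), shannon_entropy(P_other))    # 1.9 > H(P): (i) fails
print(L_id(P_other), H_id(P_other))                     # 1.45 > 1.4: (iii) fails
```

As Theorem 2 asserts, (i) and (iii) hold or fail together on this instance.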
V. A GLOBAL BALANCE PRINCIPLE TO FIND GOOD CODES

In the case that U and V are independent and identically distributed there is no gain in using the local unbalance principle (LUP). But in this case Corollary 1 and (4.2) provide a way to find a good code. We first rewrite (1.13) as

  E W_C(U, V) = \sum_n \sum_{\omega \in X^n} Pr(U \in U(C, \omega), V \in U(C, \omega)).

By the assumptions on U and V with their distribution P,

  L_C(P, P) = \sum_n \sum_{\omega \in X^n} P^2(U(C, \omega)).   (5.1)

Notice that in case P_{n,C} = \sum_{\omega \in X^n} P(U(C, \omega)) is a constant, \sum_{\omega \in X^n} P^2(U(C, \omega)) is minimized by choosing the P(U(C, \omega)) uniformly. This gives us a global balance principle (GBP) for finding good codes.

We shall see the roles of both the LUP and the GBP in the proof of the following coding theorem for DMS's.

Theorem 3: For a DMS (U^n, V^n)_{n=1}^\infty with generic distribution P_{UV} = PQ, i.e. the generic random variables U and V are independent and P_U = P, P_V = Q,

  \lim_{n \to \infty} L(P^n, Q^n) = 1 if P \ne Q, and \lim_{n \to \infty} L(P^n, Q^n) = \frac{q}{q-1} if P = Q.   (5.2)

Proof: Trivially L_C(P, Q) \ge 1, and by Corollary 2, q/(q-1) is a lower bound to \lim_{n \to \infty} L(P^n, P^n). Hence we only have to construct codes which asymptotically achieve the bounds in (5.2).

Case P \ne Q: We choose a \delta > 0 so that for sufficiently large n

  T^n_{P,\delta} \cap T^n_{Q,\delta} = \emptyset   (5.3)

and, for a \theta > 0,

  P(T^n_{P,\delta}) > 1 - 2^{-n\theta} and Q(T^n_{Q,\delta}) > 1 - 2^{-n\theta}.   (5.4)

Partition U^n into two parts U_0 and U_1 such that U_0 \supset T^n_{P,\delta} and U_1 \supset T^n_{Q,\delta}.

To simplify matters we assume q = 2. This does not lose generality, since enlarging the alphabet cannot make things worse.

Let \ell_i = \lceil \log |U_i| \rceil and let \psi_i : U_i \to {0, 1}^{\ell_i} be injections, for i = 0, 1. Then we define a code C by c_{u^n} = (i, \psi_i(u^n)) if u^n \in U_i, and show that L_C(P^n, Q^n) is arbitrarily close to one if n is sufficiently large. Actually it immediately follows from Proposition 1 that

  L_C(P^n, Q^n) = \sum_{u^n, u'^n \in U^n} P^n(u^n) Q^n(u'^n) cp(c_{u^n}, c_{u'^n}).

Splitting the sum according to the parts containing u^n and u'^n, and noting that codewords from different parts differ already in the first bit (so that cp = 1 for the cross terms), while within part U_i we have cp \le 1 + \ell_i, we obtain

  L_C(P^n, Q^n) \le P^n(U_0) Q^n(U_1) + P^n(U_1) Q^n(U_0) + (1 + \ell_0) P^n(U_0) Q^n(U_0) + (1 + \ell_1) P^n(U_1) Q^n(U_1)
               \le 1 + \lceil n \log |U| \rceil \bigl( Q^n(U_0) + P^n(U_1) \bigr),

and therefore

  L_C(P^n, Q^n) < 1 + \lceil n \log |U| \rceil 2^{-n\theta + 1} \to 1 as n \to \infty,   (5.5)

where the second inequality holds because \ell_i = \lceil \log |U_i| \rceil \le \lceil \log |U^n| \rceil = \lceil n \log |U| \rceil for i = 0, 1, and the last inequality follows from (5.4), since Q^n(U_0) < 2^{-n\theta} and P^n(U_1) < 2^{-n\theta}.

Case P = Q: Now we let P = Q. For 0 < \alpha < H(P) let P_n(> \alpha) be the set of n-types (n-empirical distributions) \tilde{P} on U with |T^n_{\tilde{P}}| > 2^{n\alpha}. Then there is a positive \theta such that the empirical distribution of the output of U^n (resp. V^n) is in P_n(> \alpha) with probability larger than 1 - 2^{-n\theta}.

Next we choose \beta \le \frac{1}{4} \min(\theta, \alpha) and an integer \ell_n such that

  2^{n\beta/2} < q^{\ell_n} \le 2^{n\beta}.   (5.6)

Label the sequences in T^n_{\tilde{P}} for \tilde{P} \in P_n(> \alpha) by 0, 1, ..., |T^n_{\tilde{P}}| - 1 and let \Psi_1 be a mapping from U^n to X^{\ell_n}, where X = {0, 1, ..., q-1}, defined as follows. If u^n has type \tilde{P} in P_n(> \alpha) and got an index ind(u^n) with q-ary representation (x_k, x_{k-1}, ..., x_2, x_1), i.e. ind(u^n) = \sum_{i=1}^k x_i q^{i-1} for 0 \le x_i \le q-1 and k = \lceil \log |T^n_{\tilde{P}}| \rceil, then let

  \Psi_1(u^n) = (x_1, x_2, ..., x_{\ell_n}).   (5.7)

If the type of u^n is not in P_n(> \alpha), we arbitrarily choose a sequence in X^{\ell_n} as \Psi_1(u^n).

For any fixed t \le \ell_n, \tilde{P} \in P_n(> \alpha), and x^t \in X^t let U(\tilde{P}, x^t) be the set of sequences u^n \in T^n_{\tilde{P}} such that x^t is a prefix of \Psi_1(u^n). Then it is not hard to see that for all x^t, x'^t with t \le \ell_n

  \bigl| |U(\tilde{P}, x^t)| - |U(\tilde{P}, x'^t)| \bigr| \le 1.

More specifically, for all t \le \ell_n and x^t \in X^t

  |U(\tilde{P}, x^t)| = \sum_{j=t+1}^k a_j q^{j-1-t} or \sum_{j=t+1}^k a_j q^{j-1-t} + 1,

if |T^n_{\tilde{P}}| = \sum_{j=1}^k a_j q^{j-1} with a_k \ne 0 and 0 \le a_j \le q-1 for j = 1, 2, ..., k-1.

Let U(x^t) = \bigcup_{\tilde{P}} U(\tilde{P}, x^t) (here it does not matter whether \tilde{P} \in P_n(> \alpha) or not). Thus we partition U^n into q^t parts {U(x^t) : x^t \in X^t}, for each t \le \ell_n.

By the AEP (the asymptotic equipartition property) the difference between q^{-t} and the conditional probability of the event that the output of U^n is in U(x^t), given that the type of U^n is in P_n(> \alpha), is not larger than

  \min_{\tilde{P} \in P_n(> \alpha)} |T^n_{\tilde{P}}|^{-1} < 2^{-n\alpha}.

Recalling that U^n has type in P_n(> \alpha) with probability larger than 1 - 2^{-n\theta}, and the assumption that V^n has the same distribution as U^n, we obtain

  Pr(U^n \in U(x^t)) = Pr(V^n \in U(x^t)) = P^n(U(x^t))

and for all x^t \in X^t

  (1 - 2^{-n\theta})(q^{-t} - 2^{-n\alpha}) \le P^n(U(x^t)) \le (1 - 2^{-n\theta})(q^{-t} + 2^{-n\alpha}) + 2^{-n\theta},

which implies that for all x^t \in X^t

  |P^n(U(x^t)) - q^{-t}| \le 2^{-n\theta} + 2^{-n\alpha} < 2^{-2n\beta},   (5.8)

since \beta \le \frac{1}{4} \min(\theta, \alpha).

Recall that \Psi_1 is a function from U^n to X^{\ell_n} and that, by the definition of U(x^t), U(x^{\ell_n}) is the inverse image of x^{\ell_n} under \Psi_1, i.e. U(x^{\ell_n}) = \Psi_1^{-1}(x^{\ell_n}).

Let furthermore \ell^*(x^{\ell_n}) = \lceil \log |U(x^{\ell_n})| / \log q \rceil and let \Psi_2 be a function on U^n such that its restriction to U(x^{\ell_n}) is an injection into X^{\ell^*(x^{\ell_n})}, for all x^{\ell_n}. Then our encoding function is defined as

  \tilde{c}^n = (\Psi_1, \Psi_2).   (5.9)

To estimate L_{\tilde{C}}(P^n, P^n) we introduce an auxiliary source with alphabet X^{\ell_n} and probability distribution P^* such that for all x^{\ell_n} \in X^{\ell_n}

  P^*(x^{\ell_n}) = P^n(U(x^{\ell_n})).

We divide the waiting time for identification with the code \tilde{C} into two parts according to the two components \Psi_1 and \Psi_2 in (5.9), and we let W_1 and W_2 be the random waiting times of the two parts, respectively. Now let Z be a binary random variable such that Z = 0 if \Psi_1(U^n) \ne \Psi_1(V^n) and Z = 1 otherwise. Then

  L_{\tilde{C}}(P^n, P^n) = E(W_1 + W_2) = E W_1 + E E(W_2 | Z)
                        = E W_1 + Pr(Z = 1) E(W_2 | Z = 1)
                        = E W_1 + \sum_{x^{\ell_n}} P^n(\Psi_1(U^n) = x^{\ell_n}) P^n(\Psi_1(V^n) = x^{\ell_n}) E(W_2 | Z = 1)
                        = E W_1 + \sum_{x^{\ell_n}} P^n(U(x^{\ell_n}))^2 E(W_2 | Z = 1).   (5.10)

Let C^* be the code for the auxiliary source with encoding function c^* = \Psi_1. Then we have

  E W_1 = L_{C^*}(P^*, P^*)   (5.11)

and, with the notation of (1.12), U(C^*, x^t) = U(x^t) and P^*(U(C^*, x^t)) = P^n(U(x^t)) for x^t \in X^t with t \le \ell_n. For all x^t \in X^t, t \le \ell_n, we denote \delta(x^t) = q^{-t} - P^n(U(x^t)). Then we have \sum_{x^t \in X^t} \delta(x^t) = 0 for all t \le \ell_n and, by (5.8), |\delta(x^t)| < 2^{-2n\beta}. Now we apply (5.1) to estimate

  L_{C^*}(P^*, P^*) = \sum_{t=0}^{\ell_n} \sum_{x^t \in X^t} P^*(U(C^*, x^t))^2
                    = \sum_{t=0}^{\ell_n} \sum_{x^t \in X^t} P^n(U(x^t))^2 = \sum_{t=0}^{\ell_n} \sum_{x^t \in X^t} (q^{-t} - \delta(x^t))^2
                    = \sum_{t=0}^{\ell_n} \Bigl( q^t \cdot q^{-2t} - 2 q^{-t} \sum_{x^t \in X^t} \delta(x^t) + \sum_{x^t \in X^t} \delta(x^t)^2 \Bigr)
                    \le \sum_{t=0}^{\ell_n} q^{-t} + \sum_{t=0}^{\ell_n} q^t \cdot 2^{-4n\beta} < \sum_{t=0}^{\ell_n} q^{-t} + \frac{q^{\ell_n + 1} - 1}{q - 1} 2^{-4n\beta}
                    < \frac{q}{q-1} + \frac{1}{q-1} q^{\ell_n + 1} 2^{-4n\beta}.   (5.12)

Moreover, by the definition of \Psi_2 and W_2,

  E(W_2 | Z = 1) \le \Bigl\lceil \frac{n \log |U|}{\log q} \Bigr\rceil,

and in (5.12) we have shown that

  \sum_{x^{\ell_n}} P^n(U(x^{\ell_n}))^2 \le q^{-\ell_n} + q^{\ell_n} \cdot 2^{-4n\beta}.

Consequently

  \sum_{x^{\ell_n}} P^n(U(x^{\ell_n}))^2 E(W_2 | Z = 1) \le [q^{-\ell_n} + q^{\ell_n} 2^{-4n\beta}] \Bigl\lceil \frac{n \log |U|}{\log q} \Bigr\rceil.   (5.13)

Finally, by combining (5.10), (5.11), (5.12), and (5.13) with the choice of \beta and \ell_n in (5.6), we have

  \lim_{n \to \infty} L_C(P^n, P^n) \le \frac{q}{q-1},

the desired inequality.

It is interesting that the limits of the waiting time of ID-codes on the left hand side of (5.2) are independent of the generic distributions P and Q and only depend on whether they are equal. In the case that they are not equal the limit is even independent of the alphabet size. In particular, in the case P \ne Q we have seen in the proof that the key step is how to distribute the first symbol, and the local unbalance principle (LUP) is applied in this step. Moreover, for a good code the random user needs to wait for the second symbol only with exponentially vanishing probability, so the remaining parts of the codewords are not so important.

Similarly, in the case P = Q, where we use the GBP instead of the LUP, the key part of the codewords is a relatively short prefix (in the proof it is the \ell_n-th prefix), and after that the user has to wait only with exponentially small probability. Thus again the remaining part of the codewords is less important.
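The balanced case of Theorem 3 is easy to see numerically: for the uniform binary source encoded by the identity block code, every P(U(C, \omega)) with \|\omega\| = t equals 2^{-t}, so the global balance is exact and (5.1) gives L_C(P^n, P^n) = \sum_{t=0}^{n-1} 2^{-t} \to 2 = q/(q-1). The short Python sketch below (our own illustration) verifies this by direct enumeration:

```python
from itertools import product

def cp(w, x):
    # number of proper common prefixes, counting the empty word
    k = 0
    while k < min(len(w), len(x)) and w[k] == x[k]:
        k += 1
    return min(k, len(w) - 1, len(x) - 1) + 1

def L_block(n):
    """L_C(P^n, P^n) for the uniform binary source with the identity block code."""
    words = ["".join(w) for w in product("01", repeat=n)]
    p = 2.0 ** (-n)
    return sum(p * p * cp(u, v) for u in words for v in words)

for n in (1, 2, 4, 6, 8):
    closed_form = 2 - 2.0 ** (1 - n)    # sum_{t=0}^{n-1} 2^{-t}
    print(n, L_block(n), closed_form)
```

The values increase monotonically toward q/(q-1) = 2, matching the case P = Q of (5.2).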
all xt ∈ X t , t ≤ ℓn , we denote δ(xt ) = q −t − pn U(xt ) . Then we have for all t ≤ ℓn δ(xt ) = 0 and by (5.8) A PPENDIX I xt ∈X t C OMMENTS ON GENERALIZED ENTROPIES δ(xt ) < 2−2nβ . Now we apply Corollary 1 to estimate After the discovery of ID-entropies in [4] work of Tsallis ℓn [13] and also [14] was brought to our attention. The equalities LC ∗ (P ∗ , P ∗ ) = P ∗ U(C ∗ , xt ) 2 (1) and (2) in [14] are here (A.1) and (A.2). The letter q used t=0 xt ∈X t there corresponds to our letter α, because for us q gives the ℓn ℓn alphabet size. The generalization of Boltzmann’s entropy 2 2 = P n U(xt ) = q −t − δ(xt ) H(P ) = −k Pu lnPu t=0 xt ∈X t t=0 xt ∈X t ℓn is t −2t = q ·q − 2q −t δ(xt ) + δ(xt )2 1 N α t=0 xt ∈X t xt ∈X t Sα (P ) = k 1− Pu (A.1) ℓn ℓn ∞ α−1 u=1 ℓn +1 q − 1 −4nβ ≤ q −t + q t · 2−4nβ < q −t + 2 for any real α = 1. Notice that lim Sα (P ) = H(P ), which t=0 t=0 t=0 q−1 α→1 q 1 ℓn +1 −4nβ can be named S1 (P ). < + q 2 . (5.12) One readily veriﬁes that for product-distributions P × Q for q−1 q−1 independent random variables Moreover by deﬁnition of Ψ2 and W2 n log |U| (α − 1) Sα (P ×Q) = Sα (P )+Sα (Q)− Sα (P )Sα (Q) (A.2) E W2 | Z = 1) ≤ k log q and in (5.12) we have shown that Since in all cases Sα ≥ 0, α < 1, α = 1 and α > 1 respectively correspond to superadditivity, additivity and 2 P n U(xℓn ) ≤ q −ℓn + q ℓn · 2−4nβ . subadditivity (also called for the purposes in statistical physics x ℓn superextensitivity, extensitivity, and subextensitivity). Consequently We recall the grouping identity of [4]. 2 For a partition (U1 , U2 ) of U = {1, 2, . . . , N }, Qi = P n U(xt ) E(W2 | Z = 1) (i) Pu u∈Ui Pu and Pu = Qi for u ∈ Ui (i = 1, 2) xℓn ∈X ℓn n log |U| P (i) ≤ [q −ℓn + q ℓn 2−4nβ ] . (5.13) HI,q (P ) = HI,q (Q) + Q2 HI,q ( i ) (A.3) log q Qi i 10 where Q = (Q1 , Q2 ). This implies [(P1 , P2 , . . . , PN ) ∈ P([N ]), N = 2, 3, . . . ]. 
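As a quick numerical sanity check (our own illustration, not part of [4]), the $q$-ary ID-entropy and the grouping identity (A.3) can be verified with a short script; the function and variable names here are ours.

```python
def id_entropy(p, q=2):
    """q-ary identification entropy: H_{I,q}(P) = q/(q-1) * (1 - sum_u P_u^2)."""
    return q / (q - 1) * (1 - sum(x * x for x in p))

# A distribution P on U = {1, ..., 5} and a partition (U1, U2) of U.
P = [0.1, 0.2, 0.3, 0.15, 0.25]
U1, U2 = [0, 1], [2, 3, 4]

Q1 = sum(P[u] for u in U1)
Q2 = sum(P[u] for u in U2)

# Conditional distributions P^(1), P^(2) on the two parts.
P1 = [P[u] / Q1 for u in U1]
P2 = [P[u] / Q2 for u in U2]

q = 3  # alphabet size
lhs = id_entropy(P, q)
rhs = id_entropy([Q1, Q2], q) + Q1**2 * id_entropy(P1, q) + Q2**2 * id_entropy(P2, q)
print(abs(lhs - rhs) < 1e-12)  # grouping identity (A.3)
```

Expanding the right-hand side shows why this holds exactly: the cross terms $Q_i^2 \sum_{u \in \mathcal{U}_i} (P_u/Q_i)^2$ collapse back to $\sum_u P_u^2$.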
This implies
$$H_{I,q}(P \times Q) = H_{I,q}(Q) + \sum_j Q_j^2\, H_{I,q}(P)$$
and since
$$\frac{q}{q-1} \Big( 1 - \sum_j Q_j^2 \Big) = H_{I,q}(Q) \quad \text{or} \quad \sum_j Q_j^2 = 1 - \frac{q-1}{q}\, H_{I,q}(Q)$$
we get
$$H_{I,q}(P \times Q) = H_{I,q}(Q) + H_{I,q}(P) - \frac{q-1}{q}\, H_{I,q}(Q)\, H_{I,q}(P), \qquad (A.4)$$
which is (A.2) for $\alpha = 2$ and $k = \frac{q}{q-1}$.

We have been told by several experts in physics that the operational significance of the quantities $S_\alpha$ (for $\alpha \ne 1$) in statistical physics seems not to be undisputed. In contrast, the significance of identification entropy was demonstrated in [4] (see Section 2); it is formally close to, but essentially different from, $S_\alpha$ for two reasons: always $\alpha = 2$, and $k = \frac{q}{q-1}$ is uniquely determined and depends on the alphabet size $q$! We also have discussed the coding-theoretical meanings of the factors $\frac{q}{q-1}$ and $1 - \sum_{u=1}^N P_u^2$.

More recently we learned from referees that already in 1967 Havrda and Charvát [7] introduced the entropies $\{H_N^\alpha\}$ of type $\alpha$:
$$H_N^\alpha(P_1, P_2, \dots, P_N) = (2^{1-\alpha} - 1)^{-1} \Big( \sum_{i=1}^N P_i^\alpha - 1 \Big) \qquad (A.5)$$
$[(P_1, P_2, \dots, P_N) \in \mathcal{P}([N]),\ N = 2, 3, \dots,\ 0^\alpha = 0]$ with
$$\lim_{\alpha \to 1} H_N^\alpha(P_1, P_2, \dots, P_N) = H_N(P_1, P_2, \dots, P_N),$$
the Boltzmann/Gibbs/Shannon entropy. So it is reasonable to define
$$H_N^1(P_1, P_2, \dots, P_N) = H_N(P_1, P_2, \dots, P_N).$$

This is a generalization of the BGS-entropy different from the Rényi entropies of order $\alpha \ne 1$ (which according to [2] were introduced by Schützenberger [9]), given by
$${}^\alpha H_N(P_1, P_2, \dots, P_N) = \frac{1}{1-\alpha} \log_2 \sum_{i=1}^N P_i^\alpha,$$
$[(P_1, P_2, \dots, P_N) \in \mathcal{P}([N]),\ N = 2, 3, \dots]$. Comparison shows that
$${}^\alpha H_N(P_1, P_2, \dots, P_N) = \frac{1}{1-\alpha} \log_2 \big[(2^{1-\alpha} - 1)\, H_N^\alpha(P_1, P_2, \dots, P_N) + 1\big]$$
and
$$H_N^\alpha(P_1, P_2, \dots, P_N) = (2^{1-\alpha} - 1)^{-1} \big[ 2^{(1-\alpha)\, {}^\alpha H_N(P_1, P_2, \dots, P_N)} - 1 \big].$$

So, while the entropies of order $\alpha$ and the entropies of type $\alpha$ are different for $\alpha \ne 1$, we see that the bijection
$$t \mapsto \frac{1}{1-\alpha} \log_2 \big[ (2^{1-\alpha} - 1)\, t + 1 \big]$$
connects them. Therefore we may ask what the advantage is in dealing with entropies of type $\alpha$. We meanwhile also learned that the book [2] gives a comprehensive discussion. Also Daróczy's contribution [6], where "type $\alpha$" is named "degree $\alpha$", gives an enlightening analysis.

Note that Rényi entropies ($\alpha \ne 1$) are additive, but not subadditive (except for $\alpha = 0$) and not recursive, and they have neither the branching property nor the sum property, that is, the existence of a measurable function $g$ on $(0,1)$ such that
$$H_N(P_1, P_2, \dots, P_N) = \sum_{i=1}^N g(P_i)$$
$[(P_1, P_2, \dots, P_N) \in \mathcal{P}([N]),\ N = 2, 3, \dots]$.

Entropies of type $\alpha$, on the other hand, are not additive but do have the subadditivity property and the sum property, and furthermore are

additive of degree $\alpha$:
$$H_{MN}^\alpha(P_1 Q_1, P_1 Q_2, \dots, P_1 Q_N, P_2 Q_1, \dots, P_M Q_N) = H_M^\alpha(P_1, \dots, P_M) + H_N^\alpha(Q_1, \dots, Q_N) + (2^{1-\alpha} - 1)\, H_M^\alpha(P_1, \dots, P_M)\, H_N^\alpha(Q_1, \dots, Q_N)$$
$[(P_1, \dots, P_M) \in \mathcal{P}([M]),\ (Q_1, \dots, Q_N) \in \mathcal{P}([N]);\ M = 2, 3, \dots;\ N = 2, 3, \dots]$;

strongly additive of degree $\alpha$:
$$H_{MN}^\alpha(P_1 Q_{11}, \dots, P_1 Q_{1N}, P_2 Q_{21}, \dots, P_2 Q_{2N}, \dots, P_M Q_{M1}, \dots, P_M Q_{MN}) = H_M^\alpha(P_1, \dots, P_M) + \sum_{j=1}^M P_j^\alpha\, H_N^\alpha(Q_{j1}, Q_{j2}, \dots, Q_{jN})$$
$[(P_1, \dots, P_M) \in \mathcal{P}([M]),\ (Q_{j1}, \dots, Q_{jN}) \in \mathcal{P}([N]);\ j = 1, 2, \dots, M;\ M = 2, 3, \dots;\ N = 2, 3, \dots]$;

recursive of degree $\alpha$:
$$H_N^\alpha(P_1, P_2, \dots, P_N) = H_{N-1}^\alpha(P_1 + P_2, P_3, \dots, P_N) + (P_1 + P_2)^\alpha\, H_2^\alpha\Big( \frac{P_1}{P_1 + P_2}, \frac{P_2}{P_1 + P_2} \Big)$$
$[(P_1, \dots, P_N) \in \mathcal{P}([N]),\ N = 3, 4, \dots$ with $P_1 + P_2 > 0]$. (In consequence, entropies of type $\alpha$ also have the branching property.)

It is clear now that for a binary alphabet the ID-entropy is exactly the entropy of type $\alpha = 2$. However, prior to [13] there were hardly any applications or operational justifications of the entropy of type $\alpha$. Moreover, the $q$-ary case did not exist at all, and therefore the name ID-entropy is well justified.
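These relations between entropies of order $\alpha$ and of type $\alpha$ are easy to check numerically. The following sketch (our own illustration; the function names `renyi` and `type_alpha` are ours) verifies the connecting bijection and the coincidence of the binary ID-entropy with the entropy of type 2.

```python
import math

def renyi(p, alpha):
    """Rényi entropy of order alpha (base 2), alpha != 1."""
    return math.log2(sum(x ** alpha for x in p)) / (1 - alpha)

def type_alpha(p, alpha):
    """Havrda-Charvát entropy of type alpha, as in (A.5)."""
    return (sum(x ** alpha for x in p) - 1) / (2 ** (1 - alpha) - 1)

P = [0.5, 0.25, 0.125, 0.125]
for alpha in (0.5, 2.0, 3.0):
    h_type = type_alpha(P, alpha)
    # The bijection t -> log2((2^(1-alpha) - 1) t + 1) / (1 - alpha)
    # maps the type-alpha entropy to the order-alpha (Rényi) entropy.
    recovered = math.log2((2 ** (1 - alpha) - 1) * h_type + 1) / (1 - alpha)
    print(abs(recovered - renyi(P, alpha)) < 1e-12)

# For q = 2 the ID-entropy H_{I,2}(P) = 2 (1 - sum P_u^2)
# coincides with the entropy of type alpha = 2.
id2 = 2 * (1 - sum(x * x for x in P))
print(abs(id2 - type_alpha(P, 2.0)) < 1e-12)
```

Both checks hold exactly up to floating-point rounding, since the bijection merely inverts the monotone map between $\sum_i P_i^\alpha$ and each entropy.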
We feel that it must be said that in many papers (with several coauthors) Tsallis at least developed ideas to promote a non-standard equilibrium theory in statistical physics using the generalized entropies $S_\alpha$ and generalized concepts of inner energy.

Our attention has also been drawn to the papers [5], [11], [12], with possibilities of connections to our work. Recently clear-cut progress was made by C. Heup in his forthcoming thesis with a generalization of ID-entropy motivated by $L$-identification.

REFERENCES
[1] S. Abe, Axioms and uniqueness theorem for Tsallis entropy, Phys. Lett. A 271, No. 1-2, 74–79, 2000.
[2] J. Aczél and Z. Daróczy, On Measures of Information and their Characterizations, Mathematics in Science and Engineering, Vol. 115, Academic Press, New York-London, 1975.
[3] R. Ahlswede, General theory of information transfer, in a special issue "General Theory of Information Transfer and Combinatorics" of Discrete Applied Mathematics, to appear.
[4] R. Ahlswede, Identification entropy, General Theory of Information Transfer and Combinatorics, Report on a Research Project at the ZIF (Center of Interdisciplinary Studies) in Bielefeld, Oct. 1, 2002 – August 31, 2004, edited by R. Ahlswede with the assistance of L. Bäumer and N. Cai, to appear.
[5] L.L. Campbell, A coding theorem and Rényi's entropy, Information and Control 8, 423–429, 1965.
[6] Z. Daróczy, Generalized information functions, Information and Control 16, 36–51, 1970.
[7] J. Havrda and F. Charvát, Quantification method of classification processes, concept of structural a-entropy, Kybernetika (Prague) 3, 30–35, 1967.
[8] A. Rényi, On measures of entropy and information, Proc. 4th Berkeley Sympos. Math. Statist. and Prob., Vol. I, pp. 547–561, Univ. California Press, Berkeley, 1961.
[9] M.P. Schützenberger, Contribution aux applications statistiques de la théorie de l'information, Publ. Inst. Statist. Univ. Paris 3, No. 1-2, 3–117, 1954.
[10] C.E. Shannon, A mathematical theory of communication, Bell Syst. Techn. J. 27, 379–423, 623–656, 1948.
[11] B.D. Sharma and H.C. Gupta, Entropy as an optimal measure, Information Theory (Proc. Internat. CNRS Colloq., Cachan, 1977) (French), 151–159, Colloq. Internat. CNRS, 276, CNRS, Paris, 1978.
[12] F. Topsøe, Game-theoretical equilibrium, maximum entropy and minimum information discrimination, Maximum Entropy and Bayesian Methods (Paris, 1992), 15–23, Fund. Theories Phys., 53, Kluwer Acad. Publ., 1993.
[13] C. Tsallis, Possible generalization of Boltzmann-Gibbs statistics, J. Statist. Phys. 52, No. 1-2, 479–487, 1988.
[14] C. Tsallis, R.S. Mendes, and A.R. Plastino, The role of constraints within generalized nonextensive statistics, Physica A 261, 534–554, 1998.