
IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 47, NO. 1, JANUARY 2001

Some Basic Properties of Fix-Free Codes


Chunxuan Ye and Raymond W. Yeung, Senior Member, IEEE
Abstract: A variable-length code is a fix-free code if no codeword is a prefix or a suffix of any other codeword. This class of codes can be applied to speed up the decoding process, since the decoder can decode from both ends of the compressed file simultaneously. In this paper, we study some basic properties of fix-free codes. We prove a sufficient condition and a necessary condition for the existence of fix-free codes, and we obtain some new upper bounds on the redundancy of optimal fix-free codes.

Index Terms: Fix-free code, prefix code, redundancy.

I. INTRODUCTION

Fix-free codes were first introduced by Schützenberger [22] and Gilbert and Moore [12] (where they were called never-self-synchronizing codes). Like alphabetic codes ([15], [20], [26]) and 1-ended codes ([2], [7]), fix-free codes are a special class of prefix codes. At the same time, fix-free codes also belong to suffix codes. In a fix-free code, no codeword is a prefix or a suffix of any other codeword. This kind of code has several applications ([10], [13], [21], [25]). For example, a file compressed by a fix-free code can be decoded in the forward direction and the reverse direction simultaneously, thus reducing the decoding time by half compared with decoding in one direction only. As another example, suppose we want to search for the occurrence of a string in a given file compressed by a fix-free code. Again, the string can be searched for in the compressed file in both directions simultaneously, thus reducing the search time by half.

A fix-free code with codeword lengths l_1, l_2, ..., l_n is called complete if

2^{-l_1} + 2^{-l_2} + ... + 2^{-l_n} = 1.

Previous work on fix-free codes primarily focused on complete fix-free codes. Certain properties of such codes have been studied in [1], [9], [12], [23], and [24]. Furthermore, necessary conditions for the existence of complete fix-free codes were given by Fraenkel and Klein [10]. Schützenberger [23] and Césari [8] studied algorithms for constructing finite complete fix-free codes, whereas Fraenkel and Klein [10] provided an algorithm to construct such codes with a given set of codeword lengths. A systematic summary on complete fix-free codes can be found in [3, Ch. 3]. In the above references, fix-free codes were referred to as either biprefix codes or affix codes.

As mentioned in [10], an optimal fix-free code (a code with minimum average codeword length) for a certain source is not necessarily complete. A simple example illustrates this. The
Manuscript received October 27, 1999; revised April 6, 2000. The material in this paper was presented in part at the IEEE Information Theory Workshop, Metsovo, Greece, June 27-July 1, 1999. The authors are with the Department of Information Engineering, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong, China (e-mail: cxye7@ie.cuhk.edu.hk; whyeung@ie.cuhk.edu.hk). Communicated by M. Weinberger, Associate Editor for Source Coding. Publisher Item Identifier S 0018-9448(01)00464-3.

only complete fix-free code for a source with four symbols is {00, 01, 10, 11}, since (2, 2, 2, 2) is the only set of four lengths with Kraft sum 1 for which a fix-free code exists. The average codeword length of this code is 2. However, for the source distribution (1/2, 1/4, 1/8, 1/8), the incomplete fix-free code {0, 11, 101, 1001} for the same source has a shorter average codeword length equal to 1.875. In practice, optimal fix-free codes for many sources are incomplete. Moreover, a complete fix-free code may not even exist for some sources. For instance, no complete fix-free code can be constructed for a source with three symbols.

Ahlswede et al. [1] have proved some useful properties of fix-free codes. In particular, it was shown that a fix-free code with codeword lengths l_1, ..., l_n exists if

2^{-l_1} + 2^{-l_2} + ... + 2^{-l_n} <= 1/2.

However, this sufficient condition is very loose. For example, consider the fix-free code {0, 1} with l_1 = l_2 = 1 and 2^{-l_1} + 2^{-l_2} = 1. Thus, this sufficient condition cannot even be used to check some very simple cases. In this paper, we take a probabilistic approach to study the existence of fix-free codes. With this approach, we are able to improve the sufficient condition and obtain a necessary condition for the existence of fix-free codes.

Let (p_1, p_2, ..., p_n) be the probability distribution of a source, and let C be a code for the source. The redundancy of a code is defined as the difference between the average codeword length of this code and the entropy of the source. In this paper, we denote by R_H the redundancy of a Huffman code and by R_F the redundancy of an optimal fix-free code. It is well known that 0 <= R_H < 1. These bounds on R_H, which are independent of the source probability distribution, are the tightest possible. However, when partial knowledge about the source probabilities is available, such as the probability of the most likely source symbol, these bounds can be improved. The redundancy of a Huffman code has been discussed extensively in the literature (e.g., [4]-[6], [11], [16]-[19], [27]). Unlike prefix codes, which can be constructed optimally by the Huffman algorithm, to our knowledge there is no efficient algorithm for constructing optimal fix-free codes.
This makes it very difficult to obtain nontrivial bounds on the redundancy of optimal fix-free codes. Ahlswede et al. [1] have proved that R_F < 2. In some cases, Huffman codes also happen to be optimal fix-free codes. For example, for the uniform distribution on four symbols, the code {00, 01, 10, 11} is a Huffman code as well as an optimal fix-free code, and its redundancy is zero. Therefore, the lower bound R_F >= 0 is tight. For the upper bound on R_F, we will show at the end of this paper that it is not tight for sources with a fixed alphabet size. Nevertheless, since this upper bound is a constant, for an independent and identically distributed (i.i.d.) source, the coding rate (per symbol) of an optimal fix-free code for blocks of source symbols approaches the entropy rate of the source as the block length tends to infinity.
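Throughout the paper, the fix-free condition is mechanical to test. The following minimal checker is an illustrative sketch (the function name is ours, not the paper's notation):

```python
def is_fix_free(codewords):
    """Return True iff no codeword is a prefix or a suffix of any other
    codeword in the list (the fix-free condition)."""
    for i, a in enumerate(codewords):
        for j, b in enumerate(codewords):
            if i != j and (b.startswith(a) or b.endswith(a)):
                return False
    return True
```

For example, {0, 11, 101, 1001} passes the test, while {0, 01} fails because 0 is a prefix of 01.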

0018-9448/01$10.00 © 2001 IEEE


Being a special kind of prefix code, optimal fix-free codes have average codeword lengths no less than those of the corresponding Huffman codes. Therefore, lower bounds on the redundancy of Huffman codes ([6], [16], [18]) also apply to optimal fix-free codes. In this paper, we provide some new upper bounds on R_F when partial knowledge of the source distribution is available.

Recently, Girod [14] proposed a very simple scheme that allows bidirectional decoding of a variable-length coded bit stream obtained from the bit stream produced by a Huffman code. The coding rate of this scheme always approaches that of a Huffman code when the number of source symbols to be encoded becomes large. However, for an i.i.d. source, depending on the source distribution, the coding rate of such a scheme may be bounded away from the entropy rate of the source even if encoding of multiple symbols is allowed.

The rest of this paper is organized as follows. In Section II, we discuss two basic properties of fix-free codes which make it very difficult to analyze the structure of such codes. In Section III, we use a probabilistic method to obtain a sufficient condition and a necessary condition for the existence of fix-free codes. Some related results are also obtained. The merit of our probabilistic method is that it is not necessary to consider the structure of fix-free codes explicitly. To demonstrate the generality of our probabilistic method, we use it in Section IV to prove the Kraft inequality for prefix codes. In Section V, we prove some new upper bounds on R_F. Concluding remarks are in Section VI. In this paper, we confine our discussion to binary fix-free codes, and all logarithms are in base 2. Our results can readily be generalized to nonbinary fix-free codes.

II. TWO PROPERTIES OF FIX-FREE CODES

In this section, we present two basic properties of fix-free codes which are quite different from those of prefix codes.
These two properties play a crucial role in determining the methodology we use in Section III of this paper.

A set of codewords is said to satisfy the fix-free condition if no codeword in the set is a prefix or a suffix of any other codeword. Such a set of codewords hence forms a fix-free code. Consider a code with n codewords. The lengths of these codewords form a vector (l_1, l_2, ..., l_n), where l_i is the length of the i-th codeword. We assume without loss of generality that

l_1 <= l_2 <= ... <= l_n,

and we use L = (l_1, l_2, ..., l_n) to denote this ordered set of codeword lengths. For two ordered sets of codeword lengths L = (l_1, ..., l_n) and L' = (l'_1, ..., l'_n), we write L <= L' if l_i <= l'_i for 1 <= i <= n. If L <= L' and L satisfies the Kraft inequality, then

2^{-l'_1} + ... + 2^{-l'_n} <= 2^{-l_1} + ... + 2^{-l_n} <= 1,

i.e., L' also satisfies the Kraft inequality. Therefore, the existence of a prefix code with codeword lengths L implies that of a code with codeword lengths L'.

However, fix-free codes do not have such a property. We now show this by a simple counterexample. Suppose L = (1, 2, 3, 4) and L' = (1, 2, 4, 4). Then, by definition, L <= L'. A fix-free code with codeword lengths L can be constructed as {0, 11, 101, 1001}, while it is impossible to construct a fix-free code with codeword lengths L', which can be seen as follows. If we choose 0 as the first codeword, the second and the third codewords must be 11 and 1001, respectively: 11 is the only length-two string with neither 0 as a prefix nor 0 as a suffix, and 1001 is then the only admissible length-four string. No further string with length four can be used without violating the fix-free condition with the existing codewords. The same argument applies if 1 is chosen as the first codeword. Therefore, although a fix-free code with lengths (1, 2, 3, 4) exists, there does not exist a fix-free code with lengths (1, 2, 4, 4).

Prefix codes also have the following property. Consider

(l_1, ..., l_{n-1}) and (l_1, ..., l_{n-1}, l_n),

where both sets of lengths satisfy the Kraft inequality. Suppose we already have a prefix code with lengths (l_1, ..., l_{n-1}). Then to construct a prefix code with lengths (l_1, ..., l_n), we only need to add a new codeword with length l_n to the existing code such that none of the existing codewords is a prefix of this codeword. In other words, it is not necessary to change the original set of codewords in order to accommodate the new codeword.

However, this property again does not hold for fix-free codes. Consider the fix-free code {00, 101, 0110, 0111, 1001, 1110, 1111} with lengths (2, 3, 4, 4, 4, 4, 4). It is impossible to add one more codeword with length four to this code without violating the fix-free condition, that is, it is impossible to construct a fix-free code with lengths (2, 3, 4, 4, 4, 4, 4, 4) by appending a new codeword to this code. But it is actually possible to construct a fix-free code with such lengths. An example of such a code is {00, 011, 0101, 1001, 1010, 1101, 1110, 1111}.

Upon observing such irregular characteristics of fix-free codes, we realize that the usual methods for studying prefix codes do not work for fix-free codes. From the foregoing, for two ordered sets L and L' with L <= L' and thus

2^{-l_1} + ... + 2^{-l_n} >= 2^{-l'_1} + ... + 2^{-l'_n},

it is possible that there exists a fix-free code with codeword lengths L but there does not exist one with codeword lengths L'. Therefore, a condition of the form 2^{-l_1} + ... + 2^{-l_n} <= c for a constant c cannot be the form of a necessary and sufficient condition for the existence of a fix-free code with a specified set of codeword lengths.
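Existence questions for specific short length vectors can be settled by exhaustive search over candidate codeword sets; a small sketch (function names are ours):

```python
from itertools import product

def compatible(word, code):
    """word may join code iff it is not a prefix/suffix of, nor prefixed/
    suffixed by, any codeword already chosen."""
    return all(not w.startswith(word) and not w.endswith(word) and
               not word.startswith(w) and not word.endswith(w)
               for w in code)

def exists_fix_free(lengths, code=()):
    """Backtracking search: True iff some fix-free code realizes the
    given sequence of codeword lengths."""
    if not lengths:
        return True
    first, rest = lengths[0], lengths[1:]
    for bits in product("01", repeat=first):
        word = "".join(bits)
        if compatible(word, code) and exists_fix_free(rest, code + (word,)):
            return True
    return False
```

This search confirms, for instance, that the lengths (1, 2, 3, 4) admit a fix-free code while (1, 2, 4, 4) do not, even though the latter has the smaller Kraft sum.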

Also from the foregoing, even if there exist fix-free codes with codeword lengths (l_1, ..., l_{n-1}) and (l_1, ..., l_n), a fix-free code with codeword lengths (l_1, ..., l_n) may not be constructible from an existing code with codeword lengths (l_1, ..., l_{n-1}). Both of these rather peculiar characteristics of fix-free codes make it extremely difficult to characterize the existence of such codes by analyzing their structural properties. In the next section, we will take a probabilistic approach which avoids explicit consideration of the structure of fix-free codes.

III. A SUFFICIENT AND A NECESSARY CONDITION FOR THE EXISTENCE OF FIX-FREE CODES

Define (x)^+ as the positive part of a real number x, i.e., (x)^+ = x if x > 0 and (x)^+ = 0 if x <= 0. The following two properties of (x)^+ are obvious:
1) (x)^+ >= x, for any real x;
2) (x)^+ <= (y)^+, for x <= y.
Fig. 1. The case 2l <= l'.

Fig. 2. The case l < l' < 2l.

For two strings a and b, we write a <=_p b if a is a prefix of b, and we write a <=_s b if a is a suffix of b. Obviously, either a <=_p b or a <=_s b implies that the length of a is no more than the length of b.

Lemma 1: Let a be a string with length l, and let N be the number of strings with length l' (l <= l') which are prefixed or suffixed by a. Then

N = 1, if l' = l;
2^{l'-l+1} - 1 <= N <= 2^{l'-l+1}, if l < l' < 2l;
N = 2^{l'-l+1} - 2^{l'-2l}, if l' >= 2l.

Proof: Let |S| denote the cardinality of a set S, and let b be a string with length l'. Then from the definition of N, we have

N = |{b : a <=_p b}| + |{b : a <=_s b}| - |{b : a <=_p b and a <=_s b}|,

where both |{b : a <=_p b}| and |{b : a <=_s b}| are equal to 2^{l'-l}. The quantity |{b : a <=_p b and a <=_s b}| stands for the number of strings with length l' which are both prefixed and suffixed by a. It depends on the relation between l and l'. We now discuss it for three cases.

Case 1: l' = l. For this case, a <=_p b is equivalent to a = b. Hence, one and only one string with length l', namely a itself, has a as its prefix and suffix. Thus

N = 2 * 2^{l'-l} - 1 = 1.

Case 2: l' >= 2l. We will adopt the convention that the first bit is on the left when we display a string. From Fig. 1, we see that for any string b satisfying a <=_p b and a <=_s b, the first l bits and the last l bits of b must be the same as a, while each of the l' - 2l bits in the middle is either 0 or 1. Thereby, totally 2^{l'-2l} strings with length l' are both prefixed and suffixed by a. Thus

N = 2 * 2^{l'-l} - 2^{l'-2l}.

Case 3: l < l' < 2l. From Fig. 2, we see that if the first 2l - l' bits of a are equal to its last 2l - l' bits, then there exists one and only one string with length l' prefixed and suffixed by a. Otherwise, no string with length l' satisfying this condition exists. Hence, for this case,

2 * 2^{l'-l} - 1 <= N <= 2 * 2^{l'-l}.

Therefore, Lemma 1 is proved.

We now describe a random procedure for constructing a code with codeword lengths (l_1, ..., l_n). In this procedure, codewords with lengths l_1, ..., l_n are chosen successively. Specifically, a codeword with length l_i is independently chosen from the set of all 2^{l_i} possible strings with length l_i. After all the codewords have been chosen, we check if they satisfy the fix-free condition. If so, these codewords form a fix-free code. Obviously, a total of 2^{l_1 + ... + l_n} possible codes can be constructed by this procedure. Define P(L) as the probability of successfully constructing a fix-free code with lengths L = (l_1, ..., l_n). Then P(L) is equal to the total number of fix-free codes with lengths L divided by 2^{l_1 + ... + l_n}. Thus P(L) > 0 if and only if there exists at least one fix-free code with lengths L. In the following lemma (Lemma 2), we prove a lower bound on P(L).
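The random procedure itself is straightforward to simulate, and the empirical success rate estimates P(L); a sketch (the bound of Lemma 2 is not needed for the simulation):

```python
import random

def one_trial(lengths):
    """Choose one codeword of each length uniformly and independently,
    then test the fix-free condition."""
    code = ["".join(random.choice("01") for _ in range(l)) for l in lengths]
    return all(not (b.startswith(a) or b.endswith(a))
               for i, a in enumerate(code)
               for j, b in enumerate(code) if i != j)

def estimate_P(lengths, trials=10_000):
    """Monte Carlo estimate of P(L)."""
    return sum(one_trial(lengths) for _ in range(trials)) / trials
```

For lengths admitting no fix-free code, such as (1, 2, 4, 4), every trial fails and the estimate is exactly zero; for the lengths (2, 2), the true value is 3/4, since the only failing draws are those with two equal codewords.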

where if

is the smallest , and if

such that . So . Hence, (4) becomes if if if (5)

(1)

In the above, there is one term in the summation on the right-hand side (RHS) of (4) for the second case with . The summation is empty for the first and the third cases. Since every term on the RHS of (5) is nonnegative, we have if if if Inequality (6) is readily seen to be true from (2) and (3). There. fore, the lemma is true for , Suppose we have chosen codewords . The probability that forms . By the induction hypothesis, a fix-free code is assume that (6)

where r is the smallest k such that the stated condition holds.
Proof: A string is called available if it has not been chosen and it does not violate the fix-free condition with any codeword already chosen; otherwise, it is called unavailable. The total number of available and unavailable strings with a specified length is equal to the total number of strings with that length. By definition, the quantity N discussed in Lemma 1 is actually the number of unavailable strings with length l' given that the codeword a with length l has been chosen.

We prove the lemma by induction on the number of codewords in a code. In the construction of a fix-free code with two codewords, the first codeword c_1 can be chosen from the set of all 2^{l_1} strings with length l_1. Since the fix-free condition is always satisfied by a set consisting of only one codeword, P(l_1) = 1. Let P_2 denote the probability that a fix-free code with lengths (l_1, l_2) is successfully constructed given that a codeword with length l_1 has already been chosen. Hence

(7)

If P(l_1, ..., l_{k-1}) = 0, then P(l_1, ..., l_k) is certainly equal to 0, because if it is impossible to construct a fix-free code with codeword lengths (l_1, ..., l_{k-1}), it is also impossible to construct a fix-free code with codeword lengths (l_1, ..., l_k). On the other hand, if P(l_1, ..., l_{k-1}) = 0, the RHS of (7) must be zero since it is nonnegative. Now

The number of available strings with length , where is bounded in Lemma 1. Thus

is equal to

From Lemma 1 and by using the upper bound on , we obtain if if if On the other hand, letting be equal to

for

(2) (3) is equal to zero because it is the product of the RHS of (7) and st term. Therefore, we see that the

in (1), we have

(4)

is satisfied because both sides are equal to zero, and the lemma is proved for this case. Otherwise, consider for the time being the codewords c_1, ..., c_{k-1} as fixed and assume that they satisfy the fix-free condition. Denote by A the number of available strings with length l_k for the fix-free code {c_1, ..., c_{k-1}}. In the following, we obtain a lower bound on A through the analysis of U, the number of unavailable strings with length l_k. By the inclusion-exclusion principle, we have
Fig. 3. The case l_i + l_j <= l_k.

Fig. 4. The case l_i + l_j > l_k and l_i < l_j.
(8)

We will examine all the summations on the RHS of (8) in the following.
1) Each term in the first summation stands for the number of strings with length l_k prefixed or suffixed by c_i, with l and l' in Lemma 1 being l_i and l_k, respectively; it has already been discussed in that lemma.
2) Each term in the second summation on the RHS of (8) represents the number of strings with length l_k made unavailable simultaneously by two distinct codewords c_i and c_j, where 1 <= i < j <= k - 1. This is developed by

They are further discussed under the following three conditions.
a) From Fig. 3, we see that for a string with length l_k prefixed by c_i and suffixed by c_j, its first l_i bits and its last l_j bits must be c_i and c_j, respectively. Any of the l_k - l_i - l_j bits in the middle can be either 0 or 1. Thus

The same argument applies in the discussion of the strings prefixed by c_j and suffixed by c_i. Hence

b)

(9)

From Fig. 4, we see that if the first l_i + l_j - l_k bits of c_j are identical with the last l_i + l_j - l_k bits of c_i, then there is one and only one possible string with length l_k both prefixed by c_i and suffixed by c_j. Thus

(10) (11)

Because c_i and c_j are distinct and neither is a prefix or a suffix of the other, all four terms in (9) are mutually exclusive. Hence, (10) follows. Again, for the same reason, the last two terms in (10) vanish. The two terms in (11) represent the number of strings with length l_k which are prefixed by c_i and suffixed by c_j, or prefixed by c_j and suffixed by c_i,

Otherwise, no string with length l_k satisfying that condition exists, and the corresponding term is 0. Thus

The same argument applies in the discussion of the other term in (11). Since for this case each term in (11) is either 0 or 1, we have

c) For this case, the three conditions are equivalent. Consequently, they would imply that one of c_i and c_j is a prefix of the other, and likewise that one is a suffix of the other. These are contradictions to the fix-free condition between c_i and c_j. Therefore, the quantity (11) is 0.
3) Each term in the third summation on the RHS of (8) stands for the number of strings with length l_k made unavailable simultaneously by three distinct codewords c_i, c_j, and c_m, where i < j < m. By means of arguments similar to what we have used above, it is then clear that

which implies

(13)

By definition, is nonnegative, and hence on both sides of (13), we obtain

. Taking

(14)

Similarly, we conclude that all the summations which are not explicitly displayed on the RHS of (8) vanish. Taking into account all the above analyses as well as the conclusion of Lemma 1, it follows from (8) that

Note that this lower bound depends on (l_1, ..., l_k) but not on the specific choice of c_1, ..., c_{k-1}. Let P_k be the probability that (c_1, ..., c_k) forms a fix-free code given that (c_1, ..., c_{k-1}) is a fix-free code. Then P_k can be defined only if P(l_1, ..., l_{k-1}) > 0.

From (14), and by using the first property of (x)^+, we obtain

(12)

where the first three summations on the RHS of (12) follow from the first summation on the RHS of (8), and the last summation on the RHS of (12) follows from the second summation on the RHS of (8). For the first summation on the RHS of (8), we use the upper bound on N in the third case of Lemma 1. For the second summation on the RHS of (8), we use the lower bound in case 2b). Upon writing

(15)

Finally, it follows from (7) and (15) that

and

we obtain

Therefore, we have completed the proof of Lemma 2.

A lower bound on P(L) is obtained in Lemma 2. If this lower bound is greater than zero, then P(L) is strictly positive. As mentioned, P(L) > 0 if and only if there exists at least one fix-free code with codeword lengths L. Hence, a sufficient condition for the existence of fix-free codes follows.

Theorem 1 (Sufficient Condition): For a set of codeword lengths , define

where

For

where r is the smallest k such that the stated condition holds. If the quantity defined above is strictly positive, there exists a fix-free code with codeword lengths (l_1, ..., l_n).

(17)

With both terms on the RHS of (17) being positive, the expression monotonically decreases with the number of codewords. If it remains positive, which means

the bound is strictly positive, then there exists a fix-free code with lengths (l_1, ..., l_n), completing the proof.

Example 1: Consider a set of codeword lengths for which the quantity of Theorem 1 is easily evaluated. Substituting it into (16), we find that the bound is strictly positive. Thus, by Theorem 1, a fix-free code with these codeword lengths can be constructed, and an example of such a code is readily found. The existence of fix-free codes with other sets of codeword lengths can also be verified by Theorem 1.

We now derive three sufficient conditions in a form similar to the Kraft inequality for prefix codes. Although these sufficient conditions are looser than Theorem 1, their forms are simpler and they can be checked more easily. Also, they will be useful in the discussion of the upper bounds on the redundancy of optimal fix-free codes in the later part of this paper.

Corollary 1: Let (l_1, ..., l_n) be a set of codeword lengths, and let r be the smallest k such that the stated condition holds. If

The following corollary is, in fact, the sufficient condition due to Ahlswede et al. [1].

Corollary 2: Let (l_1, ..., l_n) be a set of codeword lengths. If

2^{-l_1} + 2^{-l_2} + ... + 2^{-l_n} <= 1/2,

then there exists a fix-free code with lengths (l_1, ..., l_n).
Proof: From the given condition, we have

then there exists a fix-free code with lengths (l_1, ..., l_n). Proof: By Lemma 2 and using the second property of (x)^+, we have the stated bound. By the definition of the quantity r in Corollary 1, it is clear that it is no more than the corresponding sum here. Hence, by Corollary 1, there exists a fix-free code with lengths (l_1, ..., l_n), and the corollary is proved.
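The condition of Corollary 2, the Kraft sum being at most 1/2, is a one-line test; a sketch (sufficient only, as the fix-free code {0, 1} with Kraft sum 1 shows):

```python
from math import fsum

def ahlswede_sufficient(lengths):
    """Sufficient condition of Ahlswede et al. [1]: a fix-free code with
    these lengths exists whenever the Kraft sum is at most 1/2.
    A False result is inconclusive (the condition is not necessary)."""
    return fsum(2.0 ** -l for l in lengths) <= 0.5
```

For example, the lengths (2, 3, 4, 4) have Kraft sum exactly 1/2, so a fix-free code with those lengths is guaranteed, while the existing fix-free code {0, 1} is not detected by the test.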

If the shortest length in a given set of codeword lengths is 1, we can further derive a tighter sufficient condition than Corollary 2.

Corollary 3: Let (l_1, ..., l_n) be a set of codeword lengths with l_1 = 1. If

(16)

implies the existence of a fix-free code with lengths

Proof 1: Assume that , since otherwise Lemma 2

. Then for is greater than . By

Proof 2: Assume that we have

. Since

or

Define Then

for

. Note that

and

By Corollary 2, there exists a fix-free code with lengths (l_2 - 2, l_3 - 2, ..., l_n - 2), and we let C be such a code, which in fact can be constructed by the procedure described in [1]. Now define a new code as in (18): its first codeword is 0, and for each codeword c of C it contains the codeword 1c1, where we use juxtaposition to denote the concatenation of two strings. It is clear that this code has codeword lengths (1, l_2, ..., l_n). Since C satisfies the fix-free condition, appending a common bit to the head and the tail of all codewords does not violate the fix-free condition. Also, 0 is obviously neither the prefix nor the suffix of any other codeword. Hence the new code forms a fix-free code, completing the proof.

Having obtained a lower bound on P(L) in Lemma 2, in the following lemma we prove an upper bound on P(L), which is the counterpart of Lemma 2.

Lemma 3: Since for

we have

which implies

we obtain

(20) (19) is the smallest such that . where Proof: We also prove this lemma by induction on the number of codewords in a code. Following arguments similar to those used in the proof of Lemma 2, we have and . From Lemma 1 and by using the , we obtain lower bound on for if if if (21)

It follows from (18) and (19) that the condition holds, which implies the existence of a fix-free code. Therefore, the corollary is proved.

Unlike the above proof, we now present an alternative proof which does not rely on Lemma 2. This proof not only shows that a fix-free code exists, but also shows how the code can be constructed.
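The constructive step of the alternative proof, keeping the single-bit codeword 0 and wrapping every codeword of the inner fix-free code with a leading and a trailing 1, can be sketched directly (function names are ours):

```python
def is_fix_free(code):
    return all(not (b.startswith(a) or b.endswith(a))
               for i, a in enumerate(code)
               for j, b in enumerate(code) if i != j)

def wrap(inner_code):
    """Build {0} + {1c1 : c in inner_code}. If inner_code is fix-free,
    the result is fix-free: wrapping preserves prefix/suffix relations,
    and 0 is neither a prefix nor a suffix of any wrapped codeword."""
    return ["0"] + ["1" + c + "1" for c in inner_code]
```

For example, wrapping the complete fix-free code {00, 01, 10, 11} yields {0, 1001, 1011, 1101, 1111}, a fix-free code with lengths (1, 4, 4, 4, 4).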

(22)

On the other hand, setting

in (20), we have

(23)

where r is the smallest k such that the stated condition holds. Following similar arguments as those used for (4), (23) becomes a three-case expression, which is readily seen to be true from (21) and (22). Therefore, the lemma is true for the base case. Suppose codewords c_1, ..., c_{k-1} have already been chosen, and P_k is the probability that (c_1, ..., c_k) forms a fix-free code. By the induction hypothesis, assume that the bound holds for k - 1. The manipulation of the terms in the above is similar to that in the derivation of (12). Upon writing

we obtain

(24)

If P(l_1, ..., l_{k-1}) = 0, then P(l_1, ..., l_k) must be equal to zero, as has already been discussed in Lemma 2. Hence for this case the bound, which implies

is satisfied because of the nonnegativity of the RHS of the above inequality, and the lemma is proved for this case. Otherwise, consider at this stage c_1, ..., c_{k-1} as fixed and assume that they satisfy the fix-free condition. The following is to obtain an upper bound on the number of available strings with length l_k for the fix-free code {c_1, ..., c_{k-1}} (this quantity has the same definition as in the proof of Lemma 2). Recall the analysis in the proof of Lemma 2. Here we use the lower bound on N in the third case of Lemma 1 for the first summation on the RHS of (8), and use the upper bound in case 2b) of Lemma 2 for the second summation on the RHS of (8). It then follows from (8) that

It then follows from the above upper bound on

that

(25) Finally, it follows from (24) and (25) that

completing the proof.

An upper bound on P(L) is obtained in the above lemma. If this upper bound is equal to zero, P(L) must be zero, which implies the nonexistence of fix-free codes with lengths L. Hence, we have the following theorem.

Theorem 2 (Necessary Condition): For a set of codeword lengths (l_1, ..., l_n), define

It can be inferred from the given condition that the last summation of (26) vanishes, and (26) reduces to

which coincides with the lower bound of Lemma 2 under the same condition. Thereby, it becomes the exact expression for P(L). As a fix-free code with lengths L exists if and only if P(L) > 0, the theorem is proved.

where r is the smallest k such that the stated condition holds. If the quantity defined above equals zero, there does not exist a fix-free code with codeword lengths (l_1, ..., l_n).

Example 2: It is known from the previous section that we cannot construct a fix-free code with codeword lengths (1, 2, 4, 4). Upon inserting these lengths into the expression above, we have

IV. KRAFT INEQUALITY

We have developed a probabilistic method for studying fix-free codes. This method is very powerful because it does not involve explicit consideration of the structure of such codes. To illustrate the generality of our method, we use it to derive the Kraft inequality for prefix codes. The merit of our derivation is threefold. First, it is not necessary to know the Kraft inequality ahead of time. As we will see, our method naturally leads to the Kraft inequality. Second, both the necessity and the sufficiency of the Kraft inequality are proved on the same footing. Third, no explicit construction of a prefix code is involved, and hence our method can be generalized to handle more complicated code structures such as the fix-free code, the 1-ended code ([2], [7]), and the alphabetic code ([20], [26]).

A set of codewords is said to satisfy the prefix condition if no codeword in the set is a prefix of any other codeword. The same random procedure as described in the previous section is used to construct a prefix code with codeword lengths (l_1, ..., l_n). Let P(L) be the probability of successfully constructing a prefix code with codeword lengths L = (l_1, ..., l_n); then P(L) > 0 if and only if there exists at least one prefix code with these lengths.

By Theorem 2, no fix-free code with codeword lengths (1, 2, 4, 4) exists. The nonexistence of fix-free codes with other sets of codeword lengths can also be verified by this theorem.

We have obtained a lower bound and an upper bound on P(L) in Lemma 2 and Lemma 3, respectively. Under a certain condition, the upper bound coincides with the lower bound. The necessary and sufficient condition for the existence of fix-free codes under this condition is then determined.

Theorem 3: For a set of codeword lengths (l_1, ..., l_n), if for each k one of the two stated conditions holds, then there exists a fix-free code with lengths (l_1, ..., l_n) if and only if

Kraft Inequality: There exists a binary prefix code with codeword lengths (l_1, ..., l_n) if and only if

2^{-l_1} + 2^{-l_2} + ... + 2^{-l_n} <= 1.

Proof: Without loss of generality, assume l_1 <= l_2 <= ... <= l_n. We first prove the following proposition:

(27)

where r is the smallest k such that the stated condition holds. The proposition is proved by induction on the number of codewords in a code. To construct a prefix code with two codewords, we can choose the first codeword c_1 from the set of all 2^{l_1} strings with length l_1. Hence P(l_1) = 1. Denote by P_2 the probability of a code with lengths (l_1, l_2) being a prefix code given that a codeword with length l_1 is chosen. Since for every chosen codeword with length l_1, the number of unavailable strings with length l_2 is given by 2^{l_2 - l_1},

(26)

we obtain

displayed on the RHS of (29) vanish. Hence, it follows from (29) that

Therefore, which implies On the other hand, letting be equal to in (27), we have for is always nonnegative. Note that the value of depends on , but not on the specific choice of . Let be the probability of being a prefix code given that is a prefix code. Then

which is equal to P(l_1, l_2) since P(l_1) = 1. Hence for two codewords, the proposition is true. Suppose we have chosen codewords c_1, ..., c_{k-1}. The probability that (c_1, ..., c_k) forms a prefix code is P(l_1, ..., l_k). By the induction hypothesis, assume that

(28)

(30)

On the other hand, if P(l_1, ..., l_{k-1}) = 0, then P(l_1, ..., l_k) is obviously equal to 0. We note that the conditional probability above is defined only if P(l_1, ..., l_{k-1}) > 0. Finally, it follows from (28) and (30) that

Hence the proposition holds for this case as well. Therefore, we have proved the proposition in (27). Now P(L) > 0 is equivalent to (31). Since for

and the proposition is proved. We now consider the case of general k. At this moment, regard c_1, ..., c_{k-1} as fixed and assume that they satisfy the prefix condition. Denote by A the number of available strings with length l_k for the prefix code {c_1, ..., c_{k-1}}, which we will determine in the following. Now the number of unavailable strings with length l_k is given by

the inequality in (31) holds if and only if the k-th term in the product is strictly positive, i.e., (32). Since both sides of (32) are multiples of a common quantity, the difference between the two sides, whenever (32) holds, is at least that quantity. Thus, adding it to the left side of (32) makes the inequality nonstrict, i.e., (33). Therefore, (32) implies (33). Conversely, it is obvious that (33) implies (32). Thus, we see that (32) and (33) are equivalent. Therefore, P(L) > 0 is equivalent to the Kraft inequality. As mentioned, P(L) > 0 is equivalent to the existence of a prefix code. Hence, the Kraft inequality is the necessary and sufficient condition for the existence of a prefix code, completing the proof of the Kraft inequality.
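For small cases, the equivalence just proved can be double-checked by brute force; a sketch (function names are ours):

```python
from itertools import product

def prefix_code_exists(lengths, code=()):
    """Backtracking search for a binary prefix code with the given lengths."""
    if not lengths:
        return True
    first, rest = lengths[0], lengths[1:]
    for bits in product("01", repeat=first):
        word = "".join(bits)
        if all(not word.startswith(c) and not c.startswith(word) for c in code):
            if prefix_code_exists(rest, code + (word,)):
                return True
    return False

def kraft_sum(lengths):
    return sum(2.0 ** -l for l in lengths)
```

For instance, (1, 2, 2) has Kraft sum 1 and admits the prefix code {0, 10, 11}, while (1, 1, 2) has Kraft sum 1.25 and admits none, in agreement with the theorem.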

(29)

First, each term in the first summation on the RHS of (29) represents the number of strings with length l_k prefixed by c_i. So it is equal to 2^{l_k - l_i}, and the summation is

Second, each term in the second summation on the RHS of (29) represents the number of strings with length l_k prefixed simultaneously by two distinct codewords c_i and c_j, where i < j. Since c_i is not a prefix of c_j, the two codewords cannot both be prefixes of a common string. Thus, this summation vanishes. Similarly, all the summations which are not explicitly

V. NEW UPPER BOUNDS ON THE REDUNDANCY OF OPTIMAL FIX-FREE CODES

Considering the difficulty in constructing optimal fix-free codes, in this section we obtain upper bounds on R_F by constructing fix-free codes which are not necessarily optimal. Without loss of generality, assume that p_1 >= p_2 >= ... >= p_n. We will prove new upper bounds on R_F in terms of
1) q, the probability of any given symbol;
2) p_1, the probability of the most likely source symbol;
3) p_n, the probability of the least likely source symbol.
Let

Hence the bound of the theorem follows. Fig. 5 shows this upper bound on R_F as a function of q. This upper bound is always less than two. Next, we prove a new upper bound on R_F in terms of p_1, the probability of the most likely symbol.

Theorem 5:

be the binary entropy function. Theorem 4: Let symbol, then be the probability of any given source Proof: Define if if Proof: Let for some . Define if if (34) It is obvious that for . Since

We obtain the above definition of the lengths by using Lagrange multipliers in order to minimize the upper bound on R_F. The details are omitted here. Since both of the probabilities involved are less than one, all the lengths defined above are positive (so that they are valid lengths for codewords). From (34), the condition of Corollary 3 holds, so by Corollary 3 there exists a fix-free code with the lengths defined above. Now

By Corollary 2, there exists a fix-free code with these codeword lengths. Now Hence

On the other hand, by letting $q = p_1$ in Theorem 4, we obtain an upper bound on $R$ in terms of $p_1$ as well. Therefore, the two bounds can be compared directly, and the bound of Theorem 5 is the tighter of the two. This upper bound on $R$ as a function of $p_1$ is shown in Fig. 6.

Let us revisit Theorem 4 after obtaining the upper bound on $R$ in terms of $p_1$. If the probability of a certain symbol is


Fig. 5. Upper bound on R as a function of q given in Theorem 4.

Fig. 6. Upper bound on R as a function of p_1.


Fig. 7. Upper bound on R as a function of q.

known and is no less than $1/2$, then that symbol must be the most likely one, i.e., its probability equals $p_1$. Thus the upper bound in Theorem 4 can readily be enhanced by an application of Theorem 5. This is given in the following theorem.

Theorem 6: Let $q$ be the probability of any given source symbol. Then $R$ is upper-bounded by the function of $q$ plotted in Fig. 7, obtained by combining Theorems 4 and 5.

This upper bound on $R$ as a function of $q$ is shown in Fig. 7. From this figure, we find that $R$ can tend to $2$ only if the probability of every source symbol tends to zero. For sources with a fixed alphabet size, this is impossible. Precisely, we have the following corollary.

Corollary 4: For any fixed $n$, where $n$ is the size of the source alphabet, it is impossible to construct a sequence of source distributions for which $R$ tends to $2$.

We can calculate the upper bound on $R$ in Theorem 6 when the probability of any source symbol is given. Furthermore, this upper bound can be enhanced considerably upon knowing that the given probability is that of the least likely source symbol. This is shown in the next theorem.

Theorem 7: $R$ is upper-bounded by a function of $p_n$ (plotted in Fig. 8).

Proof: Define the codeword lengths by (35), distinguishing four cases of source symbols, and rewrite this definition in the form (36), where the auxiliary quantity appearing in (36) is positive and suitably bounded. From these definitions, the required inequalities on the lengths follow in both directions.


Fig. 8. Upper bound on R as a function of p_n.

Hence, all the lengths defined in (36) are at least $1$, so they are valid lengths for codewords. By Corollary 1, there exists a fix-free code with these codeword lengths. Computing its average codeword length, it follows from (35) and (36) that $R$ is upper-bounded as in (37). The smaller the auxiliary quantity in (36), the smaller the RHS of (37); the RHS of (37) is smallest in the limit as this quantity tends to zero. Hence, we obtain the bound of the theorem.

This upper bound on $R$ as a function of $p_n$ is shown in Fig. 8. It is considerably tighter than the one given in Theorem 6 with $q = p_n$.

VI. CONCLUSION

In this paper, we obtain a sufficient condition and a necessary condition for the existence of fix-free codes by means of a probabilistic method. Based on the sufficient condition, we derive some new upper bounds on $R$, the redundancy of an optimal fix-free code, in terms of partial information about the source distribution. The results we obtain in this paper are stronger than all the previously known results along the same line. In particular, it is shown that the upper bound on $R$ previously obtained by Ahlswede et al. [1] cannot be tight for sources with any fixed alphabet size, although it is the best known upper bound on $R$ that is independent of the source probability distribution.



Ahlswede et al. [1] also conjectured that $\sum_i 2^{-l_i} \le 3/4$ is a sufficient condition for the existence of binary fix-free codes. Recently, Harada and Kobayashi [28] have generalized this conjecture to the nonbinary case. We cannot prove this conjecture, but with the help of a computer, we have found that the conjecture is true for the binary case at least when the maximum codeword length is small. This conjecture warrants further investigation because, if it is true, the upper bound on $R$ can be improved from $2$ to $3 - \log 3$, which is approximately $1.415$.
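The computer search mentioned above can be reproduced in miniature. The sketch below (our reconstruction, not the authors' program) enumerates every multiset of lengths with Kraft sum at most $3/4$ and maximum length at most $3$, and verifies by backtracking that a binary fix-free code exists for each; all function names are ours.

```python
from fractions import Fraction

def compatible(a, b):
    """True if neither word is a prefix or a suffix of the other."""
    if len(a) == len(b):
        return a != b
    s, t = (a, b) if len(a) < len(b) else (b, a)
    return not (t.startswith(s) or t.endswith(s))

def exists_fix_free(lengths):
    """Backtracking search for a binary fix-free code with the given lengths."""
    words = {l: [format(i, '0{}b'.format(l)) for i in range(2 ** l)]
             for l in set(lengths)}
    lengths = sorted(lengths)

    def extend(i, chosen):
        if i == len(lengths):
            return True
        return any(extend(i + 1, chosen + [w])
                   for w in words[lengths[i]]
                   if all(compatible(w, c) for c in chosen))

    return extend(0, [])

def check_conjecture(max_len, bound=Fraction(3, 4)):
    """Check the 3/4 conjecture for all length multisets with lengths <= max_len."""
    def gen(l, budget, acc):
        if l > max_len:
            return (not acc) or exists_fix_free(acc)
        step = Fraction(1, 2 ** l)
        return all(gen(l + 1, budget - c * step, acc + [l] * c)
                   for c in range(int(budget / step) + 1))
    return gen(1, bound, [])

assert check_conjecture(3)   # the conjecture holds for maximum length 3
```

Exact rational arithmetic via `Fraction` avoids any floating-point ambiguity at the boundary cases where the Kraft sum equals $3/4$ exactly. The search grows quickly with the maximum length, which is why only small lengths are feasible by brute force.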


ACKNOWLEDGMENT The authors wish to thank the two anonymous reviewers for their detailed comments on the manuscript. C. Ye would like to thank Dr. S. Xia for the useful discussion.


REFERENCES
[1] R. Ahlswede, B. Balkenhol, and L. Khachatrian, "Some properties of fix-free codes," Sonderforschungsbereich 343, Universität Bielefeld, submitted for publication.
[2] T. Berger and R. W. Yeung, "Optimum 1-ended binary prefix codes," IEEE Trans. Inform. Theory, vol. 36, pp. 1435-1441, Nov. 1990.
[3] J. Berstel and D. Perrin, Theory of Codes. Orlando, FL: Academic, 1985.
[4] R. M. Capocelli, R. Giancarlo, and I. J. Taneja, "Bounds on the redundancy of Huffman codes," IEEE Trans. Inform. Theory, vol. IT-32, pp. 854-857, Nov. 1986.
[5] R. M. Capocelli and A. De Santis, "Tight upper bounds on the redundancy of Huffman codes," IEEE Trans. Inform. Theory, vol. 35, pp. 1084-1091, Sept. 1989.
[6] R. M. Capocelli and A. De Santis, "New bounds on the redundancy of Huffman codes," IEEE Trans. Inform. Theory, vol. 37, pp. 1095-1104, July 1991.
[7] R. M. Capocelli, A. De Santis, and G. Persiano, "Binary prefix codes ending in a 1," IEEE Trans. Inform. Theory, vol. 40, pp. 1296-1302, July 1994.
[8] Y. Césari, "Sur un algorithme donnant les codes bipréfixes finis," Math. Syst. Theory, vol. 6, pp. 221-225, 1972.

[9] Y. Césari, "Propriétés combinatoires des codes bipréfixes," in Théorie des Codes, D. Perrin, Ed. Paris, France: LITP, 1979, pp. 22-46.
[10] A. S. Fraenkel and S. T. Klein, "Bidirectional Huffman coding," Computer J., vol. 33, pp. 296-307, 1990.
[11] R. G. Gallager, "Variations on a theme by Huffman," IEEE Trans. Inform. Theory, vol. IT-24, pp. 668-674, Nov. 1978.
[12] E. N. Gilbert and E. F. Moore, "Variable-length binary encodings," Bell Syst. Tech. J., vol. 38, pp. 933-968, July 1959.
[13] D. Gillman and R. L. Rivest, "Complete variable-length fix-free codes," Des., Codes Cryptogr., vol. 5, pp. 109-114, 1995.
[14] B. Girod, "Bidirectionally decodable streams of prefix code-words," IEEE Commun. Lett., vol. 3, pp. 245-247, Aug. 1999.
[15] Y. Horibe, "An improved bound for weight-balanced tree," Inform. Contr., vol. 34, pp. 148-151, 1977.
[16] O. Johnsen, "On the redundancy of binary Huffman codes," IEEE Trans. Inform. Theory, vol. IT-26, pp. 220-222, Mar. 1980.
[17] D. Manstetten, "Tight bounds on the redundancy of Huffman codes," IEEE Trans. Inform. Theory, vol. 38, pp. 144-151, Jan. 1992.
[18] B. L. Montgomery and J. Abrahams, "On the redundancy of optimal binary prefix-condition codes for finite and infinite sources," IEEE Trans. Inform. Theory, vol. IT-33, pp. 156-160, Jan. 1987.
[19] B. L. Montgomery and B. V. K. V. Kumar, "On the average codeword length of optimal binary codes for extended sources," IEEE Trans. Inform. Theory, vol. IT-33, pp. 293-296, Mar. 1987.
[20] N. Nakatsu, "Bounds on the redundancy of binary alphabetical codes," IEEE Trans. Inform. Theory, vol. 37, pp. 1225-1229, July 1991.
[21] J. L. Peterson, "Computer programs for detecting and correcting spelling errors," Commun. ACM, vol. 23, pp. 676-687, 1980.
[22] M. P. Schützenberger, "On an application of semigroup methods to some problems in coding," IRE Trans. Inform. Theory, vol. IT-2, pp. 47-60, 1956.
[23] M. P. Schützenberger, "On a special class of recurrent events," Ann. Math. Statist., vol. 32, pp. 1201-1213, 1961.
[24] M. P. Schützenberger, "On a family of submonoids," Publ. Math. Inst. Hungar. Acad. Sci. Ser. A, vol. VI, pp. 381-391, 1961.
[25] Y. Takishima, M. Wada, and H. Murakami, "Reversible variable length codes," IEEE Trans. Commun., vol. 43, pp. 158-162, 1995.
[26] R. W. Yeung, "Alphabetic codes revisited," IEEE Trans. Inform. Theory, vol. 37, pp. 564-572, May 1991.
[27] R. W. Yeung, "Local redundancy and progressive bounds on the redundancy of a Huffman code," IEEE Trans. Inform. Theory, vol. 37, pp. 687-691, May 1991.
[28] K. Harada and K. Kobayashi, "Note on the fix-free code property," IEICE Trans. Fund. Electron., Commun. Comput. Sci., vol. E82-A, no. 10, pp. 2121-2128, Oct. 1999.
