Non-binary error correcting codes with noiseless feedback, localized errors, or both

R. Ahlswede, C. Deppe∗, and V. Lebedev†

1 Introduction

A famous problem in Coding Theory consists in finding good bounds for the maximal size $M(n, t, q)$ of a $t$-error correcting code over a $q$-ary alphabet $Q = \{0, 1, \dots, q-1\}$ with block length $n$. This code concept is suited for communication over a $q$-ary channel with input alphabet $X = Q$ and output alphabet $Y = Q$, where a word of length $n$ sent by the encoder is changed by the channel in at most $t$ letters. Here neither the encoder nor the decoder knows in advance where the errors, that is, the changes of letters, occur.

Suppose now that, having sent the letters $x_1, \dots, x_{j-1} \in X$, the encoder knows the letters $y_1, \dots, y_{j-1} \in Y$ received before he sends the next letter $x_j$ ($j = 1, 2, \dots, n$). We then have the presence of a noiseless feedback channel. For $q = 2$ this model was considered by Berlekamp [10], who derived striking results for triples of performance $(M, n, t)_f$, that is, the number of messages $M$, the block length $n$, and the number of errors $t$. It is convenient to use the notions of relative error $\tau = t/n$ and rate $R = n^{-1} \log M$. We investigate here the $q$-ary case. Again the Hamming bound $H_q(\tau)$ for $C_q^f(\tau)$, the supremum of the rates achievable for $\tau$ and all large $n$, is a central concept:

\[
H_q(\tau) =
\begin{cases}
1 - h_q(\tau) - \tau \log_q (q-1) & \text{if } 0 \le \tau \le \frac{q-1}{q}\\
0 & \text{if } \frac{q-1}{q} < \tau \le 1,
\end{cases}
\tag{1}
\]

∗ Supported by the DFG in the project “Allgemeine Theorie des Informationstransfers und Kombinatorik”.
† Supported in part by the Russian Foundation for Basic Research, project no. 06-01-00226.


where $h_q(\tau) = -\tau \log_q \tau - (1-\tau) \log_q (1-\tau)$. We also call $C_q^f : [0,1] \to \mathbb{R}_+$ the capacity error function (or curve). One readily verifies that for every $q$
\[
C_q^f(\tau) = 0 \quad \text{for } \tau \ge \tfrac12. \tag{2}
\]

We turn now to another model. Suppose that the encoder, who wants to encode message $i \in \mathcal{M} = \{1, 2, \dots, M\}$, knows the $t$-element set $E \subset [n] = \{1, \dots, n\}$ of positions in which alone errors may occur. He then can make the codeword representing $i$ dependent on $E \in \mathcal{E}_t = \binom{[n]}{t}$, the family of $t$-element subsets of $[n]$. We call these sets “a priori error patterns”. A family $\{u_i(E) : 1 \le i \le M,\ E \in \mathcal{E}_t\}$ of $q$-ary vectors with $n$ components is an $(M, n, t, q)_l$ code (for localized errors) if for all $E, E' \in \mathcal{E}_t$ and all $q$-ary vectors $e \in V(E) = \{e = (e_1, \dots, e_n) : e_j = 0 \text{ for } j \notin E\}$ and $e' \in V(E')$
\[
u_i(E) \oplus e \ne u_{i'}(E') \oplus e' \quad \text{for } i \ne i',
\]
where $\oplus$ is the addition modulo $q$. We now denote the capacity error function by $C_q^l$. It was determined in [8] for the binary case to equal $H_2(\tau)$. For general $q$ the best known result is

Theorem ABP [6] (i) $C_q^l(\tau) \le H_q(\tau)$ for $0 \le \tau \le \frac12$. (ii) $C_q^l(\tau) = H_q(\tau)$ for $0 \le \tau$ …
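Both $h_q$ and $H_q$ are elementary to evaluate numerically. The following small Python sketch computes them (the function names are ours, not the paper's); later sketches below reuse these two helpers:

\begin{verbatim}
from math import log

def h_q(t, q):
    # q-ary entropy h_q(t) = -t log_q t - (1-t) log_q (1-t)
    if t in (0, 1):
        return 0.0
    return -t * log(t, q) - (1 - t) * log(1 - t, q)

def H_q(t, q):
    # Hamming bound H_q(t) from (1)
    if t <= (q - 1) / q:
        return 1 - h_q(t, q) - t * log(q - 1, q)
    return 0.0

print(H_q(0.1, 4))   # about 0.686
\end{verbatim}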
In the presence of noiseless feedback the encoding function for message $i$ is a vector-valued function $f_i^n = (f_{i1}, \dots, f_{in})$, where $f_{ij}$ is defined on $Y^{j-1}$ for $j > 1$ and takes values in $X_j = X = Q$; here $y_1, y_2, \dots, y_{j-1}$ are the received elements of $Y$ (known to the sender before he sends $f_{ij}(y_1, \dots, y_{j-1})$), and $f_{i1}$ is an element of $X_1$. It is assumed that the number of wrongly transmitted letters in $n$ steps does not exceed $t$ and that the receiver has a decoding system $\{D_i : i \in \mathcal{M}\}$, $D_i \subset Y^n$, $D_i \cap D_{i'} = \emptyset$ for $i \ne i'$, such that upon receiving $y^n = (y_1, \dots, y_n)$ he can correctly decide (or decode) which message was sent.

Our goal is to determine the capacity error function $C_q^f$ for every $q$. Bassalygo conjectured the following. Let $T_{a_q}$ be the tangent to $H_q$ with $T_{a_q}\bigl(\frac{q-1}{2q-1}\bigr) = 0$ and let $a_q$ be the


argument with $H_q(a_q) = T_{a_q}(a_q)$; then
\[
C_q^f(\tau) =
\begin{cases}
H_q(\tau) & \text{if } 0 \le \tau \le a_q\\
T_{a_q}(\tau) & \text{if } a_q \le \tau \le \frac{q-1}{2q-1}\\
0 & \text{otherwise.}
\end{cases}
\tag{3}
\]

One reason for this conjecture was a result of [13], which implies that
\[
C_q^f\Bigl(\frac{q-1}{2q-1}\Bigr) \ge 0.
\]
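For orientation, $a_q$ and the conjectured curve (3) can be computed numerically. A minimal sketch, assuming the derivative formula (10) derived in Section 2.1 and reusing $h_q$, $H_q$ from the sketch above; the bisection helper is ours:

\begin{verbatim}
from math import log

def dH_q(t, q):
    # derivative of H_q, cf. (10) in Section 2.1
    return log(t / ((1 - t) * (q - 1)), q)

def a_q(q):
    # tangency abscissa of the tangent to H_q through ((q-1)/(2q-1), 0);
    # bisection on g(a) = H_q(a) + H_q'(a)(x0 - a), which is increasing
    # in a on (0, x0) because H_q is convex there
    x0 = (q - 1) / (2 * q - 1)
    lo, hi = 1e-9, x0 - 1e-9
    g = lambda a: H_q(a, q) + dH_q(a, q) * (x0 - a)
    for _ in range(200):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(a_q(2))   # about 0.19 for q = 2, Berlekamp's case
\end{verbatim}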

We will show that Bassalygo’s conjecture is not true. We begin with two estimates from above.

2.1 Upper bounds on $C_q^f$

First notice that for $t \ge \frac n2$ (or $\tau \ge \frac12$) not even two messages can be transmitted correctly, because for their encoding functions, say $f^n = (f_1, \dots, f_n)$ and $g^n = (g_1, \dots, g_n)$, at any component $j$ and any output string $y^{j-1} = y_1 \dots y_{j-1}$ an error for one of the messages can cause $f_j(y_1, \dots, y_{j-1}) = g_j(y_1, \dots, y_{j-1})$. Since $2t \ge n$, there are enough errors to produce this identity for all $j = 1, 2, \dots, n$, and two messages cannot be decoded correctly:
\[
C_q^f(\tau) = 0 \quad \text{for } \tau \ge \tfrac12.
\]
Next we derive the Hamming upper bound.

Lemma 1 (i) For every $(M, n, t, q)_f$ code,
\[
M \sum_{j=0}^{t} \binom{n}{j} (q-1)^j \le q^n.
\]
(ii) For $0 \le \tau \le 1$
\[
C_q^f(\tau) \le H_q(\tau). \tag{4}
\]
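As a quick illustration of (i), the largest $M$ the bound permits can be computed directly (the helper name is ours):

\begin{verbatim}
from math import comb

def hamming_bound_M(n, t, q):
    # largest M permitted by Lemma 1 (i)
    ball = sum(comb(n, j) * (q - 1) ** j for j in range(t + 1))
    return q ** n // ball

print(hamming_bound_M(3, 1, 6))   # 216 // 16 = 13; this value reappears in Remark 2
\end{verbatim}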

Proof: We count the number of output sequences. Let $i \in \mathcal{M}$ and let $y^n = (y_1, \dots, y_n)$ be the output sequence with $y_1 = f_{i1} \oplus e_1$ and $y_j = f_{ij}(y_1, y_2, \dots, y_{j-1}) \oplus e_j$ for $j = 2, 3, \dots, n$. It is determined by the encoding function $f_i^n$ and the $q$-ary additive noise $e^n = (e_1, \dots, e_n) \in Q^n$ occurring in the transmission, and so it can be regarded as their function $\varphi(f_i^n, e^n)$. For the family of encoding functions $F_M = \{f_i^n : i \in \mathcal{M}\}$ and a set $\mathcal{V} \subset Q^n$ of error patterns we write
\[
\Phi(F_M, \mathcal{V}) = \{y^n : \text{there exist } i \in \mathcal{M} \text{ and } e^n \in \mathcal{V} \text{ such that } y^n = \varphi(f_i^n, e^n)\}.
\]
If at most $t$ errors occur, we have
\[
\mathcal{V} = \bigcup_{E \in \mathcal{E}_t} V(E) = \{e^n = (e_1, \dots, e_n) : e_j = 0 \text{ for at least } n-t \text{ components}\}.
\]

Then we also have
\[
\varphi(f_i^n, e^n) \ne \varphi(f_{i'}^n, e'^n) \quad \text{for } (i, e^n) \ne (i', e'^n). \tag{5}
\]
Indeed, this is the case if $i \ne i'$ (because the decoder must be able to distinguish the messages) and if $i = i'$ and $e^n \ne e'^n$ (because the $j$th symbols of $\varphi(f_i^n, e^n)$ and $\varphi(f_i^n, e'^n)$ are different if $j$ is the first position where $e^n$ and $e'^n$ differ). Therefore
\[
|\Phi(F_M, \mathcal{V})| = M |\mathcal{V}| = M \sum_{j=0}^{t} \binom{n}{j} (q-1)^j \le q^n \tag{6}
\]
and asymptotically we get
\[
C_q^f(\tau) \le H_q(\tau), \quad 0 \le \tau \le 1. \tag{7}
\]
For the range $\frac1q \le \tau \le \frac12$ we derive a second basic upper bound using a result of Aigner.

Theorem A [7] For every $q \ge 2$ and $t \le \frac12 n$: if there exists an $(M, n, t, q)_f$ code, then there exists an $(M, n-2m, t-m, q)_f$ code for every $0 \le m \le t$.

Substituting in (6) the parameters $M, n, t$ by $M, n-2m, t-m$ we get
\[
M \sum_{j=0}^{t-m} \binom{n-2m}{j} (q-1)^j \le q^{n-2m}. \tag{8}
\]
Consequently $M \binom{n-2m}{t-m} (q-1)^{t-m} \le q^{n-2m}$ and, asymptotically, for $0 \le \mu \le \tau$
\[
C_q^f(\tau) \le (1 - 2\mu)\, H_q\!\left(\frac{\tau - \mu}{1 - 2\mu}\right). \tag{9}
\]
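Numerically, minimizing the right-hand side of (9) over $\mu$ already yields the tangent value $(1-2\tau)\log_q(q-1)$ obtained in Lemma 2 below. A grid-search sketch of ours, reusing $H_q$ from above:

\begin{verbatim}
from math import log

def upper_bound_9(tau, q, steps=10000):
    # grid minimum over 0 <= mu <= tau of (1 - 2 mu) H_q((tau - mu)/(1 - 2 mu))
    best = float("inf")
    for i in range(steps + 1):
        mu = tau * i / steps
        val = (1 - 2 * mu) * H_q((tau - mu) / (1 - 2 * mu), q)
        best = min(best, val)
    return best

q, tau = 3, 0.4
print(upper_bound_9(tau, q))           # about 0.1262
print((1 - 2 * tau) * log(q - 1, q))   # about 0.1262 as well
\end{verbatim}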

Whereas Berlekamp [10] showed in the case $q = 2$ that the tangent at the curve $H_2(\tau)$ running through the point $(\frac1k, 0)$ is an upper bound for $C_2^f(\tau)$ if $k = 3$ and a lower bound for $C_2^f(\tau)$ if $k \ge 3$, we show here first for $q > 2$ that the tangent at $H_q(\tau)$ in the point $(\frac1q, H_q(\frac1q))$ running through the point $(\frac12, 0)$ gives an upper bound on $C_q^f(\tau)$ for $\frac1q \le \tau \le \frac12$. (In fact this is part of our basic result that in this interval the tangent describes the capacity curve.) One readily verifies that
\[
\frac{d}{d\tau} H_q(\tau) = \log_q \frac{\tau}{(1-\tau)(q-1)}. \tag{10}
\]

So the tangent at the point with abscissa $a$ has the equation
\[
T(\tau) = \tau \log_q \frac{a}{(1-a)(q-1)} + R_0,
\]
where $R_0 = T(0)$. Going through $(a, H_q(a))$ implies that
\[
R_0 = H_q(a) - a \log_q \frac{a}{(1-a)(q-1)}
= \log_q \frac{q\, a^a (1-a)^{1-a}}{(q-1)^a} - \log_q \frac{a^a}{(q-1)^a (1-a)^a}
= \log_q \bigl(q(1-a)\bigr)
\]
and therefore
\[
T(\tau) = \tau \log_q \frac{a}{(1-a)(q-1)} + \log_q \bigl(q(1-a)\bigr). \tag{11}
\]
Finally, $T\bigl(\frac12\bigr) = 0$ implies
\[
\log_q \frac{a\, q^2 (1-a)^2}{(1-a)(q-1)} = 0
\]
and therefore
\[
a\, q^2 (1-a) = q-1 \quad \text{and} \quad a = \frac1q. \tag{12}
\]
(The other root is $\frac{q-1}{q}$.) The form of our tangent is
\[
T(\tau) = (1 - 2\tau) \log_q (q-1). \tag{13}
\]
We are prepared to state

Lemma 2 For $\frac1q \le \tau \le \frac12$
\[
C_q^f(\tau) \le (1 - 2\tau) \log_q (q-1).
\]
Proof: By (9) it suffices to show that the equation
\[
(1 - 2\mu)\, H_q\!\left(\frac{\tau - \mu}{1 - 2\mu}\right) = (1 - 2\tau) \log_q (q-1) \tag{14}
\]

has a solution in $\mu$ for $0 \le \mu \le \tau$. This can be written in the form
\[
(1 - 2\mu)\left[1 + \frac{1 - \mu - \tau}{1 - 2\mu} \log_q \frac{1 - \mu - \tau}{1 - 2\mu} + \frac{\tau - \mu}{1 - 2\mu} \log_q \frac{\tau - \mu}{1 - 2\mu} - \frac{\tau - \mu}{1 - 2\mu} \log_q (q-1)\right] = (1 - 2\tau) \log_q (q-1)
\]
or in the form
\[
(1 - 2\mu) \log_q \frac{q}{1 - 2\mu} + (1 - \mu - \tau) \log_q \frac{1 - \mu - \tau}{q-1} + (\tau - \mu) \log_q (\tau - \mu) = 0.
\]
Here the first coefficient equals the sum of the two others. We equate the arguments in the second and the third log:
\[
1 - \mu - \tau = (q-1)(\tau - \mu), \tag{15}
\]
\[
(q-2)\mu = q\tau - 1, \tag{16}
\]
\[
\mu = \frac{q\tau - 1}{q-2}. \tag{17}
\]
Then the first log has the argument $\bigl(\frac{1 - 2\mu}{q}\bigr)^{-1}$, and since by (15) $\frac{1 - 2\mu}{q} = \tau - \mu$, the desired equation follows, because $-(1 - 2\mu) + (1 - \mu - \tau) + (\tau - \mu) = 0$. $\Box$

Remark 1 Another way to find $\mu$ is to maximize $\mathrm{bin}(m) = q^{2m-n} \binom{n-2m}{t-m} (q-1)^{t-m}$ by comparing $\mathrm{bin}(m)$ and $\mathrm{bin}(m+1)$, as is done in the binary case in [10].
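A sketch of the comparison suggested in Remark 1 (an integer-scaled variant of $\mathrm{bin}(m)$ of ours; we assume unimodality in $m$, as in the binary case [10]):

\begin{verbatim}
from math import comb

def best_m(n, t, q):
    # compares bin(m) with bin(m+1), stopping at the first decrease;
    # we scale by the constant q^n to stay in exact integer arithmetic
    def scaled_bin(m):
        return q ** (2 * m) * comb(n - 2 * m, t - m) * (q - 1) ** (t - m)
    m = 0
    while m < t and scaled_bin(m + 1) > scaled_bin(m):
        m += 1
    return m

print(best_m(100, 40, 3))   # 22; asymptotically m/n -> (q tau - 1)/(q - 2) = 0.2
\end{verbatim}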


2.2 Lower bound derived by the new rubber method

In this section we give a strategy which achieves the upper bound of Lemma 2 for relative errors $\frac1q \le \tau \le \frac12$. We show that we can transmit $(q-1)^{n-2t}$ messages in block length $n$. A bijection $b$ of the messages $\mathcal{M}$ to the set $\{1, 2, \dots, q-1\}^{n-2t}$ of used sequences is agreed upon by the sender and the receiver. Given message $i \in \mathcal{M}$, the sender chooses $b(i) = x^{n-2t} = (x_1, x_2, \dots, x_{n-2t}) \in \{1, 2, \dots, q-1\}^{n-2t}$ as a skeleton for the encoding, which finally will be known to the receiver. The “0” is used for error correction only. For all positions $i \le n$ not needed, dummy letters $x_i = 1$ are defined to fill the block length $n$.

Transmission algorithm: The sender sends $x_1$, continues with $x_2$, and so on until the first error occurs, say in position $p$ with $x_p$ sent. The error can be of two kinds here: a standard error (the symbol $x_p$ is changed to another symbol $y_p \in \{1, 2, \dots, q-1\}$) and a towards-zero error ($x_p$ is changed to $y_p = 0$). If a standard error occurs, the sender transmits, with the smallest $l$ possible, $2l+1$ times “0” (where $l \in \mathbb{N} \cup \{0\}$) until the decoder has received $l+1$ zeros (known to the sender via feedback; such an $l$ exists because the number of errors is bounded by $t$). Then he transmits $x_p$ again at the next step and continues the algorithm. If a towards-zero error occurs, the sender decreases $p$ by one (if it is bigger than 1) and continues (transmits $x_p$ at the next step).

Decoding algorithm: The decoding is very simple. The receiver just regards the “0” as a kind of deletion symbol: he erases it with a rubber, which in addition erases the previous symbol. This is the reason why the sender has to repeat sending the symbol according to the skeleton if a towards-zero error occurs. At the end the first $n-2t$ symbols at the decoder are those of $b(i) = (x_1, x_2, \dots, x_{n-2t})$.

Indeed, suppose that $t_0$ towards-zero errors occur. They are taken care of with a loss of $2t_0$ in block length. So we are left with $t_1 = t - t_0$ possible errors and block length $n - 2t_0$, and only standard errors as well as a third kind, correction errors, resulting from a change of a zero to a non-zero. The standard errors $s_1, \dots, s_r$ cause correction errors $l_1, \dots, l_r$, respectively, and losses in block length $2(l_1+1), \dots, 2(l_r+1)$, and thus a total of $\sum_{i=1}^r (1 + l_i) \le t_1$ errors and a total of $\sum_{i=1}^r 2(l_i+1) \le 2t_1$ of block length. Hence a block length of $n - 2(t_0 + t_1) = n - 2t$ remains to transmit with our strategy $M = (q-1)^{n-2t}$ messages. Thus $\frac{\log_q M}{n} = \bigl(1 - \frac{2t}{n}\bigr) \log_q (q-1)$ and we have derived the main result of this section.
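To make the strategy concrete, here is a small self-contained simulation of the transmission and decoding algorithms just described. It is an illustrative sketch, not the paper's formal protocol; all names are ours, and the adversary is modelled as a map from time steps to received symbols.

\begin{verbatim}
def rubber_transmit(skeleton, n, corrupt):
    # Sender side with noiseless feedback. `skeleton` lists symbols from
    # {1,...,q-1}; `corrupt(step, sent)` is the adversary and returns the
    # symbol the receiver gets (the sender sees it too, via feedback).
    xs = list(skeleton) + [1] * (n - len(skeleton))   # dummy 1s fill the block
    received, p, deficit = [], 0, 0
    for step in range(n):
        sent = 0 if deficit > 0 else xs[p]
        got = corrupt(step, sent)
        received.append(got)
        if deficit > 0:              # we are sending rubber zeros
            deficit += -1 if got == 0 else 1
        elif got == sent:
            p += 1                   # symbol arrived, move on
        elif got == 0:
            p = max(p - 1, 0)        # towards-zero error: resend previous symbol
        else:
            deficit = 1              # standard error: erase the junk with zeros
    return received

def rubber_decode(received, msg_len):
    # each received 0 rubs out itself and the previous surviving symbol
    stack = []
    for y in received:
        if y == 0:
            if stack:
                stack.pop()
        else:
            stack.append(y)
    return stack[:msg_len]

q, n, t = 3, 12, 3
skeleton = [1, 2, 2, 1, 2, 1]              # n - 2t = 6 symbols from {1, 2}
flips = {0: 2, 4: 0, 7: 2}                 # step -> received symbol; 3 errors
received = rubber_transmit(skeleton, n, lambda s, x: flips.get(s, x))
print(rubber_decode(received, n - 2 * t))  # [1, 2, 2, 1, 2, 1]
\end{verbatim}

Replacing flips by any other pattern with at most $t$ errors leaves the decoded output unchanged, which is exactly the point of the strategy.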

Theorem 1 For $\tau = \frac tn$ and $0 < \tau < \frac12$
\[
C_q^f(\tau) \ge (1 - 2\tau) \log_q (q-1);
\]
by Lemma 2, equality holds for $\frac1q \le \tau \le \frac12$.

[…] there exist $q-1$ words in the set with the first $j-1$ symbols pairwise equal to the first $j-1$ symbols of $x^n$, and each element of $Q$ is contained in one of the $j$th positions of these words or of $x^n$.

Remark 2 The case $n < q+1$ does, in contrast to the case $n \ge q+1$, not always reach the Hamming bound. The first example is $n = 3$ and $q = 6$. It holds that
\[
M_{fl}(3, 1, 6) = 12 < \Bigl\lfloor \frac{6^3}{3 \cdot 5 + 1} \Bigr\rfloor = 13.
\]
We hope that the exact result can be found in the near future.

Remark 3 Generalizing Pelc's result (28) to general $q$, Aigner [7] and Malinowski [17] proved
\[
M_f(n, 1, q) =
\begin{cases}
q^{n-2} & \text{if } n \le q+1\\
\Bigl\lfloor \frac{q^n - r(n-1)(q-1)}{(q-1)n+1} \Bigr\rfloor & \text{if } n \ge q+1,
\end{cases}
\]
where
\[
r = \Bigl\lfloor \frac{q^n}{(q-1)n+1} \Bigr\rfloor \bmod q.
\]

Thus it holds that $M_f(n, 1, q) = M_{fl}(n, 1, q)$ if $\bigl\lfloor \frac{q^n}{(q-1)n+1} \bigr\rfloor \bmod q = 0$.
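The formula of Remark 3 is easy to evaluate; a small sketch (the helper name is ours):

\begin{verbatim}
def M_f_one_error(n, q):
    # exact size for t = 1 with feedback, per Remark 3
    if n <= q + 1:
        return q ** (n - 2)
    denom = (q - 1) * n + 1
    r = (q ** n // denom) % q
    return (q ** n - r * (n - 1) * (q - 1)) // denom

print(M_f_one_error(3, 6))    # 6, the n <= q+1 branch
print(M_f_one_error(10, 2))   # 92
\end{verbatim}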

4 The $l$-model with feedback

4.1 List codes for the standard model of error-correcting codes

It was demonstrated in [2] that in probabilistic channel coding theory list codes are much more adequate than ordinary codes, insofar as they make it possible to determine capacities for a large class of channels where they are unknown for ordinary codes. We show that this is also the case for combinatorial channel coding theory. In fact, this is readily verified already for the standard model of $t$-error correcting codes. For a constant $L$ define $C_q(\tau, L)$ as the supremal rate achievable for all large $n$ with list codes of list size $L$ and block length $n$ correcting $t = \tau n$ errors.

Theorem 6 For $0 \le \tau < \frac{q-1}{q}$ every rate $R < H_q(\tau)$ is achievable with list codes of constant list size.

[…]
Balanced Covering Lemma A hypergraph $H = (V, \mathcal{E})$ with minimal degree $d_{\min} = \min_{v \in V} \deg(v) > 0$ and maximal degree $d_{\max} = \max_{v \in V} \deg(v)$ has a $c$-balanced covering $\mathcal{C} = \{E_1, \dots, E_k\}$ if

(a) $k \ge |\mathcal{E}| \cdot d_{\min}^{-1} \cdot (\log_2 |V| + 1) + 1$,
(b) $c \le k \le c \cdot |\mathcal{E}| \cdot d_{\max}^{-1}$,
(c) $2^{-D(\lambda \,\|\, d_{\max}/|\mathcal{E}|)\cdot k + \log_2 |V|} < \frac12$ for $\lambda = \frac ck$,

where $D(P\|Q)$ denotes the Kullback/Leibler distance or relative entropy.

We now focus our interest on balanced packings. Recall that a packing of a hypergraph $H = (V, \mathcal{E})$ is a subset of edges such that every vertex is contained in at most one edge. Accordingly, a $c$-balanced packing is a subset of edges of $H$ such that every vertex is contained in at most $c$ edges. As every code $\{(u_i, D_i) : i = 1, \dots, N,\ u_i \in Q^n,\ D_i \subset Y^n,\ D_i \cap D_j = \emptyset \text{ for } i \ne j\}$ yields the packing $\{D_i : i = 1, \dots, N\}$ of $Y^n$, every list code with $\sum_{i=1}^{N} \mathbf{1}_{D_i}(y^n) \le c$ for all $y^n \in Y^n$ corresponds to a $c$-balanced packing. We make use of the following result.

Packing Lemma [3] A hypergraph $H = (V, \mathcal{E})$ has a $c$-balanced packing with $k$ edges if (b) and (c) of the Balanced Covering Lemma hold.

Generally speaking, coverings are easier to handle than packings, because overlap is allowed. On the other hand, $c$-balanced packings are easier to handle than $c$-balanced coverings, since it is not required that $V$ is covered. This has the effect that condition (a) in the Balanced Covering Lemma can be dropped ((a) just guarantees the existence of a covering), while (b) and (c) are proven using the old arguments. Observe that in typical applications in Information Theory $|V|$ depends exponentially on the block length $n$, and thus $c$ has to grow with the block length in the Balanced Covering Lemma. Since for $c$-balanced packings condition (a) is no longer required, constant $c$'s are not automatically excluded. That is here the case. The theorem follows from

Proposition 1 For $0 \le \tau < \frac{q-1}{q}$ the rate $R < H_q(\tau)$ is achievable for list size $L = \Bigl\lceil \frac{\log_2 q}{H_q(\tau) - R} \Bigr\rceil + 1$.
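For concrete parameters the list size of Proposition 1 is easily evaluated (a sketch reusing $H_q$ from the helpers in Section 1):

\begin{verbatim}
from math import ceil, log2

def list_size(tau, q, R):
    # L from Proposition 1; requires R < H_q(tau)
    return ceil(log2(q) / (H_q(tau, q) - R)) + 1

print(list_size(0.2, 4, 0.4))   # 26
\end{verbatim}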

Proof: Consider the hypergraph $(V, \mathcal{E}) = (Q^n, (B_{x^n}(n, \tau n))_{x^n \in Q^n})$, where $B_{x^n}(n, \tau n)$ is the ball in $Q^n$ with radius $\tau n$ and center $x^n$. Write $B(n, \tau n)$ for the ball with center $0 = (0, 0, \dots, 0)$. Here $d_{\min} = d_{\max} = |B(n, \tau n)| = d$, $|V| = q^n$, $|\mathcal{E}| = q^n$, and
\[
\frac{\log_q |B(n, \tau n)|}{n} \to h_q(\tau) + \tau \log_q (q-1) \quad \text{as } n \to \infty.
\]
By the assumptions on $R$ and $L = c$, condition (b) obviously holds for $k = 2^{Rn}$ and $n$ large. To verify (c) we derive an upper bound on the exponent there: we have to show that $-D\bigl(\frac Lk \,\big\|\, \frac{d}{|\mathcal{E}|}\bigr) \cdot k + \log_2 |V| < -1$. Evaluation of the relative entropy yields


\[
-\Bigl[\frac Lk \log_2 \frac{L/k}{d/|\mathcal{E}|} + \Bigl(1 - \frac Lk\Bigr) \log_2 \frac{1 - L/k}{1 - d/|\mathcal{E}|}\Bigr] \cdot k + \log_2 |V|
\le -L \log_2 L + L \log_2 k - k\Bigl(1 - \frac Lk\Bigr) \log_2\Bigl(1 - \frac Lk\Bigr) + L \log_q \frac{d}{|\mathcal{E}|} + \log_2 |V|,
\]
because we have omitted the negative term $k\bigl(1 - \frac Lk\bigr) \log_2\bigl(1 - \frac{d}{|\mathcal{E}|}\bigr)$ and used that $\log_2 \frac{d}{|\mathcal{E}|} \le \log_q \frac{d}{|\mathcal{E}|}$, as $\frac{d}{|\mathcal{E}|} \le 1$. Using now $k = 2^{Rn}$ and that
\[
\log_q \frac{d}{|\mathcal{E}|} = \log_q d - n \to n\bigl(h_q(\tau) + \tau \log_q (q-1) - 1\bigr) = -n H_q(\tau),
\]
we choose $\delta > 0$ so small that $L \ge \frac{\log_2 q}{H_q(\tau) - R - \delta}$ and $n > n_0(\delta)$ so that $\log_q \frac{d}{|\mathcal{E}|} \le -n(H_q(\tau) - \delta)$, and continue with

\[
\le LnR - 2^{nR}\bigl(1 - L 2^{-nR}\bigr) \log_2\bigl(1 - L 2^{-nR}\bigr) - nL(H_q(\tau) - \delta) + n \log_2 q - L \log_2 L. \tag{33}
\]

Since $-(1-z)\log_2(1-z) \le 2z$ for small $z$ (because $\frac{d}{dz}\bigl[-(1-z)\log_2(1-z)\bigr] = \frac{1 + \ln(1-z)}{\ln 2}$, so this function has gradient $\frac{1}{\ln 2} < 2$ at $z = 0$), we upper bound the expression in (33), using $z = L \cdot 2^{-nR}$, by
\[
LnR + 2L - nL(H_q(\tau) - \delta) + n \log_2 q - L \log_2 L
= 2L - L \log_2 L - nL\left[H_q(\tau) - \delta - R - \frac{\log_2 q}{L}\right].
\]
It suffices now to guarantee that the term in square brackets is positive or, equivalently, that
\[
L \ge \frac{\log_2 q}{H_q(\tau) - R - \delta},
\]
which is the case by the choice of $\delta$. $\Box$
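The elementary inequality $-(1-z)\log_2(1-z) \le 2z$ used above can be spot-checked numerically (a tiny sketch):

\begin{verbatim}
from math import log2

# spot-check -(1 - z) log2(1 - z) <= 2z on a few small z
for z in [1e-6, 1e-3, 0.01, 0.1, 0.3]:
    assert -(1 - z) * log2(1 - z) <= 2 * z
\end{verbatim}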


Now with $m_0 \ge L$ we can always encode the true message on the list, which is known to the decoder and, because of the feedback, also to the encoder, and send it to the decoder. Thus we have

Corollary 1 $C_q^{f,l}(\tau) \ge H_q(\tau)$ for all $0 \le \tau < \frac{q-1}{q}$.