Week 5 The dual code. Syndrome decoding

Version 2023-11-01. To PDF of this week only Open the whole set of notes in PDF

Synopsis. Every linear code C has a dual code, C, and check matrices. While a generator matrix G is used to encode messages into codevectors, a check matrix H serves to detect errors — and to correct them using syndrome decoding.

Motivation. The inner product of vectors

Let C be a linear code. Given a received vector y, how to test whether yC? Storing all codevectors of C is not an option for codes of large length and dimension, whose use is dictated by modern applications to low-noise channels. Storing just a generator matrix G of C is better in terms of storage space, but testing whether y is in the row space of G can be computationally demanding.

Some codes, however, are defined by a single checksum — recall the even weight code and the ISBN-10 code. A checksum of a given vector is easy to compute.

Extending the checksum approach, we introduce a check matrix which generates the dual code. It turns out that this construction helps to correct errors as well (not just detect). The first notion we need is:

Definition: inner product

For u,vFqn, the scalar (element of Fq) defined as uv=i=1nuivi is called the inner product of the vectors u and v.

Example: some inner products in F23

For 111,101F23, one has

111111=12+12+12=1, 111101=11+10+11=0, 101101=12+02+12=0.

If CFqn is a set, vFqn, we may write vC to denote the set {vccC}.

Properties of the inner product

(1) Expression as a matrix product: uv=uvT.

Explanation: we write elements of Fqn as row vectors. Thus, u is a row vector (u1,,un), and vT is the transpose of v, so a column vector (v1v2vn). Multiplying u, an 1×n matrix, and vT, an n×1 matrix, we obtain a 1×1 matrix, which we identify with a scalar in Fq.

(2) Symmetry: uv=vu. (Explanation: this is easily seen from the definition.)

(3) Bilinearity: for a scalar λFq we have (u+λw)v=uv+λ(wv) and u(v+λw)=uv+λ(uw). (Explanation: from linear algebra, the matrix product in uvT is bilinear.)

(4) Non-degeneracy: uFqn={0}, if and only if u=0.

Explanation: let ϵi=(0,,0,1,0,,0) be the vector with ith symbol 1 and all other symbols 0. Then uϵi=ui. So if uFqn={0}, then in particular uϵi=0 hence ui=0, for all i, meaning that u is the zero vector. And if u=0, then uc=0 for all cFqn.

The dual code

Definition: dual code

Given a code CFqn, we define the dual code C as

C={vFqnvC={0}}.

We can say that C consists of all vectors orthogonal to the code C (where v orthogonal to C means vC={0}).

  • Exercise. Using bilinearity of the inner product, show that C is a linear code.

Recall that Rep(n,F2)={000,111}F2n is the binary repetition code of length n. We now work out the code Rep(n,F2) using the definition.

By definition, Rep(n,F2)={vF2nv000=0, v111=0}. The first condition, v0=0 is vacuous (holds for all vectors vF2n). The second condition, v111, means v1+v2++vn=0 in F2, i.e., vEn, the binary even weight code of length n. Thus:

Example: the dual code of the binary repetition code

Rep(n,F2)=En.

Check matrices

Definition: check matrix

A check matrix for a linear code C means a generator matrix for C.

One sometimes says parity check matrix (the term arose from applications of binary codes).

Theorem 5.1: properties of the dual code and a check matrix

If CFqn is a linear code of dimension k, then:

  • i. dimC=nk;

  • ii. C={vFqn:vHT=0} for any check matrix H of C.

  • Proof. We recall the Rank-Nullity Theorem from Linear Algebra: if M is a matrix with n columns, then

    rank(M)+dimNullspace(M)=n,

    where rank(M) is the dimension of the span of the rows of M, and Nullspace(M) can be written as {vFqn:MvT=0}.

    i. Consider the matrix [C] made up of all codevectors of C used as rows. The Nullspace([C]) is the set {v:[C]vT=0}. Note that the column vector [C]vT is [c1vTc2vT]=[c1vc2v], which is zero if and only if the inner product cv is 0 for all rows c of [C], i.e., for all codevectors c of C. By definition of the dual code, this happens exactly when vC, so Nullspace([C])=C. By rank-nullity, dimC=nrank([C]). Since the rows of [C] span C, one has rank([C])=dimC=k and so dimC=nk.

    ii. By definition H generates the code C; so by i., H has nk rows, H=[r1rnk]. Thus, rank(H)=dimC=nk, and so by rank-nullity, dimNullspace(H)=n(nk)=k. Note that CNullspace(H): indeed, if cC, then ricT=ric=0 for all i because riC, which means that HcT=0. Since dimC=dimNullspace(H), it follows that C=Nullspace(H), which is {v:HvT=0}.

    The law (AB)T=BTAT for the product of matrices implies that (vHT)T=HvT, and so HvT is zero iff vHT is. Thus, C={vFqn:vHT=0} as claimed.

The syndrome of a vector

Definition: syndrome

Let H be a check matrix for a linear code CFqn. Let yFqn. The vector

S(y)=yHT

is called the syndrome of y. The linear map S:FqnFqnk is the syndrome map.

Proposition 5.2: syndromes of vectors in the same coset

Let S be a syndrome map for a linear code CFqn. If v,yFqn,

  • S(v)=S(y) v,y are in the same coset of C;

  • S(v)=0 vC.

  • Proof. yHT=vHT (yv)HT=0 yvC. By definition of cosets, this means that v is in the coset of y. In particular, S(v)=0 means v0+C=C.

The use of syndromes in error detection and correction

If S(y)0, y is not a codevector, so the syndrome map detects errors in a received vector.

To correct errors, we need to construct a decoder for the linear code C. If we know a check matrix H for C, we can improve the standard array decoder for C. We will write the same decoder differently; it will require much less memory but more calculations.

Algorithm 5.3: the syndrome decoder

Preparation. Construct a table of syndromes, with qnk rows, of the form

.
Coset leader ai S(ai)

Start with the top row: the codeword 0 and its syndrome S(0)=0.

At each step, choose a vector aiFqn of smallest weight such that S(ai) does not appear in the table; then ai is a coset leader of a new coset.

Decoding.

  • • Receive a vector yFqn.

  • • Calculate S(y)=yHT.

  • • In the table, find ai with S(ai)=S(y). Then ai is the coset leader of y+C.

  • • Return DECODE(y)=yai.

Remark. The syndrome decoder is based on a choice of one coset leader in every coset. This is the same as for the standard array decoder.

In fact, if the same coset leaders are chosen in both decoders, both decoders with yield the same function DECODE:FqnC. They differ only in the way this function is computed.

The number of arithmetic operations required to calculate the syndrome S(y)=yHT can be of order n2, whereas the standard array decoder requires n operations to look up a vector. On the other hand, the amount of memory required by the syndrome decoder is proportional to qnk which is better than qn for the standard array. The advantage is especially significant for codes with high code rate kn.

Nevertheless, for codes which have more algebraic structure (than just linear codes), decoding algorithms exist which require even less storage, but the computation complexity is higher compared to syndrome decoding. Some examples will appear from the next chapter onwards.

Example: example of syndrome decoding

Let C be the binary linear code with check matrix H=[001000100100110010010001].

  • (a) Construct the table of syndromes for C using the matrix H.

  • (b) Using the table of syndromes, decode the received vector y=111111.

Solution.

(a) When calculating syndromes, it is useful to observe that the syndrome of a vector 00100 (with 1 in position i and 0s elsewhere) is equal to the ith column of H, transposed.

The syndrome map is linear, so the syndrome of a sum of two vectors is the sum of their syndromes, etc.

For example, S(011000)=0011+1000=1011 (the sum of the second and the third columns of H, transposed).

.
vector syndrome leader?
000000 0000 yes
000001 0001 yes
000010 0010 yes
000100 0100 yes
001000 1000 yes
010000 0011 yes
100000 0110 yes

All vectors of weight 1 have different syndromes, so they all are coset leaders. We need more coset leaders, hence we start looking at vectors of weight 2, then weight 3:

.
000011 0011 no, syndrome already in the table
000101 0101 yes
001001 1001 yes
001010 1010 yes
001100 1100 yes
010100 0111 yes
011000 1011 yes
101000 1110 yes
001101 1101 yes
011100 1111 yes

When we try a vector, say of weight 2, and find that is syndrome is already in the table, we ignore that vector and try another one.

We found 16=262 coset leaders so we stop.

(b) S(111111)=1010 which is the syndrome of the coset leader 001010 in the table. Therefore, DECODE(111111)=111111001010=110101.