Week 1 Introduction. How to detect and correct errors?
Version 2023-09-30.
These notes are being revised to reflect the content of the course as taught in the 2023/24 academic year. Questions and comments on these lecture notes should be directed to Yuri.Bazlov@manchester.ac.uk.
Synopsis
We discuss information transmission and introduce the most basic notions of Coding Theory: channel, alphabet, symbol, word, Hamming distance and of course code. We show how a code can detect and correct some errors that occur during transmission. We illustrate the process using two simple examples of codes.
What is information?
It is fair to say that our age is the age of information. Huge quantities of information and data literally flow around us and are stored in various forms.
Information processing gives rise to many mathematical questions. Information needs to be processed because we may need, for example, to:
• store the information;
• encrypt the information;
• transmit the information.
For practical purposes, information needs to be stored efficiently, which leads to problems such as compacting or compressing the information. For the purposes of data protection and security, information may need to be encrypted. We will NOT consider these problems here. The course basically addresses one (extremely important) problem that arises in connection with information transmission.
We do not attempt to give an exhaustive definition of information. Whereas some mathematical models for space, time, motion were developed hundreds of years ago, the modern mathematical theory of information was only born in 1948 in the paper A Mathematical Theory of Communication by Claude Shannon (1916–2001). The following will be enough for our purposes:
Definition: information, alphabet, symbol
Fix a finite set $F$, which we call the alphabet.
Elements of $F$ are called symbols.
By information we mean a stream (a sequence) of symbols.
What does it mean to transmit information? What is a channel?
Informally, it means that symbols are sent by one party (the sender) and are received by another party (the receiver). The symbols are transmitted via some medium, which we will in general refer to as the channel. More precisely, the channel is a mathematical abstraction of various real-life media such as a telephone line, a satellite communication link, a voice (in a face-to-face conversation between individuals), a CD (the sender writes information onto it; the user reads the information from it), etc.
In this course we will assume that when a symbol is fed into the channel (the input symbol), the same or another symbol is read from the other end of the channel (the output symbol). Thus, we will only consider channels where neither erasures (when the output symbol is unreadable) nor deletions (when some symbols fed into the channel simply disappear) occur. Working with those more general channels requires more advanced mathematical apparatus which is beyond this course.
Importantly, we assume that there is noise in the channel, which means that symbols may be randomly changed by the channel. Our model of information transmission is thus: sender → channel (noise) → receiver.
The above discussion leads us to the following simplified definition of a channel, which will be sufficient for this course.
Definition: memoryless channel
Fix the alphabet $F$. A memoryless channel with alphabet $F$ accepts symbols $x \in F$ as input and produces symbols from $F$ as output, so that for each pair $(x, y)$ of symbols there is a fixed probability that $y$ is received when $x$ is sent.
The definition says that for each input symbol $x$, the output symbol is random, with a probability distribution that depends only on $x$.
A noiseless channel is the identity function which reads the input symbol $x$ and outputs the same symbol $x$.
Memoryless means that the received symbol depends only on $x$, the current input symbol, and not on any symbols transmitted before it.
When a symbol $x \in F$ is sent into the channel, there are two possibilities:
• The received symbol is $x$. We say that no error occurred in this symbol.
• The received symbol is $y \ne x$. An error occurred in this symbol.
We formalise this in the following definition.
Definition: symbol error
A symbol error is an event where a symbol $x$ is sent into the channel, but the received symbol is $y \ne x$.
The binary symmetric channel with bit error rate $p$
Our most basic example of a channel “speaks” the binary alphabet, which we will now define.
Definition: binary alphabet, bit
The binary alphabet is the set $\mathbb{F}_2 = \{0, 1\}$.
A bit (the same as binary symbol) is an element of $\mathbb{F}_2$, i.e., $0$ or $1$.
Definition: the binary symmetric channel $BSC(p)$
The binary symmetric channel with bit error rate $p$, written $BSC(p)$, is the memoryless channel with alphabet $\mathbb{F}_2$ in which each transmitted bit arrives unchanged with probability $1 - p$, and arrives flipped (a $0$ received as $1$, or a $1$ received as $0$) with probability $p$.
Theoretically, we can consider $BSC(p)$ for any $p$ with $0 \le p \le 1$, but in practice it is the channels with small $p$ that are useful for transmitting information.
There are other channels which are mathematical models of media not well approximated by $BSC(p)$. We will not study them in this course.
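The behaviour of $BSC(p)$ is easy to simulate. The following Python sketch (an illustration, not part of the course material) flips each bit independently with probability $p$ and then estimates the bit error rate empirically:

```python
import random

def bsc(bits, p, rng):
    """Pass a sequence of bits through a simulated binary symmetric
    channel: each bit is flipped independently with probability p."""
    return [b ^ (rng.random() < p) for b in bits]

rng = random.Random(2023)      # fixed seed so the run is reproducible
sent = [0, 1] * 5000           # 10000 bits
received = bsc(sent, 0.1, rng)
rate = sum(s != r for s, r in zip(sent, received)) / len(sent)
print(round(rate, 3))          # empirical bit error rate, close to 0.1
```

With 10000 transmitted bits the observed error rate concentrates near $p = 0.1$, as the law of large numbers suggests.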
What is a code and what is it used for?
A word is a finite sequence of symbols, and a code is a set of words. However, in this course we only consider block codes: this means that all the words in the code are of the same length $n$.
The main application of codes is channel coding. This means that the sender uses the chosen code to encode information before sending it via the channel. This allows the receiver to achieve one of the following goals:
• detect most errors that occur in the channel and ask the sender to retransmit the parts where errors are detected; or
• correct most errors that occur in the channel (nothing is retransmitted).
We will now formalise the definition of a code, explain encoding, and then show how error detection works and how error correction works.
Definition: word
A word of length $n$ is a sequence of $n$ symbols from the alphabet $F$; in other words, an element of the set $F^n$.
Notation: words
We may write a word $\underline{w} \in F^n$ as $(w_1, w_2, \dots, w_n)$ or simply as $w_1 w_2 \dots w_n$, where each $w_i \in F$.
We will denote words by underlined letters: $\underline{v}$, $\underline{w}$ and so on.
Definition: code, codeword
A code of length $n$ is a non-empty set of words of length $n$; that is, a non-empty subset of $F^n$.
A codeword is an element of the code.
The sender and the receiver choose a code $C$ in advance; the sender will transmit only codewords of $C$.
The encoding procedure that we consider requires the code $C$ to consist of exactly $q^k$ codewords, where $q = \#F$ is the number of symbols in the alphabet and $k$ is a positive integer.
Definition: encoder
An encoder for a code $C$ with $\#C = q^k$ is a bijective function from the set $F^k$ of messages onto $C$.
Here is what the sender must do.
Procedure: encoding
Before transmission, a code $C$ with $\#C = q^k$ and an encoder $F^k \to C$ are agreed between the sender and the receiver. Then:
• The information stream is split up into chunks of length $k$, called messages.
• The sender takes each message $\underline{u} \in F^k$ and replaces it with the codeword that the encoder assigns to $\underline{u}$.
• Each codeword is then sent into the channel.
Note: “to encode” means “to replace a message with its codeword”.
How to use a code to detect errors?
Recall that the sender transmits only codewords: every word sent into the channel belongs to the code $C$.
The sender transmits a codeword $\underline{c}$, and the receiver receives a word $\underline{y}$, which may or may not coincide with $\underline{c}$. If $\underline{y} = \underline{c}$, the codeword was received without errors.
If, however, $\underline{y} \ne \underline{c}$, symbol errors occurred in the channel. The receiver cannot compare $\underline{y}$ with the transmitted codeword directly, but can check whether $\underline{y}$ belongs to the code $C$.
Procedure: error detection
1. The sender sends a codeword $\underline{c} \in C$ via the channel.
2. The receiver receives a word $\underline{y} \in F^n$ and checks whether $\underline{y} \in C$:
• if $\underline{y} \notin C$, this is a detected error, and the receiver asks the sender to retransmit the current codeword;
• if $\underline{y} \in C$, the receiver accepts $\underline{y}$. If $\underline{y} \ne \underline{c}$, the accepted word differs from the transmitted codeword, and this is an undetected error.
3. If there are any more codewords to transmit, return to step 1.
We assume that every detected error is eliminated by retransmitting the codeword one or more times. These retransmissions slow down the communication, but eventually the information accepted by the receiver will consist of correct codewords and codewords containing an undetected error.
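The detect-and-retransmit loop can be sketched in Python. The three-bit code below is just a small example code (the even-weight words of length $3$), and the channel is a simulated $BSC(p)$; both are assumptions for illustration:

```python
import random

CODE = {"000", "011", "101", "110"}   # an example 3-bit block code

def transmit(codeword, p, rng):
    """Send one codeword through a simulated BSC(p)."""
    return "".join(str(int(b) ^ (rng.random() < p)) for b in codeword)

def send_with_detection(codeword, p, rng):
    """Steps 1-3 of the procedure: retransmit until the received
    word lies in the code, then accept it."""
    while True:
        received = transmit(codeword, p, rng)
        if received in CODE:
            return received   # accepted: either correct, or an undetected error
        # received word not in CODE: detected error, so retransmit

rng = random.Random(0)
accepted = [send_with_detection("011", 0.05, rng) for _ in range(2000)]
undetected = sum(w != "011" for w in accepted)
print(undetected, "undetected errors out of", len(accepted), "accepted words")
```

Every accepted word is a codeword, but a small fraction of them differ from the transmitted codeword: those are exactly the undetected errors.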
How good is a code at detecting errors?
To select the most suitable error-detecting code for a particular application, or to decide whether to use a code at all, we need to measure how good a code is at detecting errors.
One way to quantify this is to ask how many symbol errors must occur in a codeword to result in an undetected error. We will investigate this next week. This approach does not explicitly take the parameters of the channel into account.
Another approach is to calculate $P_{\mathrm{undetect}}(C)$, the probability of an undetected error, which depends on the parameters of the channel.
Definition: $P_{\mathrm{undetect}}$
Suppose that an alphabet $F$ and a channel are fixed. For a code $C \subseteq F^n$, the probability of an undetected error, $P_{\mathrm{undetect}}(C)$, is the probability that, when a codeword of $C$ is transmitted via the channel, the received word is a codeword different from the transmitted one.
Note that $P_{\mathrm{undetect}}(C)$ is well defined only if this probability is the same for every transmitted codeword; this will be the case in all our examples.
$E_3$, an example of a code used for error detection
We now introduce the code $E_3$, the even weight code of length $3$.
Definition: the code $E_3$
$E_3 = \{000, 011, 101, 110\}$, the set of all binary words of length $3$ which contain an even number of $1$s.
To set up error detection based on $E_3$, we also need to fix an encoder.
Example: encoder for $E_3$
Define the encoder $\mathbb{F}_2^2 \to E_3$ by $u_1 u_2 \mapsto u_1 u_2 (u_1 + u_2)$, where the sum $u_1 + u_2$ is computed modulo $2$: the third bit is a parity check bit, chosen so that each codeword contains an even number of $1$s.
Thus, the information stream is split into messages of length $2$, and each message is replaced by its codeword; for example, the message $01$ is encoded as the codeword $011$, and $11$ is encoded as $110$.
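This parity-bit encoding can be illustrated with a short Python sketch (bits are represented as the integers $0$ and $1$; the function names are ours, chosen for the example):

```python
def encode(message):
    """Encode a 2-bit message by appending a parity bit, so that the
    resulting 3-bit codeword contains an even number of 1s."""
    u1, u2 = message
    return [u1, u2, (u1 + u2) % 2]

def encode_stream(bits):
    """Split an information stream into messages of length 2
    and replace each message with its codeword."""
    assert len(bits) % 2 == 0, "stream length must be a multiple of 2"
    out = []
    for i in range(0, len(bits), 2):
        out.extend(encode(bits[i:i + 2]))
    return out

print(encode([0, 1]))               # [0, 1, 1]
print(encode_stream([0, 1, 1, 1]))  # [0, 1, 1, 1, 1, 0]
```

Note that every codeword produced has an even number of $1$s, which is exactly what the receiver will check for.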
How good is $E_3$ at detecting errors?
Suppose that the codeword $000$ is transmitted via $BSC(p)$. Then:
• $000$ is received with probability $(1-p)^3$: for each of the three bits, the probability of arriving unchanged is $1-p$;
• $100$, $010$ and $001$ have probability $p(1-p)^2$ each, and are detected errors;
• $110$, $101$ and $011$, with probability $p^2(1-p)$ each, are undetected errors;
• $111$ has probability $p^3$ to be received, and is a detected error.
The total probability of an undetected error is therefore $3p^2(1-p)$.
If any codeword other than $000$ is transmitted, the probability of an undetected error is the same, by the symmetry of the code and of the channel.
Hence $P_{\mathrm{undetect}}(E_3) = 3p^2(1-p)$.
Thus, if information is sent unencoded (no code is used), every symbol error is an undetected error, so an error in a bit goes undetected with probability $p$. Encoding with $E_3$ reduces the probability of an undetected error to $3p^2(1-p)$, which is much smaller than $p$ when $p$ is small.
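The value $P_{\mathrm{undetect}}(E_3) = 3p^2(1-p)$ can be checked by brute force: enumerate all possible received words and add up the probabilities of those which are codewords different from the one sent. A Python sketch:

```python
from itertools import product

CODE = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)}   # the even weight code E_3

def p_undetect(code, c, p):
    """Probability that sending codeword c over BSC(p) yields a
    *different* codeword, i.e. an undetected error."""
    n = len(c)
    total = 0.0
    for y in product((0, 1), repeat=n):
        if y in code and y != c:
            flips = sum(a != b for a, b in zip(c, y))
            total += p ** flips * (1 - p) ** (n - flips)
    return total

p = 0.01
# The answer is the same for every codeword, and matches 3p^2(1-p):
for c in CODE:
    assert abs(p_undetect(CODE, c, p) - 3 * p**2 * (1 - p)) < 1e-12
print(p_undetect(CODE, (0, 0, 0), p))   # approximately 0.000297
```

The loop over all codewords confirms the symmetry argument above: the undetected error probability does not depend on which codeword is transmitted.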
How to use a code for error correction?
In order to take full advantage of error detection, the receiver should be able to contact the sender to request retransmission. In some situations this is not possible. We will now see how to modify the above error detection set-up so that the receiver can recover from errors without contacting the sender.
The mathematics behind error correction has come to be associated with the name of Richard Hamming (1915–1998), who came up with an idea for setting up an efficient error-correcting code. In engineering literature, the set-up we are going to describe is referred to as forward error correction (FEC).
The first basic concept we need for error correction is distance between words.
Definition: Hamming distance
The Hamming distance between two words $\underline{v}, \underline{w} \in F^n$ is the number of positions in which $\underline{v}$ and $\underline{w}$ differ: $d(\underline{v}, \underline{w}) = \#\{\, i : 1 \le i \le n,\ v_i \ne w_i \,\}$.
Example: Hamming distance between some pairs of binary words
For example, in the set $\mathbb{F}_2^3$ of binary words of length $3$, one has $d(000, 011) = 2$ and $d(101, 010) = 3$.
Of course, $d(\underline{v}, \underline{w})$ is always an integer between $0$ and $n$.
Lemma 1.1: properties of the Hamming distance
For any words $\underline{u}, \underline{v}, \underline{w} \in F^n$:
1. $d(\underline{v}, \underline{w}) \ge 0$, and $d(\underline{v}, \underline{w}) = 0$ iff $\underline{v} = \underline{w}$;
2. $d(\underline{v}, \underline{w}) = d(\underline{w}, \underline{v})$;
3. $d(\underline{u}, \underline{w}) \le d(\underline{u}, \underline{v}) + d(\underline{v}, \underline{w})$ (the triangle inequality).
Remark: recall that a function $d$ which satisfies properties 1.–3. is called a metric. Thus, the Hamming distance is a metric on the set $F^n$ of words of length $n$.
Proof. 1. Since $d(\underline{v}, \underline{w})$ is the cardinality of a subset of $\{1, \dots, n\}$, it is an integer between $0$ and $n$; in particular, it is non-negative. Moreover, $d(\underline{v}, \underline{w})$ is $0$ iff $v_i = w_i$ for all $i$, meaning that $\underline{v} = \underline{w}$.
2. Symmetry is clear, as $v_i \ne w_i$ is equivalent to $w_i \ne v_i$.
3. An index $i$ such that $u_i = v_i$ and $v_i = w_i$ contributes nothing to $d(\underline{u}, \underline{v})$, nor to $d(\underline{v}, \underline{w})$, nor to $d(\underline{u}, \underline{w})$ (because $u_i = w_i$).
An index $i$ such that $u_i \ne v_i$ or $v_i \ne w_i$ contributes at least $1$ to $d(\underline{u}, \underline{v}) + d(\underline{v}, \underline{w})$, and can contribute at most $1$ to $d(\underline{u}, \underline{w})$.
Thus, every index contributes to the left-hand side of the triangle inequality at most as much as to the right-hand side. Summing over all indices $i \in \{1, \dots, n\}$ proves the triangle inequality. □
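The Hamming distance is straightforward to compute. The sketch below also verifies the three properties of Lemma 1.1 exhaustively over all binary words of length $3$ (an illustrative check for this small case, not a proof):

```python
from itertools import product

def hamming_distance(v, w):
    """Number of positions in which the words v and w differ."""
    assert len(v) == len(w), "Hamming distance needs words of equal length"
    return sum(a != b for a, b in zip(v, w))

print(hamming_distance("000", "011"))   # 2

# Exhaustively verify the three metric properties on F_2^3.
words = ["".join(t) for t in product("01", repeat=3)]
for u in words:
    for v in words:
        assert (hamming_distance(u, v) == 0) == (u == v)         # property 1
        assert hamming_distance(u, v) == hamming_distance(v, u)  # property 2
        for w in words:
            assert hamming_distance(u, w) <= \
                hamming_distance(u, v) + hamming_distance(v, w)  # property 3
print("all metric properties hold on the 8 binary words of length 3")
```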
We will now use the Hamming distance to set up error correction. Let $C \subseteq F^n$ be a code, fixed in advance and known to both the sender and the receiver.
Definition: decoder, nearest neighbour
A decoder for $C$ is a function $\mathrm{DECODE}\colon F^n \to C$ which maps every word $\underline{y} \in F^n$ to a nearest neighbour of $\underline{y}$ in $C$.
A nearest neighbour of $\underline{y} \in F^n$ in $C$ is a codeword $\underline{c} \in C$ such that $d(\underline{y}, \underline{c}) = \min \{ d(\underline{y}, \underline{x}) : \underline{x} \in C \}$.
In order to use error correction, the sender and the receiver agree on a decoder $\mathrm{DECODE}\colon F^n \to C$ in advance. Whenever the receiver receives a word $\underline{y}$, they decode it, that is, replace it with the codeword $\mathrm{DECODE}(\underline{y})$.
Remark: what the decoder does; unencode
Thus, if the received word $\underline{y}$ is not a codeword, the decoder replaces it with a nearest codeword; the hope is that this is the codeword which was actually transmitted.
To restore the original message, the receiver then unencodes the codeword $\mathrm{DECODE}(\underline{y})$; to unencode a codeword means to apply the inverse of the encoder to it, recovering a message in $F^k$.
It may happen that a word $\underline{y}$ has more than one nearest neighbour in $C$. In that case, the decoder outputs one of the nearest neighbours of $\underline{y}$, chosen according to some rule fixed in advance.
Definition: decoded correctly
In the above setup, suppose that a codeword $\underline{c}$ is transmitted and a word $\underline{y}$ is received. We say that $\underline{y}$ is decoded correctly if $\mathrm{DECODE}(\underline{y}) = \underline{c}$.
The following Claim shows that the error-correcting setup makes sense — at least, if no symbol errors occurred in a codeword, the decoder will not introduce errors! Strictly speaking, the Claim is unnecessary, because it will follow from part 2 of Theorem 2.1 given later.
Claim: a codeword is always decoded to itself
A codeword is its own unique nearest neighbour: indeed, $d(\underline{c}, \underline{c}) = 0$, whereas $d(\underline{x}, \underline{c}) > 0$ for every word $\underline{x} \ne \underline{c}$, by part 1 of Lemma 1.1.
Therefore, a codeword is always decoded to itself: $\mathrm{DECODE}(\underline{c}) = \underline{c}$ for every $\underline{c} \in C$.
How good is a code at correcting errors?
This can be measured in two ways. First, one can determine the maximum number of symbol errors that can occur in a codeword which the decoder is guaranteed to correct. Second, one can calculate the probability that a transmitted codeword is decoded incorrectly; this depends on the parameters of the channel.
The binary repetition code, an example of a code used for error correction
It is easy to see that the code $E_3$ cannot be used to correct errors: for example, the word $100$ has three nearest neighbours in $E_3$, namely $000$, $101$ and $110$, all at distance $1$, so even a single bit error cannot be reliably corrected.
We define a new code, the binary repetition code of length $3$.
Definition: the code $\mathrm{Rep}(3, \mathbb{F}_2)$
Define $\mathrm{Rep}(3, \mathbb{F}_2) = \{000, 111\}$, the code consisting of the two binary words of length $3$ in which all symbols are equal.
One can observe that every word in $\mathbb{F}_2^3$ has a unique nearest neighbour in $\mathrm{Rep}(3, \mathbb{F}_2)$, so the decoder is uniquely determined.
Example: the decoder for $\mathrm{Rep}(3, \mathbb{F}_2)$
The decoder for $\mathrm{Rep}(3, \mathbb{F}_2)$ works by majority vote: a received word is decoded as $000$ if it contains more $0$s than $1$s, and as $111$ otherwise.
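A short Python sketch confirms that the majority vote coincides with nearest-neighbour decoding on every binary word of length $3$ (there are no ties, since $3$ is odd):

```python
from itertools import product

def majority_decode(y):
    """Decode a binary word of length 3 by majority vote."""
    return "111" if y.count("1") > y.count("0") else "000"

def nearest_neighbour(y):
    """Nearest-neighbour decoding for the code {000, 111}."""
    dist = lambda a, b: sum(s != t for s, t in zip(a, b))
    return min(("000", "111"), key=lambda c: dist(y, c))

# The two decoders agree on all 8 words of length 3.
for t in product("01", repeat=3):
    y = "".join(t)
    assert majority_decode(y) == nearest_neighbour(y)
print("majority vote agrees with nearest-neighbour decoding on all 8 words")
```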
Why develop any more theory and not just use $E_3$ and $\mathrm{Rep}(3, \mathbb{F}_2)$?
The problem with these codes is their cost: for every bit of information, the sender needs to transmit one and a half bits when using $E_3$ (three channel bits per two-bit message), or three bits when using $\mathrm{Rep}(3, \mathbb{F}_2)$. Later in the course we will meet codes which use the channel more efficiently.
Concluding remarks for Chapter 1
Codes have been used for error correction for thousands of years: a natural language is essentially a code! If we “receive” a corrupted English word such as PHEOEEM, we will assume that it has most likely been THEOREM, because this would involve the fewest mistakes.
The following examples are part of historical background to Coding Theory and are not covered in lectures.
Example 1 (a real-world use of Coding Theory in scientific research)
Figure 1.1: The Voyager spacecraft. Image taken from https://voyager.jpl.nasa.gov/mission/spacecraft/instruments/
Voyager 1 is a spacecraft launched by NASA in 1977. Its primary mission was to explore Jupiter, Saturn, Uranus and Neptune. Lots of precious photographs and data were sent back to Earth. Later, NASA scientists claimed that Voyager 1 had reached interstellar space.
Messages from Voyager 1 travel through the vast expanses of interplanetary space. Given that the spacecraft is equipped with a mere 23-watt radio transmitter (powered by a plutonium-238 nuclear battery), it is inevitable that noise, such as cosmic rays, interferes with its transmissions. In order to protect the data from distortion, it is encoded with the error-correcting code known as the extended binary Golay code. We will look at this code later in the course. Newer space missions employ more efficient and more sophisticated codes.
Example 2 (CD, a compact disc)
The CD is a more down-to-earth example of the use of error-correcting codes. A CD can hold up to 80 minutes of music, represented by an array of zeros and ones. The data on the CD is encoded using a Reed–Solomon code. This way, even if a small scratch, a particle of dust or a fingerprint happens to be on the surface of the CD, it will still play perfectly well, all due to error correction.
However, every method has its limits, and larger scratches or stains may lead to something like a thunderclap during playback!
Example 3 (one of the first uses of a code for error correction)
In 1948, Richard Hamming was working at the famous Bell Laboratories. Back then, the data for “computers” was stored on punch cards:
Figure 1.2: A punch card. Image from http://www.columbia.edu/cu/computinghistory
pieces of thick paper where holes represented ones and absences of holes represented zeros. Punchers who had to perforate punch cards sometimes made mistakes, which frustrated Hamming.
Hamming was able to come up with a code with the following properties: each codeword is 7 bits long, and if one error is made in a codeword (i.e., one bit is changed from 0 to 1 or vice versa), one can still recover the original codeword. This made the punch card technology more robust, as a punch card with a few mistakes would still be usable. The trade-off, however, was that the length of data was increased by 75%: there are only 16 different codewords, therefore, they can be used to convey messages which have the length of 4 bits.
The original Hamming code will be introduced in the course soon!