The theory of error-correcting codes, and more broadly information theory, originated in Claude Shannon's monumental work A mathematical theory of communication, published over 60 years ago in 1948. Shannon's work gave a precise measure of the information content in the output of a random source in terms of its entropy. The noiseless coding theorem or the source coding theorem informally states that $n$ i.i.d. random variables each with entropy $H(X)$ can be compressed into roughly $n H(X)$ bits with negligible probability of information loss, and conversely compression into fewer than $n H(X)$ bits would entail almost certain information loss.
More directly relevant to this course is Shannon's noisy coding theorem which considered communication of a message (say consisting of $k$ bits that are output by a source coder) on a noisy communication channel whose behavior is given by a stochastic channel law. The noisy coding theorem states that every such channel has a precisely defined real number called capacity that quantifies the maximum rate at which reliable communication is possible on that channel. More precisely, given a noisy channel with capacity $C$, if information is transmitted at rate $R$ (which means $k = nR$ message bits are communicated in $n$ uses of the channel), then if $R < C$ there exist coding schemes (comprising an encoder/decoder pair) that guarantee negligible probability of miscommunication, whereas if $R > C$, then regardless of the coding scheme, the probability of error at the receiver is bounded below by some constant (which increases as $R$ increases). (Later, a strong converse to the Shannon coding theorem was proved, which shows that when $R > C$, the probability of miscommunication goes exponentially (in $n$) to $1$.) Shannon's theorem was one of the early uses of the probabilistic method; it asserted the existence of good coding schemes at all rates below capacity, but did not give any efficient method to construct a good code or for that matter to verify that a certain code was good.
We will return to Shannon’s probabilistic viewpoint and in particular his noisy coding theorem in a couple of lectures, but we will begin by introducing error-correcting codes from a more combinatorial/geometric viewpoint, focusing on aspects such as the minimum distance of the code. This viewpoint was pioneered by Richard Hamming in his celebrated 1950 paper Error detecting and error correcting codes. The Hamming approach is more suitable for tackling worst-case/adversarial errors, whereas the Shannon theory handles stochastic/probabilistic errors. This corresponds to a rough dichotomy in coding theory results — while the two approaches have somewhat different goals and face somewhat different limits and challenges, they share many common constructions, tools, and techniques. Further, by considering meaningful relaxations of the adversarial noise model or the requirement on the encoder/decoder, it is possible to bridge the gap between the Shannon and Hamming approaches. (We will see some results in this vein during the course.)
The course will be roughly divided into the following interrelated parts. We will begin with results on the existence and limitations of codes, both in the Hamming and Shannon approaches. This will highlight some criteria to judge when a code is good, and we will follow up with several explicit constructions of “good” codes (we will encounter basic finite field algebra during these constructions). While this part will mostly have a combinatorial flavor, we will keep track of important algorithmic challenges that arise. This will set the stage for the algorithmic component of the course, which will deal with efficient (polynomial time) algorithms to decode some important classes of codes. This in turn will enable us to approach the absolute limits of error-correction “constructively,” via explicit coding schemes and efficient algorithms (for both worst-case and probabilistic error models).
Codes, and ideas behind some of the good constructions, have also found many exciting “extraneous” applications such as in complexity theory, cryptography, pseudorandomness and explicit combinatorial constructions. (For example, in the spring 2009 offering of 15-855 (the graduate complexity theory course), we covered in detail the Sudan-Trevisan-Vadhan proof of the Impagliazzo-Wigderson theorem that $BPP = P$ under an exponential circuit lower bound for $E$, based on a highly efficient decoding algorithm for Reed-Muller codes.) Depending on time, we may mention/discuss some of these applications of coding theory towards the end of the course, though given that there is plenty to discuss even restricting ourselves to primarily coding-theoretic motivations, this is unlikely.
We now look at some simple codes and give the basic definitions concerning codes. But before that, we will digress with some recreational mathematics and pose a famous “Hat” puzzle, which happens to have close connections to the codes we will soon introduce (that’s your hint, if you haven’t seen the puzzle before!)
Guessing hat colors
The following puzzle made the New York Times in 2001.
$15$ players enter a room and a red or blue hat is placed on each person’s head. The color of each hat is determined by a coin toss, with the outcome of one coin toss having no effect on the others. Each person can see the other players’ hats but not his own.
No communication of any sort is allowed, except for an initial strategy session before the game begins. Once they have had a chance to look at the other hats, the players must simultaneously guess the color of their own hats or pass. The group wins the game if at least one player guesses correctly and no players guess incorrectly.
One obvious strategy for the players, for instance, would be for one player to always guess “red” while the other players pass. This would give the group a 50 percent chance of winning the prize. Can the group achieve a higher probability of winning (probability being taken over the initial random assignment of hat colors)? If so, how high a probability can they achieve?
(The same game can be played with any number of players. The general problem is to find a strategy for the group that maximizes its chances of winning the prize.)
1. A simple code
Suppose we need to store 64-bit words in such a way that they can be correctly recovered even if a single bit per word gets flipped. One way is to store each information bit by duplicating it three times. We can thus store 21 bits of information in the word. This would permit only about a fraction $1/3$ of information to be stored per bit of the word. However, it would allow us to correct any single bit flip since the majority value of the three copies of the bit gives the correct value of the bit, even if one of the copies is flipped.
Hamming in his 1950 paper introduced a code, now called the “Hamming code,” which could also correct $1$-bit errors using fewer redundant (or extra) bits. The code is defined in such a way that a chunk of $4$ information bits $x_1, x_2, x_3, x_4$ gets mapped (or “encoded”) to a “codeword” of $7$ bits as

$$x_1,\ x_2,\ x_3,\ x_4,\ x_2 \oplus x_3 \oplus x_4,\ x_1 \oplus x_3 \oplus x_4,\ x_1 \oplus x_2 \oplus x_4 \ .$$

This transformation can equivalently be represented as a mapping from $x$ to $Gx$ (the operations are done modulo $2$) where $x$ is the column vector $(x_1, x_2, x_3, x_4)^T$ and $G$ is the matrix

$$G = \begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&1\\ 0&1&1&1\\ 1&0&1&1\\ 1&1&0&1 \end{pmatrix} \ .$$
It is easy to verify that two distinct $4$-bit vectors $x$ and $y$ get mapped to codewords $Gx$ and $Gy$ which differ in at least $3$ bits. (Define $w = x - y$ so that $Gw = Gx - Gy$. Now check that for each non-zero $w$, $Gw$ has at least $3$ bits which are $1$.) It follows that the above code can correct all single bit flips, since for any $7$-bit vector there is always at most one codeword which can be obtained by a single bit flip. (As we will see soon, these codes also have the remarkable property that for every $7$-bit vector which is not a codeword, there is always a codeword which can be obtained by a single bit flip.)
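To make the distance claim concrete, here is a small sanity check (a minimal Python sketch, not part of the original notes): it encodes all 16 messages with the generator matrix $G$ above and confirms that every nonzero message maps to a codeword of Hamming weight at least 3.

```python
from itertools import product

# Generator matrix G of the [7,4] Hamming code (rows as listed in the text);
# the codeword is G x with arithmetic modulo 2.
G = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 1, 1, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
]

def encode(x):
    """Compute G x over F_2 for a length-4 message x."""
    return tuple(sum(g * xi for g, xi in zip(row, x)) % 2 for row in G)

# For a linear code the minimum distance equals the minimum weight of a
# nonzero codeword, so it suffices to check the 15 nonzero messages.
min_weight = min(sum(encode(x)) for x in product([0, 1], repeat=4) if any(x))
print(min_weight)  # prints 3
```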
2. Some basic definitions
Let us get a few simple definitions out of the way.
Definition 1 (Hamming distance) The Hamming distance between two strings $x$ and $y$ of the same length over a finite alphabet $\Sigma$, denoted $\Delta(x,y)$, is defined as the number of positions at which the two strings differ, i.e., $\Delta(x,y) = |\{ i \mid x_i \neq y_i \}|$. The fractional Hamming distance or relative distance between $x, y \in \Sigma^n$ is given by $\delta(x,y) = \frac{\Delta(x,y)}{n}$.
It is trivial to check that the Hamming distance defines a metric on $\Sigma^n$.
Definition 2 (Hamming weight) The Hamming weight of a string $x$ over alphabet $\Sigma$ is defined as the number of non-zero symbols in the string. More formally, the Hamming weight of a string $x$ is $wt(x) = |\{ i \mid x_i \neq 0 \}|$. Note that $wt(x - y) = \Delta(x, y)$.
Given a string $x \in \Sigma^n$, the Hamming ball of radius $r$ around $x$ is the set $\{ y \in \Sigma^n \mid \Delta(x,y) \le r \}$.
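These definitions translate directly into code. The helpers below are an illustrative sketch (not from the notes); they work over any alphabet whose symbols support equality testing, with a designated zero symbol for the weight.

```python
from itertools import product

def hamming_distance(x, y):
    """Number of positions where the equal-length strings x and y differ."""
    assert len(x) == len(y)
    return sum(1 for a, b in zip(x, y) if a != b)

def hamming_weight(x, zero=0):
    """Number of non-zero symbols in x."""
    return sum(1 for a in x if a != zero)

def hamming_ball(x, r, alphabet=(0, 1)):
    """All strings over `alphabet` within Hamming distance r of x (brute force)."""
    return [y for y in product(alphabet, repeat=len(x)) if hamming_distance(x, y) <= r]

print(hamming_distance("0110", "1110"))      # 1
print(hamming_weight((1, 0, 2, 0)))          # 2
print(len(hamming_ball((0, 0, 0), 1)))       # 4 = 1 + 3
```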
Definition 3 (Code) An error correcting code or block code $C$ of length $n$ over a finite alphabet $\Sigma$ is a subset of $\Sigma^n$. The elements of $C$ are called the codewords in $C$, and the collection of all codewords is sometimes called a codebook. The alphabet of $C$ is $\Sigma$, and if $|\Sigma| = q$, we say that $C$ is a $q$-ary code. When $q = 2$, we say that $C$ is a binary code. The length $n$ of the codewords of $C$ is called the block length of $C$.
Associated with a code is also an encoding map $E$ which maps the message set $\mathcal{M}$, identified in some canonical way with $\{1, 2, \ldots, |C|\}$ say, to codewords belonging to $\Sigma^n$. The code is then the image of the encoding map.
We now define two crucial parameters concerning a code: its rate which measures the amount of redundancy introduced by the code, and its minimum distance which measures the error-resilience of a code quantified in terms of how many errors need to be introduced to confuse one codeword for another.
Definition 4 (Rate) The rate of a code $C \subseteq \Sigma^n$, denoted $R(C)$, is defined by

$$R(C) = \frac{\log |C|}{n \log |\Sigma|} \ .$$

Thus, $R(C)$ is the amount of non-redundant information per bit in codewords of $C$. The dimension of $C$ is defined to be $\frac{\log |C|}{\log |\Sigma|}$; this terminology will make sense once we define linear codes shortly. Note that a $q$-ary code of dimension $k$ has $q^k$ codewords.
Definition 5 (Distance) The minimum distance, or simply distance, of a code $C$, denoted $\Delta(C)$, is defined to be the minimum Hamming distance between two distinct codewords of $C$. That is,

$$\Delta(C) = \min_{c_1, c_2 \in C,\ c_1 \neq c_2} \Delta(c_1, c_2) \ .$$

In particular, for every pair of distinct codewords in $C$ the Hamming distance between them is at least $\Delta(C)$. The relative distance of $C$, denoted $\delta(C)$, is the normalized quantity $\frac{\Delta(C)}{n}$, where $n$ is the block length of $C$. Thus, any two codewords of $C$ differ in at least a fraction $\delta(C)$ of positions.
Example 1 The parity check code, which maps $k$ bits to $k+1$ bits by appending the parity of the message bits, is an example of a distance $2$ code. Its rate is $k/(k+1)$.
Example 2 The Hamming code discussed earlier is an example of a distance $3$ code, and has rate $4/7$.
The following simple fact highlights the importance of distance of a code for correcting (worst-case) errors. The proof follows readily from the definition of minimum distance.
Lemma 6 For a code $C$, the following statements are equivalent:

- $C$ has minimum distance $2t+1$.
- $C$ can be used to correct all $t$ symbol errors.
- $C$ can be used to detect all $2t$ symbol errors.
- $C$ can be used to correct all $2t$ symbol erasures. (In the erasure model, some symbols are erased and the rest are intact, and we know the locations of the erasures. The goal is to fill in the values of the erased positions, using the values of the unerased positions and the redundancy of the code.)
3. Linear codes
A general code might have no structure and not admit any representation other than listing the entire codebook. We now focus on an important subclass of codes with additional structure called linear codes. Many of the important and widely used codes are linear.
Linear codes are defined over alphabets $\Sigma$ which are finite fields. Throughout, we will denote by $\mathbb{F}_q$ the finite field with $q$ elements, where $q$ is a prime power. (Later on in the course, it is valuable to have a good grasp of the basic properties of finite fields and field extensions. For now, we can safely think of $q$ as a prime, in which case $\mathbb{F}_q$ is just $\{0, 1, \ldots, q-1\}$ with addition and multiplication defined modulo $q$.)
Definition 7 (Linear code) If $\Sigma$ is a field and $C \subseteq \Sigma^n$ is a subspace of $\Sigma^n$ then $C$ is said to be a linear code.
As $C$ is a subspace, there exists a basis $c_1, c_2, \ldots, c_k$ where $k$ is the dimension of the subspace. Any codeword can be expressed as a linear combination of these basis vectors. We can write these vectors in matrix form as the columns of an $n \times k$ matrix. Such a matrix is called a generator matrix.
Definition 8 (Generator matrix and encoding) Let $C \subseteq \mathbb{F}_q^n$ be a linear code of dimension $k$. A matrix $G \in \mathbb{F}_q^{n \times k}$ is said to be a generator matrix for $C$ if its $k$ columns span $C$.
The generator matrix $G$ provides a way to encode a message $x \in \mathbb{F}_q^k$ (thought of as a column vector) as the codeword $Gx \in C \subseteq \mathbb{F}_q^n$. Thus a linear code has an encoding map $E : \mathbb{F}_q^k \rightarrow \mathbb{F}_q^n$ which is a linear transformation $x \mapsto Gx$.
Comment: Many coding texts define the “transpose” version, where the rows of the generator matrix span the code. We prefer the above definition since it is customary to treat vectors as column vectors (even in these coding texts) and it is therefore nice to multiply by vectors on the right and avoid taking transposes of vectors.
Note that a linear code admits many different generator matrices, corresponding to the different choices of basis for the code as a vector space.
Notation: A $q$-ary linear code of block length $n$ and dimension $k$ will be referred to as an $[n,k]_q$ code. Further, if the code has minimum distance $d$, it will be referred to as an $[n,k,d]_q$ code. When the alphabet size $q$ is clear from the context, or not very relevant to the discussion, we omit the subscript.
Example 3 Some simple examples of binary linear codes:

- The binary parity check code: This is an $[n, n-1, 2]_2$ code consisting of all vectors in $\mathbb{F}_2^n$ of even Hamming weight.
- The binary repetition code: This is an $[n, 1, n]_2$ code consisting of the two vectors $0^n$ and $1^n$.
- The Hamming code discussed above is a $[7, 4, 3]_2$ linear code.
Exercise 1 Show that for a linear code $C$, its minimum distance equals the minimum Hamming weight of a nonzero codeword of $C$, i.e.,

$$\Delta(C) = \min_{c \in C,\ c \neq 0} wt(c) \ .$$
Exercise 2 (Systematic or standard form) Let $C$ be an $[n,k]_q$ linear code. Prove that after permuting coordinates if necessary, $C$ has a generator matrix of the form

$$\begin{pmatrix} I_k \\ A \end{pmatrix}$$

where $I_k$ is the $k \times k$ identity matrix and $A$ is some $(n-k) \times k$ matrix.
A generator matrix in the above form is said to be in systematic form. When such a generator matrix is used for encoding, the encoding is called systematic: the first $k$ symbols of the codeword are just the message symbols, and the remaining $n-k$ symbols comprise redundant check symbols. Thus, after permuting coordinates if needed, every linear code admits a systematic encoding. The above-mentioned encoding map for the $[7,4,3]$ Hamming code was systematic.
3.1. Parity check matrices
The following is a good way to flex your basic linear algebra muscles (Exercise 2 is a useful way to proceed):
Exercise 3 Prove that $C$ is an $[n,k]_q$ code if and only if there is a matrix $H \in \mathbb{F}_q^{(n-k) \times n}$ of full row rank such that

$$C = \{ c \in \mathbb{F}_q^n \mid Hc = 0 \} \ .$$

In other words, $C$ is the nullspace of $H$. Such a matrix $H$ is called a parity check matrix for $C$. A linear code can thus be compactly represented by either its generator matrix or its parity check matrix. The minimum distance of a code has the following characterization in terms of the parity check matrix.
Lemma 9 If $H$ is the parity check matrix of a linear code $C$, then $\Delta(C)$ equals the minimum number of columns of $H$ that are linearly dependent.
3.2. Hamming code revisited
The Hamming code is best understood by the structure of its parity check matrix. This will also allow us to generalize Hamming codes to larger lengths.
We defined the Hamming code $C_{Ham}$ using the generator matrix

$$G = \begin{pmatrix} 1&0&0&0\\ 0&1&0&0\\ 0&0&1&0\\ 0&0&0&1\\ 0&1&1&1\\ 1&0&1&1\\ 1&1&0&1 \end{pmatrix} \ .$$

If we define the matrix

$$H = \begin{pmatrix} 0&0&0&1&1&1&1\\ 0&1&1&0&0&1&1\\ 1&0&1&0&1&0&1 \end{pmatrix}$$

then one can check that $HG = 0$ and that $H$ is a parity check matrix for $C_{Ham}$. Note that $H$ has a rather nice structure: its columns are the integers $1$ to $7$ written in binary.
Correcting single errors with the Hamming code: Suppose that $y$ is a corrupted version of some (unknown) codeword $c \in C_{Ham}$, with a single bit flipped. We know by the distance property of $C_{Ham}$ that $c$ is uniquely determined by $y$. In particular, a naive method to determine $c$ would be to flip each bit of $y$ and check if the resulting vector is in the null space of $H$.
A more slick (and faster) way to correct $y$ is as follows. We know that $y = c + e_i$, where $e_i$ is the column vector of all zeros except a single $1$ in the $i$'th position. Note that $Hy = H(c + e_i) = Hc + He_i = He_i =$ the $i$'th column of $H$. The $i$'th column of $H$ is the binary representation of $i$, and thus this method recovers the location $i$ of the error.
Definition 10 (Syndrome) The vector $Hy$ is said to be the syndrome of $y$.
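The syndrome-based single-error correction just described is easy to implement. The sketch below (illustrative Python, not from the notes, using the bit conventions of $H$ above) computes $Hy$ for the $[7,4,3]$ Hamming code; reading the syndrome as a binary number gives the error location directly.

```python
# Parity check matrix H of the [7,4,3] Hamming code: column i (1-indexed)
# is the binary representation of i, most significant bit in the first row.
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

def syndrome(y):
    """Compute the syndrome H y over F_2."""
    return [sum(h * yi for h, yi in zip(row, y)) % 2 for row in H]

def decode(y):
    """Correct at most one bit flip: the syndrome, read in binary, is the
    (1-indexed) position of the flipped bit, or 0 if y is already a codeword."""
    s = syndrome(y)
    pos = 4 * s[0] + 2 * s[1] + s[2]
    c = list(y)
    if pos != 0:
        c[pos - 1] ^= 1
    return c

c = [1, 0, 1, 1, 0, 1, 0]            # a codeword: its syndrome is [0, 0, 0]
y = c[:]; y[4] ^= 1                  # flip the bit at position 5
print(syndrome(c), decode(y) == c)   # [0, 0, 0] True
```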
Generalized Hamming codes: Define $H_r$ to be the $r \times (2^r - 1)$ matrix where column $i$ of $H_r$ is the binary representation of $i$. This matrix must contain $e_1$ through $e_r$, which are the binary representations of all powers of two from $1$ to $2^{r-1}$, and thus has full row rank.
Now we can define the $r$'th generalized Hamming code $C_{Ham}^{(r)}$ to be the binary code with parity check matrix $H_r$.
Lemma 11 $C_{Ham}^{(r)}$ is an $[2^r - 1,\ 2^r - 1 - r,\ 3]_2$ code.
Proof: Since $H_r$ has rank $r$, it follows that the dimension of $C_{Ham}^{(r)}$ equals $2^r - 1 - r$. By Lemma 9, we need to check that no two columns of $H_r$ are linearly dependent, and that there are $3$ linearly dependent columns in $H_r$. The former follows since the columns of $H_r$ are all distinct. For the latter, note that the first 3 columns of $H_r$, being the binary representations of $1, 2, 3$, add up to $0$.
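The matrix $H_r$ is straightforward to generate for any $r$. The sketch below (Python, for illustration only) builds it and brute-forces the minimum distance for small $r$, confirming the $[2^r - 1,\ 2^r - 1 - r,\ 3]$ parameters.

```python
from itertools import product

def parity_check(r):
    """H_r: an r x (2^r - 1) matrix whose i-th column is i written in binary."""
    n = 2 ** r - 1
    return [[(i >> (r - 1 - row)) & 1 for i in range(1, n + 1)] for row in range(r)]

def min_distance(r):
    """Minimum weight of a nonzero vector in the null space of H_r
    (brute force over all 2^(2^r - 1) vectors; only feasible for small r)."""
    H = parity_check(r)
    n = 2 ** r - 1
    best = n
    for c in product([0, 1], repeat=n):
        if any(c) and all(sum(h * x for h, x in zip(row, c)) % 2 == 0 for row in H):
            best = min(best, sum(c))
    return best

print(min_distance(3))  # 3, i.e. the [7, 4, 3] Hamming code
```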
Despite its simplicity, the Hamming code is amazing in that it is optimal in the following sense.
Lemma 12 If $C$ is a binary code of block length $n$ and minimum distance $3$, then

$$|C| \le \frac{2^n}{n+1} \ . \qquad (1)$$

It follows that the Hamming code $C_{Ham}^{(r)}$ has the maximum possible number of codewords (and thus has largest rate) amongst all binary codes of block length $2^r - 1$ and minimum distance at least $3$.
Proof: This is a special case of the “Hamming volume” (upper) bound on the size of codes. For each $c \in C$, define its neighborhood $N(c) = \{ y \in \{0,1\}^n \mid \Delta(y,c) \le 1 \}$ of all strings that differ in at most one bit from $c$. Since $C$ has distance $3$, by the triangle inequality we have $N(c) \cap N(c') = \emptyset$ for $c \neq c' \in C$. Therefore

$$2^n \ge \Big| \bigcup_{c \in C} N(c) \Big| = \sum_{c \in C} |N(c)| = |C| \cdot (n+1) \ ,$$

giving the desired upper bound on $|C|$. Note that $C_{Ham}^{(r)}$ has dimension $2^r - 1 - r$, and therefore size $2^{2^r - 1 - r} = \frac{2^{2^r - 1}}{2^r}$, which meets the upper bound (1) for block length $n = 2^r - 1$.
The obvious generalization of the above argument to arbitrary distances gives the following bound. This is called the Hamming bound (also sometimes called the Volume bound). We will meet this bound again and also discuss some more sophisticated combinatorial bounds a few lectures down.
Lemma 13 (Hamming bound) Let $C$ be a binary code of block length $n$ and distance $d$. Then

$$|C| \le \frac{2^n}{\sum_{i=0}^{\lfloor (d-1)/2 \rfloor} \binom{n}{i}} \ . \qquad (2)$$
Note that for equality to hold in (2) something remarkable must happen. Hamming balls of radius $\lfloor (d-1)/2 \rfloor$ around the codewords must cover each point in $\{0,1\}^n$ exactly once. Thus we would have a perfect packing of non-overlapping Hamming spheres that cover the full space. (There is no way to pack balls in Euclidean space in such a way, with no “holes” in between!) Codes which meet the bound (2) are therefore called perfect codes.
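It is easy to tabulate the bound (2) numerically. The short sketch below (illustrative Python, not from the notes) evaluates it for distance 3 and confirms that the Hamming codes, with $2^{n-r}$ codewords at block length $n = 2^r - 1$, meet it exactly.

```python
from math import comb

def hamming_bound(n, d):
    """Upper bound (2) on the size of a binary code of block length n, distance d."""
    t = (d - 1) // 2
    return 2 ** n // sum(comb(n, i) for i in range(t + 1))

for r in range(2, 6):
    n = 2 ** r - 1
    # The Hamming code of block length n = 2^r - 1 has 2^(n - r) codewords.
    print(n, hamming_bound(n, 3), 2 ** (n - r))
```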
By Lemma 12, Hamming codes are perfect. It turns out that perfect codes are a rare phenomenon. The following theorem (which is beyond the scope of this course) enumerates all (binary) perfect codes:
Theorem 14 (Tietäväinen and van Lint) The following are all the binary perfect codes:

- The Hamming codes $C_{Ham}^{(r)}$
- The $[23, 12, 7]_2$ Golay code
- The trivial codes consisting of just one codeword, or the whole space, or the repetition code $\{0^n, 1^n\}$ for odd $n$
The Golay code is a remarkable object with many equivalent definitions. The following defines it as a so-called “cyclic” code: the Golay code consists of those vectors $(c_0, c_1, \ldots, c_{22}) \in \{0,1\}^{23}$ for which the associated polynomial $c(X) = c_0 + c_1 X + \cdots + c_{22} X^{22}$ is divisible by $g(X) = 1 + X + X^5 + X^6 + X^7 + X^9 + X^{11}$ in the ring $\mathbb{F}_2[X]/(X^{23} - 1)$.
3.3. Dual of a code
Since a linear code is a subspace of $\mathbb{F}_q^n$, we can define its dual or orthogonal space.
Definition 15 (Dual code) If $C \subseteq \mathbb{F}_q^n$ is a linear code, then its dual, denoted $C^{\perp}$, is a linear code over $\mathbb{F}_q$ defined as

$$C^{\perp} = \{ z \in \mathbb{F}_q^n \mid \langle z, c \rangle = 0 \ \text{for all } c \in C \}$$

where $\langle z, c \rangle = \sum_{i=1}^{n} z_i c_i$ is the dot product over $\mathbb{F}_q$ of vectors $z$ and $c$.
Using Exercise 3, verify the following.
Exercise 4

- If $C$ is an $[n,k]_q$ linear code, then $C^{\perp}$ is an $[n, n-k]_q$ linear code.
- $(C^{\perp})^{\perp} = C$.
- If $H$ is a parity check matrix for $C$, then $H^T$ is a generator matrix for $C^{\perp}$. Equivalently, if $G$ is a generator matrix for $C$, then $G^T$ is a parity check matrix for $C^{\perp}$.
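These relationships can be checked mechanically for the $[7,4]$ Hamming code: the rows of $H$ span the dual code, and the columns of the generator matrix $G$ given earlier span the code itself. A small verification sketch (Python, illustrative only, not part of the notes):

```python
from itertools import product

# Parity check matrix of the [7,4] Hamming code; its rows span the dual code.
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]
# Basis of the Hamming code: the columns of G, written here as length-7 rows.
G_cols = [
    [1, 0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 1, 1],
]

def span(rows):
    """All F_2 linear combinations of the given length-n vectors."""
    vecs = set()
    for coeffs in product([0, 1], repeat=len(rows)):
        v = tuple(sum(a * r[i] for a, r in zip(coeffs, rows)) % 2
                  for i in range(len(rows[0])))
        vecs.add(v)
    return vecs

hamming, dual = span(G_cols), span(H)
# |C| * |C^perp| = 2^n, and every pair of vectors from C and C^perp is orthogonal.
print(len(hamming), len(dual))   # 16 8
print(all(sum(a * b for a, b in zip(c, z)) % 2 == 0
          for c in hamming for z in dual))   # True
```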
Unlike vector spaces over $\mathbb{R}$, where the dual (or orthogonal complement) $W^{\perp}$ of a subspace $W \subseteq \mathbb{R}^n$ satisfies $W \cap W^{\perp} = \{0\}$ (and $W + W^{\perp} = \mathbb{R}^n$), for subspaces of $\mathbb{F}_q^n$, $C$ and $C^{\perp}$ can intersect non-trivially. One can have $C \subseteq C^{\perp}$ (such a code $C$ is called self-orthogonal) or even $C = C^{\perp}$ (called self-dual codes).
Exercise 5 For every even $n$, give an example of a self-dual binary linear code of block length $n$.
A rather mysterious phenomenon in coding theory is that for many constructions of good codes, their dual also has some nice properties. We will now discuss the dual of the Hamming code, a code called the Hadamard code, which has been highly influential and has played a fundamental role in the computer scientists' study of coding theory.
3.4. Hadamard code
The dual of the Hamming code $C_{Ham}^{(r)}$ has as generator matrix $G_r = (H_r)^T$, which is a $(2^r - 1) \times r$ matrix whose rows are all non-zero bit vectors of length $r$. This is a $[2^r - 1, r]_2$ code and is called the simplex code. The Hadamard code is obtained by adding the all-zeroes row to $G_r$.
Definition 16 (Hadamard code) The binary Hadamard code $Had_r$ is a $[2^r, r]_2$ linear code whose $2^r \times r$ generator matrix has all $r$-bit vectors as its rows. Thus the encoding map for the Hadamard code encodes $x \in \mathbb{F}_2^r$ by a string in $\mathbb{F}_2^{2^r}$ consisting of the dot product $\langle x, a \rangle$ for every $a \in \mathbb{F}_2^r$. The Hadamard code can also be defined over $\mathbb{F}_q$, by encoding a message in $\mathbb{F}_q^r$ with its dot product with every vector in $\mathbb{F}_q^r$.
We note that the Hadamard code is the most redundant linear code in which no two codeword symbols are equal in every codeword. Hadamard codes have an excellent distance property:
Lemma 17 The Hadamard code $Had_r$ (as well as the simplex code) has minimum distance $2^{r-1}$. The $q$-ary Hadamard code of dimension $r$ has distance $(1 - 1/q) q^r$.
Proof: We prove that for $x \neq 0$, $\langle x, a \rangle \neq 0$ for exactly $2^{r-1}$ (i.e., half of the) elements $a \in \mathbb{F}_2^r$. Assume for definiteness that $x_1 = 1$. Then for every $a \in \mathbb{F}_2^r$, $\langle x, a \rangle + \langle x, a + e_1 \rangle = x_1 = 1$, and therefore exactly one of $\langle x, a \rangle$ and $\langle x, a + e_1 \rangle$ equals $1$. The proof for the $q$-ary case is similar.
We will later see that binary codes cannot have relative distance more than $1/2$ (unless they only have a fixed constant number of codewords). Thus the relative distance of Hadamard codes is optimal, but their rate is (necessarily) rather poor.
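The distance claim of Lemma 17 is easy to verify exhaustively for small $r$. The sketch below (illustrative Python, not from the notes) builds the Hadamard encoding of every nonzero message and checks that each has weight exactly $2^{r-1}$.

```python
from itertools import product

def hadamard_encode(x, r):
    """Encode x in F_2^r as the 2^r dot products <x, a> over all a in F_2^r."""
    return tuple(sum(xi * ai for xi, ai in zip(x, a)) % 2
                 for a in product([0, 1], repeat=r))

r = 4
weights = {sum(hadamard_encode(x, r))
           for x in product([0, 1], repeat=r) if any(x)}
print(weights)  # {8}: every nonzero codeword has weight 2^(r-1)
```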
Comment: The first order Reed-Muller code is a code that is closely related to the Hadamard code. Linear algebraically, it is simply the subspace spanned by the Hadamard code and the all $1$'s vector (i.e., the union of the Hadamard code $Had_r$ and its coset $Had_r + \mathbf{1}$). It maps a message $(m_1, m_2, \ldots, m_r, m_{r+1})$ to $(\langle m, a \rangle + m_{r+1})_{a \in \mathbb{F}_2^r}$, or equivalently the evaluations over $\mathbb{F}_2^r$ of the $r$-variate polynomial $m_1 X_1 + m_2 X_2 + \cdots + m_r X_r + m_{r+1}$. It is a $[2^r, r+1, 2^{r-1}]_2$ code. We will later see that no binary code of block length $n$ and relative distance $1/2$ can have more than $2n$ codewords, so the first order Reed-Muller code is optimal in this sense.
4. Code families and asymptotically good codes
The Hamming and Hadamard codes exhibit two extremes in the trade-off between rate and distance. Hamming codes have rate approaching $1$ (and in fact optimal rate) but their distance is only $3$. Hadamard codes have (optimal) relative distance $1/2$, but their rate approaches $0$. A natural question this raises is whether there are codes which have both good rate and good relative distance (say, with neither of them approaching $0$ for large block lengths).
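The contrast between the two extremes is easy to see numerically. The short sketch below (Python, purely illustrative and not part of the notes) tabulates the rate and relative distance of the Hamming family $[2^r - 1,\ 2^r - 1 - r,\ 3]_2$ and the Hadamard family $[2^r, r, 2^{r-1}]_2$ as $r$ grows.

```python
for r in range(3, 11):
    n_ham = 2 ** r - 1
    ham = ((n_ham - r) / n_ham, 3 / n_ham)   # (rate, relative distance) of Hamming
    had = (r / 2 ** r, 0.5)                  # (rate, relative distance) of Hadamard
    print(f"r={r:2d}  Hamming: R={ham[0]:.3f} delta={ham[1]:.3f}   "
          f"Hadamard: R={had[0]:.3f} delta={had[1]:.3f}")
# Hamming: rate -> 1 but delta -> 0; Hadamard: delta = 1/2 but rate -> 0.
```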
To formulate this question formally, and also because our focus in this course is on the asymptotic behavior of codes for large block lengths, we consider code families. Specifically, we define a family of codes to be an infinite collection $\mathcal{C} = \{ C_i \mid i \in \mathbb{N} \}$, where $C_i$ is a $q_i$-ary code of block length $n_i$, with $n_i > n_{i-1}$ and $q_i \ge q_{i-1}$. Most of the constructions of codes we will encounter in this book will naturally belong to an infinite family of codes that share the same general structure and properties (and it is usually these asymptotic properties that will often guide our constructions).
We have defined the alphabet sizes of the codes in the family to also grow with $i$ (and the block length $n_i$). Code families where $q_i = q$ for all $i$, for some fixed $q$, will be of special interest (and turn out to be more challenging to understand and construct). We will call such families code families over a fixed alphabet, or more specifically $q$-ary code families.
The notions of rate and relative distance naturally extend to code families. The rate of an infinite family of codes $\mathcal{C}$ is defined as

$$R(\mathcal{C}) = \liminf_{i} \left\{ \frac{\log |C_i|}{n_i \log q_i} \right\} \ .$$

The (relative) distance of an infinite family of codes $\mathcal{C}$ equals

$$\delta(\mathcal{C}) = \liminf_{i} \left\{ \frac{\Delta(C_i)}{n_i} \right\} \ .$$
A $q$-ary family of codes is said to be asymptotically good if both its rate and relative distance are bounded away from zero, i.e., if there exist constants $R_0 > 0$ and $\delta_0 > 0$ such that $R(\mathcal{C}) \ge R_0$ and $\delta(\mathcal{C}) \ge \delta_0$.
In this terminology, the question raised above concerning (binary) codes with both good rate and good distance, can be phrased as: “Are there asymptotically good binary code families?”
For a code family, achieving a large rate and a large relative distance are naturally conflicting goals, and there are fundamental trade-offs between these. We will study some of these in the course. A good deal is known about the existence (or even explicit construction) of good codes as well as about limitations of codes, but the best possible asymptotic trade-off between rate and relative distance for binary codes remains a fundamental open question. In fact, the asymptotic bounds have seen no improvement since 1977!
[This post was typeset using the LaTeX to WordPress-HTML converter LaTeX2WP.]
A self-orthogonal code is a code $C$ such that $C \subseteq C^{\perp}$, and not the other way around. This has been corrected in the notes.
Comment by Venkat Guruswami — January 21, 2010 @ 3:27 pm
extra-ordinary notes
Comment by Gurmeet Singh — February 27, 2010 @ 3:31 am