**Read the updates at the end! Comments are disabled because people don’t actually read the post. This is sad to me, since it really shows that the majority of people commenting on this post are really just trolling for a free pass at an implementation, think that I or someone else will send it to them or port it into C for them, and are so lazy that they can’t scroll through the post themselves.**

For part of my class project in coding theory this semester I’ve implemented the Guruswami-Sudan decoding algorithm for Reed-Solomon codes. Reed-Solomon codes are used widely in practice for error-control coding. For example, CDs use a Reed-Solomon code to provide resilience to errors introduced by scratches or machine error. The math behind RS codes is somewhat complicated, but the idea is as follows. Suppose you have some *k* symbols of data that you want to encode into *n* symbols. The redundancy (and hence error correction capability) can be thought of as the difference *(n – k)*. This is the “fat” we add into the data to make it resistant to errors.

RS codes take these *k* symbols and think of them as the coefficients of a polynomial

f(X) = f_0 + f_1 x + f_2 x^2 + … + f_{k-1} x^{k-1}

The encoder and decoder agree to fix *n* points *{a_1, a_2, … a_n}*. The encoding map consists of evaluating *f(X)* at these *n* points and sending the evaluations:

codeword = (f(a_1), f(a_2), … f(a_n))

The key is in decoding — if the decoder gets any *k* symbols correctly, it should be able to interpolate a unique polynomial *f* through them perfectly (this is from linear algebra). Thus in the CD example, if the CD player can’t read some of the symbols, it can only tell the decoder those symbols that it did read. As long as *k* of those symbols were received the decoder can recover the polynomial and hence the data sequence.

Now of course the math is a little more complicated than that, but not by too much. RS codes are actually defined as polynomials over the ring *F[x]* where *F* is the finite field with *2^m* elements, and the encoding is the evaluation of a degree *k-1* polynomial on all the nonzero elements of the field.

The Guruswami-Sudan algorithm algorithm works in two steps — it interpolates a certain bivariate polynomial *Q[x,y]* in a F[x,y], and then factors it into factors of the form *(y – f_i(x))*. The y-roots *{f_i(x)}* can be thought of as “candidate polynomials” for the data sequences. The algorithm works in polynomial time and outputs a small list of candidates which is guaranteed to contain the transmitted codeword under certain conditions on how bad the errors are. The real key is that the list will often contain the true codeword even if more errors occurred than the code was designed for.

My implementation is based on a very nice tutorial paper by Robert McEliece, who is a really fantastic writer. It uses the Koetter algorithm for interpolation and the Roth-Ruckenstein algorithm to factor the resulting polynomial. This posting is mostly so that Google will possibly pop up this page if people are looking for MATLAB implementations of this algorithm and don’t want to waste sleepless nights like I did trying to force it to work. Caveat emptor, as they always say.

UPDATE : I’ve moved several times since this post was written and in the meantime taken down the code since implementing this is a common course project. It was tricky but not too hard, and I’ve decided the learning benefits of implementing it yourself outweigh the (now negligible) time savings for projects involving follow-on work.