Error correction and detection in IT, technically speaking

Quite literally, error correction is the reason we enjoy clear digital movies, static-free TV and free-flowing mobile phone data. But how does it work?

Error correction is why digital technology works as reliably as it does. It is the reason your Blu-ray disc plays a clear movie with continuous sound. It is the reason we can watch TV stations without stuttering and static. Even the humble phone or fax relies on error correction to send and receive intelligible, meaningful information from one place to another. This article dives straight into the technical details of how this actually works.

All error-detection and correction schemes add some redundancy (extra data) to the message, which receivers use to check the consistency of what was delivered and to recover data that is determined to be corrupted. In a systematic scheme, the transmitter sends the original data and attaches check bits (parity data) derived from the data bits by some algorithm. In a non-systematic scheme, the original message is transformed into an encoded message that carries the same information and has at least as many bits as the original message.

Different error detection codes

A repetition code is a scheme that repeats bits across a channel so that errors can be detected and corrected. The data is divided into blocks of bits and each block is transmitted a predetermined number of times. Example: the 4-bit block 1101 can be sent three times as 1101 1101 1101. If this 12-bit pattern is received as 1001 1101 1101, the receiver assumes that the first block is incorrect because it disagrees with the other two, and corrects it by copying them.

A repetition code is very inefficient and fails if the error occurs in the same place in each copy. In the above example, receiving 1001 1001 1001 would be wrongly assumed to be correct. However, repetition codes are simple to implement and are still used in some transmissions of numbers stations.
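The block-voting idea above can be sketched in a few lines of Python. This is a minimal illustration, not a production decoder: the function names are made up for this example, and it votes on whole blocks (as in the 1101 example) rather than bit by bit.

```python
from collections import Counter


def encode_repetition(block: str, copies: int = 3) -> list[str]:
    """Send the same block `copies` times."""
    return [block] * copies


def decode_repetition(received: list[str]) -> str:
    """Return the block value that appears most often among the copies."""
    return Counter(received).most_common(1)[0][0]


sent = encode_repetition("1101")                      # ['1101', '1101', '1101']
print(decode_repetition(["1001", "1101", "1101"]))    # 1101 (first copy outvoted)
print(decode_repetition(["1001", "1001", "1001"]))    # 1001 (same error in every copy slips through)
```

The second call shows the weakness described above: when every copy carries the same error, the vote is unanimous and wrong.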

A parity bit is a bit added to a group of source bits to ensure that the number of set bits (bits with value 1) is even or odd. This simple scheme detects a single error, or any other odd number of errors, in the received bits; an even number of flipped bits goes unnoticed. Extensions and variations of the parity bit are used for error detection and recovery in RAID storage systems.
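As a rough sketch of even parity, assuming bits are held in a Python string and the parity bit is appended at the end (both choices are just for illustration):

```python
def add_even_parity(bits: str) -> str:
    """Append one bit so that the total number of 1s is even."""
    parity = bits.count("1") % 2
    return bits + str(parity)


def parity_ok(word: str) -> bool:
    """Check that a received word (data + parity bit) has an even number of 1s."""
    return word.count("1") % 2 == 0


word = add_even_parity("1101")   # '11011'
print(parity_ok(word))           # True
print(parity_ok("10011"))        # False: one flipped bit is detected
print(parity_ok("10001"))        # True: two flipped bits cancel out and go unnoticed
```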

A checksum is a modular arithmetic sum of the message's code words of a fixed word length (for example, byte values). Checksum schemes include parity bits and check digits.
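One of the simplest checksums adds up the byte values of a message modulo 256. The sketch below assumes that particular scheme for illustration only; real protocols use a variety of word sizes and summing rules.

```python
def checksum8(data: bytes) -> int:
    """Sum all byte values and keep the result modulo 256."""
    return sum(data) % 256


original = b"HELLO"
print(checksum8(original))                          # 116
print(checksum8(b"HELLP") == checksum8(original))   # False: a changed byte is detected
print(checksum8(b"HLELO") == checksum8(original))   # True: reordered bytes give the same sum
```

The last line shows a known limitation of plain sums: they do not depend on the order of the words.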

A cyclic redundancy check (CRC) is a non-secure hash function designed to detect accidental changes to digital data in computer networks, so it is not suitable for detecting maliciously introduced errors. CRCs are well suited to detecting burst errors and, since they are particularly easy to implement in hardware, are used in digital networks and in storage devices such as hard disks.
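Python's standard library exposes a CRC-32 through `zlib.crc32`, which is enough to sketch the idea. The frame layout here (payload followed by a 4-byte big-endian CRC) and the helper names are assumptions made for this example, not any particular protocol's format.

```python
import zlib


def attach_crc(payload: bytes) -> bytes:
    """Append the CRC-32 of the payload as 4 big-endian bytes."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_crc(frame: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the received one."""
    payload, received_crc = frame[:-4], int.from_bytes(frame[-4:], "big")
    return zlib.crc32(payload) == received_crc


frame = attach_crc(b"hello world")
print(check_crc(frame))            # True

# Simulate a burst error: damage a run of adjacent bits in the payload.
damaged = bytearray(frame)
damaged[2] ^= 0xFF
damaged[3] ^= 0x0F
print(check_crc(bytes(damaged)))   # False
```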

Unlike CRCs, cryptographic hash functions can provide strong assurances about data integrity even if the errors are maliciously introduced.
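A short sketch using SHA-256 from Python's `hashlib`. It assumes the expected digest reaches the receiver by some trusted route; if an attacker could replace the digest along with the data, a keyed construction such as HMAC would be needed instead. The helper names are illustrative.

```python
import hashlib
import hmac


def digest(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def matches(data: bytes, expected_hex: str) -> bool:
    """Compare the digest of the received data with the expected one."""
    return hmac.compare_digest(digest(data), expected_hex)


trusted = digest(b"pay 10 to Alice")
print(matches(b"pay 10 to Alice", trusted))   # True
print(matches(b"pay 90 to Alice", trusted))   # False
```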

Error correction

Error correction can be realised in two different ways:

Automatic repeat request (ARQ): An error detection scheme is combined with requests for retransmission of erroneous data. Each block of received data is checked using the error detection code and, if the check fails, the data is simply retransmitted. This may be repeated until all the data has been verified as correct.

Forward error correction (FEC): The sender encodes the message using an error correction code (ECC) prior to transmission. The receiver uses the added redundancy to recover the original data without any retransmission. The reconstruction is usually whatever original data the receiver deems most likely.
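To make forward error correction concrete, here is a sketch of the classic Hamming(7,4) code, chosen for this example because it is small: 4 data bits are sent with 3 parity bits, and the receiver can fix any single flipped bit on its own. The bit ordering (parity bits at positions 1, 2 and 4) is the textbook layout; the function names are illustrative.

```python
def hamming74_encode(data: list[int]) -> list[int]:
    """Encode 4 data bits as a 7-bit codeword: p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword: list[int]) -> list[int]:
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s3   # 0 = no error, else 1-based error position
    if syndrome:
        c[syndrome - 1] ^= 1          # flip the offending bit back
    return [c[2], c[4], c[5], c[6]]


code = hamming74_encode([1, 1, 0, 1])   # [1, 0, 1, 0, 1, 0, 1]
code[4] ^= 1                            # the channel flips one bit
print(hamming74_decode(code))           # [1, 1, 0, 1]
```

If two bits are flipped, this decoder still picks the nearest valid codeword, which is then the wrong one. That is what "most likely data" means in practice.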

Applications

For real-time applications, such as telephone conversations, Automatic Repeat Request (ARQ) cannot be used, because retransmitted data would arrive too late to be of any use. In this case, FEC must be used. ARQ is used when extremely low error rates matter more than delay, such as in digital money transfers.

The internet uses error control at multiple levels. With Transmission Control Protocol (TCP), packets received with incorrect checksums are discarded and then retransmitted using ARQ. With User Datagram Protocol (UDP), packets with incorrect checksums are discarded but not retransmitted, i.e. detection but no correction.
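The difference between "detect and retransmit" and "detect and discard" can be sketched with a toy simulation. This is not how TCP or UDP are implemented (real TCP also involves sequence numbers, acknowledgements and timers). The frame format, the deliberately faulty channel and all function names below are assumptions for illustration.

```python
import zlib


def make_frame(payload: bytes) -> bytes:
    """Payload followed by its CRC-32 (4 bytes, big-endian)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def frame_ok(frame: bytes) -> bool:
    return zlib.crc32(frame[:-4]) == int.from_bytes(frame[-4:], "big")


def flaky_channel(corrupt_first: int):
    """Hypothetical channel that flips one bit in the first N frames it carries."""
    carried = 0

    def channel(frame: bytes) -> bytes:
        nonlocal carried
        carried += 1
        if carried <= corrupt_first:
            damaged = bytearray(frame)
            damaged[0] ^= 0x01
            return bytes(damaged)
        return frame

    return channel


def send_with_arq(payload: bytes, channel, max_attempts: int = 5):
    """Retransmit until the receiver's check passes; return (payload, attempts)."""
    frame = make_frame(payload)
    for attempt in range(1, max_attempts + 1):
        received = channel(frame)
        if frame_ok(received):
            return received[:-4], attempt
    raise RuntimeError("gave up after max_attempts")


def send_without_retransmit(payload: bytes, channel):
    """Detection only: a damaged frame is dropped and None is returned."""
    received = channel(make_frame(payload))
    return received[:-4] if frame_ok(received) else None


print(send_with_arq(b"data", flaky_channel(2)))            # (b'data', 3)
print(send_without_retransmit(b"data", flaky_channel(1)))  # None
```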

Satellite broadcasting, data storage and ECC RAM (a special type of RAM) also rely on error detection and correction, giving us crystal-clear television channels, data integrity and stable servers, respectively.

Conclusion

Whatever technologies we take for granted, we can be sure that without error correction, not one of them would be of any use to society as we know it.

