A checksum is a small, fixed-size value, or datum, used to verify data integrity when computerized information is stored or transmitted. A checksum algorithm applies a mathematical function to the bits in a particular block of information and generates a number that represents the data in its correct state. When the data is duplicated by any means, the checksum is recalculated from the duplicate and compared with the original number. If the numbers match, the data is considered complete and accurate.
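The calculate-then-compare cycle described above can be sketched in a few lines of Python. The function name `simple_checksum` and the choice to keep only the low 16 bits of a byte sum are assumptions made for illustration, not a standard algorithm.

```python
def simple_checksum(data: bytes) -> int:
    """Add up every byte value and keep the low 16 bits (illustrative scheme)."""
    return sum(data) % 65536


original = b"The quick brown fox"
stored_value = simple_checksum(original)  # the number kept alongside the data

# A faithful copy recalculates to the same number.
copy = bytes(original)
copy_ok = simple_checksum(copy) == stored_value

# A copy with one changed character recalculates to a different number.
damaged = b"The quick brown fix"
damaged_ok = simple_checksum(damaged) == stored_value

print(copy_ok, damaged_ok)
```

Here the faithful copy matches the stored value and the damaged copy does not, which is all a checksum comparison reports.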
Most file transfer protocols include some form of data verification, and some related functions, known as error-correcting codes, can not only detect but also repair minor problems with data integrity. Some types of checksums, also known as hash sums, include MD5 and cyclic redundancy checks (CRCs). CRCs are a type of checksum known as a polynomial code checksum, capable of identifying accidental alterations to the original data.
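As a rough illustration of the two checksum types named above, Python's standard library exposes MD5 through `hashlib` and a 32-bit CRC through `zlib`. The sample input is arbitrary; it happens to be the string commonly used to check CRC implementations.

```python
import hashlib
import zlib

data = b"123456789"

# MD5 yields a 128-bit value, usually shown as 32 hexadecimal digits.
md5_hex = hashlib.md5(data).hexdigest()

# CRC-32 yields a 32-bit unsigned integer.
crc32_value = zlib.crc32(data)

print("MD5   :", md5_hex)
print("CRC-32:", format(crc32_value, "08x"))
```

Both functions only report a value. Neither one repairs data, so any repair step would need a separate mechanism.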
Checksums are useful for detecting errors in files downloaded over the web or via peer-to-peer (P2P) networks, and they are also used wherever data may be corrupted while being transferred over a network or held on a storage medium. Data errors are often caused by missing, duplicate, or incorrect bits. Since a checksum's value is calculated from every bit in a data block, even one missing or altered bit can cause a checksum error.
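The sensitivity to a single bit can be shown with a short sketch. It uses CRC-32 from `zlib` as the checksum; the helper `flip_bit` and the sample payload are hypothetical names chosen for this example.

```python
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


payload = b"checksums catch small errors"
expected = zlib.crc32(payload)

# Invert each bit in turn and count how many corruptions change the checksum.
total_bits = len(payload) * 8
detected = sum(
    1 for i in range(total_bits)
    if zlib.crc32(flip_bit(payload, i)) != expected
)

print(detected, "of", total_bits, "single-bit errors detected")
```

With CRC-32, every single-bit change alters the result, so the two counts printed are equal.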
Many file formats and protocols encode checksums directly in their data, often appended at the end of the file or data block. When the file is transferred to another system or device, the receiver reads the stored checksum, recalculates it with the same algorithm, and compares the two values to verify that the entire file has been received without error. Files with missing, corrupted, or repeated bits may not function properly, or at all. Others may appear to work correctly despite failing the checksum. Left unrepaired, incomplete or corrupted data can cause growing problems over time, until it becomes unusable or causes errors.
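A minimal sketch of a checksum appended to the end of the data might look like the following. The framing used here (a 4-byte little-endian CRC-32 trailer) and the function names `append_checksum` and `verify_and_strip` are assumptions for illustration, not the layout of any particular file format.

```python
import struct
import zlib


def append_checksum(payload: bytes) -> bytes:
    """Sender side: attach a 4-byte CRC-32 trailer to the payload."""
    return payload + struct.pack("<I", zlib.crc32(payload))


def verify_and_strip(framed: bytes) -> bytes:
    """Receiver side: recompute the CRC-32 and compare it with the trailer."""
    if len(framed) < 4:
        raise ValueError("too short to contain a checksum trailer")
    payload, trailer = framed[:-4], framed[-4:]
    (stored,) = struct.unpack("<I", trailer)
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch: data was altered or incomplete")
    return payload


framed = append_checksum(b"file contents")
print(verify_and_strip(framed))  # the intact payload comes back unchanged

damaged = bytearray(framed)
damaged[0] ^= 0x01  # simulate one corrupted bit in transit
try:
    verify_and_strip(bytes(damaged))
except ValueError as err:
    print("rejected:", err)
```

Raising an error on a mismatch is one possible design choice. A receiver could instead request retransmission or flag the file for later inspection.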
Because they condense data into a single small value, basic checksums are generally only useful for verifying small blocks of data. As the amount of information increases, so does the likelihood of an error the checksum can neither detect nor correct. This can cause corrupted data to repeatedly pass the checksum, generate more errors, and even corrupt the checksum datum.
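To see how corrupted data can pass a basic checksum, consider a simple additive scheme. The function `additive_checksum` (a byte sum modulo 256) and the sample strings are hypothetical, chosen so that the errors cancel out.

```python
def additive_checksum(data: bytes) -> int:
    """Sum of all byte values modulo 256 (a deliberately basic checksum)."""
    return sum(data) % 256


original = b"pay 19 dollars"
reordered = b"pay 91 dollars"    # two characters swapped
compensated = b"pay 28 dollars"  # one digit raised by 1, the other lowered by 1

for candidate in (original, reordered, compensated):
    print(candidate, additive_checksum(candidate))
```

All three strings produce the same value, so the two corrupted versions would be accepted as accurate. A larger block offers more positions where such cancelling errors can occur.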
Cyclic redundancy checks, because of the increased complexity of their algorithms, miss far fewer errors than a standard checksum and can be applied to larger blocks of data. Although CRCs are not secure against deliberate tampering, they still provide greater reliability when checking and preserving data integrity. Some software also provides checksum capability and error repair based on custom functions.
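The extra work a CRC performs can be sketched as a bit-by-bit polynomial division. The version below follows the widely used CRC-32 parameters (reflected polynomial 0xEDB88320, initial value and final XOR of 0xFFFFFFFF); the function name `crc32_bitwise` is an assumption, and real software normally uses a faster table-driven routine such as `zlib.crc32`.

```python
import zlib


def crc32_bitwise(data: bytes) -> int:
    """Compute CRC-32 one bit at a time (slow, but shows the mechanism)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # If the lowest bit is set, shift and subtract (XOR) the polynomial.
            if crc & 1:
                crc = (crc >> 1) ^ 0xEDB88320
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


# The sketch agrees with the library routine on a sample input.
sample = b"123456789"
print(crc32_bitwise(sample) == zlib.crc32(sample))

# Two swapped characters leave a plain byte sum unchanged but alter the CRC.
a, b = b"pay 19 dollars", b"pay 91 dollars"
print(sum(a) % 256 == sum(b) % 256)
print(crc32_bitwise(a) == crc32_bitwise(b))
```

The first two comparisons print `True` and the last prints `False`: the CRC depends on the position of each bit, not just the total, which is why it catches the swap that the byte sum misses.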