Computer files can take up a large amount of space on a hard drive and a lot of bandwidth to transmit. To save space, especially with files that are not often accessed, and to save bandwidth for files being transmitted, storage methods have been developed that save the data in a smaller package by compressing it in some way. In each case, a compression algorithm (a method for reducing the data size) is used. There are several popular categories and types of compression algorithm, each of which works in a different manner, and some of which have results that differ in important ways. Using various compression algorithms, it is generally possible to reduce a text file to less than half its original size; for graphics files, the results vary widely. The file that results from compression may either be in a different format or be an archive file, which is often used for storage, transmission, and distribution.
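One way to categorize compression algorithms is by whether they use dictionary or statistical methods to compress data. The dictionary method looks for repeated phrases and replaces each repeat with a short reference to an earlier occurrence; it is used in GIF images and in JAR and ZIP archives. The statistical method relies on how frequently each symbol occurs, giving the most common symbols the shortest codes. In its classic form the conversion is done in two passes: one to count the frequencies and one to encode the data. An example of statistical coding is Modified Huffman (MH), used in some fax machines.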
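The two-pass statistical approach can be illustrated with a small Huffman coder. This is only a minimal sketch, not the Modified Huffman scheme used in fax machines: the function names (build_huffman_codes, huffman_encode, huffman_decode) are hypothetical, and the output is a string of "0" and "1" characters rather than packed bits.

```python
import heapq
from collections import Counter


def build_huffman_codes(text):
    """Return a dict mapping each symbol in text to a bit string."""
    # Pass 1: count how often each symbol occurs.
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries are (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie_breaker = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; prefix their codes with 0 and 1.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie_breaker, merged))
        tie_breaker += 1
    return heap[0][2]


def huffman_encode(text):
    codes = build_huffman_codes(text)
    # Pass 2: replace each symbol with its code.
    return "".join(codes[ch] for ch in text), codes


def huffman_decode(bits, codes):
    inverse = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free codes make this unambiguous
            out.append(inverse[current])
            current = ""
    return "".join(out)
```

For a string such as "abracadabra", the frequent letter "a" receives a shorter code than the rare letters "c" and "d", so the encoded form needs far fewer bits than the 8 bits per character of the plain text.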
A second way to categorize compression algorithms, and the one that non-professional programmers most often encounter, is by whether they are lossless or lossy. A lossless data compression algorithm is one that compresses the data in such a way that, when it is decompressed, the result is exactly identical to the original file. One example of a lossless data compression algorithm is LZW (the Lempel-Ziv-Welch algorithm). Based on the LZ78 algorithm published by Lempel and Ziv in 1978 and improved in 1984 by Welch, it is used in file formats such as GIF, TIFF, and PDF, as well as in certain modems.
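The lossless property can be seen in a compact LZW sketch. This is an illustration under simplifying assumptions, not the exact variant used by GIF, TIFF, or PDF: the function names are hypothetical, the dictionary is allowed to grow without limit, and the output is a list of integer codes rather than the packed variable-width bit fields that real formats use.

```python
def lzw_compress(data: bytes) -> list:
    # Start with a dictionary holding every single-byte string.
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            # Keep extending the phrase while it is already known.
            current = candidate
        else:
            # Emit the longest known phrase and learn the new one.
            codes.append(dictionary[current])
            dictionary[candidate] = next_code
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: list) -> bytes:
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # The code refers to the phrase that is being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        # Rebuild the same dictionary entry the compressor added.
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(out)
```

Decompressing the codes produced for an input such as b"TOBEORNOTTOBEORTOBEORNOT" returns the original bytes exactly, and the list of codes is shorter than the input because repeated phrases are replaced by single dictionary references.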
A lossy data compression algorithm can reduce data to a smaller size than lossless compression, but at the cost of some of the original data. In other words, restoring a file after lossy data compression does not give an identical copy of the original. The compression algorithm is, however, designed to limit the losses so that they are not apparent to the ear or eye. Lossy compression is used in file formats such as AAC, JPEG, MPEG, and MP3.
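The trade-off in lossy compression can be sketched with simple quantization, where each sample is rounded to the nearest multiple of a step size. This is a toy illustration only: formats such as JPEG and MP3 combine quantization with transforms and perceptual models, and the names quantize, dequantize, and step are assumptions made for this example.

```python
import math


def quantize(samples, step):
    # Keep only the index of the nearest multiple of step for each sample.
    return [round(s / step) for s in samples]


def dequantize(levels, step):
    # Restore approximate values; the rounding error cannot be recovered.
    return [q * step for q in levels]


# One cycle of a sine wave standing in for audio samples.
samples = [math.sin(2 * math.pi * i / 32) for i in range(32)]
step = 0.25
levels = quantize(samples, step)
restored = dequantize(levels, step)

# The restored signal differs from the original, but only by a bounded amount.
max_error = max(abs(a - b) for a, b in zip(samples, restored))
```

With a step of 0.25, the 32 floating-point samples collapse into at most nine distinct integer levels, which are much cheaper to store, and no restored value is off by more than half a step. A larger step saves more space at the price of a larger error.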