A file signature in computer programming is a short, distinctive sequence of bytes, usually located at the beginning of a file, that identifies the file's format. It tells a program what kind of data the file contains, so the true type can be determined even when the file extension is wrong or a user has misidentified the file. The header that carries the signature can also contain information, such as a checksum, that helps confirm the data stored in the file is still intact and has not been modified. Together, these elements make a file signature a useful form of verification, for example when screening files that may be computer viruses disguised with a misleading extension.
The concept of a file signature grew out of the file header, a block of data at the beginning of a file that describes how the information in the file is stored. Part of the header is a sequence of bytes that identifies the format in which the file was created. This can be an image format, a document format from a specific program, or even a protocol type when a stream is used for communication between a client and a server. There is no single standard for file headers; each format defines its own, so a program or operating system needs a database of known file signatures to determine the type of an unknown file.
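As a concrete illustration of a header that begins with a signature and then describes how the data is stored, the sketch below reads the start of a PNG image. PNG is used only as a familiar example; the function name `read_png_header` and the returned dictionary keys are hypothetical choices for this sketch, while the 8-byte signature and the layout of the first chunk (IHDR) follow the PNG specification.

```python
import struct

# The 8-byte signature that every PNG file starts with.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_header(data: bytes) -> dict:
    """Check the PNG signature, then read a few IHDR fields that follow it.

    Hypothetical helper for illustration; it does not validate the chunk CRC.
    """
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file: signature mismatch")
    if len(data) < 26:
        raise ValueError("truncated PNG header")
    # After the signature: 4-byte big-endian chunk length, 4-byte chunk type.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("malformed PNG: first chunk is not a 13-byte IHDR")
    # IHDR data starts with width, height (4 bytes each), bit depth, color type.
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": color_type,
    }
```

In use, a program would pass in the first few dozen bytes of a file, for instance `read_png_header(open(path, "rb").read(32))`, where `path` is whatever file is being inspected.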
The file signature itself is sometimes referred to as a magic number. In programming, this is a constant value chosen to be distinctive within the data field it occupies. For a file header, this means that ideally no two formats share a signature, so each format has its own identifying string of bytes. This is particularly useful when files are transferred and interpreted online, where a file's extension can be arbitrary and cannot be relied on to identify its type. In practice, formats built on a common container can begin with the same bytes, so a signature sometimes narrows the type down rather than settling it.
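A minimal sketch of this kind of check follows, assuming a small hand-written table of well-known magic numbers. The names `SIGNATURES`, `EXTENSION_TYPES`, `sniff_type`, and `extension_matches` are hypothetical, and a real signature database would hold far more entries and handle signatures that do not sit at offset zero.

```python
from pathlib import Path
from typing import Optional

# A few widely documented magic numbers, each paired with an informal type name.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip"),
    (b"\x1f\x8b", "gzip"),
]

# The type each extension claims (illustrative subset).
EXTENSION_TYPES = {
    ".png": "png",
    ".jpg": "jpeg",
    ".jpeg": "jpeg",
    ".gif": "gif",
    ".pdf": "pdf",
    ".zip": "zip",
    ".gz": "gzip",
}


def sniff_type(head: bytes) -> Optional[str]:
    """Return the type whose magic number starts `head`, or None if unknown."""
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            return name
    return None


def extension_matches(filename: str, head: bytes) -> bool:
    """True only when the extension is known and agrees with the signature."""
    claimed = EXTENSION_TYPES.get(Path(filename).suffix.lower())
    return claimed is not None and claimed == sniff_type(head)
```

With this table, a file named `photo.jpg` whose first bytes are the PNG signature would fail `extension_matches`, which is the situation the paragraph above describes: the extension says one thing and the bytes say another.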
In addition to the file type, the header that carries a file signature can contain information that allows error checking, so the data the file holds can be confirmed as intact. This is often done with a checksum, a function that computes a value from the numeric values of the bytes in the file and that can be recomputed after the file is transferred or loaded. In its most basic form, the process adds up the values of a series of bytes in the file and records the sum, so the program decoding the file can perform the same calculation and compare the results. If the results differ, the file might have been corrupted, and the data could be invalid or could have been modified for malicious purposes. A simple sum mainly catches accidental damage, because anyone who alters the file deliberately can also recompute the checksum; detecting tampering reliably calls for a cryptographic hash or digital signature.
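The basic additive scheme can be sketched as follows, under the assumption that the checksum is a single byte (the sum of all payload bytes modulo 256) stored at the end of the record. The function names and the record layout are hypothetical and do not correspond to any particular file format.

```python
def additive_checksum(data: bytes) -> int:
    """Sum the byte values and keep the low 8 bits so the result fits one byte."""
    return sum(data) % 256


def append_checksum(payload: bytes) -> bytes:
    """Writer side: store the checksum as one extra byte after the payload."""
    return payload + bytes([additive_checksum(payload)])


def verify(record: bytes) -> bool:
    """Reader side: recompute the sum and compare it with the stored byte."""
    if not record:
        return False
    payload, stored = record[:-1], record[-1]
    return additive_checksum(payload) == stored


record = append_checksum(b"hello")
print(verify(record))                    # True: data unchanged
print(verify(b"jello" + record[-1:]))    # False: one byte was altered
```

A plain sum is easy to follow but weak: reordering bytes, for example, leaves the sum unchanged. Formats that need stronger error detection typically use a cyclic redundancy check instead, which Python exposes as `zlib.crc32`.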