Data redundancy is the unintentional duplication of data within a database system, where the duplicates serve no purpose in the function of the database. Redundancy is a desirable trait in some situations, but not when it comes to the function of a database. Duplicated data can have an adverse effect on the system, causing queries to return information that is less than helpful. One of the key functions of data management is identifying duplicated data and removing those duplications.
The potential for data redundancy exists in just about any type of database program. Flat programs such as spreadsheets, which rely on manual data entry, are particularly susceptible to duplication, and that duplication can complicate retrieving the desired information. Relational databases, such as sales contact databases, often include processes that help minimize unintentional duplication, such as the creation of two separate contact records for the same person at the same company. Even with system checks in place, duplicates can still slip in, so periodic data cleanup within the database remains necessary.
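As a rough illustration of the kind of system check described above, the sketch below uses Python's built-in sqlite3 module and a uniqueness rule on a contact's name and company. The table layout and the names make_contact_db and add_contact are hypothetical, chosen only for this example; real contact databases may enforce duplicates quite differently.

```python
import sqlite3


def make_contact_db():
    """Create an in-memory contact table (hypothetical layout for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE contacts ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " company TEXT NOT NULL,"
        # The database itself refuses a second row with the same name and company.
        " UNIQUE (name, company))"
    )
    return conn


def add_contact(conn, name, company):
    """Insert a contact; return False if the database rejects it as a duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO contacts (name, company) VALUES (?, ?)",
                (name, company),
            )
        return True
    except sqlite3.IntegrityError:
        # Raised when the UNIQUE constraint is violated.
        return False
```

Note that this check only catches exact matches. An entry typed as "ana ruiz" would be accepted alongside "Ana Ruiz", which is one reason periodic cleanup is still needed even when such checks exist.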
At best, data redundancy means that the database is littered with information that is not essential but poses no real threat to the ability to find data when needed. At worst, the duplicated data slows down the essential functions of the database and complicates using it to manage certain tasks. For example, generating mailing labels from a customer database clogged with redundant information would produce a number of duplicated labels, making it necessary either to sort out and discard the duplicates before the labels could be used, or to take the time to clean up the database before generating the labels.
Fortunately, monitoring for and correcting data redundancy is something that many data management systems can accomplish with relative ease. Some systems flag duplicated data as it is entered, making it easy to review the perceived duplication and decide whether to delete it or allow it to stand. There are also software programs that scan an existing database for duplications and automatically remove the redundant entries.
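The two approaches just described, flagging duplicates for review and removing them automatically, can be sketched in a few lines of Python. This is a minimal example under stated assumptions: records are plain dictionaries, two records count as duplicates when their name and company match after ignoring letter case and extra spaces, and the function names are hypothetical. Commercial cleanup tools typically use more elaborate matching.

```python
from collections import defaultdict


def record_key(record, key_fields=("name", "company")):
    """Build a comparison key that ignores letter case and extra whitespace."""
    return tuple(" ".join(str(record[f]).split()).casefold() for f in key_fields)


def find_duplicates(records, key_fields=("name", "company")):
    """Return groups of record positions that share the same key, for review."""
    groups = defaultdict(list)
    for index, record in enumerate(records):
        groups[record_key(record, key_fields)].append(index)
    return [indexes for indexes in groups.values() if len(indexes) > 1]


def remove_duplicates(records, key_fields=("name", "company")):
    """Keep the first record for each key and drop the later repeats."""
    seen = set()
    kept = []
    for record in records:
        key = record_key(record, key_fields)
        if key not in seen:
            seen.add(key)
            kept.append(record)
    return kept
```

Running find_duplicates first corresponds to the flag-and-review approach, since nothing is deleted until someone decides. Calling remove_duplicates corresponds to automatic removal, which is faster but keeps whichever copy happens to come first.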