Change data capture is the process of locating, recording and saving version records in data systems. In most cases, change data capture systems work by giving data certain markers that reference specific data entries. When the data is changed, these markers change as well. This alerts the change data capture system, and it saves the older data version, giving users and systems access to both old and new data. These processes are common in large data storage systems such as data warehouses and web-based data systems.
Versioning data is considered a very important aspect of data storage. When one piece of data is overwritten by another, the original piece of data can’t just disappear. This would cause havoc if that piece of information was important for an ongoing process or corporate record.
Creating versions of different data pieces is the center of change data capture. If a single piece of information changes five times, the system needs to remember each of the five values and when they changed. This is both important for long-term record keeping and error checking. For example, if a worker imputed a sales figure in the wrong portion of a database, it could disrupt a huge amount of information. Versioning allows the company to revert that number if needed.
There isn’t one set method of change data capture. Different data systems use their own versions, often developed in-house to go along with their own specific style of data storage. Even so, there are a handful of methods that are commonly used. It isn’t unusual for a single system to have several different methods of change data capture operating on the same system. Often, each method specializes in a certain type of capture or operates as a redundant failsafe system.
The most common methods for creating different versions of data are special markers in the data. These markers are in a special row or column in the data that keeps track of when changes occur. Change data capture scripts watch these areas for changes and keep track of the modifications made. These special cells might contain version numbers, timestamps or proprietary data strings.
The two most common places to find full-scale change data capture systems are in data warehouses and open-access databases. One of the major selling points for data warehousing is the constant and comprehensive backups of data. For as long as a user subscribes to their services, these systems never get rid of anything. Open-access databases, like Wikipedia, use versioning to prevent tampering and keep records of which users made which changes. While Wikipedia's versioning may not be as comprehensive as those used in data warehouses, it is often examined by more users.