Schema matching is a technique for merging two or more complex databases or sets of information. As databases and electronic information storage grow larger and more complex through the Internet, there have to be defined methods for merging sets of data from one database into another, and schema matching is one such technique. The concept is simple, but the reality of data merging is fairly complex.
The term "schema matching" is often used synonymously with "schema mapping," because users are actually mapping data from one database onto another, not merely matching them. Two or more databases are brought together, and the corresponding parts of each are mapped onto one another. The most common way to merge data is by using exact references. An example of this style of merging is combining the name column of one database with the name column of another database.
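Merging by exact reference can be sketched in a few lines of Python. This is a minimal illustration, not a production matcher: the function name exact_matches and the two column lists are hypothetical, invented here for the example.

```python
def exact_matches(schema_a, schema_b):
    """Return the column names that appear in both schemas, in schema_a's order."""
    names_b = set(schema_b)
    return [name for name in schema_a if name in names_b]


# Hypothetical column lists, invented for illustration.
students_a = ["name", "major", "email"]
students_b = ["name", "field_of_study", "phone"]

shared = exact_matches(students_a, students_b)
print(shared)  # ['name']
```

Only "name" is found, because exact matching compares the labels character for character and knows nothing about their meaning.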
Merging is not usually that simple, for people or computers. With so much data needing to be filtered, combined and used, working from one database rather than several is often essential. Schema mapping focuses on making this tedious process automated and more efficient. An example of where schema matching is necessary is when one database has a "student's major" field and another has a "student's field of study" field. It's the same information, but the slightly different titles complicate efforts to blend it.
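One simple way to handle differently titled fields is a hand-maintained synonym table that maps each known title to a single canonical name. The sketch below assumes such a table exists; SYNONYMS, normalize, canonical_field and same_field are hypothetical names chosen for this example, and real tools may rely on more sophisticated techniques.

```python
import re

# Hypothetical, hand-maintained synonym table: each known title maps to one canonical name.
SYNONYMS = {
    "students major": "major",
    "students field of study": "major",
}


def normalize(label):
    """Lowercase a label, drop apostrophes, and collapse other punctuation to single spaces."""
    label = label.lower().replace("'", "").replace("\u2019", "")
    return re.sub(r"[^a-z0-9]+", " ", label).strip()


def canonical_field(label):
    """Look the normalized label up in the synonym table, falling back to the label itself."""
    key = normalize(label)
    return SYNONYMS.get(key, key)


def same_field(label_a, label_b):
    """Treat two labels as the same field when they share a canonical name."""
    return canonical_field(label_a) == canonical_field(label_b)


print(same_field("Student's Major", "Student\u2019s field of study"))  # True
```

The table has to be built and maintained by someone who knows both databases, which is part of why the process is hard to automate fully.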
Schema matching breaks this complex process of merging databases into four steps: pre-integration, comparison, conforming and merging. The first step is pre-integration. Before multiple databases can be merged, they need to be analyzed for similarities and differences, and during this analysis the computer begins to determine the most efficient integration method.
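As a rough sketch of what a pre-integration analysis might report, the Python below compares two schemas and lists what they share and where they differ. It assumes each schema is available as a dictionary of field names and type names; profile_schemas and the sample schemas are hypothetical.

```python
def profile_schemas(schema_a, schema_b):
    """Summarize the overlap between two schemas given as {field_name: type_name} dicts."""
    shared = sorted(schema_a.keys() & schema_b.keys())
    return {
        "shared": shared,
        "only_in_a": sorted(schema_a.keys() - schema_b.keys()),
        "only_in_b": sorted(schema_b.keys() - schema_a.keys()),
        # Shared fields whose declared types disagree need attention before merging.
        "type_mismatches": [f for f in shared if schema_a[f] != schema_b[f]],
    }


# Hypothetical schemas, invented for illustration.
schema_a = {"name": "text", "student_id": "integer", "major": "text"}
schema_b = {"name": "text", "student_id": "text", "field_of_study": "text"}

report = profile_schemas(schema_a, schema_b)
for section, fields in report.items():
    print(section, fields)
```

A report like this gives a first picture of how much work the later steps will involve.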
Next, the computer evaluates the schemas by comparing them with one another at a more detailed level. In the comparison step, the computer looks at each database entry and determines where there may be conflicts. An example of this is when the "student's interest" field in one database lists "doctor" and the same field in another database lists "physician." A person would likely recognize the information as being the same but, for database tools, they are two separate entities.
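The comparison step can be illustrated with a small Python sketch that lists every record whose values disagree between two sources. It assumes both sources identify students by the same key; find_conflicts, the keys and the sample records are hypothetical.

```python
def find_conflicts(records_a, records_b, field):
    """Return (key, value_a, value_b) for keys present in both sources whose values differ."""
    conflicts = []
    for key in sorted(records_a.keys() & records_b.keys()):
        value_a = records_a[key].get(field)
        value_b = records_b[key].get(field)
        if value_a != value_b:
            conflicts.append((key, value_a, value_b))
    return conflicts


# Hypothetical records keyed by a shared student ID.
db_a = {"s001": {"interest": "doctor"}, "s002": {"interest": "engineer"}}
db_b = {"s001": {"interest": "physician"}, "s002": {"interest": "engineer"}}

print(find_conflicts(db_a, db_b, "interest"))  # [('s001', 'doctor', 'physician')]
```

The comparison is purely literal, so "doctor" and "physician" are reported as a conflict even though a person would read them as the same thing.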
Once the computer has determined all of the potential conflicts, it can move on to the conforming step, in which it tries to resolve them. This may be as simple as changing all instances of "physician" to "doctor." In practice, the process is usually substantially more complex.
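The simple case, replacing one term with another, might look like the following Python sketch. The replacement table VALUE_MAP and the function conform are hypothetical, and the sketch assumes the preferred term has already been decided.

```python
# Hypothetical replacement table: each variant maps to the preferred term.
VALUE_MAP = {"physician": "doctor"}


def conform(records, field, value_map):
    """Return new records with the given field rewritten through value_map."""
    conformed = {}
    for key, record in records.items():
        updated = dict(record)  # copy, so the input records are left untouched
        if field in updated:
            updated[field] = value_map.get(updated[field], updated[field])
        conformed[key] = updated
    return conformed


# Hypothetical records, invented for illustration.
db_b = {"s001": {"interest": "physician"}, "s002": {"interest": "engineer"}}

print(conform(db_b, "interest", VALUE_MAP))
# {'s001': {'interest': 'doctor'}, 's002': {'interest': 'engineer'}}
```

Values that are not in the table pass through unchanged, so the same function can be run repeatedly as the table grows.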
Once all conflicts have been fixed, the computer can proceed to the final step of the schema-matching process, merging the data. At this stage, two or more databases are merged into one large database. If all goes well, no conflicts or errors will occur during integration or during future access to the database.
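A final merge could be sketched in Python as follows. The function merge_records and the sample data are hypothetical, and raising an error on any leftover disagreement is a design choice made for this example, not a rule of schema matching.

```python
def merge_records(records_a, records_b):
    """Merge two {key: record} dicts; raise ValueError if a shared field still disagrees."""
    merged = {key: dict(record) for key, record in records_a.items()}
    for key, record in records_b.items():
        target = merged.setdefault(key, {})
        for field, value in record.items():
            if field in target and target[field] != value:
                raise ValueError(f"unresolved conflict for {key!r}.{field}")
            target[field] = value
    return merged


# Hypothetical records whose conflicts have already been resolved.
db_a = {"s001": {"name": "Ana", "interest": "doctor"}}
db_b = {"s001": {"interest": "doctor", "major": "biology"},
        "s002": {"name": "Ben", "interest": "engineer"}}

combined = merge_records(db_a, db_b)
print(combined["s001"])  # {'name': 'Ana', 'interest': 'doctor', 'major': 'biology'}
```

Failing loudly on an unresolved conflict keeps a silent overwrite from hiding a mistake made in the earlier steps.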