Data integration is the merging of multiple data sources into a single data source. The practice is often time-consuming and involved, because the different sources are likely to be incompatible with one another. Something as simple as different column names in two spreadsheets is enough to require data reformatting. Integration is most common when two groups that worked independently, with no connection to each other, are later brought together. It has become a more important topic because of the prevalence of free data sources and online databases.
The data part of data integration can be almost anything, as long as it is stored in a computer system. The actual content of the data is rarely as important as the way in which it is stored. Most of the time, the data is kept in databases, which are organized systems of information. These systems contain unique entries and fields that allow users to find information quickly.
The biggest hurdle to any data integration process is the data itself. In many cases, when the data was first set up, there was no intention of ever merging the dataset with another. This means that even though two datasets may refer to the same thing, they can be totally incompatible.
Nearly any difference can make databases incompatible. Something as simple as a difference in presentation, such as field order or column width, can be enough to prevent an easy merger. When the data is significantly different, such as when one database contains more or less information than the other, the merging is much more difficult.
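The kinds of mismatch described above (different field names, different field order, and one source holding less information than the other) can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the sample records, the field names, the mapping tables, and the helper functions `normalize` and `integrate` are assumptions made for illustration, not part of any particular database product.

```python
# Hypothetical records from two sources that describe the same kind of thing.
source_a = [
    {"customer_id": 1, "full_name": "Ada Lovelace", "city": "London"},
    {"customer_id": 2, "full_name": "Alan Turing", "city": "Wilmslow"},
]
source_b = [
    # Different field names, different field order, and no city at all.
    {"Name": "Grace Hopper", "ID": 3},
]

# Assumed mappings from each source's field names to one shared schema.
FIELD_MAP_A = {"customer_id": "id", "full_name": "name", "city": "city"}
FIELD_MAP_B = {"ID": "id", "Name": "name"}
TARGET_FIELDS = ("id", "name", "city")


def normalize(record, field_map):
    """Rename fields to the shared schema and fill missing fields with None."""
    renamed = {
        field_map[key]: value
        for key, value in record.items()
        if key in field_map
    }
    return {field: renamed.get(field) for field in TARGET_FIELDS}


def integrate(*sources):
    """Merge (records, field_map) pairs into one list, one row per unique id."""
    merged = {}
    for records, field_map in sources:
        for record in records:
            row = normalize(record, field_map)
            merged[row["id"]] = row
    return [merged[key] for key in sorted(merged)]


combined = integrate((source_a, FIELD_MAP_A), (source_b, FIELD_MAP_B))
for row in combined:
    print(row)
```

In this sketch the record from the second source ends up with an empty city field, which is one simple way to handle a source that contains less information. A real integration would also have to decide what to do when two sources disagree about the same entry.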
Data integration is called for most often in business and in research. In the business world, merging departments or companies requires combining previously separate information into a single structure. This form of integration is generally very difficult unless the original groups used similar software and had similar information goals.
When data integration is performed for research purposes, it generally goes much more smoothly. When one researcher gives another access to their information, the two parties are generally looking into the same process. This means they are likely to use similar methods to catalog and store their data.
In the past, data integration was a relatively minor area of data studies, but this has changed since the early part of the 21st century. With free online databases becoming more popular and more accurate, companies are scrambling to get their information into a sharable format. This allows them both to release their information in a public form and to integrate private versions of well-known public interfaces into their systems.