Data warehouse architecture is a design that encapsulates all the facets of data warehousing for an enterprise environment. Data warehousing is the creation of a central domain to store complex, decentralized enterprise data in a logical unit that enables data mining, business intelligence, and overall access to all relevant data within an organization. Data warehouse architecture is inclusive of all reporting requirements, data management, security requirements, band width requirements, and storage requirements.
When creating a data warehouse architecture, it is important to break the architecture into specific domains that are joined into a holistic final design. This design should be considered the blue print for the enterprise data architecture. In particular, several primary areas should be developed when considering data warehouse architecture. These areas are source system access, staging area process, data enrichment process, data architecture, business intelligence process, and storage requirements.
Data warehousing requires source data to be transferred from a transactional or database of record into the data warehouse. This process is simplified into the term Extract Transform and Load (ETL), which basically encapsulates the areas of source system access, data enrichment, and data architecture. For the sake of clarity, it is better to design these architectural areas in detail, which outlines how the ETL process will be achieved. While some data is required from the source systems, all data is not desirable as it would overburden the enterprise warehouse. The primary areas of concern when addressing the source system layer are data access methodologies, data required from the source system, and refresh requirements.
The next data warehousing architectural layer to consider is the staging area process. As most data from source systems will require validation and data cleansing, it is important to create a landing zone for source data to reside prior to loading into the business rules layer of the data warehouse. The staging area maintains raw data feeds from source systems that are typically time stamped to ensure the recentness of data.
The data enrichment or business rules process is where data is cleaned to meet the desired outcome of the data warehouse. A good example of this cleansing approach is using address cleansing tools; in the event the source system has incorrect data, the data enrichment process will run the address from the raw data set into a business rule system that would correct invalid addresses. This is also the time where inaccurate data is deleted or modified to ensure completeness within the data warehouse.
The next layer to consider is the data architecture layer. This area is where the true design or schema of the enterprise data warehouse is completed. Data warehousing in not a combination of all the data sets within an enterprise, but instead it is a newly defined database built to enable an overview of all business entities within the enterprise.
This requires the data architecture to answer the questions that will be posed by the business in the area of business intelligence and data mining. By creating the data architecture in this manner, the raw data sets will be transformed into fact tables that will allow the users to perform ad-hoc reporting on the entire enterprise view rather then a specific database. This is also the area that will maintain metadata about the data from the raw system, which could include the source system name or primary keys.
The next area to consider is the business intelligence and reporting requirements. This layer can be thought of as the user-facing requirement for the data warehousing. Typically, this area contains canned reports, ad-hoc reporting capability, and enterprise dash boards or alerts. The business intelligence layers normally get the most consideration, as it is the only outward-facing component within the data warehouse.
The final layer for consideration is overall data storage requirements and maintenance. As a data warehouse continues to grow and expand, user base data storage must be strictly managed and maintained. Additionally, while creating the data warehouse architecture, the design should make realistic estimates as to what will be required form a data storage capacity as well as a band with data access capacity. These requirements will be critical as the data warehouse become widely used throughout the enterprise.