Data classification consists of analyzing and categorizing an organization’s data assets with the objective of determining appropriate access, storage and retention. It is a discipline within the broader field of data management. In addition, it is part of information lifecycle management (ILM).
As part of ILM, data classification helps an organization understand several aspects of its data: security, legal and compliance requirements, business importance, availability needs and current location. Once these aspects are understood, a policy appropriate for the data's security and retention can be implemented. The classification project should be undertaken by the business process owners and legal and compliance representatives in partnership with the information technology (IT) department. Though software tools are available to aid in collection and analysis, the process of data classification is primarily manual, and it can be time-consuming.
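
The step from understood aspects to an implemented policy can be sketched in code. The following is only an illustrative sketch: the `DataAsset` and `Policy` types, the `derive_policy` function, the sample assets and the retention periods are all hypothetical, and a real policy would come from the business, legal and compliance owners rather than from fixed rules like these.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataAsset:
    """Aspects of one data asset gathered during classification (hypothetical schema)."""
    name: str
    location: str                  # where the data currently lives
    sensitive: bool                # security requirement
    regulated: bool                # legal or compliance requirement
    business_critical: bool        # business importance
    needs_immediate_access: bool   # availability need (recorded, not used below)


@dataclass(frozen=True)
class Policy:
    encrypt: bool
    restrict_access: bool
    retention_years: int


def derive_policy(asset: DataAsset) -> Policy:
    """Map classified aspects to a security and retention policy.

    The retention periods are placeholder values, not legal guidance.
    """
    if asset.regulated:
        retention_years = 7
    elif asset.business_critical:
        retention_years = 3
    else:
        retention_years = 1
    return Policy(
        encrypt=asset.sensitive or asset.regulated,
        restrict_access=asset.sensitive,
        retention_years=retention_years,
    )


# Made-up example assets.
payroll = DataAsset("payroll records", "hr-database", True, True, True, True)
newsletter = DataAsset("newsletter drafts", "shared-drive", False, False, False, False)

print(derive_policy(payroll))
print(derive_policy(newsletter))
```

Keeping the recorded aspects separate from the derived policy means the rules can be revised without reclassifying the data.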
Data is generally considered to be one of two types: structured or unstructured. Structured data is usually found in databases. It is often organization-specific information, such as employee, customer and product records. Typically, this information cannot be accessed directly; it must be accessed through an application programming interface (API). Unstructured data is found in electronic or paper documents, emails or other types of free-form content, which might include audio and video files.
Data classification can benefit an organization in several ways. Proof that a policy exists and is being followed might be required to meet government regulations governing the handling of financial or sensitive data. Tax law requirements are one example of these regulations. Data classification can also benefit an organization in the case of litigation, because appropriate data classification, along with an implemented ILM policy, ensures that the data required by the litigation is available and that extraneous data is not. The organization can also be protected from the actions of disgruntled employees, and from employee mistakes, by appropriately classifying and restricting access to proprietary, highly confidential and top-secret data.
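
Restricting access by classification can be illustrated with a small check. This is a minimal sketch under stated assumptions: the `Classification` enum, its ordering, the added `PUBLIC` level and the `can_access` function are hypothetical, and the text does not prescribe any particular ranking of these labels.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Labels ordered from least to most restricted (assumed ordering)."""
    PUBLIC = 0                # assumed baseline level, not named in the text
    PROPRIETARY = 1
    HIGHLY_CONFIDENTIAL = 2
    TOP_SECRET = 3


def can_access(clearance: Classification, label: Classification) -> bool:
    """Allow access only when the user's clearance is at least the data's label."""
    return clearance >= label


# An employee cleared for proprietary data is denied a top-secret document.
print(can_access(Classification.PROPRIETARY, Classification.TOP_SECRET))
# An employee cleared for top-secret data may read a proprietary document.
print(can_access(Classification.TOP_SECRET, Classification.PROPRIETARY))
```

A simple ordered comparison like this only works when the labels form a strict hierarchy; organizations with need-to-know compartments would need a richer model.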
Another benefit of data classification is a potential reduction in backup and storage costs, because data will not be maintained longer than it needs to be. Data storage media should also be matched to the data's availability requirements. In other words, data that is needed immediately must be on a storage device that allows it to be accessed immediately, such as a network server. Data that is not needed immediately can be stored on a device that does not provide immediate access, such as digital tape.
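
Both ideas, matching storage to availability needs and not keeping data past its retention period, can be expressed as simple rules. The sketch below is illustrative only: the function names, the tier labels and the day-based retention period are assumptions, not part of any particular ILM product.

```python
from datetime import date, timedelta


def choose_storage(needed_immediately: bool) -> str:
    """Pick a storage tier from the data's availability requirement."""
    return "network server" if needed_immediately else "tape archive"


def is_past_retention(created: date, retention_days: int, today: date) -> bool:
    """Return True once the data has outlived its retention period."""
    return today > created + timedelta(days=retention_days)


# Made-up example: a record created at the start of 2020 with a one-year retention period.
created = date(2020, 1, 1)
print(choose_storage(needed_immediately=False))
print(is_past_retention(created, retention_days=365, today=date(2021, 6, 1)))
```

Passing `today` in explicitly, rather than reading the system clock inside the function, keeps the retention check easy to test and audit.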
In addition to meeting legal and compliance requirements, understanding the data might yield other important benefits. The organization that has analyzed and classified its data might be able to more effectively mine this information for decision-making or marketing purposes. This more effective use of data can lead to improved profitability.