The most important data mining concepts are used to analyze collected information, most notably in order to observe patterns of behavior. Previously unknown interactions within the data are examined in a variety of ways to identify critical relationships between subjects and the aggregated information. One challenge in data mining is that the information actually collected may not be representative of the whole domain. To address this, the various data mining concepts provide methodical ways to examine and check correlations in the data.
Standards and good practice in data mining are promoted by the Association for Computing Machinery's Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD). This organization publishes the journal SIGKDD Explorations; the “International Journal of Information Technology and Decision Making” is another journal that covers the field. Adhering to ethics and basic principles of data mining helps the industry work efficiently and limits legal problems.
Pre-processing of the information is one of the most important aspects of data mining. The raw data must be prepared before it can be mined and interpreted. To do this, a process must be defined, the target data must be assembled and patterns must then be found. The overall process is known as Knowledge Discovery in Databases (KDD), a term coined by Gregory Piatetsky-Shapiro in 1989.
Four different classes of data mining tasks allow the process to take place. Clustering uses an algorithm to assemble items into groups of similar items, with the groups themselves discovered from the data. Unlike clustering, classification assigns the data to predefined groups. Association attempts to find relationships between variables, determining which items of data commonly occur together. The final type of data mining is regression, which attempts to identify a function that models the data with the least error.
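The four classes can be illustrated with a minimal Python sketch that uses only the standard library. The function names (cluster_1d, classify, frequent_pairs, fit_line) and the simple one-dimensional methods behind them are assumptions chosen for illustration; real data mining tools use more sophisticated algorithms.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def cluster_1d(values, centers, rounds=10):
    """Clustering: the groups are discovered from the data (a tiny k-means)."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for v in values:
            # Put each value in the group whose center is nearest.
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group (unchanged if the group is empty).
        centers = [mean(g) if g else c for c, g in zip(centers, groups)]
    return groups


def classify(value, labelled):
    """Classification: assign a value to one of the predefined, labelled groups."""
    centroids = {label: mean(examples) for label, examples in labelled.items()}
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def frequent_pairs(baskets, min_support):
    """Association: find pairs of items that occur together in enough baskets."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(set(basket)), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


def fit_line(xs, ys):
    """Regression: find the straight line with the least squared error."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar
```

For example, frequent_pairs([["bread", "milk"], ["bread", "milk", "eggs"], ["eggs", "tea"]], 2) reports only the pair ("bread", "milk"), since no other pair appears in two baskets. Note the difference between the first two functions: cluster_1d is given only starting guesses for the group centers, while classify is given groups that already carry labels.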
Validating the results is the final step in discovering what the mined data represents. Not every pattern found by a data mining algorithm is necessarily valid. An algorithm can find patterns in the data it was given that do not hold in the wider data set, a situation called overfitting. To detect this problem, the patterns are evaluated against a test set, which is a set of data that was not used to find the patterns. If the patterns do not hold up on the test set, then the assumed patterns in the data are probably inaccurate.
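The effect of a test set can be shown with a short Python sketch. Everything here is a hypothetical illustration rather than a standard method: the data is synthetic, and the memorise model is deliberately extreme, answering every query with the value of the nearest training point so that it fits the training data perfectly.

```python
import random
from statistics import mean


def train_test_split(rows, test_size, seed=0):
    """Shuffle the rows and hold back test_size of them as a test set."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    return rows[test_size:], rows[:test_size]


def memorise(train):
    """A deliberately overfit model: predict the y of the nearest training x."""
    def predict(x):
        return min(train, key=lambda row: abs(row[0] - x))[1]
    return predict


def mean_squared_error(predict, rows):
    """Average squared difference between predictions and the true values."""
    return mean((predict(x) - y) ** 2 for x, y in rows)


# Synthetic data (an assumption for this sketch): y is roughly 2 * x plus noise.
noise = random.Random(1)
rows = [(x, 2 * x + noise.uniform(-3, 3)) for x in range(40)]

train, test = train_test_split(rows, test_size=12)
model = memorise(train)

train_error = mean_squared_error(model, train)  # zero: every point was memorised
test_error = mean_squared_error(model, test)    # larger: the noise does not repeat

print(f"training error: {train_error:.2f}, test error: {test_error:.2f}")
```

The model reproduces the training data exactly, yet its error rises on the held-back rows. That gap between the two errors is the sign of overfitting that a test set is meant to expose.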
Data mining concepts are applied in a variety of industries. Gaming, business, marketing, science, engineering and surveillance all use data mining techniques. By applying these techniques, each field can determine best practices or find better ways to obtain results.