Web mining is a form of information harvesting applied to data gathered from online sources. Collecting data from sources across the Internet lets users aggregate large volumes of information and analyze it to support business decisions in an online environment. For example, a researcher might use web mining to measure how often specific keywords appear in web content. Retailers and other marketing professionals use it to spot trends in web traffic, the conversion of site visitors to buyers, and other patterns of web usage.
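The keyword example can be illustrated with a minimal sketch, assuming the page has already been downloaded as an HTML string. The names `TextExtractor`, `keyword_counts`, and `SAMPLE_PAGE` are hypothetical and exist only for this illustration; a real project would also need to fetch pages, respect site policies, and handle far messier markup.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text from a page, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def keyword_counts(html, keywords):
    """Return how many times each keyword appears as a whole word."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z0-9']+", " ".join(parser.chunks).lower())
    counts = Counter(words)
    return {keyword: counts[keyword.lower()] for keyword in keywords}


# Hypothetical page content, used only for illustration.
SAMPLE_PAGE = """<html><head><title>Shoe sale</title>
<style>p {color: red}</style></head>
<body><h1>Running shoes</h1>
<p>Our shoes are on sale. Sale ends Friday.</p>
<script>var sale = true;</script></body></html>"""

if __name__ == "__main__":
    print(keyword_counts(SAMPLE_PAGE, ["sale", "shoes", "boots"]))
```

The sketch counts exact word matches only, so "shoe" and "shoes" are treated as different keywords. Whether to merge such variants is a design choice that depends on the research question.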
In how it gathers, sorts, and analyzes data, web mining mirrors traditional data mining. The difference is scope: web mining focuses on web-based information, whereas traditional data mining typically draws on a broad cross section of sources such as offline computer databases, customer records, or hard-copy accounting data. Focusing solely on online sources provides the targeted analysis needed for online marketing strategies, website structure decisions, and similar e-commerce decision making. Web mining also offers the added benefit of a broad international sample, since websites from all over the world are available to researchers and information gatherers.
Web mining is commonly divided into three categories: web structure mining, web usage mining, and web content mining. These focus, respectively, on the structure and hyperlinks of a particular website, server log information about visitor usage, and the content available online. Website analytics software packages and services are a prime example of web usage mining, providing webmasters with information about visitor traffic, search terms used, links clicked, and time spent on specific pages. Structure mining, on the other hand, provides detailed information about a specific website's internal structure, including hyperlinks, databases, and query functions.
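Two of these categories can be sketched in a few lines. The example below is a rough illustration, not how any particular analytics product works: it assumes a page is available as an HTML string and that the server writes access logs in the widely used Common Log Format. The function names, sample markup, log lines, and `example.com` addresses are all hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Structure mining: gather the absolute target of every <a href>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def split_links(html, base_url):
    """Return (internal, external) link lists relative to base_url's host."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    host = urlparse(base_url).netloc
    internal = [u for u in collector.links if urlparse(u).netloc == host]
    external = [u for u in collector.links if urlparse(u).netloc != host]
    return internal, external


# Assumed log layout: ... "METHOD /path PROTOCOL" STATUS SIZE
LOG_LINE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3})\b'
)


def page_hits(log_lines):
    """Usage mining: count successful (status 200) requests per path."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "200":
            hits[match.group("path")] += 1
    return hits


# Hypothetical sample data, used only for illustration.
SAMPLE_HTML = (
    '<a href="/about">About</a> '
    '<a href="products/shoes.html">Shoes</a> '
    '<a href="https://partner.example.org/">Partner</a>'
)
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2023:13:56:01 +0000] "GET /about HTTP/1.1" 200 512',
    '203.0.113.5 - - [10/Oct/2023:13:57:12 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '198.51.100.7 - - [10/Oct/2023:13:58:40 +0000] "GET /missing HTTP/1.1" 404 153',
]

if __name__ == "__main__":
    print(split_links(SAMPLE_HTML, "https://shop.example.com/index.html"))
    print(page_hits(SAMPLE_LOG).most_common())
```

Counting only status 200 responses is a simplification; a fuller analysis might also track redirects, errors, referrers, and session length.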
For the marketing professional, web mining has many uses. Knowing how visitors use a particular website, how competitors structure their sites, and what content is already online is valuable information. It helps decision makers craft a marketing strategy based on proven techniques and documented evidence.
Colleges and universities also use web mining through software that checks whether student papers are original rather than plagiarized. Applying web content mining principles, such grading aids search the Internet for similar content. Instructors upload the text of a student document and then instruct the plagiarism software to check for similar phrases or copied passages online. Results are often expressed as a percentage of matching text. Links to similar results are provided so that instructors can visit the sites and determine whether the matches are indeed plagiarized.
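One simple way to arrive at a percentage of matching text is to compare overlapping word sequences. The sketch below is an assumption made for illustration and is not the method of any specific plagiarism product: it splits both texts into three-word sequences (often called shingles) and reports the share of the submission's sequences that also appear in a candidate source. The names `shingles` and `percent_matching` and the sample sentences are hypothetical.

```python
import re


def shingles(text, n=3):
    """Return the set of n-word sequences in text, ignoring case and punctuation."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def percent_matching(submission, source, n=3):
    """Percentage of the submission's n-word sequences also found in source."""
    submitted = shingles(submission, n)
    if not submitted:
        return 0.0  # Submission is shorter than n words; nothing to compare.
    shared = submitted & shingles(source, n)
    return 100.0 * len(shared) / len(submitted)


if __name__ == "__main__":
    # Hypothetical texts, used only for illustration.
    student = "Web mining applies data mining techniques to online sources of data"
    online = "Researchers say web mining applies data mining techniques to the web"
    print(f"{percent_matching(student, online):.1f}% of the submission matches")
```

A high score only flags text for review. As noted above, the instructor still has to visit the matching source and judge whether the overlap is quotation, coincidence, or copying.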