Sequence mining is a type of structured data mining in which a database is searched for recurring sequences or trends in the data. It is split into two fields: itemset sequence mining, which is typically used in marketing, and string sequence mining, which is used in biology research. Sequence mining differs from regular trend mining because the patterns sought are more specific. That makes building an effective database difficult for database designers, and a search can miss relevant records if a sequence differs even slightly from the common sequence.
Most databases are, at one point or another, mined for data. This mining helps businesses and researchers find the information they need. Usually they are looking for some sort of trend, but what that trend is and how specific the information is depend on the database design. In sequence mining, the database is built to find very specific sequences, with little to no variation. It is a distinct form of structured data mining in which the structured data are searched for matching sequences.
Sequence mining can be broken into two categories. Itemset mining is used in marketing and business to find specific trends in sales numbers, product types, product placement in a store and product use. These figures are fed into marketing algorithms to help plan a marketing campaign and bolster sales. Information about a product and how it performs is typically taken from the database, but the defining aspect of itemset sequence mining is that each element of the sequence is a multi-symbol database cell, that is, a group of items treated as a single unit.
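To make the idea of a sequence built from multi-symbol cells concrete, here is a minimal Python sketch. It assumes each customer's history is a list of itemsets (one per store visit) and measures how many histories contain a given ordered pattern. The names `contains_pattern`, `support` and `purchases`, and the sample data, are hypothetical illustrations rather than part of any particular mining tool.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]


def contains_pattern(sequence: Sequence[Itemset], pattern: Sequence[Itemset]) -> bool:
    """Return True if each itemset in `pattern` is a subset of some element
    of `sequence`, with the matching elements appearing in the same order."""
    position = 0
    for wanted in pattern:
        # Advance until an element containing every wanted item is found.
        while position < len(sequence) and not wanted <= sequence[position]:
            position += 1
        if position == len(sequence):
            return False
        position += 1  # the next pattern itemset must match a later element
    return True


def support(database: List[Sequence[Itemset]], pattern: Sequence[Itemset]) -> float:
    """Fraction of sequences in `database` that contain `pattern`."""
    if not database:
        return 0.0
    hits = sum(contains_pattern(seq, pattern) for seq in database)
    return hits / len(database)


# Hypothetical purchase histories: one list per customer, one itemset per visit.
purchases = [
    [frozenset({"bread", "milk"}), frozenset({"butter"}), frozenset({"jam"})],
    [frozenset({"bread"}), frozenset({"jam", "butter"})],
    [frozenset({"milk"}), frozenset({"bread"})],
]

# "Bought bread on one visit, then butter on a later visit."
pattern = [frozenset({"bread"}), frozenset({"butter"})]
print(support(purchases, pattern))
```

In this made-up data, two of the three histories contain the pattern. The third customer bought bread but never bought butter afterward, so that history does not count toward the pattern's support.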
String mining differs from itemset mining in that it looks at each symbol individually rather than as a cluster. In string mining, the database might be set up to find a sequence in protein or gene samples. This helps in comparing many gene samples to see whether they are the same, or in breaking down large sequences to find which smaller sequences they contain. It is used mostly by biological and medical research teams.
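As a rough illustration of breaking large sequences down and comparing samples, the following Python sketch splits each string into fixed-length substrings and reports which ones appear in every sample. The function names `kmers` and `shared_kmers`, the sample labels and the short DNA-like strings are all assumptions made for the example, not real data or an established tool's API.

```python
from collections import Counter
from typing import Dict, List


def kmers(sequence: str, k: int) -> List[str]:
    """Return every contiguous substring of length k, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def shared_kmers(samples: Dict[str, str], k: int) -> List[str]:
    """Return the length-k substrings that occur in every sample, sorted."""
    counts: Counter = Counter()
    for seq in samples.values():
        counts.update(set(kmers(seq, k)))  # count each substring once per sample
    return sorted(kmer for kmer, n in counts.items() if n == len(samples))


# Hypothetical gene fragments, one string per sample.
samples = {
    "sample_a": "ATGCGTAC",
    "sample_b": "TTGCGTAA",
    "sample_c": "GGCGTACC",
}

print(shared_kmers(samples, 4))  # → ['CGTA', 'GCGT']
```

Each symbol is treated on its own here: a substring counts as shared only if every character lines up in every sample.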
Creating a database for sequence mining can be difficult because, unlike trend mining and other structured data mining, the sequences must match each other exactly. This requirement also creates a problem for the mining itself: if a sequence is at all different, it won't be recognized, which can make itemset mining more difficult. String mining typically benefits from this strictness, because the slightest difference in a tissue sample could make the organism, or whatever the research team is studying, completely distinct from other samples.
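The effect of exact matching can be seen in a small Python sketch. It assumes a plain character-by-character search, which is a simplification and not how any specific database engine works. The helper names `find_exact` and `mismatches` and the sample strings are hypothetical.

```python
from typing import List


def find_exact(text: str, pattern: str) -> List[int]:
    """Return every start index at which `pattern` occurs exactly in `text`."""
    if not pattern:
        return []
    width = len(pattern)
    return [i for i in range(len(text) - width + 1) if text[i:i + width] == pattern]


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


reference = "GATTACA"
variant = "GATCACA"  # hypothetical sample that differs in a single symbol

print(find_exact("CCGATTACAGG", reference))  # → [2]
print(find_exact("CCGATCACAGG", reference))  # → []
print(mismatches(reference, variant))        # → 1
```

A single changed symbol is enough for the exact search to return nothing. That is a drawback when near-matches matter, and an advantage when a one-symbol difference marks a genuinely different sample.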