Dynamic time warping (DTW) is an algorithm for comparing two samples, such as sounds, video, or graphics, that are similar overall but differ in timing or in subtle details. The calculations typically reduce each sample to a linear sequence of values and measure the differences between the two sequences as a function of time. The elements of one sample can be laid out against the elements of the other on a grid to identify which parts correspond, and formal descriptions of the algorithm usually use symbols to denote each variable. Speech recognition, for example, sometimes uses dynamic time warping to match words even if they are spoken at different speeds or certain parts are pronounced differently.
Many speech recognition programs use dynamic time warping because people often speak at different rates. Certain vowel sounds may be enunciated differently depending on emotions or other factors, and some programs are expected to recognize words no matter who is speaking. For this reason, it is usually not effective to compare two sounds by simply adding up the distances between them at matching time intervals, since the same sound may occur at a different moment in each recording. With DTW, the distance between each time point in one signal and each time point in the other is calculated on a grid, and the signals are matched along a path that runs from the bottom-left corner to the top-right.
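The grid calculation described above can be sketched in a few lines of Python. This is a minimal illustration and not the method of any particular speech recognition program: the function names `dtw_distance` and `pointwise_distance`, the use of absolute difference as the distance between two points, and the sample values are all assumptions made for the example.

```python
def dtw_distance(a, b):
    """Return the cost of the cheapest warped alignment of sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the cheapest way to align a[:i] with b[:j].
    # Row 0 and column 0 are padding; cost[0][0] is the bottom-left start.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # assumed point-to-point distance
            # Extend the cheapest of the three neighboring partial paths.
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]  # top-right corner of the grid


def pointwise_distance(a, b):
    """Add up the distances at matching time intervals, with no warping."""
    return sum(abs(x - y) for x, y in zip(a, b))


# The same rise-and-fall shape, spoken slowly and then quickly
# (the quick version is followed by silence so both have ten samples).
slow = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]
fast = [0, 1, 2, 1, 0, 0, 0, 0, 0, 0]

print(pointwise_distance(slow, fast))  # 8
print(dtw_distance(slow, fast))        # 0.0
```

Adding up the distances at matching time intervals reports a large difference between the two samples, while the warped comparison finds that they have the same shape.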
Similarities in the corresponding parts of two samples can also be measured using the Levenshtein distance. This measure treats each sample as a sequence of symbols, such as letters, and counts the fewest insertions, deletions, and substitutions needed to change one sequence into the other. The result is a larger number the more different the two samples are. This concept is often used for speech recognition as well as spell checking and analyzing genetic material.
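As a rough illustration, the Levenshtein distance between two strings can be computed with the standard dynamic programming approach shown below. The function name `levenshtein` and the example words are chosen only for this sketch.

```python
def levenshtein(s, t):
    """Return the minimum number of single-character edits turning s into t."""
    # prev[j] is the distance between the first i-1 characters of s
    # and the first j characters of t.
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into an empty string takes i deletions
        for j, ct in enumerate(t, start=1):
            substitution = prev[j - 1] + (cs != ct)  # free if letters match
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
print(levenshtein("warp", "warp"))       # 0
```

Identical words give zero, and the number grows as more letters have to be changed, inserted, or removed.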
In some measurements, changes in frequency can reduce the effectiveness of dynamic time warping. One remedy is to process the signals so that their overall form is compared regardless of frequency. Modulated signals can pose a problem as well, but a grid that calculates distances between line segments instead of individual points can compensate.
Sequence alignment is generally mathematical, and some computer programming skills are needed to fully understand it. Dynamic time warping algorithms depend on some basic conditions for realistically calculating the differences between audio or visual samples. When the match between two samples is considered as a path along a grid, the algorithm usually follows rules, such as that the path cannot turn back and that it advances one step at a time. In addition to running from bottom-left to top-right, the path is limited to locations close to the diagonal line. Paths that are too steep or too shallow are often disregarded because they can cause errors in the final measurement.
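Some of these rules can be sketched in Python as follows. This is one possible illustration under stated assumptions and not a definitive implementation: the function name `dtw_banded` and the parameter `window` are hypothetical, the point-to-point distance is assumed to be the absolute difference, and the restriction to a band of cells around the diagonal is the kind of limit often called a Sakoe-Chiba band. The sketch does not implement separate slope limits for paths that are too steep or too shallow.

```python
def dtw_banded(a, b, window):
    """Return (cost, path) for DTW restricted to cells with |i - j| <= window."""
    n, m = len(a), len(b)
    inf = float("inf")
    # The band must be at least |n - m| wide to reach the top-right corner.
    window = max(window, abs(n - m))
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only visit cells close to the diagonal.
        for j in range(max(1, i - window), min(m, i + window) + 1):
            d = abs(a[i - 1] - b[j - 1])  # assumed point-to-point distance
            # Each cell is reached by one step: up, right, or diagonal.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    # Walk back from the top-right corner to recover the path.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [(cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1)]
        _, i, j = min(candidates)
    path.reverse()  # now runs from bottom-left to top-right
    return cost[n][m], path


slow = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]
fast = [0, 1, 2, 1, 0, 0, 0, 0, 0, 0]

wide_cost, wide_path = dtw_banded(slow, fast, window=10)
diag_cost, diag_path = dtw_banded(slow, fast, window=0)
print(wide_cost)  # 0.0
print(diag_cost)  # 8.0
```

With a window of zero the path is forced onto the diagonal, which is the same as comparing the samples point by point. A wider window lets the path bend away from the diagonal, but every step still moves forward in at least one of the two samples and never turns back.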