In computing and online applications, a stop word is a word that is filtered out when text is processed, for example when an online search is submitted. Stop word filtering is one component of natural language processing (NLP). The idea is to speed up searches by omitting very common words, such as articles and prepositions, from the request. Rather than searching for the stop word, the search engine uses a simple marker to note its presence in the text string, and that marker does not prevent a page from being included in the search results.
One way to understand how stop words complicate the work of a search engine is to consider that the engine evaluates every word in the request submitted by the user, looking for pages that contain each word. If the request contains three words, the engine in effect performs three lookups, one per word, and then gives priority to the pages that include all three words.
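The per-word lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tiny `INDEX` dictionary, the page names, and the `search` function are all hypothetical, and a real engine consults a far larger index and uses much more elaborate ranking.

```python
from collections import Counter

# Hypothetical toy index: each word maps to the set of pages that contain it.
INDEX = {
    "house": {"page1", "page2", "page3"},
    "hill": {"page2", "page3"},
    "red": {"page3", "page4"},
}

def search(query, index):
    """Look up each query word in turn and rank pages by how many words they match."""
    matches = Counter()
    for word in query.lower().split():
        # One lookup per word in the request.
        for page in index.get(word, set()):
            matches[page] += 1
    # Pages matching more of the query words come first; ties are broken by page name.
    return sorted(matches, key=lambda page: (-matches[page], page))

results = search("red house hill", INDEX)
print(results)  # ['page3', 'page2', 'page1', 'page4']
```

Here the three-word request triggers three lookups, and the page containing all three words is listed first.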
For example, a search such as “the house on the hill” would require the engine to search on each word in the sequence, and some engines would even search on the stop word “the” twice. This takes up time and resources that could instead serve the searches that other users are running at the same moment. By replacing “the” and “on” with markers, the engine devotes fewer resources to the request and still returns results that are highly likely to satisfy the end user.
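As a minimal sketch of the marker idea applied to this example, the Python below replaces each stop word with a placeholder and collects the remaining words for searching. The stop list, the `*` marker symbol, and the function name `mark_stop_words` are assumptions made for illustration, not the behavior of any particular search engine.

```python
# Hypothetical stop list for illustration; real engines each choose their own.
STOP_WORDS = {"the", "on", "a", "an", "of"}
MARKER = "*"  # assumed placeholder symbol standing in for a stop word

def mark_stop_words(query, stop_words=STOP_WORDS):
    """Replace stop words with a marker; return the marked query and the words left to search."""
    marked = []
    terms = []
    for word in query.lower().split():
        if word in stop_words:
            marked.append(MARKER)  # note the word's presence without searching on it
        else:
            marked.append(word)
            terms.append(word)
    return " ".join(marked), terms

marked, terms = mark_stop_words("the house on the hill")
print(marked)  # * house * * hill
print(terms)   # ['house', 'hill']
```

The five-word request is reduced to two words that actually need a lookup, while the markers still record where the stop words appeared.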
Although a stop word is sometimes referred to as a poison word, there is nothing wrong with including articles and other common words in the text of a search. Stop words in a search request may complicate indexing and retrieval when the engine attempts to find data that meets the search criteria. Still, the end user is unlikely to see much difference in the results that are ultimately returned.
There is no single stop word list that all search engines use. In fact, some search engines do not use a stop word list at all as part of their natural language processing. Other engines make ample use of a stop word list to allocate resources more efficiently, while still returning results that are accurate and highly appropriate for the requests that users submit.
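To illustrate how the choice of list changes the work an engine does, the sketch below runs one request through three hypothetical configurations: no stop list, a short list, and a longer one. The configuration names and the words in each list are invented for this example and are not taken from any real search engine.

```python
def terms_to_search(query, stop_words=frozenset()):
    """Return the query words an engine would actually look up."""
    return [word for word in query.lower().split() if word not in stop_words]

# Hypothetical engine configurations; none of these lists comes from a real engine.
ENGINE_STOP_LISTS = {
    "no_list": frozenset(),
    "short_list": frozenset({"the", "a", "an"}),
    "long_list": frozenset({"the", "a", "an", "on", "of", "in"}),
}

query = "the house on the hill"
lookups = {name: terms_to_search(query, words) for name, words in ENGINE_STOP_LISTS.items()}

for name, terms in lookups.items():
    print(name, len(terms), terms)
```

With no list the engine looks up all five words, with the short list three, and with the longer list only two.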