Text Search Overview

This page provides a general overview of the text search facility available in the Library. You can search the Library either from the search box in the header or from the more flexible Search the Library page, which also allows a strict query mode to be enabled.

General Approach

The Library provides a distributed, highly customized text search engine that is tuned to deliver fast, high-quality results when searching for documents. When documents are indexed, the location of each word is recorded, including the field in which it resides and its relative position in that field. Depending on the field, certain specialized rules may be applied. These rules include suppressing the indexing of certain words (known as "stopwords"), applying lexicographic analysis to enhance retrievability (such as changing "web-site" so it can also be found by searching "web," "site," or "website"), and enabling morphological equivalency algorithms (known as "stemming," which, for example, allows "record" to also retrieve "records," "recording," and "recorded").
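The indexing approach described above can be illustrated with a small sketch. It is not the Library's actual implementation: the function names, the sample document, and especially the crude suffix-stripping rule standing in for a real stemming algorithm are all assumptions made for illustration.

```python
import re
from collections import defaultdict

def naive_stem(word):
    """Crude stand-in for a real stemmer: strip one common suffix.

    Assumption for illustration only; production stemmers are far more careful.
    """
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def expand_token(token):
    """Return the index terms for one raw token.

    A hyphenated token such as "web-site" yields its parts plus the joined
    form, so it can be found by "web", "site", or "website".
    """
    parts = token.lower().split("-")
    forms = parts + ["".join(parts)] if len(parts) > 1 else parts
    return [naive_stem(form) for form in forms]

def build_index(documents):
    """Map each term to a list of (doc_id, field, position) postings."""
    index = defaultdict(list)
    for doc_id, fields in documents.items():
        for field, text in fields.items():
            tokens = re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)
            for position, token in enumerate(tokens):
                for term in expand_token(token):
                    index[term].append((doc_id, field, position))
    return index

# Hypothetical sample document with two fields.
docs = {1: {"title": "Web-site records", "body": "The web-site keeps records"}}
index = build_index(docs)

print(index["website"])                # [(1, 'title', 0), (1, 'body', 1)]
print(index[naive_stem("recording")])  # [(1, 'title', 1), (1, 'body', 3)]
```

Because each posting keeps the field name and word position, the same structure could support field-specific rules and position-based features such as phrase matching.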

For many people, finding relevant documents is accomplished quickly and easily by simply entering words into the appropriate search box. By default, the Library interprets queries as having no special syntax and no complex query rules. The goal of the default behavior is to analyze each request and provide the best list of documents which contain all the words which could be identified. An additional goal is to provide at least some results if at all possible while avoiding such things as a "syntax error."
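One way to picture the default behavior is the sketch below. It assumes that a word "could be identified" when it appears in the index, and it uses a hypothetical in-memory index that maps words to sets of document ids; the Library's real matching and ranking logic is not described here and is certainly more involved.

```python
import re

def simple_search(query, index):
    """Return ids of documents containing every query word the index knows.

    Punctuation is ignored and unknown words are skipped, so no input
    produces a syntax error.
    """
    words = [w.lower() for w in re.findall(r"[A-Za-z0-9]+", query)]
    known = [w for w in words if w in index]
    if not known:
        return set()
    result = set(index[known[0]])
    for word in known[1:]:
        result &= index[word]
    return result

# Hypothetical index: word -> ids of documents containing it.
index = {
    "library": {1, 2, 3},
    "search": {2, 3},
    "overview": {3},
}

print(simple_search("library search", index))        # {2, 3}
print(simple_search("library AND (search", index))   # {2, 3}
print(simple_search("search overview", index))       # {3}
```

In this sketch the stray parenthesis and the unrecognized word "AND" are simply ignored, which mirrors the goal of returning some results rather than rejecting the query.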

By using options on the Search the Library page, results may be limited to certain collections or date ranges within the Library. In addition, limiting results can also be accomplished on search result pages by using the "Adjust Filters/Sort" dialog.

Some people may wish to use the strict query mode syntax, which enables such facilities as wildcards, "fuzzy" word search, proximity phrase search, and grouping and combining words and phrases using boolean operations. The query search syntax is described beginning on the next page. Use of the query search syntax is appropriate for people who are comfortable forming properly structured queries and are not frustrated by the occasional "syntax error."


Stopword Lists

Most narrative text contains a certain amount of "noise." For example, in this sentence, the only words to carry significant meaning are: "example," "sentence," "words," "carry," "significant," "meaning." Suppressing the indexing of words which carry little meaning (known as "stopwords") is a common approach to enhancing the accuracy and speed of a text search engine. The Library's text search engine applies a limited stopword list to the narrative text fields. Notably, stopword lists are not applied to all fields, so that erroneous suppression of a person's name, for example, does not occur.
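The field-dependent use of stopwords can be sketched as follows. The stopword set and the field names are hypothetical and chosen only to make the point; they are not the Library's actual lists or fields.

```python
# Hypothetical stopword list and field configuration, for illustration only.
STOPWORDS = {"a", "an", "and", "for", "in", "may", "of", "the", "to", "will"}
NARRATIVE_FIELDS = {"body"}

def index_terms(field, text):
    """Return the words to index for a field.

    Stopwords are dropped only from narrative fields; other fields, such as
    an author name, keep every word.
    """
    words = text.lower().split()
    if field in NARRATIVE_FIELDS:
        return [w for w in words if w not in STOPWORDS]
    return words

print(index_terms("body", "The report will cover the history of the Library"))
# ['report', 'cover', 'history', 'library']

print(index_terms("author", "Will May"))
# ['will', 'may']
```

Had the stopword list been applied to the author field in this sketch, the name "Will May" would have vanished from the index entirely, which is the kind of erroneous suppression the field-specific rule avoids.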

Stopword lists applied when appropriate: