How to configure a database for full-text searching

Databases

Full-text queries perform linguistic searches against text data in full-text indexes by operating on words and phrases based on rules of a particular language such as English or Japanese. Full-text queries can include simple words and phrases or multiple forms of a word or phrase.

Full-text search is applicable in a wide range of business scenarios such as e-businesses—searching for items on a web site; law firms—searching for case histories in a legal-data repository; or human resources departments—matching job descriptions with stored resumes. The basic administrative and development tasks of full-text search are equivalent regardless of business scenarios. However, in a given business scenario, full-text index and queries can be honed to meet business goals. For example, for an e-business maximizing performance might be more important than ranking of results, recall accuracy (how many of the existing matches are actually returned by a full-text query), or supporting multiple languages. For a law firm, returning every possible hit (total recall of information) might be the most important consideration.

For any scenario, a database administrator performs the following basic steps to configure table columns in a database for full-text search:

  1. Create a full-text catalog.

  2. On each table that you want to search, create a full-text index by:

    1. Identify each text columns that you want to include in the full-text index.

    2. If a given column contains documents stored as binary data (varbinary(max), or image data), you must specify a table column (the type column) that identifies the type of each document in the column being indexed.

    3. Specify the language that you want full-text search to use on the documents in the column.

    4. Choose the change-tracking mechanism that you want to use on the full-text index to track changes in the base table and its columns.

Full-text search supports multiple languages through the use of the following linguistic components: word breakers and stemmers, stoplists that contain stopwords (also known as noise words), and thesaurus files. Thesaurus files and, in some cases, stoplists require configuration by a database administrator. A given thesaurus file supports all full-text indexes that use the corresponding language, and a given stoplist can be associated with as many full-text indexes as you want.