Analyzer configuration

Analyzers control how the content of a document is broken into terms (words) during indexing. For example, an analyzer can remove common words, normalize plurals to their singular form, and perform other language-specific operations to improve search quality.


Configuration nodes

The following nodes are used to specify an analyzer:

  • the <locale> node specifies the locale, such as "de" or "en". An index configuration node uses this locale to select the appropriate analyzer for the contents of the index.
  • the <class> node specifies the package/class name of the analyzer class.
  • the <stemmer> node is used to specify the stemmer algorithm of the analyzer.
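
As a rough illustration, the three nodes might be combined as in the sketch below. The <analyzers> and <analyzer> wrapper elements and the "nl" locale are assumptions made for this example. The <stemmer> node is shown only for the second entry, on the assumption that it applies to analyzers that accept a stemmer algorithm.

```xml
<!-- Sketch only: the wrapper elements "analyzers" and "analyzer" are assumed names. -->
<analyzers>
    <!-- An analyzer without a stemmer, selected for English content -->
    <analyzer>
        <class>org.apache.lucene.analysis.standard.StandardAnalyzer</class>
        <locale>en</locale>
    </analyzer>
    <!-- An analyzer with a stemmer algorithm, selected for Dutch content (hypothetical locale) -->
    <analyzer>
        <class>org.apache.lucene.analysis.snowball.SnowballAnalyzer</class>
        <stemmer>Dutch</stemmer>
        <locale>nl</locale>
    </analyzer>
</analyzers>
```

With a configuration of this shape, an index whose locale is "en" would be analyzed by the first entry and one whose locale is "nl" by the second.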

Available analyzers

Currently, these analyzers are part of the OpenCms search package:

    Analyzer for German language content.
    Analyzer for Russian language content.
  • org.apache.lucene.analysis.standard.StandardAnalyzer
    Analyzer for English and other language content.
  • org.apache.lucene.analysis.snowball.SnowballAnalyzer
    Analyzer for various languages; see the Snowball homepage.
    For this analyzer, the language is specified using the additional <stemmer> parameter, with one of the following values: Danish, Dutch, English, Finnish, French, German, Italian, Lovins, Norwegian, Porter, Portuguese, Russian, Spanish, Swedish.


This example shows how to configure an analyzer for French-language content: