Lucene 4.2.1 analyzers-common API

Analyzers for indexing content in different languages and domains.


Packages
org.apache.lucene.analysis.ar Analyzer for Arabic.
org.apache.lucene.analysis.bg Analyzer for Bulgarian.
org.apache.lucene.analysis.br Analyzer for Brazilian Portuguese.
org.apache.lucene.analysis.ca Analyzer for Catalan.
org.apache.lucene.analysis.charfilter Normalization of text before the tokenizer.
org.apache.lucene.analysis.cjk Analyzer for Chinese, Japanese, and Korean, which indexes bigrams.
org.apache.lucene.analysis.cn Analyzer for Chinese, which indexes unigrams (individual Chinese characters).
org.apache.lucene.analysis.commongrams Construct n-grams for frequently occurring terms and phrases.
org.apache.lucene.analysis.compound A filter that decomposes compound words you find in many Germanic languages into the word parts.
org.apache.lucene.analysis.compound.hyphenation The code for the compound word hyphenation is taken from the Apache FOP project.
org.apache.lucene.analysis.core Basic, general-purpose analysis components.
org.apache.lucene.analysis.cz Analyzer for Czech.
org.apache.lucene.analysis.da Analyzer for Danish.
org.apache.lucene.analysis.de Analyzer for German.
org.apache.lucene.analysis.el Analyzer for Greek.
org.apache.lucene.analysis.en Analyzer for English.
org.apache.lucene.analysis.es Analyzer for Spanish.
org.apache.lucene.analysis.eu Analyzer for Basque.
org.apache.lucene.analysis.fa Analyzer for Persian.
org.apache.lucene.analysis.fi Analyzer for Finnish.
org.apache.lucene.analysis.fr Analyzer for French.
org.apache.lucene.analysis.ga Analysis for Irish.
org.apache.lucene.analysis.gl Analyzer for Galician.
org.apache.lucene.analysis.hi Analyzer for Hindi.
org.apache.lucene.analysis.hu Analyzer for Hungarian.
org.apache.lucene.analysis.hunspell Stemming TokenFilter using a Java implementation of the Hunspell stemming algorithm.
org.apache.lucene.analysis.hy Analyzer for Armenian.
org.apache.lucene.analysis.id Analyzer for Indonesian.
org.apache.lucene.analysis.in Analysis components for Indian languages.
org.apache.lucene.analysis.it Analyzer for Italian.
org.apache.lucene.analysis.lv Analyzer for Latvian.
org.apache.lucene.analysis.miscellaneous Miscellaneous TokenStreams.
org.apache.lucene.analysis.ngram Character n-gram tokenizers and filters.
org.apache.lucene.analysis.nl Analyzer for Dutch.
org.apache.lucene.analysis.no Analyzer for Norwegian.
org.apache.lucene.analysis.path Analysis components for path-like strings such as filenames.
org.apache.lucene.analysis.pattern Set of components for pattern-based (regex) analysis.
org.apache.lucene.analysis.payloads Provides various convenience classes for creating payloads on Tokens.
org.apache.lucene.analysis.position Filter for assigning position increments.
org.apache.lucene.analysis.pt Analyzer for Portuguese.
org.apache.lucene.analysis.query Automatically filter high-frequency stopwords.
org.apache.lucene.analysis.reverse Filter to reverse token text.
org.apache.lucene.analysis.ro Analyzer for Romanian.
org.apache.lucene.analysis.ru Analyzer for Russian.
org.apache.lucene.analysis.shingle Word n-gram filters.
org.apache.lucene.analysis.sinks TeeSinkTokenFilter and implementations of TeeSinkTokenFilter.SinkFilter that might be useful.
org.apache.lucene.analysis.snowball TokenFilter and Analyzer implementations that use Snowball stemmers.
org.apache.lucene.analysis.standard Fast, general-purpose grammar-based tokenizers.
org.apache.lucene.analysis.standard.std31 Backwards-compatible implementation to match Version.LUCENE_31.
org.apache.lucene.analysis.standard.std34 Backwards-compatible implementation to match Version.LUCENE_34.
org.apache.lucene.analysis.standard.std36 Backwards-compatible implementation to match Version.LUCENE_36.
org.apache.lucene.analysis.sv Analyzer for Swedish.
org.apache.lucene.analysis.synonym Analysis components for Synonyms.
org.apache.lucene.analysis.th Analyzer for Thai.
org.apache.lucene.analysis.tr Analyzer for Turkish.
org.apache.lucene.analysis.util Utility functions for text analysis.
org.apache.lucene.analysis.wikipedia Tokenizer that is aware of Wikipedia syntax.
org.apache.lucene.collation Unicode collation support.
org.apache.lucene.collation.tokenattributes Custom AttributeImpl for indexing collation keys as index terms.
org.tartarus.snowball Snowball stemmer API.
org.tartarus.snowball.ext Autogenerated snowball stemmer implementations.


For an introduction to Lucene's analysis API, see the org.apache.lucene.analysis package documentation.

This module contains concrete components (CharFilters, Tokenizers, and TokenFilters) for analyzing different types of content. It also provides a number of Analyzers for different languages that you can use to get started quickly.
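As a minimal sketch of how these components fit together, the example below runs this module's EnglishAnalyzer (package org.apache.lucene.analysis.en) over a short string and collects the resulting terms via the standard TokenStream consumer workflow (addAttribute, reset, incrementToken, end, close). The field name "body" and the sample text are illustrative, not part of the API.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class AnalyzeExample {

  /** Collects the terms that EnglishAnalyzer produces for the given text. */
  static List<String> analyze(String text) throws IOException {
    // EnglishAnalyzer chains a grammar-based tokenizer with lower-casing,
    // English stopword removal, and stemming filters.
    Analyzer analyzer = new EnglishAnalyzer(Version.LUCENE_42);
    List<String> terms = new ArrayList<String>();
    // "body" is just an illustrative field name.
    TokenStream ts = analyzer.tokenStream("body", new StringReader(text));
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset();                      // required before incrementToken()
    while (ts.incrementToken()) {
      terms.add(term.toString());
    }
    ts.end();
    ts.close();
    return terms;
  }

  public static void main(String[] args) throws IOException {
    // Stopwords are dropped and the remaining words are stemmed,
    // e.g. "The quick brown foxes jumped" -> quick, brown, fox, jump
    System.out.println(analyze("The quick brown foxes jumped"));
  }
}
```

Other Analyzers in this module follow the same pattern; swapping EnglishAnalyzer for, say, FrenchAnalyzer changes only the tokenization and filtering chain, not the consumer code.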



Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.