Package | Description |
---|---|
org.apache.lucene.analysis.ar | Analyzer for Arabic. |
org.apache.lucene.analysis.bg | Analyzer for Bulgarian. |
org.apache.lucene.analysis.br | Analyzer for Brazilian Portuguese. |
org.apache.lucene.analysis.ca | Analyzer for Catalan. |
org.apache.lucene.analysis.charfilter | Normalization of text before the tokenizer. |
org.apache.lucene.analysis.cjk | Analyzer for Chinese, Japanese, and Korean, which indexes bigrams. |
org.apache.lucene.analysis.ckb | Analyzer for Sorani Kurdish. |
org.apache.lucene.analysis.cn | Analyzer for Chinese, which indexes unigrams (individual Chinese characters). |
org.apache.lucene.analysis.commongrams | Construct n-grams for frequently occurring terms and phrases. |
org.apache.lucene.analysis.compound | A filter that decomposes compound words found in many Germanic languages into their word parts. |
org.apache.lucene.analysis.compound.hyphenation | The code for the compound word hyphenation is taken from the Apache FOP project. |
org.apache.lucene.analysis.core | Basic, general-purpose analysis components. |
org.apache.lucene.analysis.cz | Analyzer for Czech. |
org.apache.lucene.analysis.da | Analyzer for Danish. |
org.apache.lucene.analysis.de | Analyzer for German. |
org.apache.lucene.analysis.el | Analyzer for Greek. |
org.apache.lucene.analysis.en | Analyzer for English. |
org.apache.lucene.analysis.es | Analyzer for Spanish. |
org.apache.lucene.analysis.eu | Analyzer for Basque. |
org.apache.lucene.analysis.fa | Analyzer for Persian. |
org.apache.lucene.analysis.fi | Analyzer for Finnish. |
org.apache.lucene.analysis.fr | Analyzer for French. |
org.apache.lucene.analysis.ga | Analysis for Irish. |
org.apache.lucene.analysis.gl | Analyzer for Galician. |
org.apache.lucene.analysis.hi | Analyzer for Hindi. |
org.apache.lucene.analysis.hu | Analyzer for Hungarian. |
org.apache.lucene.analysis.hunspell | Stemming TokenFilter using a Java implementation of the Hunspell stemming algorithm. |
org.apache.lucene.analysis.hy | Analyzer for Armenian. |
org.apache.lucene.analysis.id | Analyzer for Indonesian. |
org.apache.lucene.analysis.in | Analysis components for Indian languages. |
org.apache.lucene.analysis.it | Analyzer for Italian. |
org.apache.lucene.analysis.lv | Analyzer for Latvian. |
org.apache.lucene.analysis.miscellaneous | Miscellaneous TokenStreams. |
org.apache.lucene.analysis.ngram | Character n-gram tokenizers and filters. |
org.apache.lucene.analysis.nl | Analyzer for Dutch. |
org.apache.lucene.analysis.no | Analyzer for Norwegian. |
org.apache.lucene.analysis.path | Analysis components for path-like strings such as filenames. |
org.apache.lucene.analysis.pattern | Set of components for pattern-based (regex) analysis. |
org.apache.lucene.analysis.payloads | Provides various convenience classes for creating payloads on Tokens. |
org.apache.lucene.analysis.position | Filter for assigning position increments. |
org.apache.lucene.analysis.pt | Analyzer for Portuguese. |
org.apache.lucene.analysis.query | Automatically filter high-frequency stopwords. |
org.apache.lucene.analysis.reverse | Filter to reverse token text. |
org.apache.lucene.analysis.ro | Analyzer for Romanian. |
org.apache.lucene.analysis.ru | Analyzer for Russian. |
org.apache.lucene.analysis.shingle | Word n-gram filters. |
org.apache.lucene.analysis.sinks | TeeSinkTokenFilter and implementations of TeeSinkTokenFilter.SinkFilter that might be useful. |
org.apache.lucene.analysis.snowball | TokenFilter and Analyzer implementations that use Snowball stemmers. |
org.apache.lucene.analysis.standard | Fast, general-purpose grammar-based tokenizers. |
org.apache.lucene.analysis.standard.std31 | Backwards-compatible implementation to match Version.LUCENE_31 |
org.apache.lucene.analysis.standard.std34 | Backwards-compatible implementation to match Version.LUCENE_34 |
org.apache.lucene.analysis.standard.std36 | Backwards-compatible implementation to match Version.LUCENE_36 |
org.apache.lucene.analysis.standard.std40 | Backwards-compatible implementation to match Version.LUCENE_40 |
org.apache.lucene.analysis.sv | Analyzer for Swedish. |
org.apache.lucene.analysis.synonym | Analysis components for Synonyms. |
org.apache.lucene.analysis.th | Analyzer for Thai. |
org.apache.lucene.analysis.tr | Analyzer for Turkish. |
org.apache.lucene.analysis.util | Utility functions for text analysis. |
org.apache.lucene.analysis.wikipedia | Tokenizer that is aware of Wikipedia syntax. |
org.apache.lucene.collation | Unicode collation support. |
org.apache.lucene.collation.tokenattributes | Custom AttributeImpl for indexing collation keys as index terms. |
org.tartarus.snowball | Snowball stemmer API. |
org.tartarus.snowball.ext | Autogenerated snowball stemmer implementations. |
Analyzers for indexing content in different languages and domains.
For an introduction to Lucene's analysis API, see the org.apache.lucene.analysis
package documentation.
This module contains concrete components (CharFilters, Tokenizers, and TokenFilters) for analyzing different types of content. It also provides a number of Analyzers for different languages that you can use to get started quickly.
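A minimal sketch of getting started with one of the ready-made language analyzers, assuming a Lucene 4.x release in which analyzer constructors take a Version argument (later releases drop it): instantiate the analyzer, obtain a TokenStream, and consume it with the standard reset/incrementToken/end/close workflow.

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class AnalyzeExample {
  public static void main(String[] args) throws IOException {
    // EnglishAnalyzer chains a grammar-based tokenizer with lower-casing,
    // stopword removal, and English stemming.
    Analyzer analyzer = new EnglishAnalyzer(Version.LUCENE_40);
    TokenStream stream = analyzer.tokenStream("body",
        new StringReader("The quick brown foxes jumped over the lazy dogs"));
    CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);

    // Standard TokenStream consumption workflow: reset, incrementToken, end, close.
    stream.reset();
    while (stream.incrementToken()) {
      System.out.println(term.toString()); // prints the stemmed, stopword-filtered terms
    }
    stream.end();
    stream.close();
    analyzer.close();
  }
}
```

Any of the other per-language analyzers listed above (for example org.apache.lucene.analysis.de or org.apache.lucene.analysis.fr) can be swapped in the same way.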