Analyzers for indexing content in different languages and domains.

For an introduction to Lucene's analysis API, see the org.apache.lucene.analysis package documentation.

This module contains concrete components (CharFilters, Tokenizers, and TokenFilters) for analyzing different types of content. It also provides a number of Analyzers for different languages that you can use to get started quickly. To define fully custom Analyzers (as in the index schema of Apache Solr), this module provides CustomAnalyzer.
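As a minimal sketch of the builder-style approach, the example below assembles a CustomAnalyzer from a tokenizer and one token filter looked up by their SPI names ("standard" and "lowercase"), then collects the terms it produces. The class name CustomAnalyzerSketch, the helper methods, and the field name "body" are hypothetical and chosen only for illustration; the example assumes a recent Lucene release with lucene-core and this module on the classpath.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical demo class; not part of Lucene.
public class CustomAnalyzerSketch {

  // Builds an analyzer from components referenced by SPI name:
  // a standard tokenizer followed by a lowercasing filter.
  static Analyzer buildAnalyzer() throws IOException {
    return CustomAnalyzer.builder()
        .withTokenizer("standard")
        .addTokenFilter("lowercase")
        .build();
  }

  // Runs the analyzer over the text and returns the emitted terms in order.
  static List<String> analyze(Analyzer analyzer, String text) throws IOException {
    List<String> terms = new ArrayList<>();
    // "body" is an arbitrary field name used only for this illustration.
    try (TokenStream stream = analyzer.tokenStream("body", text)) {
      CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
      stream.reset(); // required before the first incrementToken()
      while (stream.incrementToken()) {
        terms.add(term.toString());
      }
      stream.end(); // finalizes end-of-stream state
    }
    return terms;
  }

  public static void main(String[] args) throws IOException {
    try (Analyzer analyzer = buildAnalyzer()) {
      System.out.println(analyze(analyzer, "Hello Lucene World"));
    }
  }
}
```

Further filters can be appended with additional addTokenFilter calls; the order of the calls is the order in which the filters are applied to the token stream.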

Packages 
Package Description
org.apache.lucene.analysis.ar
Analyzer for Arabic.
org.apache.lucene.analysis.bg
Analyzer for Bulgarian.
org.apache.lucene.analysis.bn
Analyzer for Bengali.
org.apache.lucene.analysis.boost
Provides various convenience classes for creating boosts on Tokens.
org.apache.lucene.analysis.br
Analyzer for Brazilian Portuguese.
org.apache.lucene.analysis.ca
Analyzer for Catalan.
org.apache.lucene.analysis.charfilter
Normalization of text before the tokenizer.
org.apache.lucene.analysis.cjk
Analyzer for Chinese, Japanese, and Korean, which indexes bigrams.
org.apache.lucene.analysis.ckb
Analyzer for Sorani Kurdish.
org.apache.lucene.analysis.classic
Fast, general-purpose grammar-based tokenizers.
org.apache.lucene.analysis.commongrams
Construct n-grams for frequently occurring terms and phrases.
org.apache.lucene.analysis.compound
A filter that decomposes compound words, which occur in many Germanic languages, into their component words.
org.apache.lucene.analysis.compound.hyphenation
Hyphenation code for the CompoundWordTokenFilter.
org.apache.lucene.analysis.core
Basic, general-purpose analysis components.
org.apache.lucene.analysis.custom
A general-purpose Analyzer that can be created with a builder-style API.
org.apache.lucene.analysis.cz
Analyzer for Czech.
org.apache.lucene.analysis.da
Analyzer for Danish.
org.apache.lucene.analysis.de
Analyzer for German.
org.apache.lucene.analysis.el
Analyzer for Greek.
org.apache.lucene.analysis.email
Fast, general-purpose tokenizers for URLs and email addresses.
org.apache.lucene.analysis.en
Analyzer for English.
org.apache.lucene.analysis.es
Analyzer for Spanish.
org.apache.lucene.analysis.et
Analyzer for Estonian.
org.apache.lucene.analysis.eu
Analyzer for Basque.
org.apache.lucene.analysis.fa
Analyzer for Persian.
org.apache.lucene.analysis.fi
Analyzer for Finnish.
org.apache.lucene.analysis.fr
Analyzer for French.
org.apache.lucene.analysis.ga
Analyzer for Irish.
org.apache.lucene.analysis.gl
Analyzer for Galician.
org.apache.lucene.analysis.hi
Analyzer for Hindi.
org.apache.lucene.analysis.hu
Analyzer for Hungarian.
org.apache.lucene.analysis.hunspell
A Java implementation of Hunspell stemming and spell-checking algorithms (Hunspell), and a stemming TokenFilter (HunspellStemFilter) based on it.
org.apache.lucene.analysis.hy
Analyzer for Armenian.
org.apache.lucene.analysis.id
Analyzer for Indonesian.
org.apache.lucene.analysis.in
Analyzer for Indian languages.
org.apache.lucene.analysis.it
Analyzer for Italian.
org.apache.lucene.analysis.lt
Analyzer for Lithuanian.
org.apache.lucene.analysis.lv
Analyzer for Latvian.
org.apache.lucene.analysis.minhash
MinHash filtering (for LSH).
org.apache.lucene.analysis.miscellaneous
Miscellaneous TokenStreams.
org.apache.lucene.analysis.ne
Analyzer for Nepali.
org.apache.lucene.analysis.ngram
Character n-gram tokenizers and filters.
org.apache.lucene.analysis.nl
Analyzer for Dutch.
org.apache.lucene.analysis.no
Analyzer for Norwegian.
org.apache.lucene.analysis.path
Analysis components for path-like strings such as filenames.
org.apache.lucene.analysis.pattern
Set of components for pattern-based (regex) analysis.
org.apache.lucene.analysis.payloads
Provides various convenience classes for creating payloads on Tokens.
org.apache.lucene.analysis.pt
Analyzer for Portuguese.
org.apache.lucene.analysis.query
Automatically filter high-frequency stopwords.
org.apache.lucene.analysis.reverse
Filter to reverse token text.
org.apache.lucene.analysis.ro
Analyzer for Romanian.
org.apache.lucene.analysis.ru
Analyzer for Russian.
org.apache.lucene.analysis.shingle
Word n-gram filters.
org.apache.lucene.analysis.sinks
org.apache.lucene.analysis.snowball
TokenFilter and Analyzer implementations that use a modified version of Snowball stemmers.
org.apache.lucene.analysis.sr
Analyzer for Serbian.
org.apache.lucene.analysis.sv
Analyzer for Swedish.
org.apache.lucene.analysis.synonym
Analysis components for Synonyms.
org.apache.lucene.analysis.synonym.word2vec
Analysis components for Synonyms using a Word2Vec model.
org.apache.lucene.analysis.ta
Analyzer for Tamil.
org.apache.lucene.analysis.te
Analyzer for Telugu.
org.apache.lucene.analysis.th
Analyzer for Thai.
org.apache.lucene.analysis.tr
Analyzer for Turkish.
org.apache.lucene.analysis.util
Utility functions for text analysis.
org.apache.lucene.analysis.wikipedia
Tokenizer that is aware of Wikipedia syntax.
org.apache.lucene.collation
Unicode collation support.
org.apache.lucene.collation.tokenattributes
Custom AttributeImpl for indexing collation keys as index terms.
org.tartarus.snowball
Snowball stemmer API.
org.tartarus.snowball.ext
Autogenerated snowball stemmer implementations.
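
The language-specific Analyzers in the packages listed above can be used directly, without assembling components by hand. The following sketch, under the assumption that lucene-core and this module are on the classpath, keeps a small lookup from language code to Analyzer and runs text through the selected one. The class name LanguageAnalyzerSketch, the map, the helper method, and the field name "body" are hypothetical; only EnglishAnalyzer, GermanAnalyzer, and FrenchAnalyzer come from the en, de, and fr packages.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.de.GermanAnalyzer;
import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.fr.FrenchAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical demo class; not part of Lucene.
public class LanguageAnalyzerSketch {

  // Illustrative registry: language code -> ready-made Analyzer with default settings.
  static final Map<String, Analyzer> ANALYZERS = Map.of(
      "en", new EnglishAnalyzer(),
      "de", new GermanAnalyzer(),
      "fr", new FrenchAnalyzer());

  // Analyzes the text with the Analyzer registered for the given language code.
  static List<String> analyze(String language, String text) throws IOException {
    Analyzer analyzer = ANALYZERS.get(language);
    if (analyzer == null) {
      throw new IllegalArgumentException("No analyzer registered for: " + language);
    }
    List<String> terms = new ArrayList<>();
    // "body" is an arbitrary field name used only for this illustration.
    try (TokenStream stream = analyzer.tokenStream("body", text)) {
      CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
      stream.reset();
      while (stream.incrementToken()) {
        terms.add(term.toString());
      }
      stream.end();
    }
    return terms;
  }

  public static void main(String[] args) throws IOException {
    // With default settings the English analyzer lowercases, drops common
    // stopwords, and stems the remaining terms.
    System.out.println(analyze("en", "The quick foxes are running"));
  }
}
```

Analyzers are designed to be reused across documents, which is why the sketch creates each one once and shares it rather than constructing a new instance per call.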