Overview (Lucene 9.12.0 common API)

Analyzers for indexing content in different languages and domains.

For an introduction to Lucene's analysis API, see the org.apache.lucene.analysis package documentation.

This module contains concrete components (CharFilters, Tokenizers, and TokenFilters) for analyzing different types of content. It also provides a number of Analyzers for different languages that you can use to get started quickly. To define fully custom Analyzers (as in the index schema of Apache Solr), this module provides CustomAnalyzer.
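
The three component types named above run in a fixed order: CharFilters normalize the raw text, a Tokenizer splits it into tokens, and TokenFilters modify or drop those tokens. The following is a minimal plain-Java sketch of that chain, not Lucene code. The class `AnalysisChainSketch`, its methods, and the stop-word list are hypothetical names chosen for illustration, and the sketch does not model Lucene features such as offsets, position increments, or attribute reuse.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy model of a CharFilter -> Tokenizer -> TokenFilter chain (assumption: NOT Lucene API). */
public class AnalysisChainSketch {

    // Hypothetical stop-word list, used only in this example.
    private static final Set<String> STOP_WORDS = Set.of("the", "a", "an", "and", "of");

    // Stage 1 (CharFilter role): normalize the raw text before tokenization.
    // Here, anything that looks like an HTML tag is replaced by a space.
    public static String charFilter(String text) {
        return text.replaceAll("<[^>]*>", " ");
    }

    // Stage 2 (Tokenizer role): split the text into tokens on runs of
    // characters that are neither letters nor digits.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{N}]+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Stage 3 (TokenFilter role): lowercase each token, then drop stop words.
    public static List<String> tokenFilter(List<String> tokens) {
        List<String> result = new ArrayList<>();
        for (String token : tokens) {
            String lowered = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lowered)) {
                result.add(lowered);
            }
        }
        return result;
    }

    // The full chain, applied in the same order an Analyzer applies its components.
    public static List<String> analyze(String text) {
        return tokenFilter(tokenize(charFilter(text)));
    }

    public static void main(String[] args) {
        System.out.println(analyze("<p>The Quick Brown Fox</p>")); // prints [quick, brown, fox]
    }
}
```

In Lucene itself a comparable chain would be assembled from the components in this module, for example through CustomAnalyzer; see the org.apache.lucene.analysis.custom package documentation for the actual builder API.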

Packages
Package	Description
org.apache.lucene.analysis.ar	Analyzer for Arabic.
org.apache.lucene.analysis.bg	Analyzer for Bulgarian.
org.apache.lucene.analysis.bn	Analyzer for Bengali.
org.apache.lucene.analysis.boost	Provides various convenience classes for creating boosts on Tokens.
org.apache.lucene.analysis.br	Analyzer for Brazilian Portuguese.
org.apache.lucene.analysis.ca	Analyzer for Catalan.
org.apache.lucene.analysis.charfilter	Normalization of text before the tokenizer.
org.apache.lucene.analysis.cjk	Analyzer for Chinese, Japanese, and Korean, which indexes bigrams.
org.apache.lucene.analysis.ckb	Analyzer for Sorani Kurdish.
org.apache.lucene.analysis.classic	Fast, general-purpose grammar-based tokenizers.
org.apache.lucene.analysis.commongrams	Construct n-grams for frequently occurring terms and phrases.
org.apache.lucene.analysis.compound	A filter that decomposes compound words, which are common in many Germanic languages, into their constituent parts.
org.apache.lucene.analysis.compound.hyphenation	Hyphenation code for the CompoundWordTokenFilter.
org.apache.lucene.analysis.core	Basic, general-purpose analysis components.
org.apache.lucene.analysis.custom	A general-purpose Analyzer that can be created with a builder-style API.
org.apache.lucene.analysis.cz	Analyzer for Czech.
org.apache.lucene.analysis.da	Analyzer for Danish.
org.apache.lucene.analysis.de	Analyzer for German.
org.apache.lucene.analysis.el	Analyzer for Greek.
org.apache.lucene.analysis.email	Fast, general-purpose tokenizers for URLs and email addresses.
org.apache.lucene.analysis.en	Analyzer for English.
org.apache.lucene.analysis.es	Analyzer for Spanish.
org.apache.lucene.analysis.et	Analyzer for Estonian.
org.apache.lucene.analysis.eu	Analyzer for Basque.
org.apache.lucene.analysis.fa	Analyzer for Persian.
org.apache.lucene.analysis.fi	Analyzer for Finnish.
org.apache.lucene.analysis.fr	Analyzer for French.
org.apache.lucene.analysis.ga	Analyzer for Irish.
org.apache.lucene.analysis.gl	Analyzer for Galician.
org.apache.lucene.analysis.hi	Analyzer for Hindi.
org.apache.lucene.analysis.hu	Analyzer for Hungarian.
org.apache.lucene.analysis.hunspell	A Java implementation of Hunspell stemming and spell-checking algorithms (`Hunspell`), and a stemming TokenFilter (`HunspellStemFilter`) based on it.
org.apache.lucene.analysis.hy	Analyzer for Armenian.
org.apache.lucene.analysis.id	Analyzer for Indonesian.
org.apache.lucene.analysis.in	Analyzer for Indian languages.
org.apache.lucene.analysis.it	Analyzer for Italian.
org.apache.lucene.analysis.lt	Analyzer for Lithuanian.
org.apache.lucene.analysis.lv	Analyzer for Latvian.
org.apache.lucene.analysis.minhash	MinHash filtering (for LSH).
org.apache.lucene.analysis.miscellaneous	Miscellaneous TokenStreams.
org.apache.lucene.analysis.ne	Analyzer for Nepali.
org.apache.lucene.analysis.ngram	Character n-gram tokenizers and filters.
org.apache.lucene.analysis.nl	Analyzer for Dutch.
org.apache.lucene.analysis.no	Analyzer for Norwegian.
org.apache.lucene.analysis.path	Analysis components for path-like strings such as filenames.
org.apache.lucene.analysis.pattern	Set of components for pattern-based (regex) analysis.
org.apache.lucene.analysis.payloads	Provides various convenience classes for creating payloads on Tokens.
org.apache.lucene.analysis.pt	Analyzer for Portuguese.
org.apache.lucene.analysis.query	Automatically filter high-frequency stopwords.
org.apache.lucene.analysis.reverse	Filter to reverse token text.
org.apache.lucene.analysis.ro	Analyzer for Romanian.
org.apache.lucene.analysis.ru	Analyzer for Russian.
org.apache.lucene.analysis.shingle	Word n-gram filters.
org.apache.lucene.analysis.sinks	`TeeSinkTokenFilter`.
org.apache.lucene.analysis.snowball	`TokenFilter` and `Analyzer` implementations that use a modified version of Snowball stemmers.
org.apache.lucene.analysis.sr	Analyzer for Serbian.
org.apache.lucene.analysis.sv	Analyzer for Swedish.
org.apache.lucene.analysis.synonym	Analysis components for Synonyms.
org.apache.lucene.analysis.synonym.word2vec	Analysis components for synonyms using a Word2Vec model.
org.apache.lucene.analysis.ta	Analyzer for Tamil.
org.apache.lucene.analysis.te	Analyzer for Telugu.
org.apache.lucene.analysis.th	Analyzer for Thai.
org.apache.lucene.analysis.tr	Analyzer for Turkish.
org.apache.lucene.analysis.util	Utility functions for text analysis.
org.apache.lucene.analysis.wikipedia	Tokenizer that is aware of Wikipedia syntax.
org.apache.lucene.collation	Unicode collation support.
org.apache.lucene.collation.tokenattributes	Custom `AttributeImpl` for indexing collation keys as index terms.
org.tartarus.snowball	Snowball stemmer API.
org.tartarus.snowball.ext	Autogenerated snowball stemmer implementations.