Package | Description |
---|---|
org.apache.lucene.analysis | API and code to convert text into indexable/searchable tokens. |
org.apache.lucene.analysis.ar | Analyzer for Arabic. |
org.apache.lucene.analysis.bg | Analyzer for Bulgarian. |
org.apache.lucene.analysis.br | Analyzer for Brazilian Portuguese. |
org.apache.lucene.analysis.cjk | Analyzer for Chinese, Japanese, and Korean, which indexes bigrams (overlapping groups of two adjacent Han characters). |
org.apache.lucene.analysis.cn | Analyzer for Chinese, which indexes unigrams (individual Chinese characters). |
org.apache.lucene.analysis.cn.smart | Analyzer for Simplified Chinese, which indexes words. |
org.apache.lucene.analysis.compound | A filter that decomposes compound words found in many Germanic languages into their word parts. |
org.apache.lucene.analysis.cz | Analyzer for Czech. |
org.apache.lucene.analysis.de | Analyzer for German. |
org.apache.lucene.analysis.el | Analyzer for Greek. |
org.apache.lucene.analysis.en | Analyzer for English. |
org.apache.lucene.analysis.es | Analyzer for Spanish. |
org.apache.lucene.analysis.fa | Analyzer for Persian. |
org.apache.lucene.analysis.fi | Analyzer for Finnish. |
org.apache.lucene.analysis.fr | Analyzer for French. |
org.apache.lucene.analysis.ga | Analysis for Irish. |
org.apache.lucene.analysis.gl | Analyzer for Galician. |
org.apache.lucene.analysis.hi | Analyzer for Hindi. |
org.apache.lucene.analysis.hu | Analyzer for Hungarian. |
org.apache.lucene.analysis.hunspell | Stemming TokenFilter using a Java implementation of the Hunspell stemming algorithm. |
org.apache.lucene.analysis.icu | Analysis components based on ICU. |
org.apache.lucene.analysis.icu.segmentation | Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm. |
org.apache.lucene.analysis.id | Analyzer for Indonesian. |
org.apache.lucene.analysis.in | Analysis components for Indian languages. |
org.apache.lucene.analysis.it | Analyzer for Italian. |
org.apache.lucene.analysis.ja | Analyzer for Japanese. |
org.apache.lucene.analysis.lv | Analyzer for Latvian. |
org.apache.lucene.analysis.miscellaneous | Miscellaneous TokenStreams. |
org.apache.lucene.analysis.ngram | Character n-gram tokenizers and filters. |
org.apache.lucene.analysis.nl | Analyzer for Dutch. |
org.apache.lucene.analysis.no | Analyzer for Norwegian. |
org.apache.lucene.analysis.path | Analysis components for path-like strings such as filenames. |
org.apache.lucene.analysis.payloads | Provides various convenience classes for creating payloads on Tokens. |
org.apache.lucene.analysis.phonetic | Analysis components for phonetic search. |
org.apache.lucene.analysis.position | Filter for assigning position increments. |
org.apache.lucene.analysis.pt | Analyzer for Portuguese. |
org.apache.lucene.analysis.query | Automatically filters high-frequency stopwords. |
org.apache.lucene.analysis.reverse | Filter to reverse token text. |
org.apache.lucene.analysis.ru | Analyzer for Russian. |
org.apache.lucene.analysis.shingle | Word n-gram filters. |
org.apache.lucene.analysis.snowball | TokenFilter and Analyzer implementations that use Snowball stemmers. |
org.apache.lucene.analysis.standard | Standards-based analyzers implemented with JFlex. |
org.apache.lucene.analysis.stempel | Stempel: Algorithmic Stemmer. |
org.apache.lucene.analysis.sv | Analyzer for Swedish. |
org.apache.lucene.analysis.synonym | Analysis components for synonyms. |
org.apache.lucene.analysis.th | Analyzer for Thai. |
org.apache.lucene.analysis.tr | Analyzer for Turkish. |
org.apache.lucene.analysis.wikipedia | Tokenizer that is aware of Wikipedia syntax. |
org.apache.lucene.collation | CollationKeyFilter converts each token into its binary CollationKey using the provided Collator, and then encodes the CollationKey as a String using IndexableBinaryStringTools, to allow it to be stored as an index term. |
org.apache.lucene.document | The logical representation of a Document for indexing and searching. |
org.apache.lucene.facet.enhancements | Enhanced category features: mechanisms for addition of enhanced category features. |
org.apache.lucene.facet.enhancements.association | Association category enhancements: a CategoryEnhancement for adding association data to the index (categories with AssociationProperty's). |
org.apache.lucene.facet.index | Indexing of document categories: attachment of CategoryPath's or CategoryAttribute's to a given document using a Taxonomy. |
org.apache.lucene.facet.index.streaming | Expert: attribute streaming definitions for indexing facets. Streaming of facet attributes is a low-level interface into Lucene indexing. |
org.apache.lucene.index.memory | High-performance single-document main-memory Apache Lucene fulltext search index. |
org.apache.lucene.queryParser | A simple query parser implemented with JavaCC. |
org.apache.lucene.search.highlight | The highlight package contains classes to provide "keyword in context" features, typically used to highlight search terms in the text of results pages. |

Modifier and Type | Class and Description |
---|---|
class | ASCIIFoldingFilter: This class converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists. |
class | CachingTokenFilter: This class can be used if the token attributes of a TokenStream are intended to be consumed more than once. |
class | CannedTokenStream: TokenStream from a canned list of Tokens. |
class | CharTokenizer: An abstract base class for simple, character-oriented tokenizers. |
class | EmptyTokenizer: Emits no tokens. |
class | FilteringTokenFilter: Abstract base class for TokenFilters that may remove tokens. |
class | ISOLatin1AccentFilter: Deprecated. If you build a new index, use ASCIIFoldingFilter, which covers a superset of Latin 1. This class is included for use with existing indexes and will be removed in a future release (possibly Lucene 4.0). |
class | KeywordMarkerFilter: Marks terms as keywords via the KeywordAttribute. |
class | KeywordTokenizer: Emits the entire input as a single token. |
class | LengthFilter: Removes words that are too long or too short from the stream. |
class | LetterTokenizer: A tokenizer that divides text at non-letters. |
class | LimitTokenCountFilter: This TokenFilter limits the number of tokens while indexing. |
class | LookaheadTokenFilter<T extends LookaheadTokenFilter.Position>: An abstract TokenFilter to make it easier to build graph token filters requiring some lookahead. |
class | LowerCaseFilter: Normalizes token text to lower case. |
class | LowerCaseTokenizer: Performs the function of LetterTokenizer and LowerCaseFilter together. |
class | MockFixedLengthPayloadFilter: TokenFilter that adds random fixed-length payloads. |
class | MockGraphTokenFilter: Randomly inserts overlapped (posInc=0) tokens with posLength sometimes > 1. |
class | MockHoleInjectingTokenFilter |
class | MockRandomLookaheadTokenFilter: Uses LookaheadTokenFilter to randomly peek at future tokens. |
class | MockTokenizer: Tokenizer for testing. |
class | MockVariableLengthPayloadFilter: TokenFilter that adds random variable-length payloads. |
class | NumericTokenStream: Expert: This class provides a TokenStream for indexing numeric values that can be used by NumericRangeQuery or NumericRangeFilter. |
class | PorterStemFilter: Transforms the token stream as per the Porter stemming algorithm. |
class | StopFilter: Removes stop words from a token stream. |
class | TeeSinkTokenFilter: This TokenFilter provides the ability to set aside attribute states that have already been analyzed. |
static class | TeeSinkTokenFilter.SinkTokenStream: TokenStream output from a tee with optional filtering. |
class | TokenFilter: A TokenFilter is a TokenStream whose input is another TokenStream. |
class | Tokenizer: A Tokenizer is a TokenStream whose input is a Reader. |
class | TypeTokenFilter: Removes tokens whose types appear in a set of blocked types from a token stream. |
class | ValidatingTokenFilter: A TokenFilter that checks consistency of the tokens (e.g., that offsets are consistent with one another). |
class | WhitespaceTokenizer: A tokenizer that divides text at whitespace. |

Modifier and Type | Field and Description |
---|---|
protected TokenStream | TokenFilter.input: The source of tokens for this filter. |
protected TokenStream | ReusableAnalyzerBase.TokenStreamComponents.sink |

Modifier and Type | Method and Description |
---|---|
protected TokenStream | ReusableAnalyzerBase.TokenStreamComponents.getTokenStream(): Returns the sink TokenStream. |
TokenStream | MockAnalyzer.reusableTokenStream(String fieldName, Reader reader) |
TokenStream | ReusableAnalyzerBase.reusableTokenStream(String fieldName, Reader reader): This method uses ReusableAnalyzerBase.createComponents(String, Reader) to obtain an instance of ReusableAnalyzerBase.TokenStreamComponents. |
TokenStream | PerFieldAnalyzerWrapper.reusableTokenStream(String fieldName, Reader reader) |
TokenStream | LimitTokenCountAnalyzer.reusableTokenStream(String fieldName, Reader reader) |
TokenStream | Analyzer.reusableTokenStream(String fieldName, Reader reader): Creates a TokenStream that is allowed to be re-used from the previous time that the same thread called this method. |
TokenStream | MockAnalyzer.tokenStream(String fieldName, Reader reader) |
TokenStream | ReusableAnalyzerBase.tokenStream(String fieldName, Reader reader): This method uses ReusableAnalyzerBase.createComponents(String, Reader) to obtain an instance of ReusableAnalyzerBase.TokenStreamComponents and returns the sink of the components. |
TokenStream | PerFieldAnalyzerWrapper.tokenStream(String fieldName, Reader reader) |
TokenStream | LimitTokenCountAnalyzer.tokenStream(String fieldName, Reader reader) |
abstract TokenStream | Analyzer.tokenStream(String fieldName, Reader reader): Creates a TokenStream which tokenizes all the text in the provided Reader. |
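
The tokenStream and reusableTokenStream methods above are the producer side of the API; callers drive the stream themselves. A minimal sketch of the standard consumer loop, assuming Lucene 3.6, its StandardAnalyzer, and an illustrative field name "body":

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class ConsumeTokenStream {
  public static void main(String[] args) throws IOException {
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
    // Obtain a (possibly thread-reused) TokenStream for a field's text.
    TokenStream ts = analyzer.reusableTokenStream("body",
        new StringReader("The Quick Brown Fox"));
    // Register interest in the term text before consuming.
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset();                    // always reset before incrementToken()
    while (ts.incrementToken()) {  // advance to the next token
      System.out.println(term.toString());
    }
    ts.end();                      // records the final offset state
    ts.close();
  }
}
```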

Modifier and Type | Method and Description |
---|---|
static void | BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output) |
static void | BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] posIncrements) |
static void | BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets) |
static void | BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets, int[] posIncrements) |
static void | BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets, int[] posIncrements, int[] posLengths, Integer finalOffset) |
static void | BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets, int[] posIncrements, Integer finalOffset) |
static void | BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets, Integer finalOffset) |
static void | BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets, String[] types, int[] posIncrements) |
static void | BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets, String[] types, int[] posIncrements, int[] posLengths, Integer finalOffset) |
static void | BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets, String[] types, int[] posIncrements, int[] posLengths, Integer finalOffset, boolean offsetsAreCorrect) |
static void | BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, int[] startOffsets, int[] endOffsets, String[] types, int[] posIncrements, Integer finalOffset) |
static void | BaseTokenStreamTestCase.assertTokenStreamContents(TokenStream ts, String[] output, String[] types) |
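
All of these overloads drain the stream and compare what it produces against the expected arrays. A sketch of how a test might use one of the offset-checking forms, assuming the Lucene 3.6 test framework is on the classpath; the input string and offsets shown are illustrative:

```java
import java.io.StringReader;

import org.apache.lucene.analysis.BaseTokenStreamTestCase;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

public class LowerCaseFilterTest extends BaseTokenStreamTestCase {
  public void testLowerCasing() throws Exception {
    TokenStream ts = new LowerCaseFilter(Version.LUCENE_36,
        new WhitespaceTokenizer(Version.LUCENE_36, new StringReader("FOO Bar")));
    // Checks term text plus start/end offsets in one call.
    assertTokenStreamContents(ts,
        new String[] { "foo", "bar" },
        new int[] { 0, 4 },   // start offsets
        new int[] { 3, 7 });  // end offsets
  }
}
```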

Constructor and Description |
---|
ASCIIFoldingFilter(TokenStream input) |
CachingTokenFilter(TokenStream input) |
FilteringTokenFilter(boolean enablePositionIncrements, TokenStream input) |
ISOLatin1AccentFilter(TokenStream input): Deprecated. |
KeywordMarkerFilter(TokenStream in, CharArraySet keywordSet): Create a new KeywordMarkerFilter that marks the current token as a keyword, via the KeywordAttribute, if the token's term buffer is contained in the given set. |
KeywordMarkerFilter(TokenStream in, Set<?> keywordSet): Create a new KeywordMarkerFilter that marks the current token as a keyword, via the KeywordAttribute, if the token's term buffer is contained in the given set. |
LengthFilter(boolean enablePositionIncrements, TokenStream in, int min, int max): Build a filter that removes words that are too long or too short from the text. |
LengthFilter(TokenStream in, int min, int max): Deprecated. |
LimitTokenCountFilter(TokenStream in, int maxTokenCount): Build a filter that only accepts tokens up to a maximum number. |
LookaheadTokenFilter(TokenStream input) |
LowerCaseFilter(TokenStream in): Deprecated. |
LowerCaseFilter(Version matchVersion, TokenStream in): Create a new LowerCaseFilter that normalizes token text to lower case. |
MockFixedLengthPayloadFilter(Random random, TokenStream in, int length) |
MockGraphTokenFilter(Random random, TokenStream input) |
MockHoleInjectingTokenFilter(Random random, TokenStream in) |
MockRandomLookaheadTokenFilter(Random random, TokenStream in) |
MockVariableLengthPayloadFilter(Random random, TokenStream in) |
PorterStemFilter(TokenStream in) |
ReusableAnalyzerBase.TokenStreamComponents(Tokenizer source, TokenStream result): Creates a new ReusableAnalyzerBase.TokenStreamComponents instance. |
StopFilter(boolean enablePositionIncrements, TokenStream in, Set<?> stopWords): Deprecated. Use StopFilter.StopFilter(Version, TokenStream, Set) instead. |
StopFilter(boolean enablePositionIncrements, TokenStream input, Set<?> stopWords, boolean ignoreCase): Deprecated. Use StopFilter.StopFilter(Version, TokenStream, Set) instead. |
StopFilter(Version matchVersion, TokenStream in, Set<?> stopWords): Constructs a filter which removes words from the input TokenStream that are named in the Set. |
StopFilter(Version matchVersion, TokenStream input, Set<?> stopWords, boolean ignoreCase): Deprecated. Use StopFilter.StopFilter(Version, TokenStream, Set) instead. |
TeeSinkTokenFilter(TokenStream input): Instantiates a new TeeSinkTokenFilter. |
TokenFilter(TokenStream input): Construct a token stream filtering the given input. |
TokenStreamToDot(String inputText, TokenStream in, PrintWriter out): If inputText is non-null, and the TokenStream has offsets, we include the surface form in each arc's label. |
TypeTokenFilter(boolean enablePositionIncrements, TokenStream input, Set<String> stopTypes) |
TypeTokenFilter(boolean enablePositionIncrements, TokenStream input, Set<String> stopTypes, boolean useWhiteList) |
ValidatingTokenFilter(TokenStream in, String name, boolean offsetsAreCorrect): The name arg is used to identify this stage when throwing exceptions (useful if you have more than one instance in your chain). |
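
These constructors illustrate the wrapping pattern: every TokenFilter takes the TokenStream it consumes as a constructor argument, so an analysis chain is built inside-out. A small sketch, assuming Lucene 3.6 and its bundled English stop set:

```java
import java.io.StringReader;

import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

public class ChainDemo {
  public static TokenStream buildChain(String text) {
    // The Tokenizer produces the raw tokens; each TokenFilter wraps the
    // previous TokenStream, forming the analysis chain.
    TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_36, new StringReader(text));
    ts = new LowerCaseFilter(Version.LUCENE_36, ts);
    ts = new StopFilter(Version.LUCENE_36, ts, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
    return ts;
  }
}
```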

Modifier and Type | Class and Description |
---|---|
class | ArabicLetterTokenizer: Deprecated. (3.1) Use StandardTokenizer instead. |
class | ArabicNormalizationFilter: A TokenFilter that applies ArabicNormalizer to normalize the orthography. |
class | ArabicStemFilter: A TokenFilter that applies ArabicStemmer to stem Arabic words. |

Constructor and Description |
---|
ArabicNormalizationFilter(TokenStream input) |
ArabicStemFilter(TokenStream input) |

Modifier and Type | Class and Description |
---|---|
class | BulgarianStemFilter: A TokenFilter that applies BulgarianStemmer to stem Bulgarian words. |

Constructor and Description |
---|
BulgarianStemFilter(TokenStream input) |

Modifier and Type | Class and Description |
---|---|
class | BrazilianStemFilter: A TokenFilter that applies BrazilianStemmer. |

Constructor and Description |
---|
BrazilianStemFilter(TokenStream in): Creates a new BrazilianStemFilter. |
BrazilianStemFilter(TokenStream in, Set<?> exclusiontable): Deprecated. Use KeywordAttribute with KeywordMarkerFilter instead. |

Modifier and Type | Class and Description |
---|---|
class | CJKBigramFilter: Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer. |
class | CJKTokenizer: Deprecated. Use StandardTokenizer, CJKWidthFilter, CJKBigramFilter, and LowerCaseFilter instead. |
class | CJKWidthFilter: A TokenFilter that normalizes CJK width differences: folds fullwidth ASCII variants into the equivalent Basic Latin, and folds halfwidth Katakana variants into the equivalent kana. NOTE: this filter can be viewed as a (practical) subset of NFKC/NFKD Unicode normalization. |

Constructor and Description |
---|
CJKBigramFilter(TokenStream in) |
CJKBigramFilter(TokenStream in, int flags): Create a new CJKBigramFilter, specifying which writing systems should be bigrammed. |
CJKWidthFilter(TokenStream input) |

Modifier and Type | Class and Description |
---|---|
class | ChineseFilter: Deprecated. Use StopFilter instead, which has the same functionality. This filter will be removed in Lucene 5.0. |
class | ChineseTokenizer: Deprecated. Use StandardTokenizer instead, which has the same functionality. This filter will be removed in Lucene 5.0. |

Constructor and Description |
---|
ChineseFilter(TokenStream in): Deprecated. |

Modifier and Type | Class and Description |
---|---|
class | SentenceTokenizer: Tokenizes input text into sentences. |
class | WordTokenFilter: A TokenFilter that breaks sentences into words. |

Modifier and Type | Method and Description |
---|---|
TokenStream | SmartChineseAnalyzer.reusableTokenStream(String fieldName, Reader reader) |
TokenStream | SmartChineseAnalyzer.tokenStream(String fieldName, Reader reader) |

Constructor and Description |
---|
WordTokenFilter(TokenStream in): Construct a new WordTokenFilter. |

Modifier and Type | Class and Description |
---|---|
class | CompoundWordTokenFilterBase: Base class for decomposition token filters. |
class | DictionaryCompoundWordTokenFilter: A TokenFilter that decomposes compound words found in many Germanic languages. |
class | HyphenationCompoundWordTokenFilter: A TokenFilter that decomposes compound words found in many Germanic languages. |

Constructor and Description |
---|
CompoundWordTokenFilterBase(TokenStream input, Set<?> dictionary): Deprecated. |
CompoundWordTokenFilterBase(TokenStream input, Set<?> dictionary, boolean onlyLongestMatch): Deprecated. |
CompoundWordTokenFilterBase(TokenStream input, Set<?> dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) |
CompoundWordTokenFilterBase(TokenStream input, String[] dictionary): Deprecated. |
CompoundWordTokenFilterBase(TokenStream input, String[] dictionary, boolean onlyLongestMatch): Deprecated. |
CompoundWordTokenFilterBase(TokenStream input, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) |
CompoundWordTokenFilterBase(Version matchVersion, TokenStream input, Set<?> dictionary) |
CompoundWordTokenFilterBase(Version matchVersion, TokenStream input, Set<?> dictionary, boolean onlyLongestMatch) |
CompoundWordTokenFilterBase(Version matchVersion, TokenStream input, Set<?> dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) |
CompoundWordTokenFilterBase(Version matchVersion, TokenStream input, String[] dictionary) |
CompoundWordTokenFilterBase(Version matchVersion, TokenStream input, String[] dictionary, boolean onlyLongestMatch) |
CompoundWordTokenFilterBase(Version matchVersion, TokenStream input, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) |
DictionaryCompoundWordTokenFilter(TokenStream input, Set dictionary): Deprecated. |
DictionaryCompoundWordTokenFilter(TokenStream input, Set dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) |
DictionaryCompoundWordTokenFilter(TokenStream input, String[] dictionary) |
DictionaryCompoundWordTokenFilter(TokenStream input, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) |
DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, Set<?> dictionary): Creates a new DictionaryCompoundWordTokenFilter. |
DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, Set<?> dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch): Creates a new DictionaryCompoundWordTokenFilter. |
DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, String[] dictionary): Deprecated. Use the constructors taking Set. |
DictionaryCompoundWordTokenFilter(Version matchVersion, TokenStream input, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch): Deprecated. Use the constructors taking Set. |
HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, Set<?> dictionary) |
HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, Set<?> dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) |
HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, String[] dictionary) |
HyphenationCompoundWordTokenFilter(TokenStream input, HyphenationTree hyphenator, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch) |
HyphenationCompoundWordTokenFilter(Version matchVersion, TokenStream input, HyphenationTree hyphenator): Create a HyphenationCompoundWordTokenFilter with no dictionary. |
HyphenationCompoundWordTokenFilter(Version matchVersion, TokenStream input, HyphenationTree hyphenator, int minWordSize, int minSubwordSize, int maxSubwordSize): Create a HyphenationCompoundWordTokenFilter with no dictionary. |
HyphenationCompoundWordTokenFilter(Version matchVersion, TokenStream input, HyphenationTree hyphenator, Set<?> dictionary): Creates a new HyphenationCompoundWordTokenFilter instance. |
HyphenationCompoundWordTokenFilter(Version matchVersion, TokenStream input, HyphenationTree hyphenator, Set<?> dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch): Creates a new HyphenationCompoundWordTokenFilter instance. |
HyphenationCompoundWordTokenFilter(Version matchVersion, TokenStream input, HyphenationTree hyphenator, String[] dictionary): Deprecated. Use the constructors taking Set. |
HyphenationCompoundWordTokenFilter(Version matchVersion, TokenStream input, HyphenationTree hyphenator, String[] dictionary, int minWordSize, int minSubwordSize, int maxSubwordSize, boolean onlyLongestMatch): Deprecated. Use the constructors taking Set. |
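
Both decompounders keep the original compound token and inject the word parts they find at the same position. A toy sketch of the dictionary-based variant, assuming Lucene 3.6; the three-entry dictionary is purely illustrative, and the input is assumed to be lowercased so it matches the dictionary entries:

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.compound.DictionaryCompoundWordTokenFilter;
import org.apache.lucene.util.Version;

public class CompoundDemo {
  public static TokenStream decompose(String lowercasedText) {
    // Hypothetical toy dictionary; a real one lists thousands of word parts.
    Set<String> dict = new HashSet<String>(
        Arrays.asList("donau", "dampf", "schiff"));
    TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_36,
        new StringReader(lowercasedText));
    // Emits the original compound plus any dictionary parts found inside it,
    // e.g. "donaudampfschiff" -> donaudampfschiff, donau, dampf, schiff.
    return new DictionaryCompoundWordTokenFilter(Version.LUCENE_36, ts, dict);
  }
}
```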

Modifier and Type | Class and Description |
---|---|
class | CzechStemFilter: A TokenFilter that applies CzechStemmer to stem Czech words. |

Constructor and Description |
---|
CzechStemFilter(TokenStream input) |

Modifier and Type | Class and Description |
---|---|
class | GermanLightStemFilter: A TokenFilter that applies GermanLightStemmer to stem German words. |
class | GermanMinimalStemFilter: A TokenFilter that applies GermanMinimalStemmer to stem German words. |
class | GermanNormalizationFilter: Normalizes German characters according to the heuristics of the German2 snowball algorithm. |
class | GermanStemFilter: A TokenFilter that stems German words. |

Constructor and Description |
---|
GermanLightStemFilter(TokenStream input) |
GermanMinimalStemFilter(TokenStream input) |
GermanNormalizationFilter(TokenStream input) |
GermanStemFilter(TokenStream in): Creates a GermanStemFilter instance. |
GermanStemFilter(TokenStream in, Set<?> exclusionSet): Deprecated. Use KeywordAttribute with KeywordMarkerFilter instead. |

Modifier and Type | Class and Description |
---|---|
class | GreekLowerCaseFilter: Normalizes token text to lower case, removes some Greek diacritics, and standardizes final sigma to sigma. |
class | GreekStemFilter: A TokenFilter that applies GreekStemmer to stem Greek words. |

Constructor and Description |
---|
GreekLowerCaseFilter(TokenStream in): Deprecated. |
GreekLowerCaseFilter(Version matchVersion, TokenStream in): Create a GreekLowerCaseFilter that normalizes Greek token text. |
GreekStemFilter(TokenStream input) |

Modifier and Type | Class and Description |
---|---|
class | EnglishMinimalStemFilter: A TokenFilter that applies EnglishMinimalStemmer to stem English words. |
class | EnglishPossessiveFilter: TokenFilter that removes possessives (trailing 's) from words. |
class | KStemFilter: A high-performance kstem filter for English. |

Constructor and Description |
---|
EnglishMinimalStemFilter(TokenStream input) |
EnglishPossessiveFilter(TokenStream input): Deprecated. |
EnglishPossessiveFilter(Version version, TokenStream input) |
KStemFilter(TokenStream in) |

Modifier and Type | Class and Description |
---|---|
class | SpanishLightStemFilter: A TokenFilter that applies SpanishLightStemmer to stem Spanish words. |

Constructor and Description |
---|
SpanishLightStemFilter(TokenStream input) |

Modifier and Type | Class and Description |
---|---|
class | PersianNormalizationFilter: A TokenFilter that applies PersianNormalizer to normalize the orthography. |

Constructor and Description |
---|
PersianNormalizationFilter(TokenStream input) |

Modifier and Type | Class and Description |
---|---|
class | FinnishLightStemFilter: A TokenFilter that applies FinnishLightStemmer to stem Finnish words. |

Constructor and Description |
---|
FinnishLightStemFilter(TokenStream input) |

Modifier and Type | Class and Description |
---|---|
class | ElisionFilter: Removes elisions from a TokenStream. |
class | FrenchLightStemFilter: A TokenFilter that applies FrenchLightStemmer to stem French words. |
class | FrenchMinimalStemFilter: A TokenFilter that applies FrenchMinimalStemmer to stem French words. |
class | FrenchStemFilter: Deprecated. Use SnowballFilter with FrenchStemmer instead, which has the same functionality. This filter will be removed in Lucene 5.0. |

Constructor and Description |
---|
ElisionFilter(TokenStream input): Deprecated. |
ElisionFilter(TokenStream input, Set<?> articles): Deprecated. |
ElisionFilter(TokenStream input, String[] articles): Deprecated. |
ElisionFilter(Version matchVersion, TokenStream input): Constructs an elision filter with standard stop words. |
ElisionFilter(Version matchVersion, TokenStream input, Set<?> articles): Constructs an elision filter with a Set of stop words. |
FrenchLightStemFilter(TokenStream input) |
FrenchMinimalStemFilter(TokenStream input) |
FrenchStemFilter(TokenStream in): Deprecated. |
FrenchStemFilter(TokenStream in, Set<?> exclusiontable): Deprecated. Use KeywordAttribute with KeywordMarkerFilter instead. |

Modifier and Type | Class and Description |
---|---|
class | IrishLowerCaseFilter: Normalises token text to lower case, handling t-prothesis and n-eclipsis (i.e., that 'nAthair' should become 'n-athair'). |

Constructor and Description |
---|
IrishLowerCaseFilter(TokenStream in): Create an IrishLowerCaseFilter that normalises Irish token text. |

Modifier and Type | Class and Description |
---|---|
class | GalicianMinimalStemFilter: A TokenFilter that applies GalicianMinimalStemmer to stem Galician words. |
class | GalicianStemFilter: A TokenFilter that applies GalicianStemmer to stem Galician words. |

Constructor and Description |
---|
GalicianMinimalStemFilter(TokenStream input) |
GalicianStemFilter(TokenStream input) |

Modifier and Type | Class and Description |
---|---|
class | HindiNormalizationFilter: A TokenFilter that applies HindiNormalizer to normalize the orthography. |
class | HindiStemFilter: A TokenFilter that applies HindiStemmer to stem Hindi words. |

Constructor and Description |
---|
HindiNormalizationFilter(TokenStream input) |
HindiStemFilter(TokenStream input) |

Modifier and Type | Class and Description |
---|---|
class | HungarianLightStemFilter: A TokenFilter that applies HungarianLightStemmer to stem Hungarian words. |

Constructor and Description |
---|
HungarianLightStemFilter(TokenStream input) |

Modifier and Type | Class and Description |
---|---|
class | HunspellStemFilter: TokenFilter that uses hunspell affix rules and words to stem tokens. |

Constructor and Description |
---|
HunspellStemFilter(TokenStream input, HunspellDictionary dictionary): Creates a new HunspellStemFilter that will stem tokens from the given TokenStream using affix rules in the provided HunspellDictionary. |
HunspellStemFilter(TokenStream input, HunspellDictionary dictionary, boolean dedup): Creates a new HunspellStemFilter that will stem tokens from the given TokenStream using affix rules in the provided HunspellDictionary. |

Modifier and Type | Class and Description |
---|---|
class | ICUFoldingFilter: A TokenFilter that applies search term folding to Unicode text, applying foldings from UTR#30 Character Foldings. |
class | ICUNormalizer2Filter: Normalizes token text with ICU's Normalizer2. With this filter, you can normalize text in the following ways: NFKC normalization, case folding, and removal of ignorables (the default); a standard normalization mode (NFC, NFD, NFKC, NFKD); or rules from a custom normalization mapping. |
class | ICUTransformFilter: A TokenFilter that transforms text with ICU. |

Constructor and Description |
---|
ICUFoldingFilter(TokenStream input): Create a new ICUFoldingFilter on the specified input. |
ICUNormalizer2Filter(TokenStream input): Create a new Normalizer2Filter that combines NFKC normalization, case folding, and removal of default ignorables (NFKC_Casefold). |
ICUNormalizer2Filter(TokenStream input, com.ibm.icu.text.Normalizer2 normalizer): Create a new Normalizer2Filter with the specified Normalizer2. |
ICUTransformFilter(TokenStream input, com.ibm.icu.text.Transliterator transform): Create a new ICUTransformFilter that transforms text on the given stream. |

Modifier and Type | Class and Description |
---|---|
class | ICUTokenizer: Breaks text into words according to UAX #29: Unicode Text Segmentation (http://www.unicode.org/reports/tr29/). Words are broken across script boundaries, then segmented according to the BreakIterator and typing provided by the ICUTokenizerConfig. |

Modifier and Type | Class and Description |
---|---|
class | IndonesianStemFilter: A TokenFilter that applies IndonesianStemmer to stem Indonesian words. |

Constructor and Description |
---|
IndonesianStemFilter(TokenStream input) |
IndonesianStemFilter(TokenStream input, boolean stemDerivational): Create a new IndonesianStemFilter. |

Modifier and Type | Class and Description |
---|---|
class | IndicNormalizationFilter: A TokenFilter that applies IndicNormalizer to normalize text in Indian languages. |
class | IndicTokenizer: Deprecated. (3.6) Use StandardTokenizer instead. |

Constructor and Description |
---|
IndicNormalizationFilter(TokenStream input) |

Modifier and Type | Class and Description |
---|---|
class | ItalianLightStemFilter: A TokenFilter that applies ItalianLightStemmer to stem Italian words. |

Constructor and Description |
---|
ItalianLightStemFilter(TokenStream input) |

Modifier and Type | Class and Description |
---|---|
class | JapaneseBaseFormFilter: Replaces term text with the BaseFormAttribute. |
class | JapaneseKatakanaStemFilter: A TokenFilter that normalizes common katakana spelling variations ending in a long sound character by removing this character (U+30FC). |
class | JapanesePartOfSpeechStopFilter: Removes tokens that match a set of part-of-speech tags. |
class | JapaneseReadingFormFilter: A TokenFilter that replaces the term attribute with the reading of a token in either katakana or romaji form. |
class | JapaneseTokenizer: Tokenizer for Japanese that uses morphological analysis. |

Constructor and Description |
---|
JapaneseBaseFormFilter(TokenStream input) |
JapaneseKatakanaStemFilter(TokenStream input) |
JapaneseKatakanaStemFilter(TokenStream input, int minimumLength) |
JapanesePartOfSpeechStopFilter(boolean enablePositionIncrements, TokenStream input, Set<String> stopTags) |
JapaneseReadingFormFilter(TokenStream input) |
JapaneseReadingFormFilter(TokenStream input, boolean useRomaji) |

Modifier and Type | Class and Description |
---|---|
class | LatvianStemFilter: A TokenFilter that applies LatvianStemmer to stem Latvian words. |

Constructor and Description |
---|
LatvianStemFilter(TokenStream input) |

Modifier and Type | Class and Description |
---|---|
class | EmptyTokenStream: An always exhausted token stream. |
class | PrefixAndSuffixAwareTokenFilter: Links two PrefixAwareTokenFilters. |
class | PrefixAwareTokenFilter: Joins two token streams and leaves the last token of the first stream available to be used when updating the token values in the second stream based on that token. |
class | SingleTokenTokenStream: A TokenStream containing a single token. |
class | StemmerOverrideFilter: Provides the ability to override any KeywordAttribute-aware stemmer with custom dictionary-based stemming. |

Modifier and Type | Method and Description |
---|---|
TokenStream | PrefixAwareTokenFilter.getPrefix() |
TokenStream | PrefixAwareTokenFilter.getSuffix() |

Modifier and Type | Method and Description |
---|---|
void | PrefixAwareTokenFilter.setPrefix(TokenStream prefix) |
void | PrefixAwareTokenFilter.setSuffix(TokenStream suffix) |

Constructor and Description |
---|
PrefixAndSuffixAwareTokenFilter(TokenStream prefix, TokenStream input, TokenStream suffix) |
PrefixAwareTokenFilter(TokenStream prefix, TokenStream suffix) |
StemmerOverrideFilter(Version matchVersion, TokenStream input, Map<?,String> dictionary): Create a new StemmerOverrideFilter, performing dictionary-based stemming with the provided dictionary. |

Modifier and Type | Class and Description |
---|---|
class | EdgeNGramTokenFilter: Tokenizes the given token into n-grams of given size(s). |
class | EdgeNGramTokenizer: Tokenizes the input from an edge into n-grams of given size(s). |
class | NGramTokenFilter: Tokenizes the input into n-grams of the given size(s). |
class | NGramTokenizer: Tokenizes the input into n-grams of the given size(s). |

Constructor and Description |
---|
EdgeNGramTokenFilter(TokenStream input, EdgeNGramTokenFilter.Side side, int minGram, int maxGram): Creates an EdgeNGramTokenFilter that can generate n-grams in the sizes of the given range. |
EdgeNGramTokenFilter(TokenStream input, String sideLabel, int minGram, int maxGram): Creates an EdgeNGramTokenFilter that can generate n-grams in the sizes of the given range. |
NGramTokenFilter(TokenStream input): Creates an NGramTokenFilter with default min and max n-grams. |
NGramTokenFilter(TokenStream input, int minGram, int maxGram): Creates an NGramTokenFilter with given min and max n-grams. |
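
A sketch of edge n-grams for prefix-style matching, assuming Lucene 3.x, where KeywordTokenizer emits the whole input as one token that the filter then slices:

```java
import java.io.StringReader;

import org.apache.lucene.analysis.KeywordTokenizer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter;

public class NGramDemo {
  public static TokenStream prefixGrams(String term) {
    // One token in, a run of front-anchored grams out:
    // "lucene" -> "l", "lu", "luc" (minGram=1, maxGram=3).
    TokenStream ts = new KeywordTokenizer(new StringReader(term));
    return new EdgeNGramTokenFilter(ts, EdgeNGramTokenFilter.Side.FRONT, 1, 3);
  }
}
```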

Modifier and Type | Class and Description |
---|---|
class | DutchStemFilter: Deprecated. Use SnowballFilter with DutchStemmer instead, which has the same functionality. This filter will be removed in Lucene 5.0. |

Constructor and Description |
---|
DutchStemFilter(TokenStream _in): Deprecated. |
DutchStemFilter(TokenStream _in, Map<?,?> stemdictionary): Deprecated. |
DutchStemFilter(TokenStream _in, Set<?> exclusiontable): Deprecated. Use KeywordAttribute with KeywordMarkerFilter instead. |
DutchStemFilter(TokenStream _in, Set<?> exclusiontable, Map<?,?> stemdictionary): Deprecated. Use KeywordAttribute with KeywordMarkerFilter instead. |

Modifier and Type | Class and Description |
---|---|
class | NorwegianLightStemFilter: A TokenFilter that applies NorwegianLightStemmer to stem Norwegian words. |
class | NorwegianMinimalStemFilter: A TokenFilter that applies NorwegianMinimalStemmer to stem Norwegian words. |

Constructor and Description |
---|
NorwegianLightStemFilter(TokenStream input) |
NorwegianMinimalStemFilter(TokenStream input) |

Modifier and Type | Class and Description |
---|---|
class | PathHierarchyTokenizer: Tokenizer for path-like hierarchies. |
class | ReversePathHierarchyTokenizer: Tokenizer for domain-like hierarchies. |

Modifier and Type | Class and Description |
---|---|
class | DelimitedPayloadTokenFilter: Characters before the delimiter are the "token", those after are the payload. |
class | NumericPayloadTokenFilter: Assigns a payload to a token based on the Token.type(). |
class | TokenOffsetPayloadTokenFilter: Adds the token's Token.setStartOffset(int) and Token.setEndOffset(int) values as its payload; the first 4 bytes are the start offset. |
class | TypeAsPayloadTokenFilter: Makes the Token.type() a payload. |

Constructor and Description |
---|
DelimitedPayloadTokenFilter(TokenStream input, char delimiter, PayloadEncoder encoder) |
NumericPayloadTokenFilter(TokenStream input, float payload, String typeMatch) |
TokenOffsetPayloadTokenFilter(TokenStream input) |
TypeAsPayloadTokenFilter(TokenStream input) |

Modifier and Type | Class and Description |
---|---|
class | BeiderMorseFilter: TokenFilter for Beider-Morse phonetic encoding. |
class | DoubleMetaphoneFilter: Filter for DoubleMetaphone (supporting secondary codes). |
class | PhoneticFilter: Create tokens for phonetic matches. |

Constructor and Description |
---|
BeiderMorseFilter(TokenStream input, org.apache.commons.codec.language.bm.PhoneticEngine engine) |
BeiderMorseFilter(TokenStream input, org.apache.commons.codec.language.bm.PhoneticEngine engine, org.apache.commons.codec.language.bm.Languages.LanguageSet languages): Create a new BeiderMorseFilter. |
DoubleMetaphoneFilter(TokenStream input, int maxCodeLength, boolean inject) |
PhoneticFilter(TokenStream in, org.apache.commons.codec.Encoder encoder, boolean inject) |

Modifier and Type | Class and Description |
---|---|
class | PositionFilter: Sets the positionIncrement of all tokens to the given "positionIncrement", except the first returned token, which retains its original positionIncrement value. |

Constructor and Description |
---|
PositionFilter(TokenStream input): Constructs a PositionFilter that assigns a position increment of zero to all but the first token from the given input stream. |
PositionFilter(TokenStream input, int positionIncrement): Constructs a PositionFilter that assigns the given position increment to all but the first token from the given input stream. |

Modifier and Type | Class and Description |
---|---|
class | PortugueseLightStemFilter: A TokenFilter that applies PortugueseLightStemmer to stem Portuguese words. |
class | PortugueseMinimalStemFilter: A TokenFilter that applies PortugueseMinimalStemmer to stem Portuguese words. |
class | PortugueseStemFilter: A TokenFilter that applies PortugueseStemmer to stem Portuguese words. |

Constructor and Description |
---|
PortugueseLightStemFilter(TokenStream input) |
PortugueseMinimalStemFilter(TokenStream input) |
PortugueseStemFilter(TokenStream input) |

Modifier and Type | Method and Description |
---|---|
TokenStream | QueryAutoStopWordAnalyzer.reusableTokenStream(String fieldName, Reader reader) |
TokenStream | QueryAutoStopWordAnalyzer.tokenStream(String fieldName, Reader reader) |

Modifier and Type | Class and Description |
---|---|
class | ReverseStringFilter: Reverses the token string; for example, "country" => "yrtnuoc". |

Constructor and Description |
---|
ReverseStringFilter(TokenStream in): Deprecated. Use ReverseStringFilter.ReverseStringFilter(Version, TokenStream) instead. This constructor will be removed in Lucene 4.0. |
ReverseStringFilter(TokenStream in, char marker): Deprecated. Use ReverseStringFilter.ReverseStringFilter(Version, TokenStream, char) instead. This constructor will be removed in Lucene 4.0. |
ReverseStringFilter(Version matchVersion, TokenStream in): Create a new ReverseStringFilter that reverses all tokens in the supplied TokenStream. |
ReverseStringFilter(Version matchVersion, TokenStream in, char marker): Create a new ReverseStringFilter that reverses and marks all tokens in the supplied TokenStream. |

Modifier and Type | Class and Description |
---|---|
class | RussianLetterTokenizer: Deprecated. Use StandardTokenizer instead, which has the same functionality. This filter will be removed in Lucene 5.0. |
class | RussianLightStemFilter: A TokenFilter that applies RussianLightStemmer to stem Russian words. |
class | RussianLowerCaseFilter: Deprecated. Use LowerCaseFilter instead, which has the same functionality. This filter will be removed in Lucene 4.0. |
class | RussianStemFilter: Deprecated. Use SnowballFilter with RussianStemmer instead, which has the same functionality. This filter will be removed in Lucene 4.0. |

Constructor and Description |
---|
RussianLightStemFilter(TokenStream input) |
RussianLowerCaseFilter(TokenStream in): Deprecated. |
RussianStemFilter(TokenStream in): Deprecated. |

Modifier and Type | Class and Description |
---|---|
class | ShingleFilter: A ShingleFilter constructs shingles (token n-grams) from a token stream. |
class | ShingleMatrixFilter: Deprecated. Will be removed in Lucene 4.0. This filter is unmaintained and might not behave correctly if used with custom Attributes, i.e. Attributes other than the ones located in org.apache.lucene.analysis.tokenattributes. It also uses hardcoded payload encoders, which makes it not easily adaptable to other use cases. |

Modifier and Type | Method and Description |
---|---|
TokenStream | ShingleAnalyzerWrapper.reusableTokenStream(String fieldName, Reader reader) |
TokenStream | ShingleAnalyzerWrapper.tokenStream(String fieldName, Reader reader) |

Constructor and Description |
---|
ShingleFilter(TokenStream input): Construct a ShingleFilter with default shingle size: 2. |
ShingleFilter(TokenStream input, int maxShingleSize): Constructs a ShingleFilter with the specified shingle size from the TokenStream input. |
ShingleFilter(TokenStream input, int minShingleSize, int maxShingleSize): Constructs a ShingleFilter with the specified shingle sizes from the TokenStream input. |
ShingleFilter(TokenStream input, String tokenType): Construct a ShingleFilter with the specified token type for shingle tokens and the default shingle size: 2. |
ShingleMatrixFilter(TokenStream input, int minimumShingleSize, int maximumShingleSize): Deprecated. Creates a shingle filter using default settings. |
ShingleMatrixFilter(TokenStream input, int minimumShingleSize, int maximumShingleSize, Character spacerCharacter): Deprecated. Creates a shingle filter using default settings. |
ShingleMatrixFilter(TokenStream input, int minimumShingleSize, int maximumShingleSize, Character spacerCharacter, boolean ignoringSinglePrefixOrSuffixShingle): Deprecated. Creates a shingle filter using the default ShingleMatrixFilter.TokenSettingsCodec. |
ShingleMatrixFilter(TokenStream input, int minimumShingleSize, int maximumShingleSize, Character spacerCharacter, boolean ignoringSinglePrefixOrSuffixShingle, ShingleMatrixFilter.TokenSettingsCodec settingsCodec): Deprecated. Creates a shingle filter with ad hoc parameter settings. |
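
A sketch of shingling on top of whitespace tokens, assuming Lucene 3.6; the maximum shingle size of 3 is an arbitrary illustrative choice:

```java
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.shingle.ShingleFilter;
import org.apache.lucene.util.Version;

public class ShingleDemo {
  public static TokenStream shingles(String text) {
    TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_36, new StringReader(text));
    // Max shingle size 3: "please divide this" yields the unigrams plus
    // "please divide", "divide this", and "please divide this".
    return new ShingleFilter(ts, 3);
  }
}
```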

Modifier and Type | Class and Description |
---|---|
class | SnowballFilter: A filter that stems words using a Snowball-generated stemmer. |

Modifier and Type | Method and Description |
---|---|
TokenStream | SnowballAnalyzer.reusableTokenStream(String fieldName, Reader reader): Deprecated. Returns a (possibly reused) StandardTokenizer filtered by a StandardFilter, a LowerCaseFilter, a StopFilter, and a SnowballFilter. |
TokenStream | SnowballAnalyzer.tokenStream(String fieldName, Reader reader): Deprecated. Constructs a StandardTokenizer filtered by a StandardFilter, a LowerCaseFilter, a StopFilter, and a SnowballFilter. |

Constructor and Description |
---|
SnowballFilter(TokenStream input, SnowballProgram stemmer) |
SnowballFilter(TokenStream in, String name): Construct the named stemming filter. |
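
The String-named constructor selects the Snowball-generated stemmer by language name. A sketch, assuming Lucene 3.6 and lowercased input (Snowball stemmers expect lower-case terms):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.snowball.SnowballFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

public class SnowballDemo {
  public static TokenStream englishStems(String text) {
    TokenStream ts = new StandardTokenizer(Version.LUCENE_36, new StringReader(text));
    // Lowercase first so the stemmer sees normalized terms.
    ts = new LowerCaseFilter(Version.LUCENE_36, ts);
    // "English" names the Snowball-generated English stemmer:
    // "running" -> "run", "cats" -> "cat".
    return new SnowballFilter(ts, "English");
  }
}
```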

Modifier and Type | Class and Description |
---|---|
class | ClassicFilter: Normalizes tokens extracted with ClassicTokenizer. |
class | ClassicTokenizer: A grammar-based tokenizer constructed with JFlex. This should be a good tokenizer for most European-language documents: it splits words at punctuation characters, removing punctuation. |
class | StandardFilter: Normalizes tokens extracted with StandardTokenizer. |
class | StandardTokenizer: A grammar-based tokenizer constructed with JFlex. |
class | UAX29URLEmailTokenizer: This class implements Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. URLs and email addresses are also tokenized according to the relevant RFCs. |

Constructor and Description |
---|
ClassicFilter(TokenStream in): Construct filtering in. |
StandardFilter(TokenStream in): Deprecated. Use StandardFilter.StandardFilter(Version, TokenStream) instead. |
StandardFilter(Version matchVersion, TokenStream in) |

Modifier and Type | Class and Description |
---|---|
class | StempelFilter: Transforms the token stream as per the stemming algorithm. |

Constructor and Description |
---|
StempelFilter(TokenStream in, StempelStemmer stemmer): Create filter using the supplied stemming table. |
StempelFilter(TokenStream in, StempelStemmer stemmer, int minLength): Create filter using the supplied stemming table. |

Modifier and Type | Class and Description |
---|---|
class | SwedishLightStemFilter: A TokenFilter that applies SwedishLightStemmer to stem Swedish words. |

Constructor and Description |
---|
SwedishLightStemFilter(TokenStream input) |

Modifier and Type | Class and Description |
---|---|
class | SynonymFilter: Matches single- or multi-word synonyms in a token stream. |

Constructor and Description |
---|
SynonymFilter(TokenStream input, SynonymMap synonyms, boolean ignoreCase) |

Modifier and Type | Class and Description |
---|---|
class | ThaiWordFilter: TokenFilter that uses BreakIterator to break each Token that is Thai into separate Tokens for each Thai word. |

Constructor and Description |
---|
ThaiWordFilter(TokenStream input): Deprecated. Use the ctor with matchVersion instead! |
ThaiWordFilter(Version matchVersion, TokenStream input): Creates a new ThaiWordFilter with the specified match version. |

Modifier and Type | Class and Description |
---|---|
class | TurkishLowerCaseFilter: Normalizes Turkish token text to lower case. |

Constructor and Description |
---|
TurkishLowerCaseFilter(TokenStream in): Create a new TurkishLowerCaseFilter that normalizes Turkish token text to lower case. |

Modifier and Type | Class and Description |
---|---|
class | WikipediaTokenizer: Extension of StandardTokenizer that is aware of Wikipedia syntax. |

Modifier and Type | Class and Description |
---|---|
class | CollationKeyFilter: Converts each token into its CollationKey, and then encodes the CollationKey with IndexableBinaryStringTools, to allow it to be stored as an index term. |
class | ICUCollationKeyFilter: Converts each token into its CollationKey, and then encodes the CollationKey with IndexableBinaryStringTools, to allow it to be stored as an index term. |

Modifier and Type | Method and Description |
---|---|
TokenStream | ICUCollationKeyAnalyzer.reusableTokenStream(String fieldName, Reader reader) |
TokenStream | CollationKeyAnalyzer.reusableTokenStream(String fieldName, Reader reader) |
TokenStream | ICUCollationKeyAnalyzer.tokenStream(String fieldName, Reader reader) |
TokenStream | CollationKeyAnalyzer.tokenStream(String fieldName, Reader reader) |

Constructor and Description |
---|
CollationKeyFilter(TokenStream input, Collator collator) |
ICUCollationKeyFilter(TokenStream input, com.ibm.icu.text.Collator collator) |

Modifier and Type | Field and Description |
---|---|
protected TokenStream | AbstractField.tokenStream |

Modifier and Type | Method and Description |
---|---|
TokenStream | NumericField.tokenStreamValue(): Returns a NumericTokenStream for indexing the numeric value. |
TokenStream | Fieldable.tokenStreamValue(): The TokenStream for this field to be used when indexing, or null. |
TokenStream | Field.tokenStreamValue(): The TokenStream for this field to be used when indexing, or null. |

Modifier and Type | Method and Description |
---|---|
void | Field.setTokenStream(TokenStream tokenStream): Expert: sets the token stream to be used for indexing and causes isIndexed() and isTokenized() to return true. |

Constructor and Description |
---|
Field(String name, TokenStream tokenStream): Create a tokenized and indexed field that is not stored. |
Field(String name, TokenStream tokenStream, Field.TermVector termVector): Create a tokenized and indexed field that is not stored, optionally with storing term vectors. |
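
A sketch of a pre-analyzed field built from one of the TokenStream constructors above, assuming Lucene 3.6; WhitespaceTokenizer stands in for whatever analysis the application has already performed:

```java
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.util.Version;

public class PreAnalyzedFieldDemo {
  public static Document build(String text) {
    // Supplying a TokenStream directly bypasses the IndexWriter's Analyzer
    // for this field; the field is indexed but not stored.
    TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_36, new StringReader(text));
    Document doc = new Document();
    doc.add(new Field("body", ts));
    return doc;
  }
}
```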

Modifier and Type | Class and Description |
---|---|
class | EnhancementsCategoryTokenizer: A tokenizer which adds to each category token a payload according to the CategoryEnhancements defined in the given EnhancementsIndexingParams. |

Modifier and Type | Method and Description |
---|---|
protected TokenStream | EnhancementsDocumentBuilder.getParentsStream(CategoryAttributesStream categoryAttributesStream) |

Modifier and Type | Method and Description |
---|---|
protected CategoryListTokenizer | EnhancementsDocumentBuilder.getCategoryListTokenizer(TokenStream categoryStream) |
CategoryListTokenizer | CategoryEnhancement.getCategoryListTokenizer(TokenStream tokenizer, EnhancementsIndexingParams indexingParams, TaxonomyWriter taxonomyWriter): Get the CategoryListTokenizer which generates the category list for this enhancement. |
protected CategoryTokenizer | EnhancementsDocumentBuilder.getCategoryTokenizer(TokenStream categoryStream) |

Constructor and Description |
---|
EnhancementsCategoryTokenizer(TokenStream input, EnhancementsIndexingParams indexingParams): Constructor. |

Modifier and Type | Class and Description |
---|---|
class | AssociationListTokenizer: Tokenizer for associations of a category. |

Modifier and Type | Method and Description |
---|---|
CategoryListTokenizer | AssociationEnhancement.getCategoryListTokenizer(TokenStream tokenizer, EnhancementsIndexingParams indexingParams, TaxonomyWriter taxonomyWriter) |

Constructor and Description |
---|
AssociationListTokenizer(TokenStream input, EnhancementsIndexingParams indexingParams, CategoryEnhancement enhancement) |

Modifier and Type | Method and Description |
---|---|
protected TokenStream | CategoryDocumentBuilder.getParentsStream(CategoryAttributesStream categoryAttributesStream): Get a stream of categories which includes the parents, according to policies defined in indexing parameters. |

Modifier and Type | Method and Description |
---|---|
protected CategoryListTokenizer | CategoryDocumentBuilder.getCategoryListTokenizer(TokenStream categoryStream): Get a category list tokenizer (or a series of such tokenizers) to create the category list tokens. |
protected CategoryTokenizer | CategoryDocumentBuilder.getCategoryTokenizer(TokenStream categoryStream): Get a CategoryTokenizer to create the category tokens. |
protected CountingListTokenizer | CategoryDocumentBuilder.getCountingListTokenizer(TokenStream categoryStream): Get a CountingListTokenizer for creating counting list tokens. |

Modifier and Type | Class and Description |
---|---|
class | CategoryAttributesStream: An attribute stream built from an Iterable of CategoryAttribute. |
class | CategoryListTokenizer: A base class for category list tokenizers, which add category list tokens to category streams. |
class | CategoryParentsStream: This class adds parents to a CategoryAttributesStream. |
class | CategoryTokenizer: Basic class for setting the CharTermAttributes and PayloadAttributes of category tokens. |
class | CategoryTokenizerBase: A base class for all token filters which add term and payload attributes to tokens and are to be used in CategoryDocumentBuilder. |
class | CountingListTokenizer: CategoryListTokenizer for facet counting. |

Constructor and Description |
---|
CategoryListTokenizer(TokenStream input, FacetIndexingParams indexingParams) |
CategoryTokenizer(TokenStream input, FacetIndexingParams indexingParams) |
CategoryTokenizerBase(TokenStream input, FacetIndexingParams indexingParams): Constructor. |
CountingListTokenizer(TokenStream input, FacetIndexingParams indexingParams) |

Modifier and Type | Method and Description |
---|---|
<T> TokenStream | MemoryIndex.keywordTokenStream(Collection<T> keywords): Convenience method; creates and returns a token stream that generates a token for each keyword in the given collection, "as is", without any transforming text analysis. |

Modifier and Type | Method and Description |
---|---|
void | MemoryIndex.addField(String fieldName, TokenStream stream): Equivalent to addField(fieldName, stream, 1.0f). |
void | MemoryIndex.addField(String fieldName, TokenStream stream, float boost): Iterates over the given token stream and adds the resulting terms to the index; equivalent to adding a tokenized, indexed, termVectorStored, unstored Lucene Field. |
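
A sketch of the single-document workflow MemoryIndex is designed for, assuming Lucene 3.6 with the queryParser module available; the sample text follows the class's own documentation:

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.memory.MemoryIndex;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.util.Version;

public class MemoryIndexDemo {
  public static void main(String[] args) throws Exception {
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
    MemoryIndex index = new MemoryIndex();
    // addField also accepts a TokenStream directly for pre-analyzed content.
    index.addField("content",
        "readings about Salmons and other select Alaska fishing manuals",
        analyzer);
    QueryParser parser = new QueryParser(Version.LUCENE_36, "content", analyzer);
    // search() returns a relevance score > 0 if the single document matches.
    float score = index.search(parser.parse("+salmon~ +fish*"));
    System.out.println(score > 0.0f ? "matched" : "no match");
  }
}
```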

Modifier and Type | Class and Description |
---|---|
static class | QueryParserTestBase.QPTestFilter: Filter which discards the token 'stop' and which expands the token 'phrase' into 'phrase1 phrase2'. |

Modifier and Type | Method and Description |
---|---|
TokenStream | QueryParserTestBase.QPTestAnalyzer.tokenStream(String fieldName, Reader reader) |

Constructor and Description |
---|
QueryParserTestBase.QPTestFilter(TokenStream in) |

Modifier and Type | Class and Description |
---|---|
class | OffsetLimitTokenFilter: This TokenFilter limits the number of tokens while indexing by adding up the current offset. |
class | TokenStreamFromTermPositionVector |

Modifier and Type | Method and Description |
---|---|
static TokenStream | TokenSources.getAnyTokenStream(IndexReader reader, int docId, String field, Analyzer analyzer): A convenience method that tries a number of approaches to getting a token stream. |
static TokenStream | TokenSources.getAnyTokenStream(IndexReader reader, int docId, String field, Document doc, Analyzer analyzer): A convenience method that first tries to get a TermPositionVector for the specified docId, then falls back to using the passed-in Document to retrieve the TokenStream. |
TokenStream | WeightedSpanTermExtractor.getTokenStream() |
static TokenStream | TokenSources.getTokenStream(Document doc, String field, Analyzer analyzer) |
static TokenStream | TokenSources.getTokenStream(IndexReader reader, int docId, String field) |
static TokenStream | TokenSources.getTokenStream(IndexReader reader, int docId, String field, Analyzer analyzer) |
static TokenStream | TokenSources.getTokenStream(String field, String contents, Analyzer analyzer) |
static TokenStream | TokenSources.getTokenStream(TermPositionVector tpv) |
static TokenStream | TokenSources.getTokenStream(TermPositionVector tpv, boolean tokenPositionsGuaranteedContiguous): Low level api. |
TokenStream | Scorer.init(TokenStream tokenStream): Called to init the Scorer with a TokenStream. |
TokenStream | QueryTermScorer.init(TokenStream tokenStream) |
TokenStream | QueryScorer.init(TokenStream tokenStream) |

Modifier and Type | Method and Description |
---|---|
String | Highlighter.getBestFragment(TokenStream tokenStream, String text): Highlights chosen terms in a text, extracting the most relevant section. |
String[] | Highlighter.getBestFragments(TokenStream tokenStream, String text, int maxNumFragments): Highlights chosen terms in a text, extracting the most relevant sections. |
String | Highlighter.getBestFragments(TokenStream tokenStream, String text, int maxNumFragments, String separator): Highlights terms in the text, extracting the most relevant sections and concatenating the chosen fragments with a separator (typically "..."). |
TextFragment[] | Highlighter.getBestTextFragments(TokenStream tokenStream, String text, boolean mergeContiguousFragments, int maxNumFragments): Low level api to get the most relevant (formatted) sections of the document. |
Map<String,WeightedSpanTerm> | WeightedSpanTermExtractor.getWeightedSpanTerms(Query query, TokenStream tokenStream): Creates a Map of WeightedSpanTerms from the given Query and TokenStream. |
Map<String,WeightedSpanTerm> | WeightedSpanTermExtractor.getWeightedSpanTerms(Query query, TokenStream tokenStream, String fieldName): Creates a Map of WeightedSpanTerms from the given Query and TokenStream. |
Map<String,WeightedSpanTerm> | WeightedSpanTermExtractor.getWeightedSpanTermsWithScores(Query query, TokenStream tokenStream, String fieldName, IndexReader reader): Creates a Map of WeightedSpanTerms from the given Query and TokenStream. |
TokenStream | Scorer.init(TokenStream tokenStream): Called to init the Scorer with a TokenStream. |
TokenStream | QueryTermScorer.init(TokenStream tokenStream) |
TokenStream | QueryScorer.init(TokenStream tokenStream) |
void | SimpleSpanFragmenter.start(String originalText, TokenStream tokenStream) |
void | SimpleFragmenter.start(String originalText, TokenStream stream) |
void | NullFragmenter.start(String s, TokenStream tokenStream) |
void | Fragmenter.start(String originalText, TokenStream tokenStream): Initializes the Fragmenter. |

Constructor and Description |
---|
OffsetLimitTokenFilter(TokenStream input, int offsetLimit) |
TokenGroup(TokenStream tokenStream) |
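
A sketch tying the pieces together: an Analyzer (or TokenSources, for stored fields) supplies the TokenStream, a QueryScorer ranks fragments, and Highlighter extracts the best one. Assumes Lucene 3.6; the query, field name, and sample text are illustrative:

```java
import java.io.StringReader;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.util.Version;

public class HighlightDemo {
  public static void main(String[] args) throws Exception {
    String text = "Lucene is a search library; TokenStreams drive highlighting.";
    TermQuery query = new TermQuery(new Term("body", "tokenstreams"));

    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
    TokenStream ts = analyzer.tokenStream("body", new StringReader(text));

    // QueryScorer ranks fragments by the query terms they contain;
    // the default formatter wraps matches in <B>...</B>.
    Highlighter highlighter = new Highlighter(new QueryScorer(query));
    System.out.println(highlighter.getBestFragment(ts, text));
  }
}
```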