| Package | Description |
|---|---|
| `org.apache.lucene.analysis.standard` | Fast, general-purpose grammar-based tokenizer. `StandardTokenizer` implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. |
| `org.apache.lucene.analysis.tokenattributes` | General-purpose attributes for text analysis. |
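As a minimal sketch of the package in use (assuming a Lucene core jar from this era, roughly 6.x/7.x, on the classpath), `StandardTokenizer` can be driven directly through the standard `TokenStream` workflow of `reset()`, `incrementToken()`, `end()`, and `close()`:

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class StandardTokenizerDemo {
  public static void main(String[] args) throws IOException {
    // StandardTokenizer splits text according to the UAX #29 word-break rules.
    // TokenStream implements Closeable, so try-with-resources works here.
    try (StandardTokenizer tokenizer = new StandardTokenizer()) {
      tokenizer.setReader(new StringReader("Lucene's StandardTokenizer follows UAX #29."));
      // addAttribute() registers (or retrieves) the shared CharTermAttribute
      // instance; incrementToken() refills it with each successive term.
      CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
      tokenizer.reset();                      // required before the first incrementToken()
      while (tokenizer.incrementToken()) {
        System.out.println(term.toString()); // one term per line
      }
      tokenizer.end();                        // sets final offset state
    }
  }
}
```

The example text and class name `StandardTokenizerDemo` are illustrative; the `TokenStream` method sequence is the contract Lucene's analysis API requires of all consumers.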
| Class | Description |
|---|---|
| `CharTermAttribute` | The term text of a Token. |
| Class | Description |
|---|---|
| `BytesTermAttribute` | This attribute can be used if you have the raw term bytes to be indexed. |
| `CharTermAttribute` | The term text of a Token. |
| `CharTermAttributeImpl` | Default implementation of `CharTermAttribute`. |
| `FlagsAttribute` | This attribute can be used to pass different flags down the `Tokenizer` chain, e.g. from one TokenFilter to another one. |
| `KeywordAttribute` | This attribute can be used to mark a token as a keyword. |
| `OffsetAttribute` | The start and end character offset of a Token. |
| `PackedTokenAttributeImpl` | Default implementation of the common attributes used by Lucene: `CharTermAttribute`, `TypeAttribute`, `PositionIncrementAttribute`, `PositionLengthAttribute`, `OffsetAttribute`, `TermFrequencyAttribute`. |
| `PayloadAttribute` | The payload of a Token. |
| `PayloadAttributeImpl` | Default implementation of `PayloadAttribute`. |
| `PositionIncrementAttribute` | Determines the position of this token relative to the previous Token in a TokenStream, used in phrase searching. |
| `PositionLengthAttribute` | Determines how many positions this token spans. |
| `TermFrequencyAttribute` | Sets the custom term frequency of a term within one document. |
| `TermToBytesRefAttribute` | This attribute is requested by TermsHashPerField to index the contents. |
| `TypeAttribute` | A Token's lexical type. |
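Several of the attributes listed above can be read off the same stream at once. The sketch below (again assuming a Lucene core jar on the classpath; the input string and class name are illustrative) registers `OffsetAttribute`, `PositionIncrementAttribute`, and `TypeAttribute` alongside the term text:

```java
import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.tokenattributes.TypeAttribute;

public class AttributeDemo {
  public static void main(String[] args) throws IOException {
    try (StandardTokenizer ts = new StandardTokenizer()) {
      ts.setReader(new StringReader("red-blue 42 tokens"));
      // Each addAttribute() call returns the single shared instance for that
      // attribute class; incrementToken() overwrites all of them in lockstep
      // with the next token's values.
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      OffsetAttribute offset = ts.addAttribute(OffsetAttribute.class);
      PositionIncrementAttribute posIncr = ts.addAttribute(PositionIncrementAttribute.class);
      TypeAttribute type = ts.addAttribute(TypeAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        // term text, [start,end) character offsets, position increment, lexical type
        System.out.printf("%s [%d,%d) +%d %s%n",
            term, offset.startOffset(), offset.endOffset(),
            posIncr.getPositionIncrement(), type.type());
      }
      ts.end();
    }
  }
}
```

Because the attribute instances are shared and mutable, consumers that need to keep a token's values past the next `incrementToken()` call must copy them out (e.g. via `term.toString()`), a design choice Lucene makes to avoid per-token allocation.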
Copyright © 2000-2017 Apache Software Foundation. All Rights Reserved.