Package | Description |
---|---|
org.apache.lucene.analysis |
API and code to convert text into indexable/searchable tokens.
|
org.apache.lucene.analysis.standard |
Standards-based analyzers implemented with JFlex.
|
org.apache.lucene.collation |
CollationKeyFilter
converts each token into its binary CollationKey using the
provided Collator, and then encodes the CollationKey
as a String using
IndexableBinaryStringTools, to allow it to be
stored as an index term. |
org.apache.lucene.index |
Code to maintain and access indices.
|
org.apache.lucene.util |
Some utility classes.
|
Modifier and Type | Class and Description |
---|---|
class |
ASCIIFoldingFilter
This class converts alphabetic, numeric, and symbolic Unicode characters
which are not in the first 127 ASCII characters (the "Basic Latin" Unicode
block) into their ASCII equivalents, if one exists.
|
class |
CachingTokenFilter
This class can be used if the token attributes of a TokenStream
are intended to be consumed more than once.
|
class |
CharTokenizer
An abstract base class for simple, character-oriented tokenizers.
|
class |
FilteringTokenFilter
Abstract base class for TokenFilters that may remove tokens.
|
class |
ISOLatin1AccentFilter
Deprecated.
If you build a new index, use
ASCIIFoldingFilter
which covers a superset of Latin 1.
This class is included for use with existing
indexes and will be removed in a future release (possibly Lucene 4.0). |
class |
KeywordMarkerFilter
Marks terms as keywords via the
KeywordAttribute. |
class |
KeywordTokenizer
Emits the entire input as a single token.
|
class |
LengthFilter
Removes words that are too long or too short from the stream.
|
class |
LetterTokenizer
A LetterTokenizer is a tokenizer that divides text at non-letters.
|
class |
LimitTokenCountFilter
This TokenFilter limits the number of tokens while indexing.
|
class |
LowerCaseFilter
Normalizes token text to lower case.
|
class |
LowerCaseTokenizer
LowerCaseTokenizer performs the function of LetterTokenizer
and LowerCaseFilter together.
|
class |
NumericTokenStream
Expert: This class provides a
TokenStream
for indexing numeric values that can be used by NumericRangeQuery or NumericRangeFilter. |
class |
PorterStemFilter
Transforms the token stream as per the Porter stemming algorithm.
|
class |
StopFilter
Removes stop words from a token stream.
|
class |
TeeSinkTokenFilter
This TokenFilter provides the ability to set aside attribute states
that have already been analyzed.
|
static class |
TeeSinkTokenFilter.SinkTokenStream
TokenStream output from a tee with optional filtering.
|
class |
TokenFilter
A TokenFilter is a TokenStream whose input is another TokenStream.
|
class |
Tokenizer
A Tokenizer is a TokenStream whose input is a Reader.
|
class |
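TokenStream
A TokenStream enumerates the sequence of tokens, either from Fields of a Document or from query text.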
|
class |
TypeTokenFilter
Removes tokens whose types appear in a set of blocked types from a token stream.
|
class |
WhitespaceTokenizer
A WhitespaceTokenizer is a tokenizer that divides text at whitespace.
|
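The table above describes LowerCaseTokenizer as LetterTokenizer and LowerCaseFilter combined. The following is a minimal, library-free sketch of that idea, not Lucene's implementation: `LetterSplitSketch` and `letterTokens` are hypothetical names, and the sketch assumes that "letter" means whatever `Character.isLetter` accepts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not a Lucene class.
final class LetterSplitSketch {

    // Splits text at non-letters; optionally lower-cases each letter while
    // it is being collected, so splitting and normalizing happen in one pass.
    static List<String> letterTokens(String text, boolean lowerCase) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(lowerCase ? Character.toLowerCase(cp) : cp);
            } else if (current.length() > 0) {
                // A non-letter ends the token that was being built.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(letterTokens("Hello, World-2 go", true)); // [hello, world, go]
    }
}
```

Passing `false` for `lowerCase` corresponds to splitting at non-letters only; passing `true` folds the lower-casing step into the same loop instead of running it as a separate filter.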
Modifier and Type | Method and Description |
---|---|
abstract boolean |
TeeSinkTokenFilter.SinkFilter.accept(AttributeSource source)
Returns true if and only if the current state of the passed-in
AttributeSource should be stored
in the sink. |
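The accept contract above can be pictured with a small, library-free sketch. It is a simplified model under stated assumptions, not Lucene's API: `SinkSketch`, `StateFilter`, and `tee` are hypothetical names, and each captured state is reduced to a plain `String`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; this is not a Lucene class.
final class SinkSketch {

    // Hypothetical analogue of a sink filter: decides whether a state is kept.
    interface StateFilter {
        boolean accept(String state);
    }

    // Passes every state through unchanged and additionally stores the
    // accepted ones in the sink for later consumption.
    static List<String> tee(List<String> states, StateFilter filter, List<String> sink) {
        List<String> passedThrough = new ArrayList<>();
        for (String state : states) {
            if (filter.accept(state)) {
                sink.add(state);
            }
            passedThrough.add(state);
        }
        return passedThrough;
    }

    public static void main(String[] args) {
        List<String> sink = new ArrayList<>();
        List<String> out = tee(Arrays.asList("quick", "brown", "fox", "jumps"),
                               state -> state.length() > 4, sink);
        System.out.println(out);  // [quick, brown, fox, jumps]
        System.out.println(sink); // [quick, brown, jumps]
    }
}
```

The point of the model is that the filter only controls what is set aside in the sink; the main stream still sees every state.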
Constructor and Description |
---|
CharTokenizer(AttributeSource source,
Reader input)
Deprecated.
Use CharTokenizer.CharTokenizer(Version, AttributeSource, Reader) instead. This will be
removed in Lucene 4.0. |
CharTokenizer(Version matchVersion,
AttributeSource source,
Reader input)
Creates a new
CharTokenizer instance. |
KeywordTokenizer(AttributeSource source,
Reader input,
int bufferSize) |
LetterTokenizer(AttributeSource source,
Reader in)
Deprecated.
Use LetterTokenizer.LetterTokenizer(Version, AttributeSource, Reader) instead.
This will be removed in Lucene 4.0. |
LetterTokenizer(Version matchVersion,
AttributeSource source,
Reader in)
Constructs a new LetterTokenizer using a given
AttributeSource. |
LowerCaseTokenizer(AttributeSource source,
Reader in)
Deprecated.
Use LowerCaseTokenizer.LowerCaseTokenizer(Version, AttributeSource, Reader)
instead. This will be removed in Lucene 4.0. |
LowerCaseTokenizer(Version matchVersion,
AttributeSource source,
Reader in)
Constructs a new LowerCaseTokenizer using a given
AttributeSource. |
NumericTokenStream(AttributeSource source,
int precisionStep)
Expert: Creates a token stream for numeric values with the specified
precisionStep using the given AttributeSource. |
Tokenizer(AttributeSource source)
Deprecated.
Use Tokenizer.Tokenizer(AttributeSource, Reader) instead. |
Tokenizer(AttributeSource source,
Reader input)
Construct a token stream processing the given input using the given AttributeSource.
|
TokenStream(AttributeSource input)
A TokenStream that uses the same attributes as the supplied one.
|
WhitespaceTokenizer(AttributeSource source,
Reader in)
Deprecated.
Use WhitespaceTokenizer.WhitespaceTokenizer(Version, AttributeSource, Reader)
instead. This will be removed in Lucene 4.0. |
WhitespaceTokenizer(Version matchVersion,
AttributeSource source,
Reader in)
Constructs a new WhitespaceTokenizer using a given
AttributeSource. |
Modifier and Type | Class and Description |
---|---|
class |
ClassicFilter
Normalizes tokens extracted with
ClassicTokenizer. |
class |
ClassicTokenizer
A grammar-based tokenizer constructed with JFlex.
|
class |
StandardFilter
Normalizes tokens extracted with
StandardTokenizer. |
class |
StandardTokenizer
A grammar-based tokenizer constructed with JFlex.
|
class |
UAX29URLEmailTokenizer
This class implements Word Break rules from the Unicode Text Segmentation
algorithm, as specified in
Unicode Standard Annex #29.
URLs and email addresses are also tokenized according to the relevant RFCs.
|
Constructor and Description |
---|
ClassicTokenizer(Version matchVersion,
AttributeSource source,
Reader input)
Creates a new ClassicTokenizer with a given
AttributeSource. |
StandardTokenizer(Version matchVersion,
AttributeSource source,
Reader input)
Creates a new StandardTokenizer with a given
AttributeSource. |
UAX29URLEmailTokenizer(AttributeSource source,
Reader input)
Deprecated.
|
UAX29URLEmailTokenizer(Version matchVersion,
AttributeSource source,
Reader input)
Creates a new UAX29URLEmailTokenizer with a given
AttributeSource. |
Modifier and Type | Class and Description |
---|---|
class |
CollationKeyFilter
Converts each token into its
CollationKey, and then
encodes the CollationKey with IndexableBinaryStringTools, to allow
it to be stored as an index term. |
Modifier and Type | Method and Description |
---|---|
AttributeSource |
FieldInvertState.getAttributeSource() |
Modifier and Type | Method and Description |
---|---|
AttributeSource |
AttributeSource.cloneAttributes()
Performs a clone of all
AttributeImpl instances returned in a new
AttributeSource instance. |
Modifier and Type | Method and Description |
---|---|
void |
AttributeSource.copyTo(AttributeSource target)
Copies the contents of this
AttributeSource to the given target AttributeSource. |
Constructor and Description |
---|
AttributeSource(AttributeSource input)
An AttributeSource that uses the same attributes as the supplied one.
|
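The difference between the sharing constructor, cloneAttributes(), and copyTo() listed above is easiest to see in a toy model. The sketch below is not Lucene code: `SketchSource` is a hypothetical class that reduces attributes to name/value strings, and its `copyTo` simply overwrites values in the target, which is a simplification of whatever checks the real method performs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for an attribute source; this is not a Lucene class.
final class SketchSource {
    private final Map<String, String> attributes;

    // Creates a source with its own, initially empty, attribute map.
    SketchSource() {
        this.attributes = new LinkedHashMap<>();
    }

    // Sharing constructor: both sources use the very same attribute map,
    // so a change made through one is visible through the other.
    SketchSource(SketchSource input) {
        this.attributes = input.attributes;
    }

    void set(String name, String value) {
        attributes.put(name, value);
    }

    String get(String name) {
        return attributes.get(name);
    }

    // Returns a new source holding an independent snapshot of the values.
    SketchSource cloneAttributes() {
        SketchSource copy = new SketchSource();
        copy.attributes.putAll(attributes);
        return copy;
    }

    // Copies the current values into an existing target source.
    void copyTo(SketchSource target) {
        target.attributes.putAll(attributes);
    }

    public static void main(String[] args) {
        SketchSource original = new SketchSource();
        original.set("term", "lucene");
        SketchSource shared = new SketchSource(original);
        SketchSource cloned = original.cloneAttributes();

        original.set("term", "solr");
        System.out.println(shared.get("term")); // solr
        System.out.println(cloned.get("term")); // lucene

        original.copyTo(cloned);
        System.out.println(cloned.get("term")); // solr
    }
}
```

In this model a shared source follows every later change, a clone is a one-time snapshot, and copyTo refreshes an existing snapshot on demand.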