| Package | Description |
|---|---|
| org.apache.lucene.analysis | API and code to convert text into indexable/searchable tokens. |
| org.apache.lucene.analysis.standard | Standards-based analyzers implemented with JFlex. |
| org.apache.lucene.collation | CollationKeyFilter converts each token into its binary CollationKey using the provided Collator, and then encodes the CollationKey as a String using IndexableBinaryStringTools, to allow it to be stored as an index term. |
| org.apache.lucene.document | The logical representation of a Document for indexing and searching. |
| Modifier and Type | Class and Description |
|---|---|
| class | ASCIIFoldingFilter: Converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists. |
| class | CachingTokenFilter: Can be used if the token attributes of a TokenStream are intended to be consumed more than once. |
| class | CharTokenizer: An abstract base class for simple, character-oriented tokenizers. |
| class | FilteringTokenFilter: Abstract base class for TokenFilters that may remove tokens. |
| class | ISOLatin1AccentFilter: Deprecated. If you build a new index, use ASCIIFoldingFilter, which covers a superset of Latin 1. This class is included for use with existing indexes and will be removed in a future release (possibly Lucene 4.0). |
| class | KeywordMarkerFilter: Marks terms as keywords via the KeywordAttribute. |
| class | KeywordTokenizer: Emits the entire input as a single token. |
| class | LengthFilter: Removes words that are too long or too short from the stream. |
| class | LetterTokenizer: A tokenizer that divides text at non-letters. |
| class | LimitTokenCountFilter: A TokenFilter that limits the number of tokens while indexing. |
| class | LowerCaseFilter: Normalizes token text to lower case. |
| class | LowerCaseTokenizer: Performs the function of LetterTokenizer and LowerCaseFilter together. |
| class | NumericTokenStream: Expert: Provides a TokenStream for indexing numeric values that can be used by NumericRangeQuery or NumericRangeFilter. |
| class | PorterStemFilter: Transforms the token stream as per the Porter stemming algorithm. |
| class | StopFilter: Removes stop words from a token stream. |
| class | TeeSinkTokenFilter: A TokenFilter that provides the ability to set aside attribute states that have already been analyzed. |
| static class | TeeSinkTokenFilter.SinkTokenStream: TokenStream output from a tee with optional filtering. |
| class | TokenFilter: A TokenStream whose input is another TokenStream. |
| class | Tokenizer: A TokenStream whose input is a Reader. |
| class | TypeTokenFilter: Removes tokens whose types appear in a set of blocked types from a token stream. |
| class | WhitespaceTokenizer: A tokenizer that divides text at whitespace. |
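The classes above are designed to be chained: a Tokenizer produces the initial stream from a Reader, and each TokenFilter wraps another TokenStream. A minimal sketch of such a chain, assuming Lucene 3.1 on the classpath (the sample text and stop word are made up for illustration):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class ChainDemo {
    // Tokenize on whitespace, lower-case, and drop the stop word "the".
    static List<String> tokens(String text) throws Exception {
        TokenStream stream = new WhitespaceTokenizer(
                Version.LUCENE_31, new StringReader(text));
        stream = new LowerCaseFilter(Version.LUCENE_31, stream);
        stream = new StopFilter(Version.LUCENE_31, stream,
                StopFilter.makeStopSet(Version.LUCENE_31, "the"));

        // Consumers read token text through the attribute API.
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        List<String> out = new ArrayList<String>();
        stream.reset();
        while (stream.incrementToken()) {
            out.add(term.toString());
        }
        stream.close();
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(tokens("The QUICK brown fox"));
    }
}
```

Each filter in the chain sees the already-transformed tokens of the stream it wraps, so ordering matters: lower-casing before stop-word removal lets a lower-case stop set match "The".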
| Modifier and Type | Field and Description |
|---|---|
| protected TokenStream | TokenFilter.input: The source of tokens for this filter. |
| protected TokenStream | ReusableAnalyzerBase.TokenStreamComponents.sink |
| Modifier and Type | Method and Description |
|---|---|
| protected TokenStream | ReusableAnalyzerBase.TokenStreamComponents.getTokenStream(): Returns the sink TokenStream. |
| TokenStream | ReusableAnalyzerBase.reusableTokenStream(String fieldName, Reader reader): Uses ReusableAnalyzerBase.createComponents(String, Reader) to obtain an instance of ReusableAnalyzerBase.TokenStreamComponents. |
| TokenStream | PerFieldAnalyzerWrapper.reusableTokenStream(String fieldName, Reader reader) |
| TokenStream | LimitTokenCountAnalyzer.reusableTokenStream(String fieldName, Reader reader) |
| TokenStream | Analyzer.reusableTokenStream(String fieldName, Reader reader): Creates a TokenStream that is allowed to be re-used from the previous time that the same thread called this method. |
| TokenStream | ReusableAnalyzerBase.tokenStream(String fieldName, Reader reader): Uses ReusableAnalyzerBase.createComponents(String, Reader) to obtain an instance of ReusableAnalyzerBase.TokenStreamComponents and returns the sink of the components. |
| TokenStream | PerFieldAnalyzerWrapper.tokenStream(String fieldName, Reader reader) |
| TokenStream | LimitTokenCountAnalyzer.tokenStream(String fieldName, Reader reader) |
| abstract TokenStream | Analyzer.tokenStream(String fieldName, Reader reader): Creates a TokenStream which tokenizes all the text in the provided Reader. |
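The reusableTokenStream variants let a thread reuse an analyzer's underlying tokenizer across documents instead of allocating a fresh chain each time. A sketch of the consumer side, assuming Lucene 3.1 and its bundled StandardAnalyzer (the field name and texts are made up):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class ReuseDemo {
    static List<String> analyze(Analyzer analyzer, String text) throws Exception {
        // Same thread, same analyzer: the returned stream may be a
        // reused instance reset onto the new Reader.
        TokenStream stream = analyzer.reusableTokenStream(
                "body", new StringReader(text));
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        List<String> tokens = new ArrayList<String>();
        stream.reset();
        while (stream.incrementToken()) {
            tokens.add(term.toString());
        }
        // end() (not close()) leaves the stream available for reuse.
        stream.end();
        return tokens;
    }

    public static void main(String[] args) throws Exception {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_31);
        System.out.println(analyze(analyzer, "Lucene is COOL"));
        System.out.println(analyze(analyzer, "Second document"));
    }
}
```

Because reuse is per-thread, the same Analyzer instance can safely serve multiple indexing threads, each getting its own cached components.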
| Constructor and Description |
|---|
| ASCIIFoldingFilter(TokenStream input) |
| CachingTokenFilter(TokenStream input) |
| FilteringTokenFilter(boolean enablePositionIncrements, TokenStream input) |
| ISOLatin1AccentFilter(TokenStream input): Deprecated. |
| KeywordMarkerFilter(TokenStream in, CharArraySet keywordSet): Creates a new KeywordMarkerFilter that marks the current token as a keyword, via the KeywordAttribute, if the token's term buffer is contained in the given set. |
| KeywordMarkerFilter(TokenStream in, Set<?> keywordSet): Creates a new KeywordMarkerFilter that marks the current token as a keyword, via the KeywordAttribute, if the token's term buffer is contained in the given set. |
| LengthFilter(boolean enablePositionIncrements, TokenStream in, int min, int max): Builds a filter that removes words that are too long or too short from the text. |
| LengthFilter(TokenStream in, int min, int max): Deprecated. |
| LimitTokenCountFilter(TokenStream in, int maxTokenCount): Builds a filter that only accepts tokens up to a maximum number. |
| LowerCaseFilter(TokenStream in): Deprecated. |
| LowerCaseFilter(Version matchVersion, TokenStream in): Creates a new LowerCaseFilter that normalizes token text to lower case. |
| PorterStemFilter(TokenStream in) |
| ReusableAnalyzerBase.TokenStreamComponents(Tokenizer source, TokenStream result): Creates a new ReusableAnalyzerBase.TokenStreamComponents instance. |
| StopFilter(boolean enablePositionIncrements, TokenStream in, Set<?> stopWords): Deprecated. Use StopFilter.StopFilter(Version, TokenStream, Set) instead. |
| StopFilter(boolean enablePositionIncrements, TokenStream input, Set<?> stopWords, boolean ignoreCase): Deprecated. Use StopFilter.StopFilter(Version, TokenStream, Set) instead. |
| StopFilter(Version matchVersion, TokenStream in, Set<?> stopWords): Constructs a filter which removes words from the input TokenStream that are named in the Set. |
| StopFilter(Version matchVersion, TokenStream input, Set<?> stopWords, boolean ignoreCase): Deprecated. Use StopFilter.StopFilter(Version, TokenStream, Set) instead. |
| TeeSinkTokenFilter(TokenStream input): Instantiates a new TeeSinkTokenFilter. |
| TokenFilter(TokenStream input): Constructs a token stream filtering the given input. |
| TypeTokenFilter(boolean enablePositionIncrements, TokenStream input, Set<String> stopTypes) |
| TypeTokenFilter(boolean enablePositionIncrements, TokenStream input, Set<String> stopTypes, boolean useWhiteList) |
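Several of these constructors are meant to be combined. For example, KeywordMarkerFilter is typically placed before a stemmer: terms it marks via KeywordAttribute are left unstemmed by attribute-aware filters such as PorterStemFilter. A sketch, assuming Lucene 3.1 (the sample words and keyword set are made up):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

import org.apache.lucene.analysis.KeywordMarkerFilter;
import org.apache.lucene.analysis.PorterStemFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class KeywordDemo {
    // Stem every token except those listed in the keyword set.
    static List<String> stemProtecting(String text, Set<?> keywords)
            throws Exception {
        TokenStream stream = new WhitespaceTokenizer(
                Version.LUCENE_31, new StringReader(text));
        // Mark protected terms first; PorterStemFilter skips tokens
        // whose KeywordAttribute is set.
        stream = new KeywordMarkerFilter(stream, keywords);
        stream = new PorterStemFilter(stream);

        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        List<String> out = new ArrayList<String>();
        stream.reset();
        while (stream.incrementToken()) {
            out.add(term.toString());
        }
        stream.close();
        return out;
    }

    public static void main(String[] args) throws Exception {
        Set<String> keywords = new java.util.HashSet<String>(
                java.util.Arrays.asList("running"));
        System.out.println(stemProtecting("running jumping", keywords));
    }
}
```

With "running" in the keyword set it passes through unchanged, while "jumping" is reduced to its Porter stem.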
| Modifier and Type | Class and Description |
|---|---|
| class | ClassicFilter: Normalizes tokens extracted with ClassicTokenizer. |
| class | ClassicTokenizer: A grammar-based tokenizer constructed with JFlex. This should be a good tokenizer for most European-language documents: it splits words at punctuation characters, removing punctuation. |
| class | StandardFilter: Normalizes tokens extracted with StandardTokenizer. |
| class | StandardTokenizer: A grammar-based tokenizer constructed with JFlex. |
| class | UAX29URLEmailTokenizer: Implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29; URLs and email addresses are also tokenized according to the relevant RFCs. |
| Constructor and Description |
|---|
| ClassicFilter(TokenStream in): Constructs a ClassicFilter filtering the given input. |
| StandardFilter(TokenStream in): Deprecated. Use StandardFilter.StandardFilter(Version, TokenStream) instead. |
| StandardFilter(Version matchVersion, TokenStream in) |
| Modifier and Type | Class and Description |
|---|---|
| class | CollationKeyFilter: Converts each token into its CollationKey, and then encodes the CollationKey with IndexableBinaryStringTools, to allow it to be stored as an index term. |
| Modifier and Type | Method and Description |
|---|---|
| TokenStream | CollationKeyAnalyzer.reusableTokenStream(String fieldName, Reader reader) |
| TokenStream | CollationKeyAnalyzer.tokenStream(String fieldName, Reader reader) |
| Constructor and Description |
|---|
| CollationKeyFilter(TokenStream input, Collator collator) |
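Because CollationKeyFilter replaces each token's text with an encoded CollationKey, the resulting index terms sort in the Collator's order rather than in code-point order. A sketch of wrapping a single-token stream, assuming Lucene 3.1 (the sample words and locale are made up; under French rules the accented form collates after the unaccented one):

```java
import java.io.StringReader;
import java.text.Collator;
import java.util.Locale;

import org.apache.lucene.analysis.KeywordTokenizer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.collation.CollationKeyFilter;

public class CollationDemo {
    // Encode a whole string as one collation-keyed index term.
    static String collationTerm(String text, Collator collator)
            throws Exception {
        // KeywordTokenizer emits the entire input as a single token,
        // which CollationKeyFilter then replaces with an encoded key.
        TokenStream stream = new CollationKeyFilter(
                new KeywordTokenizer(new StringReader(text)), collator);
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        stream.reset();
        stream.incrementToken();
        String encoded = term.toString();
        stream.close();
        return encoded;
    }

    public static void main(String[] args) throws Exception {
        Collator collator = Collator.getInstance(Locale.FRENCH);
        // The encoded terms compare in collator order, not lexical order.
        String a = collationTerm("cote", collator);
        String b = collationTerm("côte", collator);
        System.out.println(a.compareTo(b) < 0);
    }
}
```

The encoded terms are opaque; they are only useful for range queries and sorting against terms encoded with the same Collator.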
| Modifier and Type | Field and Description |
|---|---|
| protected TokenStream | AbstractField.tokenStream |
| Modifier and Type | Method and Description |
|---|---|
| TokenStream | NumericField.tokenStreamValue(): Returns a NumericTokenStream for indexing the numeric value. |
| TokenStream | Fieldable.tokenStreamValue(): The TokenStream for this field to be used when indexing, or null. |
| TokenStream | Field.tokenStreamValue(): The TokenStream for this field to be used when indexing, or null. |
| Modifier and Type | Method and Description |
|---|---|
| void | Field.setTokenStream(TokenStream tokenStream): Expert: sets the token stream to be used for indexing and causes isIndexed() and isTokenized() to return true. |
| Constructor and Description |
|---|
| Field(String name, TokenStream tokenStream): Creates a tokenized and indexed field that is not stored. |
| Field(String name, TokenStream tokenStream, Field.TermVector termVector): Creates a tokenized and indexed field that is not stored, optionally storing term vectors. |
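These TokenStream-taking constructors allow indexing pre-analyzed content: instead of handing the indexer raw text plus an Analyzer, you build the analysis chain yourself and attach the finished stream to the field. A sketch, assuming Lucene 3.1 (the field name and text are made up):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.util.Version;

public class PreAnalyzedDemo {
    public static void main(String[] args) {
        // Build the analysis chain ourselves instead of relying
        // on the IndexWriter's Analyzer.
        TokenStream stream = new LowerCaseFilter(Version.LUCENE_31,
                new WhitespaceTokenizer(Version.LUCENE_31,
                        new StringReader("Pre Analyzed Text")));

        // Tokenized and indexed, not stored.
        Field field = new Field("body", stream);
        Document doc = new Document();
        doc.add(field);

        // The indexer picks the stream up via tokenStreamValue().
        System.out.println(field.isIndexed());
        System.out.println(field.tokenStreamValue() == stream);
    }
}
```

This is the mechanism TeeSinkTokenFilter's sinks are typically fed into: one analysis pass can populate several such fields.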