public abstract class Analyzer extends Object implements Closeable
Typical implementations first build a Tokenizer, which breaks the stream of characters from the Reader into raw Tokens. One or more TokenFilters may then be applied to the output of the Tokenizer.
The Analyzer API in Lucene is based on the decorator pattern. Therefore all non-abstract subclasses must be final, or their tokenStream(java.lang.String, java.io.Reader) and reusableTokenStream(java.lang.String, java.io.Reader) implementations must be final. This is checked when Java assertions are enabled.
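A minimal sketch of such an implementation, assuming the Lucene 3.x API (in 3.1+ the Tokenizer and TokenFilter constructors also accept a Version argument; the class name here is illustrative):

```java
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;

// Declared final, as the contract above requires of non-abstract subclasses.
public final class SimpleWhitespaceAnalyzer extends Analyzer {
  @Override
  public TokenStream tokenStream(String fieldName, Reader reader) {
    // The Tokenizer breaks the Reader into raw tokens...
    TokenStream stream = new WhitespaceTokenizer(reader);
    // ...then a TokenFilter decorates its output (the decorator pattern).
    stream = new LowerCaseFilter(stream);
    return stream;
  }
}
```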
Modifier | Constructor and Description |
---|---|
protected | Analyzer() |
Modifier and Type | Method and Description |
---|---|
void | close() Frees persistent resources used by this Analyzer. |
int | getOffsetGap(Fieldable field) Just like getPositionIncrementGap(java.lang.String), except for Token offsets instead. |
int | getPositionIncrementGap(String fieldName) Invoked before indexing a Fieldable instance if terms have already been added to that field. |
protected Object | getPreviousTokenStream() Used by Analyzers that implement reusableTokenStream to retrieve previously saved TokenStreams for re-use by the same thread. |
TokenStream | reusableTokenStream(String fieldName, Reader reader) Creates a TokenStream that is allowed to be re-used from the previous time that the same thread called this method. |
protected void | setPreviousTokenStream(Object obj) Used by Analyzers that implement reusableTokenStream to save a TokenStream for later re-use by the same thread. |
abstract TokenStream | tokenStream(String fieldName, Reader reader) Creates a TokenStream which tokenizes all the text in the provided Reader. |
public abstract TokenStream tokenStream(String fieldName, Reader reader)

Creates a TokenStream which tokenizes all the text in the provided Reader.

public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException

Creates a TokenStream that is allowed to be re-used from the previous time that the same thread called this method.

Throws:
IOException

protected Object getPreviousTokenStream()

Used by Analyzers that implement reusableTokenStream to retrieve previously saved TokenStreams for re-use by the same thread.

protected void setPreviousTokenStream(Object obj)

Used by Analyzers that implement reusableTokenStream to save a TokenStream for later re-use by the same thread.
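The three methods above cooperate in a per-thread reuse pattern. A sketch under Lucene 3.x (the SavedStreams holder class is a local convention used here, not part of the API):

```java
import java.io.IOException;
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.WhitespaceTokenizer;

public final class ReusingAnalyzer extends Analyzer {

  // Holder for the saved per-thread chain; the name is illustrative only.
  private static final class SavedStreams {
    Tokenizer source;
    TokenStream result;
  }

  @Override
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return new LowerCaseFilter(new WhitespaceTokenizer(reader));
  }

  @Override
  public TokenStream reusableTokenStream(String fieldName, Reader reader)
      throws IOException {
    SavedStreams streams = (SavedStreams) getPreviousTokenStream();
    if (streams == null) {
      // First call on this thread: build the chain and save it.
      streams = new SavedStreams();
      streams.source = new WhitespaceTokenizer(reader);
      streams.result = new LowerCaseFilter(streams.source);
      setPreviousTokenStream(streams);
    } else {
      // Later calls: re-point the saved Tokenizer at the new Reader.
      streams.source.reset(reader);
    }
    return streams.result;
  }
}
```

Because the saved state is keyed per thread, callers that never consume more than one TokenStream at a time from this analyzer get the same chain back instead of a fresh allocation.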
public int getPositionIncrementGap(String fieldName)

Invoked before indexing a Fieldable instance if terms have already been added to that field.

Parameters:
fieldName - Fieldable name being indexed.
Returns:
position increment gap, added to the next token emitted from tokenStream(String, Reader)
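For example, a subclass might return a large gap for a multi-valued field so that phrase matches cannot span the boundary between two Fieldable instances sharing that name. A sketch under Lucene 3.x; the "tags" field name and the gap of 100 are hypothetical:

```java
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;

public final class GapAnalyzer extends Analyzer {
  @Override
  public TokenStream tokenStream(String fieldName, Reader reader) {
    return new WhitespaceTokenizer(reader);
  }

  @Override
  public int getPositionIncrementGap(String fieldName) {
    // A gap larger than any realistic phrase slop keeps a PhraseQuery
    // from matching across two instances of the hypothetical "tags" field.
    return "tags".equals(fieldName) ? 100 : 0;
  }
}
```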
public int getOffsetGap(Fieldable field)

Just like getPositionIncrementGap(java.lang.String), except for Token offsets instead. By default this returns 1 for tokenized fields (as if the fields were joined with an extra space character) and 0 for un-tokenized fields. This method is only called if the field produced at least one token for indexing.

Parameters:
field - the field just indexed
Returns:
offset gap, added to the next token emitted from tokenStream(String, Reader)