java.lang.Object
  org.apache.lucene.analysis.Analyzer

public abstract class Analyzer
An Analyzer builds TokenStreams, which analyze text. It thus represents a policy for extracting index terms from text.
In order to define what analysis is done, subclasses must define their
TokenStreamComponents in createComponents(String, Reader).
The components are then reused in each call to tokenStream(String, Reader).
Simple example:
```java
Analyzer analyzer = new Analyzer() {
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer source = new FooTokenizer(reader);
    TokenStream filter = new FooFilter(source);
    filter = new BarFilter(filter);
    return new TokenStreamComponents(source, filter);
  }
};
```
For more examples, see the Analysis package documentation.
For some concrete implementations bundled with Lucene, look in the analysis modules.
| Nested Class Summary | | |
|---|---|---|
| static class | Analyzer.GlobalReuseStrategy | Deprecated. This implementation class will be hidden in Lucene 5.0. Use GLOBAL_REUSE_STRATEGY instead! |
| static class | Analyzer.PerFieldReuseStrategy | Deprecated. This implementation class will be hidden in Lucene 5.0. Use PER_FIELD_REUSE_STRATEGY instead! |
| static class | Analyzer.ReuseStrategy | Strategy defining how TokenStreamComponents are reused per call to tokenStream(String, java.io.Reader). |
| static class | Analyzer.TokenStreamComponents | This class encapsulates the outer components of a token stream. |
| Field Summary | | |
|---|---|---|
| static Analyzer.ReuseStrategy | GLOBAL_REUSE_STRATEGY | A predefined Analyzer.ReuseStrategy that reuses the same components for every field. |
| static Analyzer.ReuseStrategy | PER_FIELD_REUSE_STRATEGY | A predefined Analyzer.ReuseStrategy that reuses components per-field by maintaining a Map of TokenStreamComponent per field name. |
| Constructor Summary | |
|---|---|
| Analyzer() | Create a new Analyzer, reusing the same set of components per-thread across calls to tokenStream(String, Reader). |
| Analyzer(Analyzer.ReuseStrategy reuseStrategy) | Expert: create a new Analyzer with a custom Analyzer.ReuseStrategy. |
| Method Summary | | |
|---|---|---|
| void | close() | Frees persistent resources used by this Analyzer. |
| protected abstract Analyzer.TokenStreamComponents | createComponents(String fieldName, Reader reader) | Creates a new Analyzer.TokenStreamComponents instance for this analyzer. |
| int | getOffsetGap(String fieldName) | Just like getPositionIncrementGap(java.lang.String), except for Token offsets instead. |
| int | getPositionIncrementGap(String fieldName) | Invoked before indexing an IndexableField instance if terms have already been added to that field. |
| Analyzer.ReuseStrategy | getReuseStrategy() | Returns the used Analyzer.ReuseStrategy. |
| protected Reader | initReader(String fieldName, Reader reader) | Override this if you want to add a CharFilter chain. |
| TokenStream | tokenStream(String fieldName, Reader reader) | Returns a TokenStream suitable for fieldName, tokenizing the contents of reader. |
| TokenStream | tokenStream(String fieldName, String text) | Returns a TokenStream suitable for fieldName, tokenizing the contents of text. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final Analyzer.ReuseStrategy GLOBAL_REUSE_STRATEGY
A predefined Analyzer.ReuseStrategy that reuses the same components for every field.

public static final Analyzer.ReuseStrategy PER_FIELD_REUSE_STRATEGY
A predefined Analyzer.ReuseStrategy that reuses components per-field by maintaining a Map of TokenStreamComponent per field name.
| Constructor Detail |
|---|
public Analyzer()
Create a new Analyzer, reusing the same set of components per-thread across calls to tokenStream(String, Reader).

public Analyzer(Analyzer.ReuseStrategy reuseStrategy)
Expert: create a new Analyzer with a custom Analyzer.ReuseStrategy.
NOTE: if you just want to reuse on a per-field basis, it's easier to use a subclass of AnalyzerWrapper such as PerFieldAnalyzerWrapper instead.
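As a rough sketch of this expert constructor, the predefined PER_FIELD_REUSE_STRATEGY field can be passed directly. This assumes Lucene 4.x core plus the analyzers-common module on the classpath; the class name PerFieldReuseExample is hypothetical, and Version.LUCENE_47 stands in for whichever 4.x release you actually use.

```java
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

public class PerFieldReuseExample {
    // Hand the predefined per-field strategy to the expert constructor:
    // components are then cached per field name instead of globally.
    static Analyzer newPerFieldAnalyzer() {
        return new Analyzer(Analyzer.PER_FIELD_REUSE_STRATEGY) {
            @Override
            protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
                return new TokenStreamComponents(new WhitespaceTokenizer(Version.LUCENE_47, reader));
            }
        };
    }

    public static void main(String[] args) {
        Analyzer analyzer = newPerFieldAnalyzer();
        System.out.println(analyzer.getReuseStrategy() == Analyzer.PER_FIELD_REUSE_STRATEGY);
        analyzer.close();
    }
}
```

For most per-field needs, PerFieldAnalyzerWrapper remains the simpler route, as the note above says.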
| Method Detail |
|---|
protected abstract Analyzer.TokenStreamComponents createComponents(String fieldName, Reader reader)
Creates a new Analyzer.TokenStreamComponents instance for this analyzer.
Parameters:
fieldName - the name of the fields content passed to the Analyzer.TokenStreamComponents sink as a reader
reader - the reader passed to the Tokenizer constructor
Returns:
the Analyzer.TokenStreamComponents for this analyzer
public final TokenStream tokenStream(String fieldName, Reader reader) throws IOException
Returns a TokenStream suitable for fieldName, tokenizing the contents of reader.
This method uses createComponents(String, Reader) to obtain an instance of Analyzer.TokenStreamComponents. It returns the sink of the components and stores the components internally. Subsequent calls to this method will reuse the previously stored components after resetting them through Analyzer.TokenStreamComponents.setReader(Reader).
NOTE: After calling this method, the consumer must follow the workflow described in TokenStream to properly consume its contents. See the Analysis package documentation for some examples demonstrating this.
NOTE: If your data is available as a String, use tokenStream(String, String) which reuses a StringReader-like instance internally.
Parameters:
fieldName - the name of the field the created TokenStream is used for
reader - the reader the stream's source reads from
Returns:
TokenStream for iterating the analyzed content of reader
Throws:
AlreadyClosedException - if the Analyzer is closed.
IOException - if an I/O error occurs.
See Also:
tokenStream(String, String)
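The consumption workflow mentioned in the NOTE above (reset, incrementToken loop, end, close) can be sketched as follows. This assumes Lucene 4.x core and analyzers-common jars; TokenStreamExample is a hypothetical name and Version.LUCENE_47 an assumed release constant.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class TokenStreamExample {
    // Runs the mandatory TokenStream workflow on one field value.
    static List<String> analyze(Analyzer analyzer, String field, String text) throws IOException {
        List<String> tokens = new ArrayList<String>();
        TokenStream stream = analyzer.tokenStream(field, new StringReader(text));
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        try {
            stream.reset();                   // must be called before incrementToken()
            while (stream.incrementToken()) {
                tokens.add(term.toString());  // read the current token's text
            }
            stream.end();                     // records end-of-stream state
        } finally {
            stream.close();                   // releases the components for reuse
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_47);
        // StandardAnalyzer lowercases and drops English stopwords such as "the":
        System.out.println(analyze(analyzer, "body", "The Quick Brown Fox"));  // [quick, brown, fox]
        analyzer.close();
    }
}
```

Skipping reset() or close() leaves the reused components in an inconsistent state, which is why the try/finally shape matters.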
public final TokenStream tokenStream(String fieldName, String text) throws IOException
Returns a TokenStream suitable for fieldName, tokenizing the contents of text.
This method uses createComponents(String, Reader) to obtain an instance of Analyzer.TokenStreamComponents. It returns the sink of the components and stores the components internally. Subsequent calls to this method will reuse the previously stored components after resetting them through Analyzer.TokenStreamComponents.setReader(Reader).
NOTE: After calling this method, the consumer must follow the workflow described in TokenStream to properly consume its contents. See the Analysis package documentation for some examples demonstrating this.
Parameters:
fieldName - the name of the field the created TokenStream is used for
text - the String the stream's source reads from
Returns:
TokenStream for iterating the analyzed content of reader
Throws:
AlreadyClosedException - if the Analyzer is closed.
IOException - if an I/O error occurs (may rarely happen for strings).
See Also:
tokenStream(String, Reader)
protected Reader initReader(String fieldName, Reader reader)
Override this if you want to add a CharFilter chain. The default implementation returns reader unchanged.
Parameters:
fieldName - IndexableField name being indexed
reader - original Reader
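A minimal sketch of a CharFilter chain via initReader, assuming the analyzers-common module (which provides HTMLStripCharFilter and WhitespaceTokenizer) and an assumed Version.LUCENE_47 constant; the class name HtmlStrippingAnalyzer is hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.charfilter.HTMLStripCharFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

public class HtmlStrippingAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        return new TokenStreamComponents(new WhitespaceTokenizer(Version.LUCENE_47, reader));
    }

    @Override
    protected Reader initReader(String fieldName, Reader reader) {
        // Decorate the incoming Reader so markup is removed before the Tokenizer sees it.
        return new HTMLStripCharFilter(reader);
    }

    // Helper running the standard TokenStream consumption workflow.
    static List<String> tokens(String text) throws IOException {
        List<String> result = new ArrayList<String>();
        Analyzer analyzer = new HtmlStrippingAnalyzer();
        TokenStream stream = analyzer.tokenStream("body", new StringReader(text));
        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        try {
            stream.reset();
            while (stream.incrementToken()) {
                result.add(term.toString());
            }
            stream.end();
        } finally {
            stream.close();
            analyzer.close();
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(tokens("<b>hello</b> world"));  // [hello, world]
    }
}
```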
public int getPositionIncrementGap(String fieldName)
Invoked before indexing an IndexableField instance if terms have already been added to that field.
Parameters:
fieldName - IndexableField name being indexed.
Returns:
position increment gap, added when the next token is emitted from tokenStream(String,Reader). This value must be >= 0.

public int getOffsetGap(String fieldName)
Just like getPositionIncrementGap(java.lang.String), except for Token offsets instead. By default this returns 1. This method is only called if the field produced at least one token for indexing.
Parameters:
fieldName - the field just indexed
Returns:
offset gap, added to the next token emitted from tokenStream(String,Reader). This value must be >= 0.

public final Analyzer.ReuseStrategy getReuseStrategy()
Returns the used Analyzer.ReuseStrategy.

public void close()
Frees persistent resources used by this Analyzer.
Specified by:
close in interface Closeable
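As an illustration of the gap methods above, overriding getPositionIncrementGap keeps phrase and span queries from matching across the values of a multi-valued field. This is a hedged sketch under the same Lucene 4.x assumptions as before; the class name GapExample and the gap of 100 are illustrative choices, not prescribed values.

```java
import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.util.Version;

public class GapExample {
    static Analyzer newGappedAnalyzer() {
        return new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
                return new TokenStreamComponents(new WhitespaceTokenizer(Version.LUCENE_47, reader));
            }

            @Override
            public int getPositionIncrementGap(String fieldName) {
                // A large position gap is inserted between consecutive values of a
                // multi-valued field, so "end-of-value-1 start-of-value-2" phrases fail.
                return 100;
            }
        };
    }

    public static void main(String[] args) {
        Analyzer analyzer = newGappedAnalyzer();
        System.out.println(analyzer.getPositionIncrementGap("body"));  // 100
        analyzer.close();
    }
}
```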