public abstract class Analyzer extends Object implements Closeable
In order to define what analysis is done, subclasses must define their
TokenStreamComponents in createComponents(String, Reader).
The components are then reused in each call to tokenStream(String, Reader).
Simple example:
Analyzer analyzer = new Analyzer() {
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer source = new FooTokenizer(reader);
    TokenStream filter = new FooFilter(source);
    filter = new BarFilter(filter);
    return new TokenStreamComponents(source, filter);
  }
};
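The reuse contract described above can be sketched in plain Java. This is a simplified, hypothetical model of what createComponents/tokenStream do together (it is not Lucene's actual implementation, and the Components class merely stands in for Analyzer.TokenStreamComponents): the chain is built once and reused on later calls, with only the Reader swapped.

```java
import java.io.Reader;
import java.io.StringReader;

class ReuseModel {
    // Stands in for Analyzer.TokenStreamComponents (hypothetical model).
    static final class Components {
        Reader reader;
        Components(Reader reader) { this.reader = reader; }
        void setReader(Reader reader) { this.reader = reader; } // reset for reuse
    }

    private Components stored; // components cached between calls

    // Analogous to createComponents(String, Reader): only called when
    // no reusable components exist yet.
    private Components createComponents(String fieldName, Reader reader) {
        return new Components(reader);
    }

    // Analogous to tokenStream(String, Reader): reuse stored components,
    // resetting them with the new reader.
    Components tokenStream(String fieldName, Reader reader) {
        if (stored == null) {
            stored = createComponents(fieldName, reader);
        } else {
            stored.setReader(reader);
        }
        return stored;
    }
}
```

Calling tokenStream twice in this model returns the same Components instance; only the underlying Reader changes.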
For more examples, see the Analysis package documentation.
For some concrete implementations bundled with Lucene, look in the analysis modules.
| Modifier and Type | Class and Description |
|---|---|
| static class | Analyzer.GlobalReuseStrategy: Deprecated. This implementation class will be hidden in Lucene 5.0. Use GLOBAL_REUSE_STRATEGY instead! |
| static class | Analyzer.PerFieldReuseStrategy: Deprecated. This implementation class will be hidden in Lucene 5.0. Use PER_FIELD_REUSE_STRATEGY instead! |
| static class | Analyzer.ReuseStrategy: Strategy defining how TokenStreamComponents are reused per call to tokenStream(String, java.io.Reader). |
| static class | Analyzer.TokenStreamComponents: This class encapsulates the outer components of a token stream. |
| Modifier and Type | Field and Description |
|---|---|
| static Analyzer.ReuseStrategy | GLOBAL_REUSE_STRATEGY: A predefined Analyzer.ReuseStrategy that reuses the same components for every field. |
| static Analyzer.ReuseStrategy | PER_FIELD_REUSE_STRATEGY: A predefined Analyzer.ReuseStrategy that reuses components per field by maintaining a Map of TokenStreamComponents per field name. |
| Constructor and Description |
|---|
| Analyzer(): Create a new Analyzer, reusing the same set of components per thread across calls to tokenStream(String, Reader). |
| Analyzer(Analyzer.ReuseStrategy reuseStrategy): Expert: create a new Analyzer with a custom Analyzer.ReuseStrategy. |
| Modifier and Type | Method and Description |
|---|---|
| void | close(): Frees persistent resources used by this Analyzer. |
| protected abstract Analyzer.TokenStreamComponents | createComponents(String fieldName, Reader reader): Creates a new Analyzer.TokenStreamComponents instance for this analyzer. |
| int | getOffsetGap(String fieldName): Just like getPositionIncrementGap(java.lang.String), except for Token offsets instead. |
| int | getPositionIncrementGap(String fieldName): Invoked before indexing an IndexableField instance if terms have already been added to that field. |
| Analyzer.ReuseStrategy | getReuseStrategy(): Returns the used Analyzer.ReuseStrategy. |
| protected Reader | initReader(String fieldName, Reader reader): Override this if you want to add a CharFilter chain. |
| TokenStream | tokenStream(String fieldName, Reader reader): Returns a TokenStream suitable for fieldName, tokenizing the contents of reader. |
| TokenStream | tokenStream(String fieldName, String text): Returns a TokenStream suitable for fieldName, tokenizing the contents of text. |
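The position increment gap matters when one field is indexed with several values: the gap is added to the token position between values, which keeps phrase queries from matching across value boundaries. The following plain-Java sketch models only the position arithmetic (it is not Lucene code, and the gap value used below is an arbitrary example, not a Lucene default):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of how a position increment gap separates multiple
// values of the same field: positions advance by 1 within a value, and
// the gap is added between values.
class GapModel {
    static List<Integer> positions(List<String[]> fieldValues, int gap) {
        List<Integer> out = new ArrayList<>();
        int pos = -1;
        boolean first = true;
        for (String[] tokens : fieldValues) {
            if (!first) pos += gap; // analogous to getPositionIncrementGap(fieldName)
            first = false;
            for (String t : tokens) {
                pos += 1; // default position increment of 1 per token
                out.add(pos);
            }
        }
        return out;
    }
}
```

With two values ["a", "b"] and ["c"] and a gap of 100, the token positions come out as 0, 1, 102, so "b c" would no longer match as an adjacent phrase.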
public static final Analyzer.ReuseStrategy GLOBAL_REUSE_STRATEGY

A predefined Analyzer.ReuseStrategy that reuses the same components for every field.

public static final Analyzer.ReuseStrategy PER_FIELD_REUSE_STRATEGY

A predefined Analyzer.ReuseStrategy that reuses components per field by maintaining a Map of TokenStreamComponents per field name.

public Analyzer()

Create a new Analyzer, reusing the same set of components per thread across calls to tokenStream(String, Reader).

public Analyzer(Analyzer.ReuseStrategy reuseStrategy)

Expert: create a new Analyzer with a custom Analyzer.ReuseStrategy.

NOTE: if you just want to reuse on a per-field basis, it's easier to use a subclass of AnalyzerWrapper such as PerFieldAnalyzerWrapper instead.
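The difference between the two predefined strategies can be sketched in plain Java. This is a simplified, hypothetical model of the caching behavior they describe (Lucene's real Analyzer.ReuseStrategy classes manage per-thread storage and are not implemented this way):

```java
import java.util.HashMap;
import java.util.Map;

// Model of GLOBAL_REUSE_STRATEGY: one cached component set, shared by
// every field name.
class GlobalReuse {
    private Object components; // single slot for all fields
    Object get(String fieldName) { return components; }
    void store(String fieldName, Object c) { components = c; }
}

// Model of PER_FIELD_REUSE_STRATEGY: components cached in a Map keyed
// by field name, so each field keeps its own set.
class PerFieldReuse {
    private final Map<String, Object> byField = new HashMap<>();
    Object get(String fieldName) { return byField.get(fieldName); }
    void store(String fieldName, Object c) { byField.put(fieldName, c); }
}
```

In the global model, a lookup for any field returns the one stored component set; in the per-field model, "title" and "body" each get their own.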
protected abstract Analyzer.TokenStreamComponents createComponents(String fieldName, Reader reader)

Creates a new Analyzer.TokenStreamComponents instance for this analyzer.

Parameters:
fieldName - the name of the field's content passed to the Analyzer.TokenStreamComponents sink as a reader
reader - the reader passed to the Tokenizer constructor
Returns:
the Analyzer.TokenStreamComponents for this analyzer

public final TokenStream tokenStream(String fieldName, Reader reader) throws IOException

Returns a TokenStream suitable for fieldName, tokenizing the contents of reader.
This method uses createComponents(String, Reader) to obtain an
instance of Analyzer.TokenStreamComponents. It returns the sink of the
components and stores the components internally. Subsequent calls to this
method will reuse the previously stored components after resetting them
through Analyzer.TokenStreamComponents.setReader(Reader).
NOTE: After calling this method, the consumer must follow the
workflow described in TokenStream to properly consume its contents.
See the Analysis package documentation for
some examples demonstrating this.
NOTE: If your data is available as a String, use
tokenStream(String, String) which reuses a StringReader-like
instance internally.
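The consumer workflow referenced in the NOTE above (reset, advance token by token, end, close) can be illustrated with a minimal whitespace tokenizer. This is a hypothetical stand-in, not Lucene's TokenStream API; the real API exposes terms through attributes such as CharTermAttribute rather than a term() method:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal model of the TokenStream workflow: reset() before consuming,
// incrementToken() until it returns false, then end() and close().
class MiniStream {
    private final String[] tokens;
    private int i;
    private String term; // stands in for CharTermAttribute

    MiniStream(String text) { tokens = text.split("\\s+"); }

    void reset() { i = 0; }            // must be called before consuming
    boolean incrementToken() {         // advance to the next token
        if (i >= tokens.length) return false;
        term = tokens[i++];
        return true;
    }
    String term() { return term; }
    void end() { /* would record final offset state */ }
    void close() { /* would release resources */ }

    // The full consume loop in the documented order.
    static List<String> consume(MiniStream ts) {
        List<String> out = new ArrayList<>();
        ts.reset();
        while (ts.incrementToken()) out.add(ts.term());
        ts.end();
        ts.close();
        return out;
    }
}
```

Skipping reset(), or continuing after incrementToken() returns false, violates the workflow that real TokenStream implementations check for.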
Parameters:
fieldName - the name of the field the created TokenStream is used for
reader - the reader the stream's source reads from
Throws:
AlreadyClosedException - if the Analyzer is closed
IOException - if an i/o error occurs
See Also:
tokenStream(String, String)

public final TokenStream tokenStream(String fieldName, String text) throws IOException

Returns a TokenStream suitable for fieldName, tokenizing the contents of text.
This method uses createComponents(String, Reader) to obtain an
instance of Analyzer.TokenStreamComponents. It returns the sink of the
components and stores the components internally. Subsequent calls to this
method will reuse the previously stored components after resetting them
through Analyzer.TokenStreamComponents.setReader(Reader).
NOTE: After calling this method, the consumer must follow the
workflow described in TokenStream to properly consume its contents.
See the Analysis package documentation for
some examples demonstrating this.
Parameters:
fieldName - the name of the field the created TokenStream is used for
text - the String the stream's source reads from
Throws:
AlreadyClosedException - if the Analyzer is closed
IOException - if an i/o error occurs (may rarely happen for strings)
See Also:
tokenStream(String, Reader)

protected Reader initReader(String fieldName, Reader reader)

Override this if you want to add a CharFilter chain. The default implementation returns reader unchanged.
Parameters:
fieldName - IndexableField name being indexed
reader - original Reader

public int getPositionIncrementGap(String fieldName)

Invoked before indexing an IndexableField instance if terms have already been added to that field.

Parameters:
fieldName - IndexableField name being indexed
Returns:
position increment gap, added to the next token emitted from tokenStream(String, Reader). This value must be >= 0.

public int getOffsetGap(String fieldName)

Just like getPositionIncrementGap(java.lang.String), except for Token offsets instead. By default this returns 1. This method is only called if the field produced at least one token for indexing.

Parameters:
fieldName - the field just indexed
Returns:
offset gap, added to the next token emitted from tokenStream(String, Reader). This value must be >= 0.

public final Analyzer.ReuseStrategy getReuseStrategy()

Returns the used Analyzer.ReuseStrategy.

Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.