Class AnalyzerWrapper

java.lang.Object
org.apache.lucene.analysis.Analyzer
org.apache.lucene.analysis.AnalyzerWrapper
All Implemented Interfaces:
Closeable, AutoCloseable
Direct Known Subclasses:
DelegatingAnalyzerWrapper

public abstract class AnalyzerWrapper extends Analyzer
Extension to Analyzer suitable for Analyzers which wrap other Analyzers.

getWrappedAnalyzer(String) allows the Analyzer to wrap multiple Analyzers which are selected on a per field basis.

wrapComponents(String, Analyzer.TokenStreamComponents) allows the TokenStreamComponents of the wrapped Analyzer to then be wrapped (such as adding a new TokenFilter to form new TokenStreamComponents).

wrapReader(String, Reader) allows the Reader of the wrapped Analyzer to then be wrapped (such as adding a new CharFilter).

Important: If you do not want to wrap the TokenStream using wrapComponents(String, Analyzer.TokenStreamComponents) or the Reader using wrapReader(String, Reader) and just delegate to other analyzers (like by field name), use DelegatingAnalyzerWrapper as superclass!
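As a hedged sketch of the delegate-only case the note above describes, the hypothetical wrapper below selects an Analyzer per field and does no component or reader wrapping, so it extends DelegatingAnalyzerWrapper. The field-to-analyzer map and the class name are assumptions for illustration:

```java
import java.util.Map;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.DelegatingAnalyzerWrapper;

// Hypothetical per-field wrapper that only delegates; since it does not
// wrap components or readers, DelegatingAnalyzerWrapper is the right base.
public class PerFieldDelegatingAnalyzer extends DelegatingAnalyzerWrapper {
  private final Map<String, Analyzer> fieldAnalyzers; // assumed mapping
  private final Analyzer defaultAnalyzer;

  public PerFieldDelegatingAnalyzer(Analyzer defaultAnalyzer,
                                    Map<String, Analyzer> fieldAnalyzers) {
    super(PER_FIELD_REUSE_STRATEGY); // cache one component set per field
    this.defaultAnalyzer = defaultAnalyzer;
    this.fieldAnalyzers = fieldAnalyzers;
  }

  @Override
  protected Analyzer getWrappedAnalyzer(String fieldName) {
    // Must never return null (see getWrappedAnalyzer's contract below).
    return fieldAnalyzers.getOrDefault(fieldName, defaultAnalyzer);
  }
}
```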

Since:
4.0.0
  • Constructor Details

    • AnalyzerWrapper

      protected AnalyzerWrapper(Analyzer.ReuseStrategy reuseStrategy)
      Creates a new AnalyzerWrapper with the given reuse strategy.

      If you want to wrap a single delegate Analyzer you can probably reuse its strategy when instantiating this subclass: super(delegate.getReuseStrategy());.

      If you choose different analyzers per field, use Analyzer.PER_FIELD_REUSE_STRATEGY.
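A minimal sketch of the single-delegate case described above, reusing the delegate's own strategy; the class name is an assumption for illustration:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.AnalyzerWrapper;

// Hypothetical wrapper around a single delegate Analyzer.
public class SingleDelegateWrapper extends AnalyzerWrapper {
  private final Analyzer delegate;

  public SingleDelegateWrapper(Analyzer delegate) {
    // Reuse the delegate's strategy, as the constructor docs suggest.
    super(delegate.getReuseStrategy());
    this.delegate = delegate;
  }

  @Override
  protected Analyzer getWrappedAnalyzer(String fieldName) {
    return delegate; // same delegate regardless of field
  }
}
```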

  • Method Details

    • getWrappedAnalyzer

      protected abstract Analyzer getWrappedAnalyzer(String fieldName)
Retrieves the wrapped Analyzer appropriate for analyzing the field with the given name.
      Parameters:
      fieldName - Name of the field which is to be analyzed
      Returns:
      Analyzer for the field with the given name. Assumed to be non-null
    • wrapComponents

      protected Analyzer.TokenStreamComponents wrapComponents(String fieldName, Analyzer.TokenStreamComponents components)
      Wraps / alters the given TokenStreamComponents, taken from the wrapped Analyzer, to form new components. It is through this method that new TokenFilters can be added by AnalyzerWrappers. By default, the given components are returned.
      Parameters:
      fieldName - Name of the field which is to be analyzed
      components - TokenStreamComponents taken from the wrapped Analyzer
      Returns:
      Wrapped / altered TokenStreamComponents.
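A hedged sketch of a wrapComponents override that appends a TokenFilter to the wrapped analyzer's chain. It assumes a Lucene version (8.x+) in which TokenStreamComponents exposes getSource() and accepts a (source, result) constructor; in older versions getTokenizer() is used instead:

```java
// Inside an AnalyzerWrapper subclass: lower-case the wrapped chain's output.
@Override
protected Analyzer.TokenStreamComponents wrapComponents(
    String fieldName, Analyzer.TokenStreamComponents components) {
  // Add a new TokenFilter on top of the wrapped analyzer's token stream.
  TokenStream filtered = new LowerCaseFilter(components.getTokenStream());
  // Keep the original source, substitute the extended filter chain.
  return new Analyzer.TokenStreamComponents(components.getSource(), filtered);
}
```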
    • wrapTokenStreamForNormalization

      protected TokenStream wrapTokenStreamForNormalization(String fieldName, TokenStream in)
Wraps / alters the given TokenStream for normalization purposes, taken from the wrapped Analyzer, to form a new token stream. It is through this method that new TokenFilters can be added by AnalyzerWrappers. By default, the given token stream is returned.
      Parameters:
      fieldName - Name of the field which is to be analyzed
      in - TokenStream taken from the wrapped Analyzer
      Returns:
Wrapped / altered TokenStream.
    • wrapReader

      protected Reader wrapReader(String fieldName, Reader reader)
      Wraps / alters the given Reader. Through this method AnalyzerWrappers can implement initReader(String, Reader). By default, the given reader is returned.
      Parameters:
      fieldName - name of the field which is to be analyzed
      reader - the reader to wrap
      Returns:
      the wrapped reader
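As an illustrative sketch, a wrapReader override can stack a CharFilter in front of tokenization. HTMLStripCharFilter (from the lucene-analysis-common module) is one such filter; using it here is an assumption for the example, not the only choice:

```java
// Inside an AnalyzerWrapper subclass: strip HTML markup before the
// wrapped analyzer's tokenizer sees the text.
@Override
protected Reader wrapReader(String fieldName, Reader reader) {
  return new HTMLStripCharFilter(reader);
}
```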
    • wrapReaderForNormalization

      protected Reader wrapReaderForNormalization(String fieldName, Reader reader)
      Wraps / alters the given Reader. Through this method AnalyzerWrappers can implement initReaderForNormalization(String, Reader). By default, the given reader is returned.
      Parameters:
      fieldName - name of the field which is to be analyzed
      reader - the reader to wrap
      Returns:
      the wrapped reader
    • createComponents

      protected final Analyzer.TokenStreamComponents createComponents(String fieldName)
      Description copied from class: Analyzer
      Creates a new Analyzer.TokenStreamComponents instance for this analyzer.
      Specified by:
      createComponents in class Analyzer
      Parameters:
fieldName - the name of the field whose content is passed to the Analyzer.TokenStreamComponents sink as a reader
      Returns:
      the Analyzer.TokenStreamComponents for this analyzer.
    • normalize

      protected final TokenStream normalize(String fieldName, TokenStream in)
      Description copied from class: Analyzer
      Wrap the given TokenStream in order to apply normalization filters. The default implementation returns the TokenStream as-is. This is used by Analyzer.normalize(String, String).
      Overrides:
      normalize in class Analyzer
    • getPositionIncrementGap

      public int getPositionIncrementGap(String fieldName)
      Description copied from class: Analyzer
Invoked before indexing an IndexableField instance if terms have already been added to that field. This allows custom analyzers to place an automatic position increment gap between IndexableField instances using the same field name. The default position increment gap is 0. With a 0 position increment gap and the typical default token position increment of 1, all terms in a field, including across IndexableField instances, are in successive positions, allowing exact PhraseQuery matches, for instance, across IndexableField instance boundaries.
      Overrides:
      getPositionIncrementGap in class Analyzer
      Parameters:
      fieldName - IndexableField name being indexed.
      Returns:
      position increment gap, added to the next token emitted from Analyzer.tokenStream(String,Reader). This value must be >= 0.
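A sketch of an override using a hypothetical "body" field name: a large gap keeps phrase queries from matching across the boundary between multiple IndexableField instances of the same field. The field name and gap value are assumptions for illustration:

```java
// Inside an AnalyzerWrapper subclass: widen the gap for multi-valued
// "body" fields; delegate everything else to the default behavior.
@Override
public int getPositionIncrementGap(String fieldName) {
  return "body".equals(fieldName)
      ? 100
      : super.getPositionIncrementGap(fieldName);
}
```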
    • getOffsetGap

      public int getOffsetGap(String fieldName)
      Description copied from class: Analyzer
      Just like Analyzer.getPositionIncrementGap(java.lang.String), except for Token offsets instead. By default this returns 1. This method is only called if the field produced at least one token for indexing.
      Overrides:
      getOffsetGap in class Analyzer
      Parameters:
      fieldName - the field just indexed
      Returns:
      offset gap, added to the next token emitted from Analyzer.tokenStream(String,Reader). This value must be >= 0.
    • initReader

      public final Reader initReader(String fieldName, Reader reader)
      Description copied from class: Analyzer
      Override this if you want to add a CharFilter chain.

      The default implementation returns reader unchanged.

      Overrides:
      initReader in class Analyzer
      Parameters:
      fieldName - IndexableField name being indexed
      reader - original Reader
      Returns:
      reader, optionally decorated with CharFilter(s)
    • initReaderForNormalization

      protected final Reader initReaderForNormalization(String fieldName, Reader reader)
      Description copied from class: Analyzer
      Wrap the given Reader with CharFilters that make sense for normalization. This is typically a subset of the CharFilters that are applied in Analyzer.initReader(String, Reader). This is used by Analyzer.normalize(String, String).
      Overrides:
      initReaderForNormalization in class Analyzer
    • attributeFactory

      protected final AttributeFactory attributeFactory(String fieldName)
      Description copied from class: Analyzer
Return the AttributeFactory to be used for analysis and normalization on the given field name. The default implementation returns TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY.
      Overrides:
      attributeFactory in class Analyzer