org.apache.lucene.analysis.custom.CustomAnalyzer

All Implemented Interfaces: Closeable, AutoCloseable

public final class CustomAnalyzer extends Analyzer

A general-purpose Analyzer that can be created with a builder-style API. Under the hood it uses the factory classes TokenizerFactory, TokenFilterFactory, and CharFilterFactory.

You can create an instance of this Analyzer with the builder by passing it the SPI names of the components (as defined by the ServiceLoader interface):

 Analyzer ana = CustomAnalyzer.builder(Paths.get("/path/to/config/dir"))
   .withTokenizer(StandardTokenizerFactory.NAME)
   .addTokenFilter(LowerCaseFilterFactory.NAME)
   .addTokenFilter(StopFilterFactory.NAME, "ignoreCase", "false", "words", "stopwords.txt", "format", "wordset")
   .build();

The parameters passed to components are the same ones used by Apache Solr and are documented on the corresponding factory classes. Refer to the documentation of the subclasses of TokenizerFactory, TokenFilterFactory, and CharFilterFactory.

The following is equivalent to the example above, using the SPI names as string literals:

 Analyzer ana = CustomAnalyzer.builder(Paths.get("/path/to/config/dir"))
   .withTokenizer("standard")
   .addTokenFilter("lowercase")
   .addTokenFilter("stop", "ignoreCase", "false", "words", "stopwords.txt", "format", "wordset")
   .build();
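
To try a configuration like this end to end, the following is a minimal sketch that writes a small stopwords.txt into a temporary config directory and prints the resulting terms. The class name StopWordConfigDemo, the field name "body", and the stop words themselves are made up for illustration; the sketch assumes the Lucene core and analysis-common modules are on the classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical demo class; the config directory and its contents are illustrative.
class StopWordConfigDemo {

  static List<String> analyze(String text) {
    try {
      // Stand-in for "/path/to/config/dir": a temporary directory with one stop word file.
      Path configDir = Files.createTempDirectory("analyzer-config");
      Path stopWords = configDir.resolve("stopwords.txt");
      try {
        Files.write(stopWords, List.of("the", "a")); // wordset format: one word per line
        List<String> terms = new ArrayList<>();
        try (Analyzer ana = CustomAnalyzer.builder(configDir)
                .withTokenizer("standard")
                .addTokenFilter("lowercase")
                .addTokenFilter("stop", "ignoreCase", "false", "words", "stopwords.txt", "format", "wordset")
                .build();
            TokenStream ts = ana.tokenStream("body", text)) {
          CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
          ts.reset();
          while (ts.incrementToken()) {
            terms.add(term.toString());
          }
          ts.end();
        }
        return terms;
      } finally {
        // The stop words are loaded when the analyzer is built, so the files can go now.
        Files.deleteIfExists(stopWords);
        Files.deleteIfExists(configDir);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(analyze("The quick fox")); // prints [quick, fox]
  }
}
```

Because the lowercase filter runs before the stop filter, "The" is removed even though ignoreCase is "false".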

The list of names to be used for components can be looked up through: TokenizerFactory.availableTokenizers(), TokenFilterFactory.availableTokenFilters(), and CharFilterFactory.availableCharFilters().
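
As a minimal sketch of that lookup, the following prints the registered names. It assumes Lucene 9 or later, where the three factory base classes live in org.apache.lucene.analysis (in earlier releases they are in org.apache.lucene.analysis.util); the class name ListAnalysisComponents is made up for illustration. The names returned depend on which analysis modules are on the classpath.

```java
import java.util.Set;
import java.util.TreeSet;
import org.apache.lucene.analysis.CharFilterFactory;
import org.apache.lucene.analysis.TokenFilterFactory;
import org.apache.lucene.analysis.TokenizerFactory;

// Hypothetical helper class; only the three static lookups come from Lucene.
class ListAnalysisComponents {

  // Sorted copies keep the printed output stable and easy to scan.
  static Set<String> tokenizers() {
    return new TreeSet<>(TokenizerFactory.availableTokenizers());
  }

  static Set<String> tokenFilters() {
    return new TreeSet<>(TokenFilterFactory.availableTokenFilters());
  }

  static Set<String> charFilters() {
    return new TreeSet<>(CharFilterFactory.availableCharFilters());
  }

  public static void main(String[] args) {
    System.out.println("Tokenizers:    " + tokenizers());
    System.out.println("Token filters: " + tokenFilters());
    System.out.println("Char filters:  " + charFilters());
  }
}
```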

You can create conditional branches in the analyzer by using CustomAnalyzer.Builder.when(String, String...) and CustomAnalyzer.Builder.whenTerm(Predicate):

 Analyzer ana = CustomAnalyzer.builder()
    .withTokenizer("standard")
    .addTokenFilter("lowercase")
    .whenTerm(t -> t.length() > 10)
      .addTokenFilter("reversestring")
    .endwhen()
    .build();
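
To see what the conditional branch does to actual text, the sketch below builds the same analyzer and collects the terms it produces. The class name ConditionalAnalyzerDemo and the field name "body" are made up for illustration, and the "reversestring" filter is assumed to be available from the analysis-common module.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical demo class wrapping the conditional analyzer shown above.
class ConditionalAnalyzerDemo {

  static List<String> analyze(String text) {
    List<String> terms = new ArrayList<>();
    try (Analyzer ana = CustomAnalyzer.builder()
            .withTokenizer("standard")
            .addTokenFilter("lowercase")
            .whenTerm(t -> t.length() > 10)   // only terms longer than 10 chars enter the branch
              .addTokenFilter("reversestring")
            .endwhen()
            .build();
        TokenStream ts = ana.tokenStream("body", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();                             // required before the first incrementToken()
      while (ts.incrementToken()) {
        terms.add(term.toString());
      }
      ts.end();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return terms;
  }

  public static void main(String[] args) {
    // "short" passes through unchanged; the long term is reversed.
    System.out.println(analyze("Short Internationalization"));
    // prints [short, noitazilanoitanretni]
  }
}
```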

Since: 5.0.0

Nested Class Summary

Nested Classes

| Modifier and Type | Class | Description |
| --- | --- | --- |
| static final class | CustomAnalyzer.Builder | Builder for CustomAnalyzer. |
| static class | CustomAnalyzer.ConditionBuilder | Factory class for a ConditionalTokenFilter. |

Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents

Field Summary

Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| static CustomAnalyzer.Builder | builder() | Returns a builder for custom analyzers that loads all resources from Lucene's classloader. |
| static CustomAnalyzer.Builder | builder(Path configDir) | Returns a builder for custom analyzers that loads all resources from the given file system base directory. |
| static CustomAnalyzer.Builder | builder(ResourceLoader loader) | Returns a builder for custom analyzers that loads all resources using the given ResourceLoader. |
| protected Analyzer.TokenStreamComponents | createComponents(String fieldName) | |
| List<CharFilterFactory> | getCharFilterFactories() | Returns the list of char filters that are used in this analyzer. |
| int | getOffsetGap(String fieldName) | |
| int | getPositionIncrementGap(String fieldName) | |
| List<TokenFilterFactory> | getTokenFilterFactories() | Returns the list of token filters that are used in this analyzer. |
| TokenizerFactory | getTokenizerFactory() | Returns the tokenizer that is used in this analyzer. |
| protected Reader | initReader(String fieldName, Reader reader) | |
| protected Reader | initReaderForNormalization(String fieldName, Reader reader) | |
| protected TokenStream | normalize(String fieldName, TokenStream in) | |
| String | toString() | |

Methods inherited from class org.apache.lucene.analysis.Analyzer
attributeFactory, close, getReuseStrategy, normalize, tokenStream, tokenStream

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Method Details
- builder
  
  public static CustomAnalyzer.Builder builder()
  
  Returns a builder for custom analyzers that loads all resources from Lucene's classloader. All resource path names must be absolute and include the package prefix.
- builder
  
  public static CustomAnalyzer.Builder builder(Path configDir)
  
  Returns a builder for custom analyzers that loads all resources from the given file system base directory. Place resources such as stop word files there. Files that are not found in the given directory are loaded from Lucene's classloader.
- builder
  
  public static CustomAnalyzer.Builder builder(ResourceLoader loader)
  
  Returns a builder for custom analyzers that loads all resources using the given ResourceLoader.
- initReader
  
  protected Reader initReader(String fieldName, Reader reader)
  
  Overrides:
  
  initReader in class Analyzer
- initReaderForNormalization
  
  protected Reader initReaderForNormalization(String fieldName, Reader reader)
  
  Overrides:
  
  initReaderForNormalization in class Analyzer
- createComponents
  
  protected Analyzer.TokenStreamComponents createComponents(String fieldName)
  
  Specified by:
  
  createComponents in class Analyzer
- normalize
  
  protected TokenStream normalize(String fieldName, TokenStream in)
  
  Overrides:
  
  normalize in class Analyzer
- getPositionIncrementGap
  
  public int getPositionIncrementGap(String fieldName)
  
  Overrides:
  
  getPositionIncrementGap in class Analyzer
- getOffsetGap
  
  public int getOffsetGap(String fieldName)
  
  Overrides:
  
  getOffsetGap in class Analyzer
- getCharFilterFactories
  
  public List<CharFilterFactory> getCharFilterFactories()
  
  Returns the list of char filters that are used in this analyzer.
- getTokenizerFactory
  
  public TokenizerFactory getTokenizerFactory()
  
  Returns the tokenizer that is used in this analyzer.
- getTokenFilterFactories
  
  public List<TokenFilterFactory> getTokenFilterFactories()
  
  Returns the list of token filters that are used in this analyzer.
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class Object
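
The three getter methods above can be used to inspect how an analyzer was assembled. The following is a minimal sketch; the class name InspectCustomAnalyzer is made up for illustration, and the "htmlstrip" char filter is assumed to be available from the analysis-common module.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.lucene.analysis.custom.CustomAnalyzer;

// Hypothetical demo class; the component chain is chosen only for illustration.
class InspectCustomAnalyzer {

  static CustomAnalyzer build() {
    try {
      return CustomAnalyzer.builder()
          .addCharFilter("htmlstrip")
          .withTokenizer("standard")
          .addTokenFilter("lowercase")
          .build();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  // Summarizes the chain by the simple class names of its factories.
  static String describe(CustomAnalyzer ana) {
    StringBuilder sb = new StringBuilder();
    ana.getCharFilterFactories()
        .forEach(f -> sb.append(f.getClass().getSimpleName()).append(" -> "));
    sb.append(ana.getTokenizerFactory().getClass().getSimpleName());
    ana.getTokenFilterFactories()
        .forEach(f -> sb.append(" -> ").append(f.getClass().getSimpleName()));
    return sb.toString();
  }

  public static void main(String[] args) {
    try (CustomAnalyzer ana = build()) {
      System.out.println(describe(ana));
    }
  }
}
```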
