Class CustomAnalyzer

java.lang.Object
org.apache.lucene.analysis.Analyzer
org.apache.lucene.analysis.custom.CustomAnalyzer
All Implemented Interfaces:
Closeable, AutoCloseable

public final class CustomAnalyzer extends Analyzer
A general-purpose Analyzer that can be created with a builder-style API. Under the hood it uses the factory classes TokenizerFactory, TokenFilterFactory, and CharFilterFactory.

You can create an instance of this Analyzer using the builder by passing it the SPI names of the components (the names under which the factories are registered through the ServiceLoader mechanism):

 Analyzer ana = CustomAnalyzer.builder(Paths.get("/path/to/config/dir"))
   .withTokenizer(StandardTokenizerFactory.NAME)
   .addTokenFilter(LowerCaseFilterFactory.NAME)
   .addTokenFilter(StopFilterFactory.NAME, "ignoreCase", "false", "words", "stopwords.txt", "format", "wordset")
   .build();
 
The parameters passed to components are the same ones used by Apache Solr and are documented on the corresponding factory classes. Refer to the documentation of the subclasses of TokenizerFactory, TokenFilterFactory, and CharFilterFactory.

The following is equivalent to the example above, but uses the literal SPI names instead of the NAME constants:

 Analyzer ana = CustomAnalyzer.builder(Paths.get("/path/to/config/dir"))
   .withTokenizer("standard")
   .addTokenFilter("lowercase")
   .addTokenFilter("stop", "ignoreCase", "false", "words", "stopwords.txt", "format", "wordset")
   .build();
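
To see what an analyzer built from SPI names emits, the sketch below consumes its token stream. It is an illustration under stated assumptions, not part of the class contract: the class name CustomAnalyzerTokens, the helper analyze, and the field name "body" are hypothetical; the no-argument builder() is used so that no config directory or stopwords.txt is needed; and the stop filter is assumed to fall back to its built-in English word list when no "words" parameter is given.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class CustomAnalyzerTokens {

  /** Runs the analyzer over the text and collects the emitted terms. */
  static List<String> analyze(Analyzer analyzer, String text) throws IOException {
    List<String> terms = new ArrayList<>();
    // "body" is an arbitrary, hypothetical field name.
    try (TokenStream ts = analyzer.tokenStream("body", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset(); // must be called before the first incrementToken()
      while (ts.incrementToken()) {
        terms.add(term.toString());
      }
      ts.end(); // end-of-stream bookkeeping; close() is handled by try-with-resources
    }
    return terms;
  }

  /** Builds a standard/lowercase/stop chain from SPI names and analyzes a sample. */
  static List<String> demo() throws IOException {
    // Assumption: without a "words" parameter the stop filter uses its
    // default English stop word set, so no external file is required.
    try (Analyzer ana = CustomAnalyzer.builder()
        .withTokenizer("standard")
        .addTokenFilter("lowercase")
        .addTokenFilter("stop", "ignoreCase", "true")
        .build()) {
      return analyze(ana, "The Quick Brown Fox");
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(demo());
  }
}
```

With the analysis-common module on the classpath, the sample text should come back lowercased and without the stop word "the".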
 

The list of names to be used for components can be looked up through: TokenizerFactory.availableTokenizers(), TokenFilterFactory.availableTokenFilters(), and CharFilterFactory.availableCharFilters().
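
As a minimal sketch of that lookup, the following prints the registered names in sorted order. The class name ListAnalysisComponents and the helper sorted are hypothetical. The imports assume the Lucene 9 package layout, where the three factory base classes live in org.apache.lucene.analysis; in earlier releases they were in org.apache.lucene.analysis.util, so adjust the imports to your version.

```java
import java.util.Set;
import java.util.TreeSet;

// Assumption: Lucene 9+ package layout for the factory base classes.
import org.apache.lucene.analysis.CharFilterFactory;
import org.apache.lucene.analysis.TokenFilterFactory;
import org.apache.lucene.analysis.TokenizerFactory;

public class ListAnalysisComponents {

  /** Returns a sorted copy so the printed output is stable and easy to scan. */
  static Set<String> sorted(Set<String> names) {
    return new TreeSet<>(names);
  }

  public static void main(String[] args) {
    // Each call returns the SPI names currently discoverable on the classpath.
    System.out.println("Tokenizers:    " + sorted(TokenizerFactory.availableTokenizers()));
    System.out.println("Token filters: " + sorted(TokenFilterFactory.availableTokenFilters()));
    System.out.println("Char filters:  " + sorted(CharFilterFactory.availableCharFilters()));
  }
}
```

The sets reflect whatever analysis modules are on the classpath, so the output varies between deployments.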

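You can create conditional branches in the analyzer by using CustomAnalyzer.Builder.when(String, String...) and CustomAnalyzer.Builder.whenTerm(Predicate). Filters added between such a call and the matching endwhen() are applied only to tokens that satisfy the condition: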

 Analyzer ana = CustomAnalyzer.builder()
    .withTokenizer("standard")
    .addTokenFilter("lowercase")
    .whenTerm(t -> t.length() > 10)
      .addTokenFilter("reversestring")
    .endwhen()
    .build();
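
The effect of the conditional branch can be checked by running the analyzer over a short input, as in the sketch below. The class name ConditionalAnalyzerDemo, the field name "body", and the sample text are hypothetical and chosen only for illustration; the builder chain itself is the one shown above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ConditionalAnalyzerDemo {

  /** Analyzes a sample in which only one term is longer than 10 characters. */
  static List<String> demo() throws IOException {
    List<String> terms = new ArrayList<>();
    try (Analyzer ana = CustomAnalyzer.builder()
            .withTokenizer("standard")
            .addTokenFilter("lowercase")
            .whenTerm(t -> t.length() > 10)      // predicate receives the current term
              .addTokenFilter("reversestring")   // applied only when the predicate matches
            .endwhen()
            .build();
         // "body" is an arbitrary, hypothetical field name.
         TokenStream ts = ana.tokenStream("body", "Short Internationalization")) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        terms.add(term.toString());
      }
      ts.end();
    }
    return terms;
  }

  public static void main(String[] args) throws IOException {
    // The short term should pass through lowercased; the long one should be reversed.
    System.out.println(demo());
  }
}
```

Because the lowercase filter sits outside the branch, it applies to every token, while the reversal is limited to terms the predicate accepts.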
 
Since:
5.0.0