Class CustomAnalyzer

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public final class CustomAnalyzer
    extends Analyzer
    A general-purpose Analyzer that can be created with a builder-style API. Under the hood it uses the factory classes TokenizerFactory, TokenFilterFactory, and CharFilterFactory.

    You can create an instance of this Analyzer using the builder:

     Analyzer ana = CustomAnalyzer.builder(Paths.get("/path/to/config/dir"))
       .withTokenizer(StandardTokenizerFactory.class)
       .addTokenFilter(StandardFilterFactory.class)
       .addTokenFilter(LowerCaseFilterFactory.class)
       .addTokenFilter(StopFilterFactory.class, "ignoreCase", "false", "words", "stopwords.txt", "format", "wordset")
       .build();
     
    The parameters passed to components are given as alternating key/value strings. They are the same parameters used by Apache Solr and are documented on the corresponding factory classes. Refer to the documentation of the subclasses of TokenizerFactory, TokenFilterFactory, and CharFilterFactory.
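
    To illustrate the key/value convention used in the builder calls above, here is a minimal JDK-only sketch of how alternating string arguments could be folded into a parameter map. The class ParamPairs and its method toMap are hypothetical names chosen for this illustration; they are not part of the Lucene API, and the real builder may validate its arguments differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Lucene. */
final class ParamPairs {

  /** Folds "key1", "value1", "key2", "value2", ... into an insertion-ordered map. */
  static Map<String, String> toMap(String... params) {
    // An odd count means some key has no value, so reject it early.
    if (params.length % 2 != 0) {
      throw new IllegalArgumentException(
          "Key/value pairs expected, so the number of params must be even.");
    }
    Map<String, String> map = new LinkedHashMap<>();
    for (int i = 0; i < params.length; i += 2) {
      map.put(params[i], params[i + 1]);
    }
    return map;
  }

  public static void main(String[] args) {
    // Same arguments as the "stop" filter in the example above.
    System.out.println(
        toMap("ignoreCase", "false", "words", "stopwords.txt", "format", "wordset"));
    // prints {ignoreCase=false, words=stopwords.txt, format=wordset}
  }
}
```

    Note that every value is a string, including booleans such as "false"; in this sketch the interpretation of each value is left to whoever consumes the map.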

    You can also use the SPI names (as registered through the ServiceLoader mechanism):

     Analyzer ana = CustomAnalyzer.builder(Paths.get("/path/to/config/dir"))
       .withTokenizer("standard")
       .addTokenFilter("standard")
       .addTokenFilter("lowercase")
       .addTokenFilter("stop", "ignoreCase", "false", "words", "stopwords.txt", "format", "wordset")
       .build();
     

    The list of names to be used for components can be looked up through: TokenizerFactory.availableTokenizers(), TokenFilterFactory.availableTokenFilters(), and CharFilterFactory.availableCharFilters().

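    You can create conditional branches in the analyzer by using CustomAnalyzer.Builder.when(String, String...) and CustomAnalyzer.Builder.whenTerm(Predicate). Token filters added between the condition and the matching endwhen() are applied only to terms for which the condition holds; all other terms skip them: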

     Analyzer ana = CustomAnalyzer.builder()
        .withTokenizer("standard")
        .addTokenFilter("lowercase")
        .whenTerm(t -> t.length() > 10)
          .addTokenFilter("reversestring")
        .endwhen()
        .build();
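
    The effect of such a branch can be sketched without Lucene. The following JDK-only illustration mimics the example above on a plain list of strings: a term is passed through the branch filter only if the predicate accepts it, and is otherwise left untouched. The class ConditionalBranchSketch and its method applyWhen are hypothetical names, and the one-to-one string mapping is a simplifying assumption; real token filters operate on a token stream and may also add or remove tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/** Hypothetical illustration of conditional filtering; not Lucene code. */
final class ConditionalBranchSketch {

  /** Applies {@code filter} to the terms accepted by {@code condition}; others pass unchanged. */
  static List<String> applyWhen(
      List<String> terms, Predicate<CharSequence> condition, UnaryOperator<String> filter) {
    List<String> out = new ArrayList<>(terms.size());
    for (String term : terms) {
      out.add(condition.test(term) ? filter.apply(term) : term);
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> terms = Arrays.asList("short", "abcdefghijkl");
    // Mirrors whenTerm(t -> t.length() > 10) followed by a string-reversing filter.
    List<String> result =
        applyWhen(terms, t -> t.length() > 10, t -> new StringBuilder(t).reverse().toString());
    System.out.println(result); // prints [short, lkjihgfedcba]
  }
}
```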
     
    Since:
    5.0.0