Class CustomAnalyzer
java.lang.Object
  org.apache.lucene.analysis.Analyzer
    org.apache.lucene.analysis.custom.CustomAnalyzer

All Implemented Interfaces:
Closeable, AutoCloseable
public final class CustomAnalyzer extends Analyzer
A general-purpose Analyzer that can be created with a builder-style API. Under the hood it uses the factory classes TokenizerFactory, TokenFilterFactory, and CharFilterFactory.

You can create an instance of this Analyzer using the builder by passing the SPI names (as defined by the ServiceLoader interface) to it:

  Analyzer ana = CustomAnalyzer.builder(Paths.get("/path/to/config/dir"))
    .withTokenizer(StandardTokenizerFactory.NAME)
    .addTokenFilter(LowerCaseFilterFactory.NAME)
    .addTokenFilter(StopFilterFactory.NAME, "ignoreCase", "false", "words", "stopwords.txt", "format", "wordset")
    .build();

The parameters passed to components are also used by Apache Solr and are documented on their corresponding factory classes. Refer to the documentation of the subclasses of TokenizerFactory, TokenFilterFactory, and CharFilterFactory.

This is the same as the above:

  Analyzer ana = CustomAnalyzer.builder(Paths.get("/path/to/config/dir"))
    .withTokenizer("standard")
    .addTokenFilter("lowercase")
    .addTokenFilter("stop", "ignoreCase", "false", "words", "stopwords.txt", "format", "wordset")
    .build();

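To see what an analyzer built from SPI names emits, the following minimal sketch collects the terms of a token stream into a list. It assumes the Lucene analysis modules are on the classpath; the class name SpiNameExample, the field name "body", and the sample text are illustrative only. It uses builder() without a config directory because no external resource files are needed here.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class, not part of Lucene.
public class SpiNameExample {

  // Builds a standard-tokenizer + lowercase analyzer and returns the terms it emits.
  static List<String> analyze(String text) throws IOException {
    List<String> terms = new ArrayList<>();
    try (Analyzer ana = CustomAnalyzer.builder()
            .withTokenizer("standard")
            .addTokenFilter("lowercase")
            .build();
         TokenStream ts = ana.tokenStream("body", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();                       // required before the first incrementToken()
      while (ts.incrementToken()) {
        terms.add(term.toString());
      }
      ts.end();                         // finalizes end-of-stream state
    }
    return terms;
  }

  public static void main(String[] args) throws IOException {
    // prints [hello, lucene, world]
    System.out.println(analyze("Hello Lucene World"));
  }
}
```

The reset / incrementToken / end / close sequence is the usual TokenStream consumer workflow; both the stream and the analyzer are closed by the try-with-resources statement.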
The list of names to be used for components can be looked up through: TokenizerFactory.availableTokenizers(), TokenFilterFactory.availableTokenFilters(), and CharFilterFactory.availableCharFilters().

You can create conditional branches in the analyzer by using CustomAnalyzer.Builder.when(String, String...) and CustomAnalyzer.Builder.whenTerm(Predicate):

  Analyzer ana = CustomAnalyzer.builder()
    .withTokenizer("standard")
    .addTokenFilter("lowercase")
    .whenTerm(t -> t.length() > 10)
      .addTokenFilter("reversestring")
    .endwhen()
    .build();

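The effect of a conditional branch is easiest to see on concrete input. The sketch below runs the whenTerm analyzer shown above over a short text and returns the resulting terms: only terms longer than 10 characters pass through the reversestring filter. The class name ConditionalBranchExample, the field name, and the sample text are illustrative assumptions.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class, not part of Lucene.
public class ConditionalBranchExample {

  static List<String> analyze(String text) throws IOException {
    List<String> terms = new ArrayList<>();
    try (Analyzer ana = CustomAnalyzer.builder()
            .withTokenizer("standard")
            .addTokenFilter("lowercase")
            .whenTerm(t -> t.length() > 10)     // condition evaluated per term
              .addTokenFilter("reversestring")  // applied only when the condition holds
            .endwhen()                          // back to the unconditional chain
            .build();
         TokenStream ts = ana.tokenStream("body", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        terms.add(term.toString());
      }
      ts.end();
    }
    return terms;
  }

  public static void main(String[] args) throws IOException {
    // "Short" is only lowercased; "Internationalization" is lowercased and reversed.
    System.out.println(analyze("Short Internationalization"));
  }
}
```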
Since:
5.0.0

Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | CustomAnalyzer.Builder | Builder for CustomAnalyzer.
static class | CustomAnalyzer.ConditionBuilder | Factory class for a ConditionalTokenFilter

Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents

Field Summary

Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY

Method Summary
Modifier and Type | Method | Description
static CustomAnalyzer.Builder | builder() | Returns a builder for custom analyzers that loads all resources from Lucene's classloader.
static CustomAnalyzer.Builder | builder(Path configDir) | Returns a builder for custom analyzers that loads all resources from the given file system base directory.
static CustomAnalyzer.Builder | builder(ResourceLoader loader) | Returns a builder for custom analyzers that loads all resources using the given ResourceLoader.
protected Analyzer.TokenStreamComponents | createComponents(String fieldName) |
List<CharFilterFactory> | getCharFilterFactories() | Returns the list of char filters that are used in this analyzer.
int | getOffsetGap(String fieldName) |
int | getPositionIncrementGap(String fieldName) |
List<TokenFilterFactory> | getTokenFilterFactories() | Returns the list of token filters that are used in this analyzer.
TokenizerFactory | getTokenizerFactory() | Returns the tokenizer that is used in this analyzer.
protected Reader | initReader(String fieldName, Reader reader) |
protected Reader | initReaderForNormalization(String fieldName, Reader reader) |
protected TokenStream | normalize(String fieldName, TokenStream in) |
String | toString() |

Methods inherited from class org.apache.lucene.analysis.Analyzer
attributeFactory, close, getReuseStrategy, normalize, tokenStream, tokenStream

Method Detail
-
builder
public static CustomAnalyzer.Builder builder()
Returns a builder for custom analyzers that loads all resources from Lucene's classloader. All resource path names given must be absolute and include the package prefix.
-
builder
public static CustomAnalyzer.Builder builder(Path configDir)
Returns a builder for custom analyzers that loads all resources from the given file system base directory. Place resources such as stop word files there. Files that are not found in the given directory are loaded from Lucene's classloader.
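As a rough illustration of the config directory lookup, the sketch below writes a small stopwords.txt into a temporary directory and passes that directory to builder(Path). The temporary directory, the two stop words, and the class name ConfigDirExample are assumptions made for the example; a real setup would point at an existing configuration directory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class, not part of Lucene.
public class ConfigDirExample {

  static List<String> analyze(String text) throws IOException {
    // Illustrative config directory holding a one-word-per-line stop word file.
    Path configDir = Files.createTempDirectory("analysis-config");
    Files.write(configDir.resolve("stopwords.txt"),
        Arrays.asList("the", "a"), StandardCharsets.UTF_8);

    List<String> terms = new ArrayList<>();
    try (Analyzer ana = CustomAnalyzer.builder(configDir)
            .withTokenizer("standard")
            .addTokenFilter("lowercase")
            // "stopwords.txt" is resolved relative to configDir
            .addTokenFilter("stop", "ignoreCase", "false",
                "words", "stopwords.txt", "format", "wordset")
            .build();
         TokenStream ts = ana.tokenStream("body", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        terms.add(term.toString());
      }
      ts.end();
    }
    return terms;
  }

  public static void main(String[] args) throws IOException {
    // prints [quick, fox]
    System.out.println(analyze("The quick fox"));
  }
}
```

The lowercase filter runs before the stop filter, so "The" is matched against the lowercase entry "the" even though ignoreCase is "false".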
-
builder
public static CustomAnalyzer.Builder builder(ResourceLoader loader)
Returns a builder for custom analyzers that loads all resources using the given ResourceLoader.
-
initReader
protected Reader initReader(String fieldName, Reader reader)
Overrides: initReader in class Analyzer
-
initReaderForNormalization
protected Reader initReaderForNormalization(String fieldName, Reader reader)
Overrides: initReaderForNormalization in class Analyzer
-
createComponents
protected Analyzer.TokenStreamComponents createComponents(String fieldName)
Specified by: createComponents in class Analyzer
-
normalize
protected TokenStream normalize(String fieldName, TokenStream in)
-
getPositionIncrementGap
public int getPositionIncrementGap(String fieldName)
Overrides: getPositionIncrementGap in class Analyzer
-
getOffsetGap
public int getOffsetGap(String fieldName)
Overrides: getOffsetGap in class Analyzer
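The two gap getters report values that can be configured when the analyzer is built. This section does not describe the corresponding builder methods; the sketch below assumes that CustomAnalyzer.Builder offers withPositionIncrementGap(int) and withOffsetGap(int), and uses illustrative values (100 and 10) and a hypothetical class name GapExample.

```java
import java.io.IOException;

import org.apache.lucene.analysis.custom.CustomAnalyzer;

// Hypothetical helper class, not part of Lucene.
public class GapExample {

  // Builds an analyzer with explicit gaps (values chosen for illustration only).
  static CustomAnalyzer build() throws IOException {
    return CustomAnalyzer.builder()
        .withTokenizer("whitespace")
        .withPositionIncrementGap(100)  // assumed builder method, see lead-in
        .withOffsetGap(10)              // assumed builder method, see lead-in
        .build();
  }

  public static void main(String[] args) throws IOException {
    try (CustomAnalyzer ana = build()) {
      // The configured values apply to every field name.
      System.out.println(ana.getPositionIncrementGap("title"));
      System.out.println(ana.getOffsetGap("title"));
    }
  }
}
```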
-
getCharFilterFactories
public List<CharFilterFactory> getCharFilterFactories()
Returns the list of char filters that are used in this analyzer.
-
getTokenizerFactory
public TokenizerFactory getTokenizerFactory()
Returns the tokenizer that is used in this analyzer.
-
getTokenFilterFactories
public List<TokenFilterFactory> getTokenFilterFactories()
Returns the list of token filters that are used in this analyzer.
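The three getters above can be used to inspect how an analyzer was assembled. The following sketch lists the factory class names in processing order (char filters, tokenizer, token filters); the class name InspectFactoriesExample and the choice of components are illustrative assumptions.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.custom.CustomAnalyzer;

// Hypothetical helper class, not part of Lucene.
public class InspectFactoriesExample {

  // Returns the simple class names of all factories in the analysis chain.
  static List<String> describe() throws IOException {
    List<String> names = new ArrayList<>();
    try (CustomAnalyzer ana = CustomAnalyzer.builder()
            .withTokenizer("standard")
            .addTokenFilter("lowercase")
            .build()) {
      for (Object factory : ana.getCharFilterFactories()) {   // empty in this setup
        names.add(factory.getClass().getSimpleName());
      }
      names.add(ana.getTokenizerFactory().getClass().getSimpleName());
      for (Object factory : ana.getTokenFilterFactories()) {
        names.add(factory.getClass().getSimpleName());
      }
    }
    return names;
  }

  public static void main(String[] args) throws IOException {
    // prints [StandardTokenizerFactory, LowerCaseFilterFactory]
    System.out.println(describe());
  }
}
```

Iterating with Object avoids importing the factory base classes, whose package may differ between Lucene versions.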
-
-