Class CustomAnalyzer
java.lang.Object
org.apache.lucene.analysis.Analyzer
org.apache.lucene.analysis.custom.CustomAnalyzer
- All Implemented Interfaces: Closeable, AutoCloseable
A general-purpose Analyzer that can be created with a builder-style API. Under the hood it uses the factory classes TokenizerFactory, TokenFilterFactory, and CharFilterFactory.
You can create an instance of this Analyzer using the builder by passing the SPI names (as defined by the ServiceLoader interface) to it:
    Analyzer ana = CustomAnalyzer.builder(Paths.get("/path/to/config/dir"))
        .withTokenizer(StandardTokenizerFactory.NAME)
        .addTokenFilter(LowerCaseFilterFactory.NAME)
        .addTokenFilter(StopFilterFactory.NAME, "ignoreCase", "false", "words", "stopwords.txt", "format", "wordset")
        .build();

The parameters passed to components are also used by Apache Solr and are documented on their corresponding factory classes. Refer to the documentation of the subclasses of TokenizerFactory, TokenFilterFactory, and CharFilterFactory.
This is the same as the above:
    Analyzer ana = CustomAnalyzer.builder(Paths.get("/path/to/config/dir"))
        .withTokenizer("standard")
        .addTokenFilter("lowercase")
        .addTokenFilter("stop", "ignoreCase", "false", "words", "stopwords.txt", "format", "wordset")
        .build();
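In the examples above, the strings that follow a component name read as alternating parameter names and values ("ignoreCase" then "false", "words" then "stopwords.txt", and so on). The sketch below models how such a flat argument list could be folded into a map. ParamPairs and toMap are hypothetical names used only for illustration, and the validation rules (even argument count, no duplicate keys) are assumptions rather than a copy of the builder's internals.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: models how trailing
// "key", "value", ... arguments can be folded into a parameter map.
final class ParamPairs {

    static Map<String, String> toMap(String... params) {
        // Assumption: an odd number of arguments means a key without a value.
        if (params.length % 2 != 0) {
            throw new IllegalArgumentException(
                "Expected key/value pairs, got an odd number of arguments");
        }
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < params.length; i += 2) {
            String previous = map.put(params[i], params[i + 1]);
            // Assumption: repeating a key is treated as a mistake.
            if (previous != null) {
                throw new IllegalArgumentException("Duplicate key: " + params[i]);
            }
        }
        return map;
    }

    public static void main(String[] args) {
        // Same arguments as the "stop" filter in the examples above.
        System.out.println(
            toMap("ignoreCase", "false", "words", "stopwords.txt", "format", "wordset"));
    }
}
```

Which keys a given component accepts is defined by its factory class, so a map like this is only meaningful together with that factory's documentation.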
The list of names to be used for components can be looked up through: TokenizerFactory.availableTokenizers(), TokenFilterFactory.availableTokenFilters(), and CharFilterFactory.availableCharFilters().
You can create conditional branches in the analyzer by using CustomAnalyzer.Builder.when(String, String...) and CustomAnalyzer.Builder.whenTerm(Predicate):

    Analyzer ana = CustomAnalyzer.builder()
        .withTokenizer("standard")
        .addTokenFilter("lowercase")
        .whenTerm(t -> t.length() > 10)
            .addTokenFilter("reversestring")
        .endwhen()
        .build();
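To make the effect of the conditional branch concrete, the following JDK-only sketch models the chain from the example: every term is lowercased, and only terms for which the predicate holds pass through the reversing step. It is a simplified model, not Lucene code. ConditionalBranchSketch and analyze are hypothetical names, and whitespace splitting is an assumption standing in for the "standard" tokenizer, which behaves differently on punctuation and other input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Simplified model, not Lucene code: whitespace splitting stands in for the
// "standard" tokenizer, and plain string operations stand in for the filters.
final class ConditionalBranchSketch {

    static List<String> analyze(String text, Predicate<CharSequence> condition) {
        List<String> out = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // empty input yields a single empty piece
            }
            // Always applied, like the "lowercase" filter before the branch.
            String term = raw.toLowerCase(Locale.ROOT);
            // Applied only when the predicate matches, like the filters
            // added between whenTerm(...) and endwhen().
            if (condition.test(term)) {
                term = new StringBuilder(term).reverse().toString();
            }
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        // Same predicate as in the example above: terms longer than 10 characters.
        System.out.println(analyze("Internationalization of Text", t -> t.length() > 10));
    }
}
```

In this model the predicate is evaluated on the already lowercased term, which mirrors the position of the branch after the lowercase filter in the builder chain.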
- Since: 5.0.0
Nested Class Summary
Modifier and Type | Class | Description
static final class | CustomAnalyzer.Builder | Builder for CustomAnalyzer.
static class | CustomAnalyzer.ConditionalBuilder | Factory class for a ConditionalTokenFilter
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
Field Summary
Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
Method Summary
Modifier and Type | Method | Description
static CustomAnalyzer.Builder | builder() | Returns a builder for custom analyzers that loads all resources from Lucene's classloader.
static CustomAnalyzer.Builder | builder(Path configDir) | Returns a builder for custom analyzers that loads all resources from the given file system base directory.
static CustomAnalyzer.Builder | builder(ResourceLoader loader) | Returns a builder for custom analyzers that loads all resources using the given ResourceLoader.
protected Analyzer.TokenStreamComponents | createComponents(String fieldName) |
List<CharFilterFactory> | getCharFilterFactories() | Returns the list of char filters that are used in this analyzer.
int | getOffsetGap(String fieldName) |
int | getPositionIncrementGap(String fieldName) |
List<TokenFilterFactory> | getTokenFilterFactories() | Returns the list of token filters that are used in this analyzer.
TokenizerFactory | getTokenizerFactory() | Returns the tokenizer that is used in this analyzer.
protected Reader | initReader(String fieldName, Reader reader) |
protected Reader | initReaderForNormalization(String fieldName, Reader reader) |
protected TokenStream | normalize(String fieldName, TokenStream in) |
String | toString() |
Methods inherited from class org.apache.lucene.analysis.Analyzer
attributeFactory, close, getReuseStrategy, normalize, tokenStream, tokenStream
Method Details

builder()
Returns a builder for custom analyzers that loads all resources from Lucene's classloader. All path names given must be absolute with package prefixes.

builder(Path configDir)
Returns a builder for custom analyzers that loads all resources from the given file system base directory. Place, e.g., stop word files there. Files that are not in the given directory are loaded from Lucene's classloader.
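
The lookup order described for the directory-based builder (the given directory first, then the classloader) can be modeled with JDK classes alone. The sketch below is an illustration of that order under stated assumptions, not Lucene's ResourceLoader implementation. FallbackResourceSketch and open are hypothetical names, and treating a missing file as the only trigger for the fallback is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical helper, not part of Lucene: resolves a resource name against
// a base directory first and falls back to a classloader if no file exists.
final class FallbackResourceSketch {

    static InputStream open(Path baseDir, String name, ClassLoader fallback)
            throws IOException {
        try {
            // First choice: a file such as stopwords.txt inside the directory.
            return Files.newInputStream(baseDir.resolve(name));
        } catch (NoSuchFileException e) {
            // Assumption: only a missing file triggers the classloader lookup.
            InputStream in = fallback.getResourceAsStream(name);
            if (in == null) {
                throw new IOException("Resource not found: " + name);
            }
            return in;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("config");
        Files.write(dir.resolve("stopwords.txt"),
                "the\nand\n".getBytes(StandardCharsets.UTF_8));
        ClassLoader cl = FallbackResourceSketch.class.getClassLoader();
        try (InputStream in = open(dir, "stopwords.txt", cl)) {
            System.out.println("first byte: " + (char) in.read());
        }
    }
}
```

With a layout like this, a relative name such as "stopwords.txt" in a builder call would resolve to a file under the configuration directory when one exists there.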

builder(ResourceLoader loader)
Returns a builder for custom analyzers that loads all resources using the given ResourceLoader.

initReader(String fieldName, Reader reader)
- Overrides: initReader in class Analyzer

initReaderForNormalization(String fieldName, Reader reader)
- Overrides: initReaderForNormalization in class Analyzer

createComponents(String fieldName)
- Specified by: createComponents in class Analyzer

normalize(String fieldName, TokenStream in)

getPositionIncrementGap(String fieldName)
- Overrides: getPositionIncrementGap in class Analyzer

getOffsetGap(String fieldName)
- Overrides: getOffsetGap in class Analyzer

getCharFilterFactories()
Returns the list of char filters that are used in this analyzer.

getTokenizerFactory()
Returns the tokenizer that is used in this analyzer.

getTokenFilterFactories()
Returns the list of token filters that are used in this analyzer.

toString()