Package org.apache.lucene.tests.analysis
Class MockAnalyzer
java.lang.Object
  org.apache.lucene.analysis.Analyzer
    org.apache.lucene.tests.analysis.MockAnalyzer
- All Implemented Interfaces: Closeable, AutoCloseable
Analyzer for testing
This analyzer is a replacement for Whitespace/Simple/KeywordAnalyzers for unit tests. If you are testing a custom component such as a queryparser or analyzer-wrapper that consumes analysis streams, it's a great idea to test it with this analyzer instead. MockAnalyzer has the following behavior:
- By default, the assertions in MockTokenizer are turned on for extra checks that the consumer is consuming properly. These checks can be disabled with setEnableChecks(boolean).
- Payload data is randomly injected into the stream for more thorough testing of payloads.
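As a minimal sketch of typical usage (the class and field names here are illustrative, and this assumes lucene-core and lucene-test-framework are on the classpath), a test can run text through the default MockAnalyzer and collect the emitted terms:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.tests.analysis.MockAnalyzer;

public class MockAnalyzerDemo {
  // Run text through a whitespace-lowercasing MockAnalyzer and
  // collect the terms it emits.
  static List<String> tokens(String text) throws IOException {
    MockAnalyzer analyzer = new MockAnalyzer(new Random(42));
    List<String> result = new ArrayList<>();
    try (TokenStream ts = analyzer.tokenStream("field", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();                    // consumers must call reset() first;
      while (ts.incrementToken()) {  // MockTokenizer asserts this workflow
        result.add(term.toString());
      }
      ts.end();
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    System.out.println(tokens("The Quick FOX"));  // [the, quick, fox]
  }
}
```

Because the checks described above are enabled by default, skipping `reset()` or calling `incrementToken()` after `end()` in a loop like this would fail the test.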
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents
-
Field Summary
Fields inherited from class org.apache.lucene.analysis.Analyzer
GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY
-
Constructor Summary
Constructor / Description
MockAnalyzer(Random random)
    Create a Whitespace-lowercasing analyzer with no stopword removal.
MockAnalyzer(Random random, CharacterRunAutomaton runAutomaton, boolean lowerCase)
MockAnalyzer(Random random, CharacterRunAutomaton runAutomaton, boolean lowerCase, CharacterRunAutomaton filter)
    Creates a new MockAnalyzer.
-
Method Summary
Modifier and Type / Method / Description
createComponents(String fieldName)
int getOffsetGap(String fieldName)
    Get the offset gap between tokens in fields if several fields with the same name were added.
int getPositionIncrementGap(String fieldName)
protected TokenStream normalize(String fieldName, TokenStream in)
void setEnableChecks(boolean enableChecks)
    Toggle consumer workflow checking: if your test consumes tokenstreams normally you should leave this enabled.
void setMaxTokenLength(int length)
    Toggle maxTokenLength for MockTokenizer.
void setOffsetGap(int offsetGap)
    Set a new offset gap which will then be added to the offset when several fields with the same name are indexed.
void setPositionIncrementGap(int positionIncrementGap)

Methods inherited from class org.apache.lucene.analysis.Analyzer
attributeFactory, close, getReuseStrategy, initReader, initReaderForNormalization, normalize, tokenStream, tokenStream
-
Constructor Details
-
MockAnalyzer
public MockAnalyzer(Random random, CharacterRunAutomaton runAutomaton, boolean lowerCase, CharacterRunAutomaton filter)
Creates a new MockAnalyzer.
Parameters:
random - Random for payloads behavior
runAutomaton - DFA describing how tokenization should happen (e.g. [a-zA-Z]+)
lowerCase - true if the tokenizer should lowercase terms
filter - DFA describing how terms should be filtered (set of stopwords, etc.)
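For illustration, and assuming the predefined automata that ship in MockTokenizer (SIMPLE, i.e. [a-zA-Z]+ runs) and MockTokenFilter (ENGLISH_STOPSET), the full constructor can imitate a lowercasing letter tokenizer with English stopword removal. The helper method is illustrative, not part of this API:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.tests.analysis.MockAnalyzer;
import org.apache.lucene.tests.analysis.MockTokenFilter;
import org.apache.lucene.tests.analysis.MockTokenizer;

public class FullConstructorDemo {
  static List<String> tokens(String text) throws IOException {
    // [a-zA-Z]+ tokenization (MockTokenizer.SIMPLE), lowercasing, and
    // MockTokenFilter.ENGLISH_STOPSET as the stopword filter DFA.
    MockAnalyzer analyzer = new MockAnalyzer(
        new Random(0), MockTokenizer.SIMPLE, true, MockTokenFilter.ENGLISH_STOPSET);
    List<String> result = new ArrayList<>();
    try (TokenStream ts = analyzer.tokenStream("field", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      ts.reset();
      while (ts.incrementToken()) {
        result.add(term.toString());
      }
      ts.end();
    }
    return result;
  }

  public static void main(String[] args) throws IOException {
    // "The" is lowercased to a stopword and filtered out.
    System.out.println(tokens("The quick FOX"));  // [quick, fox]
  }
}
```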
-
MockAnalyzer
public MockAnalyzer(Random random, CharacterRunAutomaton runAutomaton, boolean lowerCase)
-
MockAnalyzer
public MockAnalyzer(Random random)
Create a Whitespace-lowercasing analyzer with no stopword removal. Calls MockAnalyzer(random, MockTokenizer.WHITESPACE, true, MockTokenFilter.EMPTY_STOPSET, false).
-
-
Method Details
-
createComponents
Specified by: createComponents in class Analyzer
-
normalize
-
setPositionIncrementGap
public void setPositionIncrementGap(int positionIncrementGap)
-
getPositionIncrementGap
Overrides: getPositionIncrementGap in class Analyzer
-
setOffsetGap
public void setOffsetGap(int offsetGap)
Set a new offset gap which will then be added to the offset when several fields with the same name are indexed.
Parameters:
offsetGap - The offset gap that should be used.
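A small sketch of how this setter pairs with getOffsetGap (the field name passed to the getter is ignored, per its documentation; the class name here is illustrative):

```java
import java.util.Random;

import org.apache.lucene.tests.analysis.MockAnalyzer;

public class OffsetGapDemo {
  public static void main(String[] args) {
    MockAnalyzer analyzer = new MockAnalyzer(new Random(1));
    // Each additional value of a multi-valued field will have its
    // character offsets shifted by this gap during indexing.
    analyzer.setOffsetGap(100);
    System.out.println(analyzer.getOffsetGap("anyField"));  // 100
  }
}
```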
-
getOffsetGap
Get the offset gap between tokens in fields if several fields with the same name were added.
Overrides: getOffsetGap in class Analyzer
Parameters:
fieldName - Currently not used; the same offset gap is returned for each field.
-
setEnableChecks
public void setEnableChecks(boolean enableChecks)
Toggle consumer workflow checking: if your test consumes token streams normally, you should leave this enabled.
-
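One hedged sketch of when disabling the checks is appropriate: a test whose component under test deliberately drives TokenStreams in a nonstandard order (the scenario and class name are hypothetical):

```java
import java.util.Random;

import org.apache.lucene.tests.analysis.MockAnalyzer;

public class DisableChecksDemo {
  public static void main(String[] args) {
    MockAnalyzer analyzer = new MockAnalyzer(new Random(7));
    // Suppose the consumer under test intentionally skips reset() or
    // reuses a stream out of order; turn the workflow assertions off
    // so MockTokenizer does not fail the test on the consumer's behalf.
    analyzer.setEnableChecks(false);
  }
}
```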
setMaxTokenLength
public void setMaxTokenLength(int length)
Toggle maxTokenLength for MockTokenizer.
-