Package org.apache.lucene.tests.analysis
Class MockTokenizer
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.apache.lucene.tests.analysis.MockTokenizer
- All Implemented Interfaces:
Closeable, AutoCloseable
Tokenizer for testing.
This tokenizer is a replacement for the WHITESPACE, SIMPLE, and KEYWORD tokenizers. If you are writing a component such as a TokenFilter, it is a good idea to test it by wrapping this tokenizer instead, because doing so adds extra checks. This tokenizer has the following behavior:
- An internal state machine is used for checking consumer consistency. These checks can be disabled with setEnableChecks(boolean).
- For convenience, optionally lowercases terms that it outputs.
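The consumer-consistency check described above can be pictured as a small state machine over the calls reset(), incrementToken(), end(), and close(). The following is a minimal standalone sketch and does not use Lucene. The class names WorkflowCheckSketch and CheckedStream, the State enum, and the exact transition rules are assumptions made for illustration, not MockTokenizer's actual implementation.

```java
import java.util.List;

/** Hypothetical stand-in (not Lucene code): a token source that checks how it is consumed. */
final class WorkflowCheckSketch {

  /** Illustrative states; the real tokenizer's internal states may differ. */
  enum State { CLOSED, RESET, CONSUMING, EXHAUSTED, ENDED }

  static final class CheckedStream {
    private final List<String> tokens;
    private int next = 0;
    private State state = State.CLOSED;
    private boolean enableChecks = true;

    CheckedStream(List<String> tokens) {
      this.tokens = tokens;
    }

    /** Mirrors the idea of setEnableChecks(boolean): false turns the checks off. */
    void setEnableChecks(boolean enableChecks) {
      this.enableChecks = enableChecks;
    }

    private void require(boolean ok, String call) {
      if (enableChecks && !ok) {
        throw new IllegalStateException(call + " called in unexpected state " + state);
      }
    }

    void reset() {
      require(state == State.CLOSED, "reset()");
      next = 0;
      state = State.RESET;
    }

    /** Returns the next token, or null once the tokens are exhausted. */
    String incrementToken() {
      require(state == State.RESET || state == State.CONSUMING, "incrementToken()");
      if (next < tokens.size()) {
        state = State.CONSUMING;
        return tokens.get(next++);
      }
      state = State.EXHAUSTED;
      return null;
    }

    void end() {
      require(state == State.EXHAUSTED, "end()");
      state = State.ENDED;
    }

    void close() {
      require(state == State.ENDED || state == State.CLOSED, "close()");
      state = State.CLOSED;
    }
  }

  public static void main(String[] args) {
    // A well-behaved consumer: reset, consume until exhausted, end, close.
    CheckedStream stream = new CheckedStream(List.of("quick", "brown", "fox"));
    stream.reset();
    for (String t = stream.incrementToken(); t != null; t = stream.incrementToken()) {
      System.out.println(t);
    }
    stream.end();
    stream.close();

    // A misbehaving consumer: skips reset(), so the check fails.
    CheckedStream misuse = new CheckedStream(List.of("quick"));
    try {
      misuse.incrementToken();
    } catch (IllegalStateException e) {
      System.out.println("caught: " + e.getMessage());
    }
  }
}
```

In this sketch, a consumer that follows the expected call order never notices the checks, while one that skips a step fails immediately. That is the kind of bug a test wrapping a filter around a checking tokenizer is meant to surface.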
-
Field Summary
Modifier and Type | Field | Description
static final int | DEFAULT_MAX_TOKEN_LENGTH |
static final CharacterRunAutomaton | KEYWORD | Acts similar to KeywordTokenizer.
static final CharacterRunAutomaton | SIMPLE | Acts like LetterTokenizer.
static final CharacterRunAutomaton | WHITESPACE | Acts similar to WhitespaceTokenizer.
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
-
Constructor Summary
Constructor | Description
MockTokenizer()
MockTokenizer(AttributeFactory factory)
MockTokenizer(AttributeFactory factory, CharacterRunAutomaton runAutomaton, boolean lowerCase)
MockTokenizer(AttributeFactory factory, CharacterRunAutomaton runAutomaton, boolean lowerCase, int maxTokenLength)
MockTokenizer(CharacterRunAutomaton runAutomaton, boolean lowerCase)
MockTokenizer(CharacterRunAutomaton runAutomaton, boolean lowerCase, int maxTokenLength)
-
Method Summary
Modifier and Type | Method | Description
void | close() |
void | end() |
final boolean | incrementToken() |
protected boolean | isTokenChar(int c) |
protected int | normalize(int c) |
protected int | readChar() |
protected int | readCodePoint() |
void | reset() |
void | setEnableChecks(boolean enableChecks) | Toggles consumer workflow checking. If your test consumes token streams normally, leave this enabled.
protected void | setReaderTestPoint() |
Methods inherited from class org.apache.lucene.analysis.Tokenizer
correctOffset, setReader
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
-
Field Details
-
WHITESPACE
Acts similar to WhitespaceTokenizer.
-
KEYWORD
Acts similar to KeywordTokenizer. TODO: Keyword returns an "empty" token for an empty reader...
-
SIMPLE
Acts like LetterTokenizer.
-
DEFAULT_MAX_TOKEN_LENGTH
public static final int DEFAULT_MAX_TOKEN_LENGTH
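To make the difference between the WHITESPACE, KEYWORD, and SIMPLE modes concrete, here is a rough standalone model that does not use Lucene. The real fields are CharacterRunAutomaton instances. The IntPredicate constants, the class name TokenModeSketch, and the tokenize helper below are hypothetical simplifications: whitespace-delimited runs, one token for the whole input, and runs of letters, with optional lowercasing. Maximum token length is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

/** Hypothetical model (not Lucene code) of the three tokenization modes. */
final class TokenModeSketch {

  // Each predicate answers: does this code point belong inside a token?
  static final IntPredicate WHITESPACE = c -> !Character.isWhitespace(c); // split on whitespace
  static final IntPredicate SIMPLE = c -> Character.isLetter(c);          // runs of letters only
  static final IntPredicate KEYWORD = c -> true;                          // whole input is one token

  /** Collects maximal runs of accepted code points, optionally lowercasing each one. */
  static List<String> tokenize(String input, IntPredicate isTokenChar, boolean lowerCase) {
    List<String> out = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    int i = 0;
    while (i < input.length()) {
      int cp = input.codePointAt(i);
      i += Character.charCount(cp);
      if (isTokenChar.test(cp)) {
        current.appendCodePoint(lowerCase ? Character.toLowerCase(cp) : cp);
      } else if (current.length() > 0) {
        out.add(current.toString());
        current.setLength(0);
      }
    }
    if (current.length() > 0) {
      out.add(current.toString());
    }
    return out;
  }

  public static void main(String[] args) {
    String text = "Hello, Lucene World";
    System.out.println(tokenize(text, WHITESPACE, false)); // [Hello,, Lucene, World]
    System.out.println(tokenize(text, SIMPLE, true));      // [hello, lucene, world]
    System.out.println(tokenize(text, KEYWORD, false));    // [Hello, Lucene World]
  }
}
```

Note that this model emits no token at all for empty input in every mode, which is a choice made for the sketch and says nothing about how KeywordTokenizer itself behaves. The predicate and the lowercasing step loosely correspond to the isTokenChar(int) and normalize(int) hooks listed on this page, but their exact semantics in MockTokenizer are not modeled here.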
-
Constructor Details
-
MockTokenizer
public MockTokenizer(AttributeFactory factory, CharacterRunAutomaton runAutomaton, boolean lowerCase, int maxTokenLength)
-
MockTokenizer
-
MockTokenizer
-
MockTokenizer
public MockTokenizer()
-
MockTokenizer
public MockTokenizer(AttributeFactory factory, CharacterRunAutomaton runAutomaton, boolean lowerCase)
-
MockTokenizer
-
-
Method Details
-
incrementToken
- Specified by:
incrementToken in class TokenStream
- Throws:
IOException
-
readCodePoint
- Throws:
IOException
-
readChar
- Throws:
IOException
-
isTokenChar
protected boolean isTokenChar(int c)
-
normalize
protected int normalize(int c)
-
reset
- Overrides:
reset in class Tokenizer
- Throws:
IOException
-
close
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Overrides:
close in class Tokenizer
- Throws:
IOException
-
setReaderTestPoint
protected void setReaderTestPoint()
- Overrides:
setReaderTestPoint in class Tokenizer
-
end
- Overrides:
end in class TokenStream
- Throws:
IOException
-
setEnableChecks
public void setEnableChecks(boolean enableChecks)
Toggles consumer workflow checking. If your test consumes token streams normally, leave this enabled.
-