org.apache.lucene.analysis
Class MockTokenizer
java.lang.Object
  org.apache.lucene.util.AttributeSource
    org.apache.lucene.analysis.TokenStream
      org.apache.lucene.analysis.Tokenizer
        org.apache.lucene.analysis.MockTokenizer
- All Implemented Interfaces:
- Closeable
public class MockTokenizer
extends Tokenizer
Tokenizer for testing.
This tokenizer is a replacement for the WHITESPACE, SIMPLE, and KEYWORD tokenizers. If you are writing a component such as a TokenFilter, it's a great idea to test it by having it wrap this tokenizer instead, to get extra checks. This tokenizer has the following behavior:
- An internal state machine is used for checking consumer consistency. These checks can be disabled with setEnableChecks(boolean).
- For convenience, it optionally lowercases the terms it outputs.
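The consumer-consistency checks can be pictured as a small state machine. The sketch below is a simplified, hypothetical model that uses only the JDK: the class WorkflowChecker, its state names, and the exact set of allowed transitions are assumptions for illustration, not MockTokenizer's actual code. It assumes a well-behaved consumer calls reset(), then incrementToken() until it returns false, then end(), then close().

```java
// Hypothetical, simplified model of a TokenStream consumer-workflow checker.
// Not Lucene code: the class name, states, and allowed transitions are assumptions.
class WorkflowChecker {
    enum State { CREATED, RESET, INCREMENTING, EXHAUSTED, ENDED, CLOSED }

    private State state = State.CREATED;
    private boolean enableChecks = true;

    void setEnableChecks(boolean enableChecks) {
        this.enableChecks = enableChecks;
    }

    State state() {
        return state;
    }

    // Throws IllegalStateException unless the current state is one of `allowed`;
    // does nothing when checks are disabled.
    private void require(String call, State... allowed) {
        if (!enableChecks) {
            return;
        }
        for (State s : allowed) {
            if (s == state) {
                return;
            }
        }
        throw new IllegalStateException(call + "() called in state " + state);
    }

    void reset() {
        // Allowed on a fresh stream, or on one that was closed and is being reused.
        require("reset", State.CREATED, State.CLOSED);
        state = State.RESET;
    }

    // `hasMore` stands in for the value a real incrementToken() would return.
    void incrementToken(boolean hasMore) {
        require("incrementToken", State.RESET, State.INCREMENTING);
        state = hasMore ? State.INCREMENTING : State.EXHAUSTED;
    }

    void end() {
        // Only legal once incrementToken() has reported that the stream is exhausted.
        require("end", State.EXHAUSTED);
        state = State.ENDED;
    }

    void close() {
        require("close", State.ENDED);
        state = State.CLOSED;
    }

    public static void main(String[] args) {
        WorkflowChecker ok = new WorkflowChecker();
        ok.reset();
        ok.incrementToken(true);
        ok.incrementToken(false);
        ok.end();
        ok.close();
        System.out.println(ok.state()); // CLOSED

        WorkflowChecker bad = new WorkflowChecker();
        try {
            bad.incrementToken(true); // reset() was never called
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // incrementToken() called in state CREATED
        }
    }
}
```

With checks disabled, the model still tracks state but never throws, which is roughly what turning checks off with setEnableChecks(false) is described as doing.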
Fields inherited from class org.apache.lucene.analysis.Tokenizer
input
Constructor Summary
- MockTokenizer(AttributeSource.AttributeFactory factory, Reader input, CharacterRunAutomaton runAutomaton, boolean lowerCase, int maxTokenLength)
- MockTokenizer(Reader input)
  Calls MockTokenizer(Reader, WHITESPACE, true).
- MockTokenizer(Reader input, CharacterRunAutomaton runAutomaton, boolean lowerCase)
- MockTokenizer(Reader input, CharacterRunAutomaton runAutomaton, boolean lowerCase, int maxTokenLength)
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState
WHITESPACE
public static final CharacterRunAutomaton WHITESPACE
- Acts similar to WhitespaceTokenizer.
KEYWORD
public static final CharacterRunAutomaton KEYWORD
- Acts similar to KeywordTokenizer.
TODO: Keyword returns an "empty" token for an empty reader...
SIMPLE
public static final CharacterRunAutomaton SIMPLE
- Acts like LetterTokenizer.
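As a rough mental model, each of these three constants decides which characters belong to a token, and a token is a maximal run of such characters. The sketch below approximates that with JDK IntPredicates. The names WHITESPACE_MODE, SIMPLE_MODE and KEYWORD_MODE, and the use of Character.isLetter to stand in for SIMPLE, are assumptions; this is not how the CharacterRunAutomaton constants are actually built, and lowercasing is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Hypothetical approximation of the WHITESPACE / SIMPLE / KEYWORD modes as
// "is this code point part of a token?" predicates. Not Lucene code.
class CharClassModes {
    static final IntPredicate WHITESPACE_MODE = cp -> !Character.isWhitespace(cp);
    static final IntPredicate SIMPLE_MODE = Character::isLetter;
    static final IntPredicate KEYWORD_MODE = cp -> true;

    // Splits text into maximal runs of code points accepted by isTokenChar.
    static List<String> tokenize(String text, IntPredicate isTokenChar) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (isTokenChar.test(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        });
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Hello, world 42";
        System.out.println(tokenize(text, WHITESPACE_MODE)); // [Hello,, world, 42]
        System.out.println(tokenize(text, SIMPLE_MODE));     // [Hello, world]
        System.out.println(tokenize(text, KEYWORD_MODE));    // [Hello, world 42]
    }
}
```

For empty input this sketch yields no tokens at all in every mode; the TODO on KEYWORD notes that KeywordTokenizer instead returns an "empty" token in that case.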
DEFAULT_MAX_TOKEN_LENGTH
public static final int DEFAULT_MAX_TOKEN_LENGTH
- See Also:
- Constant Field Values
MockTokenizer
public MockTokenizer(AttributeSource.AttributeFactory factory,
Reader input,
CharacterRunAutomaton runAutomaton,
boolean lowerCase,
int maxTokenLength)
MockTokenizer
public MockTokenizer(Reader input,
CharacterRunAutomaton runAutomaton,
boolean lowerCase,
int maxTokenLength)
MockTokenizer
public MockTokenizer(Reader input,
CharacterRunAutomaton runAutomaton,
boolean lowerCase)
MockTokenizer
public MockTokenizer(Reader input)
- Calls MockTokenizer(Reader, WHITESPACE, true).
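The lowerCase and maxTokenLength constructor parameters can be pictured with the JDK-only sketch below. It is a hypothetical model, not the MockTokenizer implementation: it always splits on whitespace (the mode the one-argument constructor selects), lowercases with Character.toLowerCase when asked, and, as an assumption about how maxTokenLength is enforced, emits a token as soon as it reaches the limit and starts a new token with the remaining characters. The sketch counts length in code points; the class name LengthLimitedTokenizer is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the lowerCase and maxTokenLength parameters with
// whitespace splitting. Assumption: an over-long run is cut into chunks of
// at most maxTokenLength code points. Not Lucene code.
class LengthLimitedTokenizer {
    static List<String> tokenize(String text, boolean lowerCase, int maxTokenLength) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isWhitespace(cp)) {
                flush(tokens, current); // whitespace ends the current token
                continue;
            }
            current.appendCodePoint(lowerCase ? Character.toLowerCase(cp) : cp);
            if (current.codePointCount(0, current.length()) >= maxTokenLength) {
                flush(tokens, current); // limit reached: emit and start a new token
            }
        }
        flush(tokens, current);
        return tokens;
    }

    // Adds the buffered token (if any) to the output and clears the buffer.
    private static void flush(List<String> tokens, StringBuilder current) {
        if (current.length() > 0) {
            tokens.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Hello WORLD", true, Integer.MAX_VALUE)); // [hello, world]
        System.out.println(tokenize("Hello WORLD", false, 3));               // [Hel, lo, WOR, LD]
    }
}
```

The first call corresponds loosely to the one-argument constructor, which is documented as passing lowerCase = true; Integer.MAX_VALUE here simply means "no practical limit" in the sketch.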
incrementToken
public final boolean incrementToken()
throws IOException
- Specified by:
incrementToken in class TokenStream
- Throws:
IOException
readCodePoint
protected int readCodePoint()
throws IOException
- Throws:
IOException
readChar
protected int readChar()
throws IOException
- Throws:
IOException
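readChar() and readCodePoint() suggest a two-level reader: single UTF-16 chars underneath, whole Unicode code points on top. The sketch below illustrates that idea with a hypothetical CodePointReader class. Combining a high surrogate with the following low surrogate via Character.toCodePoint is an assumption about what readCodePoint() does, and throwing an IOException for an unpaired surrogate is this sketch's own choice.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical model of reading code points from a Reader one char at a time.
// Not Lucene code; assumes surrogate pairs arrive high-then-low.
class CodePointReader {
    private final Reader input;

    CodePointReader(Reader input) {
        this.input = input;
    }

    // Returns the next UTF-16 code unit, or -1 at end of input.
    int readChar() throws IOException {
        return input.read();
    }

    // Returns the next Unicode code point, or -1 at end of input.
    int readCodePoint() throws IOException {
        int ch = readChar();
        if (ch < 0 || !Character.isHighSurrogate((char) ch)) {
            return ch; // end of input, or a code point that fits in one char
        }
        int low = readChar();
        if (low < 0 || !Character.isLowSurrogate((char) low)) {
            throw new IOException("unpaired high surrogate");
        }
        return Character.toCodePoint((char) ch, (char) low);
    }

    public static void main(String[] args) throws IOException {
        // "a" followed by U+1F600, which occupies two UTF-16 chars.
        CodePointReader r = new CodePointReader(new StringReader("a\uD83D\uDE00"));
        System.out.println(r.readCodePoint());                      // 97
        System.out.println(Integer.toHexString(r.readCodePoint())); // 1f600
        System.out.println(r.readCodePoint());                      // -1
    }
}
```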
isTokenChar
protected boolean isTokenChar(int c)
normalize
protected int normalize(int c)
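Because isTokenChar(int) and normalize(int) are protected, a subclass can presumably change which characters form tokens and how each one is rewritten. The sketch below illustrates that hook pattern with a hypothetical JDK-only base class (HookedTokenizer) and two hypothetical subclasses. It is not MockTokenizer's implementation; in MockTokenizer the defaults depend on the CharacterRunAutomaton and lowerCase constructor arguments rather than on the fixed rules used here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical base class with overridable isTokenChar/normalize hooks. Not Lucene code.
class HookedTokenizer {
    // Default hook: any non-whitespace code point belongs to a token.
    protected boolean isTokenChar(int c) {
        return !Character.isWhitespace(c);
    }

    // Default hook: leave each code point unchanged.
    protected int normalize(int c) {
        return c;
    }

    // Fixed driver loop that consults the hooks for every code point.
    final List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        text.codePoints().forEach(c -> {
            if (isTokenChar(c)) {
                current.appendCodePoint(normalize(c));
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        });
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(new DigitTokenizer().tokenize("tel: 555-0199")); // [555, 0199]
        System.out.println(new UpperCaseTokenizer().tokenize("ab cd"));     // [AB, CD]
    }
}

// Hypothetical subclass: only digits are token characters.
class DigitTokenizer extends HookedTokenizer {
    @Override
    protected boolean isTokenChar(int c) {
        return Character.isDigit(c);
    }
}

// Hypothetical subclass: keeps whitespace splitting but upper-cases each code point.
class UpperCaseTokenizer extends HookedTokenizer {
    @Override
    protected int normalize(int c) {
        return Character.toUpperCase(c);
    }
}
```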
reset
public void reset()
throws IOException
- Overrides:
reset in class TokenStream
- Throws:
IOException
close
public void close()
throws IOException
- Specified by:
close in interface Closeable
- Overrides:
close in class Tokenizer
- Throws:
IOException
end
public void end()
throws IOException
- Overrides:
end in class TokenStream
- Throws:
IOException
setEnableChecks
public void setEnableChecks(boolean enableChecks)
- Toggles consumer workflow checking. If your test consumes TokenStreams normally, you should leave this enabled.
Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.