org.apache.lucene.analysis
Class MockTokenizer
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.apache.lucene.analysis.MockTokenizer
- All Implemented Interfaces:
- Closeable
public class MockTokenizer
- extends org.apache.lucene.analysis.Tokenizer
Tokenizer for testing.
This tokenizer is a replacement for the WHITESPACE, SIMPLE, and KEYWORD
tokenizers. If you are writing a component such as a TokenFilter, it's a great idea to test
it by wrapping this tokenizer instead, for extra checks. This tokenizer has the following behavior:
- An internal state-machine is used for checking consumer consistency. These checks can
be disabled with setEnableChecks(boolean).
- For convenience, optionally lowercases terms that it outputs.
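The consumer consistency checks described above can be pictured as a small state machine over the calls a consumer makes. The following is a minimal, standalone sketch of that idea and not MockTokenizer's actual implementation: the class WorkflowChecker, its state names, and the exact set of allowed transitions are assumptions made for illustration. Only the method names reset, incrementToken, end, close, and setEnableChecks come from this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: a consumer-workflow checker in the spirit of
// the checks described above. State names and transitions are assumptions.
class WorkflowChecker {
    private enum State { CREATED, RESET, EXHAUSTED, ENDED, CLOSED }

    private final List<String> tokens;
    private State state = State.CREATED;
    private int next = 0;
    private String current = null;
    private boolean enableChecks = true;

    WorkflowChecker(List<String> tokens) {
        this.tokens = new ArrayList<String>(tokens);
    }

    // Mirrors the idea of setEnableChecks(boolean): turn the checks off.
    void setEnableChecks(boolean enableChecks) {
        this.enableChecks = enableChecks;
    }

    // Throws if checks are enabled and the call is not legal in this state.
    private void require(boolean legal, String call) {
        if (enableChecks && !legal) {
            throw new IllegalStateException(call + "() called in state " + state);
        }
    }

    void reset() {
        require(state == State.CREATED || state == State.CLOSED, "reset");
        next = 0;
        state = State.RESET;
    }

    boolean incrementToken() {
        require(state == State.RESET, "incrementToken");
        if (next < tokens.size()) {
            current = tokens.get(next++);
            return true;
        }
        state = State.EXHAUSTED; // no more tokens; only end() is legal now
        return false;
    }

    String term() {
        return current;
    }

    void end() {
        require(state == State.EXHAUSTED, "end");
        state = State.ENDED;
    }

    void close() {
        require(state == State.ENDED || state == State.CREATED
                || state == State.CLOSED, "close");
        state = State.CLOSED;
    }
}
```

Under these assumed rules, a consumer that calls reset(), loops on incrementToken() until it returns false, then calls end() and close() passes every check. A consumer that calls incrementToken() before reset() gets an IllegalStateException unless checks were disabled first.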
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource |
org.apache.lucene.util.AttributeSource.AttributeFactory |
Fields inherited from class org.apache.lucene.analysis.Tokenizer |
input |
Constructor Summary |
MockTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory, Reader input, int pattern, boolean lowerCase, int maxTokenLength) |
MockTokenizer(Reader input, int pattern, boolean lowerCase) |
MockTokenizer(Reader input, int pattern, boolean lowerCase, int maxTokenLength) |
Methods inherited from class org.apache.lucene.analysis.Tokenizer |
correctOffset |
Methods inherited from class org.apache.lucene.util.AttributeSource |
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString |
WHITESPACE
public static final int WHITESPACE
- Acts similar to WhitespaceTokenizer.
- See Also:
- Constant Field Values
KEYWORD
public static final int KEYWORD
- Acts similar to KeywordTokenizer.
TODO: Keyword returns an "empty" token for an empty reader...
- See Also:
- Constant Field Values
SIMPLE
public static final int SIMPLE
- Acts like LetterTokenizer.
- See Also:
- Constant Field Values
DEFAULT_MAX_TOKEN_LENGTH
public static final int DEFAULT_MAX_TOKEN_LENGTH
- See Also:
- Constant Field Values
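To make the three pattern constants concrete, here is a rough standalone sketch of the character classes they are described as imitating, written against java.lang.Character only. It is not Lucene code: the class PatternSketch, the numeric values of its constants, and the tokenize helper are hypothetical, and the sketch ignores maxTokenLength and offsets entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: mimics the documented behavior of the three
// patterns using java.lang.Character. This is not Lucene code.
class PatternSketch {
    // Stand-ins for the int pattern constants; the values are assumptions.
    static final int WHITESPACE = 0;
    static final int KEYWORD = 1;
    static final int SIMPLE = 2;

    // Decides whether a code point belongs inside a token for the pattern.
    static boolean isTokenChar(int pattern, int c) {
        switch (pattern) {
            case WHITESPACE: return !Character.isWhitespace(c); // split on whitespace
            case KEYWORD:    return true;                       // whole input is one token
            case SIMPLE:     return Character.isLetter(c);      // runs of letters only
            default: throw new IllegalArgumentException("unknown pattern: " + pattern);
        }
    }

    // Splits text into maximal runs of token characters, optionally lowercased.
    static List<String> tokenize(String text, int pattern, boolean lowerCase) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int c = text.codePointAt(i);
            i += Character.charCount(c);
            if (isTokenChar(pattern, c)) {
                current.appendCodePoint(lowerCase ? Character.toLowerCase(c) : c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

With this sketch, tokenize("Hello World-42", SIMPLE, true) yields [hello, world], the WHITESPACE pattern without lowercasing yields [Hello, World-42], and KEYWORD yields the single token "Hello World-42".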
MockTokenizer
public MockTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory,
Reader input,
int pattern,
boolean lowerCase,
int maxTokenLength)
MockTokenizer
public MockTokenizer(Reader input,
int pattern,
boolean lowerCase,
int maxTokenLength)
MockTokenizer
public MockTokenizer(Reader input,
int pattern,
boolean lowerCase)
incrementToken
public final boolean incrementToken()
throws IOException
- Specified by:
incrementToken
in class org.apache.lucene.analysis.TokenStream
- Throws:
IOException
readCodePoint
protected int readCodePoint()
throws IOException
- Throws:
IOException
isTokenChar
protected boolean isTokenChar(int c)
normalize
protected int normalize(int c)
reset
public void reset()
throws IOException
- Overrides:
reset
in class org.apache.lucene.analysis.TokenStream
- Throws:
IOException
close
public void close()
throws IOException
- Specified by:
close
in interface Closeable
- Overrides:
close
in class org.apache.lucene.analysis.Tokenizer
- Throws:
IOException
reset
public void reset(Reader input)
throws IOException
- Overrides:
reset
in class org.apache.lucene.analysis.Tokenizer
- Throws:
IOException
end
public void end()
throws IOException
- Overrides:
end
in class org.apache.lucene.analysis.TokenStream
- Throws:
IOException
setEnableChecks
public void setEnableChecks(boolean enableChecks)
- Toggle consumer workflow checking: if your test consumes token streams normally, you
should leave this enabled.
Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.