org.apache.lucene.analysis
Class MockTokenizer

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.lucene.analysis.Tokenizer
              extended by org.apache.lucene.analysis.MockTokenizer
All Implemented Interfaces:
Closeable

public class MockTokenizer
extends Tokenizer

Tokenizer for testing.

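This tokenizer is a replacement for the WHITESPACE, SIMPLE, and KEYWORD tokenizers. If you are writing a component such as a TokenFilter, it is a good idea to test it by wrapping this tokenizer instead, because it performs extra checks. This tokenizer has the following behavior: an internal state machine checks that consumers follow the expected workflow (this checking can be toggled with setEnableChecks(boolean)), and, for convenience, it can optionally lowercase the terms it outputs (the lowerCase constructor argument).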


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory
 
Field Summary
static int DEFAULT_MAX_TOKEN_LENGTH
           
static CharacterRunAutomaton KEYWORD
          Acts similarly to KeywordTokenizer.
static CharacterRunAutomaton SIMPLE
          Acts like LetterTokenizer.
static CharacterRunAutomaton WHITESPACE
          Acts similarly to WhitespaceTokenizer.
 
Fields inherited from class org.apache.lucene.analysis.Tokenizer
input
 
Constructor Summary
MockTokenizer(AttributeSource.AttributeFactory factory, Reader input, CharacterRunAutomaton runAutomaton, boolean lowerCase, int maxTokenLength)
           
MockTokenizer(Reader input)
          Calls MockTokenizer(Reader, WHITESPACE, true)
MockTokenizer(Reader input, CharacterRunAutomaton runAutomaton, boolean lowerCase)
           
MockTokenizer(Reader input, CharacterRunAutomaton runAutomaton, boolean lowerCase, int maxTokenLength)
           
 
Method Summary
 void close()
           
 void end()
           
 boolean incrementToken()
           
protected  boolean isTokenChar(int c)
           
protected  int normalize(int c)
           
protected  int readChar()
           
protected  int readCodePoint()
           
 void reset()
           
 void setEnableChecks(boolean enableChecks)
          Toggles consumer workflow checking. If your test consumes token streams normally, leave this enabled.
 
Methods inherited from class org.apache.lucene.analysis.Tokenizer
correctOffset, setReader
 
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

WHITESPACE

public static final CharacterRunAutomaton WHITESPACE
Acts similarly to WhitespaceTokenizer.


KEYWORD

public static final CharacterRunAutomaton KEYWORD
Acts similarly to KeywordTokenizer. TODO: Keyword returns an "empty" token for an empty reader...


SIMPLE

public static final CharacterRunAutomaton SIMPLE
Acts like LetterTokenizer.
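
To make the difference between the three modes concrete, here is a minimal, standard-library-only sketch. It does not use Lucene: the three IntPredicate constants and the tokenize helper are hypothetical stand-ins for the CharacterRunAutomaton fields, and the exact character classes accepted by the real automata may differ. The lowerCase flag mirrors the constructor argument of the same name; MockTokenizer(Reader) is documented as using WHITESPACE with lowerCase set to true.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class TokenizerModesSketch {
    // Hypothetical stand-ins for the three modes (assumption: NOT Lucene's automata).
    public static final IntPredicate WHITESPACE = c -> !Character.isWhitespace(c); // runs of non-whitespace
    public static final IntPredicate SIMPLE = Character::isLetter;                 // runs of letters
    public static final IntPredicate KEYWORD = c -> true;                          // whole input as one token

    // Splits the input into maximal runs of code points accepted by isTokenChar,
    // optionally lowercasing each accepted code point.
    public static List<String> tokenize(String input, IntPredicate isTokenChar, boolean lowerCase) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); ) {
            int cp = input.codePointAt(i);
            i += Character.charCount(cp);
            if (isTokenChar.test(cp)) {
                current.appendCodePoint(lowerCase ? Character.toLowerCase(cp) : cp);
            } else if (current.length() > 0) {
                // A rejected code point ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Hello, World 42";
        System.out.println(tokenize(text, WHITESPACE, true)); // three tokens: "hello,", "world", "42"
        System.out.println(tokenize(text, SIMPLE, true));     // two tokens: "hello", "world"
        System.out.println(tokenize(text, KEYWORD, false));   // one token: the unchanged input
    }
}
```

In this sketch an empty input produces no tokens in any mode, including the keyword-like one.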


DEFAULT_MAX_TOKEN_LENGTH

public static final int DEFAULT_MAX_TOKEN_LENGTH
See Also:
Constant Field Values

Constructor Detail

MockTokenizer

public MockTokenizer(AttributeSource.AttributeFactory factory,
                     Reader input,
                     CharacterRunAutomaton runAutomaton,
                     boolean lowerCase,
                     int maxTokenLength)

MockTokenizer

public MockTokenizer(Reader input,
                     CharacterRunAutomaton runAutomaton,
                     boolean lowerCase,
                     int maxTokenLength)

MockTokenizer

public MockTokenizer(Reader input,
                     CharacterRunAutomaton runAutomaton,
                     boolean lowerCase)

MockTokenizer

public MockTokenizer(Reader input)
Calls MockTokenizer(Reader, WHITESPACE, true)

Method Detail

incrementToken

public final boolean incrementToken()
                             throws IOException
Specified by:
incrementToken in class TokenStream
Throws:
IOException

readCodePoint

protected int readCodePoint()
                     throws IOException
Throws:
IOException

readChar

protected int readChar()
                throws IOException
Throws:
IOException

isTokenChar

protected boolean isTokenChar(int c)

normalize

protected int normalize(int c)

reset

public void reset()
           throws IOException
Overrides:
reset in class TokenStream
Throws:
IOException

close

public void close()
           throws IOException
Specified by:
close in interface Closeable
Overrides:
close in class Tokenizer
Throws:
IOException

end

public void end()
         throws IOException
Overrides:
end in class TokenStream
Throws:
IOException

setEnableChecks

public void setEnableChecks(boolean enableChecks)
Toggles consumer workflow checking. If your test consumes token streams normally, leave this enabled.
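
As an illustration of what consumer workflow checking means, the following standard-library-only sketch models a stream that verifies its methods are called in the usual TokenStream order: reset(), incrementToken() until it returns false, end(), then close(). The CheckedStream class, its State enum, and the violates helper are hypothetical and are not MockTokenizer's implementation; the real class may track different states and report violations differently.

```java
public class WorkflowCheckSketch {
    // Hypothetical lifecycle states (assumption: simplified compared to the real class).
    public enum State { CREATED, RESET, EXHAUSTED, ENDED, CLOSED }

    // Hypothetical stand-in for a tokenizer that checks its consumer.
    public static class CheckedStream {
        private final String[] tokens;
        private final boolean enableChecks;
        private State state = State.CREATED;
        private int next = 0;

        public CheckedStream(boolean enableChecks, String... tokens) {
            this.enableChecks = enableChecks;
            this.tokens = tokens;
        }

        // Throws only when checking is enabled and the call is out of order.
        private void require(boolean ok, String call) {
            if (enableChecks && !ok) {
                throw new IllegalStateException(call + " called in state " + state);
            }
        }

        public void reset() {
            require(state == State.CREATED || state == State.CLOSED, "reset()");
            state = State.RESET;
            next = 0;
        }

        public boolean incrementToken() {
            require(state == State.RESET, "incrementToken()");
            if (next < tokens.length) {
                next++;
                return true;
            }
            state = State.EXHAUSTED; // the consumer has now seen 'false'
            return false;
        }

        public void end() {
            require(state == State.EXHAUSTED, "end()");
            state = State.ENDED;
        }

        public void close() {
            require(state == State.ENDED || state == State.CLOSED, "close()");
            state = State.CLOSED;
        }
    }

    // Returns true if running the action trips a workflow check.
    public static boolean violates(Runnable action) {
        try {
            action.run();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A consumer that stops after the first token and then calls end().
        CheckedStream strict = new CheckedStream(true, "a", "b");
        strict.reset();
        strict.incrementToken();
        System.out.println(violates(strict::end));  // prints true

        // The same consumer with checking disabled.
        CheckedStream lenient = new CheckedStream(false, "a", "b");
        lenient.reset();
        lenient.incrementToken();
        System.out.println(violates(lenient::end)); // prints false
    }
}
```

A test that deliberately stops consuming early, as in the second half of main, is the kind of case where disabling the checks would be reasonable; a test that consumes the stream to exhaustion benefits from leaving them on.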



Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.