org.apache.lucene.analysis
Class MockTokenizer

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.lucene.analysis.Tokenizer
              extended by org.apache.lucene.analysis.MockTokenizer
All Implemented Interfaces:
Closeable

public class MockTokenizer
extends org.apache.lucene.analysis.Tokenizer

Tokenizer for testing.

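This tokenizer is a replacement for the tokenizers selected by the WHITESPACE, SIMPLE, and KEYWORD patterns (WhitespaceTokenizer, LetterTokenizer, and KeywordTokenizer). If you are writing a component such as a TokenFilter, it's a great idea to test it wrapping this tokenizer instead, for extra checks. This tokenizer checks that its consumer follows the expected tokenstream workflow (see setEnableChecks) and, for convenience, can optionally lowercase the terms it outputs (see the lowerCase constructor argument).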


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
org.apache.lucene.util.AttributeSource.AttributeFactory
 
Field Summary
static int DEFAULT_MAX_TOKEN_LENGTH
           
static int KEYWORD
          Acts similarly to KeywordTokenizer.
static int SIMPLE
          Acts like LetterTokenizer.
static int WHITESPACE
          Acts similarly to WhitespaceTokenizer.
 
Fields inherited from class org.apache.lucene.analysis.Tokenizer
input
 
Constructor Summary
MockTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory, Reader input, int pattern, boolean lowerCase, int maxTokenLength)
           
MockTokenizer(Reader input, int pattern, boolean lowerCase)
           
MockTokenizer(Reader input, int pattern, boolean lowerCase, int maxTokenLength)
           
 
Method Summary
 void close()
           
 void end()
           
 boolean incrementToken()
           
protected  boolean isTokenChar(int c)
           
protected  int normalize(int c)
           
protected  int readCodePoint()
           
 void reset()
           
 void reset(Reader input)
           
 void setEnableChecks(boolean enableChecks)
          Toggle consumer workflow checking: if your test consumes tokenstreams normally you should leave this enabled.
 
Methods inherited from class org.apache.lucene.analysis.Tokenizer
correctOffset
 
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Field Detail

WHITESPACE

public static final int WHITESPACE
Acts similarly to WhitespaceTokenizer.

See Also:
Constant Field Values

KEYWORD

public static final int KEYWORD
Acts similarly to KeywordTokenizer. TODO: Keyword returns an "empty" token for an empty reader...

See Also:
Constant Field Values

SIMPLE

public static final int SIMPLE
Acts like LetterTokenizer.

See Also:
Constant Field Values
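
The three pattern constants above select how input is split into tokens. As a rough, plain-Java illustration of what the named tokenizers do (split on whitespace, keep runs of letters, or emit the whole input as one token), the following sketch may help. It is not MockTokenizer's implementation: the class TokenizerModesSketch, the Mode enum, and the tokenize method are hypothetical names introduced here, and the real int constant values are deliberately not assumed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not Lucene code.
class TokenizerModesSketch {
    // Local stand-ins for the WHITESPACE, KEYWORD and SIMPLE patterns.
    enum Mode { WHITESPACE, KEYWORD, SIMPLE }

    // Decides whether a code point belongs inside a token for the given mode.
    static boolean isTokenChar(Mode mode, int c) {
        switch (mode) {
            case WHITESPACE: return !Character.isWhitespace(c); // break on whitespace
            case SIMPLE:     return Character.isLetter(c);      // keep runs of letters
            default:         return true;                       // KEYWORD: one token
        }
    }

    // Splits text into tokens, optionally lowercasing each code point.
    static List<String> tokenize(String text, Mode mode, boolean lowerCase) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int c = text.codePointAt(i);
            i += Character.charCount(c);
            if (isTokenChar(mode, c)) {
                current.appendCodePoint(lowerCase ? Character.toLowerCase(c) : c);
            } else if (current.length() > 0) {
                // A non-token character ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Hello, World 42";
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " " + tokenize(text, mode, true));
        }
    }
}
```

In this sketch, the WHITESPACE mode keeps punctuation attached to words, while the SIMPLE mode drops punctuation and digits entirely because only letters count as token characters.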

DEFAULT_MAX_TOKEN_LENGTH

public static final int DEFAULT_MAX_TOKEN_LENGTH
See Also:
Constant Field Values

Constructor Detail

MockTokenizer

public MockTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory,
                     Reader input,
                     int pattern,
                     boolean lowerCase,
                     int maxTokenLength)

MockTokenizer

public MockTokenizer(Reader input,
                     int pattern,
                     boolean lowerCase,
                     int maxTokenLength)

MockTokenizer

public MockTokenizer(Reader input,
                     int pattern,
                     boolean lowerCase)

Method Detail

incrementToken

public final boolean incrementToken()
                             throws IOException
Specified by:
incrementToken in class org.apache.lucene.analysis.TokenStream
Throws:
IOException

readCodePoint

protected int readCodePoint()
                     throws IOException
Throws:
IOException
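
The signature above returns an int rather than a char, which suggests it yields whole Unicode code points. One plausible way to read a full code point from a Reader, combining surrogate pairs, is sketched below. This is an assumption for illustration, not the method's actual body: CodePointReaderSketch is a hypothetical class, and the use of PushbackReader and the handling of unpaired surrogates are choices made for this sketch.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

// Hypothetical illustration only; this is not Lucene code.
class CodePointReaderSketch {
    // Returns the next code point from the reader, or -1 at end of input.
    static int readCodePoint(PushbackReader input) throws IOException {
        int ch = input.read();
        if (ch < 0) {
            return -1;
        }
        if (Character.isHighSurrogate((char) ch)) {
            int ch2 = input.read();
            if (ch2 >= 0 && Character.isLowSurrogate((char) ch2)) {
                // Valid surrogate pair: combine into one supplementary code point.
                return Character.toCodePoint((char) ch, (char) ch2);
            }
            if (ch2 >= 0) {
                // Not a low surrogate: push it back so the next call sees it.
                input.unread(ch2);
            }
            // Assumption of this sketch: an unpaired surrogate is returned as-is.
        }
        return ch;
    }

    public static void main(String[] args) throws IOException {
        PushbackReader reader = new PushbackReader(new StringReader("a\uD83D\uDE00b"));
        for (int cp = readCodePoint(reader); cp != -1; cp = readCodePoint(reader)) {
            System.out.println("U+" + Integer.toHexString(cp).toUpperCase());
        }
    }
}
```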

isTokenChar

protected boolean isTokenChar(int c)

normalize

protected int normalize(int c)

reset

public void reset()
           throws IOException
Overrides:
reset in class org.apache.lucene.analysis.TokenStream
Throws:
IOException

close

public void close()
           throws IOException
Specified by:
close in interface Closeable
Overrides:
close in class org.apache.lucene.analysis.Tokenizer
Throws:
IOException

reset

public void reset(Reader input)
           throws IOException
Overrides:
reset in class org.apache.lucene.analysis.Tokenizer
Throws:
IOException

end

public void end()
         throws IOException
Overrides:
end in class org.apache.lucene.analysis.TokenStream
Throws:
IOException

setEnableChecks

public void setEnableChecks(boolean enableChecks)
Toggle consumer workflow checking: if your test consumes tokenstreams normally you should leave this enabled.
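
To make "consumer workflow checking" concrete: a TokenStream consumer is expected to call reset(), then incrementToken() until it returns false, then end(), then close(). The sketch below shows how a small state machine could enforce that order and how a flag like enableChecks could switch the enforcement off. It is a simplified, hypothetical model (the class WorkflowCheckSketch, its State enum, and the exact rules are assumptions), not the checks MockTokenizer actually performs.

```java
// Hypothetical illustration only; this is not Lucene code.
class WorkflowCheckSketch {
    enum State { CREATED, RESET, EXHAUSTED, ENDED, CLOSED }

    private State state = State.CREATED;
    private boolean enableChecks = true;
    private final int tokenCount; // number of fake tokens this stream produces
    private int remaining;

    WorkflowCheckSketch(int tokenCount) {
        this.tokenCount = tokenCount;
    }

    void setEnableChecks(boolean enableChecks) {
        this.enableChecks = enableChecks;
    }

    // Fails fast when checks are enabled and a call arrives out of order.
    private void check(boolean ok, String call) {
        if (enableChecks && !ok) {
            throw new IllegalStateException(call + " called in state " + state);
        }
    }

    void reset() {
        check(state == State.CREATED || state == State.CLOSED, "reset()");
        remaining = tokenCount;
        state = State.RESET;
    }

    boolean incrementToken() {
        check(state == State.RESET, "incrementToken()");
        if (remaining > 0) {
            remaining--;
            return true;
        }
        state = State.EXHAUSTED; // the consumer has seen the final "false"
        return false;
    }

    void end() {
        check(state == State.EXHAUSTED, "end()");
        state = State.ENDED;
    }

    void close() {
        check(state == State.ENDED, "close()");
        state = State.CLOSED;
    }

    public static void main(String[] args) {
        WorkflowCheckSketch stream = new WorkflowCheckSketch(3);
        stream.reset();
        int count = 0;
        while (stream.incrementToken()) {
            count++;
        }
        stream.end();
        stream.close();
        System.out.println("tokens consumed: " + count);
    }
}
```

A test that deliberately consumes a stream abnormally, for example by stopping before incrementToken() returns false, would be the kind of case where disabling the checks makes sense; otherwise leaving them enabled catches consumer bugs early.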



Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.