Class LetterTokenizer

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.lucene.analysis.Tokenizer
              extended by org.apache.lucene.analysis.CharTokenizer
                  extended by org.apache.lucene.analysis.LetterTokenizer
Direct Known Subclasses:

public class LetterTokenizer
extends CharTokenizer

A LetterTokenizer is a tokenizer that divides text at non-letters. That is, it defines tokens as maximal strings of adjacent letters, as determined by the java.lang.Character.isLetter() predicate. Note: this works reasonably well for most European languages, but poorly for some Asian languages, where words are not separated by spaces and a whole run of text can therefore come out as a single token.
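
The splitting rule described above can be illustrated without Lucene on the classpath. The following is a minimal sketch in plain Java, not the library's implementation: the class name LetterSplitSketch and the method letterTokens are hypothetical, and the sketch ignores offsets, attributes, and any token-length limit the real tokenizer may apply. It only shows how "maximal strings of adjacent letters" falls out of Character.isLetter(char).

```java
import java.util.ArrayList;
import java.util.List;

public class LetterSplitSketch {

    /**
     * Hypothetical helper: returns the maximal runs of characters for which
     * Character.isLetter(char) is true. Works on UTF-16 chars, mirroring the
     * char-based isTokenChar(char) signature documented on this page.
     */
    public static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);              // still inside a token
            } else if (current.length() > 0) {
                tokens.add(current.toString()); // a non-letter ends the token
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());     // flush the trailing token
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Apostrophes, hyphens, digits and punctuation all act as separators.
        System.out.println(letterTokens("Don't stop-2nd try!"));
        // prints [Don, t, stop, nd, try]

        // Three CJK ideographs (written as escapes to stay encoding-neutral):
        // every character is a letter, so the run is never divided.
        System.out.println(letterTokens("\u6771\u4EAC\u90FD").size());
        // prints 1
    }
}
```

The second call shows the limitation noted above: with no non-letter characters between words, the rule has nothing to split on.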

Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
Field Summary
Fields inherited from class org.apache.lucene.analysis.Tokenizer
Constructor Summary
LetterTokenizer(AttributeSource.AttributeFactory factory, Reader in)
          Construct a new LetterTokenizer using a given AttributeSource.AttributeFactory.
LetterTokenizer(AttributeSource source, Reader in)
          Construct a new LetterTokenizer using a given AttributeSource.
LetterTokenizer(Reader in)
          Construct a new LetterTokenizer.
Method Summary
protected  boolean isTokenChar(char c)
          Collects only characters which satisfy Character.isLetter(char).
Methods inherited from class org.apache.lucene.analysis.CharTokenizer
end, incrementToken, next, next, normalize, reset
Methods inherited from class org.apache.lucene.analysis.Tokenizer
close, correctOffset
Methods inherited from class org.apache.lucene.analysis.TokenStream
getOnlyUseNewAPI, reset, setOnlyUseNewAPI
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, restoreState, toString
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Constructor Detail


public LetterTokenizer(Reader in)
Construct a new LetterTokenizer.


public LetterTokenizer(AttributeSource source,
                       Reader in)
Construct a new LetterTokenizer using a given AttributeSource.


public LetterTokenizer(AttributeSource.AttributeFactory factory,
                       Reader in)
Construct a new LetterTokenizer using a given AttributeSource.AttributeFactory.

Method Detail


protected boolean isTokenChar(char c)
Collects only characters which satisfy Character.isLetter(char).

Specified by:
isTokenChar in class CharTokenizer
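
Because isTokenChar is specified by CharTokenizer and overridden here, the relationship resembles a template method: the parent drives the scan and the subclass only decides which characters belong to a token. The sketch below models that relationship with a hypothetical stand-in class, CharSplitter; it is not Lucene code, and LetterOrDigitSplitter is an invented variant for comparison. The normalize hook is named after the inherited method listed above, and its identity default here is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class IsTokenCharSketch {

    /** Hypothetical stand-in for the role CharTokenizer plays; not Lucene code. */
    public abstract static class CharSplitter {

        /** Hook: return true if c should be kept as part of a token. */
        protected abstract boolean isTokenChar(char c);

        /** Hook: per-character normalization; identity in this sketch. */
        protected char normalize(char c) {
            return c;
        }

        /** Scans the text once, delegating the keep/split decision to isTokenChar. */
        public final List<String> split(String text) {
            List<String> tokens = new ArrayList<>();
            StringBuilder current = new StringBuilder();
            for (int i = 0; i < text.length(); i++) {
                char c = text.charAt(i);
                if (isTokenChar(c)) {
                    current.append(normalize(c));
                } else if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            }
            if (current.length() > 0) {
                tokens.add(current.toString());
            }
            return tokens;
        }
    }

    /** Same predicate as the isTokenChar documented on this page. */
    public static class LetterSplitter extends CharSplitter {
        @Override
        protected boolean isTokenChar(char c) {
            return Character.isLetter(c);
        }
    }

    /** Invented variant for contrast: digits are kept as token characters too. */
    public static class LetterOrDigitSplitter extends CharSplitter {
        @Override
        protected boolean isTokenChar(char c) {
            return Character.isLetterOrDigit(c);
        }
    }

    public static void main(String[] args) {
        String text = "route 66 west";
        System.out.println(new LetterSplitter().split(text));        // prints [route, west]
        System.out.println(new LetterOrDigitSplitter().split(text)); // prints [route, 66, west]
    }
}
```

Only the predicate differs between the two subclasses; the scanning loop is shared, which is why overriding a single protected method is enough to change what counts as a token.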

Copyright © 2000-2010 Apache Software Foundation. All Rights Reserved.