NGramTokenFilter (Lucene 4.5.0 API)

org.apache.lucene.analysis.ngram
Class NGramTokenFilter

java.lang.Object
  org.apache.lucene.util.AttributeSource
      org.apache.lucene.analysis.TokenStream
          org.apache.lucene.analysis.TokenFilter
              org.apache.lucene.analysis.ngram.NGramTokenFilter

All Implemented Interfaces: Closeable

public final class NGramTokenFilter
extends TokenFilter

Tokenizes the input into n-grams of the given size(s).

You must specify the required Version compatibility when creating an NGramTokenFilter. As of Lucene 4.4, this token filter:

handles supplementary characters correctly,
emits all n-grams for the same token at the same position,
does not modify offsets,
sorts n-grams by their offset in the original token first, then increasing length (meaning that "abc" will give "a", "ab", "abc", "b", "bc", "c").
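The ordering rule in the last item, together with the code-point-aware handling of supplementary characters, can be sketched in plain Java. This is an illustration of the documented behavior only, not Lucene's implementation: the class name `NGramOrderSketch` and the method `ngrams` are hypothetical, and no Lucene classes are used.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: reproduces the documented emission order in plain Java.
// NGramOrderSketch and ngrams(...) are hypothetical names, not part of Lucene.
final class NGramOrderSketch {

    // Returns every n-gram of `token` whose length, counted in Unicode code points,
    // lies in [minGram, maxGram]. Results are ordered by start offset first,
    // then by increasing length.
    static List<String> ngrams(String token, int minGram, int maxGram) {
        if (minGram < 1 || maxGram < minGram) {
            throw new IllegalArgumentException("require 1 <= minGram <= maxGram");
        }
        List<String> out = new ArrayList<>();
        int codePoints = token.codePointCount(0, token.length());
        for (int start = 0; start < codePoints; start++) {
            // Convert the code point index into a char index so that
            // surrogate pairs are never split.
            int from = token.offsetByCodePoints(0, start);
            for (int len = minGram; len <= maxGram && start + len <= codePoints; len++) {
                int to = token.offsetByCodePoints(from, len);
                out.add(token.substring(from, to));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams("abc", 1, 3)); // prints [a, ab, abc, b, bc, c]
    }
}
```

Counting lengths in code points rather than `char` units is what keeps a supplementary character (a surrogate pair in UTF-16) together as a single gram position in this sketch.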

You can make this filter use the old behavior by providing a version < Version.LUCENE_44 in the constructor, but this is not recommended: it produces broken TokenStreams that cause highlighting bugs.

If you were using this TokenFilter to perform partial highlighting, that no longer works, because this filter does not update offsets. Modify your analysis chain to use NGramTokenizer instead, and potentially override NGramTokenizer.isTokenChar(int) to perform pre-tokenization.

Nested Class Summary

Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
`AttributeSource.AttributeFactory, AttributeSource.State`

Field Summary
`static int`	`DEFAULT_MAX_NGRAM_SIZE`
`static int`	`DEFAULT_MIN_NGRAM_SIZE`

Fields inherited from class org.apache.lucene.analysis.TokenFilter
`input`

Constructor Summary
`NGramTokenFilter(Version version, TokenStream input)` Creates NGramTokenFilter with default min and max n-grams.
`NGramTokenFilter(Version version, TokenStream input, int minGram, int maxGram)` Creates NGramTokenFilter with given min and max n-grams.

Method Summary
`boolean`	`incrementToken()` Advances the stream to the next token; returns false at end of stream.
`void`	`reset()`

Methods inherited from class org.apache.lucene.analysis.TokenFilter
`close, end`

Methods inherited from class org.apache.lucene.util.AttributeSource
`addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState`

Methods inherited from class java.lang.Object
`clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait`

Field Detail

DEFAULT_MIN_NGRAM_SIZE

public static final int DEFAULT_MIN_NGRAM_SIZE

See Also: Constant Field Values

DEFAULT_MAX_NGRAM_SIZE

public static final int DEFAULT_MAX_NGRAM_SIZE

See Also: Constant Field Values

Constructor Detail