Class CodepointCountFilter

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.lucene.analysis.TokenFilter
              extended by org.apache.lucene.analysis.util.FilteringTokenFilter
                  extended by org.apache.lucene.analysis.miscellaneous.CodepointCountFilter
All Implemented Interfaces:

public final class CodepointCountFilter
extends FilteringTokenFilter

Removes words that are too long or too short from the stream.

Note: Length is calculated as the number of Unicode codepoints.
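This note matters for characters outside the Basic Multilingual Plane, which Java stores as two chars (a surrogate pair). The following plain-JDK sketch uses no Lucene code, and its class name is hypothetical. It contrasts String.length(), which counts UTF-16 code units, with a code-point count.

```java
// Hypothetical demo class: plain JDK only, no Lucene dependencies.
final class CodepointLengthDemo {

    // Length in UTF-16 code units (what String.length() reports).
    static int charLength(String s) {
        return s.length();
    }

    // Length in Unicode code points: a surrogate pair counts as one.
    static int codepointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) is outside the BMP and is stored
        // as the surrogate pair \uD834\uDD1E, i.e. two chars.
        String word = "a\uD834\uDD1Eb";
        System.out.println(charLength(word));      // 4
        System.out.println(codepointLength(word)); // 3
    }
}
```

With min = max = 3, this word would pass a code-point check even though it is four chars long.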

Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
Field Summary
Fields inherited from class org.apache.lucene.analysis.util.FilteringTokenFilter
Fields inherited from class org.apache.lucene.analysis.TokenFilter
Constructor Summary
CodepointCountFilter(Version version, TokenStream in, int min, int max)
          Create a new CodepointCountFilter.
Method Summary
 boolean accept()
          Override this method and return whether the current input token should be returned by FilteringTokenFilter.incrementToken().
Methods inherited from class org.apache.lucene.analysis.util.FilteringTokenFilter
end, getEnablePositionIncrements, incrementToken, reset, setEnablePositionIncrements
Methods inherited from class org.apache.lucene.analysis.TokenFilter
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

Constructor Detail


public CodepointCountFilter(Version version,
                            TokenStream in,
                            int min,
                            int max)
Create a new CodepointCountFilter. This will filter out tokens whose CharTermAttribute is either too short (Character.codePointCount(char[], int, int) < min) or too long (Character.codePointCount(char[], int, int) > max).

Parameters:
version - the Lucene match version
in - the TokenStream to consume
min - the minimum length, in Unicode code points (inclusive)
max - the maximum length, in Unicode code points (inclusive)
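The length rule in the constructor description can be sketched as a standalone check. This is an illustration, not Lucene's source. The class CodepointRangeCheck is hypothetical. It takes a raw char[] plus the number of valid chars, because a term buffer may be larger than the term it holds. In Lucene the two values would presumably come from CharTermAttribute's buffer() and length().

```java
// Hypothetical sketch of the documented min/max rule; not Lucene code.
final class CodepointRangeCheck {
    private final int min;
    private final int max;

    CodepointRangeCheck(int min, int max) {
        this.min = min;
        this.max = max;
    }

    // Reject when count < min or count > max; accept everything in between.
    // Only the first `length` chars of `buffer` are considered.
    boolean accept(char[] buffer, int length) {
        int count = Character.codePointCount(buffer, 0, length);
        return count >= min && count <= max;
    }

    public static void main(String[] args) {
        CodepointRangeCheck check = new CodepointRangeCheck(2, 3);
        char[] buf = "a\uD834\uDD1Eb".toCharArray();       // 4 chars, 3 code points
        System.out.println(check.accept(buf, buf.length)); // true
        System.out.println(check.accept(buf, 1));          // "a" only: false
    }
}
```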
Method Detail


public boolean accept()
Description copied from class: FilteringTokenFilter
Override this method and return whether the current input token should be returned by FilteringTokenFilter.incrementToken().

Specified by:
accept in class FilteringTokenFilter
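A subclass only makes a per-token decision in accept(). The base class's incrementToken() drives the loop. The following plain-Java sketch shows that contract under stated assumptions: an Iterator<String> stands in for the wrapped TokenStream, a Predicate plays the role of accept(), and every name is hypothetical. The real FilteringTokenFilter also adjusts position increments for skipped tokens (see setEnablePositionIncrements). This sketch leaves that out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical analogue of the FilteringTokenFilter pull loop; not Lucene code.
final class FilteringLoopSketch {
    private final Iterator<String> input;   // stands in for the wrapped TokenStream
    private final Predicate<String> accept; // stands in for accept()
    private String current;

    FilteringLoopSketch(Iterator<String> input, Predicate<String> accept) {
        this.input = input;
        this.accept = accept;
    }

    // Analogue of incrementToken(): skip rejected tokens, stop at the first
    // accepted one, and return false once the input is exhausted.
    boolean incrementToken() {
        while (input.hasNext()) {
            current = input.next();
            if (accept.test(current)) {
                return true;
            }
        }
        return false;
    }

    String term() {
        return current;
    }

    // Keeps tokens whose code point count lies in [min, max].
    static List<String> filterByCodepoints(List<String> tokens, int min, int max) {
        FilteringLoopSketch filter = new FilteringLoopSketch(tokens.iterator(), t -> {
            int n = t.codePointCount(0, t.length());
            return n >= min && n <= max;
        });
        List<String> out = new ArrayList<>();
        while (filter.incrementToken()) {
            out.add(filter.term());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("a", "bb", "ccc", "dddd");
        System.out.println(filterByCodepoints(tokens, 2, 3)); // [bb, ccc]
    }
}
```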

Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.