org.apache.lucene.analysis.miscellaneous
Class CodepointCountFilter

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.lucene.analysis.TokenFilter
              extended by org.apache.lucene.analysis.util.FilteringTokenFilter
                  extended by org.apache.lucene.analysis.miscellaneous.CodepointCountFilter
All Implemented Interfaces:
Closeable

public final class CodepointCountFilter
extends FilteringTokenFilter

Removes words that are too long or too short from the stream.

Note: Length is calculated as the number of Unicode code points, not the number of Java char values (UTF-16 code units).
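
The difference between code points and Java chars matters only for text outside the Basic Multilingual Plane. The following is a minimal, JDK-only sketch of that difference; the class and method names (CodePointLengthDemo, charLength, codePointLength) are hypothetical and not part of Lucene.

```java
public class CodePointLengthDemo {

    /** Number of UTF-16 code units, which is what String.length() reports. */
    static int charLength(String s) {
        return s.length();
    }

    /** Number of Unicode code points, the measure this filter is documented to use. */
    static int codePointLength(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP,
        // so Java stores it as a surrogate pair of two chars.
        String clef = "\uD834\uDD1E";
        System.out.println(charLength(clef));      // prints 2
        System.out.println(codePointLength(clef)); // prints 1
    }
}
```

For plain ASCII or other BMP-only text the two measures agree, so the distinction only changes which tokens are kept when supplementary characters are present.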


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
 
Field Summary
 
Fields inherited from class org.apache.lucene.analysis.util.FilteringTokenFilter
version
 
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
 
Constructor Summary
CodepointCountFilter(Version version, TokenStream in, int min, int max)
          Create a new CodepointCountFilter.
 
Method Summary
 boolean accept()
          Override this method and return true if the current input token should be returned by FilteringTokenFilter.incrementToken().
 
Methods inherited from class org.apache.lucene.analysis.util.FilteringTokenFilter
end, getEnablePositionIncrements, incrementToken, reset, setEnablePositionIncrements
 
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close
 
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

CodepointCountFilter

public CodepointCountFilter(Version version,
                            TokenStream in,
                            int min,
                            int max)
Create a new CodepointCountFilter. This will filter out tokens whose CharTermAttribute is either too short (Character.codePointCount(char[], int, int) < min) or too long (Character.codePointCount(char[], int, int) > max). Tokens whose code point count lies between min and max, inclusive, are kept.

Parameters:
version - the Lucene match version
in - the TokenStream to consume
min - the minimum length, in Unicode code points (inclusive)
max - the maximum length, in Unicode code points (inclusive)
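
The length test described above can be sketched with the JDK alone. This is an illustration of the documented condition, not Lucene's implementation: the class CodePointRangeSketch and its accept and filter helpers are hypothetical, and a plain char[] stands in for the term buffer that CharTermAttribute would supply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CodePointRangeSketch {

    /**
     * Returns true when the first {@code length} chars of {@code buffer}
     * hold between {@code min} and {@code max} code points, inclusive.
     */
    static boolean accept(char[] buffer, int length, int min, int max) {
        int count = Character.codePointCount(buffer, 0, length);
        return count >= min && count <= max;
    }

    /** Keeps only the tokens that pass the code point length test. */
    static List<String> filter(List<String> tokens, int min, int max) {
        List<String> kept = new ArrayList<String>();
        for (String token : tokens) {
            char[] buffer = token.toCharArray();
            if (accept(buffer, buffer.length, min, max)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The fourth token is two G clefs (U+1D11E): 4 chars but 2 code points.
        List<String> tokens = Arrays.asList(
                "a", "to", "lucene", "\uD834\uDD1E\uD834\uDD1E", "tokenization");
        // With min = 2 and max = 3, "a", "lucene" and "tokenization" are dropped.
        // The clef token is kept, although a char-based length of 4 would reject it.
        System.out.println(filter(tokens, 2, 3).size()); // prints 2
    }
}
```

In actual use the filter is constructed with the four arguments listed above and wraps an existing TokenStream; the sketch isolates only the length comparison.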
Method Detail

accept

public boolean accept()
Description copied from class: FilteringTokenFilter
Override this method and return true if the current input token should be returned by FilteringTokenFilter.incrementToken().

Specified by:
accept in class FilteringTokenFilter


Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.