Class ShingleFilter
- java.lang.Object
  - org.apache.lucene.util.AttributeSource
    - org.apache.lucene.analysis.TokenStream
      - org.apache.lucene.analysis.TokenFilter
        - org.apache.lucene.analysis.shingle.ShingleFilter

All Implemented Interfaces: Closeable, AutoCloseable, Unwrappable<TokenStream>
 
public final class ShingleFilter extends TokenFilter

A ShingleFilter constructs shingles (token n-grams) from a token stream. In other words, it combines adjacent tokens into a single token.

For example, the sentence "please divide this sentence into shingles" might be tokenized into the shingles "please divide", "divide this", "this sentence", "sentence into", and "into shingles".

This filter handles position increments > 1 by inserting filler tokens (tokens with termtext "_"). It does not handle a position increment of 0.
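To make the idea of a shingle concrete, the following plain-Java sketch joins runs of adjacent tokens. It does not call Lucene: the class name ShingleSketch and the method shingles are hypothetical, and the single-space separator is an assumption taken from the example sentence above.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of shingling; this is not Lucene's implementation.
final class ShingleSketch {

    // Returns every run of minSize..maxSize adjacent tokens, joined by separator.
    static List<String> shingles(List<String> tokens, int minSize, int maxSize, String separator) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                out.add(String.join(separator, tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("please", "divide", "this", "sentence", "into", "shingles");
        // Shingle size 2 (min and max both 2), as in the example sentence.
        System.out.println(shingles(tokens, 2, 2, " "));
        // prints [please divide, divide this, this sentence, sentence into, into shingles]
    }
}
```

Raising maxSize to 3 in the same call would add the three-token runs ("please divide this", and so on) alongside the two-token ones.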

Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource: AttributeSource.State
 
Field Summary
- static String DEFAULT_FILLER_TOKEN: filler token for when positionIncrement is more than 1
- static int DEFAULT_MAX_SHINGLE_SIZE: default maximum shingle size is 2
- static int DEFAULT_MIN_SHINGLE_SIZE: default minimum shingle size is 2
- static String DEFAULT_TOKEN_SEPARATOR: the default string to use when joining adjacent tokens to form a shingle
- static String DEFAULT_TOKEN_TYPE: default token type attribute value is "shingle"

Fields inherited from class org.apache.lucene.analysis.TokenFilter: input
Fields inherited from class org.apache.lucene.analysis.TokenStream: DEFAULT_TOKEN_ATTRIBUTE_FACTORY
 
Constructor Summary
- ShingleFilter(TokenStream input): constructs a ShingleFilter with the default shingle size of 2
- ShingleFilter(TokenStream input, int maxShingleSize): constructs a ShingleFilter with the specified maximum shingle size from the TokenStream input
- ShingleFilter(TokenStream input, int minShingleSize, int maxShingleSize): constructs a ShingleFilter with the specified minimum and maximum shingle sizes from the TokenStream input
- ShingleFilter(TokenStream input, String tokenType): constructs a ShingleFilter with the specified token type for shingle tokens and the default shingle size of 2

Method Summary
- void end()
- boolean incrementToken()
- void reset()
- void setFillerToken(String fillerToken): sets the string to insert for each position at which there is no token (i.e., when position increment is greater than one)
- void setMaxShingleSize(int maxShingleSize): sets the max shingle size (default: 2)
- void setMinShingleSize(int minShingleSize): sets the min shingle size (default: 2)
- void setOutputUnigrams(boolean outputUnigrams): sets whether the output stream contains the input tokens (unigrams) as well as shingles (default: true)
- void setOutputUnigramsIfNoShingles(boolean outputUnigramsIfNoShingles): sets whether to override the behavior of outputUnigrams==false when no shingles are available because the input stream has fewer than minShingleSize tokens (default: false)
- void setTokenSeparator(String tokenSeparator): sets the string to use when joining adjacent tokens to form a shingle
- void setTokenType(String tokenType): sets the type of the shingle tokens produced by this filter

Methods inherited from class org.apache.lucene.analysis.TokenFilter: close, unwrap
Methods inherited from class org.apache.lucene.util.AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
 
Field Detail

DEFAULT_FILLER_TOKEN
public static final String DEFAULT_FILLER_TOKEN
Filler token for when positionIncrement is more than 1.
See Also: Constant Field Values

DEFAULT_MAX_SHINGLE_SIZE
public static final int DEFAULT_MAX_SHINGLE_SIZE
Default maximum shingle size is 2.
See Also: Constant Field Values

DEFAULT_MIN_SHINGLE_SIZE
public static final int DEFAULT_MIN_SHINGLE_SIZE
Default minimum shingle size is 2.
See Also: Constant Field Values

DEFAULT_TOKEN_TYPE
public static final String DEFAULT_TOKEN_TYPE
Default token type attribute value is "shingle".
See Also: Constant Field Values

DEFAULT_TOKEN_SEPARATOR
public static final String DEFAULT_TOKEN_SEPARATOR
The default string to use when joining adjacent tokens to form a shingle.
See Also: Constant Field Values
 
 
Constructor Detail

ShingleFilter
public ShingleFilter(TokenStream input, int minShingleSize, int maxShingleSize)
Constructs a ShingleFilter with the specified minimum and maximum shingle sizes from the TokenStream input.
Parameters:
- input - input stream
- minShingleSize - minimum shingle size produced by the filter
- maxShingleSize - maximum shingle size produced by the filter

ShingleFilter
public ShingleFilter(TokenStream input, int maxShingleSize)
Constructs a ShingleFilter with the specified maximum shingle size from the TokenStream input.
Parameters:
- input - input stream
- maxShingleSize - maximum shingle size produced by the filter

ShingleFilter
public ShingleFilter(TokenStream input)
Constructs a ShingleFilter with the default shingle size of 2.
Parameters:
- input - input stream

ShingleFilter
public ShingleFilter(TokenStream input, String tokenType)
Constructs a ShingleFilter with the specified token type for shingle tokens and the default shingle size of 2.
Parameters:
- input - input stream
- tokenType - token type for shingle tokens
 
 
Method Detail

setTokenType
public void setTokenType(String tokenType)
Sets the type of the shingle tokens produced by this filter (default: "shingle").
Parameters:
- tokenType - token type for shingle tokens
 
setOutputUnigrams
public void setOutputUnigrams(boolean outputUnigrams)
Sets whether the output stream contains the input tokens (unigrams) as well as shingles (default: true).
Parameters:
- outputUnigrams - whether the output stream shall contain the input tokens (unigrams)
 
setOutputUnigramsIfNoShingles
public void setOutputUnigramsIfNoShingles(boolean outputUnigramsIfNoShingles)
Sets whether to override the behavior of outputUnigrams==false when no shingles are available because the input stream has fewer than minShingleSize tokens (default: false).
Note that if outputUnigrams==true, unigrams are always output, regardless of whether any shingles are available.
Parameters:
- outputUnigramsIfNoShingles - whether to output a single unigram when no shingles are available
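The interaction between the two unigram flags can be modelled in plain Java. This is only a sketch of the behavior described above and does not call Lucene: UnigramFlagsSketch and emit are hypothetical names, the output ordering (a unigram followed by the shingles that start at it) is an assumption, and so is the choice to emit only the first token when several tokens are present but still too few to form a shingle.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of outputUnigrams and outputUnigramsIfNoShingles; not Lucene's code.
final class UnigramFlagsSketch {

    static List<String> emit(List<String> tokens, int minSize, int maxSize,
                             boolean outputUnigrams, boolean outputUnigramsIfNoShingles) {
        List<String> out = new ArrayList<>();
        boolean anyShingle = false;
        for (int start = 0; start < tokens.size(); start++) {
            if (outputUnigrams) {
                out.add(tokens.get(start)); // unigram first, then shingles starting here
            }
            for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
                anyShingle = true;
            }
        }
        // Fallback: matters only when unigrams are off and nothing could be shingled.
        // Emitting just the first token is an assumption of this sketch.
        if (!outputUnigrams && outputUnigramsIfNoShingles && !anyShingle && !tokens.isEmpty()) {
            out.add(tokens.get(0));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> one = List.of("please");
        System.out.println(emit(one, 2, 2, false, false)); // prints []
        System.out.println(emit(one, 2, 2, false, true));  // prints [please]
        System.out.println(emit(one, 2, 2, true, false));  // prints [please]
    }
}
```

With a single input token and a minimum shingle size of 2, the fallback flag is the only difference between an empty stream and a stream holding that one token.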
 
setMaxShingleSize
public void setMaxShingleSize(int maxShingleSize)
Sets the max shingle size (default: 2).
Parameters:
- maxShingleSize - max size of output shingles
 
setMinShingleSize
public void setMinShingleSize(int minShingleSize)
Sets the min shingle size (default: 2).
This method requires that the passed-in minShingleSize is not greater than maxShingleSize, so set maxShingleSize before calling this method. The unigram output option is independent of the min shingle size.
Parameters:
- minShingleSize - min size of output shingles
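The ordering requirement is easy to trip over when widening both bounds from their defaults. The sketch below uses a hypothetical stand-in class, ShingleSizesSketch, that models only the documented "min not greater than max" check. Throwing IllegalArgumentException is an assumption of the sketch, and setMaxShingleSize is left unchecked because no constraint is stated for it here.

```java
// Hypothetical stand-in for the two size setters; not Lucene's implementation.
final class ShingleSizesSketch {
    private int minShingleSize = 2; // documented default
    private int maxShingleSize = 2; // documented default

    void setMaxShingleSize(int maxShingleSize) {
        this.maxShingleSize = maxShingleSize;
    }

    void setMinShingleSize(int minShingleSize) {
        // Documented requirement: min must not exceed the current max.
        if (minShingleSize > maxShingleSize) {
            throw new IllegalArgumentException("min shingle size must be <= max shingle size");
        }
        this.minShingleSize = minShingleSize;
    }

    int min() { return minShingleSize; }
    int max() { return maxShingleSize; }

    public static void main(String[] args) {
        ShingleSizesSketch sizes = new ShingleSizesSketch();
        try {
            sizes.setMinShingleSize(3); // max is still 2, so this is rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        sizes.setMaxShingleSize(4); // raise the max first...
        sizes.setMinShingleSize(3); // ...then the min is accepted
        System.out.println(sizes.min() + ".." + sizes.max()); // prints 3..4
    }
}
```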
 
setTokenSeparator
public void setTokenSeparator(String tokenSeparator)
Sets the string to use when joining adjacent tokens to form a shingle.
Parameters:
- tokenSeparator - used to separate input stream tokens in output shingles
 
setFillerToken
public void setFillerToken(String fillerToken)
Sets the string to insert for each position at which there is no token (i.e., when position increment is greater than one).
Parameters:
- fillerToken - string to insert at each position where there is no token
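A gap typically appears when an upstream filter has removed a token, leaving the next token with a position increment of 2. The plain-Java sketch below models how a filler string occupies the skipped position before two-token shingles are formed. It does not call Lucene: FillerTokenSketch, fillGaps and bigrams are hypothetical names, and how gaps wider than one position are treated is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of filler tokens; not Lucene's implementation.
final class FillerTokenSketch {

    // Expands terms to one entry per position, inserting filler at skipped positions.
    static List<String> fillGaps(List<String> terms, List<Integer> positionIncrements, String filler) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < terms.size(); i++) {
            for (int gap = 1; gap < positionIncrements.get(i); gap++) {
                out.add(filler);
            }
            out.add(terms.get(i));
        }
        return out;
    }

    // Joins each pair of adjacent positions with a single space.
    static List<String> bigrams(List<String> positions) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < positions.size(); i++) {
            out.add(positions.get(i) + " " + positions.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // "this" was removed upstream, so "sentence" arrives with increment 2.
        List<String> terms = List.of("please", "divide", "sentence");
        List<Integer> increments = List.of(1, 1, 2);
        System.out.println(bigrams(fillGaps(terms, increments, "_")));
        // prints [please divide, divide _, _ sentence]
    }
}
```

Passing a different filler string to fillGaps changes only the placeholder that shows up inside the affected shingles.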
 
incrementToken
public boolean incrementToken() throws IOException
Specified by: incrementToken in class TokenStream
Throws: IOException
 
end
public void end() throws IOException
Overrides: end in class TokenFilter
Throws: IOException
 
reset
public void reset() throws IOException
Overrides: reset in class TokenFilter
Throws: IOException
 
 