Class FixedShingleFilter
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.lucene.analysis.GraphTokenFilter
org.apache.lucene.analysis.shingle.FixedShingleFilter
- All Implemented Interfaces:
Closeable, AutoCloseable, Unwrappable<TokenStream>
A FixedShingleFilter constructs shingles (token n-grams) from a token stream. In other words, it combines consecutive tokens into a single token.
Unlike the ShingleFilter, FixedShingleFilter only emits shingles of a fixed size, and never emits unigrams, even at the end of a TokenStream. In addition, if the filter encounters stacked tokens (e.g. synonyms), it outputs stacked shingles.
For example, with a shingle size of 2, the sentence "please divide this sentence into shingles" might be tokenized into the shingles "please divide", "divide this", "this sentence", "sentence into", and "into shingles".
This filter handles position increments > 1 by inserting filler tokens (by default, tokens with term text "_").
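The behavior described above can be illustrated with a small plain-Java model that does not depend on Lucene. It is a simplified sketch, not the filter's actual implementation: it assumes every token spans exactly one position, that a shingle may only start at a real (non-filler) token, and that each stacked alternative yields its own shingle. The `Tok` record and the `shingles` method are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class ShingleModel {

    /** Hypothetical token: a term plus its position increment (0 = stacked on the previous token). */
    public record Tok(String term, int posInc) {}

    /**
     * Simplified model, not Lucene's implementation. Assumes every token spans one position.
     * Emits only shingles of exactly n positions that start at a real token; empty positions
     * become the filler, and stacked alternatives multiply into separate shingles.
     */
    public static List<String> shingles(List<Tok> tokens, int n, String sep, String filler) {
        // Group the terms by absolute position.
        TreeMap<Integer, List<String>> byPos = new TreeMap<>();
        int pos = -1;
        for (Tok t : tokens) {
            pos += t.posInc();
            byPos.computeIfAbsent(pos, k -> new ArrayList<>()).add(t.term());
        }
        List<String> out = new ArrayList<>();
        if (byPos.isEmpty()) {
            return out;
        }
        int last = byPos.lastKey();
        for (int start : byPos.keySet()) {
            if (start + n - 1 > last) {
                break; // too few positions left for a full shingle: no short shingles or unigrams
            }
            expand(byPos, start, start + n, new ArrayList<>(), sep, filler, out);
        }
        return out;
    }

    // Picks one term per position, branching once for each stacked alternative.
    private static void expand(TreeMap<Integer, List<String>> byPos, int p, int end,
                               List<String> parts, String sep, String filler, List<String> out) {
        if (p == end) {
            out.add(String.join(sep, parts));
            return;
        }
        for (String term : byPos.getOrDefault(p, List.of(filler))) {
            parts.add(term);
            expand(byPos, p + 1, end, parts, sep, filler, out);
            parts.remove(parts.size() - 1);
        }
    }

    public static void main(String[] args) {
        List<Tok> sentence = new ArrayList<>();
        for (String w : "please divide this sentence into shingles".split(" ")) {
            sentence.add(new Tok(w, 1));
        }
        // [please divide, divide this, this sentence, sentence into, into shingles]
        System.out.println(shingles(sentence, 2, " ", "_"));

        // "this" and "into" removed upstream (e.g. by a stop filter), leaving gaps
        List<Tok> gapped = List.of(new Tok("please", 1), new Tok("divide", 1),
                new Tok("sentence", 2), new Tok("shingles", 2));
        // [please divide _, divide _ sentence, sentence _ shingles]
        System.out.println(shingles(gapped, 3, " ", "_"));

        // "quick" is stacked on "fast" (position increment 0), e.g. a synonym
        List<Tok> stacked = List.of(new Tok("fast", 1), new Tok("quick", 0), new Tok("car", 1));
        // [fast car, quick car]
        System.out.println(shingles(stacked, 2, " ", "_"));
    }
}
```

Real token graphs can also contain tokens spanning several positions (position length > 1); the actual filter handles those through GraphTokenFilter, and this model ignores them.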
- WARNING: This API is experimental and might change in incompatible ways in the next release.
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
-
Field Summary
Fields inherited from class org.apache.lucene.analysis.GraphTokenFilter
MAX_GRAPH_STACK_SIZE, MAX_TOKEN_CACHE_SIZE
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
-
Constructor Summary
Constructor | Description
FixedShingleFilter(TokenStream input, int shingleSize) | Creates a FixedShingleFilter over an input token stream
FixedShingleFilter(TokenStream input, int shingleSize, String tokenSeparator, String fillerToken) | Creates a FixedShingleFilter over an input token stream, using the given token separator and filler token
-
Method Summary
Methods inherited from class org.apache.lucene.analysis.GraphTokenFilter
end, getTrailingPositions, incrementBaseToken, incrementGraph, incrementGraphToken, reset
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, unwrap
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
-
Constructor Details
-
FixedShingleFilter
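public FixedShingleFilter(TokenStream input, int shingleSize)
Creates a FixedShingleFilter over an input token stream
- Parameters:
input - the input token stream
shingleSize - the shingle size (the number of tokens in each shingle)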
-
FixedShingleFilter
public FixedShingleFilter(TokenStream input, int shingleSize, String tokenSeparator, String fillerToken)
Creates a FixedShingleFilter over an input token stream
- Parameters:
input - the input token stream
shingleSize - the shingle size (the number of tokens in each shingle)
tokenSeparator - a String to use as a token separator
fillerToken - a String to use to represent gaps in the input stream (e.g. due to stopwords)
-
-
Method Details
-
incrementToken
public boolean incrementToken() throws IOException
- Specified by:
incrementToken in class TokenStream
- Throws:
IOException
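A minimal consumer sketch, assuming Lucene 9.x with the lucene-core and lucene-analysis-common modules on the classpath. It follows the usual TokenStream workflow (reset(), then incrementToken() until it returns false, then end() and close()) and uses the four-argument constructor so that the gap left by StopFilter becomes visible as the filler. The class and method names (`FixedShingleDemo`, `shingles`) and the `"+"` separator and `"*"` filler are hypothetical choices for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.shingle.FixedShingleFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class FixedShingleDemo {

    /** Tokenizes on whitespace, drops stopwords, then emits fixed-size shingles. */
    public static List<String> shingles(String text, CharArraySet stopWords, int shingleSize)
            throws IOException {
        Tokenizer tokenizer = new WhitespaceTokenizer();
        tokenizer.setReader(new StringReader(text));
        // StopFilter leaves a position gap (position increment > 1) where a stopword was removed
        TokenStream stream = new StopFilter(tokenizer, stopWords);
        // "+" joins the terms of a shingle; "*" stands in for each missing position
        stream = new FixedShingleFilter(stream, shingleSize, "+", "*");

        CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
        List<String> out = new ArrayList<>();
        try (TokenStream ts = stream) {
            ts.reset();                    // required before the first incrementToken()
            while (ts.incrementToken()) {  // returns false once the stream is exhausted
                out.add(term.toString());
            }
            ts.end();                      // records end-of-stream state such as the final offset
        }                                  // close() releases the underlying Reader
        return out;
    }

    public static void main(String[] args) throws IOException {
        CharArraySet stop = new CharArraySet(List.of("this"), false);
        System.out.println(shingles("please divide this sentence into shingles", stop, 2));
    }
}
```

With "this" removed, the shingle that starts at "divide" has to cover the empty position, so it should contain the "*" filler rather than skipping ahead to "sentence"; every emitted term should contain the separator, since the filter never emits unigrams.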
-