CommonGramsFilter (Lucene 5.4.0 API)

java.lang.Object
- org.apache.lucene.util.AttributeSource
- - org.apache.lucene.analysis.TokenStream
  - - org.apache.lucene.analysis.TokenFilter
    - - org.apache.lucene.analysis.commongrams.CommonGramsFilter

All Implemented Interfaces:

Closeable, AutoCloseable
```
public final class CommonGramsFilter
extends TokenFilter
```
Construct bigrams for frequently occurring terms while indexing. Single terms are still indexed too, with each bigram overlaid at the position of its first word. This is achieved through the use of PositionIncrementAttribute.setPositionIncrement(int). Bigrams have a type of GRAM_TYPE. Example:
- input: "the quick brown fox"
- output: |"the","the-quick"|"quick"|"brown"|"fox"|
- "the-quick" has a position increment of 0, so it is in the same position as "the".
- "the-quick" has a term.type() of "gram".
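The overlay in the example can be reproduced with a small model that uses only the Java standard library. This is a sketch under stated assumptions, not Lucene's implementation. It starts from an already-tokenized word list, gives unigrams the type "word" (an assumption), and joins bigrams with "-" as the example does. It then converts position increments into absolute positions to show that "the" and "the-quick" share a position. `CommonGramsModel`, `Token`, `expand`, and `positions` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified, stdlib-only model of CommonGramsFilter output (not Lucene code).
final class CommonGramsModel {

    // Hypothetical token holder: term text, position increment, and type.
    static final class Token {
        final String term;
        final int posInc;
        final String type;

        Token(String term, int posInc, String type) {
            this.term = term;
            this.posInc = posInc;
            this.type = type;
        }

        @Override
        public String toString() {
            return term + "(+" + posInc + "," + type + ")";
        }
    }

    static final String GRAM_TYPE = "gram"; // the documented bigram type
    static final String WORD_TYPE = "word"; // assumed type for unigrams

    // Emits every unigram (increment 1) and, right after the left word of each
    // adjacent pair in which at least one word is common, a bigram (increment 0).
    static List<Token> expand(List<String> words, Set<String> common) {
        List<Token> out = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            String w = words.get(i);
            out.add(new Token(w, 1, WORD_TYPE));
            if (i + 1 < words.size()) {
                String next = words.get(i + 1);
                if (common.contains(w) || common.contains(next)) {
                    out.add(new Token(w + "-" + next, 0, GRAM_TYPE));
                }
            }
        }
        return out;
    }

    // Turns position increments into absolute positions (first token is 0).
    static List<Integer> positions(List<Token> tokens) {
        List<Integer> pos = new ArrayList<>();
        int p = -1;
        for (Token t : tokens) {
            p += t.posInc;
            pos.add(p);
        }
        return pos;
    }

    public static void main(String[] args) {
        List<Token> toks = expand(List.of("the", "quick", "brown", "fox"), Set.of("the"));
        System.out.println(toks);
        System.out.println(positions(toks));
    }
}
```

Running `main` prints the five tokens and then the positions `[0, 0, 1, 2, 3]`. Because the bigram's increment is 0, it lands on position 0 alongside "the".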

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
  AttributeSource.State

Field Summary

Fields
Modifier and Type	Field and Description
`static String`	`GRAM_TYPE`
- Fields inherited from class org.apache.lucene.analysis.TokenFilter
  input
- Fields inherited from class org.apache.lucene.analysis.TokenStream
  DEFAULT_TOKEN_ATTRIBUTE_FACTORY

Constructor Summary

Constructors
Constructor and Description
`CommonGramsFilter(TokenStream input, CharArraySet commonWords)` Construct a token stream filtering the given input using a Set of common words to create bigrams.

Method Summary

Methods
Modifier and Type	Method and Description
`boolean`	`incrementToken()` Inserts bigrams for common words into a token stream.
`void`	`reset()`
- Methods inherited from class org.apache.lucene.analysis.TokenFilter
  close, end
- Methods inherited from class org.apache.lucene.util.AttributeSource
  addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
- Methods inherited from class java.lang.Object
  clone, finalize, getClass, notify, notifyAll, wait, wait, wait

- Field Detail
  - GRAM_TYPE
```
public static final String GRAM_TYPE
```
    Token type assigned to emitted bigrams. Constant value: "gram".
- Constructor Detail
  - CommonGramsFilter
```
public CommonGramsFilter(TokenStream input,
                 CharArraySet commonWords)
```
    Construct a token stream filtering the given input using a Set of common words to create bigrams. Outputs both unigrams (with their normal position increment) and bigrams (with position increment 0 and type="gram") wherever one or both of the words in a potential bigram are in the set of common words.
    
    Parameters:
    input - TokenStream input in filter chain
    commonWords - The set of common words.
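    The pair rule in the constructor description ("one or both of the words in a potential bigram are in the set of common words") can be checked in isolation. The helper below is hypothetical and not part of the Lucene API. It uses a plain `java.util.Set` in place of CharArraySet and joins words with "-" as in the class-level example. It reproduces the "man of the year" case mentioned under incrementToken.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not Lucene API): which adjacent pairs become bigrams?
final class BigramPairs {

    // A pair qualifies when one or both of its words are common.
    static boolean qualifies(String left, String right, Set<String> commonWords) {
        return commonWords.contains(left) || commonWords.contains(right);
    }

    // Returns the qualifying adjacent pairs, joined with '-'.
    static List<String> bigrams(List<String> words, Set<String> commonWords) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < words.size(); i++) {
            String left = words.get(i);
            String right = words.get(i + 1);
            if (qualifies(left, right, commonWords)) {
                out.add(left + "-" + right);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> common = Set.of("of", "the");
        // Every pair touches "of" or "the", so all three pairs qualify.
        System.out.println(bigrams(List.of("man", "of", "the", "year"), common));
        // No common words at all, so no bigrams.
        System.out.println(bigrams(List.of("quick", "brown", "fox"), common));
    }
}
```

    Here `main` prints `[man-of, of-the, the-year]` and then `[]`. A `java.util.Set` matches case-sensitively, so "The" would not count as common in this sketch. A CharArraySet created to ignore case would treat it differently.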
- Method Detail
  - incrementToken
```
public boolean incrementToken()
                       throws IOException
```
    Inserts bigrams for common words into a token stream. For each input token, outputs the token. If the token and/or the following token are in the list of common words, also outputs a bigram with position increment 0 and type="gram".

    TODO: Consider adding an option to not emit unigram stopwords, as in CDL XTF BigramStopFilter; CommonGramsQueryFilter would need to be changed to work with this.

    TODO: Consider optimizing for the case of three commongrams, e.g. "man of the year", which normally produces 3 bigrams: "man-of", "of-the", "the-year". With proper management of positions we could eliminate the middle bigram "of-the" and save a disk seek and a whole set of position lookups.
    
    Specified by:
    
    incrementToken in class TokenStream
    
    Throws:
    
    IOException
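    incrementToken() is pull-based. Each call advances the stream by exactly one token, so a filter that inserts extra tokens has to hold one input token back while it emits the bigram. The sketch below models that idea in plain Java, including a reset() that clears the filter's own state. `SimpleStream`, `ListStream`, and `SimpleCommonGramsFilter` are hypothetical stand-ins for TokenStream and this filter. The hold-back approach is an assumption about how such a filter can be written, not a copy of Lucene's source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical pull-style stream, loosely modeled on TokenStream (not Lucene API).
interface SimpleStream {
    boolean incrementToken(); // advance to the next token; false when exhausted
    String term();            // text of the current token
    int posInc();             // position increment of the current token
    void reset();             // prepare for consumption from the start
}

// Source stream over a fixed word list; every word has position increment 1.
final class ListStream implements SimpleStream {
    private final List<String> words;
    private int next;
    private String term;

    ListStream(List<String> words) { this.words = words; }

    public boolean incrementToken() {
        if (next >= words.size()) return false;
        term = words.get(next++);
        return true;
    }
    public String term() { return term; }
    public int posInc() { return 1; }
    public void reset() { next = 0; term = null; }
}

// Emits each unigram and, for pairs involving a common word, a bigram with
// position increment 0 placed between the two unigrams.
final class SimpleCommonGramsFilter implements SimpleStream {
    private final SimpleStream input;
    private final Set<String> commonWords;
    private String prev;        // last unigram taken from input, or null at start
    private String pending;     // unigram held back while its bigram is emitted
    private int pendingPosInc;
    private String term;
    private int posInc;

    SimpleCommonGramsFilter(SimpleStream input, Set<String> commonWords) {
        this.input = input;
        this.commonWords = commonWords;
    }

    public boolean incrementToken() {
        if (pending != null) {              // release the held-back unigram
            term = pending;
            posInc = pendingPosInc;
            prev = pending;
            pending = null;
            return true;
        }
        if (!input.incrementToken()) return false;
        String cur = input.term();
        if (prev != null && (commonWords.contains(prev) || commonWords.contains(cur))) {
            pending = cur;                  // bigram now, unigram on the next call
            pendingPosInc = input.posInc();
            term = prev + "-" + cur;
            posInc = 0;                     // same position as prev
            return true;
        }
        term = cur;
        posInc = input.posInc();
        prev = cur;
        return true;
    }
    public String term() { return term; }
    public int posInc() { return posInc; }

    public void reset() {                   // reset the input, then clear own state
        input.reset();
        prev = null;
        pending = null;
        term = null;
    }

    // Resets the stream and consumes it, collecting "term:posInc" strings.
    static List<String> drain(SimpleStream s) {
        List<String> out = new ArrayList<>();
        s.reset();
        while (s.incrementToken()) out.add(s.term() + ":" + s.posInc());
        return out;
    }

    public static void main(String[] args) {
        SimpleStream f = new SimpleCommonGramsFilter(
            new ListStream(List.of("man", "of", "the", "year")), Set.of("of", "the"));
        System.out.println(drain(f));
    }
}
```

    With "of" and "the" as common words, `main` prints `[man:1, man-of:0, of:1, of-the:0, the:1, the-year:0, year:1]`. The output includes the middle "of-the" bigram that the second TODO proposes eliminating. Holding back one token is what lets each bigram sit at its left word's position while the right word follows with its own increment. In Lucene the equivalent presumably relies on the captureState/restoreState methods inherited from AttributeSource, but this page does not say so.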
  - reset
```
public void reset()
           throws IOException
```
    Overrides:
    
    reset in class TokenFilter
    
    Throws:
    
    IOException
