CJKAnalyzer (Lucene 8.4.1 API)

java.lang.Object
- org.apache.lucene.analysis.Analyzer
  - org.apache.lucene.analysis.StopwordAnalyzerBase
    - org.apache.lucene.analysis.cjk.CJKAnalyzer

All Implemented Interfaces:

Closeable, AutoCloseable
```
public final class CJKAnalyzer
extends StopwordAnalyzerBase
```
An Analyzer that tokenizes text with StandardTokenizer, normalizes content with CJKWidthFilter, folds case with LowerCaseFilter, forms bigrams of CJK characters with CJKBigramFilter, and filters stopwords with StopFilter.

Since:

3.1
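
The pipeline described above can be illustrated without Lucene on the classpath. The following is a conceptual sketch only, not the library's implementation: the class `CjkPipelineSketch` and its methods are hypothetical names, the script check is a simplification of what StandardTokenizer and CJKBigramFilter actually do, and width folding is left out. It shows how a run of CJK characters becomes overlapping bigrams while other words are lowercased and kept whole, with stopwords removed at the end.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Conceptual sketch only; hypothetical class, not Lucene's implementation. */
class CjkPipelineSketch {

    /** Simplifying assumption: treat Han, Hiragana, Katakana and Hangul alike. */
    static boolean isCjk(int codePoint) {
        Character.UnicodeScript script = Character.UnicodeScript.of(codePoint);
        return script == Character.UnicodeScript.HAN
            || script == Character.UnicodeScript.HIRAGANA
            || script == Character.UnicodeScript.KATAKANA
            || script == Character.UnicodeScript.HANGUL;
    }

    /** Lowercases, splits into tokens, bigrams CJK runs, then drops stopwords. */
    static List<String> analyze(String text, Set<String> stopwords) {
        List<String> tokens = new ArrayList<>();
        int[] cps = text.toLowerCase(Locale.ROOT).codePoints().toArray();
        int i = 0;
        while (i < cps.length) {
            if (isCjk(cps[i])) {
                int start = i;
                while (i < cps.length && isCjk(cps[i])) {
                    i++;
                }
                if (i - start == 1) {
                    // A lone CJK character has no neighbor, so it stays a unigram.
                    tokens.add(new String(cps, start, 1));
                } else {
                    // Overlapping bigrams: AB, BC, CD, ...
                    for (int j = start; j + 1 < i; j++) {
                        tokens.add(new String(cps, j, 2));
                    }
                }
            } else if (Character.isLetterOrDigit(cps[i])) {
                int start = i;
                while (i < cps.length && Character.isLetterOrDigit(cps[i]) && !isCjk(cps[i])) {
                    i++;
                }
                tokens.add(new String(cps, start, i - start));
            } else {
                i++; // whitespace and punctuation separate tokens
            }
        }
        // Stopword removal is the last stage, as in the description above.
        tokens.removeIf(stopwords::contains);
        return tokens;
    }
}
```

With the stopword set `{"the"}`, `CjkPipelineSketch.analyze("The 東京都 tower", ...)` returns the tokens `東京`, `京都`, and `tower`.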

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
  Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents

Field Summary

Fields
Modifier and Type	Field and Description
`static String`	`DEFAULT_STOPWORD_FILE` File containing default CJK stopwords.
- Fields inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
  stopwords
- Fields inherited from class org.apache.lucene.analysis.Analyzer
  GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY

Constructor Summary

Constructors
Constructor and Description
`CJKAnalyzer()` Builds an analyzer which removes words in `getDefaultStopSet()`.
`CJKAnalyzer(CharArraySet stopwords)` Builds an analyzer with the given stop words.

Method Summary

Modifier and Type	Method and Description
`protected Analyzer.TokenStreamComponents`	`createComponents(String fieldName)`
`static CharArraySet`	`getDefaultStopSet()` Returns an unmodifiable instance of the default stop-words set.
`protected TokenStream`	`normalize(String fieldName, TokenStream in)`

Methods inherited from class org.apache.lucene.analysis.StopwordAnalyzerBase
getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet

Methods inherited from class org.apache.lucene.analysis.Analyzer
attributeFactory, close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, getVersion, initReader, initReaderForNormalization, normalize, setVersion, tokenStream, tokenStream

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - DEFAULT_STOPWORD_FILE
```
public static final String DEFAULT_STOPWORD_FILE
```
    File containing default CJK stopwords.
    Currently it contains some common English words that are not usually useful for searching, plus some double-byte punctuation marks.
    
    See Also:
    
    Constant Field Values
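
    To make the idea of a stopword file concrete, here is a minimal sketch of parsing such a word list. It assumes a format of one entry per line with `#` marking comment lines; that format, the class name `StopwordFileSketch`, and the sample content are assumptions for illustration and are not taken from this page.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical helper; the file format handled here is an assumption. */
class StopwordFileSketch {

    /** Parses one stopword per line, skipping blank lines and '#' comments. */
    static Set<String> parse(String content) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : content.split("\\R")) {
            String word = line.trim();
            if (word.isEmpty() || word.startsWith("#")) {
                continue;
            }
            words.add(word);
        }
        // Hand out a read-only view so callers cannot alter the shared set.
        return Collections.unmodifiableSet(words);
    }
}
```

    A file mixing English words and double-byte punctuation, such as the lines `and`, `the`, and `。`, would yield a three-element set under these assumptions.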
- Constructor Detail
  - CJKAnalyzer
```
public CJKAnalyzer()
```
    Builds an analyzer which removes words in getDefaultStopSet().
  - CJKAnalyzer
```
public CJKAnalyzer(CharArraySet stopwords)
```
    Builds an analyzer with the given stop words.
    
    Parameters:
    
    stopwords - a stopword set
- Method Detail
  - getDefaultStopSet
```
public static CharArraySet getDefaultStopSet()
```
    Returns an unmodifiable instance of the default stop-words set.
    
    Returns:
    
    an unmodifiable instance of the default stop-words set.
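
    Because the returned set is unmodifiable, a caller who wants the defaults plus extra entries has to copy it first. The sketch below shows that pattern with plain `java.util` collections as an analogy; `UnmodifiableStopSetSketch`, its methods, and the three sample words are hypothetical and stand in for `getDefaultStopSet()` and `CharArraySet`.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Analogy using java.util collections; not the Lucene CharArraySet API. */
class UnmodifiableStopSetSketch {

    // Placeholder words; the real default set comes from the stopword file.
    private static final Set<String> DEFAULT_STOP_SET =
        Collections.unmodifiableSet(new HashSet<>(Set.of("a", "and", "the")));

    /** Returns the shared read-only set; add() on it throws. */
    static Set<String> defaultStopSet() {
        return DEFAULT_STOP_SET;
    }

    /** Copies the defaults into a fresh mutable set, then adds the extras. */
    static Set<String> extendedStopSet(String... extra) {
        Set<String> copy = new HashSet<>(defaultStopSet());
        Collections.addAll(copy, extra);
        return copy;
    }
}
```

    Calling `add` on the set returned by `defaultStopSet()` throws `UnsupportedOperationException`, while `extendedStopSet("foo")` leaves the shared defaults untouched.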
  - createComponents
```
protected Analyzer.TokenStreamComponents createComponents(String fieldName)
```
    Specified by:
    
    createComponents in class Analyzer
  - normalize
```
protected TokenStream normalize(String fieldName,
                                TokenStream in)
```
    Overrides:
    
    normalize in class Analyzer
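
    This page does not describe what the override does. Assuming it applies the same width folding and lowercasing named in the class description, the effect on a single term can be approximated with the JDK alone. The sketch uses NFKC normalization as a stand-in: NFKC covers more than fullwidth and halfwidth folding, so this is an approximation and not the behavior of CJKWidthFilter itself. `WidthFoldSketch` is a hypothetical name.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Approximation with JDK classes; NFKC is broader than CJK width folding. */
class WidthFoldSketch {

    /** Folds width variants via NFKC, then lowercases with a fixed locale. */
    static String normalize(String term) {
        String folded = Normalizer.normalize(term, Normalizer.Form.NFKC);
        return folded.toLowerCase(Locale.ROOT);
    }
}
```

    Under this approximation the fullwidth term `Ｌｕｃｅｎｅ` becomes `lucene`, and halfwidth `ｶ` becomes fullwidth `カ`.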

Copyright © 2000-2020 Apache Software Foundation. All Rights Reserved.