CJKAnalyzer (Lucene 4.7.1 API)

java.lang.Object
- org.apache.lucene.analysis.Analyzer
  - org.apache.lucene.analysis.util.StopwordAnalyzerBase
    - org.apache.lucene.analysis.cjk.CJKAnalyzer

All Implemented Interfaces:

Closeable
```
public final class CJKAnalyzer
extends StopwordAnalyzerBase
```
An Analyzer that tokenizes text with StandardTokenizer, normalizes content with CJKWidthFilter, folds case with LowerCaseFilter, forms bigrams of CJK characters with CJKBigramFilter, and filters stopwords with StopFilter.
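The chain described above can be approximated in plain Java to show what each stage contributes. This is a rough sketch, not Lucene's implementation: the class and method names (`CjkPipelineSketch`, `analyze`, `isHan`, `addBigrams`) are hypothetical, NFKC normalization stands in for CJKWidthFilter (it is a broader normalization), tokenization is a simple character-class split rather than StandardTokenizer, and only Han ideographs are treated as CJK.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class CjkPipelineSketch {

    // Simplification: only Han ideographs count as "CJK" in this sketch.
    static boolean isHan(int codePoint) {
        return Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.HAN;
    }

    // Emits overlapping two-character bigrams for cps[from, to);
    // a lone character is emitted unchanged (assumed unigram fallback).
    static void addBigrams(int[] cps, int from, int to, List<String> out) {
        if (to - from == 1) {
            out.add(new String(cps, from, 1));
            return;
        }
        for (int k = from; k + 1 < to; k++) {
            out.add(new String(cps, k, 2));
        }
    }

    static List<String> analyze(String text, Set<String> stopWords) {
        // Width folding (approximated by NFKC) and lower-casing.
        // The sketch does this before tokenizing, which is a simplification.
        String s = Normalizer.normalize(text, Normalizer.Form.NFKC)
                .toLowerCase(Locale.ROOT);
        int[] cps = s.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < cps.length) {
            int start = i;
            if (isHan(cps[i])) {
                // A run of Han characters becomes overlapping bigrams.
                while (i < cps.length && isHan(cps[i])) i++;
                addBigrams(cps, start, i, tokens);
            } else if (Character.isLetterOrDigit(cps[i])) {
                // A run of other letters or digits becomes one token.
                while (i < cps.length && Character.isLetterOrDigit(cps[i]) && !isHan(cps[i])) i++;
                tokens.add(new String(cps, start, i - start));
            } else {
                i++; // whitespace and punctuation only separate tokens
            }
        }
        // Stop-word removal runs last, on the lower-cased tokens.
        tokens.removeIf(stopWords::contains);
        return tokens;
    }

    public static void main(String[] args) {
        // Full-width "Ｌｕｃｅｎｅ" is folded to ASCII and lower-cased; "the" is dropped.
        System.out.println(analyze("The Ｌｕｃｅｎｅ 全文検索", Set.of("the")));
        // prints [lucene, 全文, 文検, 検索]
    }
}
```

Overlapping bigrams let a query match any two-character window of unsegmented CJK text without a dictionary, at the cost of a larger index.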

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
  Analyzer.GlobalReuseStrategy, Analyzer.PerFieldReuseStrategy, Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents

Field Summary

Fields
Modifier and Type	Field and Description
`static String`	`DEFAULT_STOPWORD_FILE` File containing default CJK stopwords.
- Fields inherited from class org.apache.lucene.analysis.util.StopwordAnalyzerBase
  matchVersion, stopwords
- Fields inherited from class org.apache.lucene.analysis.Analyzer
  GLOBAL_REUSE_STRATEGY, PER_FIELD_REUSE_STRATEGY

Constructor Summary

Constructors
Constructor and Description
`CJKAnalyzer(Version matchVersion)` Builds an analyzer which removes words in `getDefaultStopSet()`.
`CJKAnalyzer(Version matchVersion, CharArraySet stopwords)` Builds an analyzer with the given stop words.

Method Summary

Methods
Modifier and Type	Method and Description
`protected Analyzer.TokenStreamComponents`	`createComponents(String fieldName, Reader reader)`
`static CharArraySet`	`getDefaultStopSet()` Returns an unmodifiable instance of the default stop-words set.

Methods inherited from class org.apache.lucene.analysis.util.StopwordAnalyzerBase
getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet

Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, getReuseStrategy, initReader, tokenStream, tokenStream

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - DEFAULT_STOPWORD_FILE
```
public static final String DEFAULT_STOPWORD_FILE
```
    File containing default CJK stopwords.
    Currently it contains some common English words that are not usually useful for searching, plus some double-byte punctuation marks.
    
    See Also:
    Constant Field Values
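To make the idea of a stopword file concrete, here is a minimal plain-Java sketch of turning such a file's lines into a set. The file format is an assumption and is not stated on this page: one entry per line, with lines starting with a comment prefix such as `#` ignored. `StopwordListSketch` and `parse` are hypothetical names, and the sample entries are illustrative rather than the actual contents of the default file.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class StopwordListSketch {

    // Collects one stop word per line, skipping blank lines and comment lines.
    // The one-entry-per-line layout and the comment prefix are assumptions.
    static Set<String> parse(List<String> lines, String commentPrefix) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : lines) {
            String word = line.trim();
            if (word.isEmpty() || word.startsWith(commentPrefix)) {
                continue;
            }
            words.add(word);
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "# common English words",
                "the",
                "",
                " of ",
                "# double-byte punctuation",
                "、");
        System.out.println(parse(lines, "#")); // prints [the, of, 、]
    }
}
```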
- Constructor Detail
  - CJKAnalyzer
```
public CJKAnalyzer(Version matchVersion)
```
    Builds an analyzer which removes words in getDefaultStopSet().
  - CJKAnalyzer
```
public CJKAnalyzer(Version matchVersion,
           CharArraySet stopwords)
```
    Builds an analyzer with the given stop words.
    
    Parameters:
    matchVersion - Lucene compatibility version
    stopwords - a stopword set
- Method Detail
  - getDefaultStopSet
```
public static CharArraySet getDefaultStopSet()
```
    Returns an unmodifiable instance of the default stop-words set.
    
    Returns:
    an unmodifiable instance of the default stop-words set.
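Because the returned set is unmodifiable, extending the defaults means copying them first and passing the copy to the two-argument constructor. The sketch below illustrates that pattern with `java.util.Set` standing in for `CharArraySet`. The names `UnmodifiableSetSketch`, `rejectsAdd`, and `extend` are hypothetical, and it assumes an unmodifiable set signals a rejected write with `UnsupportedOperationException`, as the JDK collections do.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class UnmodifiableSetSketch {

    // Returns true if the set refuses modification.
    static boolean rejectsAdd(Set<String> set) {
        try {
            set.add("extra");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // Builds a modifiable copy of the defaults plus extra stop words,
    // leaving the original set untouched.
    static Set<String> extend(Set<String> defaults, String... extra) {
        Set<String> copy = new HashSet<>(defaults);
        Collections.addAll(copy, extra);
        return copy;
    }

    public static void main(String[] args) {
        // Stand-in for the default stop set; the entries are illustrative only.
        Set<String> defaults = Collections.unmodifiableSet(new HashSet<>(Set.of("the", "of")));
        System.out.println(rejectsAdd(defaults)); // prints true
        Set<String> custom = extend(defaults, "lucene");
        System.out.println(custom.contains("lucene")); // prints true
        System.out.println(defaults.contains("lucene")); // prints false
    }
}
```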
  - createComponents
```
protected Analyzer.TokenStreamComponents createComponents(String fieldName,
                                              Reader reader)
```
    Specified by:
    createComponents in class Analyzer

Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.