PersianAnalyzer (Lucene 4.0.0 API)

java.lang.Object
- org.apache.lucene.analysis.Analyzer
  - org.apache.lucene.analysis.util.StopwordAnalyzerBase
    - org.apache.lucene.analysis.fa.PersianAnalyzer

All Implemented Interfaces:

Closeable
```
public final class PersianAnalyzer
extends StopwordAnalyzerBase
```
Analyzer for Persian.
This Analyzer uses PersianCharFilter, which implies tokenizing around the zero-width non-joiner in addition to whitespace. Some Persian-specific variant forms (such as Farsi yeh and keheh) are standardized. "Stemming" is accomplished via stopwords.
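
To make "standardizing variant forms" concrete, here is a minimal stand-alone sketch of folding Farsi yeh and keheh into single code points. It is not Lucene's implementation: the class name `PersianVariantFolding` is hypothetical, and the target code points (Arabic yeh U+064A and Arabic kaf U+0643) are an assumption made for illustration.

```java
// Stand-alone illustration only; this is not Lucene code.
final class PersianVariantFolding {

    // Assumed mapping: FARSI YEH (U+06CC) -> ARABIC YEH (U+064A),
    // KEHEH (U+06A9) -> ARABIC KAF (U+0643). Everything else passes through.
    static String fold(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\u06CC': out.append('\u064A'); break;
                case '\u06A9': out.append('\u0643'); break;
                default:       out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two spellings of the same word collapse to one form after folding.
        String withVariants = "\u06A9\u062A\u0627\u0628\u06CC";
        String withArabic   = "\u0643\u062A\u0627\u0628\u064A";
        System.out.println(fold(withVariants).equals(fold(withArabic)));
    }
}
```

Folding both spellings to one form at index time and at query time is what lets a query typed with one keyboard layout match text typed with another.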

Nested Class Summary
- Nested classes/interfaces inherited from class org.apache.lucene.analysis.Analyzer
  Analyzer.GlobalReuseStrategy, Analyzer.PerFieldReuseStrategy, Analyzer.ReuseStrategy, Analyzer.TokenStreamComponents

Field Summary

Fields
Modifier and Type	Field and Description
`static String`	`DEFAULT_STOPWORD_FILE` File containing default Persian stopwords.
`static String`	`STOPWORDS_COMMENT` The comment character in the stopwords file.

Fields inherited from class org.apache.lucene.analysis.util.StopwordAnalyzerBase
matchVersion, stopwords

Constructor Summary

Constructors
Constructor and Description
`PersianAnalyzer(Version matchVersion)` Builds an analyzer with the default stop words: `DEFAULT_STOPWORD_FILE`.
`PersianAnalyzer(Version matchVersion, CharArraySet stopwords)` Builds an analyzer with the given stop words.

Method Summary

Methods
Modifier and Type	Method and Description
`protected Analyzer.TokenStreamComponents`	`createComponents(String fieldName, Reader reader)` Creates `Analyzer.TokenStreamComponents` used to tokenize all the text in the provided `Reader`.
`static CharArraySet`	`getDefaultStopSet()` Returns an unmodifiable instance of the default stop-words set.
`protected Reader`	`initReader(String fieldName, Reader reader)` Wraps the Reader with `PersianCharFilter`.

Methods inherited from class org.apache.lucene.analysis.util.StopwordAnalyzerBase
getStopwordSet, loadStopwordSet, loadStopwordSet, loadStopwordSet

Methods inherited from class org.apache.lucene.analysis.Analyzer
close, getOffsetGap, getPositionIncrementGap, tokenStream

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - DEFAULT_STOPWORD_FILE
```
public static final String DEFAULT_STOPWORD_FILE
```
    File containing default Persian stopwords. The default stopword list is from http://members.unine.ch/jacques.savoy/clef/index.html. The stopword list is BSD-licensed.
    
    See Also:
    Constant Field Values
  - STOPWORDS_COMMENT
```
public static final String STOPWORDS_COMMENT
```
    The comment character in the stopwords file. All lines prefixed with this will be ignored.
    
    See Also:
    Constant Field Values
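
    The comment rule can be sketched as a small parser. This is a stand-alone illustration, not the loader Lucene uses: the class name `StopwordFileSketch` is hypothetical, and `"#"` is an assumed stand-in for the value of `STOPWORDS_COMMENT`, which this page does not state.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Stand-alone illustration only; this is not Lucene code.
final class StopwordFileSketch {

    // Assumption: "#" stands in for the value of STOPWORDS_COMMENT.
    static final String COMMENT = "#";

    // Returns one stopword per non-empty line, skipping comment lines.
    static Set<String> parse(String fileContents) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : fileContents.split("\\R")) {
            String word = line.trim();
            if (word.isEmpty() || word.startsWith(COMMENT)) {
                continue; // blank line or comment line: ignored
            }
            words.add(word);
        }
        return words;
    }

    public static void main(String[] args) {
        String file = "# sample list\n\u0648\n\u062F\u0631\n";
        System.out.println(parse(file).size());
    }
}
```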
- Constructor Detail
  - PersianAnalyzer
```
public PersianAnalyzer(Version matchVersion)
```
    Builds an analyzer with the default stop words: DEFAULT_STOPWORD_FILE.
  - PersianAnalyzer
```
public PersianAnalyzer(Version matchVersion,
               CharArraySet stopwords)
```
    Builds an analyzer with the given stop words.
    
    Parameters:
    matchVersion - Lucene compatibility version
    stopwords - a stopword set
- Method Detail
  - getDefaultStopSet
```
public static CharArraySet getDefaultStopSet()
```
    Returns an unmodifiable instance of the default stop-words set.
    
    Returns:
    an unmodifiable instance of the default stop-words set.
  - createComponents
```
protected Analyzer.TokenStreamComponents createComponents(String fieldName,
                                              Reader reader)
```
    Creates Analyzer.TokenStreamComponents used to tokenize all the text in the provided Reader.
    
    Specified by:
    
    createComponents in class Analyzer
    
    Returns:
    Analyzer.TokenStreamComponents built from a StandardTokenizer filtered with LowerCaseFilter, ArabicNormalizationFilter, PersianNormalizationFilter and Persian Stop words
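
    The order of the stages matters: tokens are lowercased and normalized before they are compared against the stop set. A minimal stand-alone sketch of that ordering follows. It is not Lucene's token stream API: the class `FilterChainSketch` is hypothetical, the `normalizer` parameter is a stand-in for the two normalization filters, and the stop set in `main` is invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Stand-alone illustration only; this is not Lucene code.
final class FilterChainSketch {

    // Applies the stages in the documented order to already-split tokens.
    static List<String> run(List<String> tokens,
                            UnaryOperator<String> normalizer,
                            Set<String> stopwords) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            String t = token.toLowerCase(Locale.ROOT); // lowercasing stage
            t = normalizer.apply(t);                   // stand-in for both normalization stages
            if (!stopwords.contains(t)) {              // stop-word stage runs last
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<>(Arrays.asList("the")); // hypothetical stop set
        System.out.println(run(Arrays.asList("The", "Lucene"), UnaryOperator.identity(), stop));
    }
}
```

    Because the stop-word check comes last, a stop set only needs to hold the lowercased, normalized form of each word.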
  - initReader
```
protected Reader initReader(String fieldName,
                Reader reader)
```
    Wraps the Reader with PersianCharFilter.
    
    Overrides:
    
    initReader in class Analyzer
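
    Wrapping the Reader means the tokenizer never sees a zero-width non-joiner, so it splits there as it would on whitespace. A minimal stand-alone sketch of such a wrapper, built on `java.io.FilterReader`, is shown below. It is not `PersianCharFilter` itself: the class name `ZwnjToSpaceReader` and the helper `filter` are hypothetical, and the one-for-one replacement with a space is an assumption of this sketch.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Stand-alone illustration only; this is not Lucene code.
final class ZwnjToSpaceReader extends FilterReader {

    private static final char ZWNJ = '\u200C'; // zero-width non-joiner

    ZwnjToSpaceReader(Reader in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        return c == ZWNJ ? ' ' : c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        // When n is -1 (end of stream) the loop body never runs.
        for (int i = off; i < off + n; i++) {
            if (buf[i] == ZWNJ) {
                buf[i] = ' '; // one char in, one char out: offsets are unchanged
            }
        }
        return n;
    }

    // Convenience helper for the example: drains a wrapped StringReader.
    static String filter(String text) {
        StringBuilder out = new StringBuilder();
        try (Reader r = new ZwnjToSpaceReader(new StringReader(text))) {
            char[] buf = new char[64];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

    Replacing one character with exactly one character keeps every offset in the filtered text aligned with the original input.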
