org.apache.lucene.analysis.ja
Class JapaneseIterationMarkCharFilter

java.lang.Object
  extended by java.io.Reader
      extended by org.apache.lucene.analysis.CharFilter
          extended by org.apache.lucene.analysis.ja.JapaneseIterationMarkCharFilter
All Implemented Interfaces:
Closeable, Readable

public class JapaneseIterationMarkCharFilter
extends CharFilter

Normalizes Japanese horizontal iteration marks (odoriji) to their expanded form.

Sequences of iteration marks are supported. If an illegal sequence of iteration marks is encountered, the implementation emits the illegal source character as-is, without considering its script. For example, the input "?ゝ" yields "??" even though "?" isn't hiragana.

Note that a full stop punctuation character "。" (U+3002) cannot be iterated (see below). Iteration marks themselves can be emitted if they are illegal, i.e. if they refer back past the beginning of the character stream.

The implementation buffers input only until a full stop punctuation character (U+3002) or EOF is reached, so that it does not have to keep a copy of the entire character stream in memory. Vertical iteration marks, which are even rarer than horizontal iteration marks in contemporary Japanese, are unsupported.
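
The rules above can be illustrated with a small, self-contained sketch. It is not the Lucene implementation: the class name IterationMarkSketch and its methods are hypothetical, it works on a whole String instead of a stream, and it only handles the unvoiced marks 々 (U+3005), ゝ (U+309D) and ヽ (U+30FD). It assumes that each mark in a run of N consecutive marks is replaced by the character N positions before it, and that a run may not reach back past the start of the text or past a full stop "。" (U+3002). Non-ASCII characters are written as Unicode escapes so the source compiles under any encoding.

```java
final class IterationMarkSketch {

    // Hypothetical helper: true for the unvoiced horizontal iteration marks
    // U+3005 (kanji), U+309D (hiragana) and U+30FD (katakana).
    static boolean isMark(char c) {
        return c == '\u3005' || c == '\u309D' || c == '\u30FD';
    }

    // Expands runs of iteration marks by copying earlier characters.
    static String expand(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int segmentStart = 0; // first index after the most recent full stop
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (!isMark(c)) {
                out.append(c);
                i++;
                if (c == '\u3002') {
                    segmentStart = i; // a full stop cannot be iterated
                }
                continue;
            }
            // Find the end of the run of consecutive marks starting at i.
            int spanEnd = i;
            while (spanEnd < text.length() && isMark(text.charAt(spanEnd))) {
                spanEnd++;
            }
            // Do not reach back past the start of the current segment.
            int spanSize = Math.min(spanEnd - i, i - segmentStart);
            for (int p = i; p < spanEnd; p++) {
                // With spanSize == 0 the mark itself is emitted unchanged.
                out.append(text.charAt(p - spanSize));
            }
            i = spanEnd;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+6642 U+3005: one kanji followed by one kanji iteration mark.
        System.out.println(expand("\u6642\u3005"));
        // '?' followed by a hiragana iteration mark: the source character
        // is copied without checking its script.
        System.out.println(expand("?\u309D"));
    }
}
```

In this sketch a mark that directly follows a full stop, or that is the first character of the text, has nothing to copy from and is therefore passed through unchanged.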


Field Summary
static boolean NORMALIZE_KANA_DEFAULT
          Normalize kana iteration marks by default
static boolean NORMALIZE_KANJI_DEFAULT
          Normalize kanji iteration marks by default
 
Fields inherited from class org.apache.lucene.analysis.CharFilter
input
 
Fields inherited from class java.io.Reader
lock
 
Constructor Summary
JapaneseIterationMarkCharFilter(Reader input)
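          Constructor that normalizes both kanji and kana iteration marks by default.
JapaneseIterationMarkCharFilter(Reader input, boolean normalizeKanji, boolean normalizeKana)
          Constructor with explicit kanji and kana normalization settings.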
 
Method Summary
protected  int correct(int currentOff)
           
 int read()
          
 int read(char[] buffer, int offset, int length)
          
 
Methods inherited from class org.apache.lucene.analysis.CharFilter
close, correctOffset
 
Methods inherited from class java.io.Reader
mark, markSupported, read, read, ready, reset, skip
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

NORMALIZE_KANJI_DEFAULT

public static final boolean NORMALIZE_KANJI_DEFAULT
Normalize kanji iteration marks by default

See Also:
Constant Field Values

NORMALIZE_KANA_DEFAULT

public static final boolean NORMALIZE_KANA_DEFAULT
Normalize kana iteration marks by default

See Also:
Constant Field Values
Constructor Detail

JapaneseIterationMarkCharFilter

public JapaneseIterationMarkCharFilter(Reader input)
Constructor. Normalizes both kanji and kana iteration marks by default.

Parameters:
input - char stream

JapaneseIterationMarkCharFilter

public JapaneseIterationMarkCharFilter(Reader input,
                                       boolean normalizeKanji,
                                       boolean normalizeKana)
Constructor. Normalizes kanji and kana iteration marks according to the given flags.

Parameters:
input - char stream
normalizeKanji - indicates whether kanji iteration marks should be normalized
normalizeKana - indicates whether kana iteration marks should be normalized
Method Detail

read

public int read(char[] buffer,
                int offset,
                int length)
         throws IOException

Specified by:
read in class Reader
Throws:
IOException

read

public int read()
         throws IOException

Overrides:
read in class Reader
Throws:
IOException
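
Since the class extends java.io.Reader, both read methods can be driven with the usual read-until-minus-one loops. The following is a minimal sketch using only java.io; ReaderDrain and its methods are hypothetical names, and a StringReader stands in for the filter so that the snippet runs without Lucene on the classpath.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class ReaderDrain {

    // Reads everything through read(char[], int, int) in blocks.
    static String drain(Reader reader) throws IOException {
        StringBuilder out = new StringBuilder();
        char[] buffer = new char[1024];
        int n;
        // read returns the number of chars read, or -1 at end of stream.
        while ((n = reader.read(buffer, 0, buffer.length)) != -1) {
            out.append(buffer, 0, n);
        }
        return out.toString();
    }

    // Reads everything through read(), one character at a time.
    static String drainOneByOne(Reader reader) throws IOException {
        StringBuilder out = new StringBuilder();
        int c;
        // read() returns the char as an int, or -1 at end of stream.
        while ((c = reader.read()) != -1) {
            out.append((char) c);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        try (Reader reader = new StringReader("abc")) {
            System.out.println(drain(reader));
        }
    }
}
```

Assuming the Lucene module that provides this class is available, a filter created with new JapaneseIterationMarkCharFilter(new StringReader(text)) could be passed to either method in place of the StringReader.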

correct

protected int correct(int currentOff)
Specified by:
correct in class CharFilter


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.