java.lang.Object
  java.io.Reader
    org.apache.lucene.analysis.CharFilter
      org.apache.lucene.analysis.ja.JapaneseIterationMarkCharFilter
public class JapaneseIterationMarkCharFilter extends CharFilter
Normalizes Japanese horizontal iteration marks (odoriji) to their expanded form.
Sequences of iteration marks are supported. If an illegal sequence of iteration marks is encountered, the implementation emits the illegal source character as-is without considering its script. For example, with input "?ゝ", we get "??" even though "?" isn't hiragana.
Note that a full stop punctuation character "。" (U+3002) cannot be iterated (see below). Iteration marks themselves are emitted unchanged when they are illegal, i.e. when they refer back past the beginning of the character stream.
The implementation buffers input only until a full stop punctuation character (U+3002) or EOF is reached, so that it does not have to keep a copy of the whole character stream in memory. Vertical iteration marks, which are even rarer than horizontal iteration marks in contemporary Japanese, are unsupported.
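The expansion rule described above can be illustrated with a small standalone sketch. It is not the Lucene implementation: the class name `IterationMarkSketch` and the method `expand` are hypothetical, it works on a whole `String` rather than a buffered `Reader`, it recognizes only the kanji mark (U+3005) and the unvoiced hiragana mark (U+309D), and it does no offset correction. It assumes that a run of n consecutive marks repeats the n characters immediately before the run, and that a mark whose source would lie before the start of the stream or before the most recent full stop is emitted unchanged.

```java
// Simplified illustration only; names and structure are assumptions,
// not taken from the Lucene source.
final class IterationMarkSketch {
    private static final char KANJI_MARK = '\u3005';    // kanji iteration mark
    private static final char HIRAGANA_MARK = '\u309D'; // hiragana iteration mark
    private static final char FULL_STOP = '\u3002';     // ideographic full stop

    private IterationMarkSketch() {}

    static boolean isIterationMark(char c) {
        return c == KANJI_MARK || c == HIRAGANA_MARK;
    }

    /** Expands runs of iteration marks by copying the characters before the run. */
    static String expand(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int segmentStart = 0; // first index after the most recent full stop
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (!isIterationMark(c)) {
                out.append(c);
                if (c == FULL_STOP) {
                    segmentStart = i + 1; // marks may not reach back past this point
                }
                i++;
                continue;
            }
            // Measure the run of consecutive iteration marks starting at i.
            int span = 0;
            while (i + span < text.length() && isIterationMark(text.charAt(i + span))) {
                span++;
            }
            // The k-th mark in the run copies the character 'span' positions back.
            for (int k = 0; k < span; k++) {
                int source = i + k - span;
                if (source >= segmentStart) {
                    out.append(text.charAt(source)); // script is deliberately not checked
                } else {
                    out.append(text.charAt(i + k));  // illegal mark: emit unchanged
                }
            }
            i += span;
        }
        return out.toString();
    }
}
```

Under these assumptions, `IterationMarkSketch.expand("馬鹿々々しい")` returns "馬鹿馬鹿しい" and `expand("?ゝ")` returns "??", while a mark at the very start of the input, or directly after "。", is left as-is.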
Field Summary | |
---|---|
static boolean | NORMALIZE_KANA_DEFAULT: Normalize kana iteration marks by default |
static boolean | NORMALIZE_KANJI_DEFAULT: Normalize kanji iteration marks by default |
Fields inherited from class org.apache.lucene.analysis.CharFilter |
---|
input |
Fields inherited from class java.io.Reader |
---|
lock |
Constructor Summary | |
---|---|
JapaneseIterationMarkCharFilter(Reader input) | Constructor. |
JapaneseIterationMarkCharFilter(Reader input, boolean normalizeKanji, boolean normalizeKana) | Constructor. |
Method Summary | |
---|---|
protected int | correct(int currentOff) |
int | read() |
int | read(char[] buffer, int offset, int length) |
Methods inherited from class org.apache.lucene.analysis.CharFilter |
---|
close, correctOffset |
Methods inherited from class java.io.Reader |
---|
mark, markSupported, read, read, ready, reset, skip |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final boolean NORMALIZE_KANJI_DEFAULT
public static final boolean NORMALIZE_KANA_DEFAULT
Constructor Detail |
---|
public JapaneseIterationMarkCharFilter(Reader input)
Parameters:
input - char stream
public JapaneseIterationMarkCharFilter(Reader input, boolean normalizeKanji, boolean normalizeKana)
Parameters:
input - char stream
normalizeKanji - indicates whether kanji iteration marks should be normalized
normalizeKana - indicates whether kana iteration marks should be normalized
Method Detail |
---|
public int read(char[] buffer, int offset, int length) throws IOException
Specified by: read in class Reader
Throws: IOException
public int read() throws IOException
Overrides: read in class Reader
Throws: IOException
protected int correct(int currentOff)
Specified by: correct in class CharFilter