Class JapaneseIterationMarkCharFilter
java.lang.Object
  java.io.Reader
    org.apache.lucene.analysis.CharFilter
      org.apache.lucene.analysis.ja.JapaneseIterationMarkCharFilter

All Implemented Interfaces:
Closeable, AutoCloseable, Readable

public class JapaneseIterationMarkCharFilter extends CharFilter
Normalizes Japanese horizontal iteration marks (odoriji) to their expanded form. Sequences of iteration marks are supported. If an illegal sequence of iteration marks is encountered, the implementation emits the illegal source character as-is without considering its script. For example, the input "?ゝ" produces "??" even though the question mark is not hiragana.
Note that the full stop punctuation character "。" (U+3002) cannot be iterated (see below). Iteration marks themselves can be emitted if they are illegal, i.e. if they refer back past the beginning of the character stream.
To avoid keeping a copy of the entire character stream in memory, the implementation buffers input only until a full stop punctuation character (U+3002) or EOF is reached. Vertical iteration marks, which are even rarer than horizontal iteration marks in contemporary Japanese, are unsupported.
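
The expansion rule described above can be illustrated with a small plain-Java sketch. It is not the Lucene implementation: the class name IterationMarkSketch and the method expand are hypothetical, it works on a whole String rather than on a buffered Reader, and it handles only the unvoiced marks 々, ゝ and ヽ, leaving out the voiced kana marks and any script checks. It assumes that each mark in a run of n consecutive marks copies the character n positions before it, and that a mark whose source would fall before the start of the text or before the most recent full stop is emitted unchanged.

```java
final class IterationMarkSketch {
    private static final char FULL_STOP = '\u3002';            // 。
    private static final String MARKS = "\u3005\u309D\u30FD";  // 々 ゝ ヽ (unvoiced marks only)

    static boolean isMark(char c) {
        return MARKS.indexOf(c) >= 0;
    }

    /** Expands each run of iteration marks by copying the characters that precede the run. */
    static String expand(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int segmentStart = 0; // first index after the most recent full stop
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (!isMark(c)) {
                out.append(c);
                i++;
                if (c == FULL_STOP) {
                    segmentStart = i; // marks may not reach back across a full stop
                }
                continue;
            }
            // Find the end of the run of consecutive iteration marks starting at i.
            int runEnd = i;
            while (runEnd < text.length() && isMark(text.charAt(runEnd))) {
                runEnd++;
            }
            int runLength = runEnd - i;
            for (int p = i; p < runEnd; p++) {
                int source = p - runLength;
                // Illegal marks (no source inside the current segment) are emitted as-is.
                out.append(source >= segmentStart ? text.charAt(source) : text.charAt(p));
            }
            i = runEnd;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("\u6642\u3005"));                         // 時々
        System.out.println(expand("\u99AC\u9E7F\u3005\u3005\u3057\u3044")); // 馬鹿々々しい
    }
}
```

Under these assumptions, expand applied to "時々" gives "時時", and applied to "馬鹿々々しい" gives "馬鹿馬鹿しい". A mark directly after "。" is left untouched.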

Field Summary
Modifier and Type    Field                      Description
static boolean       NORMALIZE_KANA_DEFAULT     Normalize kana iteration marks by default
static boolean       NORMALIZE_KANJI_DEFAULT    Normalize kanji iteration marks by default

Fields inherited from class org.apache.lucene.analysis.CharFilter
input

Constructor Summary
Constructor                                                                                      Description
JapaneseIterationMarkCharFilter(Reader input)                                                    Constructor.
JapaneseIterationMarkCharFilter(Reader input, boolean normalizeKanji, boolean normalizeKana)     Constructor.

Method Summary
Modifier and Type    Method
protected int        correct(int currentOff)
int                  read()
int                  read(char[] buffer, int offset, int length)

Methods inherited from class org.apache.lucene.analysis.CharFilter
close, correctOffset

Methods inherited from class java.io.Reader
mark, markSupported, nullReader, read, read, ready, reset, skip, transferTo

Field Detail
NORMALIZE_KANJI_DEFAULT
public static final boolean NORMALIZE_KANJI_DEFAULT
Normalize kanji iteration marks by default
See Also: Constant Field Values

NORMALIZE_KANA_DEFAULT
public static final boolean NORMALIZE_KANA_DEFAULT
Normalize kana iteration marks by default
See Also: Constant Field Values

Constructor Detail
JapaneseIterationMarkCharFilter
public JapaneseIterationMarkCharFilter(Reader input)
Constructor. Normalizes both kanji and kana iteration marks by default.
Parameters:
input - char stream

JapaneseIterationMarkCharFilter
public JapaneseIterationMarkCharFilter(Reader input, boolean normalizeKanji, boolean normalizeKana)
Constructor.
Parameters:
input - char stream
normalizeKanji - indicates whether kanji iteration marks should be normalized
normalizeKana - indicates whether kana iteration marks should be normalized

Method Detail
read
public int read(char[] buffer, int offset, int length) throws IOException
Specified by: read in class Reader
Throws: IOException
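
Because the filter is a java.io.Reader, its output can be consumed with the usual read(char[], int, int) loop. The sketch below uses a hypothetical helper class ReaderDrain and a plain StringReader as a stand-in so that it runs without Lucene on the classpath; with the Lucene Japanese analysis module available, a JapaneseIterationMarkCharFilter wrapping the same StringReader could presumably be passed to drain instead.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class ReaderDrain {
    /** Reads everything from the given Reader using read(char[], int, int). */
    static String drain(Reader reader) {
        StringBuilder out = new StringBuilder();
        char[] buffer = new char[1024];
        try {
            int n;
            // read returns the number of chars stored, or -1 at end of stream.
            while ((n = reader.read(buffer, 0, buffer.length)) != -1) {
                out.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in input. With Lucene available, the reader could instead be
        // new JapaneseIterationMarkCharFilter(new StringReader(text)).
        try (Reader reader = new StringReader("abc")) {
            System.out.println(drain(reader));
        }
    }
}
```

Wrapping the reader in try-with-resources relies only on Reader being Closeable, as listed among the implemented interfaces.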

read
public int read() throws IOException
Overrides: read in class Reader
Throws: IOException

correct
protected int correct(int currentOff)
Specified by: correct in class CharFilter