Class JapaneseIterationMarkCharFilter

java.lang.Object
java.io.Reader
org.apache.lucene.analysis.CharFilter
org.apache.lucene.analysis.ja.JapaneseIterationMarkCharFilter
All Implemented Interfaces:
Closeable, AutoCloseable, Readable

public class JapaneseIterationMarkCharFilter extends CharFilter
Normalizes Japanese horizontal iteration marks (odoriji) to their expanded form.

Sequences of iteration marks are supported. In case an illegal sequence of iteration marks is encountered, the implementation emits the illegal source character as-is without considering its script. For example, with input "?ゝ", we get "??" even though the question mark isn't hiragana.

Note that the full stop punctuation character "。" (U+3002) cannot be iterated (see below). Iteration marks themselves are emitted unchanged when they are illegal, i.e. when they would refer back past the beginning of the character stream.

The implementation buffers input only until a full stop punctuation character (U+3002) or EOF is reached, so that it does not have to keep a copy of the whole character stream in memory. Vertical iteration marks, which are even rarer than horizontal iteration marks in contemporary Japanese, are unsupported.
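
The expansion rules described above can be illustrated with a small, self-contained sketch. This is not the Lucene implementation: the class name IterationMarkSketch and the method expand are hypothetical, the sketch works on a whole String rather than a buffered Reader, it ignores the normalizeKanji/normalizeKana switches, and it copies the source character unchanged, so it does not model any voicing distinction between marks such as ゝ and ゞ. It also assumes that a run of marks may not copy from positions already consumed by an earlier run.

```java
class IterationMarkSketch {
    // Horizontal iteration marks: ゝ ゞ (hiragana), ヽ ヾ (katakana), 々 (kanji).
    private static final String MARKS = "\u309D\u309E\u30FD\u30FE\u3005";
    private static final char FULL_STOP = '\u3002'; // 。

    static boolean isIterationMark(char c) {
        return MARKS.indexOf(c) >= 0;
    }

    /** Simplified expansion: each mark in a run copies the character one run-length back. */
    static String expand(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int limit = 0; // lowest index a mark may copy from (assumption of this sketch)
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (!isIterationMark(c)) {
                out.append(c);
                i++;
                if (c == FULL_STOP) {
                    limit = i; // marks may not reach back across a full stop
                }
                continue;
            }
            // Measure the run of consecutive iteration marks starting at i.
            int run = 0;
            while (i + run < input.length() && isIterationMark(input.charAt(i + run))) {
                run++;
            }
            // Only as many marks as there are available source characters are expanded.
            int span = Math.min(run, i - limit);
            for (int k = 0; k < run; k++) {
                // Expanded marks copy their source; the rest are emitted as-is.
                out.append(k < span ? input.charAt(i - span + k) : input.charAt(i + k));
            }
            i += run;
            limit = i;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("\u6642\u3005")); // input: 時々
        System.out.println(expand("?\u309D"));      // input: ?ゝ
        System.out.println(expand("\u3002\u309D")); // input: 。ゝ
    }
}
```

In this sketch, "時々" expands to "時時", "?ゝ" expands to "??" because the source character is copied without regard to its script, and "。ゝ" is left unchanged because the mark has no character to copy after the full stop.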

  • Field Details

    • NORMALIZE_KANJI_DEFAULT

      public static final boolean NORMALIZE_KANJI_DEFAULT
      Normalize kanji iteration marks by default
    • NORMALIZE_KANA_DEFAULT

      public static final boolean NORMALIZE_KANA_DEFAULT
      Normalize kana iteration marks by default
  • Constructor Details

    • JapaneseIterationMarkCharFilter

      public JapaneseIterationMarkCharFilter(Reader input)
      Constructor. Normalizes both kanji and kana iteration marks by default.
      Parameters:
      input - char stream
    • JapaneseIterationMarkCharFilter

      public JapaneseIterationMarkCharFilter(Reader input, boolean normalizeKanji, boolean normalizeKana)
      Constructor that lets the caller enable or disable kanji and kana iteration mark normalization independently.
      Parameters:
      input - char stream
      normalizeKanji - indicates whether kanji iteration marks should be normalized
      normalizeKana - indicates whether kana iteration marks should be normalized
  • Method Details