SoraniNormalizer (Lucene 4.7.2 API)

org.apache.lucene.analysis.ckb
Class SoraniNormalizer

java.lang.Object
  org.apache.lucene.analysis.ckb.SoraniNormalizer

Normalizes the Unicode representation of Sorani text.

Normalization consists of:

Alternate forms of 'y' (064A, 0649) are converted to 06CC (FARSI YEH)
Alternate form of 'k' (0643) is converted to 06A9 (KEHEH)
Alternate forms of vowel 'e' (0647+200C, word-final 0647, 0629) are converted to 06D5 (AE)
Alternate (joining) form of 'h' (06BE) is converted to 0647
Alternate forms of 'rr' (0692, word-initial 0631) are converted to 0695 (REH WITH SMALL V BELOW)
Harakat, tatweel, and formatting characters such as directional controls are removed.
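The rules above can be sketched in plain Java as follows. This is a hypothetical, self-contained illustration, not Lucene's source: the class name `SoraniRulesSketch` is made up. It assumes the input holds exactly one word, so "word-final" means the last character and "word-initial" means the first. It also returns a new `String` instead of editing a buffer in place.

```java
// Hypothetical illustration of the documented Sorani rules; not Lucene's implementation.
// Assumes the input is a single word (one token).
public class SoraniRulesSketch {

    /** Applies the documented rules to one word and returns the normalized form. */
    static String normalize(String word) {
        StringBuilder out = new StringBuilder(word.length());
        int n = word.length();
        for (int i = 0; i < n; i++) {
            char c = word.charAt(i);
            switch (c) {
                case '\u064A': // ARABIC LETTER YEH
                case '\u0649': // ARABIC LETTER ALEF MAKSURA (dotless yeh)
                    out.append('\u06CC'); // FARSI YEH
                    break;
                case '\u0643': // ARABIC LETTER KAF
                    out.append('\u06A9'); // KEHEH
                    break;
                case '\u0629': // TEH MARBUTA
                    out.append('\u06D5'); // AE
                    break;
                case '\u0647': { // HEH becomes AE before ZWNJ or at the end of the word
                    boolean beforeZwnj = i + 1 < n && word.charAt(i + 1) == '\u200C';
                    out.append(beforeZwnj || i == n - 1 ? '\u06D5' : '\u0647');
                    break;
                }
                case '\u06BE': // HEH DOACHASHMEE (joining form of 'h')
                    out.append('\u0647'); // HEH
                    break;
                case '\u0692': // REH WITH SMALL V
                    out.append('\u0695'); // REH WITH SMALL V BELOW
                    break;
                case '\u0631': // REH becomes REH WITH SMALL V BELOW only at the start of the word
                    out.append(i == 0 ? '\u0695' : '\u0631');
                    break;
                case '\u0640': // TATWEEL is dropped
                    break;
                default: {
                    // Harakat (FATHATAN through SUKUN) and Unicode FORMAT characters,
                    // such as ZWNJ and directional marks, are dropped; all else is kept.
                    boolean haraka = c >= '\u064B' && c <= '\u0652';
                    if (!haraka && Character.getType(c) != Character.FORMAT) {
                        out.append(c);
                    }
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // KAF + TEH MARBUTA becomes KEHEH + AE
        System.out.println(normalize("\u0643\u0629").equals("\u06A9\u06D5")); // prints true
    }
}
```

Building a new `String` keeps the sketch easy to read; a buffer-based version would instead shift characters left on each deletion and report the shorter length.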

Constructor Summary
`SoraniNormalizer()`

Method Summary
`int`	`normalize(char[] s, int len)` Normalize an input buffer of Sorani text.

Methods inherited from class java.lang.Object
`clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Constructor Detail

public SoraniNormalizer()

Method Detail

public int normalize(char[] s,
                     int len)

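Normalize an input buffer of Sorani text.

Parameters:
s - input buffer
len - length of input buffer

Returns:
length of input buffer after normalization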
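Because `normalize` rewrites `s` in place and can remove characters, a caller should read only the first N characters of the buffer, where N is the returned length. The sketch below shows that calling pattern. `BufferNormalizer`, `normalizeToString`, and the toy `dropTatweel` stand-in are hypothetical names used only for illustration; the stand-in performs just one of the documented removals.

```java
// Hypothetical demo of the (char[] s, int len) -> new length calling convention.
public class BufferContractDemo {

    /** Same shape as normalize(char[] s, int len): edits s in place, returns the new length. */
    @FunctionalInterface
    interface BufferNormalizer {
        int normalize(char[] s, int len);
    }

    /** Copies text into a buffer, normalizes it, and keeps only the first newLen chars. */
    static String normalizeToString(BufferNormalizer normalizer, String text) {
        char[] buf = text.toCharArray();
        int newLen = normalizer.normalize(buf, buf.length);
        // Characters at index >= newLen are leftovers from the original input; ignore them.
        return new String(buf, 0, newLen);
    }

    /** Toy stand-in (hypothetical): deletes TATWEEL by shifting the remaining chars left. */
    static int dropTatweel(char[] s, int len) {
        int out = 0;
        for (int i = 0; i < len; i++) {
            if (s[i] != '\u0640') {
                s[out++] = s[i];
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // BEH + TATWEEL + REH shrinks to BEH + REH
        String result = normalizeToString(BufferContractDemo::dropTatweel, "\u0628\u0640\u0631");
        System.out.println(result.length()); // prints 2
    }
}
```

With Lucene on the classpath, passing `new SoraniNormalizer()::normalize` as the `BufferNormalizer` should work the same way, since that instance method has a matching `(char[], int) -> int` signature.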