java.lang.Object
  com.ibm.icu.text.BreakIterator
    org.apache.lucene.analysis.icu.segmentation.LaoBreakIterator

public class LaoBreakIterator extends com.ibm.icu.text.BreakIterator
Syllable iterator for Lao text.
This breaks Lao text into syllables according to "Syllabification of Lao Script for Line Breaking" by Phonpasit Phissamay, Valaxay Dalolay, Chitaphone Chanhsililath, Oulaiphone Silimasak, Sarmad Hussain, and Nadir Durrani (Science Technology and Environment Agency, CRULP).
Most of the work is accomplished with RBBI rules. However, some additional special logic cannot be coded in a grammar, and that logic is implemented here.
For example, what appears to be a final consonant might instead be part of the next syllable. Because rules match greedily, this can leave behind an illegal sequence that matches no rule.
Take, for instance, the text ກວ່າດອກ. The first rule greedily matches ກວ່າດ, but then ອກ is encountered, which is illegal. Following the paper, LaoBreakIterator backtracks in this situation: it checks whether moving the trailing consonant ດ onto the next syllable leaves two legal syllables, ກວ່າ and ດອກ.
Finally, LaoBreakIterator also takes care of the second concern mentioned in the paper. This is the issue of combining marks being in the wrong order (typos).
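The backtracking in the example above can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not the class's actual implementation: `LaoBacktrackSketch`, `isLegal`, and `repair` are hypothetical names, and a small whitelist of syllables stands in for the RBBI rules. The strings are written as Unicode escapes for ກວ່າດ, ກວ່າ, and ດອກ; the fragment ອກ is deliberately absent from the whitelist, so it counts as illegal.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class LaoBacktrackSketch {
    // Hypothetical stand-in for the RBBI grammar: a whitelist of legal syllables.
    private static final Set<String> LEGAL = new HashSet<>(Arrays.asList(
            "\u0E81\u0EA7\u0EC8\u0EB2\u0E94", // KO WO MAI-EK AA DO (the greedy match)
            "\u0E81\u0EA7\u0EC8\u0EB2",       // KO WO MAI-EK AA
            "\u0E94\u0EAD\u0E81"));           // DO O KO

    static boolean isLegal(String syllable) {
        return LEGAL.contains(syllable);
    }

    /**
     * Given a greedy split (previous, current), tries to move the last
     * character of previous onto the front of current. The move is kept
     * only if both resulting syllables are legal; otherwise the original
     * split is returned unchanged.
     */
    static String[] repair(String previous, String current) {
        if (isLegal(current) || previous.isEmpty()) {
            return new String[] {previous, current};
        }
        int last = previous.length() - 1;
        String shorterPrevious = previous.substring(0, last);
        String longerCurrent = previous.charAt(last) + current;
        if (isLegal(shorterPrevious) && isLegal(longerCurrent)) {
            return new String[] {shorterPrevious, longerCurrent};
        }
        return new String[] {previous, current};
    }

    public static void main(String[] args) {
        // Greedy match followed by the illegal remainder (O KO).
        String[] fixed = repair("\u0E81\u0EA7\u0EC8\u0EB2\u0E94", "\u0EAD\u0E81");
        System.out.println(fixed[0] + " | " + fixed[1]);
    }
}
```

The sketch moves a single UTF-16 unit, which is enough for this example because every character involved lies in the Basic Multilingual Plane.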
Field Summary |
---|
Fields inherited from class com.ibm.icu.text.BreakIterator |
---|
DONE, KIND_CHARACTER, KIND_LINE, KIND_SENTENCE, KIND_TITLE, KIND_WORD |
Constructor Summary | |
---|---|
LaoBreakIterator(com.ibm.icu.text.RuleBasedBreakIterator rules) Creates a new iterator, performing the backtracking verification across the provided rules. |
Method Summary | |
---|---|
LaoBreakIterator | clone() Clone method. |
int | current() |
int | first() |
int | following(int offset) |
CharacterIterator | getText() |
int | last() |
int | next() |
int | next(int n) |
int | previous() |
void | setText(CharacterIterator text) |
void | setText(String newText) |
Methods inherited from class com.ibm.icu.text.BreakIterator |
---|
getAvailableLocales, getAvailableULocales, getBreakInstance, getCharacterInstance, getCharacterInstance, getCharacterInstance, getLineInstance, getLineInstance, getLineInstance, getLocale, getSentenceInstance, getSentenceInstance, getSentenceInstance, getTitleInstance, getTitleInstance, getTitleInstance, getWordInstance, getWordInstance, getWordInstance, isBoundary, preceding, registerInstance, registerInstance, unregister |
Methods inherited from class java.lang.Object |
---|
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public LaoBreakIterator(com.ibm.icu.text.RuleBasedBreakIterator rules)
Creates a new iterator, performing the backtracking verification across the provided rules.
Method Detail |
---|
public int current()
current in class com.ibm.icu.text.BreakIterator

public int first()
first in class com.ibm.icu.text.BreakIterator

public int following(int offset)
following in class com.ibm.icu.text.BreakIterator

public CharacterIterator getText()
getText in class com.ibm.icu.text.BreakIterator

public int last()
last in class com.ibm.icu.text.BreakIterator

public int next()
next in class com.ibm.icu.text.BreakIterator

public int next(int n)
next in class com.ibm.icu.text.BreakIterator

public int previous()
previous in class com.ibm.icu.text.BreakIterator

public void setText(CharacterIterator text)
setText in class com.ibm.icu.text.BreakIterator

public void setText(String newText)
setText in class com.ibm.icu.text.BreakIterator

public LaoBreakIterator clone()
clone in class com.ibm.icu.text.BreakIterator
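The methods above follow the usual break-iterator protocol: set the text, call first(), then call next() until DONE is returned. The sketch below shows that loop using the JDK's java.text.BreakIterator as a stand-in, on the assumption that it exposes the same setText/first/next/DONE protocol as com.ibm.icu.text.BreakIterator. It does not construct a LaoBreakIterator, which needs ICU4J and a suitable RuleBasedBreakIterator; `BoundaryLoopSketch` and `segments` are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class BoundaryLoopSketch {
    /** Collects the text between each pair of consecutive boundaries. */
    static List<String> segments(BreakIterator it, String text) {
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        // next() returns DONE once the last boundary has been passed.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        // JDK word iterator used purely as a stand-in for a syllable iterator.
        BreakIterator words = BreakIterator.getWordInstance(Locale.ROOT);
        System.out.println(segments(words, "line breaking"));
    }
}
```

With an ICU-based iterator the loop would have the same shape, comparing against com.ibm.icu.text.BreakIterator.DONE instead.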