Class WordBreakTestUnicode_12_1_0
WordBreakTest.txt indicates the points in the provided character sequences at which conforming implementations must and must not break words. This class tests for expected token extraction from each of the test sequences in WordBreakTest.txt, where the expected tokens are those character sequences bounded by word breaks and containing at least one character from one of the following character sets:
- \p{Script = Han} (From http://www.unicode.org/Public/12.1.0/ucd/Scripts.txt)
- \p{Script = Hiragana}
- \p{LineBreak = Complex_Context} (From http://www.unicode.org/Public/12.1.0/ucd/LineBreak.txt)
- \p{WordBreak = ALetter} (From http://www.unicode.org/Public/12.1.0/ucd/auxiliary/WordBreakProperty.txt)
- \p{WordBreak = Hebrew_Letter}
- \p{WordBreak = Katakana}
- \p{WordBreak = Numeric}
- \p{Extended_Pictographic} (From http://www.unicode.org/Public/emoji/12.1/emoji-data.txt)
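The filtering rule described above can be illustrated with a minimal sketch. It is not the code in this class: the class name WordBreakLineSketch and the methods expectedTokens and isTokenChar are hypothetical names chosen for illustration. The sketch assumes the usual WordBreakTest.txt line layout, in which U+00F7 marks a permitted break, U+00D7 marks a forbidden break, code points are written in hexadecimal, and everything after # is a comment. Because the JDK does not expose the WordBreak or LineBreak properties directly, isTokenChar is only a rough stand-in for the character sets listed above (it ignores Extended_Pictographic, for example).

```java
import java.util.ArrayList;
import java.util.List;

final class WordBreakLineSketch {

    // ASSUMPTION: a rough approximation of the character sets listed above.
    // The real sets come from the Unicode 12.1.0 data files, not from this check.
    static boolean isTokenChar(int cp) {
        Character.UnicodeScript script = Character.UnicodeScript.of(cp);
        return script == Character.UnicodeScript.HAN
            || script == Character.UnicodeScript.HIRAGANA
            || Character.isLetterOrDigit(cp);
    }

    // Splits one test line at its break marks and keeps only those segments
    // that contain at least one character accepted by isTokenChar.
    static List<String> expectedTokens(String line) {
        int hash = line.indexOf('#');
        String data = (hash >= 0 ? line.substring(0, hash) : line).trim();

        List<String> tokens = new ArrayList<>();
        StringBuilder segment = new StringBuilder();
        boolean keep = false;

        for (String field : data.split("\\s+")) {
            if (field.isEmpty()) {
                continue;
            }
            if (field.equals("\u00F7")) {
                // Break allowed here: close the current segment.
                if (keep) {
                    tokens.add(segment.toString());
                }
                segment.setLength(0);
                keep = false;
            } else if (!field.equals("\u00D7")) {
                // A hexadecimal code point; a no-break mark is simply skipped.
                int cp = Integer.parseInt(field, 16);
                segment.appendCodePoint(cp);
                keep |= isTokenChar(cp);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "\u00F7 0061 \u00D7 0308 \u00F7 0020 \u00F7 0031 \u00F7\t# sample line";
        System.out.println(expectedTokens(line));
    }
}
```

For the sample line, the segment consisting of the single space is discarded because it contains no character from the approximated sets, while the letter with its combining mark and the digit are each kept as a token.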
Constructor Details
WordBreakTestUnicode_12_1_0
public WordBreakTestUnicode_12_1_0()
Method Details
test
- Throws: Exception