Class WordBreakTestUnicode_12_1_0


public class WordBreakTestUnicode_12_1_0 extends BaseTokenStreamTestCase
This class was automatically generated by generateJavaUnicodeWordBreakTest.pl from: http://www.unicode.org/Public/12.1.0/ucd/auxiliary/WordBreakTest.txt

WordBreakTest.txt indicates the points in the provided character sequences at which conforming implementations must and must not break words. This class tests for expected token extraction from each of the test sequences in WordBreakTest.txt, where the expected tokens are those character sequences bounded by word breaks and containing at least one character from one of the following character sets:

  • \p{Script = Han} (From http://www.unicode.org/Public/12.1.0/ucd/Scripts.txt)
  • \p{Script = Hiragana}
  • \p{LineBreak = Complex_Context} (From http://www.unicode.org/Public/12.1.0/ucd/LineBreak.txt)
  • \p{WordBreak = ALetter} (From http://www.unicode.org/Public/12.1.0/ucd/auxiliary/WordBreakProperty.txt)
  • \p{WordBreak = Hebrew_Letter}
  • \p{WordBreak = Katakana}
  • \p{WordBreak = Numeric}
  • \p{Extended_Pictographic} (From http://www.unicode.org/Public/emoji/12.1/emoji-data.txt)
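
The token-selection rule described above can be illustrated with a minimal sketch. It is not the generated test code: the class name WordBreakLineSketch and both method names are hypothetical. It assumes the WordBreakTest.txt line layout, in which ÷ (U+00F7) marks a break, × (U+00D7) marks no break, code points are given in hexadecimal, and # starts a comment. The character-set check is only an approximation: Han and Hiragana are tested through Character.UnicodeScript, while Character.isLetterOrDigit stands in for the WordBreak, LineBreak and Extended_Pictographic properties, which the JDK does not expose directly.

```java
import java.util.ArrayList;
import java.util.List;

public class WordBreakLineSketch {

    /** Splits one WordBreakTest.txt data line into the segments bounded by break marks. */
    static List<String> segments(String line) {
        // Everything after '#' is a comment.
        int hash = line.indexOf('#');
        String data = (hash >= 0 ? line.substring(0, hash) : line).trim();

        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String field : data.split("\\s+")) {
            if (field.equals("\u00F7")) {            // break: close the current segment
                if (current.length() > 0) {
                    result.add(current.toString());
                    current.setLength(0);
                }
            } else if (field.equals("\u00D7")) {     // no break: keep accumulating
                continue;
            } else if (!field.isEmpty()) {           // hexadecimal code point
                current.appendCodePoint(Integer.parseInt(field, 16));
            }
        }
        return result;
    }

    /** Approximate check: does the segment contain at least one token-forming character? */
    static boolean isTokenCandidate(String segment) {
        return segment.codePoints().anyMatch(cp -> {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            return script == Character.UnicodeScript.HAN
                || script == Character.UnicodeScript.HIRAGANA
                || Character.isLetterOrDigit(cp);    // assumption: rough stand-in for the other sets
        });
    }

    public static void main(String[] args) {
        // Hypothetical sample line in the WordBreakTest.txt layout.
        String line = "\u00F7 0061 \u00D7 0308 \u00F7 0020 \u00F7 4E00 \u00F7 # sample";
        for (String segment : segments(line)) {
            if (isTokenCandidate(segment)) {
                System.out.println(segment);
            }
        }
    }
}
```

In this sketch the segment consisting of a single space is bounded by word breaks but contains no character from the listed sets, so it is not reported as a token.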

  • Constructor Details

    • WordBreakTestUnicode_12_1_0

      public WordBreakTestUnicode_12_1_0()
  • Method Details