public final class UnicodeUtil extends Object
Modifier and Type | Field and Description |
---|---|
static BytesRef | BIG_TERM - A binary term consisting of a number of 0xff bytes, likely to be bigger than other terms. |
static int | UNI_REPLACEMENT_CHAR |
static int | UNI_SUR_HIGH_END |
static int | UNI_SUR_HIGH_START |
static int | UNI_SUR_LOW_END |
static int | UNI_SUR_LOW_START |
Modifier and Type | Method and Description |
---|---|
static int | codePointCount(BytesRef utf8) - Returns the number of code points in this UTF8 sequence. |
static String | newString(int[] codePoints, int offset, int count) - Cover JDK 1.5 API. |
static String | toHexString(String s) |
static void | UTF16toUTF8(char[] source, int offset, int length, BytesRef result) - Encode characters from a char[] source, starting at offset for length chars. |
static void | UTF16toUTF8(CharSequence s, int offset, int length, BytesRef result) - Encode characters from this String, starting at offset for length characters. |
static void | UTF8toUTF16(byte[] utf8, int offset, int length, CharsRef chars) - Interprets the given byte array as UTF-8 and converts to UTF-16. |
static void | UTF8toUTF16(BytesRef bytesRef, CharsRef chars) - Utility method for UTF8toUTF16(byte[], int, int, CharsRef) |
static void | UTF8toUTF32(BytesRef utf8, IntsRef utf32) - This method assumes valid UTF8 input. |
static boolean | validUTF16String(char[] s, int size) |
static boolean | validUTF16String(CharSequence s) |
public static final BytesRef BIG_TERM
WARNING: This is not a valid UTF8 Term
public static final int UNI_SUR_HIGH_START
public static final int UNI_SUR_HIGH_END
public static final int UNI_SUR_LOW_START
public static final int UNI_SUR_LOW_END
public static final int UNI_REPLACEMENT_CHAR
public static void UTF16toUTF8(char[] source, int offset, int length, BytesRef result)
public static void UTF16toUTF8(CharSequence s, int offset, int length, BytesRef result)
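The offset/length contract of the two UTF16toUTF8 overloads can be illustrated without Lucene on the classpath. The following is a minimal JDK-only sketch, not the UnicodeUtil implementation: Utf16SliceEncoder and encodeSlice are hypothetical names, and the JDK encoder may treat unpaired surrogates differently from UnicodeUtil.

```java
import java.nio.charset.StandardCharsets;

// JDK-only illustration; Utf16SliceEncoder and encodeSlice are hypothetical names,
// not part of UnicodeUtil.
final class Utf16SliceEncoder {

    /** Encodes length chars of source, starting at offset, as UTF-8 bytes. */
    static byte[] encodeSlice(char[] source, int offset, int length) {
        // String(char[], int, int) copies exactly the requested slice of UTF-16 chars.
        return new String(source, offset, length).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        char[] text = "na\u00EFve caf\u00E9".toCharArray();
        // Encode only the first five chars ("na\u00EFve"); U+00EF needs two UTF-8 bytes.
        byte[] utf8 = encodeSlice(text, 0, 5);
        System.out.println(utf8.length); // prints 6
    }
}
```

Note that offset and length count UTF-16 chars, not code points, so a slice boundary can fall between the two halves of a surrogate pair.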
public static boolean validUTF16String(CharSequence s)
public static boolean validUTF16String(char[] s, int size)
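This page gives no description for validUTF16String. Assuming "valid" means that every surrogate is properly paired (a high surrogate in U+D800..U+DBFF immediately followed by a low surrogate in U+DC00..U+DFFF), such a check could be sketched as follows. Utf16Check and isWellFormed are hypothetical names, and the code uses only JDK methods.

```java
// Sketch of a well-formedness check for UTF-16, under the assumption stated above.
// Utf16Check and isWellFormed are hypothetical names, not part of UnicodeUtil.
final class Utf16Check {

    static boolean isWellFormed(CharSequence s) {
        final int n = s.length();
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed directly by a low surrogate.
                if (i + 1 < n && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the low half of the pair
                } else {
                    return false;
                }
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate without a preceding high surrogate is unpaired.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("abc"));          // prints true
        System.out.println(isWellFormed("\uD83D\uDE00")); // prints true (one surrogate pair)
        System.out.println(isWellFormed("\uD83D"));       // prints false (lone high surrogate)
    }
}
```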
public static int codePointCount(BytesRef utf8)
Returns the number of code points in this UTF8 sequence. This method assumes valid UTF8 input. It does not perform full UTF8 validation; it checks only the first byte of each codepoint (for multi-byte sequences, any bytes after the head are skipped).
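The header-byte strategy described here can be sketched in plain Java. This is an illustration of the technique, not the UnicodeUtil source: Utf8HeaderCount and countCodePoints are hypothetical names, and the sketch takes a byte[] slice instead of a BytesRef so that it has no Lucene dependency.

```java
import java.nio.charset.StandardCharsets;

// Counts code points by inspecting only the first (header) byte of each sequence.
// Utf8HeaderCount and countCodePoints are hypothetical names, not part of UnicodeUtil.
final class Utf8HeaderCount {

    static int countCodePoints(byte[] utf8, int offset, int length) {
        final int limit = offset + length;
        int pos = offset;
        int count = 0;
        while (pos < limit) {
            int lead = utf8[pos] & 0xFF;
            int width;
            if (lead < 0x80) {
                width = 1; // 0xxxxxxx: single-byte sequence
            } else if (lead >= 0xC0 && lead < 0xE0) {
                width = 2; // 110xxxxx
            } else if (lead >= 0xE0 && lead < 0xF0) {
                width = 3; // 1110xxxx
            } else if (lead >= 0xF0 && lead < 0xF8) {
                width = 4; // 11110xxx
            } else {
                // A continuation byte (10xxxxxx) or 0xF8..0xFF cannot start a sequence.
                throw new IllegalArgumentException("invalid header byte at index " + pos);
            }
            pos += width; // continuation bytes are skipped, not inspected
            count++;
        }
        if (pos > limit) {
            throw new IllegalArgumentException("content is prematurely truncated");
        }
        return count;
    }

    public static void main(String[] args) {
        // 1 + 2 + 3 + 4 bytes: four code points in ten bytes.
        byte[] utf8 = "a\u00E9\u20AC\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);
        System.out.println(countCodePoints(utf8, 0, utf8.length)); // prints 4
    }
}
```

Because continuation bytes are never examined, a sequence with a valid header followed by garbage is still counted as one code point; only a bad header or a sequence running past the end is reported.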
Throws:
IllegalArgumentException - If invalid codepoint header byte occurs or the content is prematurely truncated.
public static void UTF8toUTF32(BytesRef utf8, IntsRef utf32)
This method assumes valid UTF8 input. It does not perform full UTF8 validation; it checks only the first byte of each codepoint (for multi-byte sequences, any bytes after the head are skipped).
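For comparison, the same UTF-8 to code point conversion can be done with the JDK alone. The sketch below is an assumption-laden stand-in, not UnicodeUtil's algorithm: Utf8ToCodePoints and toCodePoints are hypothetical names, the result is a plain int[] and not an IntsRef, and the JDK decoder fully validates its input (substituting U+FFFD for malformed bytes) where the method documented here does not.

```java
import java.nio.charset.StandardCharsets;

// JDK-only stand-in for a UTF-8 to UTF-32 conversion.
// Utf8ToCodePoints and toCodePoints are hypothetical names, not part of UnicodeUtil.
final class Utf8ToCodePoints {

    static int[] toCodePoints(byte[] utf8, int offset, int length) {
        // Decode to a String first, then read it code point by code point
        // so that surrogate pairs collapse into a single int.
        return new String(utf8, offset, length, StandardCharsets.UTF_8)
                .codePoints()
                .toArray();
    }

    public static void main(String[] args) {
        byte[] utf8 = "a\u20AC\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);
        for (int cp : toCodePoints(utf8, 0, utf8.length)) {
            System.out.println(Integer.toHexString(cp)); // prints 61, 20ac, 1f600
        }
    }
}
```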
Throws:
IllegalArgumentException - If invalid codepoint header byte occurs or the content is prematurely truncated.
public static String newString(int[] codePoints, int offset, int count)
Parameters:
codePoints - The code array
offset - The start of the text in the code point array
count - The number of code points
Throws:
IllegalArgumentException - If an invalid code point is encountered
IndexOutOfBoundsException - If the offset or count are out of bounds.
public static void UTF8toUTF16(byte[] utf8, int offset, int length, CharsRef chars)
Interprets the given byte array as UTF-8 and converts to UTF-16. The CharsRef will be extended if it doesn't provide enough space to hold the worst case of each byte becoming a UTF-16 code unit.
NOTE: Full characters are read, even if this reads past the length passed (and can result in an ArrayIndexOutOfBoundsException if invalid UTF-8 is passed). Explicit checks for valid UTF-8 are not performed.
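The worst-case sizing rule (one UTF-16 char per input byte, reached by pure ASCII) can be observed with a JDK-only sketch. Utf8SliceDecoder and decodeSlice are hypothetical names, and the JDK decoder behaves differently from the method documented here: it validates its input and substitutes U+FFFD for malformed sequences, so it never reads past the given slice.

```java
import java.nio.charset.StandardCharsets;

// JDK-only illustration of UTF-8 to UTF-16 decoding for a byte[] slice.
// Utf8SliceDecoder and decodeSlice are hypothetical names, not part of UnicodeUtil.
final class Utf8SliceDecoder {

    /** Decodes length bytes of utf8, starting at offset, into UTF-16 chars. */
    static char[] decodeSlice(byte[] utf8, int offset, int length) {
        return new String(utf8, offset, length, StandardCharsets.UTF_8).toCharArray();
    }

    public static void main(String[] args) {
        byte[] ascii = "abc".getBytes(StandardCharsets.UTF_8);
        byte[] euro = "\u20AC".getBytes(StandardCharsets.UTF_8);
        byte[] emoji = "\uD83D\uDE00".getBytes(StandardCharsets.UTF_8);
        // The char count never exceeds the byte count; ASCII is the worst case.
        System.out.println(decodeSlice(ascii, 0, ascii.length).length); // prints 3 (3 bytes)
        System.out.println(decodeSlice(euro, 0, euro.length).length);   // prints 1 (3 bytes)
        System.out.println(decodeSlice(emoji, 0, emoji.length).length); // prints 2 (4 bytes)
    }
}
```

A caller that cannot guarantee valid UTF-8 may prefer a validating path like this one, at the cost of the extra checks the documented method deliberately skips.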
public static void UTF8toUTF16(BytesRef bytesRef, CharsRef chars)
Utility method for UTF8toUTF16(byte[], int, int, CharsRef).
Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.