public final class UnicodeUtil extends Object
Modifier and Type | Field and Description |
---|---|
static BytesRef | BIG_TERM: A binary term consisting of a number of 0xff bytes, likely to be bigger than other terms one would normally encounter, and definitely bigger than any UTF-8 terms. |
static int | UNI_REPLACEMENT_CHAR |
static int | UNI_SUR_HIGH_END |
static int | UNI_SUR_HIGH_START |
static int | UNI_SUR_LOW_END |
static int | UNI_SUR_LOW_START |
Modifier and Type | Method and Description |
---|---|
static int | codePointCount(BytesRef utf8): Returns the number of code points in this UTF-8 sequence. |
static String | newString(int[] codePoints, int offset, int count): Covers the JDK 1.5 API; creates a String from an array of Unicode code points. |
static String | toHexString(String s) |
static void | UTF16toUTF8(char[] source, int offset, int length, BytesRef result): Encodes characters from a char[] source, starting at offset for length chars. |
static void | UTF16toUTF8(CharSequence s, int offset, int length, BytesRef result): Encodes characters from this CharSequence, starting at offset for length characters. |
static int | UTF16toUTF8WithHash(char[] source, int offset, int length, BytesRef result): Encodes characters from a char[] source, starting at offset for length chars, and returns a hash of the resulting bytes. |
static void | UTF8toUTF16(byte[] utf8, int offset, int length, CharsRef chars): Interprets the given byte array as UTF-8 and converts it to UTF-16. |
static void | UTF8toUTF16(BytesRef bytesRef, CharsRef chars): Utility method for UTF8toUTF16(byte[], int, int, CharsRef). |
static void | UTF8toUTF32(BytesRef utf8, IntsRef utf32) |
static boolean | validUTF16String(char[] s, int size) |
static boolean | validUTF16String(CharSequence s) |
public static final BytesRef BIG_TERM
WARNING: This is not a valid UTF-8 term.
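To illustrate why a run of 0xff bytes sorts after any UTF-8 term and is not itself valid UTF-8, here is a minimal JDK-only sketch. The class name BigTermSketch, its helper methods, and the term length of 10 bytes are assumptions made for illustration; the sketch does not use Lucene's BytesRef and is not the library's implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class BigTermSketch {
    /** Builds a term of {@code length} bytes, each 0xff (the length is an arbitrary choice). */
    static byte[] bigTerm(int length) {
        byte[] term = new byte[length];
        Arrays.fill(term, (byte) 0xff);
        return term;
    }

    /** Returns true if the bytes decode as strict UTF-8. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Compares two byte arrays as unsigned bytes; a proper prefix sorts first. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] big = bigTerm(10);
        byte[] text = "\uD83D\uDE00zzz".getBytes(StandardCharsets.UTF_8);
        System.out.println("valid UTF-8: " + isValidUtf8(big));
        System.out.println("sorts after text term: " + (compareUnsigned(big, text) > 0));
    }
}
```

The byte 0xff never occurs in well-formed UTF-8, so the first byte of such a term already compares greater than the first byte of any UTF-8 encoded term.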
public static final int UNI_SUR_HIGH_START
public static final int UNI_SUR_HIGH_END
public static final int UNI_SUR_LOW_START
public static final int UNI_SUR_LOW_END
public static final int UNI_REPLACEMENT_CHAR
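This page does not list the constant values. The sketch below assumes they carry the standard Unicode values (high surrogates U+D800 to U+DBFF, low surrogates U+DC00 to U+DFFF, replacement character U+FFFD) and shows how such ranges are typically used. The class SurrogateSketch and its members are hypothetical names, not part of UnicodeUtil.

```java
final class SurrogateSketch {
    // Standard Unicode values; assumed (not confirmed by this page) to match the UNI_* constants.
    static final int SUR_HIGH_START = 0xD800;
    static final int SUR_HIGH_END = 0xDBFF;
    static final int SUR_LOW_START = 0xDC00;
    static final int SUR_LOW_END = 0xDFFF;
    static final int REPLACEMENT_CHAR = 0xFFFD;

    static boolean isHighSurrogate(int c) {
        return c >= SUR_HIGH_START && c <= SUR_HIGH_END;
    }

    static boolean isLowSurrogate(int c) {
        return c >= SUR_LOW_START && c <= SUR_LOW_END;
    }

    /** Combines a surrogate pair into a code point, or returns U+FFFD if the pair is not valid. */
    static int combine(char high, char low) {
        if (!isHighSurrogate(high) || !isLowSurrogate(low)) {
            return REPLACEMENT_CHAR;
        }
        return 0x10000 + ((high - SUR_HIGH_START) << 10) + (low - SUR_LOW_START);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(combine('\uD83D', '\uDE00')));
        System.out.println(Integer.toHexString(combine('a', 'b')));
    }
}
```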
public static int UTF16toUTF8WithHash(char[] source, int offset, int length, BytesRef result)
public static void UTF16toUTF8(char[] source, int offset, int length, BytesRef result)
public static void UTF16toUTF8(CharSequence s, int offset, int length, BytesRef result)
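As an illustration of the kind of conversion the UTF16toUTF8 methods perform, the following is a minimal JDK-only sketch of encoding a slice of UTF-16 characters as UTF-8. It returns a fresh byte[] instead of filling a BytesRef, and it replaces unpaired surrogates with U+FFFD; both choices are assumptions of this sketch, and the class Utf16ToUtf8Sketch is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;

final class Utf16ToUtf8Sketch {
    /** Encodes s[offset, offset + length) as UTF-8. Unpaired surrogates become U+FFFD (an assumption). */
    static byte[] encode(CharSequence s, int offset, int length) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int end = offset + length;
        for (int i = offset; i < end; i++) {
            int cp = s.charAt(i);
            if (Character.isHighSurrogate((char) cp)
                    && i + 1 < end
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                // Valid surrogate pair: combine into one supplementary code point.
                cp = Character.toCodePoint((char) cp, s.charAt(++i));
            } else if (cp >= 0xD800 && cp <= 0xDFFF) {
                cp = 0xFFFD; // unpaired surrogate
            }
            if (cp < 0x80) {
                out.write(cp);
            } else if (cp < 0x800) {
                out.write(0xC0 | (cp >> 6));
                out.write(0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                out.write(0xE0 | (cp >> 12));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            } else {
                out.write(0xF0 | (cp >> 18));
                out.write(0x80 | ((cp >> 12) & 0x3F));
                out.write(0x80 | ((cp >> 6) & 0x3F));
                out.write(0x80 | (cp & 0x3F));
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = encode("h\u00e9llo\uD83D\uDE00", 0, 7);
        System.out.println(bytes.length + " bytes");
    }
}
```

For well-formed input, the result matches what the JDK produces with String.getBytes(StandardCharsets.UTF_8) on the same slice.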
public static boolean validUTF16String(CharSequence s)
public static boolean validUTF16String(char[] s, int size)
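The page gives no description for the validUTF16String methods. A plausible reading, sketched below under that assumption, is a check that every high surrogate is immediately followed by a low surrogate and that no low surrogate stands alone. ValidUtf16Sketch and isValidUtf16 are hypothetical names, and the sketch uses only JDK classes.

```java
import java.nio.CharBuffer;

final class ValidUtf16Sketch {
    /** Returns true if every surrogate in s is part of a well-formed high/low pair. */
    static boolean isValidUtf16(CharSequence s) {
        int size = s.length();
        for (int i = 0; i < size; i++) {
            char ch = s.charAt(i);
            if (Character.isHighSurrogate(ch)) {
                if (i + 1 >= size || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false; // high surrogate without a following low surrogate
                }
                i++; // skip the low half of the pair
            } else if (Character.isLowSurrogate(ch)) {
                return false; // low surrogate without a preceding high surrogate
            }
        }
        return true;
    }

    /** Same check over the first {@code size} chars of an array. */
    static boolean isValidUtf16(char[] s, int size) {
        return isValidUtf16(CharBuffer.wrap(s, 0, size));
    }

    public static void main(String[] args) {
        System.out.println(isValidUtf16("abc\uD83D\uDE00"));
        System.out.println(isValidUtf16("abc\uD83D"));
    }
}
```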
public static int codePointCount(BytesRef utf8)
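One common way to count code points in a UTF-8 sequence is to count the bytes that are not continuation bytes (those of the form 10xxxxxx). The sketch below shows that technique on a plain byte[]; it assumes well-formed input and is not necessarily how codePointCount is implemented. CodePointCountSketch is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

final class CodePointCountSketch {
    /** Counts code points in utf8[offset, offset + length), assuming well-formed UTF-8. */
    static int codePointCount(byte[] utf8, int offset, int length) {
        int count = 0;
        for (int i = offset; i < offset + length; i++) {
            // Every code point has exactly one byte that is not a continuation byte.
            if ((utf8[i] & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "h\u00e9llo\uD83D\uDE00";
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(codePointCount(utf8, 0, utf8.length));
        System.out.println(s.codePointCount(0, s.length()));
    }
}
```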
public static String newString(int[] codePoints, int offset, int count)
codePoints - The code array
offset - The start of the text in the code point array
count - The number of code points
IllegalArgumentException - If an invalid code point is encountered
IndexOutOfBoundsException - If the offset or count are out of bounds.
public static void UTF8toUTF16(byte[] utf8, int offset, int length, CharsRef chars)
Interprets the given byte array as UTF-8 and converts it to UTF-16. The CharsRef will be extended if it doesn't provide enough space to hold the worst case of each byte becoming a UTF-16 code unit.
NOTE: Full characters are read, even if this reads past the length passed (and can result in an ArrayIndexOutOfBoundsException if invalid UTF-8 is passed). Explicit checks for valid UTF-8 are not performed.
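The worst-case sizing and the unchecked read described above can be sketched with plain JDK types. This is a minimal illustration, not the library's code: it allocates a char[] of length chars (one char per byte is the worst case, reached by pure ASCII input) and decodes without validating, so truncated input can read past the given length. Utf8ToUtf16Sketch and decode are hypothetical names.

```java
final class Utf8ToUtf16Sketch {
    /** Decodes utf8[offset, offset + length) to a String without validating the input. */
    static String decode(byte[] utf8, int offset, int length) {
        char[] out = new char[length]; // worst case: every byte becomes one char
        int outLen = 0;
        int i = offset;
        int limit = offset + length;
        while (i < limit) {
            int b = utf8[i++] & 0xff;
            if (b < 0xC0) {
                // ASCII; a stray continuation byte is also passed through unchecked.
                out[outLen++] = (char) b;
            } else if (b < 0xE0) {
                out[outLen++] = (char) (((b & 0x1F) << 6) | (utf8[i++] & 0x3F));
            } else if (b < 0xF0) {
                out[outLen++] = (char) (((b & 0x0F) << 12)
                        | ((utf8[i] & 0x3F) << 6)
                        | (utf8[i + 1] & 0x3F));
                i += 2;
            } else {
                int cp = ((b & 0x07) << 18)
                        | ((utf8[i] & 0x3F) << 12)
                        | ((utf8[i + 1] & 0x3F) << 6)
                        | (utf8[i + 2] & 0x3F);
                i += 3;
                // Supplementary code points take two chars (a surrogate pair).
                outLen += Character.toChars(cp, out, outLen);
            }
        }
        return new String(out, 0, outLen);
    }

    public static void main(String[] args) {
        byte[] utf8 = "h\u00e9llo\uD83D\uDE00".getBytes(java.nio.charset.StandardCharsets.UTF_8);
        System.out.println(decode(utf8, 0, utf8.length).length() + " chars");
    }
}
```

Because trailing bytes are read without a bounds check against length, a sequence cut off in the middle of a character can throw ArrayIndexOutOfBoundsException, which mirrors the behaviour the NOTE warns about.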
public static void UTF8toUTF16(BytesRef bytesRef, CharsRef chars)
Utility method for UTF8toUTF16(byte[], int, int, CharsRef).
Copyright © 2000-2012 Apache Software Foundation. All Rights Reserved.