UnicodeUtil (Lucene 4.0.0 API)

java.lang.Object
- org.apache.lucene.util.UnicodeUtil

```
public final class UnicodeUtil
extends Object
```
Class to encode Java's UTF-16 char[] into UTF-8 byte[] without always allocating a new byte[], as String.getBytes("UTF-8") does.

NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.

Field Summary

Fields
Modifier and Type	Field and Description
`static BytesRef`	`BIG_TERM` A binary term consisting of a number of 0xff bytes, likely to be bigger than other terms one would normally encounter, and definitely bigger than any UTF-8 terms.
`static int`	`UNI_REPLACEMENT_CHAR`
`static int`	`UNI_SUR_HIGH_END`
`static int`	`UNI_SUR_HIGH_START`
`static int`	`UNI_SUR_LOW_END`
`static int`	`UNI_SUR_LOW_START`

Method Summary

Methods
Modifier and Type	Method and Description
`static int`	`codePointCount(BytesRef utf8)` Returns the number of code points in this utf8 sequence.
`static String`	`newString(int[] codePoints, int offset, int count)` Cover JDK 1.5 API.
`static String`	`toHexString(String s)`
`static void`	`UTF16toUTF8(char[] source, int offset, int length, BytesRef result)` Encode characters from a char[] source, starting at offset for length chars.
`static void`	`UTF16toUTF8(CharSequence s, int offset, int length, BytesRef result)` Encode characters from this String, starting at offset for length characters.
`static int`	`UTF16toUTF8WithHash(char[] source, int offset, int length, BytesRef result)` Encode characters from a char[] source, starting at offset for length chars.
`static void`	`UTF8toUTF16(byte[] utf8, int offset, int length, CharsRef chars)` Interprets the given byte array as UTF-8 and converts to UTF-16.
`static void`	`UTF8toUTF16(BytesRef bytesRef, CharsRef chars)` Utility method for `UTF8toUTF16(byte[], int, int, CharsRef)`
`static void`	`UTF8toUTF32(BytesRef utf8, IntsRef utf32)`
`static boolean`	`validUTF16String(char[] s, int size)`
`static boolean`	`validUTF16String(CharSequence s)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - BIG_TERM
```
public static final BytesRef BIG_TERM
```
    A binary term consisting of a number of 0xff bytes, likely to be bigger than other terms one would normally encounter, and definitely bigger than any UTF-8 terms.
    WARNING: This is not a valid UTF8 Term
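    The ordering claim can be checked with plain JDK code. The sketch below is not Lucene code: `BigTermSketch`, its four-byte `BIG` array, and the unsigned byte-wise comparison are assumptions made for illustration. It relies only on the fact that the byte 0xff never occurs in well-formed UTF-8.

```java
import java.nio.charset.StandardCharsets;

final class BigTermSketch {
    // Hypothetical stand-in for BIG_TERM: a run of 0xff bytes (the length is arbitrary here).
    static final byte[] BIG = {(byte) 0xff, (byte) 0xff, (byte) 0xff, (byte) 0xff};

    // Compare two byte arrays as unsigned bytes, left to right; shorter prefix sorts first.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    // True if the UTF-8 encoding of s sorts strictly before BIG.
    static boolean sortsBelowBig(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return compareUnsigned(utf8, BIG) < 0;
    }
}
```

    Because the largest lead byte in UTF-8 is 0xf4, the first byte already decides the comparison for any non-empty UTF-8 term.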
  - UNI_SUR_HIGH_START
```
public static final int UNI_SUR_HIGH_START
```
    See Also:
    Constant Field Values
  - UNI_SUR_HIGH_END
```
public static final int UNI_SUR_HIGH_END
```
    See Also:
    Constant Field Values
  - UNI_SUR_LOW_START
```
public static final int UNI_SUR_LOW_START
```
    See Also:
    Constant Field Values
  - UNI_SUR_LOW_END
```
public static final int UNI_SUR_LOW_END
```
    See Also:
    Constant Field Values
  - UNI_REPLACEMENT_CHAR
```
public static final int UNI_REPLACEMENT_CHAR
```
    See Also:
    Constant Field Values
- Method Detail
  - UTF16toUTF8WithHash
```
public static int UTF16toUTF8WithHash(char[] source,
                      int offset,
                      int length,
                      BytesRef result)
```
    Encode characters from a char[] source, starting at offset for length chars. Returns a hash of the resulting bytes. After encoding, result.offset will always be 0.
  - UTF16toUTF8
```
public static void UTF16toUTF8(char[] source,
               int offset,
               int length,
               BytesRef result)
```
    Encode characters from a char[] source, starting at offset for length chars. After encoding, result.offset will always be 0.
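    To illustrate the idea of encoding into a caller-supplied, reusable buffer, here is a minimal JDK-only sketch. It is not the Lucene implementation: `Utf16ToUtf8Sketch` and its `Buf` holder are hypothetical names, and replacing an unpaired surrogate with U+FFFD is an assumption of this sketch.

```java
final class Utf16ToUtf8Sketch {
    // Hypothetical reusable output holder, loosely modeled on the role BytesRef plays here.
    static final class Buf {
        byte[] bytes = new byte[16];
        int offset;
        int length;
    }

    // Encode source[offset, offset + length) as UTF-8 into out, growing out.bytes only when needed.
    static void encode(char[] source, int offset, int length, Buf out) {
        int maxLen = length * 3; // worst case: 3 bytes per UTF-16 code unit
        if (out.bytes.length < maxLen) {
            out.bytes = new byte[maxLen];
        }
        byte[] b = out.bytes;
        int upto = 0;
        int end = offset + length;
        for (int i = offset; i < end; i++) {
            int code = source[i];
            if (code < 0x80) {
                b[upto++] = (byte) code;
            } else if (code < 0x800) {
                b[upto++] = (byte) (0xC0 | (code >> 6));
                b[upto++] = (byte) (0x80 | (code & 0x3F));
            } else if (Character.isHighSurrogate((char) code) && i + 1 < end
                    && Character.isLowSurrogate(source[i + 1])) {
                // Valid surrogate pair: combine into one code point, emit 4 bytes.
                int cp = Character.toCodePoint((char) code, source[++i]);
                b[upto++] = (byte) (0xF0 | (cp >> 18));
                b[upto++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                b[upto++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                b[upto++] = (byte) (0x80 | (cp & 0x3F));
            } else {
                if (Character.isSurrogate((char) code)) {
                    code = 0xFFFD; // assumption: unpaired surrogate becomes U+FFFD
                }
                b[upto++] = (byte) (0xE0 | (code >> 12));
                b[upto++] = (byte) (0x80 | ((code >> 6) & 0x3F));
                b[upto++] = (byte) (0x80 | (code & 0x3F));
            }
        }
        out.offset = 0; // mirrors the documented guarantee that the result starts at 0
        out.length = upto;
    }
}
```

    Calling `encode` repeatedly with the same `Buf` reuses its backing array whenever it is already large enough, which is the allocation saving the class description refers to.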
  - UTF16toUTF8
```
public static void UTF16toUTF8(CharSequence s,
               int offset,
               int length,
               BytesRef result)
```
    Encode characters from this String, starting at offset for length characters. After encoding, result.offset will always be 0.
  - validUTF16String
```
public static boolean validUTF16String(CharSequence s)
```
  - validUTF16String
```
public static boolean validUTF16String(char[] s,
                       int size)
```
  - codePointCount
```
public static int codePointCount(BytesRef utf8)
```
    Returns the number of code points in this utf8 sequence. Behavior is undefined if the utf8 sequence is invalid.
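    One common way to count code points in well-formed UTF-8 is to count the bytes that are not continuation bytes (those of the form 10xxxxxx). The following JDK-only sketch uses that approach; it is not necessarily how Lucene implements the method, and `CodePointCountSketch` is a hypothetical name. Like the documented method, it performs no validation, so its result is meaningless for invalid input.

```java
final class CodePointCountSketch {
    // Count code points in utf8[offset, offset + length), assuming well-formed UTF-8.
    static int count(byte[] utf8, int offset, int length) {
        int count = 0;
        for (int i = offset; i < offset + length; i++) {
            // Every code point has exactly one byte that is not a continuation byte.
            if ((utf8[i] & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }
}
```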
  - UTF8toUTF32
```
public static void UTF8toUTF32(BytesRef utf8,
               IntsRef utf32)
```
  - newString
```
public static String newString(int[] codePoints,
               int offset,
               int count)
```
    Cover JDK 1.5 API. Create a String from an array of codePoints.
    
    Parameters:
    codePoints - The code array
    offset - The start of the text in the code point array
    count - The number of code points
    
    Returns:
    a String representing the count code points starting at offset
    
    Throws:
    
    IllegalArgumentException - If an invalid code point is encountered
    
    IndexOutOfBoundsException - If the offset or count are out of bounds.
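    The JDK constructor `String(int[] codePoints, int offset, int count)`, available since Java 1.5, has the same parameters and documented exceptions. The sketch below uses it directly to show the expected behavior; `NewStringSketch` is a hypothetical wrapper, not Lucene code.

```java
final class NewStringSketch {
    // Build a String from count code points starting at offset, using the JDK constructor.
    static String fromCodePoints(int[] codePoints, int offset, int count) {
        return new String(codePoints, offset, count);
    }
}
```

    A supplementary code point such as U+1F600 occupies two chars in the result, so the returned String can be longer than count.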
  - toHexString
```
public static String toHexString(String s)
```
  - UTF8toUTF16
```
public static void UTF8toUTF16(byte[] utf8,
               int offset,
               int length,
               CharsRef chars)
```
    Interprets the given byte array as UTF-8 and converts it to UTF-16. The CharsRef will be extended if it doesn't provide enough space to hold the worst case of each byte becoming a UTF-16 code unit.
    NOTE: Full characters are read, even if this reads past the length passed (which can result in an ArrayIndexOutOfBoundsException if invalid UTF-8 is passed). Explicit checks for valid UTF-8 are not performed.
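    The sizing rule and the lack of validation can be seen in a small JDK-only decoder. This is a sketch under assumptions, not the Lucene implementation: `Utf8ToUtf16Sketch` and its `Chars` holder are hypothetical names. The output array is sized to one char per input byte, and each lead byte's trailing bytes are read unconditionally, so truncated input can index past the array.

```java
final class Utf8ToUtf16Sketch {
    // Hypothetical reusable output holder, loosely modeled on the role CharsRef plays here.
    static final class Chars {
        char[] chars = new char[16];
        int offset;
        int length;
    }

    // Decode utf8[offset, offset + length) into out without validating the input.
    static void decode(byte[] utf8, int offset, int length, Chars out) {
        if (out.chars.length < length) {
            out.chars = new char[length]; // worst case: one UTF-16 code unit per byte
        }
        char[] c = out.chars;
        int upto = 0;
        int i = offset;
        int end = offset + length;
        while (i < end) {
            int b = utf8[i++] & 0xff;
            if (b < 0x80) {
                c[upto++] = (char) b; // 1-byte sequence
            } else if (b < 0xE0) {
                c[upto++] = (char) (((b & 0x1F) << 6) | (utf8[i++] & 0x3F)); // 2-byte sequence
            } else if (b < 0xF0) {
                c[upto++] = (char) (((b & 0x0F) << 12)
                        | ((utf8[i] & 0x3F) << 6)
                        | (utf8[i + 1] & 0x3F)); // 3-byte sequence
                i += 2;
            } else {
                int cp = ((b & 0x07) << 18)
                        | ((utf8[i] & 0x3F) << 12)
                        | ((utf8[i + 1] & 0x3F) << 6)
                        | (utf8[i + 2] & 0x3F); // 4-byte sequence
                i += 3;
                upto += Character.toChars(cp, c, upto); // writes a surrogate pair
            }
        }
        out.offset = 0;
        out.length = upto;
    }
}
```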
  - UTF8toUTF16
```
public static void UTF8toUTF16(BytesRef bytesRef,
               CharsRef chars)
```
    Utility method for UTF8toUTF16(byte[], int, int, CharsRef)
    
    See Also:
    UTF8toUTF16(byte[], int, int, CharsRef)
