UnicodeUtil (Lucene 3.6.0 API)

java.lang.Object
- org.apache.lucene.util.UnicodeUtil

```
public final class UnicodeUtil
extends Object
```
Encodes Java's UTF-16 char[] into a UTF-8 byte[] without always allocating a new byte[], as String.getBytes("UTF-8") does.

NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
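
As a rough illustration of the allocation-avoiding idea, the sketch below encodes into a caller-supplied buffer using only the JDK. `Utf8Buffer` and `Utf8EncodeSketch` are hypothetical names, not Lucene classes, and replacing an unpaired surrogate with U+FFFD is an assumption of this sketch rather than behavior documented here.

```java
/** Hypothetical reusable holder for encoded bytes; not a Lucene class. */
final class Utf8Buffer {
    byte[] bytes = new byte[16];
    int length; // number of valid bytes in 'bytes'
}

final class Utf8EncodeSketch {
    /** Encodes source[offset, offset + length) as UTF-8 into out, growing out.bytes only when needed. */
    static void encode(char[] source, int offset, int length, Utf8Buffer out) {
        int max = length * 3; // worst case: 3 bytes per UTF-16 code unit
        if (out.bytes.length < max) {
            out.bytes = new byte[max];
        }
        byte[] b = out.bytes;
        int upto = 0;
        int end = offset + length;
        for (int i = offset; i < end; i++) {
            int c = source[i];
            if (c < 0x80) {
                b[upto++] = (byte) c;
            } else if (c < 0x800) {
                b[upto++] = (byte) (0xC0 | (c >> 6));
                b[upto++] = (byte) (0x80 | (c & 0x3F));
            } else if (Character.isHighSurrogate((char) c)
                    && i + 1 < end
                    && Character.isLowSurrogate(source[i + 1])) {
                // valid surrogate pair: one supplementary code point, 4 bytes
                int cp = Character.toCodePoint((char) c, source[++i]);
                b[upto++] = (byte) (0xF0 | (cp >> 18));
                b[upto++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                b[upto++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                b[upto++] = (byte) (0x80 | (cp & 0x3F));
            } else {
                // BMP character; an unpaired surrogate becomes U+FFFD (assumption of this sketch)
                if (Character.isSurrogate((char) c)) {
                    c = 0xFFFD;
                }
                b[upto++] = (byte) (0xE0 | (c >> 12));
                b[upto++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                b[upto++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        out.length = upto;
    }
}
```

Calling `encode` repeatedly with the same `Utf8Buffer` reuses its array as long as the worst-case size fits, which is the point of avoiding `String.getBytes`.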

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`UnicodeUtil.UTF16Result` Holds decoded UTF16 code units.
`static class`	`UnicodeUtil.UTF8Result` Holds decoded UTF8 code units.

Field Summary

Fields
Modifier and Type	Field and Description
`static int`	`UNI_REPLACEMENT_CHAR`
`static int`	`UNI_SUR_HIGH_END`
`static int`	`UNI_SUR_HIGH_START`
`static int`	`UNI_SUR_LOW_END`
`static int`	`UNI_SUR_LOW_START`

Method Summary

Methods
Modifier and Type	Method and Description
`static String`	`newString(int[] codePoints, int offset, int count)` Cover JDK 1.5 API.
`static void`	`UTF16toUTF8(char[] source, int offset, int length, BytesRef result)` Encode characters from a char[] source, starting at offset for length chars.
`static void`	`UTF16toUTF8(char[] source, int offset, int length, UnicodeUtil.UTF8Result result)` Encode characters from a char[] source, starting at offset for length chars.
`static void`	`UTF16toUTF8(char[] source, int offset, UnicodeUtil.UTF8Result result)` Encode characters from a char[] source, starting at offset and stopping when the character 0xffff is seen.
`static void`	`UTF16toUTF8(CharSequence s, int offset, int length, BytesRef result)` Encode characters from this CharSequence, starting at offset for length characters.
`static void`	`UTF16toUTF8(String s, int offset, int length, UnicodeUtil.UTF8Result result)` Encode characters from this String, starting at offset for length characters.
`static int`	`UTF16toUTF8WithHash(char[] source, int offset, int length, BytesRef result)` Encode characters from a char[] source, starting at offset for length chars.
`static void`	`UTF8toUTF16(byte[] utf8, int offset, int length, CharsRef chars)` Interprets the given byte array as UTF-8 and converts to UTF-16.
`static void`	`UTF8toUTF16(byte[] utf8, int offset, int length, UnicodeUtil.UTF16Result result)` Convert UTF8 bytes into UTF16 characters.
`static void`	`UTF8toUTF16(BytesRef bytesRef, CharsRef chars)` Utility method for `UTF8toUTF16(byte[], int, int, CharsRef)`
`static boolean`	`validUTF16String(CharSequence s)`

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - UNI_SUR_HIGH_START
```
public static final int UNI_SUR_HIGH_START
```
    See Also:
    Constant Field Values
  - UNI_SUR_HIGH_END
```
public static final int UNI_SUR_HIGH_END
```
    See Also:
    Constant Field Values
  - UNI_SUR_LOW_START
```
public static final int UNI_SUR_LOW_START
```
    See Also:
    Constant Field Values
  - UNI_SUR_LOW_END
```
public static final int UNI_SUR_LOW_END
```
    See Also:
    Constant Field Values
  - UNI_REPLACEMENT_CHAR
```
public static final int UNI_REPLACEMENT_CHAR
```
    See Also:
    Constant Field Values
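
The constant values are not reproduced on this page. The sketch below uses the standard Unicode surrogate ranges and replacement character, which the fields above are assumed to match, and shows how a high/low pair combines into one code point. `SurrogateSketch` and its members are hypothetical names.

```java
final class SurrogateSketch {
    // Standard Unicode values; assumed (not confirmed by this page) to equal the fields above.
    static final int SUR_HIGH_START = 0xD800;
    static final int SUR_HIGH_END = 0xDBFF;
    static final int SUR_LOW_START = 0xDC00;
    static final int SUR_LOW_END = 0xDFFF;
    static final int REPLACEMENT_CHAR = 0xFFFD;

    static boolean isHigh(int unit) {
        return unit >= SUR_HIGH_START && unit <= SUR_HIGH_END;
    }

    static boolean isLow(int unit) {
        return unit >= SUR_LOW_START && unit <= SUR_LOW_END;
    }

    /** Combines a valid surrogate pair into a code point, or returns REPLACEMENT_CHAR for a bad pair. */
    static int combine(int high, int low) {
        if (!isHigh(high) || !isLow(low)) {
            return REPLACEMENT_CHAR;
        }
        return 0x10000 + ((high - SUR_HIGH_START) << 10) + (low - SUR_LOW_START);
    }
}
```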
- Method Detail
  - UTF16toUTF8WithHash
```
public static int UTF16toUTF8WithHash(char[] source,
                      int offset,
                      int length,
                      BytesRef result)
```
    Encode characters from a char[] source, starting at offset for length chars. Returns a hash of the resulting bytes. After encoding, result.offset will always be 0.
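
The page does not say which hash function is used. The sketch below assumes a simple 31-multiplier hash over the encoded bytes and, for brevity, encodes with the JDK before hashing. `Utf8HashSketch` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

final class Utf8HashSketch {
    /** Encodes the char range as UTF-8 and returns a 31-multiplier hash of the bytes (assumed scheme). */
    static int encodeAndHash(char[] source, int offset, int length) {
        byte[] utf8 = new String(source, offset, length).getBytes(StandardCharsets.UTF_8);
        int hash = 0;
        for (byte b : utf8) {
            hash = 31 * hash + b; // bytes are signed, so non-ASCII bytes contribute negative values
        }
        return hash;
    }
}
```

This sketch makes two passes for clarity; a single-pass version would update the hash as each byte is written.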
  - UTF16toUTF8
```
public static void UTF16toUTF8(char[] source,
               int offset,
               UnicodeUtil.UTF8Result result)
```
    Encode characters from a char[] source, starting at offset and stopping when the character 0xffff is seen. The encoded bytes and their count are stored in result; the method itself returns nothing.
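
The sentinel convention can be sketched as follows, using the JDK encoder in place of the Lucene result holder. `SentinelSketch` is a hypothetical name, and the sketch assumes the sentinel is always present.

```java
import java.nio.charset.StandardCharsets;

final class SentinelSketch {
    /** Returns the number of chars from offset up to, but not including, the first 0xffff. */
    static int lengthToSentinel(char[] source, int offset) {
        int i = offset;
        // Assumes a 0xffff terminator exists; otherwise this runs off the end of the array.
        while (source[i] != 0xFFFF) {
            i++;
        }
        return i - offset;
    }

    /** Encodes the sentinel-terminated run starting at offset as UTF-8. */
    static byte[] encodeToSentinel(char[] source, int offset) {
        int length = lengthToSentinel(source, offset);
        return new String(source, offset, length).getBytes(StandardCharsets.UTF_8);
    }
}
```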
  - UTF16toUTF8
```
public static void UTF16toUTF8(char[] source,
               int offset,
               int length,
               UnicodeUtil.UTF8Result result)
```
    Encode characters from a char[] source, starting at offset for length chars. The encoded bytes and their count are stored in result; the method itself returns nothing.
  - UTF16toUTF8
```
public static void UTF16toUTF8(String s,
               int offset,
               int length,
               UnicodeUtil.UTF8Result result)
```
    Encode characters from this String, starting at offset for length characters. The encoded bytes and their count are stored in result; the method itself returns nothing.
  - UTF16toUTF8
```
public static void UTF16toUTF8(CharSequence s,
               int offset,
               int length,
               BytesRef result)
```
    Encode characters from this CharSequence, starting at offset for length characters. After encoding, result.offset will always be 0.
  - UTF16toUTF8
```
public static void UTF16toUTF8(char[] source,
               int offset,
               int length,
               BytesRef result)
```
    Encode characters from a char[] source, starting at offset for length chars. After encoding, result.offset will always be 0.
  - UTF8toUTF16
```
public static void UTF8toUTF16(byte[] utf8,
               int offset,
               int length,
               UnicodeUtil.UTF16Result result)
```
    Convert UTF8 bytes into UTF16 characters. If offset is non-zero, conversion starts at that position in utf8 and reuses the results of the previous call for the bytes before offset.
  - newString
```
public static String newString(int[] codePoints,
               int offset,
               int count)
```
    Covers the JDK 1.5 API: creates a String from an array of code points.
    
    Parameters:
    codePoints - The array of code points
    offset - The index of the first code point in the array
    count - The number of code points to use
    
    Returns:
    a String containing the count code points starting at offset
    
    Throws:
    
    IllegalArgumentException - If an invalid code point is encountered
    
    IndexOutOfBoundsException - If offset or count is out of bounds.
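
The "JDK 1.5 API" being covered is presumably the `String(int[] codePoints, int offset, int count)` constructor, which has the same parameters and exceptions. A minimal sketch of the equivalent JDK call, with `NewStringSketch` as a hypothetical name:

```java
final class NewStringSketch {
    /** Builds a String from count code points starting at offset, using the JDK constructor. */
    static String fromCodePoints(int[] codePoints, int offset, int count) {
        // Throws IllegalArgumentException for an invalid code point and
        // IndexOutOfBoundsException when offset/count fall outside the array.
        return new String(codePoints, offset, count);
    }
}
```

For example, `fromCodePoints(new int[] {0x48, 0x1F600, 0x69}, 1, 2)` yields a three-char String: a surrogate pair for U+1F600 followed by `i`.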
  - UTF8toUTF16
```
public static void UTF8toUTF16(byte[] utf8,
               int offset,
               int length,
               CharsRef chars)
```
    Interprets the given byte array as UTF-8 and converts to UTF-16. The CharsRef will be extended if it doesn't provide enough space to hold the worst case of each byte becoming a UTF-16 code unit.
    NOTE: Characters are always read in full, even when that means reading past the given length (which can result in an ArrayIndexOutOfBoundsException if invalid UTF-8 is passed). Explicit checks for valid UTF-8 are not performed.
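
A decoder of this unchecked kind might look like the sketch below. It is JDK-only, `Utf16Buffer` and `Utf8DecodeSketch` are hypothetical names standing in for CharsRef and the method above, and like the method described it performs no validation, so malformed or truncated input can read past the given range.

```java
/** Hypothetical reusable char holder; not Lucene's CharsRef. */
final class Utf16Buffer {
    char[] chars = new char[16];
    int length; // number of valid chars in 'chars'
}

final class Utf8DecodeSketch {
    /** Decodes utf8[offset, offset + length) into out, assuming well-formed UTF-8. */
    static void decode(byte[] utf8, int offset, int length, Utf16Buffer out) {
        if (out.chars.length < length) {
            out.chars = new char[length]; // worst case: every byte becomes one char
        }
        char[] c = out.chars;
        int upto = 0;
        int i = offset;
        int limit = offset + length;
        while (i < limit) {
            int b = utf8[i++] & 0xFF;
            if (b < 0xC0) {
                // ASCII; stray continuation bytes are not rejected in this sketch
                c[upto++] = (char) b;
            } else if (b < 0xE0) {
                c[upto++] = (char) (((b & 0x1F) << 6) | (utf8[i++] & 0x3F));
            } else if (b < 0xF0) {
                c[upto++] = (char) (((b & 0x0F) << 12)
                        | ((utf8[i] & 0x3F) << 6)
                        | (utf8[i + 1] & 0x3F));
                i += 2;
            } else {
                int cp = ((b & 0x07) << 18)
                        | ((utf8[i] & 0x3F) << 12)
                        | ((utf8[i + 1] & 0x3F) << 6)
                        | (utf8[i + 2] & 0x3F);
                i += 3;
                // writes a surrogate pair and returns the number of chars written
                upto += Character.toChars(cp, c, upto);
            }
        }
        out.length = upto;
    }
}
```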
  - UTF8toUTF16
```
public static void UTF8toUTF16(BytesRef bytesRef,
               CharsRef chars)
```
    Utility method for UTF8toUTF16(byte[], int, int, CharsRef)
    
    See Also:
    UTF8toUTF16(byte[], int, int, CharsRef)
  - validUTF16String
```
public static boolean validUTF16String(CharSequence s)
```
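
The page gives no description for this method. Judging by its name, it presumably reports whether every surrogate in the sequence is correctly paired; the sketch below implements that assumed check with the JDK only, under the hypothetical name `ValidUtf16Sketch`.

```java
final class ValidUtf16Sketch {
    /** Returns true if every high surrogate is followed by a low surrogate and no low surrogate stands alone. */
    static boolean isWellFormed(CharSequence s) {
        int n = s.length();
        for (int i = 0; i < n; i++) {
            char ch = s.charAt(i);
            if (Character.isHighSurrogate(ch)) {
                if (i + 1 < n && Character.isLowSurrogate(s.charAt(i + 1))) {
                    i++; // skip the paired low surrogate
                } else {
                    return false; // high surrogate without a partner
                }
            } else if (Character.isLowSurrogate(ch)) {
                return false; // low surrogate with no preceding high surrogate
            }
        }
        return true;
    }
}
```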
