Package org.apache.lucene.analysis
Class CharacterUtils
- java.lang.Object
  - org.apache.lucene.analysis.CharacterUtils
public final class CharacterUtils extends Object
Utility class for writing tokenizers or token filters.
NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
Nested Class Summary
static class CharacterUtils.CharacterBuffer
    A simple IO buffer to use with fill(CharacterBuffer, Reader).
Method Summary
static boolean fill(CharacterUtils.CharacterBuffer buffer, Reader reader)
    Convenience method which calls fill(buffer, reader, buffer.buffer.length).
static boolean fill(CharacterUtils.CharacterBuffer buffer, Reader reader, int numChars)
    Fills the CharacterUtils.CharacterBuffer with characters read from the given Reader.
static CharacterUtils.CharacterBuffer newCharacterBuffer(int bufferSize)
    Creates a new CharacterUtils.CharacterBuffer and allocates a char[] of the given bufferSize.
static int toChars(int[] src, int srcOff, int srcLen, char[] dest, int destOff)
    Converts a sequence of Unicode code points to a sequence of Java characters.
static int toCodePoints(char[] src, int srcOff, int srcLen, int[] dest, int destOff)
    Converts a sequence of Java characters to a sequence of Unicode code points.
static void toLowerCase(char[] buffer, int offset, int limit)
    Converts each Unicode code point to lowercase via Character.toLowerCase(int), starting at the given offset.
static void toUpperCase(char[] buffer, int offset, int limit)
    Converts each Unicode code point to uppercase via Character.toUpperCase(int), starting at the given offset.
Method Detail
-
newCharacterBuffer
public static CharacterUtils.CharacterBuffer newCharacterBuffer(int bufferSize)
Creates a new CharacterUtils.CharacterBuffer and allocates a char[] of the given bufferSize.
Parameters:
    bufferSize - the internal char buffer size, must be >= 2
Returns:
    a new CharacterUtils.CharacterBuffer instance.
-
toLowerCase
public static void toLowerCase(char[] buffer, int offset, int limit)
Converts each Unicode code point to lowercase via Character.toLowerCase(int), starting at the given offset.
Parameters:
    buffer - the char buffer to lowercase
    offset - the offset to start at
    limit - the max char in the buffer to lowercase
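The following is a minimal JDK-only sketch of code-point-wise lowercasing over a char[] range, written to illustrate the description above. It is not the Lucene implementation. The class name CodePointCaseSketch is hypothetical, and the sketch assumes that limit is an exclusive end index and that a lowercased code point occupies the same number of chars as the original.

```java
final class CodePointCaseSketch {
    /**
     * Lowercases buffer[offset, limit) one code point at a time.
     * Assumption: limit is exclusive, and the lowercase form of a code point
     * needs the same number of chars as the original.
     */
    static void toLowerCase(char[] buffer, int offset, int limit) {
        for (int i = offset; i < limit; ) {
            // Decode a full code point so surrogate pairs are handled as one unit.
            int codePoint = Character.codePointAt(buffer, i, limit);
            // Write the lowercase form back in place and advance by its char count.
            i += Character.toChars(Character.toLowerCase(codePoint), buffer, i);
        }
    }

    public static void main(String[] args) {
        // \uD801\uDC00 is U+10400 (DESERET CAPITAL LETTER LONG I), a supplementary character.
        char[] text = "AB\uD801\uDC00".toCharArray();
        toLowerCase(text, 0, text.length);
        System.out.println(new String(text).equals("ab\uD801\uDC28"));
    }
}
```

A per-char loop using Character.toLowerCase(char) would leave the supplementary character unchanged, because each surrogate half has no case mapping on its own.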
-
toUpperCase
public static void toUpperCase(char[] buffer, int offset, int limit)
Converts each Unicode code point to uppercase via Character.toUpperCase(int), starting at the given offset.
Parameters:
    buffer - the char buffer to uppercase
    offset - the offset to start at
    limit - the max char in the buffer to uppercase
-
toCodePoints
public static int toCodePoints(char[] src, int srcOff, int srcLen, int[] dest, int destOff)
Converts a sequence of Java characters to a sequence of Unicode code points.
Returns:
    the number of code points written to the destination buffer
-
toChars
public static int toChars(int[] src, int srcOff, int srcLen, char[] dest, int destOff)
Converts a sequence of Unicode code points to a sequence of Java characters.
Returns:
    the number of chars written to the destination buffer
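As an illustration of how toCodePoints and toChars relate, here is a JDK-only sketch of the two conversions with the same parameter order as the signatures above. The class name CodePointRoundTripSketch is hypothetical, and the bodies are an assumption about the behavior described here, not a copy of the Lucene source.

```java
import java.util.Arrays;

final class CodePointRoundTripSketch {
    /** Decodes srcLen chars starting at srcOff into code points; returns how many were written. */
    static int toCodePoints(char[] src, int srcOff, int srcLen, int[] dest, int destOff) {
        int count = 0;
        for (int i = 0; i < srcLen; ) {
            int codePoint = Character.codePointAt(src, srcOff + i, srcOff + srcLen);
            dest[destOff + count++] = codePoint;
            i += Character.charCount(codePoint); // 1 for BMP, 2 for a surrogate pair
        }
        return count;
    }

    /** Encodes srcLen code points starting at srcOff into chars; returns how many chars were written. */
    static int toChars(int[] src, int srcOff, int srcLen, char[] dest, int destOff) {
        int written = 0;
        for (int i = 0; i < srcLen; i++) {
            written += Character.toChars(src[srcOff + i], dest, destOff + written);
        }
        return written;
    }

    public static void main(String[] args) {
        // \uD83D\uDE00 is U+1F600, which needs two Java chars.
        char[] chars = "a\uD83D\uDE00b".toCharArray();
        int[] codePoints = new int[chars.length];
        int n = toCodePoints(chars, 0, chars.length, codePoints, 0);

        char[] back = new char[chars.length];
        int m = toChars(codePoints, 0, n, back, 0);
        System.out.println(n + " code points, " + m + " chars, equal=" + Arrays.equals(chars, back));
    }
}
```

Sizing the destination array to the source char count is always enough for toCodePoints, since a code point never uses less than one char. For toChars, the destination may need up to twice the number of code points.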
-
fill
public static boolean fill(CharacterUtils.CharacterBuffer buffer, Reader reader, int numChars) throws IOException
Fills the CharacterUtils.CharacterBuffer with characters read from the given Reader. This method tries to read numChars characters into the CharacterUtils.CharacterBuffer. Each call to fill starts filling the buffer from offset 0 up to numChars. Because a code point can span two Java characters, this method may fill only numChars - 1 characters so as not to split a surrogate pair, even if there are remaining characters in the Reader.

This method guarantees that the given CharacterUtils.CharacterBuffer will never contain a high surrogate character as the last element in the buffer unless it is the last available character in the reader. In other words, high and low surrogate pairs will always be preserved across buffer borders.

A return value of false means that this method call exhausted the reader, but some characters may still have been read, which can be verified by checking whether buffer.getLength() > 0.

Parameters:
    buffer - the buffer to fill.
    reader - the reader to read characters from.
    numChars - the number of chars to read
Returns:
    false if and only if reader.read returned -1 while trying to fill the buffer
Throws:
    IOException - if the reader throws an IOException.
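The surrogate-pair guarantee and the meaning of the false return value can be sketched with the JDK alone. The code below is a simplified model of the contract described above, not the Lucene implementation. Buf, heldHigh, and chunks are hypothetical names introduced for illustration, and holding a trailing high surrogate over to the next call is an assumption about one way to satisfy the guarantee.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class SurrogateSafeFillSketch {
    /** Hypothetical stand-in for CharacterUtils.CharacterBuffer. */
    static final class Buf {
        final char[] chars;
        int length;    // number of valid chars after the last fill
        char heldHigh; // high surrogate held back by the previous fill, or 0 if none

        Buf(int size) {
            chars = new char[size];
        }
    }

    /** Reads up to numChars chars from offset 0; returns false once the reader is exhausted. */
    static boolean fill(Buf buf, Reader reader, int numChars) throws IOException {
        if (numChars < 2 || numChars > buf.chars.length) {
            throw new IllegalArgumentException("numChars must be >= 2 and <= buffer size");
        }
        int start = 0;
        if (buf.heldHigh != 0) {
            // Re-insert the high surrogate that the previous call held back.
            buf.chars[0] = buf.heldHigh;
            buf.heldHigh = 0;
            start = 1;
        }
        int read = 0;
        while (start + read < numChars) {
            int n = reader.read(buf.chars, start + read, numChars - start - read);
            if (n == -1) {
                break;
            }
            read += n;
        }
        buf.length = start + read;
        if (buf.length < numChars) {
            return false; // reader exhausted; buf.length may still be > 0
        }
        if (Character.isHighSurrogate(buf.chars[buf.length - 1])) {
            // Hold the trailing high surrogate back so the pair stays together.
            buf.heldHigh = buf.chars[--buf.length];
        }
        return true;
    }

    /** Collects the chunks that successive fill calls produce for the given input. */
    static List<String> chunks(String input, int bufferSize) {
        List<String> out = new ArrayList<>();
        Buf buf = new Buf(bufferSize);
        try (Reader reader = new StringReader(input)) {
            boolean more;
            do {
                more = fill(buf, reader, bufferSize);
                if (buf.length > 0) { // a false return can still carry data
                    out.add(new String(buf.chars, 0, buf.length));
                }
            } while (more);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // \uD801\uDC00 is the surrogate pair for U+10400.
        for (String chunk : chunks("ab\uD801\uDC00cd", 3)) {
            System.out.println(chunk.length() + " chars");
        }
    }
}
```

With a buffer of 3 chars, the first call yields only 2 chars because the third char read is a high surrogate. The second call starts with that held-back surrogate, and the final call returns false while still delivering the last char. This is why callers should check the buffer length before discarding the result of a call that returned false.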
-
fill
public static boolean fill(CharacterUtils.CharacterBuffer buffer, Reader reader) throws IOException
Convenience method which calls fill(buffer, reader, buffer.buffer.length).
Throws:
    IOException