org.apache.lucene.analysis.util
public abstract class CharacterUtils extends Object
CharacterUtils provides a unified interface to Character-related operations to implement backwards compatible character operations based on a Version instance.
Modifier and Type | Class and Description |
---|---|
static class | CharacterUtils.CharacterBuffer: A simple IO buffer to use with fill(CharacterBuffer, Reader). |
Constructor and Description |
---|
CharacterUtils() |
Modifier and Type | Method and Description |
---|---|
abstract int | codePointAt(char[] chars, int offset, int limit): Returns the code point at the given index of the char array where only elements with index less than the limit are used. |
abstract int | codePointAt(CharSequence seq, int offset): Returns the code point at the given index of the CharSequence. |
abstract int | codePointCount(CharSequence seq): Returns the number of characters in seq. |
boolean | fill(CharacterUtils.CharacterBuffer buffer, Reader reader): Convenience method which calls fill(buffer, reader, buffer.buffer.length). |
abstract boolean | fill(CharacterUtils.CharacterBuffer buffer, Reader reader, int numChars): Fills the CharacterUtils.CharacterBuffer with characters read from the given Reader. |
static CharacterUtils | getInstance(Version matchVersion): Returns a CharacterUtils implementation according to the given Version instance. |
static CharacterUtils | getJava4Instance(): Returns a CharacterUtils instance compatible with Java 1.4. |
static CharacterUtils.CharacterBuffer | newCharacterBuffer(int bufferSize): Creates a new CharacterUtils.CharacterBuffer and allocates a char[] of the given bufferSize. |
abstract int | offsetByCodePoints(char[] buf, int start, int count, int index, int offset): Returns the index within buf[start:start+count] that is offset from index by offset code points. |
int | toChars(int[] src, int srcOff, int srcLen, char[] dest, int destOff): Converts a sequence of Unicode code points to a sequence of Java characters. |
int | toCodePoints(char[] src, int srcOff, int srcLen, int[] dest, int destOff): Converts a sequence of Java characters to a sequence of Unicode code points. |
void | toLowerCase(char[] buffer, int offset, int limit): Converts each Unicode code point to lowercase via Character.toLowerCase(int), starting at the given offset. |
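The limit-bounded codePointAt and the code-point-wise toLowerCase listed in the method summary can be approximated with java.lang.Character alone. The following is a minimal sketch under that assumption, not Lucene's implementation: the class name LowerCaseSketch is hypothetical, and the version-dependent (Java 1.4 style) behavior is not modeled.

```java
// Sketch only: LowerCaseSketch is a hypothetical name, not part of Lucene.
class LowerCaseSketch {

    /** Returns the code point at offset, ignoring any char at index >= limit. */
    static int codePointAt(char[] chars, int offset, int limit) {
        // A high surrogate at limit - 1 is returned as-is because its
        // low surrogate lies outside the permitted range.
        return Character.codePointAt(chars, offset, limit);
    }

    /** Lowercases buffer[offset, limit) in place, one code point at a time. */
    static void toLowerCase(char[] buffer, int offset, int limit) {
        for (int i = offset; i < limit; ) {
            int lower = Character.toLowerCase(codePointAt(buffer, i, limit));
            // Character.toChars writes 1 or 2 chars and returns how many it wrote.
            i += Character.toChars(lower, buffer, i);
        }
    }

    public static void main(String[] args) {
        // "AB", then U+10400 (a supplementary capital letter), then "C".
        char[] buf = {'A', 'B', (char) 0xD801, (char) 0xDC00, 'C'};
        toLowerCase(buf, 0, buf.length);
        for (int i = 0; i < buf.length; ) {
            int cp = Character.codePointAt(buf, i, buf.length);
            System.out.println(Integer.toHexString(cp));
            i += Character.charCount(cp);
        }
    }
}
```

Walking the array by code point rather than by char keeps a surrogate pair together, so a supplementary letter is lowercased as one unit instead of as two unrelated chars.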
public static CharacterUtils getInstance(Version matchVersion)
Returns a CharacterUtils implementation according to the given Version instance.
Parameters:
matchVersion - a version instance
Returns:
a CharacterUtils implementation according to the given Version instance.

public static CharacterUtils getJava4Instance()
Returns a CharacterUtils instance compatible with Java 1.4.

public abstract int codePointAt(CharSequence seq, int offset)
Returns the code point at the given index of the CharSequence. Depending on the Version passed to getInstance(Version), this method mimics the behavior of Character.codePointAt(CharSequence, int) as it would have been available on a Java 1.4 JVM or on a later virtual machine version.
Parameters:
seq - a character sequence
offset - the offset of the char value in the sequence to be converted
Throws:
NullPointerException - if the sequence is null.
IndexOutOfBoundsException - if the value offset is negative or not less than the length of the character sequence.

public abstract int codePointAt(char[] chars, int offset, int limit)
Returns the code point at the given index of the char array where only elements with index less than the limit are used. Depending on the Version passed to getInstance(Version), this method mimics the behavior of Character.codePointAt(char[], int) as it would have been available on a Java 1.4 JVM or on a later virtual machine version.
Parameters:
chars - a character array
offset - the offset to the char values in the chars array to be converted
limit - the index after the last element that should be used to calculate the code point.
Throws:
NullPointerException - if the array is null.
IndexOutOfBoundsException - if the value offset is negative or not less than the length of the char array.

public abstract int codePointCount(CharSequence seq)
Returns the number of characters in seq.

public static CharacterUtils.CharacterBuffer newCharacterBuffer(int bufferSize)
Creates a new CharacterUtils.CharacterBuffer and allocates a char[] of the given bufferSize.
Parameters:
bufferSize - the internal char buffer size, must be >= 2
Returns:
a new CharacterUtils.CharacterBuffer instance.

public final void toLowerCase(char[] buffer, int offset, int limit)
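Converts each Unicode code point to lowercase via Character.toLowerCase(int), starting at the given offset.
Parameters:
buffer - the char buffer to lowercase
offset - the offset to start at
limit - the max char in the buffer to lower case

public final int toCodePoints(char[] src, int srcOff, int srcLen, int[] dest, int destOff)
Converts a sequence of Java characters to a sequence of Unicode code points.

public final int toChars(int[] src, int srcOff, int srcLen, char[] dest, int destOff)
Converts a sequence of Unicode code points to a sequence of Java characters.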
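To illustrate what converting between Java characters and Unicode code points involves, here is a minimal round-trip sketch built on java.lang.Character. It mirrors the parameter lists of toCodePoints and toChars, but the class name CodePointConversionSketch is hypothetical, and returning the number of elements written is an assumption of this sketch, since the page does not state what the int return values mean.

```java
// Sketch only: CodePointConversionSketch is a hypothetical name, not part of Lucene.
class CodePointConversionSketch {

    /**
     * Decodes srcLen chars starting at srcOff into code points.
     * Assumption: returns the number of code points written to dest.
     */
    static int toCodePoints(char[] src, int srcOff, int srcLen, int[] dest, int destOff) {
        int written = 0;
        int end = srcOff + srcLen;
        for (int i = srcOff; i < end; ) {
            int cp = Character.codePointAt(src, i, end);
            dest[destOff + written++] = cp;
            i += Character.charCount(cp); // 1 for BMP, 2 for a surrogate pair
        }
        return written;
    }

    /**
     * Encodes srcLen code points starting at srcOff into chars.
     * Assumption: returns the number of chars written to dest.
     */
    static int toChars(int[] src, int srcOff, int srcLen, char[] dest, int destOff) {
        int written = 0;
        for (int i = 0; i < srcLen; i++) {
            written += Character.toChars(src[srcOff + i], dest, destOff + written);
        }
        return written;
    }

    public static void main(String[] args) {
        // 'a', then U+10400 as a surrogate pair, then 'b': 4 chars, 3 code points.
        char[] chars = {'a', (char) 0xD801, (char) 0xDC00, 'b'};
        int[] codePoints = new int[chars.length];
        int n = toCodePoints(chars, 0, chars.length, codePoints, 0);
        char[] back = new char[chars.length];
        int m = toChars(codePoints, 0, n, back, 0);
        System.out.println(n + " code points, " + m + " chars");
    }
}
```

A destination array as long as the source is always large enough for toCodePoints, while toChars may need up to two chars per code point.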
public abstract boolean fill(CharacterUtils.CharacterBuffer buffer, Reader reader, int numChars) throws IOException
Fills the CharacterUtils.CharacterBuffer with characters read from the given Reader. This method tries to read numChars characters into the CharacterUtils.CharacterBuffer; each call to fill will start filling the buffer from offset 0 up to numChars.
In case code points can span across 2 Java characters, this method may only fill numChars - 1 characters in order not to split in the middle of a surrogate pair, even if there are remaining characters in the Reader.
Depending on the Version passed to getInstance(Version), this method implements supplementary character awareness when filling the given buffer. For all Version > 3.0, fill(CharacterBuffer, Reader, int) guarantees that the given CharacterUtils.CharacterBuffer will never contain a high surrogate character as the last element in the buffer unless it is the last available character in the reader. In other words, high and low surrogate pairs will always be preserved across buffer borders.
A return value of false means that this method call exhausted the reader, but there may be some characters which have been read, which can be verified by checking whether buffer.getLength() > 0.
Parameters:
buffer - the buffer to fill.
reader - the reader to read characters from.
numChars - the number of chars to read
Returns:
false if and only if reader.read returned -1 while trying to fill the buffer
Throws:
IOException - if the reader throws an IOException.

public final boolean fill(CharacterUtils.CharacterBuffer buffer, Reader reader) throws IOException
Convenience method which calls fill(buffer, reader, buffer.buffer.length).
Throws:
IOException
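The surrogate-preserving contract of fill described above can be sketched with plain java.io. This is an illustration under stated assumptions, not Lucene's code: FillSketch, its nested Buffer class, the carry field and the chunks helper are all hypothetical stand-ins, and only the supplementary-aware behavior is modeled.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Sketch only: every name in this class is hypothetical, not part of Lucene.
class FillSketch {

    /** Hypothetical stand-in for CharacterUtils.CharacterBuffer. */
    static final class Buffer {
        final char[] buffer;
        int length;        // number of valid chars after the last fill
        char carry;        // high surrogate held back by the previous fill
        boolean hasCarry;

        Buffer(int size) {
            if (size < 2) throw new IllegalArgumentException("size must be >= 2");
            buffer = new char[size];
        }
    }

    /** Returns false if and only if the reader reported end of input during this call. */
    static boolean fill(Buffer b, Reader reader, int numChars) throws IOException {
        if (numChars < 2 || numChars > b.buffer.length) {
            throw new IllegalArgumentException("numChars must be in [2, buffer length]");
        }
        int len = 0;
        if (b.hasCarry) {               // start with the char held back last time
            b.buffer[len++] = b.carry;
            b.hasCarry = false;
        }
        boolean exhausted = false;
        while (len < numChars) {
            int n = reader.read(b.buffer, len, numChars - len);
            if (n == -1) { exhausted = true; break; }
            len += n;
        }
        // A full buffer ending in a high surrogate would split a pair:
        // hold that char back, so only numChars - 1 chars are exposed.
        if (!exhausted && Character.isHighSurrogate(b.buffer[len - 1])) {
            b.carry = b.buffer[--len];
            b.hasCarry = true;
        }
        b.length = len;
        return !exhausted;
    }

    /** Usage sketch: collects the chunks produced by repeated fill calls. */
    static List<String> chunks(String text, int numChars) {
        List<String> out = new ArrayList<>();
        Buffer b = new Buffer(numChars);
        try (Reader reader = new StringReader(text)) {
            boolean more;
            do {
                more = fill(b, reader, numChars);
                // Even when fill returns false, some chars may have been read.
                if (b.length > 0) out.add(new String(b.buffer, 0, b.length));
            } while (more);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "ab" + new String(Character.toChars(0x10400)) + "cd";
        for (String chunk : chunks(text, 3)) {
            System.out.println(chunk.length() + " chars");
        }
    }
}
```

The chunks helper shows the loop shape the return value implies: keep calling fill while it returns true, and still consume whatever the final call left in the buffer.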
public abstract int offsetByCodePoints(char[] buf, int start, int count, int index, int offset)
Returns the index within buf[start:start+count] that is offset from index by offset code points.

Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.