org.apache.lucene.analysis.ar
Class ArabicLetterTokenizer

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.lucene.analysis.Tokenizer
              extended by org.apache.lucene.analysis.CharTokenizer
                  extended by org.apache.lucene.analysis.LetterTokenizer
                      extended by org.apache.lucene.analysis.ar.ArabicLetterTokenizer
All Implemented Interfaces:
Closeable

Deprecated. (3.1) Use StandardTokenizer instead.

@Deprecated
public class ArabicLetterTokenizer
extends org.apache.lucene.analysis.LetterTokenizer

Tokenizer that breaks text into runs of letters and diacritics.

The standard LetterTokenizer fails on diacritics: it accepts only characters that Character.isLetter reports as letters, so a combining mark inside a word ends the token. Similar handling is necessary for Indic scripts, Hebrew, Thaana, etc.
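
The difference can be checked with the JDK alone. The following is a minimal sketch, not the Lucene source: the class name DiacriticCheck and the helper isLetterOrMark are hypothetical names chosen for illustration, and the helper only mirrors the "Letter or NonspacingMark" rule described on this page.

```java
public class DiacriticCheck {
    // Hypothetical helper mirroring the rule described above:
    // accept letters plus nonspacing (combining) marks.
    public static boolean isLetterOrMark(int c) {
        return Character.isLetter(c)
            || Character.getType(c) == Character.NON_SPACING_MARK;
    }

    public static void main(String[] args) {
        int beh = 0x0628;   // ARABIC LETTER BEH, general category Lo
        int fatha = 0x064E; // ARABIC FATHA, general category Mn

        System.out.println(Character.isLetter(beh));   // true
        System.out.println(Character.isLetter(fatha)); // false
        System.out.println(isLetterOrMark(fatha));     // true
    }
}
```

A letter-only test rejects the fatha, so a vocalized word would be cut at each vowel mark; the combined test keeps the mark attached to its word.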

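You must specify the required Version compatibility when creating ArabicLetterTokenizer:

As of 3.1, CharTokenizer uses an int based API to normalize and detect token characters. See CharTokenizer.isTokenChar(int) and CharTokenizer.normalize(int) for details.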


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State
 
Field Summary
 
Fields inherited from class org.apache.lucene.analysis.Tokenizer
input
 
Constructor Summary
ArabicLetterTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory, Reader in)
          Deprecated. Use ArabicLetterTokenizer(Version, AttributeSource.AttributeFactory, Reader) instead. This will be removed in Lucene 4.0.
ArabicLetterTokenizer(org.apache.lucene.util.AttributeSource source, Reader in)
          Deprecated. Use ArabicLetterTokenizer(Version, AttributeSource, Reader) instead. This will be removed in Lucene 4.0.
ArabicLetterTokenizer(Reader in)
          Deprecated. Use ArabicLetterTokenizer(Version, Reader) instead. This will be removed in Lucene 4.0.
ArabicLetterTokenizer(org.apache.lucene.util.Version matchVersion, org.apache.lucene.util.AttributeSource.AttributeFactory factory, Reader in)
          Deprecated. Construct a new ArabicLetterTokenizer using a given AttributeSource.AttributeFactory.
ArabicLetterTokenizer(org.apache.lucene.util.Version matchVersion, org.apache.lucene.util.AttributeSource source, Reader in)
          Deprecated. Construct a new ArabicLetterTokenizer using a given AttributeSource.
ArabicLetterTokenizer(org.apache.lucene.util.Version matchVersion, Reader in)
          Deprecated. Construct a new ArabicLetterTokenizer.
 
Method Summary
protected  boolean isTokenChar(int c)
          Deprecated. Accepts characters in a Letter category or the NonspacingMark category.
 
Methods inherited from class org.apache.lucene.analysis.CharTokenizer
end, incrementToken, isTokenChar, normalize, normalize, reset
 
Methods inherited from class org.apache.lucene.analysis.Tokenizer
close, correctOffset
 
Methods inherited from class org.apache.lucene.analysis.TokenStream
reset
 
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
 

Constructor Detail

ArabicLetterTokenizer

public ArabicLetterTokenizer(org.apache.lucene.util.Version matchVersion,
                             Reader in)
Deprecated. 
Construct a new ArabicLetterTokenizer.

Parameters:
matchVersion - Lucene version to match; see above
in - the input to split up into tokens

ArabicLetterTokenizer

public ArabicLetterTokenizer(org.apache.lucene.util.Version matchVersion,
                             org.apache.lucene.util.AttributeSource source,
                             Reader in)
Deprecated. 
Construct a new ArabicLetterTokenizer using a given AttributeSource.

Parameters:
matchVersion - Lucene version to match; see above
source - the attribute source to use for this Tokenizer
in - the input to split up into tokens

ArabicLetterTokenizer

public ArabicLetterTokenizer(org.apache.lucene.util.Version matchVersion,
                             org.apache.lucene.util.AttributeSource.AttributeFactory factory,
                             Reader in)
Deprecated. 
Construct a new ArabicLetterTokenizer using a given AttributeSource.AttributeFactory.

Parameters:
matchVersion - Lucene version to match; see above
factory - the attribute factory to use for this Tokenizer
in - the input to split up into tokens

ArabicLetterTokenizer

@Deprecated
public ArabicLetterTokenizer(Reader in)
Deprecated. Use ArabicLetterTokenizer(Version, Reader) instead. This will be removed in Lucene 4.0.

Construct a new ArabicLetterTokenizer.


ArabicLetterTokenizer

@Deprecated
public ArabicLetterTokenizer(org.apache.lucene.util.AttributeSource source,
                             Reader in)
Deprecated. Use ArabicLetterTokenizer(Version, AttributeSource, Reader) instead. This will be removed in Lucene 4.0.

Construct a new ArabicLetterTokenizer using a given AttributeSource.


ArabicLetterTokenizer

@Deprecated
public ArabicLetterTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory,
                             Reader in)
Deprecated. Use ArabicLetterTokenizer(Version, AttributeSource.AttributeFactory, Reader) instead. This will be removed in Lucene 4.0.

Construct a new ArabicLetterTokenizer using a given AttributeSource.AttributeFactory.

Method Detail

isTokenChar

protected boolean isTokenChar(int c)
Deprecated. 
Accepts characters in a Letter category or the NonspacingMark category.

Overrides:
isTokenChar in class org.apache.lucene.analysis.LetterTokenizer
See Also:
LetterTokenizer.isTokenChar(int)
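
To illustrate how a predicate of this kind turns text into tokens, here is a hedged, JDK-only sketch of splitting a string into runs of accepted code points. The class LetterRunSplitter and its methods are hypothetical and are not part of Lucene; the sketch leaves out everything else a real tokenizer does, such as offsets, normalization, and attribute handling.

```java
import java.util.ArrayList;
import java.util.List;

public class LetterRunSplitter {
    // Assumed rule, taken from the description above:
    // a token character is a letter or a nonspacing mark.
    public static boolean isTokenChar(int c) {
        return Character.isLetter(c)
            || Character.getType(c) == Character.NON_SPACING_MARK;
    }

    // Collects maximal runs of token characters, walking by code point
    // so that supplementary characters are not split.
    public static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int c = text.codePointAt(i);
            i += Character.charCount(c);
            if (isTokenChar(c)) {
                current.appendCodePoint(c);
            } else if (current.length() > 0) {
                // A non-token character closes the current run.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // BEH + FATHA + BEH, then Latin text interrupted by a digit.
        System.out.println(split("\u0628\u064E\u0628 ab1cd").size()); // 3
    }
}
```

In this sketch the vocalized Arabic word stays in one piece, while the digit and the space both act as separators.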


Copyright © 2000-2011 Apache Software Foundation. All Rights Reserved.