Class OpenNLPTokenizer
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.apache.lucene.analysis.util.SegmentingTokenizerBase
org.apache.lucene.analysis.opennlp.OpenNLPTokenizer
All Implemented Interfaces:
Closeable, AutoCloseable
Runs the OpenNLP SentenceDetector and Tokenizer. The last token in each sentence is marked by setting the EOS_FLAG_BIT in the FlagsAttribute; subsequent filters can use this information to apply operations to tokens one sentence at a time.
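As a rough illustration of how a downstream consumer might use the end-of-sentence flag, the sketch below groups tokens into sentences by testing a flag bit. It uses only the JDK: the Token class, the groupBySentence helper, and the bit value 1 are assumptions made for illustration and are not part of the Lucene API. In a real filter the flags would come from FlagsAttribute.getFlags() and the bit from OpenNLPTokenizer.EOS_FLAG_BIT.

```java
import java.util.ArrayList;
import java.util.List;

final class EosFlagSketch {
  // Assumed value for illustration; real code should read OpenNLPTokenizer.EOS_FLAG_BIT.
  static final int EOS_FLAG_BIT = 1;

  // Hypothetical stand-in for a token's term text plus its FlagsAttribute value.
  static final class Token {
    final String term;
    final int flags;

    Token(String term, int flags) {
      this.term = term;
      this.flags = flags;
    }
  }

  // Splits a flat token list into sentences, closing a sentence after
  // each token whose end-of-sentence bit is set.
  static List<List<String>> groupBySentence(List<Token> tokens) {
    List<List<String>> sentences = new ArrayList<>();
    List<String> current = new ArrayList<>();
    for (Token t : tokens) {
      current.add(t.term);
      if ((t.flags & EOS_FLAG_BIT) != 0) {
        sentences.add(current);
        current = new ArrayList<>();
      }
    }
    if (!current.isEmpty()) {
      sentences.add(current); // trailing tokens that carry no EOS marker
    }
    return sentences;
  }

  public static void main(String[] args) {
    List<Token> tokens = List.of(
        new Token("Hello", 0), new Token("world", 0), new Token(".", EOS_FLAG_BIT),
        new Token("Bye", 0), new Token(".", EOS_FLAG_BIT));
    System.out.println(groupBySentence(tokens));
  }
}
```

Testing the bit with a mask rather than comparing the whole flags value keeps the check valid if other filters set additional bits in the same attribute.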
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
Field Summary
Fields inherited from class org.apache.lucene.analysis.util.SegmentingTokenizerBase
buffer, BUFFERMAX, offset
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
Constructor Summary
Constructor
OpenNLPTokenizer(AttributeFactory factory, NLPSentenceDetectorOp sentenceOp, NLPTokenizerOp tokenizerOp)
Method Summary
Modifier and Type | Method
void | close()
protected boolean | incrementWord()
void | reset()
protected void | setNextSentence(int sentenceStart, int sentenceEnd)
Methods inherited from class org.apache.lucene.analysis.util.SegmentingTokenizerBase
end, incrementToken, isSafeEnd
Methods inherited from class org.apache.lucene.analysis.Tokenizer
correctOffset, setReader, setReaderTestPoint
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
Field Details
EOS_FLAG_BIT
public static int EOS_FLAG_BIT
Constructor Details
OpenNLPTokenizer
public OpenNLPTokenizer(AttributeFactory factory, NLPSentenceDetectorOp sentenceOp, NLPTokenizerOp tokenizerOp) throws IOException
Throws:
IOException
Method Details
close
public void close() throws IOException
Specified by:
close in interface AutoCloseable
Specified by:
close in interface Closeable
Overrides:
close in class Tokenizer
Throws:
IOException
setNextSentence
protected void setNextSentence(int sentenceStart, int sentenceEnd)
Specified by:
setNextSentence in class SegmentingTokenizerBase
incrementWord
protected boolean incrementWord()
Specified by:
incrementWord in class SegmentingTokenizerBase
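The setNextSentence and incrementWord hooks follow the contract of SegmentingTokenizerBase: the base class hands over the bounds of the next sentence, then calls incrementWord until it returns false. The JDK-only sketch below imitates that control flow. It is not the OpenNLPTokenizer implementation: java.text.BreakIterator stands in for the OpenNLP sentence model, whitespace splitting stands in for the OpenNLP tokenizer, and the SegmentingSketch class, its tokenize driver, and the "<EOS>" suffix are all hypothetical names chosen for illustration.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class SegmentingSketch {
  private final String text;
  private String[] words = new String[0]; // words of the current sentence
  private int wordIndex = 0;
  private final List<String> emitted = new ArrayList<>();

  SegmentingSketch(String text) {
    this.text = text;
  }

  // Plays the role of setNextSentence: receive sentence bounds and prepare its words.
  void setNextSentence(int sentenceStart, int sentenceEnd) {
    String sentence = text.substring(sentenceStart, sentenceEnd).trim();
    words = sentence.isEmpty() ? new String[0] : sentence.split("\\s+");
    wordIndex = 0;
  }

  // Plays the role of incrementWord: emit one word, or report that the sentence is exhausted.
  boolean incrementWord() {
    if (wordIndex >= words.length) {
      return false;
    }
    boolean last = wordIndex == words.length - 1;
    // The "<EOS>" suffix stands in for setting an end-of-sentence flag bit.
    emitted.add(words[wordIndex++] + (last ? "<EOS>" : ""));
    return true;
  }

  // Driver loop standing in for the base class: one sentence at a time,
  // then words until the sentence runs out.
  static List<String> tokenize(String text) {
    SegmentingSketch s = new SegmentingSketch(text);
    BreakIterator it = BreakIterator.getSentenceInstance(Locale.ROOT);
    it.setText(text);
    int start = it.first();
    for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
      s.setNextSentence(start, end);
      while (s.incrementWord()) {
        // each successful call corresponds to one emitted token
      }
    }
    return s.emitted;
  }

  public static void main(String[] args) {
    System.out.println(tokenize("Hello world. Bye now."));
  }
}
```

Keeping sentence state in setNextSentence and per-token work in incrementWord is what lets the last word of each sentence be identified cheaply, since the word count for the sentence is known before any word is emitted.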
reset
public void reset() throws IOException
Overrides:
reset in class SegmentingTokenizerBase
Throws:
IOException