Class SimplePatternSplitTokenizer
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.Tokenizer
org.apache.lucene.analysis.pattern.SimplePatternSplitTokenizer
- All Implemented Interfaces:
Closeable, AutoCloseable
This tokenizer uses a Lucene RegExp or (expert usage) a pre-built determinized Automaton to locate tokens. The regexp syntax is more limited than that of PatternTokenizer, but the tokenization is quite a bit faster. This is just like SimplePatternTokenizer, except that the pattern should match the token separators rather than the tokens themselves, as with String.split. Empty string tokens are never produced.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
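As a minimal sketch of typical usage, the following splits text on runs of spaces and commas. It assumes the Lucene analysis-common module is on the classpath; the class name SplitExample and the helper tokenize are hypothetical names introduced only for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.pattern.SimplePatternSplitTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class, not part of Lucene.
class SplitExample {
  // Splits text wherever the Lucene RegExp matches and returns the remaining tokens.
  static List<String> tokenize(String regexp, String text) throws IOException {
    List<String> tokens = new ArrayList<>();
    try (SimplePatternSplitTokenizer tokenizer = new SimplePatternSplitTokenizer(regexp)) {
      CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
      tokenizer.setReader(new StringReader(text));
      tokenizer.reset();                    // required before the first incrementToken()
      while (tokenizer.incrementToken()) {  // advances to the next token, false at end of input
        tokens.add(term.toString());
      }
      tokenizer.end();                      // finalizes end-of-stream state
    }
    return tokens;
  }
}
```

Calling SplitExample.tokenize("[ ,]+", "a, b,,c") should yield the tokens a, b, and c. In line with the description above, no empty token is emitted between the two consecutive commas.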
-
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
-
Field Summary
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
-
Constructor Summary
Constructor and Description
SimplePatternSplitTokenizer(String regexp)
    See RegExp for the accepted syntax.
SimplePatternSplitTokenizer(Automaton dfa)
    Runs a pre-built automaton.
SimplePatternSplitTokenizer(AttributeFactory factory, String regexp, int determinizeWorkLimit)
    See RegExp for the accepted syntax.
SimplePatternSplitTokenizer(AttributeFactory factory, Automaton dfa)
    Runs a pre-built automaton.
-
Method Summary
Methods inherited from class org.apache.lucene.analysis.Tokenizer
close, correctOffset, setReader, setReaderTestPoint
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
-
Constructor Details
-
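SimplePatternSplitTokenizer
public SimplePatternSplitTokenizer(String regexp)
See RegExp for the accepted syntax.
-
SimplePatternSplitTokenizer
public SimplePatternSplitTokenizer(Automaton dfa)
Runs a pre-built automaton.
-
SimplePatternSplitTokenizer
public SimplePatternSplitTokenizer(AttributeFactory factory, String regexp, int determinizeWorkLimit)
See RegExp for the accepted syntax.
-
SimplePatternSplitTokenizer
public SimplePatternSplitTokenizer(AttributeFactory factory, Automaton dfa)
Runs a pre-built automaton.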
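The regexp-based and automaton-based constructors can be compared side by side. The sketch below is illustrative only: the class ConstructorExamples and its two factory methods are hypothetical names, and it assumes that determinizing the RegExp output with Operations.determinize is an acceptable way to obtain the pre-built determinized automaton that the expert constructor expects.

```java
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.pattern.SimplePatternSplitTokenizer;
import org.apache.lucene.util.automaton.Automaton;
import org.apache.lucene.util.automaton.Operations;
import org.apache.lucene.util.automaton.RegExp;

// Hypothetical helper class, not part of Lucene.
class ConstructorExamples {
  // Regexp form with an explicit attribute factory and determinization work limit.
  static SimplePatternSplitTokenizer fromRegexp(String regexp) {
    return new SimplePatternSplitTokenizer(
        TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY,
        regexp,
        Operations.DEFAULT_DETERMINIZE_WORK_LIMIT);
  }

  // Expert form: build the automaton and determinize it before handing it over.
  static SimplePatternSplitTokenizer fromAutomaton(String regexp) {
    Automaton parsed = new RegExp(regexp).toAutomaton();
    Automaton dfa = Operations.determinize(parsed, Operations.DEFAULT_DETERMINIZE_WORK_LIMIT);
    return new SimplePatternSplitTokenizer(TokenStream.DEFAULT_TOKEN_ATTRIBUTE_FACTORY, dfa);
  }
}
```

The automaton form is mainly useful when the same automaton is already built elsewhere or is constructed programmatically rather than parsed from a regexp string.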
-
-
Method Details
-
incrementToken
- Specified by: incrementToken in class TokenStream
- Throws: IOException
-
end
- Overrides: end in class TokenStream
- Throws: IOException
-
reset
- Overrides: reset in class Tokenizer
- Throws: IOException
-
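The methods above follow the usual TokenStream consumer workflow: setReader, reset, repeated incrementToken, end, then close. The following sketch reuses one tokenizer instance for several inputs under that workflow. The class ReuseExample and the method tokenizeAll are hypothetical names, and the sketch assumes the Lucene analysis-common module is on the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.pattern.SimplePatternSplitTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class, not part of Lucene.
class ReuseExample {
  // Tokenizes each input in turn with a single tokenizer instance.
  static List<List<String>> tokenizeAll(String regexp, List<String> inputs) throws IOException {
    List<List<String>> results = new ArrayList<>();
    SimplePatternSplitTokenizer tokenizer = new SimplePatternSplitTokenizer(regexp);
    CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
    for (String input : inputs) {
      List<String> tokens = new ArrayList<>();
      tokenizer.setReader(new StringReader(input)); // supply the next input
      tokenizer.reset();                            // reset state before consuming
      while (tokenizer.incrementToken()) {
        tokens.add(term.toString());
      }
      tokenizer.end();                              // end-of-stream bookkeeping
      tokenizer.close();                            // release the reader so setReader can be called again
      results.add(tokens);
    }
    return results;
  }
}
```

Calling close() after each input matters here: the Tokenizer contract expects the previous reader to be released before setReader is called again.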