Class PatternTokenizer
- java.lang.Object
  - org.apache.lucene.util.AttributeSource
    - org.apache.lucene.analysis.TokenStream
      - org.apache.lucene.analysis.Tokenizer
        - org.apache.lucene.analysis.pattern.PatternTokenizer
- All Implemented Interfaces:
Closeable, AutoCloseable
public final class PatternTokenizer extends Tokenizer
This tokenizer uses regex pattern matching to construct distinct tokens for the input stream. It takes two arguments: "pattern" and "group".
- "pattern" is the regular expression.
- "group" selects which capture group of the pattern to extract into tokens.
group=-1 (the default) is equivalent to "split". In this case, the tokens are equivalent to the output of the following method, with empty tokens removed:
String.split(java.lang.String)
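The split equivalence can be illustrated with the JDK alone. The following is a minimal sketch, not Lucene's implementation: the class name `SplitSketch` and the helper `splitNonEmpty` are hypothetical names introduced here, and the helper only mimics the described result (the output of `String.split` with empty strings dropped).

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {
    // Hypothetical helper: splits the input on the regex and drops empty pieces,
    // mirroring the "split without empty tokens" behavior described for group=-1.
    static List<String> splitNonEmpty(String regex, String input) {
        List<String> tokens = new ArrayList<>();
        for (String piece : input.split(regex)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // String.split alone yields ["", "a", "", "b"] here; the filter removes the empties.
        System.out.println(splitNonEmpty(",", ",a,,b,")); // prints [a, b]
    }
}
```

Note that `String.split` already discards trailing empty strings, so the filter mainly matters for leading and interior empties.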
Using group >= 0 selects the matching group as the token. For example, if you have:
pattern = \'([^\']+)\'
group = 0
input = aaa 'bbb' 'ccc'
the output will be two tokens: 'bbb' and 'ccc' (including the ' marks). With the same input but using group=1, the output would be: bbb and ccc (no ' marks).
NOTE: This Tokenizer does not output tokens that are of zero length.
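The effect of the "group" argument for group >= 0 can be reproduced with `java.util.regex` directly. This is a sketch of the described semantics only, not the tokenizer's actual code; `GroupSketch` and `extract` are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupSketch {
    // Hypothetical helper: collects the text of the chosen capture group for
    // every match, skipping zero-length results as the NOTE above describes.
    static List<String> extract(String regex, int group, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            String text = matcher.group(group);
            if (text != null && !text.isEmpty()) {
                tokens.add(text);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "aaa 'bbb' 'ccc'";
        // Group 0 is the whole match, so the quote marks are kept.
        System.out.println(extract("'([^']+)'", 0, input)); // prints ['bbb', 'ccc']
        // Group 1 is the parenthesized part, so the quote marks are dropped.
        System.out.println(extract("'([^']+)'", 1, input)); // prints [bbb, ccc]
    }
}
```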
- See Also:
Pattern
Nested Class Summary
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
Field Summary
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
Constructor Summary
PatternTokenizer(Pattern pattern, int group)
Creates a new PatternTokenizer returning tokens from group (-1 for split functionality).
PatternTokenizer(AttributeFactory factory, Pattern pattern, int group)
Creates a new PatternTokenizer returning tokens from group (-1 for split functionality).
Method Summary
void close()
void end()
boolean incrementToken()
void reset()
Methods inherited from class org.apache.lucene.analysis.Tokenizer
correctOffset, setReader
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
Constructor Detail
PatternTokenizer
public PatternTokenizer(Pattern pattern, int group)
Creates a new PatternTokenizer returning tokens from group (-1 for split functionality).
PatternTokenizer
public PatternTokenizer(AttributeFactory factory, Pattern pattern, int group)
Creates a new PatternTokenizer returning tokens from group (-1 for split functionality), using the given AttributeFactory.
Method Detail
incrementToken
public boolean incrementToken()
- Specified by:
incrementToken in class TokenStream
end
public void end() throws IOException
- Overrides:
end in class TokenStream
- Throws:
IOException
close
public void close() throws IOException
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Overrides:
close in class Tokenizer
- Throws:
IOException
reset
public void reset() throws IOException
- Overrides:
reset in class Tokenizer
- Throws:
IOException
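The methods above are normally called in the order required by the TokenStream consumer contract: setReader, reset, repeated incrementToken, end, then close. The sketch below shows one way to do that for the quoted-string example from the class description. It assumes Lucene's analysis module is on the classpath and that CharTermAttribute (from org.apache.lucene.analysis.tokenattributes) is used to read the token text; the class name `PatternTokenizerWorkflow` and the helper `tokenize` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

import org.apache.lucene.analysis.pattern.PatternTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class PatternTokenizerWorkflow {
    // Hypothetical helper: runs the tokenizer over the input and collects token text.
    static List<String> tokenize(String regex, int group, String input) {
        List<String> tokens = new ArrayList<>();
        // try-with-resources calls close() once the stream has been consumed.
        try (PatternTokenizer tokenizer = new PatternTokenizer(Pattern.compile(regex), group)) {
            tokenizer.setReader(new StringReader(input));
            CharTermAttribute term = tokenizer.addAttribute(CharTermAttribute.class);
            tokenizer.reset();                    // must be called before the first incrementToken()
            while (tokenizer.incrementToken()) {  // returns false when the input is exhausted
                tokens.add(term.toString());
            }
            tokenizer.end();                      // performs end-of-stream operations
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Per the class description, group=1 should yield the tokens bbb and ccc.
        System.out.println(tokenize("'([^']+)'", 1, "aaa 'bbb' 'ccc'"));
    }
}
```

Wrapping IOException in UncheckedIOException is a convenience choice for this sketch only; callers that already handle IOException can declare it instead.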