Class PatternCaptureGroupTokenFilter
java.lang.Object
  org.apache.lucene.util.AttributeSource
    org.apache.lucene.analysis.TokenStream
      org.apache.lucene.analysis.TokenFilter
        org.apache.lucene.analysis.pattern.PatternCaptureGroupTokenFilter

All Implemented Interfaces:
Closeable, AutoCloseable
public final class PatternCaptureGroupTokenFilter extends TokenFilter
CaptureGroup uses Java regexes to emit multiple tokens, one for each capture group in one or more patterns. For example, a pattern like:
"(https?://([a-zA-Z\-_0-9.]+))"
when matched against the string "http://www.foo.com/index" would return the tokens "http://www.foo.com" and "www.foo.com".
If none of the patterns match, or if preserveOriginal is true, the original token will be preserved.
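The capture-group behaviour described above can be approximated with plain java.util.regex. This is only a sketch: the class CaptureGroupSketch and the helper captureGroups are hypothetical names, not part of Lucene, and the real filter's token ordering, de-duplication and attribute handling are not modelled. Note that the first capture group keeps the scheme exactly as it appears in the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CaptureGroupSketch {

    // Hypothetical helper (not a Lucene API): collects every non-empty
    // capture group from every match of every pattern, in pattern order.
    static List<String> captureGroups(String token, Pattern... patterns) {
        List<String> out = new ArrayList<>();
        for (Pattern p : patterns) {
            Matcher m = p.matcher(token);
            while (m.find()) {
                for (int g = 1; g <= m.groupCount(); g++) {
                    String s = m.group(g);
                    // A group that did not participate in the match is null.
                    if (s != null && !s.isEmpty()) {
                        out.add(s);
                    }
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Pattern url = Pattern.compile("(https?://([a-zA-Z\\-_0-9.]+))");
        // prints [http://www.foo.com, www.foo.com]
        System.out.println(captureGroups("http://www.foo.com/index", url));
    }
}
```

If the returned list is empty, no pattern matched, which is the case in which the filter keeps the original token.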
Each pattern is matched as often as it can be, so the pattern
"(...)"
when matched against "abcdefghi" would produce ["abc", "def", "ghi"].
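Repeated matching of a single pattern corresponds to calling Matcher.find() in a loop. The following is a minimal sketch using only java.util.regex; RepeatedMatchSketch and allMatches are hypothetical names chosen for illustration, and the sketch assumes the pattern has at least one capture group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RepeatedMatchSketch {

    // Hypothetical helper: returns group 1 of every successive match.
    static List<String> allMatches(Pattern pattern, String token) {
        List<String> groups = new ArrayList<>();
        Matcher m = pattern.matcher(token);
        // Each find() resumes where the previous match ended.
        while (m.find()) {
            groups.add(m.group(1));
        }
        return groups;
    }

    public static void main(String[] args) {
        Pattern threeChars = Pattern.compile("(...)");
        // prints [abc, def, ghi]
        System.out.println(allMatches(threeChars, "abcdefghi"));
    }
}
```

A trailing remainder shorter than the pattern requires is simply not captured: with the input "abcdefgh" the sketch yields only "abc" and "def".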
A camelCaseFilter could be written as:
"([A-Z]{2,})", "(?<![A-Z])([A-Z][a-z]+)", "(?:^|\\b|(?<=[0-9_])|(?<=[A-Z]{2}))([a-z]+)", "([0-9]+)"
Applied to the token "camelCaseFilter", these patterns capture "camel", "Case" and "Filter"; if preserveOriginal is true, the filter would also return "camelCaseFilter".
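The four camel-case patterns can be tried out with plain java.util.regex. This is a sketch under assumptions: CamelCaseSketch and split are hypothetical names, the captured pieces are sorted by start offset here purely for readability, and preserveOriginal is not modelled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CamelCaseSketch {

    // The four patterns from the class description, as Java string literals.
    static final Pattern[] CAMEL_CASE = {
        Pattern.compile("([A-Z]{2,})"),
        Pattern.compile("(?<![A-Z])([A-Z][a-z]+)"),
        Pattern.compile("(?:^|\\b|(?<=[0-9_])|(?<=[A-Z]{2}))([a-z]+)"),
        Pattern.compile("([0-9]+)")
    };

    // Hypothetical helper: gathers group 1 of every match of every pattern,
    // then orders the pieces by where they start in the token.
    static List<String> split(String token) {
        List<int[]> spans = new ArrayList<>();
        for (Pattern p : CAMEL_CASE) {
            Matcher m = p.matcher(token);
            while (m.find()) {
                if (m.end(1) > m.start(1)) {
                    spans.add(new int[] {m.start(1), m.end(1)});
                }
            }
        }
        spans.sort(Comparator.comparingInt((int[] s) -> s[0]));
        List<String> parts = new ArrayList<>();
        for (int[] s : spans) {
            parts.add(token.substring(s[0], s[1]));
        }
        return parts;
    }

    public static void main(String[] args) {
        // prints [camel, Case, Filter]
        System.out.println(split("camelCaseFilter"));
    }
}
```

Each pattern targets one kind of fragment: runs of capitals, a capitalised word, a lower-case run at a boundary, and digits. This is why several patterns are passed together rather than one combined expression.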
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
-
-
Field Summary
-
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
-
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
-
-
Constructor Summary
Constructor
PatternCaptureGroupTokenFilter(TokenStream input, boolean preserveOriginal, Pattern... patterns)
-
Method Summary
Modifier and Type    Method
boolean              incrementToken()
void                 reset()
-
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, end
-
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
-
-
-
-
Constructor Detail
-
PatternCaptureGroupTokenFilter
public PatternCaptureGroupTokenFilter(TokenStream input, boolean preserveOriginal, Pattern... patterns)
- Parameters:
input - the input TokenStream
preserveOriginal - set to true to return the original token even if one of the patterns matches
patterns - an array of Pattern objects to match against each token
-
-
Method Detail
-
incrementToken
public boolean incrementToken() throws IOException
- Specified by:
incrementToken in class TokenStream
- Throws:
IOException
-
reset
public void reset() throws IOException
- Overrides:
reset in class TokenFilter
- Throws:
IOException
-
-