org.apache.lucene.analysis.pattern
Class PatternCaptureGroupTokenFilter
java.lang.Object
org.apache.lucene.util.AttributeSource
org.apache.lucene.analysis.TokenStream
org.apache.lucene.analysis.TokenFilter
org.apache.lucene.analysis.pattern.PatternCaptureGroupTokenFilter
- All Implemented Interfaces:
- Closeable
public final class PatternCaptureGroupTokenFilter
- extends TokenFilter
CaptureGroup uses Java regexes to emit multiple tokens - one for each capture
group in one or more patterns.
For example, a pattern like:
"(https?://([a-zA-Z\-_0-9.]+))"
when matched against the string "http://www.foo.com/index", would return the
tokens "http://www.foo.com" and "www.foo.com".
If none of the patterns match, or if preserveOriginal is true, the original
token will be preserved.
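The capture-group and preserveOriginal behaviour described above can be approximated with plain java.util.regex. This is a minimal sketch, not the filter's implementation: the class CaptureGroupSketch and the helper groupsOf are hypothetical names, only the first match of a single pattern is inspected, and token offsets and position increments are ignored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CaptureGroupSketch {
    // Hypothetical helper, not part of Lucene: returns the capture groups of the
    // first match, keeping the original token when asked to or when nothing matched.
    static List<String> groupsOf(String token, boolean preserveOriginal, Pattern pattern) {
        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(token);
        boolean matched = m.find();
        if (preserveOriginal || !matched) {
            tokens.add(token);
        }
        if (matched) {
            // Group 0 is the whole match; the capture groups start at 1.
            for (int g = 1; g <= m.groupCount(); g++) {
                if (m.group(g) != null) {
                    tokens.add(m.group(g));
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        Pattern url = Pattern.compile("(https?://([a-zA-Z\\-_0-9.]+))");
        // prints [http://www.foo.com, www.foo.com]
        System.out.println(groupsOf("http://www.foo.com/index", false, url));
        // prints [http://www.foo.com/index, http://www.foo.com, www.foo.com]
        System.out.println(groupsOf("http://www.foo.com/index", true, url));
        // prints [no-url-here] because the pattern does not match
        System.out.println(groupsOf("no-url-here", false, url));
    }
}
```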
Each pattern is matched as often as it can be, so the pattern
"(...)"
, when matched against "abcdefghi", would produce ["abc","def","ghi"].
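Repeated matching corresponds to calling Matcher.find() in a loop, where each call resumes after the end of the previous match. The sketch below illustrates this with the standard regex API only. RepeatedMatchSketch and allMatches are hypothetical names, and the helper assumes the pattern has at least one capture group.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RepeatedMatchSketch {
    // Hypothetical helper: collects group 1 of every successive match.
    static List<String> allMatches(String token, Pattern pattern) {
        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(token);
        while (m.find()) { // find() resumes where the previous match ended
            tokens.add(m.group(1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        Pattern threeChars = Pattern.compile("(...)");
        // prints [abc, def, ghi]
        System.out.println(allMatches("abcdefghi", threeChars));
        // prints [abc, def]; the trailing "gh" is too short to match
        System.out.println(allMatches("abcdefgh", threeChars));
    }
}
```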
A camelCaseFilter could be written as:
"([A-Z]{2,})",
"(?<![A-Z])([A-Z][a-z]+)",
"(?:^|\\b|(?<=[0-9_])|(?<=[A-Z]{2}))([a-z]+)",
"([0-9]+)"
In addition, if preserveOriginal is true, it would also return the original
token "camelCaseFilter".
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
PatternCaptureGroupTokenFilter
public PatternCaptureGroupTokenFilter(TokenStream input,
boolean preserveOriginal,
Pattern... patterns)
- Parameters:
input
- the input TokenStream
preserveOriginal
- set to true to return the original token even if one of the
patterns matches
patterns
- an array of Pattern objects to match against each token
incrementToken
public boolean incrementToken()
throws IOException
- Specified by:
incrementToken
in class TokenStream
- Throws:
IOException
reset
public void reset()
throws IOException
- Overrides:
reset
in class TokenFilter
- Throws:
IOException
Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.