public final class TeeSinkTokenFilter extends TokenFilter
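This token filter sets aside the tokens it has already analyzed so that other token streams (sinks) can reuse them. It is useful when several fields share the same initial analysis steps, and also for doing things like entity extraction or proper noun analysis as part of the analysis workflow and saving off those tokens for use in another field.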
TeeSinkTokenFilter source1 = new TeeSinkTokenFilter(new WhitespaceTokenizer());
TokenStream sink1 = source1.newSinkTokenStream();
TokenStream sink2 = source1.newSinkTokenStream();
TokenStream final1 = new LowerCaseFilter(source1);
TokenStream final2 = new EntityDetect(sink1);
TokenStream final3 = new URLDetect(sink2);
d.add(new TextField("f1", final1));
d.add(new TextField("f2", final2));
d.add(new TextField("f3", final3));
In this example, sink1 and sink2 both receive the tokens produced by source1 after whitespace
tokenization, and each then applies its own additional token filtering, e.g. to detect entities and URLs.
NOTE: It is important that tees are consumed before their sinks, so you should add the tee to the document
before the sinks. In the above example, f1 is added before the other fields, so by the time f2 and f3 are
processed, the tee has already been consumed. This is the correct way to index the three streams. If for some reason you
cannot ensure this order, call consumeAllTokens() before adding the sinks to document fields.
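
The ordering requirement is easier to see in a stripped-down model. The sketch below is plain Java and does not use Lucene. The names TeeSinkModel, Tee, Sink, next(), consumeAll(), newSink() and readAll() are hypothetical stand-ins, and the only assumption is the behavior the note describes: a sink replays whatever its tee has passed along so far.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Simplified stand-in for the tee/sink pattern. This is NOT Lucene code. */
final class TeeSinkModel {

    /** The tee: hands tokens downstream and records each one it emits. */
    static final class Tee {
        private final List<String> input;
        private final List<String> cache = new ArrayList<>();
        private int position = 0;

        Tee(List<String> tokens) {
            this.input = new ArrayList<>(tokens);
        }

        /** Returns the next token, or null once the input is exhausted. */
        String next() {
            if (position >= input.size()) {
                return null;
            }
            String token = input.get(position++);
            cache.add(token); // every consumed token becomes visible to the sinks
            return token;
        }

        /** Drains the remaining input so that the cache is complete. */
        void consumeAll() {
            while (next() != null) {
                // next() fills the cache as a side effect
            }
        }

        /** Creates a sink that shares this tee's cache. */
        Sink newSink() {
            return new Sink(cache);
        }
    }

    /** A sink replays whatever the tee has cached at the moment it is read. */
    static final class Sink {
        private final List<String> cache;

        Sink(List<String> cache) {
            this.cache = cache;
        }

        List<String> readAll() {
            return new ArrayList<>(cache);
        }
    }

    public static void main(String[] args) {
        Tee tee = new Tee(Arrays.asList("Visit", "Apache", "Lucene"));
        Sink sink = tee.newSink();

        System.out.println(sink.readAll()); // read too early: prints []

        // Consume the tee, as indexing field f1 would do in the example above.
        List<String> lowerCased = new ArrayList<>();
        for (String t = tee.next(); t != null; t = tee.next()) {
            lowerCased.add(t.toLowerCase(Locale.ROOT));
        }
        System.out.println(lowerCased);     // prints [visit, apache, lucene]
        System.out.println(sink.readAll()); // prints [Visit, Apache, Lucene]
    }
}
```

Calling tee.consumeAll() in place of the loop fills the cache in the same way, which mirrors the role of consumeAllTokens() when the tee cannot be indexed first.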
| Modifier and Type | Class and Description |
|---|---|
| static class | TeeSinkTokenFilter.SinkTokenStream: TokenStream output from a tee. |
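
Nested classes inherited from class AttributeSource: AttributeSource.State

Fields inherited from class TokenFilter: input

Fields inherited from class TokenStream: DEFAULT_TOKEN_ATTRIBUTE_FACTORY

| Constructor and Description |
|---|
| TeeSinkTokenFilter(TokenStream input) |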
| Modifier and Type | Method and Description |
|---|---|
| void | consumeAllTokens(): TeeSinkTokenFilter passes all tokens to the added sinks when it is itself consumed. |
| void | end() |
| boolean | incrementToken() |
| TokenStream | newSinkTokenStream(): Returns a new TeeSinkTokenFilter.SinkTokenStream that receives all tokens consumed by this stream. |
| void | reset() |

Methods inherited from class TokenFilter: close

Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString

public TeeSinkTokenFilter(TokenStream input)

public TokenStream newSinkTokenStream()
Returns a new TeeSinkTokenFilter.SinkTokenStream that receives all tokens consumed by this stream.

public void consumeAllTokens()
throws IOException
TeeSinkTokenFilter passes all tokens to the added sinks when it is itself consumed. To be sure that all
tokens from the input stream are passed to the sinks, you can call this method. This instance is exhausted after
this method returns, but all sinks are immediately available.
Throws: IOException

public boolean incrementToken()
throws IOException
Specified by: incrementToken in class TokenStream
Throws: IOException

public final void end()
throws IOException
Overrides: end in class TokenFilter
Throws: IOException

public void reset()
throws IOException
Overrides: reset in class TokenFilter
Throws: IOException

Copyright © 2000-2016 Apache Software Foundation. All Rights Reserved.