Class TeeSinkTokenFilter
java.lang.Object
  org.apache.lucene.util.AttributeSource
    org.apache.lucene.analysis.TokenStream
      org.apache.lucene.analysis.TokenFilter
        org.apache.lucene.analysis.sinks.TeeSinkTokenFilter

All Implemented Interfaces:
Closeable, AutoCloseable
public final class TeeSinkTokenFilter extends TokenFilter
This TokenFilter provides the ability to set aside attribute states that have already been analyzed. This is useful in situations where multiple fields share many common analysis steps and then go their separate ways.
It is also useful for doing things like entity extraction or proper noun analysis as part of the analysis workflow and saving off those tokens for use in another field.
// EntityDetect and URLDetect are not defined on this page; they stand for application-specific filters.
// d is the Document being built.
TeeSinkTokenFilter source1 = new TeeSinkTokenFilter(new WhitespaceTokenizer());
TokenStream sink1 = source1.newSinkTokenStream();
TokenStream sink2 = source1.newSinkTokenStream();

TokenStream final1 = new LowerCaseFilter(source1);
TokenStream final2 = new EntityDetect(sink1);
TokenStream final3 = new URLDetect(sink2);

// The tee (f1) is added before the sinks (f2, f3).
d.add(new TextField("f1", final1));
d.add(new TextField("f2", final2));
d.add(new TextField("f3", final3));
In this example, sink1 and sink2 both receive tokens from source1 after whitespace tokenization and then apply further token filtering, e.g. detecting entities and URLs.

NOTE: It is important that tees are consumed before sinks, so add them to the document before the sinks. In the above example, f1 is added before the other fields, so by the time those fields are processed, source1 has already been consumed; this is the correct way to index the three streams. If for some reason you cannot ensure that order, call consumeAllTokens() before adding the sinks to document fields.
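The ordering rule in the note can be illustrated with a small stand-alone model. This is only a sketch in plain Java and not Lucene's implementation: the names TeeSinkModel, Tee, Sink, next(), consumeAll() and readAll() are hypothetical, and tokens are modeled as plain strings rather than attribute states. It assumes a sink can replay only what the tee has cached so far.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical, simplified model of the tee/sink pattern (not Lucene code). */
final class TeeSinkModel {

    /** Passes tokens through to its consumer and records each one in a shared cache. */
    static final class Tee {
        private final Iterator<String> input;
        private final List<String> cache = new ArrayList<>();

        Tee(List<String> input) {
            this.input = input.iterator();
        }

        /** Creates a sink that replays tokens from this tee's cache. */
        Sink newSink() {
            return new Sink(cache);
        }

        /** Returns the next token, or null when the input is exhausted. */
        String next() {
            if (!input.hasNext()) {
                return null;
            }
            String token = input.next();
            cache.add(token); // every token the tee emits is set aside for the sinks
            return token;
        }

        /** Drains the remaining input so that the sinks can see every token. */
        void consumeAll() {
            while (next() != null) {
                // nothing to do; next() fills the cache as a side effect
            }
        }
    }

    /** Replays whatever the tee had cached at the moment the sink is read. */
    static final class Sink {
        private final List<String> cache;

        Sink(List<String> cache) {
            this.cache = cache;
        }

        List<String> readAll() {
            return new ArrayList<>(cache);
        }
    }

    public static void main(String[] args) {
        Tee tee = new Tee(Arrays.asList("Visit", "http://example.org", "today"));
        Sink sink = tee.newSink();

        // Reading the sink before the tee has been consumed yields nothing.
        System.out.println(sink.readAll()); // prints []

        // After the tee is drained, the sink sees every token.
        tee.consumeAll();
        System.out.println(sink.readAll()); // prints [Visit, http://example.org, today]
    }
}
```

In this model, draining the tee first plays the role that adding f1 before f2 and f3, or calling consumeAllTokens(), plays in the example above.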
-
-
Nested Class Summary
Nested Classes
Modifier and Type    Class                                 Description
static class         TeeSinkTokenFilter.SinkTokenStream    TokenStream output from a tee.
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.State
-
-
Field Summary
-
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
-
Fields inherited from class org.apache.lucene.analysis.TokenStream
DEFAULT_TOKEN_ATTRIBUTE_FACTORY
-
-
Constructor Summary
Constructors
Constructor                              Description
TeeSinkTokenFilter(TokenStream input)
-
Method Summary
Modifier and Type    Method                  Description
void                 consumeAllTokens()      TeeSinkTokenFilter passes all tokens to the added sinks when it is itself consumed.
void                 end()
boolean              incrementToken()
TokenStream          newSinkTokenStream()    Returns a new TeeSinkTokenFilter.SinkTokenStream that receives all tokens consumed by this stream.
void                 reset()
-
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close
-
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString
Constructor Detail
-
TeeSinkTokenFilter
public TeeSinkTokenFilter(TokenStream input)
-
-
Method Detail
-
newSinkTokenStream
public TokenStream newSinkTokenStream()
Returns a new TeeSinkTokenFilter.SinkTokenStream that receives all tokens consumed by this stream.
-
consumeAllTokens
public void consumeAllTokens() throws IOException
TeeSinkTokenFilter passes all tokens to the added sinks when it is itself consumed. To be sure that all tokens from the input stream are passed to the sinks, you can call this method. This instance is exhausted after this method returns, but all sinks are immediately available.
Throws:
IOException
-
incrementToken
public boolean incrementToken() throws IOException
Specified by:
incrementToken in class TokenStream
Throws:
IOException
-
end
public final void end() throws IOException
Overrides:
end in class TokenFilter
Throws:
IOException
-
reset
public void reset() throws IOException
Overrides:
reset in class TokenFilter
Throws:
IOException