Class TeeSinkTokenFilter

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public final class TeeSinkTokenFilter
    extends TokenFilter
    This TokenFilter provides the ability to set aside attribute states that have already been analyzed. This is useful in situations where multiple fields share many common analysis steps and then go their separate ways.

    It is also useful for doing things like entity extraction or proper noun analysis as part of the analysis workflow and saving off those tokens for use in another field.

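     // The tee passes tokens through and sets aside their attribute states for the sinks.
     TeeSinkTokenFilter source1 = new TeeSinkTokenFilter(new WhitespaceTokenizer());
     TeeSinkTokenFilter.SinkTokenStream sink1 = source1.newSinkTokenStream();
     TeeSinkTokenFilter.SinkTokenStream sink2 = source1.newSinkTokenStream();
    
     // Each stream applies its own filtering. EntityDetect and URLDetect are
     // placeholders for application-specific filters.
     TokenStream final1 = new LowerCaseFilter(source1);
     TokenStream final2 = new EntityDetect(sink1);
     TokenStream final3 = new URLDetect(sink2);
    
     // d is the Document being indexed; the tee (f1) is added before the sinks (f2, f3).
     d.add(new TextField("f1", final1));
     d.add(new TextField("f2", final2));
     d.add(new TextField("f3", final3));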
     

    In this example, sink1 and sink2 both receive the tokens that source1 produces after whitespace tokenization, and each then applies its own additional filtering, e.g. entity detection and URL detection.

    NOTE: It is important that tees are consumed before sinks, so you should add them to the document before the sinks. In the above example, f1 is added before the other fields, so by the time f2 and f3 are processed, the tee has already been consumed, which is the correct way to index the three streams. If for some reason you cannot ensure that ordering, you should call consumeAllTokens() on the tee before adding the sinks to document fields.
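
    The ordering requirement in the note above can be illustrated with a small model written in plain Java. This is only a sketch of the idea and not Lucene's implementation: the class TeeSketch and its methods next(), consumeAll() and readSink() are hypothetical names invented for this illustration, and tokens are modeled as plain strings rather than attribute states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified model of the tee/sink idea (hypothetical; not the Lucene classes).
final class TeeSketch {
    private final Iterator<String> input;
    private final List<String> cache = new ArrayList<>();

    TeeSketch(List<String> tokens) {
        this.input = tokens.iterator();
    }

    /** Returns the next token and records it for the sinks; null when exhausted. */
    String next() {
        if (!input.hasNext()) {
            return null;
        }
        String token = input.next();
        cache.add(token);
        return token;
    }

    /** Drains the remaining input so that every token reaches the cache. */
    void consumeAll() {
        while (next() != null) {
            // nothing to do; next() records each token as a side effect
        }
    }

    /** A sink can only replay what the tee has already seen when it is read. */
    List<String> readSink() {
        return new ArrayList<>(cache);
    }

    public static void main(String[] args) {
        TeeSketch tee = new TeeSketch(Arrays.asList("Visit", "Paris", "today"));
        System.out.println(tee.readSink()); // read too early: prints []
        tee.consumeAll();
        System.out.println(tee.readSink()); // prints [Visit, Paris, today]
    }
}
```

    In this model, reading the sink before the tee has been consumed yields no tokens, which mirrors why the tee field should be added first or drained explicitly beforehand.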