org.apache.lucene.analysis.uima
Class BaseUIMATokenizer

java.lang.Object
  extended by org.apache.lucene.util.AttributeSource
      extended by org.apache.lucene.analysis.TokenStream
          extended by org.apache.lucene.analysis.Tokenizer
              extended by org.apache.lucene.analysis.uima.BaseUIMATokenizer
All Implemented Interfaces:
Closeable
Direct Known Subclasses:
UIMAAnnotationsTokenizer, UIMATypeAwareAnnotationsTokenizer

public abstract class BaseUIMATokenizer
extends Tokenizer

Abstract base implementation of a Tokenizer that analyzes the given input with a UIMA AnalysisEngine.


Nested Class Summary
 
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
 
Field Summary
protected  org.apache.uima.analysis_engine.AnalysisEngine ae
           
protected  org.apache.uima.cas.CAS cas
           
protected  org.apache.uima.cas.FSIterator<org.apache.uima.cas.text.AnnotationFS> iterator
           
 
Fields inherited from class org.apache.lucene.analysis.Tokenizer
input
 
Constructor Summary
protected BaseUIMATokenizer(Reader reader, String descriptorPath, Map<String,Object> configurationParameters)
           
 
Method Summary
protected  void analyzeInput()
          Analyzes the tokenizer input using the given analysis engine.

After the call, the cas field holds the extracted metadata (UIMA annotations and feature structures).

 void end()
           
protected abstract  void initializeIterator()
          Initializes the FSIterator that is used to build a token on each incrementToken() call.
 void reset()
           
 
Methods inherited from class org.apache.lucene.analysis.Tokenizer
close, correctOffset, setReader
 
Methods inherited from class org.apache.lucene.analysis.TokenStream
incrementToken
 
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState
 
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

iterator

protected org.apache.uima.cas.FSIterator<org.apache.uima.cas.text.AnnotationFS> iterator

ae

protected org.apache.uima.analysis_engine.AnalysisEngine ae

cas

protected org.apache.uima.cas.CAS cas
Constructor Detail

BaseUIMATokenizer

protected BaseUIMATokenizer(Reader reader,
                            String descriptorPath,
                            Map<String,Object> configurationParameters)
Method Detail

analyzeInput

protected void analyzeInput()
                     throws org.apache.uima.resource.ResourceInitializationException,
                            org.apache.uima.analysis_engine.AnalysisEngineProcessException,
                            IOException
Analyzes the tokenizer input using the given analysis engine.

After the call, the cas field holds the extracted metadata (UIMA annotations and feature structures).

Throws:
IOException - If there is a low-level I/O error.
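org.apache.uima.resource.ResourceInitializationException - If a UIMA resource needed for the analysis cannot be initialized.
org.apache.uima.analysis_engine.AnalysisEngineProcessException - If the analysis engine fails while processing the input.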
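
Because initializeIterator() declares only IOException, a subclass that calls analyzeInput() from it has to translate the checked UIMA exceptions. The following is a minimal sketch of that wrapping in plain Java. EngineFailureSketch, AnalyzingBaseSketch and ExceptionWrappingDemo are hypothetical stand-ins, not Lucene or UIMA classes, and wrapping in IOException is one possible choice rather than documented behavior of this class.

```java
import java.io.IOException;

// Hypothetical stand-in for a checked UIMA exception such as
// AnalysisEngineProcessException (not the real class).
class EngineFailureSketch extends Exception {
    EngineFailureSketch(String message) {
        super(message);
    }
}

// Hypothetical stand-in that mirrors only the exception signatures of
// analyzeInput() and initializeIterator().
abstract class AnalyzingBaseSketch {
    // May fail with an engine-specific checked exception or an I/O error.
    protected abstract void analyzeInput() throws EngineFailureSketch, IOException;

    // Only IOException may escape, so engine failures are wrapped.
    protected void initializeIterator() throws IOException {
        try {
            analyzeInput();
        } catch (EngineFailureSketch e) {
            // Keep the original failure reachable through getCause().
            throw new IOException(e);
        }
    }
}

class ExceptionWrappingDemo {
    // Triggers a failing analysis and returns the cause carried by the IOException.
    static Throwable causeOfFailure() {
        AnalyzingBaseSketch failing = new AnalyzingBaseSketch() {
            @Override
            protected void analyzeInput() throws EngineFailureSketch {
                throw new EngineFailureSketch("engine could not process the document");
            }
        };
        try {
            failing.initializeIterator();
            return null; // not reached in this demo
        } catch (IOException e) {
            return e.getCause();
        }
    }

    public static void main(String[] args) {
        System.out.println(causeOfFailure().getMessage());
    }
}
```

Passing the original exception as the cause keeps the engine's diagnostic available to callers that only see the IOException.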

initializeIterator

protected abstract void initializeIterator()
                                    throws IOException
Initializes the FSIterator that is used to build a token on each incrementToken() call.

Throws:
IOException - If there is a low-level I/O error.
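
How the iterator field might interact with incrementToken() and reset() can be illustrated with a small model. This sketch assumes that the iterator is created lazily on the first incrementToken() call and discarded on reset. Both are assumptions, not behavior documented on this page. All class names are hypothetical, plain Java collections replace the Lucene and UIMA types, whitespace splitting replaces the analysis engine, and reset(Reader) folds setReader and reset into one call for brevity.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for the base tokenizer; no Lucene or UIMA types.
abstract class LazyTokenSourceSketch {
    protected Reader input;              // plays the role of Tokenizer.input
    protected Iterator<String> iterator; // plays the role of the FSIterator field
    protected String current;            // plays the role of a term attribute

    LazyTokenSourceSketch(Reader input) {
        this.input = input;
    }

    // Counterpart of initializeIterator(): subclasses decide what to iterate over.
    protected abstract void initializeIterator() throws IOException;

    // Assumed behavior: build the iterator on first use, then advance it.
    public boolean incrementToken() throws IOException {
        if (iterator == null) {
            initializeIterator();
        }
        if (iterator.hasNext()) {
            current = iterator.next();
            return true;
        }
        return false;
    }

    // Assumed behavior: dropping the iterator forces analysis of the new input.
    public void reset(Reader newInput) {
        input = newInput;
        iterator = null;
        current = null;
    }
}

// Whitespace splitting stands in for running an analysis engine.
class WhitespaceSketch extends LazyTokenSourceSketch {
    WhitespaceSketch(Reader input) {
        super(input);
    }

    @Override
    protected void initializeIterator() throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[256];
        for (int n; (n = input.read(buf)) != -1; ) {
            sb.append(buf, 0, n);
        }
        String text = sb.toString().trim();
        List<String> tokens = text.isEmpty()
                ? new ArrayList<String>()
                : Arrays.asList(text.split("\\s+"));
        iterator = tokens.iterator();
    }
}

class LazyIteratorDemo {
    // Consumes all remaining tokens, converting I/O failures to unchecked ones.
    static List<String> drain(LazyTokenSourceSketch source) {
        List<String> out = new ArrayList<>();
        try {
            while (source.incrementToken()) {
                out.add(source.current);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        WhitespaceSketch source = new WhitespaceSketch(new StringReader("apache uima lucene"));
        System.out.println(drain(source)); // prints [apache, uima, lucene]
        source.reset(new StringReader("second input"));
        System.out.println(drain(source)); // prints [second, input]
    }
}
```

Deferring the work to the first incrementToken() call means the input is analyzed only when tokens are actually requested, and clearing the iterator on reset lets one instance be reused for several inputs.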

reset

public void reset()
           throws IOException
Overrides:
reset in class TokenStream
Throws:
IOException

end

public void end()
         throws IOException
Overrides:
end in class TokenStream
Throws:
IOException


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.