org.apache.lucene.search
Class FuzzyTermsEnum

java.lang.Object
  extended by org.apache.lucene.index.TermsEnum
      extended by org.apache.lucene.search.FuzzyTermsEnum
All Implemented Interfaces:
BytesRefIterator

public class FuzzyTermsEnum
extends TermsEnum

Subclass of TermsEnum for enumerating all terms that are similar to the specified filter term.

Term enumerations are always ordered by getComparator(). Each term in the enumeration is greater than all that precede it.
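To make the contract concrete, the following sketch models fuzzy enumeration by brute force: it scans an already sorted term list and keeps every term that shares the required prefix and lies within a maximum edit distance of the query, so the output inherits the input's order. This is a simplified model under stated assumptions, not Lucene's implementation, which intersects precompiled Levenshtein automata with the terms dictionary instead of computing a distance per term. BruteForceFuzzy and its methods are hypothetical names, and java.lang.String order stands in for the BytesRef comparator.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical brute-force model of fuzzy term enumeration (not Lucene's automaton-based code). */
final class BruteForceFuzzy {

  /** Classic Levenshtein distance over code points: insert, delete and substitute each cost 1. */
  static int levenshtein(String a, String b) {
    int[] x = a.codePoints().toArray();
    int[] y = b.codePoints().toArray();
    int[] prev = new int[y.length + 1];
    int[] cur = new int[y.length + 1];
    for (int j = 0; j <= y.length; j++) prev[j] = j;
    for (int i = 1; i <= x.length; i++) {
      cur[0] = i;
      for (int j = 1; j <= y.length; j++) {
        int cost = x[i - 1] == y[j - 1] ? 0 : 1;
        cur[j] = Math.min(Math.min(prev[j] + 1, cur[j - 1] + 1), prev[j - 1] + cost);
      }
      int[] tmp = prev; // the finished row becomes the previous row
      prev = cur;
      cur = tmp;
    }
    return prev[y.length];
  }

  /**
   * Returns the terms of an already sorted dictionary that share the first prefixLength
   * code points with query and lie within maxEdits edits of it. Scanning in order means
   * the result keeps the dictionary's sort order.
   */
  static List<String> similarTerms(List<String> sortedTerms, String query, int maxEdits, int prefixLength) {
    int[] q = query.codePoints().toArray();
    int p = Math.min(prefixLength, q.length); // assumption: a prefix longer than the term is clamped
    String prefix = new String(q, 0, p);
    List<String> out = new ArrayList<>();
    for (String candidate : sortedTerms) {
      if (candidate.startsWith(prefix) && levenshtein(candidate, query) <= maxEdits) {
        out.add(candidate);
      }
    }
    return out;
  }
}
```

For example, with the dictionary [lacene, lucene, lucent, lucerne, lucid], query lucene and one allowed edit, the sketch returns [lacene, lucene, lucent, lucerne]; requiring a two-character prefix drops lacene.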


Nested Class Summary
static interface FuzzyTermsEnum.LevenshteinAutomataAttribute
          Reuses compiled automata across different segments, because they are independent of the index.
static class FuzzyTermsEnum.LevenshteinAutomataAttributeImpl
          Stores compiled automata as a list, indexed by edit distance.
 
Nested classes/interfaces inherited from class org.apache.lucene.index.TermsEnum
TermsEnum.SeekStatus
 
Field Summary
protected  int maxEdits
           
protected  float minSimilarity
           
protected  boolean raw
           
protected  int realPrefixLength
           
protected  float scale_factor
           
protected  int termLength
           
protected  Terms terms
           
protected  int[] termText
           
 
Fields inherited from class org.apache.lucene.index.TermsEnum
EMPTY
 
Constructor Summary
FuzzyTermsEnum(Terms terms, AttributeSource atts, Term term, float minSimilarity, int prefixLength, boolean transpositions)
          Constructor for enumeration of all terms from the specified Terms that share a prefix of length prefixLength with term and have a fuzzy similarity > minSimilarity.
 
Method Summary
 int docFreq()
          Returns the number of documents containing the current term.
 DocsEnum docs(Bits liveDocs, DocsEnum reuse, int flags)
          Get DocsEnum for the current term, with control over whether freqs are required.
 DocsAndPositionsEnum docsAndPositions(Bits liveDocs, DocsAndPositionsEnum reuse, int flags)
          Get DocsAndPositionsEnum for the current term, with control over whether offsets and payloads are required.
protected  TermsEnum getAutomatonEnum(int editDistance, BytesRef lastTerm)
          Returns an automaton-based enum that matches terms within editDistance edits of the query term, resuming from lastTerm, if possible.
 Comparator<BytesRef> getComparator()
          Return the BytesRef Comparator used to sort terms provided by the iterator.
 float getMinSimilarity()
           
 float getScaleFactor()
           
protected  void maxEditDistanceChanged(BytesRef lastTerm, int maxEdits, boolean init)
           
 BytesRef next()
          Increments the iteration to the next BytesRef in the iterator.
 long ord()
          Returns ordinal position for current term.
 TermsEnum.SeekStatus seekCeil(BytesRef text, boolean useCache)
          Expert: just like TermsEnum.seekCeil(BytesRef) but allows you to control whether the implementation should attempt to use its term cache (if it uses one).
 boolean seekExact(BytesRef text, boolean useCache)
          Attempts to seek to the exact term, returning true if the term is found.
 void seekExact(BytesRef term, TermState state)
          Expert: Seeks a specific position by TermState previously obtained from TermsEnum.termState().
 void seekExact(long ord)
          Seeks to the specified term by ordinal (position) as previously returned by TermsEnum.ord().
protected  void setEnum(TermsEnum actualEnum)
          Swaps in a new actual enum to proxy to.
 BytesRef term()
          Returns current term.
 TermState termState()
          Expert: Returns the TermsEnum's internal state to position the TermsEnum without re-seeking the term dictionary.
 long totalTermFreq()
          Returns the total number of occurrences of this term across all documents (the sum of the freq() for each doc that has this term).
 
Methods inherited from class org.apache.lucene.index.TermsEnum
attributes, docs, docsAndPositions, seekCeil
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

minSimilarity

protected final float minSimilarity

scale_factor

protected final float scale_factor

termLength

protected final int termLength

maxEdits

protected int maxEdits

raw

protected final boolean raw

terms

protected final Terms terms

termText

protected final int[] termText

realPrefixLength

protected final int realPrefixLength
Constructor Detail

FuzzyTermsEnum

public FuzzyTermsEnum(Terms terms,
                      AttributeSource atts,
                      Term term,
                      float minSimilarity,
                      int prefixLength,
                      boolean transpositions)
               throws IOException
Constructor for enumeration of all terms from the specified Terms that share a prefix of length prefixLength with term and have a fuzzy similarity > minSimilarity.

After calling the constructor the enumeration is already pointing to the first valid term if such a term exists.

Parameters:
terms - Delivers terms.
atts - AttributeSource created by the rewrite method of MultiTermQuery that contains information about competitive boosts during rewrite. It is also used to cache DFAs between segment transitions.
term - Pattern term.
minSimilarity - Minimum required similarity for terms from the reader. Pass an integer value representing edit distance. Passing a fraction is deprecated.
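prefixLength - Length of required common prefix. Default value is 0.
transpositions - If true, transposing two adjacent characters counts as a single edit; if false, only insertions, deletions, and substitutions count (classic Levenshtein distance).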
Throws:
IOException - if there is a low-level IO error
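The constructor's documented parameter rules (an integer minSimilarity is an edit distance, a fractional one is deprecated, prefixLength defaults to 0) can be modeled as below. FuzzyParams is a hypothetical class whose fields mirror the protected fields listed on this page. The handling of the deprecated fractional form (maxEdits derived from the term length), the scale factor formula 1 / (1 - minSimilarity), the clamping of prefixLength, and the rejected inputs are assumptions based on our reading of Lucene 4.x, not guarantees of this API.

```java
/**
 * Hypothetical model of how the constructor arguments might be normalized.
 * Field names mirror the protected fields of FuzzyTermsEnum; the exact rules are assumptions.
 */
final class FuzzyParams {
  final int[] termText;       // query term as Unicode code points
  final int termLength;       // number of code points in the query term
  final int realPrefixLength; // prefixLength clamped to the term length
  final boolean raw;          // true when minSimilarity was given as an edit count
  final int maxEdits;
  final float minSimilarity;
  final float scaleFactor;

  FuzzyParams(String term, float minSimilarity, int prefixLength) {
    if (minSimilarity >= 1f && minSimilarity != (int) minSimilarity) {
      throw new IllegalArgumentException("fractional edit distances are not allowed");
    }
    if (minSimilarity < 0f) {
      throw new IllegalArgumentException("minSimilarity cannot be less than 0");
    }
    if (prefixLength < 0) {
      throw new IllegalArgumentException("prefixLength cannot be less than 0");
    }
    this.termText = term.codePoints().toArray();
    this.termLength = termText.length;
    this.realPrefixLength = Math.min(prefixLength, termLength);
    if (minSimilarity >= 1f) {
      // Preferred form: an integer value is taken directly as the maximum edit distance.
      this.raw = true;
      this.maxEdits = (int) minSimilarity;
      this.minSimilarity = 0f;
    } else {
      // Deprecated form (assumed conversion): a fraction becomes an edit budget relative to the term length.
      this.raw = false;
      this.maxEdits = (int) ((1.0 - minSimilarity) * termLength);
      this.minSimilarity = minSimilarity;
    }
    this.scaleFactor = 1.0f / (1.0f - this.minSimilarity);
  }
}
```

Under these assumptions, new FuzzyParams("lucene", 2f, 10) yields two raw edits with the prefix clamped to 6, while the deprecated form new FuzzyParams("lucene", 0.5f, 0) yields three edits and a scale factor of 2.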
Method Detail

getAutomatonEnum

protected TermsEnum getAutomatonEnum(int editDistance,
                                     BytesRef lastTerm)
                              throws IOException
Returns an automaton-based enum that matches terms within editDistance edits of the query term, resuming from lastTerm, if possible.

Throws:
IOException
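getAutomatonEnum relies on automata compiled per edit distance and kept in a list indexed by that distance (see LevenshteinAutomataAttributeImpl); because they depend only on the query term, they can be built once and reused for every segment. The sketch below imitates that list with a hypothetical MatcherLadder: entry k accepts candidates within k edits, and the exact distance of a candidate is the smallest k that accepts it. A dynamic-programming distance stands in for real Levenshtein automata, and the transpositions flag is modeled as the optimal-string-alignment variant, where swapping two adjacent characters costs one edit; both substitutions are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical stand-in for a per-edit-distance list of compiled automata.
 * Entry k accepts terms within k edits of the query; a DP check replaces a real automaton.
 */
final class MatcherLadder {
  private final List<Predicate<String>> matchers = new ArrayList<>();

  MatcherLadder(String query, int maxEdits, boolean transpositions) {
    for (int k = 0; k <= maxEdits; k++) {
      final int budget = k;
      matchers.add(candidate -> distance(query, candidate, transpositions) <= budget);
    }
  }

  /** Returns the smallest k whose matcher accepts the candidate, or -1 if none does. */
  int exactDistance(String candidate) {
    for (int k = 0; k < matchers.size(); k++) {
      if (matchers.get(k).test(candidate)) return k;
    }
    return -1;
  }

  /** Levenshtein distance; with transpositions, an adjacent swap costs 1 (optimal string alignment). */
  static int distance(String s, String t, boolean transpositions) {
    int[] a = s.codePoints().toArray();
    int[] b = t.codePoints().toArray();
    int[][] d = new int[a.length + 1][b.length + 1];
    for (int i = 0; i <= a.length; i++) d[i][0] = i;
    for (int j = 0; j <= b.length; j++) d[0][j] = j;
    for (int i = 1; i <= a.length; i++) {
      for (int j = 1; j <= b.length; j++) {
        int cost = a[i - 1] == b[j - 1] ? 0 : 1;
        d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), d[i - 1][j - 1] + cost);
        if (transpositions && i > 1 && j > 1 && a[i - 1] == b[j - 2] && a[i - 2] == b[j - 1]) {
          d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1); // adjacent swap as one edit
        }
      }
    }
    return d[a.length][b.length];
  }
}
```

With transpositions enabled, lucnee is one edit from lucene; without them it is two.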

setEnum

protected void setEnum(TermsEnum actualEnum)
Swaps in a new actual enum to proxy to.
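setEnum is a delegation point: FuzzyTermsEnum forwards its TermsEnum calls to an inner "actual" enum that can be replaced mid-iteration, for instance (as we understand the Lucene 4.x sources) from maxEditDistanceChanged when a narrower automaton becomes usable. The hypothetical SwappableIterator below shows the same pattern with java.util.Iterator; it is not Lucene code.

```java
import java.util.Iterator;

/** Hypothetical proxy whose underlying iterator can be swapped while callers keep one reference. */
final class SwappableIterator<T> implements Iterator<T> {
  private Iterator<T> actual;

  SwappableIterator(Iterator<T> initial) {
    this.actual = initial;
  }

  /** Replaces the delegate; every later call is served by the new iterator. */
  void setDelegate(Iterator<T> replacement) {
    this.actual = replacement;
  }

  @Override
  public boolean hasNext() {
    return actual.hasNext();
  }

  @Override
  public T next() {
    return actual.next();
  }
}
```

Callers hold only the outer object, so the swap is invisible to them; only the source of subsequent elements changes.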


maxEditDistanceChanged

protected void maxEditDistanceChanged(BytesRef lastTerm,
                                      int maxEdits,
                                      boolean init)
                               throws IOException
Throws:
IOException

next

public BytesRef next()
              throws IOException
Description copied from interface: BytesRefIterator
Increments the iteration to the next BytesRef in the iterator. Returns the resulting BytesRef or null if the end of the iterator is reached. The returned BytesRef may be re-used across calls to next. After this method returns null, do not call it again: the results are undefined.

Returns:
the next BytesRef in the iterator or null if the end of the iterator is reached.
Throws:
IOException - If there is a low-level I/O error.
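Because next() may return the same BytesRef instance on every call, a consumer that keeps terms beyond the current iteration must copy them. The sketch below reproduces that contract with a hypothetical iterator that hands out one shared StringBuilder, and contrasts a loop that copies with one that does not. It uses only the JDK; with Lucene itself the copy would typically be BytesRef.deepCopyOf(term) or term.utf8ToString().

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical iterator whose next() hands out one shared, mutable buffer, as BytesRefIterator allows. */
final class ReusingTermIterator {
  private final Iterator<String> source;
  private final StringBuilder buffer = new StringBuilder(); // the same object is returned on every call

  ReusingTermIterator(List<String> terms) {
    this.source = terms.iterator();
  }

  /** Returns the shared buffer filled with the next term, or null once the terms are exhausted. */
  StringBuilder next() {
    if (!source.hasNext()) return null;
    buffer.setLength(0);
    buffer.append(source.next());
    return buffer;
  }
}

final class DrainExample {
  /** Correct loop: stops at the first null and copies each term before the buffer is overwritten. */
  static List<String> drain(ReusingTermIterator it) {
    List<String> out = new ArrayList<>();
    for (StringBuilder t = it.next(); t != null; t = it.next()) {
      out.add(t.toString()); // toString() makes an independent copy
    }
    return out;
  }

  /** Buggy loop: keeps references to the shared buffer, so every entry ends up showing the last term. */
  static List<CharSequence> drainWithoutCopy(ReusingTermIterator it) {
    List<CharSequence> out = new ArrayList<>();
    for (StringBuilder t = it.next(); t != null; t = it.next()) {
      out.add(t);
    }
    return out;
  }
}
```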

docFreq

public int docFreq()
            throws IOException
Description copied from class: TermsEnum
Returns the number of documents containing the current term. Do not call this when the enum is unpositioned (for example, after a seek returned TermsEnum.SeekStatus.END).

Specified by:
docFreq in class TermsEnum
Throws:
IOException

totalTermFreq

public long totalTermFreq()
                   throws IOException
Description copied from class: TermsEnum
Returns the total number of occurrences of this term across all documents (the sum of the freq() for each doc that has this term). This will be -1 if the codec doesn't support this measure. Note that, like other term measures, this measure does not take deleted documents into account.

Specified by:
totalTermFreq in class TermsEnum
Throws:
IOException
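The two statistics relate as follows: docFreq counts the documents whose postings contain the term, and totalTermFreq adds up the per-document frequencies; neither subtracts deleted documents. The sketch uses a hypothetical Map from document ID to frequency in place of real postings, contrasts the index statistic with a live-documents-only sum, and checks for the -1 "unsupported" value before totalTermFreq is used.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.OptionalDouble;

/** Hypothetical per-term postings model: docID -> within-document frequency of the term. */
final class TermStats {
  /** docFreq: how many documents contain the term. Deleted documents are still counted. */
  static int docFreq(Map<Integer, Integer> postings) {
    return postings.size();
  }

  /** totalTermFreq: the sum of freq() over every document containing the term, deletions included. */
  static long totalTermFreq(Map<Integer, Integer> postings) {
    long sum = 0;
    for (int freq : postings.values()) {
      sum += freq;
    }
    return sum;
  }

  /** For contrast only: the same sum over live documents, which the index statistic does not compute. */
  static long liveTotalTermFreq(Map<Integer, Integer> postings, BitSet liveDocs) {
    long sum = 0;
    for (Map.Entry<Integer, Integer> e : postings.entrySet()) {
      if (liveDocs.get(e.getKey())) {
        sum += e.getValue();
      }
    }
    return sum;
  }

  /** Average occurrences per containing document; empty when totalTermFreq is -1 (unsupported by the codec). */
  static OptionalDouble meanFreq(long totalTermFreq, int docFreq) {
    if (totalTermFreq == -1 || docFreq == 0) {
      return OptionalDouble.empty();
    }
    return OptionalDouble.of((double) totalTermFreq / docFreq);
  }
}
```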

docs

public DocsEnum docs(Bits liveDocs,
                     DocsEnum reuse,
                     int flags)
              throws IOException
Description copied from class: TermsEnum
Get DocsEnum for the current term, with control over whether freqs are required. Do not call this when the enum is unpositioned. This method will not return null.

Specified by:
docs in class TermsEnum
Parameters:
liveDocs - unset bits are documents that should not be returned
reuse - pass a prior DocsEnum for possible reuse
flags - specifies which optional per-document values you require; see DocsEnum.FLAG_FREQS
Throws:
IOException
See Also:
TermsEnum.docs(Bits, DocsEnum, int)
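The liveDocs parameter acts as a filter over the postings: a document is returned only if its bit is set. The sketch uses java.util.BitSet in place of Lucene's Bits interface and treats a null liveDocs as "no deletions", which matches how Lucene 4.x readers report segments without deleted documents (an assumption worth checking for your version). LiveDocsFilter is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Hypothetical model of how liveDocs restricts the documents a DocsEnum returns. */
final class LiveDocsFilter {
  /**
   * Returns the doc IDs of a posting list that should be visited. A null liveDocs means
   * every document is live; otherwise an unset bit means the document must be skipped.
   */
  static List<Integer> visibleDocs(int[] postingDocIds, BitSet liveDocs) {
    List<Integer> out = new ArrayList<>();
    for (int doc : postingDocIds) {
      if (liveDocs == null || liveDocs.get(doc)) {
        out.add(doc);
      }
    }
    return out;
  }
}
```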

docsAndPositions

public DocsAndPositionsEnum docsAndPositions(Bits liveDocs,
                                             DocsAndPositionsEnum reuse,
                                             int flags)
                                      throws IOException
Description copied from class: TermsEnum
Get DocsAndPositionsEnum for the current term, with control over whether offsets and payloads are required. Some codecs may be able to optimize their implementation when offsets and/or payloads are not required. Do not call this when the enum is unpositioned. This will return null if positions were not indexed.

Specified by:
docsAndPositions in class TermsEnum
Parameters:
liveDocs - unset bits are documents that should not be returned
reuse - pass a prior DocsAndPositionsEnum for possible reuse
flags - specifies which optional per-position values you require; see DocsAndPositionsEnum.FLAG_OFFSETS and DocsAndPositionsEnum.FLAG_PAYLOADS.
Throws:
IOException
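The flags argument is a bit mask: each optional per-position feature has its own bit, and callers OR together only the features they need. In the sketch, OFFSETS and PAYLOADS are local stand-ins for DocsAndPositionsEnum.FLAG_OFFSETS and DocsAndPositionsEnum.FLAG_PAYLOADS; their numeric values here are assumptions, so real code should use the Lucene constants. PositionFlags is hypothetical.

```java
/** Hypothetical decoder showing how per-position flags combine as a bit mask. */
final class PositionFlags {
  static final int OFFSETS = 0x1;  // stand-in for DocsAndPositionsEnum.FLAG_OFFSETS (value assumed)
  static final int PAYLOADS = 0x2; // stand-in for DocsAndPositionsEnum.FLAG_PAYLOADS (value assumed)

  /** Describes which optional per-position data would need decoding for the given flags. */
  static String describe(int flags) {
    boolean offsets = (flags & OFFSETS) != 0;
    boolean payloads = (flags & PAYLOADS) != 0;
    if (offsets && payloads) return "positions+offsets+payloads";
    if (offsets) return "positions+offsets";
    if (payloads) return "positions+payloads";
    return "positions only";
  }
}
```

Passing 0 asks for positions only, which leaves the codec the most room to skip work. Independently of the flags, docsAndPositions returns null when positions were not indexed, so check the result.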

seekExact

public void seekExact(BytesRef term,
                      TermState state)
               throws IOException
Description copied from class: TermsEnum
Expert: Seeks a specific position by TermState previously obtained from TermsEnum.termState(). Callers should maintain the TermState to use this method. Low-level implementations may position the TermsEnum without re-seeking the term dictionary.

Seeking by TermState should only be used if the enum the state was obtained from and the enum the state is used for seeking are obtained from the same IndexReader.

NOTE: Using this method with an incompatible TermState might leave this TermsEnum in an undefined state. On a segment level, TermState instances are compatible only if the source and the target TermsEnum operate on the same field. If operating on segment level, TermState instances must not be used across segments.

NOTE: A seek by TermState might not restore the AttributeSource's state. AttributeSource states must be maintained separately if this method is used.

Overrides:
seekExact in class TermsEnum
Parameters:
term - the term the TermState corresponds to
state - the TermState
Throws:
IOException

termState

public TermState termState()
                    throws IOException
Description copied from class: TermsEnum
Expert: Returns the TermsEnum's internal state to position the TermsEnum without re-seeking the term dictionary.

NOTE: A seek by TermState might not capture the AttributeSource's state. Callers must maintain the AttributeSource states separately.

Overrides:
termState in class TermsEnum
Throws:
IOException
See Also:
TermState, TermsEnum.seekExact(BytesRef, TermState)
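termState() and seekExact(BytesRef, TermState) form a save-and-restore pair: the state records where the enum was, so restoring it needs no dictionary lookup. The hypothetical TermCursor below models this over a sorted String array and counts binary searches to make the saved lookup visible. Its State is only meaningful for a cursor over the same array, mirroring the same-field, same-segment restriction of the real API, and it carries no attribute state.

```java
import java.util.Arrays;

/** Hypothetical cursor over a sorted term array illustrating termState() and seekExact(term, state). */
final class TermCursor {
  /** Opaque snapshot; only valid for a cursor over the same term array. */
  static final class State {
    private final int position;

    private State(int position) {
      this.position = position;
    }
  }

  private final String[] sortedTerms;
  private int position = -1; // -1 means unpositioned
  int dictionaryLookups = 0; // counts binary searches so the shortcut is observable

  TermCursor(String[] sortedTerms) {
    this.sortedTerms = sortedTerms;
  }

  /** Exact seek via binary search; leaves the cursor unpositioned when the term is absent. */
  boolean seekExact(String term) {
    dictionaryLookups++;
    int idx = Arrays.binarySearch(sortedTerms, term);
    position = idx >= 0 ? idx : -1;
    return idx >= 0;
  }

  State termState() {
    if (position < 0) throw new IllegalStateException("unpositioned");
    return new State(position);
  }

  /** Restores a captured state without searching; term mirrors the documented signature and is trusted. */
  void seekExact(String term, State state) {
    position = state.position;
  }

  String term() {
    if (position < 0) throw new IllegalStateException("unpositioned");
    return sortedTerms[position];
  }
}
```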

getComparator

public Comparator<BytesRef> getComparator()
Description copied from interface: BytesRefIterator
Return the BytesRef Comparator used to sort terms provided by the iterator. This may return null if there are no items or the iterator is not sorted. Callers may invoke this method many times, so it's best to cache a single instance and reuse it.
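The description recommends caching the comparator. The sketch holds one in a static final field and, as an assumption about the term order, compares the UTF-8 bytes of terms as unsigned values, which orders terms by Unicode code point (the order behind BytesRef.getUTF8SortedAsUnicodeComparator() in Lucene 4.x). TermOrder is a hypothetical class.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;

/** Hypothetical term order: unsigned lexicographic comparison of UTF-8 bytes, cached once. */
final class TermOrder {
  static final Comparator<byte[]> UNSIGNED_BYTES = (a, b) -> {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = Byte.toUnsignedInt(a[i]) - Byte.toUnsignedInt(b[i]);
      if (diff != 0) return diff;
    }
    return a.length - b.length; // a proper prefix sorts first
  };

  static int compare(String x, String y) {
    return UNSIGNED_BYTES.compare(x.getBytes(StandardCharsets.UTF_8), y.getBytes(StandardCharsets.UTF_8));
  }
}
```

Comparing the bytes as signed Java bytes would be a bug: the first UTF-8 byte of é (0xC3) is negative as a signed byte and would sort before z (0x7A).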


ord

public long ord()
         throws IOException
Description copied from class: TermsEnum
Returns ordinal position for current term. This is an optional method (the codec may throw UnsupportedOperationException). Do not call this when the enum is unpositioned.

Specified by:
ord in class TermsEnum
Throws:
IOException

seekExact

public boolean seekExact(BytesRef text,
                         boolean useCache)
                  throws IOException
Description copied from class: TermsEnum
Attempts to seek to the exact term, returning true if the term is found. If this returns false, the enum is unpositioned. For some codecs, seekExact may be substantially faster than TermsEnum.seekCeil(org.apache.lucene.util.BytesRef, boolean).

Overrides:
seekExact in class TermsEnum
Throws:
IOException

seekCeil

public TermsEnum.SeekStatus seekCeil(BytesRef text,
                                     boolean useCache)
                              throws IOException
Description copied from class: TermsEnum
Expert: just like TermsEnum.seekCeil(BytesRef) but allows you to control whether the implementation should attempt to use its term cache (if it uses one).

Specified by:
seekCeil in class TermsEnum
Throws:
IOException
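seekCeil has three outcomes, mirrored by TermsEnum.SeekStatus: FOUND when the exact term exists, NOT_FOUND when the enum is positioned on the next greater term instead, and END when no greater term exists. seekExact differs in that a miss leaves the enum unpositioned. The hypothetical CeilCursor below models both over a sorted String array and omits the useCache hint; String order is UTF-16 order, which agrees with Lucene's byte order for the ASCII terms used here.

```java
import java.util.Arrays;

/** Local mirror of the three TermsEnum.SeekStatus outcomes. */
enum SeekStatus { FOUND, NOT_FOUND, END }

/** Hypothetical sorted-array cursor illustrating seekCeil versus seekExact. */
final class CeilCursor {
  private final String[] sortedTerms;
  private int position = -1; // -1 means unpositioned

  CeilCursor(String[] sortedTerms) {
    this.sortedTerms = sortedTerms;
  }

  /** Positions on text if present, otherwise on the smallest term greater than text. */
  SeekStatus seekCeil(String text) {
    int idx = Arrays.binarySearch(sortedTerms, text);
    if (idx >= 0) {
      position = idx;
      return SeekStatus.FOUND;
    }
    int insertion = -idx - 1; // index of the first term greater than text
    if (insertion == sortedTerms.length) {
      position = -1; // END leaves the cursor unpositioned
      return SeekStatus.END;
    }
    position = insertion;
    return SeekStatus.NOT_FOUND;
  }

  /** Succeeds only on an exact match; on failure the cursor is left unpositioned. */
  boolean seekExact(String text) {
    int idx = Arrays.binarySearch(sortedTerms, text);
    position = idx >= 0 ? idx : -1;
    return idx >= 0;
  }

  String term() {
    if (position < 0) throw new IllegalStateException("unpositioned");
    return sortedTerms[position];
  }
}
```

For the terms [apple, lucene, solr], seekCeil("banana") returns NOT_FOUND and leaves the cursor on lucene, whereas seekExact("banana") returns false.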

seekExact

public void seekExact(long ord)
               throws IOException
Description copied from class: TermsEnum
Seeks to the specified term by ordinal (position) as previously returned by TermsEnum.ord(). The target ord may be before or after the current ord, and must be within bounds.

Specified by:
seekExact in class TermsEnum
Throws:
IOException
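Ordinals number the terms in sort order, so ord() and seekExact(long) let a caller jump directly to any position, backward or forward. The hypothetical OrdCursor below models this over a sorted array. Two details are assumptions of the sketch rather than the documented contract: it rejects out-of-range ordinals with an exception, whereas the contract simply requires the target to be in bounds, and it always supports ordinals, whereas a real codec may throw UnsupportedOperationException.

```java
/** Hypothetical cursor showing ord() and seekExact(long ord) over a sorted term array. */
final class OrdCursor {
  private final String[] sortedTerms;
  private int position = -1; // -1 means unpositioned

  OrdCursor(String[] sortedTerms) {
    this.sortedTerms = sortedTerms;
  }

  /** Ordinal of the current term; only valid when positioned. */
  long ord() {
    if (position < 0) throw new IllegalStateException("unpositioned");
    return position;
  }

  /** Jumps to any in-bounds ordinal, forward or backward. */
  void seekExact(long ord) {
    if (ord < 0 || ord >= sortedTerms.length) {
      throw new IllegalArgumentException("ord out of bounds: " + ord);
    }
    position = (int) ord;
  }

  String term() {
    if (position < 0) throw new IllegalStateException("unpositioned");
    return sortedTerms[position];
  }
}
```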

term

public BytesRef term()
              throws IOException
Description copied from class: TermsEnum
Returns current term. Do not call this when the enum is unpositioned.

Specified by:
term in class TermsEnum
Throws:
IOException

getMinSimilarity

public float getMinSimilarity()
NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.

getScaleFactor

public float getScaleFactor()
NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.