Class HighFrequencyDictionary

  • All Implemented Interfaces:
    Dictionary

    public class HighFrequencyDictionary
    extends Object
    implements Dictionary
    A Dictionary of terms taken from the given field of a Lucene index that appear in a fraction of the documents at or above a given threshold.

    The threshold is a value in [0..1] representing the minimum fraction of the total number of documents in which a term must appear.

    Based on LuceneDictionary.

    • Constructor Detail

      • HighFrequencyDictionary

        public HighFrequencyDictionary(IndexReader reader,
                                       String field,
                                       float thresh)
        Creates a new Dictionary, pulling source terms from the specified field in the provided reader.

        Terms appearing in fewer than thresh (a fraction in [0..1], not a percentage in [0..100]) of the documents are excluded.
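
The threshold rule described above can be illustrated with a minimal, library-free sketch. It does not use Lucene and is not the class's actual implementation: `ThresholdSketch`, `isFrequent`, and `frequentTerms` are hypothetical names, the document frequencies are made-up sample data, and the exact rounding behavior of the real class at the boundary is an assumption here (this sketch keeps a term when its document count is greater than or equal to thresh times the total).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the threshold rule; not part of Lucene.
class ThresholdSketch {

    // True if a term found in docFreq of numDocs documents meets the threshold.
    // thresh is a fraction in [0..1], e.g. 0.5f means "at least half of the documents".
    static boolean isFrequent(int docFreq, int numDocs, float thresh) {
        if (thresh < 0f || thresh > 1f) {
            throw new IllegalArgumentException("thresh must be in [0..1]");
        }
        return docFreq >= thresh * numDocs;
    }

    // Returns the terms whose document frequency meets the threshold,
    // in the iteration order of the input map.
    static List<String> frequentTerms(Map<String, Integer> docFreqs, int numDocs, float thresh) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            if (isFrequent(e.getValue(), numDocs, thresh)) {
                kept.add(e.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Sample data (assumed): term -> number of documents containing it, out of 4.
        Map<String, Integer> docFreqs = new LinkedHashMap<>();
        docFreqs.put("lucene", 4);
        docFreqs.put("index", 2);
        docFreqs.put("rare", 1);

        // With thresh = 0.5f, a term must appear in at least 2 of 4 documents.
        System.out.println(frequentTerms(docFreqs, 4, 0.5f)); // prints [lucene, index]
    }
}
```

With 4 documents and thresh = 0.5f, "rare" (1 document, i.e. 25%) is excluded, while "index" (2 documents, exactly 50%) is kept under the boundary assumption stated above.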