public class HighFreqTerms extends Object
The HighFreqTerms class extracts the top n most frequent terms (ranked by document frequency) from an existing Lucene index and reports the document frequency of each.
If the -t flag is given, both the document frequency and the total term frequency (total tf: the total number of occurrences across all documents) are reported, ordered by descending total tf.
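The difference between the two statistics is easiest to see on a toy corpus. The following is a minimal sketch, assuming documents are already tokenized into lists of strings; FreqDemo and its methods are hypothetical and not part of Lucene.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not part of Lucene): contrasts document frequency
// with total term frequency on a tiny in-memory corpus.
public class FreqDemo {

    // Number of documents in which each term appears at least once.
    public static Map<String, Integer> docFreq(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            Set<String> seen = new HashSet<>(doc); // count each term once per document
            for (String term : seen) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    // Total number of occurrences of each term across all documents.
    public static Map<String, Long> totalTermFreq(List<List<String>> docs) {
        Map<String, Long> ttf = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : doc) {
                ttf.merge(term, 1L, Long::sum);
            }
        }
        return ttf;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("lucene", "lucene", "lucene", "lucene", "index"),
            List.of("index", "search"),
            List.of("index", "search"));
        Map<String, Integer> df = docFreq(docs);
        Map<String, Long> ttf = totalTermFreq(docs);
        System.out.println("lucene: df=" + df.get("lucene") + ", ttf=" + ttf.get("lucene")); // lucene: df=1, ttf=4
        System.out.println("index: df=" + df.get("index") + ", ttf=" + ttf.get("index"));    // index: df=3, ttf=3
    }
}
```

Here "index" has the highest document frequency (3), but "lucene" has the highest total tf (4), so ordering by descending total tf, as the -t mode does, would list "lucene" first.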
| Modifier and Type | Field and Description |
|---|---|
| static int | DEFAULTnumTerms |
| static int | numTerms |
| Constructor and Description |
|---|
| HighFreqTerms() |
| Modifier and Type | Method and Description |
|---|---|
| static TermStats[] | getHighFreqTerms(IndexReader reader, int numTerms, String field): Returns a TermStats[] ordered by docFreq, highest first. |
| static long | getTotalTermFreq(IndexReader reader, Term term): Returns the total number of occurrences (total tf) of term. |
| static void | main(String[] args): Command-line entry point; accepts the -t flag described above. |
| static TermStats[] | sortByTotalTermFreq(IndexReader reader, TermStats[] terms): Takes an array of TermStats and returns the terms ordered by descending total tf. |
public static final int DEFAULTnumTerms
public static int numTerms
public static TermStats[] getHighFreqTerms(IndexReader reader, int numTerms, String field) throws Exception
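getHighFreqTerms ranks terms by docFreq and keeps numTerms of them. A common way to pick the top n from a stream of candidates is a bounded min-heap; the sketch below uses that approach for illustration only (it is not taken from Lucene's source), and TopTermsSketch and TermCount are hypothetical. It needs Java 16 or later for records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not Lucene's implementation: select the numTerms
// terms with the highest docFreq using a bounded min-heap.
public class TopTermsSketch {

    // Hypothetical stand-in for Lucene's TermStats (term text + docFreq).
    public record TermCount(String term, int docFreq) {}

    public static List<TermCount> topByDocFreq(Iterable<TermCount> terms, int numTerms) {
        if (numTerms <= 0) {
            return new ArrayList<>(); // nothing requested
        }
        // Min-heap on docFreq: the head is the weakest of the current top-n.
        PriorityQueue<TermCount> heap =
            new PriorityQueue<>(Comparator.comparingInt(TermCount::docFreq));
        for (TermCount t : terms) {
            if (heap.size() < numTerms) {
                heap.add(t);
            } else if (t.docFreq() > heap.peek().docFreq()) {
                heap.poll(); // evict the current minimum
                heap.add(t);
            }
        }
        // Copy out of the heap and order highest docFreq first.
        List<TermCount> result = new ArrayList<>(heap);
        result.sort((a, b) -> Integer.compare(b.docFreq(), a.docFreq()));
        return result;
    }

    public static void main(String[] args) {
        List<TermCount> terms = List.of(
            new TermCount("index", 3), new TermCount("lucene", 1),
            new TermCount("search", 2), new TermCount("the", 5));
        System.out.println(topByDocFreq(terms, 2));
    }
}
```

The heap holds at most numTerms entries, so memory stays bounded no matter how many distinct terms the field contains.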
public static TermStats[] sortByTotalTermFreq(IndexReader reader, TermStats[] terms) throws Exception
Takes an array of TermStats (terms) and returns the terms ordered by descending total tf.
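Per its description, sortByTotalTermFreq attaches each term's total tf and reorders the terms by it. The sketch below imitates that on plain Java objects; Stats and the totalTf lookup function are hypothetical stand-ins for TermStats and getTotalTermFreq, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.ToLongFunction;

// Hypothetical sketch: attach each term's total tf, then order by
// descending total tf.
public class SortByTotalTfSketch {

    // Hypothetical stand-in for TermStats carrying both statistics.
    public record Stats(String term, int docFreq, long totalTermFreq) {}

    public static List<Stats> sortByTotalTermFreq(List<Stats> terms, ToLongFunction<String> totalTf) {
        List<Stats> out = new ArrayList<>(terms.size());
        for (Stats s : terms) {
            // Fill in the total tf; against a real index this lookup would read postings.
            out.add(new Stats(s.term(), s.docFreq(), totalTf.applyAsLong(s.term())));
        }
        out.sort((a, b) -> Long.compare(b.totalTermFreq(), a.totalTermFreq()));
        return out;
    }

    public static void main(String[] args) {
        // Toy totals (hypothetical data); input is already in docFreq order.
        Map<String, Long> totals = Map.of("index", 3L, "search", 2L, "lucene", 4L);
        List<Stats> byDocFreq = List.of(
            new Stats("index", 3, 0), new Stats("search", 2, 0), new Stats("lucene", 1, 0));
        for (Stats s : sortByTotalTermFreq(byDocFreq, t -> totals.getOrDefault(t, 0L))) {
            System.out.println(s.term() + " df=" + s.docFreq() + " ttf=" + s.totalTermFreq());
        }
        // prints lucene (ttf=4), then index (ttf=3), then search (ttf=2)
    }
}
```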
public static long getTotalTermFreq(IndexReader reader, Term term) throws Exception
Returns the total number of occurrences (total tf) of term in the index read by reader.
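The total tf of one term is the sum of its within-document frequencies over every document that contains it. A minimal sketch over hypothetical in-memory postings (the Posting record and the per-segment grouping are assumptions for illustration, not Lucene's data structures):

```java
import java.util.List;

// Hypothetical sketch: total tf of a term = sum of its per-document
// frequencies across all documents that contain it.
public class TotalTermFreqSketch {

    // Hypothetical posting: a document id and the term's frequency in that document.
    public record Posting(int docId, int freq) {}

    // Each inner list stands in for the term's postings in one index segment.
    public static long totalTermFreq(List<List<Posting>> postingsPerSegment) {
        long total = 0; // long, matching getTotalTermFreq's return type
        for (List<Posting> segment : postingsPerSegment) {
            for (Posting p : segment) {
                total += p.freq();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<Posting>> postings = List.of(
            List.of(new Posting(0, 4), new Posting(2, 1)),
            List.of(new Posting(0, 2)));
        System.out.println(totalTermFreq(postings)); // 7
    }
}
```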
Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.