Lucene 3.0.3 API

Apache Lucene is a high-performance, full-featured text search engine library.


org.apache.lucene Top-level package.
org.apache.lucene.analysis API and code to convert text into indexable/searchable tokens.
org.apache.lucene.analysis.standard A fast grammar-based tokenizer constructed with JFlex.
org.apache.lucene.document The logical representation of a Document for indexing and searching.
org.apache.lucene.index Code to maintain and access indices.
org.apache.lucene.messages Native Language Support (NLS), a system for software internationalization.
org.apache.lucene.queryParser A simple query parser implemented with JavaCC.
org.apache.lucene.search Code to search indices.
org.apache.lucene.search.function Programmatic control over document scores.
org.apache.lucene.search.payloads The payloads package provides Query mechanisms for finding and using payloads.
org.apache.lucene.search.spans The calculus of spans.
org.apache.lucene.store Binary I/O API, used for all index data.
org.apache.lucene.util Some utility classes.




contrib: Analysis
org.apache.lucene.analysis.ar Analyzer for Arabic.
org.apache.lucene.analysis.br Analyzer for Brazilian Portuguese.
org.apache.lucene.analysis.cjk Analyzer for Chinese, Japanese, and Korean, which indexes bigrams (overlapping groups of two adjacent Han characters).
org.apache.lucene.analysis.cn Analyzer for Chinese, which indexes unigrams (individual Chinese characters).
org.apache.lucene.analysis.cn.smart Analyzer for Simplified Chinese, which indexes words.
org.apache.lucene.analysis.cn.smart.hhmm SmartChineseAnalyzer Hidden Markov Model package.
org.apache.lucene.analysis.compound A filter that decomposes compound words found in many Germanic languages into their word parts.
org.apache.lucene.analysis.compound.hyphenation The code for the compound word hyphenation is taken from the Apache FOP project.
org.apache.lucene.analysis.cz Analyzer for Czech.
org.apache.lucene.analysis.de Analyzer for German.
org.apache.lucene.analysis.el Analyzer for Greek.
org.apache.lucene.analysis.fa Analyzer for Persian.
org.apache.lucene.analysis.fr Analyzer for French.
org.apache.lucene.analysis.miscellaneous Miscellaneous TokenStreams.
org.apache.lucene.analysis.ngram Character n-gram tokenizers and filters.
org.apache.lucene.analysis.nl Analyzer for Dutch.
org.apache.lucene.analysis.payloads Provides various convenience classes for creating payloads on Tokens.
org.apache.lucene.analysis.position Filter for assigning position increments.
org.apache.lucene.analysis.query Automatically filter high-frequency stopwords.
org.apache.lucene.analysis.reverse Filter to reverse token text.
org.apache.lucene.analysis.ru Analyzer for Russian.
org.apache.lucene.analysis.shingle Word n-gram filters.
org.apache.lucene.analysis.sinks Implementations of the SinkTokenizer that might be useful.
org.apache.lucene.analysis.th Analyzer for Thai.
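
Several of the analysis packages above index n-grams: overlapping character bigrams for CJK text, and word n-grams ("shingles"). The following is a minimal plain-Java sketch of those two ideas only. It does not use or reproduce the Lucene tokenizers and filters; the class name NGramSketch and its methods are hypothetical, and the bigram method works on UTF-16 code units, so it assumes text without surrogate pairs.

```java
import java.util.ArrayList;
import java.util.List;

/** Standalone illustration of n-gram generation; not a Lucene class. */
public class NGramSketch {

    /** Overlapping character bigrams: "ABC" yields "AB" and "BC". */
    static List<String> charBigrams(String text) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i + 2 <= text.length(); i++) {
            out.add(text.substring(i, i + 2));
        }
        return out;
    }

    /** Word n-grams ("shingles") of a fixed size (size >= 1), joined by single spaces. */
    static List<String> shingles(String text, int size) {
        String[] words = text.trim().split("\\s+");
        List<String> out = new ArrayList<String>();
        for (int i = 0; i + size <= words.length; i++) {
            StringBuilder sb = new StringBuilder(words[i]);
            for (int j = 1; j < size; j++) {
                sb.append(' ').append(words[i + j]);
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        // Three Han characters (U+6771 U+4EAC U+90FD) give two overlapping bigrams.
        System.out.println(charBigrams("\u6771\u4EAC\u90FD"));
        // Four words give three two-word shingles.
        System.out.println(shingles("please divide this sentence", 2));
    }
}
```

Indexing overlapping units like this trades a larger index for the ability to match text that has no reliable word boundaries (bigrams) or to match short phrases as single terms (shingles).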


contrib: Ant
org.apache.lucene.ant Ant task to create Lucene indexes.


contrib: Benchmark

The benchmark contribution contains tools for benchmarking Lucene using standard, freely available corpora.

org.apache.lucene.benchmark.byTask Benchmarking Lucene By Tasks.
org.apache.lucene.benchmark.byTask.feeds Sources for benchmark inputs: documents and queries.
org.apache.lucene.benchmark.byTask.programmatic Sample performance test written programmatically - no algorithm file is needed here.
org.apache.lucene.benchmark.byTask.stats Statistics maintained when running benchmark tasks.
org.apache.lucene.benchmark.byTask.tasks Extendable benchmark tasks.
org.apache.lucene.benchmark.byTask.utils Utilities used for the benchmark, and for the reports.
org.apache.lucene.benchmark.quality Search Quality Benchmarking.
org.apache.lucene.benchmark.quality.trec Utilities for Trec related quality benchmarking, feeding from Trec Topics and QRels inputs.
org.apache.lucene.benchmark.quality.utils Miscellaneous utilities for search quality benchmarking: query parsing, submission reports.


contrib: Collation
org.apache.lucene.collation CollationKeyFilter and ICUCollationKeyFilter convert each token into its binary CollationKey using the provided Collator, and then encode the CollationKey as a String using IndexableBinaryStringTools, to allow it to be stored as an index term.
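
To make the first half of that pipeline concrete, here is a minimal sketch that uses only the JDK's java.text.Collator. It shows a term being turned into a binary CollationKey and shows that comparing the key bytes reproduces the collator's ordering. The String-encoding step performed by IndexableBinaryStringTools is not shown, and the class name CollationKeySketch and its helper methods are hypothetical.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

/** Standalone illustration of binary collation keys; not a Lucene class. */
public class CollationKeySketch {

    /** Returns the binary collation key of a term under the given collator. */
    static byte[] keyBytes(Collator collator, String term) {
        CollationKey key = collator.getCollationKey(term);
        return key.toByteArray();
    }

    /** Compares two byte arrays lexicographically, treating bytes as unsigned. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        // Raw code-unit order puts "Banana" first, because 'B' < 'a'.
        System.out.println("Banana".compareTo("apple") < 0);
        // Collation order puts "apple" first, and so do the key bytes.
        System.out.println(collator.compare("apple", "Banana") < 0);
        System.out.println(compareUnsigned(keyBytes(collator, "apple"),
                                           keyBytes(collator, "Banana")) < 0);
    }
}
```

Because the key bytes sort the same way the collator does, terms stored in this form can be ordered and range-filtered by plain byte comparison, without consulting the collator at search time.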


contrib: DB
com.sleepycat.db
org.apache.lucene.store.db Berkeley DB 4.3 based implementation of Directory.
org.apache.lucene.store.je Berkeley DB Java Edition based implementation of Directory.


contrib: Fast Vector Highlighter
org.apache.lucene.search.vectorhighlight This is another highlighter implementation.


contrib: Highlighter
org.apache.lucene.search.highlight The highlight package contains classes to provide "keyword in context" features typically used to highlight search terms in the text of results pages.


contrib: Instantiated
org.apache.lucene.store.instantiated InstantiatedIndex, alternative RAM store for small corpora.


contrib: Lucli
lucli Lucene Command Line Interface


contrib: Memory
org.apache.lucene.index.memory High-performance single-document main memory Apache Lucene fulltext search index.


contrib: Misc
org.apache.lucene.queryParser.analyzing QueryParser that passes Fuzzy-, Prefix-, Range-, and WildcardQuerys through the given analyzer.
org.apache.lucene.queryParser.precedence QueryParser designed to handle operator precedence in a more sensible fashion than the default QueryParser.


contrib: Queries
org.apache.lucene.search.similar Document similarity query generators.


contrib: Query Parser
org.apache.lucene.queryParser.complexPhrase QueryParser which permits complex phrase query syntax, e.g. "(john jon jonathan~) peters*"
org.apache.lucene.queryParser.core Contains the core classes of the flexible query parser framework
org.apache.lucene.queryParser.core.builders Contains the necessary classes to implement query builders
org.apache.lucene.queryParser.core.config Contains the base classes used to configure the query processing
org.apache.lucene.queryParser.core.messages Contains messages usually used by query parser implementations
org.apache.lucene.queryParser.core.nodes Contains query nodes that are commonly used by query parser implementations
org.apache.lucene.queryParser.core.parser Contains the necessary interfaces to implement text parsers
org.apache.lucene.queryParser.core.processors Interfaces and implementations used by query node processors
org.apache.lucene.queryParser.core.util Utility classes to be used with the Query Parser
org.apache.lucene.queryParser.standard Contains the implementation of the Lucene query parser using the flexible query parser framework
org.apache.lucene.queryParser.standard.builders Standard Lucene Query Node Builders
org.apache.lucene.queryParser.standard.config Standard Lucene Query Configuration
org.apache.lucene.queryParser.standard.nodes Standard Lucene Query Nodes
org.apache.lucene.queryParser.standard.parser Lucene Query Parser
org.apache.lucene.queryParser.standard.processors Lucene Query Node Processors


contrib: RegEx
org.apache.lucene.search.regex Regular expression Query.
org.apache.regexp This package exists to allow access to useful package protected data within Jakarta Regexp.


contrib: Snowball
org.apache.lucene.analysis.snowball TokenFilter and Analyzer implementations that use Snowball stemmers.


contrib: Spatial
org.apache.lucene.spatial.geohash Support for Geohash encoding, decoding, and filtering.
org.apache.lucene.spatial.tier Support for filtering based upon geographic location.
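
For readers unfamiliar with geohashes, the following plain-Java sketch shows the general encoding idea: longitude and latitude bits are interleaved by repeated bisection, and every five bits become one base-32 character. It is a standalone illustration under that assumption, not the code in org.apache.lucene.spatial.geohash; the class name GeohashSketch and the method encode(lat, lon, length) are hypothetical.

```java
/** Standalone illustration of geohash encoding; not a Lucene class. */
public class GeohashSketch {

    // The geohash base-32 alphabet omits the letters a, i, l and o.
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes a latitude/longitude pair as a geohash of the given length. */
    static String encode(double lat, double lon, int length) {
        double latLo = -90, latHi = 90, lonLo = -180, lonHi = 180;
        StringBuilder hash = new StringBuilder();
        boolean useLon = true; // bits alternate, starting with longitude
        int bits = 0;
        int value = 0;
        while (hash.length() < length) {
            if (useLon) {
                double mid = (lonLo + lonHi) / 2;
                if (lon >= mid) { value = (value << 1) | 1; lonLo = mid; }
                else            { value = value << 1;       lonHi = mid; }
            } else {
                double mid = (latLo + latHi) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latLo = mid; }
                else            { value = value << 1;       latHi = mid; }
            }
            useLon = !useLon;
            if (++bits == 5) { // five bits make one base-32 character
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(57.64911, 10.40744, 5)); // prints "u4pru"
    }
}
```

A shorter hash of the same point is always a prefix of a longer one, which is what makes geohashes convenient for filtering: points that share a prefix fall inside the same bounding cell.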


contrib: SpellChecker
org.apache.lucene.search.spell Suggest alternate spellings for words.


contrib: Surround Parser
org.apache.lucene.queryParser.surround.parser This package contains the QueryParser.jj source file for the Surround parser.
org.apache.lucene.queryParser.surround.query This package contains SrndQuery and its subclasses.


contrib: Swing
org.apache.lucene.swing.models Decorators for JTable TableModel and JList ListModel encapsulating Lucene indexing and searching functionality.


contrib: Wikipedia
org.apache.lucene.wikipedia.analysis Tokenizer that is aware of Wikipedia syntax.


contrib: WordNet
org.apache.lucene.wordnet This package uses synonyms defined by WordNet.


contrib: XML Query Parser
org.apache.lucene.xmlparser Parser that produces Lucene Query objects from XML streams.  


Other Packages


Apache Lucene is a high-performance, full-featured text search engine library. Here's a simple example of how to use Lucene for indexing and searching (using JUnit to check that the results are what we expect):

    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);

    // Store the index in memory:
    Directory directory = new RAMDirectory();
    // To store an index on disk, use this instead:
    //Directory directory = FSDirectory.open(new File("/tmp/testindex"));
    IndexWriter iwriter = new IndexWriter(directory, analyzer, true,
                                          new IndexWriter.MaxFieldLength(25000));
    Document doc = new Document();
    String text = "This is the text to be indexed.";
    // Store the field value and run it through the analyzer:
    doc.add(new Field("fieldname", text, Field.Store.YES,
                      Field.Index.ANALYZED));
    iwriter.addDocument(doc);
    iwriter.close();

    // Now search the index:
    IndexSearcher isearcher = new IndexSearcher(directory, true); // read-only=true
    // Parse a simple query that searches for "text":
    QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "fieldname", analyzer);
    Query query = parser.parse("text");
    ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
    assertEquals(1, hits.length);
    // Iterate through the results:
    for (int i = 0; i < hits.length; i++) {
      Document hitDoc = isearcher.doc(hits[i].doc);
      assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
    }
    isearcher.close();
    directory.close();

The Lucene API is divided into several packages, which are listed at the top of this page.

To use Lucene, an application should:
  1. Create Documents by adding Fields;
  2. Create an IndexWriter and add documents to it with addDocument();
  3. Call QueryParser.parse() to build a query from a string; and
  4. Create an IndexSearcher and pass the query to its search() method.
Simple examples of code that does this are the IndexFiles and SearchFiles demo classes. To try them, run something like:
> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexFiles
  [ ... ]

> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.SearchFiles
Query: chowder
Searching for: chowder
34 total matching documents
  [ ... thirty-four documents contain the word "chowder" ... ]

Query: "clam chowder" AND Manhattan
Searching for: +"clam chowder" +manhattan
2 total matching documents
  [ ... two documents contain the phrase "clam chowder" and the word "manhattan" ... ]
    [ Note: "+" and "-" are canonical, but "AND", "OR" and "NOT" may be used. ]

The IndexHTML demo is more sophisticated.  It incrementally maintains an index of HTML files, adding new files as they appear, deleting old files as they disappear and re-indexing files as they change.
> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexHTML -create java/jdk1.1.6/docs/relnotes
adding java/jdk1.1.6/docs/relnotes/SMICopyright.html
  [ ... create an index containing all the relnotes ]

> rm java/jdk1.1.6/docs/relnotes/SMICopyright.html

> java -cp lucene.jar:lucene-demo.jar org.apache.lucene.demo.IndexHTML java/jdk1.1.6/docs/relnotes
deleting java/jdk1.1.6/docs/relnotes/SMICopyright.html

