Overview (Lucene 9.11.1 core API)

Apache Lucene is a high-performance, full-featured text search engine library. Here's a simple example of how to use Lucene for indexing and searching (using JUnit to check that the results are what we expect):

    Analyzer analyzer = new StandardAnalyzer();

    Path indexPath = Files.createTempDirectory("tempIndex");
    Directory directory = FSDirectory.open(indexPath);
    IndexWriterConfig config = new IndexWriterConfig(analyzer);
    IndexWriter iwriter = new IndexWriter(directory, config);
    Document doc = new Document();
    String text = "This is the text to be indexed.";
    doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
    iwriter.addDocument(doc);
    iwriter.close();
    
    // Now search the index:
    DirectoryReader ireader = DirectoryReader.open(directory);
    IndexSearcher isearcher = new IndexSearcher(ireader);
    // Parse a simple query that searches for "text":
    QueryParser parser = new QueryParser("fieldname", analyzer);
    Query query = parser.parse("text");
    ScoreDoc[] hits = isearcher.search(query, 10).scoreDocs;
    assertEquals(1, hits.length);
    // Iterate through the results:
    StoredFields storedFields = isearcher.storedFields();
    for (int i = 0; i < hits.length; i++) {
      Document hitDoc = storedFields.document(hits[i].doc);
      assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
    }
    ireader.close();
    directory.close();
    IOUtils.rm(indexPath);

The Lucene API is divided into several packages:

org.apache.lucene.analysis defines an abstract Analyzer API for converting text from a Reader into a TokenStream, an enumeration of token Attributes. A TokenStream can be composed by applying TokenFilters to the output of a Tokenizer. Tokenizers and TokenFilters are strung together and applied with an Analyzer. analysis-common provides a number of Analyzer implementations, including StopAnalyzer and the grammar-based StandardAnalyzer.
org.apache.lucene.codecs provides an abstraction over the encoding and decoding of the inverted index structure, as well as different implementations that can be chosen depending upon application needs.
org.apache.lucene.document provides a simple Document class. A Document is simply a set of named Fields, whose values may be strings or instances of Reader.
org.apache.lucene.index provides two primary classes: IndexWriter, which creates and adds documents to indices; and IndexReader, which accesses the data in the index.
org.apache.lucene.search provides data structures to represent queries (e.g. TermQuery for individual words, PhraseQuery for phrases, and BooleanQuery for boolean combinations of queries) and the IndexSearcher, which turns queries into TopDocs. A number of QueryParsers are provided for producing query structures from strings or XML.
org.apache.lucene.store defines an abstract class for storing persistent data, the Directory, which is a collection of named files written by an IndexOutput and read by an IndexInput. Multiple implementations are provided, but FSDirectory is generally recommended as it tries to use operating system disk buffer caches efficiently.
org.apache.lucene.util contains a few handy data structures and utility classes, e.g. FixedBitSet and PriorityQueue.
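
As a minimal sketch of the analysis API described in the list above, the following consumes the TokenStream produced by StandardAnalyzer and collects the text of each token through its CharTermAttribute. The class name AnalyzeText and the helper analyze() are illustrative assumptions, not part of Lucene; the field name "fieldname" is borrowed from the example at the top of this page.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class, for illustration only.
public class AnalyzeText {

  /** Returns the terms that StandardAnalyzer produces for the given text. */
  static List<String> analyze(String text) {
    List<String> terms = new ArrayList<>();
    try (Analyzer analyzer = new StandardAnalyzer();
        TokenStream stream = analyzer.tokenStream("fieldname", text)) {
      // The attribute instance is reused; its contents change on every token.
      CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
      stream.reset(); // required before the first incrementToken()
      while (stream.incrementToken()) {
        terms.add(term.toString());
      }
      stream.end(); // signals end-of-stream to the token stream
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return terms;
  }

  public static void main(String[] args) {
    System.out.println(analyze("This is the text to be indexed."));
  }
}
```

The reset(), incrementToken(), end(), close() sequence is the consumer workflow that TokenStream expects; here close() is handled by try-with-resources.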

To use Lucene, an application should:

Create Documents by adding Fields;
Create an IndexWriter and add documents to it with addDocument();
Call QueryParser.parse() to build a query from a string; and
Create an IndexSearcher and pass the query to its search() method.
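
The steps above can be sketched using classes from the core module only. This sketch makes two substitutions that should be read as assumptions rather than recommendations: it uses the in-memory ByteBuffersDirectory instead of FSDirectory, and it builds the query in code from PhraseQuery, TermQuery and BooleanQuery instead of calling QueryParser.parse(). The class name InMemorySearch, the field name "body" and the sample texts are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.PhraseQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.store.Directory;

// Hypothetical example class, for illustration only.
public class InMemorySearch {
  static final String FIELD = "body"; // assumed field name

  /** Indexes each text as one document and returns how many hits the query yields (at most 10). */
  static int countHits(String[] texts, Query query) {
    try (Analyzer analyzer = new StandardAnalyzer();
        Directory directory = new ByteBuffersDirectory()) {
      // Steps 1 and 2: create Documents from Fields and add them with an IndexWriter.
      try (IndexWriter writer = new IndexWriter(directory, new IndexWriterConfig(analyzer))) {
        for (String text : texts) {
          Document doc = new Document();
          doc.add(new TextField(FIELD, text, Field.Store.YES));
          writer.addDocument(doc);
        }
      }
      // Step 4: create an IndexSearcher and pass the query to search().
      try (DirectoryReader reader = DirectoryReader.open(directory)) {
        IndexSearcher searcher = new IndexSearcher(reader);
        return searcher.search(query, 10).scoreDocs.length;
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  /** Step 3, done in code: the phrase "clam chowder" AND the term "manhattan". */
  static Query clamChowderAndManhattan() {
    // Queries built in code are not analyzed, so terms are given in lowercase
    // to match what StandardAnalyzer wrote to the index.
    return new BooleanQuery.Builder()
        .add(new PhraseQuery(FIELD, "clam", "chowder"), BooleanClause.Occur.MUST)
        .add(new TermQuery(new Term(FIELD, "manhattan")), BooleanClause.Occur.MUST)
        .build();
  }

  public static void main(String[] args) {
    String[] texts = {"Manhattan clam chowder", "New England clam chowder", "Tomato soup"};
    System.out.println(countHits(texts, clamChowderAndManhattan()) + " matching documents");
  }
}
```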

Some simple code examples that do this are:

IndexFiles.java creates an index for all the files contained in a directory.
SearchFiles.java prompts for queries and searches an index.

To demonstrate these, try something like:

> java -cp lucene-core.jar:lucene-demo.jar:lucene-analysis-common.jar org.apache.lucene.demo.IndexFiles -index index -docs rec.food.recipes/soups
adding rec.food.recipes/soups/abalone-chowder
[ ... ]
> java -cp lucene-core.jar:lucene-demo.jar:lucene-queryparser.jar:lucene-analysis-common.jar org.apache.lucene.demo.SearchFiles
Query: chowder
Searching for: chowder
34 total matching documents
1. rec.food.recipes/soups/spam-chowder
[ ... thirty-four documents contain the word "chowder" ... ]
Query: "clam chowder" AND Manhattan
Searching for: +"clam chowder" +manhattan
2 total matching documents
1. rec.food.recipes/soups/clam-chowder
[ ... two documents contain the phrase "clam chowder" and the word "manhattan" ... ]
[ Note: "+" and "-" are canonical, but "AND", "OR" and "NOT" may be used. ]
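
The note on operators can be checked with a short sketch, assuming the classic QueryParser from the lucene-queryparser module is on the classpath. It parses the keyword form and the "+"/"-" form of the same query and prints each one rendered back to a string. The class name OperatorSyntax and the default field name "contents" are assumptions made for this sketch.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;

// Hypothetical example class, for illustration only.
public class OperatorSyntax {
  static final String FIELD = "contents"; // assumed default field name

  /** Parses the query text and renders it back using the "+" / "-" notation. */
  static String canonical(String queryText) {
    try (Analyzer analyzer = new StandardAnalyzer()) {
      QueryParser parser = new QueryParser(FIELD, analyzer);
      // toString(FIELD) omits the field prefix for terms in the default field.
      return parser.parse(queryText).toString(FIELD);
    } catch (ParseException e) {
      throw new IllegalArgumentException("Cannot parse query: " + queryText, e);
    }
  }

  public static void main(String[] args) {
    // Each pair should print the same canonical form.
    System.out.println(canonical("\"clam chowder\" AND Manhattan"));
    System.out.println(canonical("+\"clam chowder\" +Manhattan"));
    System.out.println(canonical("chowder NOT clam"));
    System.out.println(canonical("chowder -clam"));
  }
}
```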

Packages
Package	Description
org.apache.lucene.analysis	Text analysis.
org.apache.lucene.analysis.standard	Fast, general-purpose grammar-based tokenizer. `StandardTokenizer` implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29.
org.apache.lucene.analysis.tokenattributes	General-purpose attributes for text analysis.
org.apache.lucene.codecs	Codecs API: API for customization of the encoding and structure of the index.
org.apache.lucene.codecs.compressing	Compressing helper classes.
org.apache.lucene.codecs.hnsw	HNSW vector helper classes.
org.apache.lucene.codecs.lucene90	Lucene 9.0 file format.
org.apache.lucene.codecs.lucene90.blocktree	BlockTree terms dictionary.
org.apache.lucene.codecs.lucene90.compressing	Lucene 9.0 compressing format.
org.apache.lucene.codecs.lucene94	Lucene 9.4 file format.
org.apache.lucene.codecs.lucene95	Lucene 9.5 file format.
org.apache.lucene.codecs.lucene99	Lucene 9.9 file format.
org.apache.lucene.codecs.perfield	Postings format that can delegate to different formats per-field.
org.apache.lucene.document	The logical representation of a `Document` for indexing and searching.
org.apache.lucene.geo	Geospatial utility implementations for Lucene Core.
org.apache.lucene.index	Code to maintain and access indices.
org.apache.lucene.internal.hppc	Internal copy of a subset of classes from the HPPC library.
org.apache.lucene.internal.tests	Internal bridges to package-private internals, for use by the lucene test framework only.
org.apache.lucene.internal.vectorization	Internal implementations to support SIMD vectorization.
org.apache.lucene.search	Code to search indices.
org.apache.lucene.search.comparators	Comparators, used to compare hits so as to determine their sort order when collecting the top results with `TopFieldCollector`.
org.apache.lucene.search.knn	Classes related to vector search: knn and vector fields.
org.apache.lucene.search.similarities	This package contains the various ranking models that can be used in Lucene.
org.apache.lucene.store	Binary i/o API, used for all index data.
org.apache.lucene.util	Some utility classes.
org.apache.lucene.util.automaton	Finite-state automaton for regular expressions.
org.apache.lucene.util.bkd	Block KD-tree, implementing the generic spatial data structure described in this paper.
org.apache.lucene.util.compress	Compression utilities.
org.apache.lucene.util.fst	Finite state transducers.
org.apache.lucene.util.graph	Utility classes for working with token streams as graphs.
org.apache.lucene.util.hnsw	Navigable Small-World graph, nominally Hierarchical but currently only has a single layer.
org.apache.lucene.util.mutable	Comparable object wrappers.
org.apache.lucene.util.packed	Packed integer arrays and streams.
org.apache.lucene.util.quantization	Provides quantization methods for scaling vector values to smaller data types and possibly fewer dimensions.