Core | |
---|---|
org.apache.lucene | Top-level package. |
org.apache.lucene.analysis | API and code to convert text into indexable/searchable tokens. |
org.apache.lucene.analysis.standard | The org.apache.lucene.analysis.standard package contains three fast grammar-based tokenizers constructed with JFlex. |
org.apache.lucene.analysis.tokenattributes | |
org.apache.lucene.document | The logical representation of a Document for indexing and searching. |
org.apache.lucene.index | Code to maintain and access indices. |
org.apache.lucene.messages | For Native Language Support (NLS), system of software internationalization. |
org.apache.lucene.queryParser | A simple query parser implemented with JavaCC. |
org.apache.lucene.search | Code to search indices. |
org.apache.lucene.search.function | Programmatic control over document scores. |
org.apache.lucene.search.payloads | The payloads package provides Query mechanisms for finding and using payloads. |
org.apache.lucene.search.spans | The calculus of spans. |
org.apache.lucene.search.suggest | |
org.apache.lucene.search.suggest.fst | |
org.apache.lucene.search.suggest.jaspell | |
org.apache.lucene.search.suggest.tst | |
org.apache.lucene.store | Binary i/o API, used for all index data. |
org.apache.lucene.util | Some utility classes. |
org.apache.lucene.util.fst | Finite state transducers |
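
The Core packages above split the work into analysis (turning text into tokens), indexing (recording which documents contain which tokens), and searching (looking those tokens up). A toy, plain-Java sketch of that pipeline follows. It does not use Lucene classes; the class name `ToyInvertedIndex` and its tokenization rule (split on anything that is not a letter or digit, then lowercase) are assumptions for illustration, and Lucene's real analyzers, index format, and scoring are far more elaborate.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical toy inverted index; illustrates the idea only, not Lucene's implementation. */
final class ToyInvertedIndex {
  // term -> sorted set of IDs of the documents containing that term (a "postings list")
  private final Map<String, Set<Integer>> postings = new TreeMap<>();
  private int nextDocId = 0;

  /** "Analysis": split on runs of non-letter/non-digit characters, then lowercase. */
  static List<String> tokenize(String text) {
    List<String> tokens = new ArrayList<>();
    for (String t : text.split("[^\\p{L}\\p{N}]+")) {
      if (!t.isEmpty()) {
        tokens.add(t.toLowerCase(Locale.ROOT));
      }
    }
    return tokens;
  }

  /** "Indexing": add the document's ID to the postings of each of its tokens. */
  int addDocument(String text) {
    int docId = nextDocId++;
    for (String token : tokenize(text)) {
      postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
    }
    return docId;
  }

  /** "Searching": analyze the query term the same way, then look up its postings. */
  Set<Integer> search(String term) {
    List<String> analyzed = tokenize(term);
    if (analyzed.isEmpty()) {
      return Collections.emptySet();
    }
    return postings.getOrDefault(analyzed.get(0), Collections.emptySet());
  }
}
```

Because the query term goes through the same `tokenize` step as the documents, searching for "Text" finds a document containing "text", which is the same reason the Lucene example below passes one analyzer to both the writer and the query parser.
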
contrib: Analysis | |
---|---|
org.apache.lucene.analysis.ar | Analyzer for Arabic. |
org.apache.lucene.analysis.bg | Analyzer for Bulgarian. |
org.apache.lucene.analysis.br | Analyzer for Brazilian Portuguese. |
org.apache.lucene.analysis.ca | Analyzer for Catalan. |
org.apache.lucene.analysis.cjk | Analyzer for Chinese, Japanese, and Korean, which indexes bigrams (overlapping groups of two adjacent Han characters). |
org.apache.lucene.analysis.cn | Analyzer for Chinese, which indexes unigrams (individual Chinese characters). |
org.apache.lucene.analysis.cn.smart | Analyzer for Simplified Chinese, which indexes words. |
org.apache.lucene.analysis.cn.smart.hhmm | SmartChineseAnalyzer Hidden Markov Model package. |
org.apache.lucene.analysis.compound | A filter that decomposes the compound words found in many Germanic languages into their word parts. |
org.apache.lucene.analysis.compound.hyphenation | The code for the compound word hyphenation is taken from the Apache FOP project. |
org.apache.lucene.analysis.cz | Analyzer for Czech. |
org.apache.lucene.analysis.da | Analyzer for Danish. |
org.apache.lucene.analysis.de | Analyzer for German. |
org.apache.lucene.analysis.el | Analyzer for Greek. |
org.apache.lucene.analysis.en | Analyzer for English. |
org.apache.lucene.analysis.es | Analyzer for Spanish. |
org.apache.lucene.analysis.eu | Analyzer for Basque. |
org.apache.lucene.analysis.fa | Analyzer for Persian. |
org.apache.lucene.analysis.fi | Analyzer for Finnish. |
org.apache.lucene.analysis.fr | Analyzer for French. |
org.apache.lucene.analysis.gl | Analyzer for Galician. |
org.apache.lucene.analysis.hi | Analyzer for Hindi. |
org.apache.lucene.analysis.hu | Analyzer for Hungarian. |
org.apache.lucene.analysis.hy | Analyzer for Armenian. |
org.apache.lucene.analysis.icu | Analysis components based on ICU |
org.apache.lucene.analysis.icu.segmentation | Tokenizer that breaks text into words with the Unicode Text Segmentation algorithm. |
org.apache.lucene.analysis.icu.tokenattributes | |
org.apache.lucene.analysis.id | Analyzer for Indonesian. |
org.apache.lucene.analysis.in | Analysis components for Indian languages. |
org.apache.lucene.analysis.it | Analyzer for Italian. |
org.apache.lucene.analysis.lv | Analyzer for Latvian. |
org.apache.lucene.analysis.miscellaneous | Miscellaneous TokenStreams |
org.apache.lucene.analysis.ngram | Character n-gram tokenizers and filters. |
org.apache.lucene.analysis.nl | Analyzer for Dutch. |
org.apache.lucene.analysis.no | Analyzer for Norwegian. |
org.apache.lucene.analysis.path | |
org.apache.lucene.analysis.payloads | Provides various convenience classes for creating payloads on Tokens. |
org.apache.lucene.analysis.pl | Analyzer for Polish. |
org.apache.lucene.analysis.position | Filter for assigning position increments. |
org.apache.lucene.analysis.pt | Analyzer for Portuguese. |
org.apache.lucene.analysis.query | Automatically filter high-frequency stopwords. |
org.apache.lucene.analysis.reverse | Filter to reverse token text. |
org.apache.lucene.analysis.ro | Analyzer for Romanian. |
org.apache.lucene.analysis.ru | Analyzer for Russian. |
org.apache.lucene.analysis.shingle | Word n-gram filters |
org.apache.lucene.analysis.sinks | Implementations of the SinkTokenizer that might be useful. |
org.apache.lucene.analysis.snowball | TokenFilter and Analyzer implementations that use Snowball stemmers. |
org.apache.lucene.analysis.stempel | Stempel: Algorithmic Stemmer |
org.apache.lucene.analysis.sv | Analyzer for Swedish. |
org.apache.lucene.analysis.th | Analyzer for Thai. |
org.apache.lucene.analysis.tr | Analyzer for Turkish. |
org.apache.lucene.analysis.util | |
org.apache.lucene.analysis.wikipedia | Tokenizer that is aware of Wikipedia syntax. |
org.egothor.stemmer | |
org.tartarus.snowball | |
org.tartarus.snowball.ext | |
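
Several entries above describe n-gram techniques: cjk indexes overlapping bigrams of adjacent Han characters, ngram produces character n-grams, and shingle produces word n-grams. The plain-Java sketch below shows what those token sequences look like. The class and method names are hypothetical; it does not use or reproduce the Lucene filters (which also track offsets and positions), and emitting a lone Han character as a unigram is this sketch's own choice.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of n-gram tokenization; not the Lucene tokenizers or filters. */
final class NGramSketch {

  /** Character n-grams of one token, computed over code points so surrogate pairs stay intact. */
  static List<String> charNGrams(String token, int n) {
    List<String> grams = new ArrayList<>();
    int[] cps = token.codePoints().toArray();
    for (int i = 0; i + n <= cps.length; i++) {
      grams.add(new String(cps, i, n));
    }
    return grams;
  }

  /** Word n-grams ("shingles") of size n, joined with a single space. */
  static List<String> shingles(List<String> words, int n) {
    List<String> out = new ArrayList<>();
    for (int i = 0; i + n <= words.size(); i++) {
      out.add(String.join(" ", words.subList(i, i + n)));
    }
    return out;
  }

  /**
   * Overlapping bigrams over each maximal run of Han characters; a run of length one
   * yields the single character. Non-Han characters are skipped.
   */
  static List<String> hanBigrams(String text) {
    List<String> out = new ArrayList<>();
    int[] cps = text.codePoints().toArray();
    int i = 0;
    while (i < cps.length) {
      if (Character.UnicodeScript.of(cps[i]) != Character.UnicodeScript.HAN) {
        i++;
        continue;
      }
      int start = i;
      while (i < cps.length && Character.UnicodeScript.of(cps[i]) == Character.UnicodeScript.HAN) {
        i++;
      }
      if (i - start == 1) {
        out.add(new String(cps, start, 1));
      } else {
        for (int j = start; j + 2 <= i; j++) {
          out.add(new String(cps, j, 2));
        }
      }
    }
    return out;
  }
}
```

For example, hanBigrams("中文分词") returns [中文, 文分, 分词], while charNGrams("abc", 2) returns [ab, bc].
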
contrib: Benchmark | |
---|---|
org.apache.lucene.benchmark | The benchmark contribution contains tools for benchmarking Lucene using standard, freely available corpora. |
org.apache.lucene.benchmark.byTask | Benchmarking Lucene By Tasks. |
org.apache.lucene.benchmark.byTask.feeds | Sources for benchmark inputs: documents and queries. |
org.apache.lucene.benchmark.byTask.feeds.demohtml | |
org.apache.lucene.benchmark.byTask.programmatic | Sample performance test written programmatically - no algorithm file is needed here. |
org.apache.lucene.benchmark.byTask.stats | Statistics maintained when running benchmark tasks. |
org.apache.lucene.benchmark.byTask.tasks | Extendable benchmark tasks. |
org.apache.lucene.benchmark.byTask.utils | Utilities used for the benchmark, and for the reports. |
org.apache.lucene.benchmark.quality | Search Quality Benchmarking. |
org.apache.lucene.benchmark.quality.trec | Utilities for Trec related quality benchmarking, feeding from Trec Topics and QRels inputs. |
org.apache.lucene.benchmark.quality.utils | Miscellaneous utilities for search quality benchmarking: query parsing, submission reports. |
org.apache.lucene.benchmark.stats | |
org.apache.lucene.benchmark.utils | |
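
The quality packages judge a ranking against relevance judgments (such as TREC QRels). As an illustration of that kind of measurement, not of the package's own classes, this sketch computes two standard metrics for a single query: precision at rank k and uninterpolated average precision. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of two standard IR metrics; not the Lucene benchmark API. */
final class RankingMetrics {

  /** Fraction of the top k ranks holding a relevant document (missing ranks count as non-relevant). */
  static double precisionAt(int k, List<String> ranked, Set<String> relevant) {
    if (k <= 0) {
      throw new IllegalArgumentException("k must be positive");
    }
    int hits = 0;
    for (int i = 0; i < Math.min(k, ranked.size()); i++) {
      if (relevant.contains(ranked.get(i))) {
        hits++;
      }
    }
    return (double) hits / k;
  }

  /** Sum of precision at each rank where a relevant document appears, divided by the number of relevant documents. */
  static double averagePrecision(List<String> ranked, Set<String> relevant) {
    if (relevant.isEmpty()) {
      return 0.0;
    }
    int hits = 0;
    double sum = 0.0;
    for (int i = 0; i < ranked.size(); i++) {
      if (relevant.contains(ranked.get(i))) {
        hits++;
        sum += (double) hits / (i + 1);
      }
    }
    return sum / relevant.size();
  }
}
```

With the ranking [d1, d2, d3, d4] and relevant documents {d1, d3}, precision at 2 is 0.5 and average precision is (1/1 + 2/3) / 2, about 0.833.
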
contrib: ICU | |
---|---|
org.apache.lucene.collation | CollationKeyFilter converts each token into its binary CollationKey using the provided Collator, and then encodes the CollationKey as a String using IndexableBinaryStringTools, to allow it to be stored as an index term. |
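
The idea behind CollationKeyFilter can be tried with the JDK alone: `java.text.Collator.getCollationKey` returns a key whose `toByteArray()` bytes, compared as unsigned bytes, follow the collator's order, so a locale-aware ordering survives being stored as an ordinary string term. This sketch hex-encodes the key bytes as a stand-in for IndexableBinaryStringTools, which uses a different, more compact encoding; the class name and the hex choice are assumptions.

```java
import java.text.CollationKey;
import java.text.Collator;

/** Hypothetical sketch: turn a token into a string that sorts in collation order. */
final class CollationKeySketch {
  private static final char[] HEX = "0123456789abcdef".toCharArray();

  /**
   * Hex-encodes the collation key bytes. Two lowercase hex digits per byte make plain
   * String comparison agree with unsigned byte-by-byte comparison of the keys.
   */
  static String hexKey(Collator collator, String token) {
    CollationKey key = collator.getCollationKey(token);
    byte[] bytes = key.toByteArray();
    StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
      sb.append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
    }
    return sb.toString();
  }
}
```

With Collator.getInstance(Locale.ENGLISH), the key string for "apple" sorts before the one for "Banana", even though "apple".compareTo("Banana") is positive because uppercase letters precede lowercase ones in raw UTF-16 order.
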
contrib: Demo | |
---|---|
org.apache.lucene.demo | |
contrib: Grouping | |
---|---|
org.apache.lucene.search.grouping | This module enables search result grouping with Lucene, where hits with the same value in the specified single-valued group field are grouped together. |
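
The grouping behavior described above, where hits sharing a value in a single-valued group field collapse into one group, can be illustrated with plain collections. In this hypothetical sketch each hit is a (docId, score, group value) triple already sorted best-first; groups are ordered by their best hit and keep at most maxPerGroup hits. It mirrors the idea only; the Lucene module's collectors and parameters are not reproduced.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of grouping hits by a single-valued field; not the Lucene grouping API. */
final class GroupingSketch {

  static final class Hit {
    final int docId;
    final float score;
    final String group; // value of the single-valued group field

    Hit(int docId, float score, String group) {
      this.docId = docId;
      this.score = score;
      this.group = group;
    }
  }

  /**
   * Groups hits that are already sorted best-first. LinkedHashMap keeps insertion order,
   * so groups come out ordered by their top-scoring hit.
   */
  static Map<String, List<Hit>> group(List<Hit> sortedHits, int maxPerGroup) {
    Map<String, List<Hit>> groups = new LinkedHashMap<>();
    for (Hit hit : sortedHits) {
      List<Hit> members = groups.computeIfAbsent(hit.group, k -> new ArrayList<>());
      if (members.size() < maxPerGroup) {
        members.add(hit);
      }
    }
    return groups;
  }
}
```
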
contrib: Highlighter | |
---|---|
org.apache.lucene.search.highlight | The highlight package contains classes to provide "keyword in context" features typically used to highlight search terms in the text of results pages. |
org.apache.lucene.search.vectorhighlight | Another highlighter implementation, which builds highlights from term vectors. |
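
A bare-bones version of the "keyword in context" idea: find the query terms in the stored text (whole words, case-insensitively) and wrap them in markup. This plain-Java sketch is hypothetical; the tag strings are caller-supplied, and it performs none of the fragment selection or scoring that the highlight and vectorhighlight packages do.

```java
import java.util.Collection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical highlighting sketch; not the Lucene Highlighter API. */
final class HighlightSketch {

  /** Wraps every whole-word, case-insensitive occurrence of any term in pre/post tags. */
  static String highlight(String text, Collection<String> terms, String pre, String post) {
    if (terms.isEmpty()) {
      return text;
    }
    // Quote each term so regex metacharacters in it are matched literally.
    String alternation = terms.stream().map(Pattern::quote).collect(Collectors.joining("|"));
    Pattern p = Pattern.compile("\\b(?:" + alternation + ")\\b",
        Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    Matcher m = p.matcher(text);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
      m.appendReplacement(sb, Matcher.quoteReplacement(pre + m.group() + post));
    }
    m.appendTail(sb);
    return sb.toString();
  }
}
```

For example, highlighting "text" in "This is the text to be indexed." with the tags <B> and </B> yields "This is the <B>text</B> to be indexed.", while "context" is left alone because the match must be a whole word.
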
contrib: Instantiated | |
---|---|
org.apache.lucene.store.instantiated | InstantiatedIndex, alternative RAM store for small corpora. |
contrib: Memory | |
---|---|
org.apache.lucene.index.memory | High-performance single-document main memory Apache Lucene fulltext search index. |
contrib: Misc | |
---|---|
org.apache.lucene.misc | |
contrib: Queries | |
---|---|
org.apache.lucene.search.regex | Regular expression Query. |
org.apache.lucene.search.similar | Document similarity query generators. |
contrib: Query Parser | |
---|---|
org.apache.lucene.queryParser.analyzing | QueryParser that passes Fuzzy-, Prefix-, Range-, and WildcardQuerys through the given analyzer. |
org.apache.lucene.queryParser.complexPhrase | QueryParser which permits complex phrase query syntax, e.g. "(john jon jonathan~) peters*" |
org.apache.lucene.queryParser.core | Contains the core classes of the flexible query parser framework |
org.apache.lucene.queryParser.core.builders | Contains the necessary classes to implement query builders |
org.apache.lucene.queryParser.core.config | Contains the base classes used to configure the query processing |
org.apache.lucene.queryParser.core.messages | Contains messages usually used by query parser implementations |
org.apache.lucene.queryParser.core.nodes | Contains query nodes that are commonly used by query parser implementations |
org.apache.lucene.queryParser.core.parser | Contains the necessary interfaces to implement text parsers |
org.apache.lucene.queryParser.core.processors | Interfaces and implementations used by query node processors |
org.apache.lucene.queryParser.core.util | Utility classes to be used with the Query Parser |
org.apache.lucene.queryParser.ext | Extendable QueryParser provides a simple and flexible extension mechanism by overloading query field names. |
org.apache.lucene.queryParser.precedence | This package contains the Precedence Query Parser Implementation |
org.apache.lucene.queryParser.precedence.processors | This package contains the processors used by Precedence Query Parser |
org.apache.lucene.queryParser.standard | Contains the implementation of the Lucene query parser using the flexible query parser framework |
org.apache.lucene.queryParser.standard.builders | Standard Lucene Query Node Builders |
org.apache.lucene.queryParser.standard.config | Standard Lucene Query Configuration |
org.apache.lucene.queryParser.standard.nodes | Standard Lucene Query Nodes |
org.apache.lucene.queryParser.standard.parser | Lucene Query Parser |
org.apache.lucene.queryParser.standard.processors | Lucene Query Node Processors |
org.apache.lucene.queryParser.surround.parser | This package contains the QueryParser.jj source file for the Surround parser. |
org.apache.lucene.queryParser.surround.query | This package contains SrndQuery and its subclasses. |
contrib: Spatial | |
---|---|
org.apache.lucene.spatial | |
org.apache.lucene.spatial.geohash | Support for Geohash encoding, decoding, and filtering. |
org.apache.lucene.spatial.geometry | |
org.apache.lucene.spatial.geometry.shape | |
org.apache.lucene.spatial.tier | Support for filtering based upon geographic location. |
org.apache.lucene.spatial.tier.projections | |
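
Geohash encoding, which the geohash package supports, repeatedly bisects the longitude and latitude ranges, interleaving one bit from each (longitude first), and writes every five bits as one character of the alphabet 0123456789bcdefghjkmnpqrstuvwxyz. A standalone sketch of that standard algorithm follows; the class and method names are hypothetical and it does not call the Lucene spatial classes.

```java
/** Hypothetical standalone Geohash encoder (standard algorithm); not the Lucene spatial API. */
final class GeohashSketch {
  private static final char[] BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz".toCharArray();

  static String encode(double latitude, double longitude, int precision) {
    double[] latRange = {-90.0, 90.0};
    double[] lonRange = {-180.0, 180.0};
    StringBuilder hash = new StringBuilder(precision);
    boolean lonBit = true; // bits alternate, starting with longitude
    int bits = 0;
    int ch = 0;
    while (hash.length() < precision) {
      double[] range = lonBit ? lonRange : latRange;
      double value = lonBit ? longitude : latitude;
      double mid = (range[0] + range[1]) / 2;
      ch <<= 1;
      if (value >= mid) { // upper half: emit 1 and keep [mid, max]
        ch |= 1;
        range[0] = mid;
      } else {            // lower half: emit 0 and keep [min, mid]
        range[1] = mid;
      }
      lonBit = !lonBit;
      if (++bits == 5) {  // every 5 bits become one base-32 character
        hash.append(BASE32[ch]);
        bits = 0;
        ch = 0;
      }
    }
    return hash.toString();
  }
}
```

For example, encode(42.6, -5.6, 5) returns "ezs42", and encode(0, 0, 5) returns "s0000". Nearby points share a common prefix, which is what makes geohashes useful as indexable terms for location filtering.
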
contrib: SpellChecker | |
---|---|
org.apache.lucene.search.spell | Suggest alternate spellings for words. |
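
One common way to suggest alternate spellings is to rank dictionary words by edit distance to the misspelled input. The sketch below does exactly that with a plain Levenshtein distance and a linear scan. It is a hypothetical illustration: Lucene's SpellChecker instead finds candidates through an n-gram index of the dictionary words and ranks them with a configurable string distance, none of which is reproduced here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical spelling-suggestion sketch; not the Lucene SpellChecker API. */
final class SpellSketch {

  /** Classic Levenshtein distance (insertions, deletions, substitutions) using two rows. */
  static int levenshtein(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) {
      prev[j] = j;
    }
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i;
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
      }
      int[] tmp = prev;
      prev = curr;
      curr = tmp;
    }
    return prev[b.length()];
  }

  /** Returns up to maxSuggestions dictionary words, closest first, ties broken alphabetically. */
  static List<String> suggest(String word, List<String> dictionary, int maxSuggestions) {
    List<String> sorted = new ArrayList<>(dictionary);
    sorted.sort(Comparator.comparingInt((String w) -> levenshtein(word, w))
        .thenComparing(Comparator.<String>naturalOrder()));
    return new ArrayList<>(sorted.subList(0, Math.min(maxSuggestions, sorted.size())));
  }
}
```

For example, suggest("chowdr", [chowder, clam, powder, manhattan], 1) returns [chowder], one insertion away.
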
contrib: WordNet | |
---|---|
org.apache.lucene.wordnet | This package uses synonyms defined by WordNet. |
contrib: XML Query Parser | |
---|---|
org.apache.lucene.xmlparser | Parser that produces Lucene Query objects from XML streams. |
org.apache.lucene.xmlparser.builders | |
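
Apache Lucene is a high-performance, full-featured text search engine library. Here's a simple example of how to use Lucene for indexing and searching (using JUnit to check whether the results are what we expect):

```java
import static org.junit.Assert.assertEquals;

import java.io.File; // only needed for the on-disk FSDirectory alternative
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory; // only needed for the on-disk alternative
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;

public class LuceneExampleTest {

  @Test
  public void indexAndSearch() throws Exception {
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);

    // Store the index in memory:
    Directory directory = new RAMDirectory();
    // To store an index on disk, use this instead (FSDirectory.open takes a java.io.File):
    //Directory directory = FSDirectory.open(new File("/tmp/testindex"));
    IndexWriter iwriter = new IndexWriter(directory, analyzer, true,
        new IndexWriter.MaxFieldLength(25000));
    Document doc = new Document();
    String text = "This is the text to be indexed.";
    doc.add(new Field("fieldname", text, Field.Store.YES, Field.Index.ANALYZED));
    iwriter.addDocument(doc);
    iwriter.close();

    // Now search the index:
    IndexSearcher isearcher = new IndexSearcher(directory, true); // read-only=true
    // Parse a simple query that searches for "text":
    QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "fieldname", analyzer);
    Query query = parser.parse("text");
    ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
    assertEquals(1, hits.length);
    // Iterate through the results:
    for (int i = 0; i < hits.length; i++) {
      Document hitDoc = isearcher.doc(hits[i].doc);
      assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
    }
    isearcher.close();
    directory.close();
  }
}
```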
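The Lucene API is divided into several packages, listed in the tables above. The demo programs in org.apache.lucene.demo show indexing and searching from the command line, for example indexing a directory of recipe files and then querying it: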
```
> java -cp lucene.jar:lucene-demo.jar:lucene-analyzers-common.jar org.apache.lucene.demo.IndexFiles rec.food.recipes/soups
adding rec.food.recipes/soups/abalone-chowder
[ ... ]

> java -cp lucene.jar:lucene-demo.jar:lucene-analyzers-common.jar org.apache.lucene.demo.SearchFiles
Query: chowder
Searching for: chowder
34 total matching documents
1. rec.food.recipes/soups/spam-chowder
[ ... thirty-four documents contain the word "chowder" ... ]

Query: "clam chowder" AND Manhattan
Searching for: +"clam chowder" +manhattan
2 total matching documents
1. rec.food.recipes/soups/clam-chowder
[ ... two documents contain the phrase "clam chowder" and the word "manhattan" ... ]
```
[ Note: "+" and "-" are canonical, but "AND", "OR" and "NOT" may be used. ]
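
The note above says AND, OR and NOT are alternatives to the canonical "+" and "-" prefixes, as in the demo, where "clam chowder" AND Manhattan is searched as +"clam chowder" +manhattan. The hypothetical sketch below rewrites a flat query (no parentheses, fields, or boosts, and no analysis, so case is left unchanged) into prefix form under these assumed rules for the default OR operator: AND makes the clauses on both sides required unless one is prohibited, NOT (like "-") prohibits the following clause, and OR leaves clauses optional. It is not Lucene's QueryParser, which handles far more syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of rewriting AND/OR/NOT into +/- prefix form; not Lucene's QueryParser. */
final class CanonicalQuerySketch {
  private enum Occur { SHOULD, MUST, MUST_NOT }

  // An optionally +/- prefixed quoted phrase, or any run of non-whitespace characters.
  private static final Pattern TOKEN = Pattern.compile("[+-]?\"[^\"]*\"|\\S+");

  static String toCanonical(String query) {
    List<String> clauses = new ArrayList<>();
    List<Occur> occurs = new ArrayList<>();
    boolean pendingAnd = false; // next clause was introduced by AND
    boolean pendingNot = false; // next clause was introduced by NOT
    Matcher m = TOKEN.matcher(query);
    while (m.find()) {
      String tok = m.group();
      if (tok.equals("AND")) { pendingAnd = true; continue; }
      if (tok.equals("OR")) { continue; } // optional is already the default
      if (tok.equals("NOT")) { pendingNot = true; continue; }

      Occur occur = Occur.SHOULD;
      if (tok.length() > 1 && tok.charAt(0) == '+') {
        occur = Occur.MUST;
        tok = tok.substring(1);
      } else if (tok.length() > 1 && tok.charAt(0) == '-') {
        occur = Occur.MUST_NOT;
        tok = tok.substring(1);
      }
      if (pendingNot) {
        occur = Occur.MUST_NOT;
      }
      if (pendingAnd) {
        // AND also makes the preceding clause required, unless it is prohibited.
        int last = occurs.size() - 1;
        if (last >= 0 && occurs.get(last) != Occur.MUST_NOT) {
          occurs.set(last, Occur.MUST);
        }
        if (occur != Occur.MUST_NOT) {
          occur = Occur.MUST;
        }
      }
      clauses.add(tok);
      occurs.add(occur);
      pendingAnd = false;
      pendingNot = false;
    }
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < clauses.size(); i++) {
      if (i > 0) out.append(' ');
      if (occurs.get(i) == Occur.MUST) out.append('+');
      else if (occurs.get(i) == Occur.MUST_NOT) out.append('-');
      out.append(clauses.get(i));
    }
    return out.toString();
  }
}
```

Under these rules, "clam chowder" AND manhattan becomes +"clam chowder" +manhattan, and chowder NOT clam becomes chowder -clam.
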