org.apache.lucene.analysis (Lucene 4.5.0 API)

Package org.apache.lucene.analysis

Support for testing analysis components.

Interface Summary
BaseTokenStreamTestCase.CheckClearAttributesAttribute	Attribute that records if it was cleared or not.
CannedBinaryTokenStream.BinaryTermAttribute	An attribute extending `TermToBytesRefAttribute` but exposing `CannedBinaryTokenStream.BinaryTermAttribute.setBytesRef(org.apache.lucene.util.BytesRef)` method.

Class Summary
BaseTokenStreamTestCase	Base class for all Lucene unit tests that use TokenStreams.
BaseTokenStreamTestCase.CheckClearAttributesAttributeImpl	Attribute that records if it was cleared or not.
CannedBinaryTokenStream	TokenStream from a canned list of binary (BytesRef-based) tokens.
CannedBinaryTokenStream.BinaryTermAttributeImpl	Implementation for `CannedBinaryTokenStream.BinaryTermAttribute`.
CannedBinaryTokenStream.BinaryToken	Represents a binary token.
CannedTokenStream	TokenStream from a canned list of Tokens.
CollationTestBase	Base test class for testing Unicode collation.
LookaheadTokenFilter<T extends LookaheadTokenFilter.Position>	An abstract TokenFilter to make it easier to build graph token filters requiring some lookahead.
LookaheadTokenFilter.Position	Holds all state for a single position; subclass this to record other state at each position.
MockAnalyzer	Analyzer for testing.
MockBytesAnalyzer	Analyzer for testing that encodes terms as UTF-16 bytes.
MockBytesAttributeFactory	Attribute factory that implements CharTermAttribute with `MockUTF16TermAttributeImpl`
MockCharFilter	The purpose of this CharFilter is to send offsets out of bounds if the analyzer doesn't use correctOffset or does incorrect offset math.
MockFixedLengthPayloadFilter	TokenFilter that adds random fixed-length payloads.
MockGraphTokenFilter	Randomly inserts overlapped (posInc=0) tokens with posLength sometimes > 1.
MockHoleInjectingTokenFilter	Randomly injects holes (similar to what a stop filter would do).
MockPayloadAnalyzer	Wraps a whitespace tokenizer with a filter that sets the first token and odd tokens to posinc=1, and all others to 0, encoding the position as pos: XXX in the payload.
MockRandomLookaheadTokenFilter	Uses `LookaheadTokenFilter` to randomly peek at future tokens.
MockReaderWrapper	Wraps a Reader, and can throw random or fixed exceptions, and spoon feed read chars.
MockTokenFilter	A tokenfilter for testing that removes terms accepted by a DFA.
MockTokenizer	Tokenizer for testing.
MockUTF16TermAttributeImpl	Extension of `CharTermAttributeImpl` that encodes the term text as UTF-16 bytes instead of as UTF-8 bytes.
MockVariableLengthPayloadFilter	TokenFilter that adds random variable-length payloads.
TokenStreamToDot	Consumes a TokenStream and outputs the dot (graphviz) string (graph).
ValidatingTokenFilter	A TokenFilter that checks consistency of the tokens (e.g., offsets are consistent with one another).
VocabularyAssert	Utility class for doing vocabulary-based stemming tests.

Package org.apache.lucene.analysis Description

Support for testing analysis components.

The main classes of interest are:

BaseTokenStreamTestCase: Using its helper methods is highly recommended (especially in conjunction with MockAnalyzer or MockTokenizer), as they contain many assertions and checks that catch bugs.
MockTokenizer: Tokenizer for testing that serves as a replacement for the WHITESPACE, SIMPLE, and KEYWORD tokenizers. If you are writing a component such as a TokenFilter, it's a good idea to test it wrapping this tokenizer for the extra checks.
MockAnalyzer: Analyzer for testing that uses MockTokenizer for additional verification. If you are testing a custom component such as a query parser or analyzer wrapper that consumes analysis streams, it's a good idea to test it with this analyzer.
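
One kind of extra check a testing tokenizer can perform is verifying that its consumer calls the stream methods in the expected order (reset, read tokens until exhausted, end, close). The following is a minimal, self-contained sketch of that idea using only the Java standard library. It is not Lucene's implementation: `CheckedTokenSource` and its methods are hypothetical names chosen for illustration, and the exact set of states is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT a Lucene class: a whitespace token source that
// verifies its consumer calls reset(), next()..., end(), close() in order.
final class CheckedTokenSource {
    private enum State { CREATED, RESET, EXHAUSTED, ENDED, CLOSED }

    private final String[] tokens;
    private int index;
    private State state = State.CREATED;

    CheckedTokenSource(String text) {
        String trimmed = text.trim();
        // Split on runs of whitespace; blank input yields no tokens.
        this.tokens = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    void reset() {
        require(state == State.CREATED || state == State.CLOSED, "reset()");
        index = 0;
        state = State.RESET;
    }

    // Returns the next token, or null once the input is exhausted.
    String next() {
        require(state == State.RESET, "next()");
        if (index < tokens.length) {
            return tokens[index++];
        }
        state = State.EXHAUSTED;
        return null;
    }

    void end() {
        require(state == State.EXHAUSTED, "end()");
        state = State.ENDED;
    }

    void close() {
        require(state == State.ENDED, "close()");
        state = State.CLOSED;
    }

    // Fails fast when a method is called in a state that does not allow it.
    private void require(boolean ok, String call) {
        if (!ok) {
            throw new IllegalStateException(call + " called in state " + state);
        }
    }

    // A well-behaved consumer: follows the full workflow and collects tokens.
    static List<String> consume(CheckedTokenSource source) {
        List<String> out = new ArrayList<>();
        source.reset();
        for (String t = source.next(); t != null; t = source.next()) {
            out.add(t);
        }
        source.end();
        source.close();
        return out;
    }
}
```

In this sketch, a consumer that skips `reset()`, or calls `next()` again after it has returned null, fails immediately with an `IllegalStateException` rather than silently producing wrong tokens. That is the kind of bug a component is more likely to expose when tested against a checking tokenizer than against a lenient one.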
