Package org.apache.lucene.tests.analysis
Support for testing analysis components.
The main classes of interest are:
BaseTokenStreamTestCase: Its helper methods are highly recommended (especially in conjunction with MockAnalyzer or MockTokenizer), as they contain many assertions and checks to catch bugs.
MockTokenizer: Tokenizer for testing that serves as a replacement for the WHITESPACE, SIMPLE, and KEYWORD tokenizers. If you are writing a component such as a TokenFilter, it's a great idea to test it by wrapping this tokenizer instead, for extra checks.
MockAnalyzer: Analyzer for testing that uses MockTokenizer for additional verification. If you are testing a custom component such as a query parser or analyzer wrapper that consumes analysis streams, it's a great idea to test it with this analyzer instead.
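To make the two ideas above concrete (a tokenizer that stands in for WHITESPACE, SIMPLE, and KEYWORD tokenization, and test helpers that assert on every token), here is a hedged, stdlib-only model. It is not Lucene code: `ToyMockTokenizer`, its `Mode` enum, `Tok`, and `checkOffsets` are hypothetical names, and SIMPLE is assumed to mean "runs of letters" as in SimpleAnalyzer-style tokenizers. The checks are in the spirit of the offset assertions a test helper can run, not a copy of BaseTokenStreamTestCase's.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, stdlib-only toy: NOT Lucene's MockTokenizer, only a model of its three modes.
final class ToyMockTokenizer {
  enum Mode { WHITESPACE, SIMPLE, KEYWORD }

  /** A token: term text plus [start, end) char offsets into the original input. */
  static final class Tok {
    final String term;
    final int start;
    final int end;

    Tok(String term, int start, int end) {
      this.term = term;
      this.start = start;
      this.end = end;
    }

    @Override
    public String toString() {
      return term + "[" + start + "," + end + ")";
    }
  }

  /** Splits text per mode; works on UTF-16 chars and applies no lowercasing, for simplicity. */
  static List<Tok> tokenize(String text, Mode mode) {
    List<Tok> out = new ArrayList<>();
    if (mode == Mode.KEYWORD) {
      // KEYWORD: the whole (non-empty) input is a single token.
      if (!text.isEmpty()) {
        out.add(new Tok(text, 0, text.length()));
      }
      return out;
    }
    int i = 0;
    while (i < text.length()) {
      // Skip separators (whitespace for WHITESPACE, non-letters for SIMPLE).
      while (i < text.length() && !isTokenChar(text.charAt(i), mode)) {
        i++;
      }
      int start = i;
      // Consume one maximal run of token characters.
      while (i < text.length() && isTokenChar(text.charAt(i), mode)) {
        i++;
      }
      if (i > start) {
        out.add(new Tok(text.substring(start, i), start, i));
      }
    }
    return out;
  }

  private static boolean isTokenChar(char c, Mode mode) {
    return mode == Mode.WHITESPACE ? !Character.isWhitespace(c) : Character.isLetter(c);
  }

  /** Test-style checks: offsets ordered, not inverted, in bounds, and matching the input. */
  static void checkOffsets(String text, List<Tok> tokens) {
    int lastStart = 0;
    for (Tok t : tokens) {
      if (t.start < lastStart || t.start > t.end || t.end > text.length()) {
        throw new AssertionError("inconsistent offsets: " + t);
      }
      // Only valid because this toy does no normalization (e.g. lowercasing).
      if (!text.substring(t.start, t.end).equals(t.term)) {
        throw new AssertionError("term does not match its offsets: " + t);
      }
      lastStart = t.start;
    }
  }
}
```

For example, WHITESPACE mode turns "Hello  big World" into Hello[0,5), big[7,10), World[11,16). In a real Lucene test you would instead extend BaseTokenStreamTestCase and call its helpers, such as assertAnalyzesTo, on a MockAnalyzer or a chain built on MockTokenizer.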
Interface Summary
BaseTokenStreamTestCase.CheckClearAttributesAttribute: Attribute that records whether it was cleared or not.
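Lucene's TokenStream contract expects a producer to reset its attributes (typically via clearAttributes()) before filling in each new token; an attribute that records whether it was cleared lets a test detect producers that skip this step. The following is a hedged, stdlib-only sketch of that idea, not Lucene's implementation: `ClearCheckSketch`, `TrackingAttribute`, `Producer`, `fromTerms`, and `countUnclearedTokens` are all hypothetical names.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical, stdlib-only sketch of a "was I cleared?" attribute; not Lucene's API.
final class ClearCheckSketch {
  /** Per-token state, like an attribute: a term plus a "clear() was called" flag. */
  static final class TrackingAttribute {
    String term = "";
    private boolean clearCalled = false;

    void clear() {
      term = "";
      clearCalled = true;
    }

    /** Reports whether clear() ran since the last call, then resets the flag. */
    boolean getAndResetClearCalled() {
      boolean was = clearCalled;
      clearCalled = false;
      return was;
    }
  }

  /** Minimal producer contract: fill the attribute and return true, or return false at the end. */
  interface Producer {
    boolean incrementToken(TrackingAttribute att);
  }

  /** Emits the given terms; with clearsFirst == false it "forgets" to clear (the bug to catch). */
  static Producer fromTerms(List<String> terms, boolean clearsFirst) {
    Iterator<String> it = terms.iterator();
    return att -> {
      if (!it.hasNext()) {
        return false;
      }
      if (clearsFirst) {
        att.clear();
      }
      att.term = it.next();
      return true;
    };
  }

  /** Consumer-side check: counts tokens for which the producer did not call clear(). */
  static int countUnclearedTokens(Producer p) {
    TrackingAttribute att = new TrackingAttribute();
    int uncleared = 0;
    while (p.incrementToken(att)) {
      if (!att.getAndResetClearCalled()) {
        uncleared++;
      }
    }
    return uncleared;
  }
}
```

A well-behaved producer yields a count of 0, while one that never clears is flagged on every token, which is the kind of failure a test framework can turn into an assertion.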
Class Summary
BaseTokenStreamFactoryTestCase: Base class for testing TokenStream factories.
BaseTokenStreamTestCase: Base class for all Lucene unit tests that use TokenStreams.
BaseTokenStreamTestCase.CheckClearAttributesAttributeImpl: Attribute that records whether it was cleared or not.
CannedBinaryTokenStream: TokenStream from a canned list of binary (BytesRef-based) tokens.
CannedBinaryTokenStream.BinaryToken: Represents a binary token.
CannedTokenStream: TokenStream from a canned list of Tokens.
CollationTestBase: Base test class for testing Unicode collation.
CrankyTokenFilter: Throws IOException from random TokenStream methods.
LookaheadTokenFilter<T extends LookaheadTokenFilter.Position>: An abstract TokenFilter to make it easier to build graph token filters requiring some lookahead.
LookaheadTokenFilter.Position: Holds all state for a single position; subclass this to record other state at each position.
MockAnalyzer: Analyzer for testing.
MockBytesAnalyzer: Analyzer for testing that encodes terms as UTF-16 bytes.
MockCharFilter: A CharFilter whose purpose is to send offsets out of bounds if the analyzer doesn't use correctOffset or does incorrect offset math.
MockFixedLengthPayloadFilter: TokenFilter that adds random fixed-length payloads.
MockGraphTokenFilter: Randomly inserts overlapped (posInc=0) tokens with posLength sometimes > 1.
MockHoleInjectingTokenFilter: Randomly injects holes (similar to what a StopFilter would do).
MockLowerCaseFilter: A lowercasing TokenFilter.
MockPayloadAnalyzer: Wraps a whitespace tokenizer with a filter that sets the first token and odd tokens to posInc=1 and all others to 0, encoding the position as "pos: XXX" in the payload.
MockRandomLookaheadTokenFilter: Uses LookaheadTokenFilter to randomly peek at future tokens.
MockReaderWrapper: Wraps a Reader, and can throw random or fixed exceptions and spoon-feed read chars.
MockSynonymAnalyzer: Adds the synonym "dog" for "dogs", and the synonym "cavy" for "guinea pig".
MockSynonymFilter: Adds the synonym "dog" for "dogs", and the synonym "cavy" for "guinea pig".
MockTokenFilter: A TokenFilter for testing that removes terms accepted by a DFA.
MockTokenizer: Tokenizer for testing.
MockUTF16TermAttributeImpl: Extension of CharTermAttributeImpl that encodes the term text as UTF-16 bytes instead of as UTF-8 bytes.
MockVariableLengthPayloadFilter: TokenFilter that adds random variable-length payloads.
SimplePayloadFilter: Simple payload filter that sets the payload as "pos: XXXX".
Token: A Token is an occurrence of a term from the text of a field.
TokenStreamToDot: Consumes a TokenStream and outputs the dot (Graphviz) string (graph).
ValidatingTokenFilter: A TokenFilter that checks consistency of the tokens (e.g., offsets are consistent with one another).
VocabularyAssert: Utility class for doing vocabulary-based stemming tests.
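Several entries above (MockGraphTokenFilter, MockSynonymFilter, TokenStreamToDot) rely on the view of a token stream as a graph: positions are nodes, and each token is an edge from its position to position + posLength, so a multi-word synonym such as "cavy" for "guinea pig" spans two positions. The sketch below is a hedged, stdlib-only illustration of that idea. It is not TokenStreamToDot's API or exact output format; `TokenGraphToDot`, `Tok`, and `toDot` are hypothetical, and the token order used in the example is an assumption, not necessarily what MockSynonymFilter emits.

```java
import java.util.List;

// Hypothetical, stdlib-only sketch: NOT TokenStreamToDot's API or exact output format.
final class TokenGraphToDot {
  /** A token reduced to what matters for the graph: term, position increment, position length. */
  static final class Tok {
    final String term;
    final int posInc;
    final int posLen;

    Tok(String term, int posInc, int posLen) {
      this.term = term;
      this.posInc = posInc;
      this.posLen = posLen;
    }
  }

  /** Each token becomes an edge from its position to position + posLen. */
  static String toDot(List<Tok> tokens) {
    StringBuilder sb = new StringBuilder("digraph tokens {\n");
    int pos = -1; // position before the first token
    for (int i = 0; i < tokens.size(); i++) {
      Tok t = tokens.get(i);
      // Assumption of this sketch: the first token must advance the position; later ones may overlap.
      int minInc = (i == 0) ? 1 : 0;
      if (t.posInc < minInc || t.posLen < 1) {
        throw new IllegalArgumentException("invalid posInc/posLen for token: " + t.term);
      }
      pos += t.posInc;
      // Escape backslashes and quotes so the label stays a valid DOT string.
      String label = t.term.replace("\\", "\\\\").replace("\"", "\\\"");
      sb.append("  ").append(pos).append(" -> ").append(pos + t.posLen)
          .append(" [label=\"").append(label).append("\"];\n");
    }
    return sb.append("}\n").toString();
  }
}
```

For the tokens guinea (posInc 1, posLen 1), cavy (posInc 0, posLen 2), and pig (posInc 1, posLen 1), the edges are 0 -> 1 "guinea", 0 -> 2 "cavy", and 1 -> 2 "pig". Rendered with Graphviz (for example `dot -Tpng`), "cavy" appears as a single edge spanning both words.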