Package org.apache.lucene.tests.analysis
Support for testing analysis components.
The main classes of interest are:
BaseTokenStreamTestCase: Using its helper methods is highly recommended (especially in conjunction with MockAnalyzer or MockTokenizer), as it contains many assertions and checks to catch bugs.
MockTokenizer: Tokenizer for testing. It serves as a replacement for the WHITESPACE, SIMPLE, and KEYWORD tokenizers. If you are writing a component such as a TokenFilter, it's a great idea to test it wrapping this tokenizer instead, for extra checks.
MockAnalyzer: Analyzer for testing. It uses MockTokenizer for additional verification. If you are testing a custom component such as a query parser or analyzer wrapper that consumes analysis streams, it's a great idea to test it with this analyzer instead.
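The recommendation above, testing a TokenFilter by wrapping MockTokenizer inside a BaseTokenStreamTestCase subclass, might look roughly like the following. This is a minimal sketch, not an official example: it assumes Lucene 9 or later with the lucene-test-framework and lucene-core jars on the classpath, the test class name is hypothetical, and LowerCaseFilter from lucene-core merely stands in for whatever filter is actually under test.

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.tests.analysis.BaseTokenStreamTestCase;
import org.apache.lucene.tests.analysis.MockTokenizer;

// Hypothetical test class; only the Lucene types imported above are real API.
public class TestLowerCaseOverMockTokenizer extends BaseTokenStreamTestCase {

  public void testLowercasesWhitespaceTokens() throws IOException {
    Analyzer analyzer =
        new Analyzer() {
          @Override
          protected TokenStreamComponents createComponents(String fieldName) {
            // MockTokenizer in WHITESPACE mode, with its own lowercasing turned off
            // so that the filter under test does the work.
            Tokenizer tokenizer = new MockTokenizer(MockTokenizer.WHITESPACE, false);
            // Stand-in for the filter being tested.
            TokenStream filtered = new LowerCaseFilter(tokenizer);
            return new TokenStreamComponents(tokenizer, filtered);
          }
        };
    // Helper inherited from BaseTokenStreamTestCase: compares the produced terms
    // and runs its additional consistency checks on the stream.
    assertAnalyzesTo(analyzer, "Hello WORLD", new String[] {"hello", "world"});
    analyzer.close();
  }
}
```

Because the class extends BaseTokenStreamTestCase, it is meant to be run through the test runner (for example from the build tool or an IDE) rather than by calling the method directly.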
Class: Description
BaseTokenStreamFactoryTestCase: Base class for testing tokenstream factories.
BaseTokenStreamTestCase: Base class for all Lucene unit tests that use TokenStreams.
BaseTokenStreamTestCase.CheckClearAttributesAttribute: Attribute that records if it was cleared or not.
BaseTokenStreamTestCase.CheckClearAttributesAttributeImpl: Attribute that records if it was cleared or not.
CannedBinaryTokenStream: TokenStream from a canned list of binary (BytesRef-based) tokens.
CannedBinaryTokenStream.BinaryToken: Represents a binary token.
CannedTokenStream: TokenStream from a canned list of Tokens.
CollationTestBase: Base test class for testing Unicode collation.
CrankyTokenFilter: Throws IOException from random TokenStream methods.
LookaheadTokenFilter<T extends LookaheadTokenFilter.Position>: An abstract TokenFilter to make it easier to build graph token filters requiring some lookahead.
LookaheadTokenFilter.Position: Holds all state for a single position; subclass this to record other state at each position.
MockAnalyzer: Analyzer for testing.
MockBytesAnalyzer: Analyzer for testing that encodes terms as UTF-16 bytes.
MockCharFilter: The purpose of this charfilter is to send offsets out of bounds if the analyzer doesn't use correctOffset or does incorrect offset math.
MockFixedLengthPayloadFilter: TokenFilter that adds random fixed-length payloads.
MockGraphTokenFilter: Randomly inserts overlapped (posInc=0) tokens with posLength sometimes > 1.
MockHoleInjectingTokenFilter: Randomly injects holes (similar to what a stopfilter would do).
MockLowerCaseFilter: A lowercasing TokenFilter.
MockPayloadAnalyzer: Wraps a whitespace tokenizer with a filter that sets the first token, and odd tokens, to posinc=1, and all others to 0, encoding the position as pos: XXX in the payload.
MockRandomLookaheadTokenFilter: Uses LookaheadTokenFilter to randomly peek at future tokens.
MockReaderWrapper: Wraps a Reader, and can throw random or fixed exceptions, and spoon-feed read chars.
MockSynonymAnalyzer: Adds synonym of "dog" for "dogs", and synonym of "cavy" for "guinea pig".
MockSynonymFilter: Adds synonym of "dog" for "dogs", and synonym of "cavy" for "guinea pig".
MockTokenFilter: A tokenfilter for testing that removes terms accepted by a DFA.
MockTokenizer: Tokenizer for testing.
MockUTF16TermAttributeImpl: Extension of CharTermAttributeImpl that encodes the term text as UTF-16 bytes instead of as UTF-8 bytes.
MockVariableLengthPayloadFilter: TokenFilter that adds random variable-length payloads.
SimplePayloadFilter: Simple payload filter that sets the payload as pos: XXXX.
Token: A Token is an occurrence of a term from the text of a field.
TokenStreamToDot: Consumes a TokenStream and outputs the dot (graphviz) string (graph).
ValidatingTokenFilter: A TokenFilter that checks consistency of the tokens (e.g. offsets are consistent with one another).
VocabularyAssert: Utility class for doing vocabulary-based stemming tests.
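One of the entries above describes a TokenFilter that checks the consistency of tokens, for example that offsets are consistent with one another. To make that idea concrete, here is a small self-contained sketch of what such checks could look like. It does not use Lucene at all: the `Tok` class, the `validate` method, and the specific rules are assumptions chosen for illustration, and the checks the real filter performs may differ.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenConsistencySketch {

  /** Hypothetical token: term text, position increment, and character offsets. */
  public static final class Tok {
    final String term;
    final int posInc;
    final int startOffset;
    final int endOffset;

    public Tok(String term, int posInc, int startOffset, int endOffset) {
      this.term = term;
      this.posInc = posInc;
      this.startOffset = startOffset;
      this.endOffset = endOffset;
    }
  }

  /** Returns a list of problems found; an empty list means no rule was violated. */
  public static List<String> validate(List<Tok> tokens) {
    List<String> problems = new ArrayList<>();
    int lastStart = 0;
    boolean first = true;
    for (Tok t : tokens) {
      // Assumed rule: offsets are non-negative and end is not before start.
      if (t.startOffset < 0 || t.endOffset < t.startOffset) {
        problems.add(t.term + ": invalid offsets");
      }
      // Assumed rule: start offsets never move backwards across the stream.
      if (t.startOffset < lastStart) {
        problems.add(t.term + ": startOffset goes backwards");
      }
      // Assumed rule: increments are never negative, and the first token advances.
      if (t.posInc < 0 || (first && t.posInc < 1)) {
        problems.add(t.term + ": invalid position increment");
      }
      lastStart = Math.max(lastStart, t.startOffset);
      first = false;
    }
    return problems;
  }

  public static void main(String[] args) {
    List<Tok> stream = new ArrayList<>();
    stream.add(new Tok("quick", 1, 0, 5));
    stream.add(new Tok("fast", 0, 0, 5)); // overlapped synonym at the same position
    stream.add(new Tok("fox", 1, 6, 9));
    System.out.println(validate(stream).isEmpty());
  }
}
```

Returning a list of problems instead of throwing on the first one is a choice made for this sketch, so that a single run reports every violation.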