public final class WordDelimiterFilter extends TokenFilter
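
Splits words into subwords and performs optional transformations on subword groups. Words are split into subwords with the following rules:

- split on intra-word delimiters (by default, all non-alphanumeric characters): "Wi-Fi" → "Wi", "Fi"
- split on case transitions: "PowerShot" → "Power", "Shot"
- split on letter-number transitions: "SD500" → "SD", "500"
- leading and trailing intra-word delimiters on each subword are ignored: "//hello---there, 'dude'" → "hello", "there", "dude"
- a trailing "'s" is removed from each subword: "O'Neil's" → "O", "Neil"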
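The splitting rules above can be modeled in a short standalone class. This is a hypothetical, simplified sketch (the name SubwordSplitter is invented here), not Lucene's implementation: it classifies characters with java.lang.Character rather than a charTypeTable, treats every other character as a delimiter, always splits on case and letter-number transitions, and strips a trailing "'s" with a regular expression before splitting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified model of the splitting rules (not Lucene's code).
 * Characters are classified with java.lang.Character; anything that is not a
 * letter or digit is treated as a delimiter.
 */
final class SubwordSplitter {
    // "'s" or "'S" right after a letter and followed by end of input or a non-alphanumeric char.
    private static final Pattern POSSESSIVE =
            Pattern.compile("(?<=\\p{L})'[sS](?![\\p{L}\\p{N}])");

    private static final int DELIM = 0, LOWER = 1, UPPER = 2, DIGIT = 3;

    private SubwordSplitter() {}

    private static int type(char c) {
        if (Character.isLowerCase(c)) return LOWER;
        if (Character.isUpperCase(c)) return UPPER;
        if (Character.isDigit(c)) return DIGIT;
        return DELIM;
    }

    /** True if a subword boundary falls between a char of type prev and one of type cur. */
    private static boolean isBreak(int prev, int cur) {
        if (prev == cur) return false;                   // same type: keep going
        if (prev == UPPER && cur == LOWER) return false; // "Po" in "Power": no split
        return true;                                     // lower->upper, letter<->digit
    }

    static List<String> split(String word) {
        String text = POSSESSIVE.matcher(word).replaceAll("");
        List<String> parts = new ArrayList<>();
        int start = -1;   // start index of the current subword, -1 while inside delimiters
        int prev = DELIM; // type of the previous character
        for (int i = 0; i < text.length(); i++) {
            int cur = type(text.charAt(i));
            if (cur == DELIM) {
                // Delimiters close the current subword and are never emitted.
                if (start >= 0) { parts.add(text.substring(start, i)); start = -1; }
            } else if (start < 0) {
                start = i;
            } else if (isBreak(prev, cur)) {
                parts.add(text.substring(start, i));
                start = i;
            }
            prev = cur;
        }
        if (start >= 0) parts.add(text.substring(start));
        return parts;
    }
}
```

With these rules, "O'Neil's" becomes "O" and "Neil", and an uppercase-to-lowercase step such as "Po" in "PowerShot" never starts a new subword.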
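
The combinations parameter affects how subwords are combined:

- combinations="0" causes no subword combinations: "PowerShot" → 0:"Power", 1:"Shot" (0 and 1 are the token positions)
- combinations="1" means that, in addition to the subwords, maximum runs of non-numeric subwords are catenated and produced at the same position as the last subword in the run:
  - "PowerShot" → 0:"Power", 1:"Shot", 1:"PowerShot"
  - "A's+B's&C's" → 0:"A", 1:"B", 2:"C", 2:"ABC"
  - "Super-Duper-XL500-42-AutoCoder!" → 0:"Super", 1:"Duper", 2:"XL", 2:"SuperDuperXL", 3:"500", 4:"42", 5:"Auto", 6:"Coder", 6:"AutoCoder"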
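The combinations="1" rule can be sketched as a pass over already-split subwords. The class RunCatenator is hypothetical; it assumes a subword is numeric when it consists only of digits (which holds when letter-number transitions are split) and formats each token as "position:term" to mirror the examples above. It does not model offsets, PRESERVE_ORIGINAL, or CATENATE_NUMBERS.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of combinations="1": every subword is emitted at its own
 * position, and after each maximal run of two or more non-numeric subwords the
 * catenated run is emitted at the position of the run's last subword.
 */
final class RunCatenator {
    private RunCatenator() {}

    private static boolean isNumeric(String part) {
        return !part.isEmpty() && part.chars().allMatch(Character::isDigit);
    }

    /** Returns tokens formatted as "position:term". */
    static List<String> combine(List<String> parts) {
        List<String> out = new ArrayList<>();
        StringBuilder run = new StringBuilder();
        int runLength = 0;
        for (int pos = 0; pos < parts.size(); pos++) {
            String part = parts.get(pos);
            out.add(pos + ":" + part);
            if (isNumeric(part)) continue; // numbers never join a word run
            run.append(part);
            runLength++;
            boolean runEnds = pos + 1 == parts.size() || isNumeric(parts.get(pos + 1));
            if (runEnds) {
                // A single-subword run would just duplicate that subword, so skip it.
                if (runLength >= 2) out.add(pos + ":" + run);
                run.setLength(0);
                runLength = 0;
            }
        }
        return out;
    }
}
```

For the subwords of "wi-fi" it returns 0:wi, 1:fi, 1:wifi. Emitting the catenated term at the last subword's position, rather than at a new one, leaves the positions of all later subwords unchanged.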

One use for WordDelimiterFilter is to help match words with different subword delimiters. For example, if the source text contained "wi-fi", one may want the queries "wifi", "WiFi", "wi-fi" and "wi+fi" to all match. One way of doing so is to specify combinations="1" in the analyzer used for indexing, and combinations="0" (the default) in the analyzer used for querying. Because the current StandardTokenizer immediately removes many intra-word delimiters, it is recommended that this filter be used after a tokenizer that does not do this (such as WhitespaceTokenizer).

Nested classes inherited from class AttributeSource: AttributeSource.State

| Modifier and Type | Field and Description |
|---|---|
| static int | ALPHA: combination of LOWER and UPPER (any letter), used to test character types |
| static int | ALPHANUM: combination of ALPHA and DIGIT (any letter or digit), used to test character types |
| static int | CATENATE_ALL: Causes all subword parts to be catenated: "wi-fi-4000" => "wifi4000" |
| static int | CATENATE_NUMBERS: Causes maximum runs of number parts to be catenated: "500-42" => "50042" |
| static int | CATENATE_WORDS: Causes maximum runs of word parts to be catenated: "wi-fi" => "wifi" |
| static int | DIGIT: character type for digits |
| static int | GENERATE_NUMBER_PARTS: Causes number subwords to be generated: "500-42" => "500" "42" |
| static int | GENERATE_WORD_PARTS: Causes parts of words to be generated: "PowerShot" => "Power" "Shot" |
| static int | LOWER: character type for lowercase letters |
| static int | PRESERVE_ORIGINAL: Causes original words to be preserved and added to the subword list (defaults to false): "500-42" => "500" "42" "500-42" |
| static int | SPLIT_ON_CASE_CHANGE: If not set, causes case changes to be ignored (subwords will only be generated at SUBWORD_DELIM characters) |
| static int | SPLIT_ON_NUMERICS: If not set, causes letter-number transitions to be ignored (subwords will only be generated at SUBWORD_DELIM characters) |
| static int | STEM_ENGLISH_POSSESSIVE: Causes a trailing "'s" to be removed from each subword: "O'Neil's" => "O", "Neil" |
| static int | SUBWORD_DELIM: character type for subword delimiters |
| static int | UPPER: character type for uppercase letters |
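The configurationFlags argument is a bit mask built from the constants above. In this sketch the constants are redeclared with assumed power-of-two values only so the example compiles without Lucene; real code should reference WordDelimiterFilter's own fields. The index and query masks are a hypothetical pairing in the spirit of combinations="1" at index time and combinations="0" at query time.

```java
/**
 * Illustration of building and testing a configurationFlags bit mask.
 * The values below are ASSUMED distinct powers of two for this sketch only;
 * use WordDelimiterFilter.GENERATE_WORD_PARTS etc. in real code.
 */
final class FlagsDemo {
    static final int GENERATE_WORD_PARTS = 1;
    static final int GENERATE_NUMBER_PARTS = 2;
    static final int CATENATE_WORDS = 4;
    static final int CATENATE_NUMBERS = 8;
    static final int CATENATE_ALL = 16;
    static final int PRESERVE_ORIGINAL = 32;
    static final int SPLIT_ON_CASE_CHANGE = 64;
    static final int SPLIT_ON_NUMERICS = 128;
    static final int STEM_ENGLISH_POSSESSIVE = 256;

    private FlagsDemo() {}

    /** True if every bit of option is set in flags. */
    static boolean has(int flags, int option) {
        return (flags & option) == option;
    }

    /** Hypothetical index-time mask: split, stem possessives, and catenate word runs. */
    static int indexFlags() {
        return GENERATE_WORD_PARTS | GENERATE_NUMBER_PARTS | CATENATE_WORDS
                | SPLIT_ON_CASE_CHANGE | SPLIT_ON_NUMERICS | STEM_ENGLISH_POSSESSIVE;
    }

    /** Hypothetical query-time mask: identical splitting, but no catenation. */
    static int queryFlags() {
        return indexFlags() & ~CATENATE_WORDS;
    }
}
```

Because each option occupies its own bit, OR-ing options never loses information, and clearing one option with `& ~` leaves the others intact.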
Inherited fields: input, DEFAULT_TOKEN_ATTRIBUTE_FACTORY, DEFAULT_ATTRIBUTE_FACTORY

| Constructor and Description |
|---|
| WordDelimiterFilter(TokenStream in, byte[] charTypeTable, int configurationFlags, CharArraySet protWords): Creates a new WordDelimiterFilter |
| WordDelimiterFilter(TokenStream in, int configurationFlags, CharArraySet protWords): Creates a new WordDelimiterFilter using WordDelimiterIterator.DEFAULT_WORD_DELIM_TABLE as its charTypeTable |
| WordDelimiterFilter(Version matchVersion, TokenStream in, byte[] charTypeTable, int configurationFlags, CharArraySet protWords): Deprecated. |
| WordDelimiterFilter(Version matchVersion, TokenStream in, int configurationFlags, CharArraySet protWords): Deprecated. |
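To make the charTypeTable and protWords parameters concrete, here is a hypothetical standalone sketch (TableSplitter is an invented name). It assumes a byte table indexed by character code (256 entries here), uses the LOWER, UPPER, DIGIT and SUBWORD_DELIM names from the field table as bit values and ALPHA and ALPHANUM as combined test masks; the values, the break rule, and the use of Set<String> in place of CharArraySet are assumptions, not Lucene's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical table-driven splitter with protected words. Type values and the
 * 256-entry table layout are assumptions of this sketch.
 */
final class TableSplitter {
    static final byte LOWER = 0x01, UPPER = 0x02, DIGIT = 0x04, SUBWORD_DELIM = 0x08;
    static final int ALPHA = LOWER | UPPER;    // test mask: any letter
    static final int ALPHANUM = ALPHA | DIGIT; // test mask: letter or digit

    private TableSplitter() {}

    /** Default-style table: letters and digits typed, everything else a delimiter. */
    static byte[] defaultTable() {
        byte[] table = new byte[256];
        for (int c = 0; c < table.length; c++) {
            if (Character.isLowerCase(c)) table[c] = LOWER;
            else if (Character.isUpperCase(c)) table[c] = UPPER;
            else if (Character.isDigit(c)) table[c] = DIGIT;
            else table[c] = SUBWORD_DELIM;
        }
        return table;
    }

    private static int type(byte[] table, char c) {
        if (c < table.length) return table[c];
        // Simplification: chars beyond the table fall back to java.lang.Character.
        if (Character.isLowerCase(c)) return LOWER;
        if (Character.isUpperCase(c)) return UPPER;
        if (Character.isDigit(c)) return DIGIT;
        return SUBWORD_DELIM;
    }

    private static boolean isBreak(int prev, int cur) {
        if ((prev & cur) != 0) return false;                         // types share a bit
        if ((prev & UPPER) != 0 && (cur & ALPHA) != 0) return false; // "Po" in "Power"
        return true;
    }

    static List<String> split(String token, byte[] table, Set<String> protWords) {
        List<String> parts = new ArrayList<>();
        if (protWords != null && protWords.contains(token)) {
            parts.add(token); // protected tokens pass through unsplit
            return parts;
        }
        int start = -1, prev = SUBWORD_DELIM;
        for (int i = 0; i < token.length(); i++) {
            int cur = type(table, token.charAt(i));
            if ((cur & ALPHANUM) == 0) { // delimiter: close the current subword
                if (start >= 0) { parts.add(token.substring(start, i)); start = -1; }
            } else if (start < 0) {
                start = i;
            } else if (isBreak(prev, cur)) {
                parts.add(token.substring(start, i));
                start = i;
            }
            prev = cur;
        }
        if (start >= 0) parts.add(token.substring(start));
        return parts;
    }
}
```

With the default-style table, "AT&T" splits into "AT" and "T"; in a copy of the table where '&' is re-typed as ALPHA, the same token stays whole as "AT&T".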
| Modifier and Type | Method and Description |
|---|---|
| boolean | incrementToken() |
| void | reset() |
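Because one input token can yield several output tokens, a filter of this kind has to keep pending output between incrementToken() calls and discard it in reset(). The class below is a hypothetical model of that buffering pattern, not Lucene's TokenStream API: an Iterator<String> stands in for the input stream, a String field stands in for the term attribute, and the splitting function is supplied by the caller.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

/**
 * Hypothetical model of a one-to-many token filter (not Lucene's API).
 */
final class BufferingFilter {
    private final Iterator<String> input;
    private final Function<String, List<String>> splitter;
    private final Deque<String> pending = new ArrayDeque<>(); // subwords not yet emitted
    private String term;                                      // stand-in for the term attribute

    BufferingFilter(Iterator<String> input, Function<String, List<String>> splitter) {
        this.input = input;
        this.splitter = splitter;
    }

    /** Advances to the next output token; returns false when input is exhausted. */
    boolean incrementToken() {
        // Drain buffered subwords first; pull new input only when the buffer is empty.
        while (pending.isEmpty()) {
            if (!input.hasNext()) return false;
            pending.addAll(splitter.apply(input.next())); // may add zero subwords
        }
        term = pending.poll();
        return true;
    }

    String term() {
        return term;
    }

    /** Clears buffered state. A real TokenFilter.reset() would also reset its input. */
    void reset() {
        pending.clear();
        term = null;
    }
}
```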
Methods inherited from class TokenFilter: close, end
Methods inherited from class AttributeSource: addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString

public static final int LOWER
public static final int UPPER
public static final int DIGIT
public static final int SUBWORD_DELIM
public static final int ALPHA
public static final int ALPHANUM
public static final int GENERATE_WORD_PARTS
public static final int GENERATE_NUMBER_PARTS
public static final int CATENATE_WORDS
public static final int CATENATE_NUMBERS
public static final int CATENATE_ALL
public static final int PRESERVE_ORIGINAL
public static final int SPLIT_ON_CASE_CHANGE
public static final int SPLIT_ON_NUMERICS
public static final int STEM_ENGLISH_POSSESSIVE

public WordDelimiterFilter(TokenStream in, byte[] charTypeTable, int configurationFlags, CharArraySet protWords)
Creates a new WordDelimiterFilter.
Parameters:
- in - TokenStream to be filtered
- charTypeTable - table containing character types
- configurationFlags - flags configuring the filter
- protWords - if not null, the set of tokens to protect from being delimited

@Deprecated public WordDelimiterFilter(Version matchVersion, TokenStream in, byte[] charTypeTable, int configurationFlags, CharArraySet protWords)
Deprecated. Use WordDelimiterFilter(TokenStream, byte[], int, CharArraySet) instead.

public WordDelimiterFilter(TokenStream in, int configurationFlags, CharArraySet protWords)
Creates a new WordDelimiterFilter using WordDelimiterIterator.DEFAULT_WORD_DELIM_TABLE as its charTypeTable.
Parameters:
- in - TokenStream to be filtered
- configurationFlags - flags configuring the filter
- protWords - if not null, the set of tokens to protect from being delimited

@Deprecated public WordDelimiterFilter(Version matchVersion, TokenStream in, int configurationFlags, CharArraySet protWords)
Deprecated. Use WordDelimiterFilter(TokenStream, int, CharArraySet) instead.

public boolean incrementToken() throws IOException
Specified by: incrementToken in class TokenStream
Throws: IOException

public void reset() throws IOException
Overrides: reset in class TokenFilter
Throws: IOException

Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.