java.lang.Object
- org.apache.lucene.util.AttributeSource
- - org.apache.lucene.analysis.TokenStream
  - - org.apache.lucene.analysis.TokenFilter
    - - org.apache.lucene.analysis.ja.JapaneseNumberFilter

All Implemented Interfaces:

Closeable, AutoCloseable
```
public class JapaneseNumberFilter
extends TokenFilter
```
A TokenFilter that normalizes Japanese numbers (kansūji) to regular Arabic decimal numbers in half-width characters.
Japanese numbers are often written using a combination of kanji and Arabic numbers with various kinds of punctuation. For example, ３．２千 means 3200. This filter performs this kind of normalization, which allows a search for 3200 to match ３．２千 in text. It can also be used to build range facets on the normalized numbers, and so on.
Note that this filter uses a token composition scheme and relies on punctuation tokens being present in the token stream. Make sure your JapaneseTokenizer has discardPunctuation set to false. If punctuation characters such as ． (U+FF0E FULLWIDTH FULL STOP) are removed from the token stream, this filter would see the input tokens ３ and ２千 and output 3 and 2000 instead of 3200, which is likely not the intended result. If you want to remove punctuation characters that are not part of normalized numbers from your index, add a StopFilter with the punctuation you wish to remove after JapaneseNumberFilter in your analyzer chain.
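The chain described above can be sketched as follows. This is a minimal sketch, not a recommended configuration: it assumes Lucene 7 or later (where `StopFilter` and `CharArraySet` live in `org.apache.lucene.analysis`) with the kuromoji analysis module on the classpath. The class name `JapaneseNumberAnalysisSketch`, the field name `"body"`, and the contents of the punctuation set are illustrative assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.ja.JapaneseNumberFilter;
import org.apache.lucene.analysis.ja.JapaneseTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class for illustration; it is not part of Lucene.
final class JapaneseNumberAnalysisSketch {

  // Illustrative choice of punctuation to drop after number normalization.
  private static final CharArraySet PUNCTUATION =
      new CharArraySet(Arrays.asList("。", "、", "．", "，", ".", ","), false);

  static Analyzer newAnalyzer() {
    return new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName) {
        // discardPunctuation = false keeps tokens such as ． in the stream,
        // so the number filter can compose ３ ． ２千 into one number.
        Tokenizer tokenizer =
            new JapaneseTokenizer(null, false, JapaneseTokenizer.Mode.SEARCH);
        TokenStream stream = new JapaneseNumberFilter(tokenizer);
        // Leftover punctuation is removed only after numbers are composed.
        stream = new StopFilter(stream, PUNCTUATION);
        return new TokenStreamComponents(tokenizer, stream);
      }
    };
  }

  // Collects the emitted terms for a piece of text.
  static List<String> terms(Analyzer analyzer, String text) {
    List<String> terms = new ArrayList<>();
    try (TokenStream stream = analyzer.tokenStream("body", text)) {
      CharTermAttribute term = stream.addAttribute(CharTermAttribute.class);
      stream.reset();
      while (stream.incrementToken()) {
        terms.add(term.toString());
      }
      stream.end();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return terms;
  }
}
```

Placing the StopFilter after JapaneseNumberFilter matters: in the reverse order the decimal point would be dropped before the number is composed.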
Below are some examples of normalizations this filter supports. The input is untokenized text and the result is the single term attribute emitted for the input.
- 〇〇七 becomes 7
- 一〇〇〇 becomes 1000
- 三千2百２十三 becomes 3223
- 兆六百万五千一 becomes 1000006005001
- ３．２千 becomes 3200
- １．２万３４５．６７ becomes 12345.67
- 4,647.100 becomes 4647.1
- 15,7 becomes 157 (be aware of this weakness)
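
The last two examples can be reproduced with plain `java.math.BigDecimal`. The sketch below is an illustration of the observable behaviour only, under the assumption that commas are simply dropped and trailing zeros stripped; `PlainNumberSketch` and `normalizeArabic` are hypothetical names, and this is not the filter's actual code.

```java
import java.math.BigDecimal;

// Hypothetical helper for illustration; it is not part of Lucene.
final class PlainNumberSketch {

  // Drops ASCII commas, parses the rest as a decimal number,
  // and removes trailing zeros from the result.
  static String normalizeArabic(String text) {
    String withoutCommas = text.replace(",", "");
    return new BigDecimal(withoutCommas).stripTrailingZeros().toPlainString();
  }
}
```

Because the comma is treated purely as a thousands separator, `normalizeArabic("15,7")` yields `157` even when the author meant a decimal comma, which is the weakness noted in the list.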
Tokens preceded by a token with PositionIncrementAttribute of zero are left untouched and emitted as-is.
This filter does not use any part-of-speech information for its normalization; the motivation is to also support n-grammed token streams in the future.
This filter may in some cases normalize tokens that are not numbers in their context. For example, 田中京一 is a name and means Tanaka Kyōichi, but 京一 (Kyōichi) out of context can, strictly speaking, also represent the number 10000000000000001. This filter respects the KeywordAttribute, which can be used to prevent specific normalizations from happening.
Also note that token attributes such as PartOfSpeechAttribute, ReadingAttribute, InflectionAttribute and BaseFormAttribute are left unchanged: they inherit the values of the last token used to compose the normalized number, which can be wrong. Hence, for １０万 (100000), ReadingAttribute is set to マン. This is a known issue and is subject to future improvement.
Japanese formal numbers (daiji), accounting numbers and decimal fractions are currently not supported.

Nested Class Summary

Nested Classes
Modifier and Type	Class	Description
`static class`	`JapaneseNumberFilter.NumberBuffer`	Buffer that holds a Japanese number string and a position index used as a parsed-to marker
- Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
  AttributeSource.State

Field Summary
- Fields inherited from class org.apache.lucene.analysis.TokenFilter
  input
- Fields inherited from class org.apache.lucene.analysis.TokenStream
  DEFAULT_TOKEN_ATTRIBUTE_FACTORY

Constructor Summary

Constructors
Constructor	Description
`JapaneseNumberFilter(TokenStream input)`

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method	Description
`boolean`	`incrementToken()`
`boolean`	`isArabicNumeral(char c)`	Arabic numeral predicate.
`boolean`	`isNumeral(char c)`	Numeral predicate
`boolean`	`isNumeral(String input)`	Numeral predicate
`boolean`	`isNumeralPunctuation(char c)`	Numeral punctuation predicate
`boolean`	`isNumeralPunctuation(String input)`	Numeral punctuation predicate
`String`	`normalizeNumber(String number)`	Normalizes a Japanese number
`BigDecimal`	`parseLargeKanjiNumeral(JapaneseNumberFilter.NumberBuffer buffer)`	Parse large kanji numerals (ten thousands or larger)
`BigDecimal`	`parseMediumKanjiNumeral(JapaneseNumberFilter.NumberBuffer buffer)`	Parse medium kanji numerals (tens, hundreds or thousands)
`void`	`reset()`

Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, end

Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, endAttributes, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, removeAllAttributes, restoreState, toString

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

- Constructor Detail
  - JapaneseNumberFilter
```
public JapaneseNumberFilter(TokenStream input)
```
- Method Detail
  - incrementToken
```
public final boolean incrementToken()
                             throws IOException
```
    Specified by:
    
    incrementToken in class TokenStream
    
    Throws:
    
    IOException
  - reset
```
public void reset()
           throws IOException
```
    Overrides:
    
    reset in class TokenFilter
    
    Throws:
    
    IOException
  - normalizeNumber
```
public String normalizeNumber(String number)
```
    Normalizes a Japanese number
    
    Parameters:
    
    number - number to normalize
    
    Returns:
    
    normalized number, or the unchanged input on error (no-op)
  - parseLargeKanjiNumeral
```
public BigDecimal parseLargeKanjiNumeral(JapaneseNumberFilter.NumberBuffer buffer)
```
    Parse large kanji numerals (ten thousands or larger)
    
    Parameters:
    
    buffer - buffer to parse
    
    Returns:
    
    parsed number, or null on error or end of input
  - parseMediumKanjiNumeral
```
public BigDecimal parseMediumKanjiNumeral(JapaneseNumberFilter.NumberBuffer buffer)
```
    Parse medium kanji numerals (tens, hundreds or thousands)
    
    Parameters:
    
    buffer - buffer to parse
    
    Returns:
    
    parsed number or null on error
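    
    The split between medium units (tens, hundreds, thousands) and large units (ten thousands or larger) can be illustrated with a small standalone parser. This is a simplified sketch under stated assumptions, not the filter's implementation: it accepts only kanji digits and the units 十百千万億兆京, ignores Arabic digits and punctuation, and throws instead of returning null. The class `KanjiNumberSketch` and its `parse` method are hypothetical names.

```java
import java.math.BigDecimal;

// Hypothetical helper for illustration; it is not part of Lucene.
final class KanjiNumberSketch {
  private static final String DIGITS = "〇一二三四五六七八九";
  private static final String MEDIUM_UNITS = "十百千";  // 10^1, 10^2, 10^3
  private static final String LARGE_UNITS = "万億兆京"; // 10^4, 10^8, 10^12, 10^16

  static BigDecimal parse(String text) {
    BigDecimal total = BigDecimal.ZERO; // sum of completed large-unit groups
    BigDecimal group = BigDecimal.ZERO; // medium-unit value of the current group
    BigDecimal run = BigDecimal.ZERO;   // positional digit run such as 一〇〇〇
    boolean runSeen = false;            // a digit precedes the next unit
    boolean groupSeen = false;          // the current group has any content
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      int digit = DIGITS.indexOf(c);
      int medium = MEDIUM_UNITS.indexOf(c);
      int large = LARGE_UNITS.indexOf(c);
      if (digit >= 0) {
        run = run.multiply(BigDecimal.TEN).add(BigDecimal.valueOf(digit));
        runSeen = true;
        groupSeen = true;
      } else if (medium >= 0) {
        // A bare unit such as 千 counts as 1 x 1000.
        BigDecimal factor = runSeen ? run : BigDecimal.ONE;
        group = group.add(factor.multiply(BigDecimal.TEN.pow(medium + 1)));
        run = BigDecimal.ZERO;
        runSeen = false;
        groupSeen = true;
      } else if (large >= 0) {
        // A bare unit such as 兆 counts as 1 x 10^12.
        BigDecimal factor = groupSeen ? group.add(run) : BigDecimal.ONE;
        total = total.add(factor.multiply(BigDecimal.TEN.pow(4 * (large + 1))));
        group = BigDecimal.ZERO;
        run = BigDecimal.ZERO;
        runSeen = false;
        groupSeen = false;
      } else {
        throw new IllegalArgumentException("Not a kanji numeral: " + c);
      }
    }
    return total.add(group).add(run);
  }
}
```

    With this sketch, `KanjiNumberSketch.parse("兆六百万五千一").toPlainString()` evaluates to `1000006005001`: 兆 contributes 10^12, 六百万 contributes 600 x 10^4, and 五千一 contributes 5001.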
  - isNumeral
```
public boolean isNumeral(String input)
```
    Numeral predicate
    
    Parameters:
    
    input - string to test
    
    Returns:
    
    true if and only if input is a numeral
  - isNumeral
```
public boolean isNumeral(char c)
```
    Numeral predicate
    
    Parameters:
    
    c - character to test
    
    Returns:
    
    true if and only if c is a numeral
  - isNumeralPunctuation
```
public boolean isNumeralPunctuation(String input)
```
    Numeral punctuation predicate
    
    Parameters:
    
    input - string to test
    
    Returns:
    
    true if and only if input is a numeral punctuation string
  - isNumeralPunctuation
```
public boolean isNumeralPunctuation(char c)
```
    Numeral punctuation predicate
    
    Parameters:
    
    c - character to test
    
    Returns:
    
    true if and only if c is a numeral punctuation character
  - isArabicNumeral
```
public boolean isArabicNumeral(char c)
```
    Arabic numeral predicate. Both half-width and full-width characters are supported
    
    Parameters:
    
    c - character to test
    
    Returns:
    
    true if and only if c is an Arabic numeral
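    
    A predicate that accepts both half-width and full-width digits could look like the sketch below. It is an assumption about how such a check might be written, not the filter's source. `ArabicNumeralSketch` and `arabicNumeralValue` are hypothetical names; U+FF10 to U+FF19 is the full-width digit range.

```java
// Hypothetical helper for illustration; it is not part of Lucene.
final class ArabicNumeralSketch {

  // True for half-width 0-9 (U+0030..U+0039) and full-width ０-９ (U+FF10..U+FF19).
  static boolean isArabicNumeral(char c) {
    return (c >= '0' && c <= '9') || (c >= '\uFF10' && c <= '\uFF19');
  }

  // Maps either digit form to its numeric value 0..9.
  static int arabicNumeralValue(char c) {
    if (c >= '0' && c <= '9') {
      return c - '0';
    }
    if (c >= '\uFF10' && c <= '\uFF19') {
      return c - '\uFF10';
    }
    throw new IllegalArgumentException("Not an Arabic numeral: " + c);
  }
}
```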
