org.apache.solr.schema
Class SimplePreAnalyzedParser

java.lang.Object
  extended by org.apache.solr.schema.SimplePreAnalyzedParser
All Implemented Interfaces:
PreAnalyzedField.PreAnalyzedParser

public final class SimplePreAnalyzedParser
extends Object
implements PreAnalyzedField.PreAnalyzedParser

Simple plain text format parser for PreAnalyzedField.

Serialization format

The format of the serialization is as follows:

 content ::= version (stored)? tokens
 version ::= digit+ " "
 ; stored field value - any "=" inside must be escaped!
 stored ::= "=" text "="
 tokens ::= (token ((" ") + token)*)*
 token ::= text ("," attrib)*
 attrib ::= name '=' value
 name ::= text
 value ::= text
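
As an illustration of the grammar above, the following is a minimal sketch of splitting serialized content into its version, optional stored part, and tokens part. It is not the parser's actual implementation; the class name ContentSplitSketch and the method split are hypothetical, and escape sequences are deliberately left undecoded.

```java
final class ContentSplitSketch {
    /**
     * Splits content into { version, stored, tokens }.
     * stored is null when the optional "=...=" section is absent.
     * Escape sequences are left undecoded in all three parts.
     */
    static String[] split(String content) {
        int space = content.indexOf(' ');
        boolean digitsOnly = space > 0
                && content.substring(0, space).chars().allMatch(c -> c >= '0' && c <= '9');
        if (!digitsOnly) {
            throw new IllegalArgumentException("expected digits followed by a space");
        }
        String version = content.substring(0, space);
        String rest = content.substring(space + 1);
        String stored = null;
        if (rest.startsWith("=")) {
            int i = 1;
            // Look for the closing "=", stepping over backslash-escaped characters.
            while (i < rest.length() && rest.charAt(i) != '=') {
                i += rest.charAt(i) == '\\' ? 2 : 1;
            }
            if (i >= rest.length()) {
                throw new IllegalArgumentException("unterminated stored part");
            }
            stored = rest.substring(1, i);
            rest = rest.substring(i + 1);
        }
        return new String[] { version, stored, rest };
    }

    public static void main(String[] args) {
        String[] parts = split("1 =this is a test.=one two");
        System.out.println(parts[0] + " | " + parts[1] + " | " + parts[2]);
    }
}
```

Because the stored part is delimited by "=", any "=" inside it has to be escaped, which is why the scan skips the character following a backslash.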
 

Special characters in "text" values can be escaped with the backslash character (\). The following escape sequences are recognized:

 "\ " - literal space character
 "\," - literal , character
 "\=" - literal = character
 "\\" - literal \ character
 "\n" - newline
 "\r" - carriage return
 "\t" - horizontal tab
 
Please note that Unicode escape sequences (e.g. \u0001) are not supported.
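
The escape sequences listed above can be illustrated with a small round-trip helper. This is a sketch only, not the parser's own code: the class EscapeSketch and its methods are hypothetical names, and the handling of a backslash followed by an unlisted character (it is kept as that character) is an assumption of this sketch.

```java
final class EscapeSketch {
    /** Escapes the special characters so the text can be embedded in a token or stored part. */
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                case ' ': case ',': case '=': case '\\':
                    out.append('\\').append(c); // prefix the literal with a backslash
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    /** Replaces each recognized escape sequence with the character it stands for. */
    static String unescape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != '\\' || i + 1 == text.length()) {
                out.append(c); // ordinary character, or a trailing lone backslash kept as-is
                continue;
            }
            char next = text.charAt(++i);
            switch (next) {
                case 'n': out.append('\n'); break;
                case 'r': out.append('\r'); break;
                case 't': out.append('\t'); break;
                default: out.append(next); // assumption: any other escaped character is taken literally
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String original = "a, b=c";
        String escaped = escape(original);
        System.out.println(escaped + " -> " + unescape(escaped));
    }
}
```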

Supported attribute names

The following token attributes are supported and are identified by short symbolic names:
 i - position increment (integer)
 s - token offset, start position (integer)
 e - token offset, end position (integer)
 t - token type (string)
 f - token flags (hexadecimal integer)
 p - payload (bytes in hexadecimal format)
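
The two hexadecimal attribute values (f and p) could be decoded along the following lines. This is a hedged sketch: the class AttributeValueSketch and its methods are hypothetical, and it assumes the payload is written as exactly two hexadecimal digits per byte with no prefix or separators, which the text above does not specify.

```java
final class AttributeValueSketch {
    /** Decodes an "f" value: token flags written as a hexadecimal integer. */
    static int parseFlags(String value) {
        return Integer.parseInt(value, 16);
    }

    /** Decodes a "p" value, assuming two hexadecimal digits per payload byte. */
    static byte[] parsePayload(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("odd number of hex digits: " + hex);
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            // Each pair of digits yields a value in 0..255, narrowed to a signed byte.
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(parseFlags("1f"));
        System.out.println(parsePayload("00ff10").length);
    }
}
```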
 
Token positions are tracked and implicitly added to the token stream. The start and end offsets count only the term text and whitespace; they exclude the space taken by token attributes.
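
The implicit offset tracking described above can be sketched as follows. This is an illustration under stated assumptions, not the parser's actual code: OffsetSketch, Tok and implicitOffsets are hypothetical names, the input is the tokens part only (the text after the version and the optional stored part), and explicit s and e attributes are ignored rather than applied.

```java
import java.util.ArrayList;
import java.util.List;

final class OffsetSketch {
    /** A term together with its implicitly tracked offsets. */
    static final class Tok {
        final String term;
        final int start;
        final int end;
        Tok(String term, int start, int end) { this.term = term; this.start = start; this.end = end; }
        @Override public String toString() {
            return "(term=" + term + ",startOffset=" + start + ",endOffset=" + end + ")";
        }
    }

    static List<Tok> implicitOffsets(String tokens) {
        List<Tok> result = new ArrayList<>();
        StringBuilder term = new StringBuilder();
        int offset = 0;            // advances over term text and separating spaces only
        int start = 0;             // offset at which the current token began
        boolean inToken = false;   // true once a token has started
        boolean inAttribs = false; // true after the first unescaped comma of a token
        for (int i = 0; i < tokens.length(); i++) {
            char c = tokens.charAt(i);
            if (c == ' ') {        // an unescaped space ends the current token, if any
                if (inToken) {
                    result.add(new Tok(term.toString(), start, start + term.length()));
                    term.setLength(0);
                    inToken = false;
                    inAttribs = false;
                }
                offset++;
                continue;
            }
            if (!inToken) { inToken = true; start = offset; }
            if (c == '\\' && i + 1 < tokens.length()) {
                char next = tokens.charAt(++i); // an escape sequence counts as one character
                c = next == 'n' ? '\n' : next == 'r' ? '\r' : next == 't' ? '\t' : next;
            } else if (c == ',') {
                inAttribs = true;  // attribute text follows and does not advance the offset
                continue;
            }
            if (!inAttribs) { term.append(c); offset++; }
        }
        if (inToken) result.add(new Tok(term.toString(), start, start + term.length()));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(implicitOffsets("one,s=123,e=128,i=22  two three"));
    }
}
```

For the tokens part "one,s=123,e=128,i=22  two three,s=20,e=22", this sketch assigns "two" the offsets 5 and 8: the attribute text after "one" is skipped, while the two separating spaces are counted.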

Example token streams

 1 one two three
  - version 1
  - stored: 'null'
  - tok: '(term=one,startOffset=0,endOffset=3)'
  - tok: '(term=two,startOffset=4,endOffset=7)'
  - tok: '(term=three,startOffset=8,endOffset=13)'
 1 one  two   three 
  - version 1
  - stored: 'null'
  - tok: '(term=one,startOffset=1,endOffset=4)'
  - tok: '(term=two,startOffset=6,endOffset=9)'
  - tok: '(term=three,startOffset=12,endOffset=17)'
1 one,s=123,e=128,i=22  two three,s=20,e=22
  - version 1
  - stored: 'null'
  - tok: '(term=one,positionIncrement=22,startOffset=123,endOffset=128)'
  - tok: '(term=two,positionIncrement=1,startOffset=5,endOffset=8)'
  - tok: '(term=three,positionIncrement=1,startOffset=20,endOffset=22)'
1 \ one\ \,,i=22,a=\, two\=

  \n,\ =\   \
  - version 1
  - stored: 'null'
  - tok: '(term= one ,,positionIncrement=22,startOffset=0,endOffset=6)'
  - tok: '(term=two=


 ,positionIncrement=1,startOffset=7,endOffset=15)'
  - tok: '(term=\,positionIncrement=1,startOffset=17,endOffset=18)'
1 ,i=22 ,i=33,s=2,e=20 , 
  - version 1
  - stored: 'null'
  - tok: '(term=,positionIncrement=22,startOffset=0,endOffset=0)'
  - tok: '(term=,positionIncrement=33,startOffset=2,endOffset=20)'
  - tok: '(term=,positionIncrement=1,startOffset=2,endOffset=2)'
1 =This is the stored part with \= 
 \n    \t escapes.=one two three 
  - version 1
  - stored: 'This is the stored part with = 
 \n    \t escapes.'
  - tok: '(term=one,startOffset=0,endOffset=3)'
  - tok: '(term=two,startOffset=4,endOffset=7)'
  - tok: '(term=three,startOffset=8,endOffset=13)'
1 ==
  - version 1
  - stored: ''
  - (no tokens)
1 =this is a test.=
  - version 1
  - stored: 'this is a test.'
  - (no tokens)
 


Constructor Summary
SimplePreAnalyzedParser()
           
 
Method Summary
 PreAnalyzedField.ParseResult parse(Reader reader, AttributeSource parent)
          Parse input.
 String toFormattedString(Field f)
          Format a field so that the resulting String is valid for parsing with PreAnalyzedField.PreAnalyzedParser.parse(Reader, AttributeSource).
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

SimplePreAnalyzedParser

public SimplePreAnalyzedParser()
Method Detail

parse

public PreAnalyzedField.ParseResult parse(Reader reader,
                                          AttributeSource parent)
                                   throws IOException
Description copied from interface: PreAnalyzedField.PreAnalyzedParser
Parse input.

Specified by:
parse in interface PreAnalyzedField.PreAnalyzedParser
Parameters:
reader - input to read from
parent - parent who will own the resulting states (tokens with attributes)
Returns:
parse result; its stored and/or states fields may be null.
Throws:
IOException - if a parsing error or I/O error occurs

toFormattedString

public String toFormattedString(Field f)
                         throws IOException
Description copied from interface: PreAnalyzedField.PreAnalyzedParser
Format a field so that the resulting String is valid for parsing with PreAnalyzedField.PreAnalyzedParser.parse(Reader, AttributeSource).

Specified by:
toFormattedString in interface PreAnalyzedField.PreAnalyzedParser
Parameters:
f - field instance
Returns:
formatted string
Throws:
IOException - If there is a low-level I/O error.


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.