org.apache.lucene.codecs.simpletext
Class SimpleTextDocValuesFormat

java.lang.Object
  extended by org.apache.lucene.codecs.DocValuesFormat
      extended by org.apache.lucene.codecs.simpletext.SimpleTextDocValuesFormat
All Implemented Interfaces:
NamedSPILoader.NamedSPI

public class SimpleTextDocValuesFormat
extends DocValuesFormat

Plain text doc values format.

FOR RECREATIONAL USE ONLY

The .dat file contains the data. For numbers this is a "fixed-width" file, for example a single byte range:

  field myField
    type NUMERIC
    minvalue 0
    pattern 000
  005
  234
  123
  ...
  
So a document's value (delta encoded from minvalue) can be retrieved by seeking to startOffset + (1+pattern.length())*docid. The extra 1 is the newline.

For bytes this is also a "fixed-width" file, for example:
  field myField
    type BINARY
    maxlength 6
    pattern 0
  length 6
  foobar[space][space]
  length 3
  baz[space][space][space][space][space]
  ...
  
So a document's value can be retrieved by seeking to startOffset + (9+pattern.length()+maxlength)*docid. The extra 9 is 2 newlines, plus "length " itself.

For sorted bytes this is a fixed-width file, for example:
  field myField
    type SORTED
    numvalues 10
    maxlength 8
    pattern 0
    ordpattern 00
  length 6
  foobar[space][space]
  length 3
  baz[space][space][space][space][space]
  ...
  03
  06
  01
  10
  ...
  
So the "ord section" begins at startOffset + (9+pattern.length()+maxlength)*numValues. A document's ord can be retrieved by seeking to "ord section" + (1+ordpattern.length())*docid. An ord's value can be retrieved by seeking to startOffset + (9+pattern.length()+maxlength)*ord.

For sorted set this is a fixed-width file very similar to the SORTED case, for example:
  field myField
    type SORTED_SET
    numvalues 10
    maxlength 8
    pattern 0
    ordpattern XXXXX
  length 6
  foobar[space][space]
  length 3
  baz[space][space][space][space][space]
  ...
  0,3,5   
  1,2
  
  10
  ...
  
So the "ord section" begins at startOffset + (9+pattern.length()+maxlength)*numValues. A document's ord list can be retrieved by seeking to "ord section" + (1+ordpattern.length())*docid. This is a comma-separated list, and it's padded with spaces to be fixed width, so trim() and split() it, and beware the empty string! An ord's value can be retrieved by seeking to startOffset + (9+pattern.length()+maxlength)*ord.

The reader can just scan this file when it opens, skipping over the data blocks and saving the offset/etc for each field.
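
The seek arithmetic described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code this class uses: the class name SimpleTextOffsets and every method name are hypothetical, the file is assumed to hold one byte per character (plain ASCII), and numeric deltas are assumed to fit in a long.

```java
/** Hypothetical helper for illustration only; not part of Lucene. */
final class SimpleTextOffsets {
    // "length " is 7 characters; the newline ending the length line and the
    // newline ending the value line bring the per-entry overhead to 9.
    static final int ENTRY_OVERHEAD = 9;

    /** Offset of a document's line in a NUMERIC field (the 1 is the newline). */
    static long numericOffset(long startOffset, String pattern, int docId) {
        return startOffset + (1L + pattern.length()) * docId;
    }

    /** Decodes a NUMERIC line, assuming it stores a delta from minvalue. */
    static long decodeNumeric(long minValue, String line) {
        return minValue + Long.parseLong(line.trim());
    }

    /** Width of one "length N" line plus its padded value line. */
    static long entryWidth(String pattern, int maxLength) {
        return ENTRY_OVERHEAD + pattern.length() + maxLength;
    }

    /** Offset of entry number 'index': a docid for BINARY, an ord for SORTED and SORTED_SET. */
    static long entryOffset(long startOffset, String pattern, int maxLength, long index) {
        return startOffset + entryWidth(pattern, maxLength) * index;
    }

    /** Start of the "ord section", which follows all numValues entries. */
    static long ordSectionOffset(long startOffset, String pattern, int maxLength, long numValues) {
        return entryOffset(startOffset, pattern, maxLength, numValues);
    }

    /** Offset of a document's ord (or ord list) line inside the ord section. */
    static long docOrdOffset(long ordSection, String ordPattern, int docId) {
        return ordSection + (1L + ordPattern.length()) * docId;
    }

    public static void main(String[] args) {
        long start = 100; // assumed offset of the first data line of the field
        System.out.println(numericOffset(start, "000", 2));         // 108
        System.out.println(decodeNumeric(0, "005"));                // 5
        System.out.println(entryOffset(start, "0", 8, 1));          // 118
        long ordSection = ordSectionOffset(start, "0", 8, 10);
        System.out.println(ordSection);                             // 280
        System.out.println(docOrdOffset(ordSection, "00", 3));      // 289
    }
}
```

The same entryOffset helper serves both the BINARY lookup by docid and the SORTED lookup by ord, since the two formulas differ only in which index is multiplied by the entry width.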

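The trim-and-split advice for SORTED_SET ord lists, including the empty-string pitfall, might look like the following. This is a sketch only: OrdListParser and parseOrds are hypothetical names, and the input is assumed to be one fixed-width ord line with its trailing newline already removed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of Lucene. */
final class OrdListParser {
    /** Parses one space-padded, comma-separated ord line into a list of ords. */
    static List<Long> parseOrds(String fixedWidthLine) {
        String trimmed = fixedWidthLine.trim();
        List<Long> ords = new ArrayList<>();
        if (trimmed.isEmpty()) {
            // A document with no values is written as padding only.
            // "".split(",") returns a one-element array holding "", which
            // Long.parseLong would reject, so handle this case up front.
            return ords;
        }
        for (String part : trimmed.split(",")) {
            ords.add(Long.parseLong(part));
        }
        return ords;
    }

    public static void main(String[] args) {
        System.out.println(parseOrds("0,3,5   ")); // [0, 3, 5]
        System.out.println(parseOrds("1,2     ")); // [1, 2]
        System.out.println(parseOrds("        ")); // []
    }
}
```
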
WARNING: This API is experimental and might change in incompatible ways in the next release.

Constructor Summary
SimpleTextDocValuesFormat()
           
 
Method Summary
 DocValuesConsumer fieldsConsumer(SegmentWriteState state)
           
 DocValuesProducer fieldsProducer(SegmentReadState state)
           
 
Methods inherited from class org.apache.lucene.codecs.DocValuesFormat
availableDocValuesFormats, forName, getName, reloadDocValuesFormats, toString
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

SimpleTextDocValuesFormat

public SimpleTextDocValuesFormat()
Method Detail

fieldsConsumer

public DocValuesConsumer fieldsConsumer(SegmentWriteState state)
                                 throws IOException
Specified by:
fieldsConsumer in class DocValuesFormat
Throws:
IOException

fieldsProducer

public DocValuesProducer fieldsProducer(SegmentReadState state)
                                 throws IOException
Specified by:
fieldsProducer in class DocValuesFormat
Throws:
IOException


Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.