org.apache.lucene.codecs.simpletext
Class SimpleTextDocValuesFormat
java.lang.Object
org.apache.lucene.codecs.DocValuesFormat
org.apache.lucene.codecs.simpletext.SimpleTextDocValuesFormat
- All Implemented Interfaces:
- NamedSPILoader.NamedSPI
public class SimpleTextDocValuesFormat
- extends DocValuesFormat
Plain text doc values format.
FOR RECREATIONAL USE ONLY
The .dat file contains the data.
For numbers this is a "fixed-width" file, for example for values in a single-byte range:
field myField
type NUMERIC
minvalue 0
pattern 000
005
234
123
...
So a document's value (delta-encoded from minvalue) can be retrieved by
seeking to startOffset + (1+pattern.length())*docid. The extra 1 is the newline.
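The seek arithmetic above can be sketched in a few lines of Java. This is only an illustration, not the Lucene reader: the class and method names are hypothetical, the data section is assumed to be held in a byte array, and the delta is parsed with Long.parseLong rather than whatever parsing the real implementation uses.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of Lucene: random access into a NUMERIC data section. */
final class NumericSeekSketch {

    /** Byte offset of a document's line: startOffset + (1 + pattern.length()) * docid. */
    static long offsetOf(long startOffset, String pattern, int docId) {
        return startOffset + (1L + pattern.length()) * docId;
    }

    /** Reads the fixed-width delta at the document's offset and adds minValue back. */
    static long valueOf(byte[] data, long startOffset, String pattern, long minValue, int docId) {
        int offset = (int) offsetOf(startOffset, pattern, docId);
        // Assumption: the delta is plain zero-padded decimal that fits in a long.
        String digits = new String(data, offset, pattern.length(), StandardCharsets.US_ASCII);
        return minValue + Long.parseLong(digits);
    }

    public static void main(String[] args) {
        // Data section from the example above: pattern "000", one value per line.
        byte[] data = "005\n234\n123\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(valueOf(data, 0, "000", 0, 1)); // prints 234
    }
}
```

Because every line has the same width, the lookup is a single multiplication and never scans earlier documents.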
For bytes this is also a "fixed-width" file, for example:
field myField
type BINARY
maxlength 8
pattern 0
length 6
foobar[space][space]
length 3
baz[space][space][space][space][space]
...
So a doc's value can be retrieved by seeking to startOffset + (9+pattern.length+maxlength)*doc.
The extra 9 is 2 newlines plus the 7 characters of "length " itself.
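A minimal Java sketch of this lookup follows. It is an illustration under stated assumptions rather than the Lucene implementation: the names are hypothetical, the data section is assumed to sit in a byte array, and maxlength is taken to be 8, the padded width of the value lines shown in the example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper, not part of Lucene: random access into a BINARY data section. */
final class BinarySeekSketch {
    static final String LENGTH_PREFIX = "length "; // 7 bytes; plus 2 newlines gives the "9"

    /** Byte offset of a document's record: startOffset + (9 + pattern.length + maxlength) * doc. */
    static long offsetOf(long startOffset, String pattern, int maxLength, int docId) {
        return startOffset + (9L + pattern.length() + maxLength) * docId;
    }

    /** Reads the "length N" line, then returns the first N bytes of the padded value line. */
    static byte[] valueOf(byte[] data, long startOffset, String pattern, int maxLength, int docId) {
        int lengthStart = (int) offsetOf(startOffset, pattern, maxLength, docId)
                + LENGTH_PREFIX.length();
        int length = Integer.parseInt(
                new String(data, lengthStart, pattern.length(), StandardCharsets.US_ASCII));
        int valueStart = lengthStart + pattern.length() + 1; // skip the newline after the length
        return Arrays.copyOfRange(data, valueStart, valueStart + length);
    }

    public static void main(String[] args) {
        // Two records, each 9 + 1 + 8 = 18 bytes wide.
        byte[] data = "length 6\nfoobar  \nlength 3\nbaz     \n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(valueOf(data, 0, "0", 8, 1), StandardCharsets.US_ASCII)); // prints baz
    }
}
```

The stored length is what separates real trailing spaces in a value from the padding, so the sketch trusts the length line instead of trimming.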
For sorted bytes this is a fixed-width file, for example:
field myField
type SORTED
numvalues 10
maxlength 8
pattern 0
ordpattern 00
length 6
foobar[space][space]
length 3
baz[space][space][space][space][space]
...
03
06
01
10
...
So the "ord section" begins at startOffset + (9+pattern.length+maxlength)*numValues.
A document's ord can be retrieved by seeking to "ord section" + (1+ordpattern.length())*docid.
An ord's value can be retrieved by seeking to startOffset + (9+pattern.length+maxlength)*ord.
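The two-step lookup (document to ord, then ord to value) might look like the following Java sketch. The class is hypothetical and not part of Lucene. It assumes the whole file is in memory and that ords are stored as plain zero-based decimal numbers, which may not hold for every version of the format.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of Lucene: ord and value lookup for a SORTED field. */
final class SortedSeekSketch {
    private final byte[] data;        // data section held in memory (assumption for brevity)
    private final long startOffset;
    private final String pattern;     // width of the number on each "length" line
    private final String ordPattern;  // width of each ord line
    private final int maxLength;
    private final int numValues;

    SortedSeekSketch(byte[] data, long startOffset, String pattern, String ordPattern,
                     int maxLength, int numValues) {
        this.data = data;
        this.startOffset = startOffset;
        this.pattern = pattern;
        this.ordPattern = ordPattern;
        this.maxLength = maxLength;
        this.numValues = numValues;
    }

    /** The ord section follows numValues fixed-width value records. */
    long ordSectionStart() {
        return startOffset + (9L + pattern.length() + maxLength) * numValues;
    }

    /** Ord of a document: one fixed-width line per document in the ord section. */
    int ordOf(int docId) {
        long offset = ordSectionStart() + (1L + ordPattern.length()) * docId;
        return readInt(offset, ordPattern.length());
    }

    /** Value of an ord: same record layout as the BINARY case, indexed by ord. */
    String valueOf(int ord) {
        long offset = startOffset + (9L + pattern.length() + maxLength) * ord;
        long lengthStart = offset + "length ".length();
        int length = readInt(lengthStart, pattern.length());
        int valueStart = (int) lengthStart + pattern.length() + 1; // skip the newline
        return new String(data, valueStart, length, StandardCharsets.UTF_8);
    }

    private int readInt(long offset, int width) {
        return Integer.parseInt(new String(data, (int) offset, width, StandardCharsets.US_ASCII));
    }
}
```

For a document's value, chain the two calls: `sketch.valueOf(sketch.ordOf(docId))`.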
For sorted set this is a fixed-width file very similar to the SORTED case, for example:
field myField
type SORTED_SET
numvalues 10
maxlength 8
pattern 0
ordpattern XXXXX
length 6
foobar[space][space]
length 3
baz[space][space][space][space][space]
...
0,3,5
1,2
10
...
So the "ord section" begins at startOffset + (9+pattern.length+maxlength)*numValues.
A document's ord list can be retrieved by seeking to "ord section" + (1+ordpattern.length())*docid.
This is a comma-separated list, and it's padded with spaces to be fixed width, so trim() and split() it,
and beware the empty string!
An ord's value can be retrieved by seeking to startOffset + (9+pattern.length+maxlength)*ord.
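The trim-and-split step, including the empty-string case, can be sketched as below. The helper is hypothetical and only parses one already-read ord line; it assumes ords are plain decimal numbers. The guard matters because in Java `"".split(",")` returns an array holding a single empty string, which would otherwise fail to parse.

```java
/** Hypothetical helper, not part of Lucene: parses one padded SORTED_SET ord line. */
final class OrdListSketch {

    /** Returns the ords on a fixed-width line such as "0,3,5" or "1,2  ". */
    static long[] parseOrds(String paddedLine) {
        String trimmed = paddedLine.trim();
        if (trimmed.isEmpty()) {
            // A document with no values is stored as an all-space line.
            return new long[0];
        }
        String[] parts = trimmed.split(",");
        long[] ords = new long[parts.length];
        for (int i = 0; i < parts.length; i++) {
            ords[i] = Long.parseLong(parts[i]);
        }
        return ords;
    }

    public static void main(String[] args) {
        System.out.println(parseOrds("0,3,5").length); // prints 3
        System.out.println(parseOrds("     ").length); // prints 0
    }
}
```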
The reader can just scan this file when it opens, skipping over the data blocks
and saving the offset and other header values for each field.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
SimpleTextDocValuesFormat
public SimpleTextDocValuesFormat()
fieldsConsumer
public DocValuesConsumer fieldsConsumer(SegmentWriteState state)
throws IOException
- Specified by:
fieldsConsumer
in class DocValuesFormat
- Throws:
IOException
fieldsProducer
public DocValuesProducer fieldsProducer(SegmentReadState state)
throws IOException
- Specified by:
fieldsProducer
in class DocValuesFormat
- Throws:
IOException
Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.