Class DirectoryTaxonomyWriter
- java.lang.Object
-
- org.apache.lucene.facet.taxonomy.directory.DirectoryTaxonomyWriter
-
- All Implemented Interfaces:
Closeable, AutoCloseable, TaxonomyWriter, TwoPhaseCommit
- Direct Known Subclasses:
ReindexingEnrichedDirectoryTaxonomyWriter
public class DirectoryTaxonomyWriter extends Object implements TaxonomyWriter
A TaxonomyWriter which uses a Directory to store the taxonomy information on disk, and keeps an additional in-memory cache of some or all categories.
In addition to the permanently stored information in the Directory, efficiency dictates that we also keep an in-memory cache of recently seen or all categories, so that we do not need to go back to disk for every category addition to see which ordinal this category already has, if any. A TaxonomyWriterCache object determines the specific caching algorithm used.
This class offers some hooks for extending classes to control the IndexWriter instance that is used. See openIndexWriter(org.apache.lucene.store.Directory, org.apache.lucene.index.IndexWriterConfig).
- WARNING: This API is experimental and might change in incompatible ways in the next release.
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class: Description)
- static class DirectoryTaxonomyWriter.DiskOrdinalMap: DirectoryTaxonomyWriter.OrdinalMap maintained on file system.
- static class DirectoryTaxonomyWriter.MemoryOrdinalMap: DirectoryTaxonomyWriter.OrdinalMap maintained in memory.
- static interface DirectoryTaxonomyWriter.OrdinalMap: Mapping from old ordinal to new ordinals, used when merging indexes with separate taxonomies.
-
Field Summary
Fields (Modifier and Type, Field: Description)
- static String INDEX_EPOCH: Property name of user commit data that contains the index epoch.
-
Constructor Summary
Constructors (Constructor: Description)
- DirectoryTaxonomyWriter(Directory d): Create this with OpenMode.CREATE_OR_APPEND.
- DirectoryTaxonomyWriter(Directory directory, IndexWriterConfig.OpenMode openMode): Creates a new instance with a default cache as defined by defaultTaxonomyWriterCache().
- DirectoryTaxonomyWriter(Directory directory, IndexWriterConfig.OpenMode openMode, TaxonomyWriterCache cache): Construct a Taxonomy writer.
-
Method Summary
Methods (Modifier and Type, Method: Description)
- int addCategory(FacetLabel categoryPath): addCategory() adds a category with a given path name to the taxonomy, and returns its ordinal.
- void addTaxonomy(Directory taxoDir, DirectoryTaxonomyWriter.OrdinalMap map): Takes the categories from the given taxonomy directory, and adds the missing ones to this taxonomy.
- void close(): Frees used resources and closes the underlying IndexWriter, which commits any changes made to it to the underlying Directory.
- protected void closeResources(): A hook for extending classes to close additional resources that were used.
- long commit()
- protected IndexWriterConfig createIndexWriterConfig(IndexWriterConfig.OpenMode openMode): Create the IndexWriterConfig that would be used for opening the internal index writer.
- static TaxonomyWriterCache defaultTaxonomyWriterCache(): Defines the default TaxonomyWriterCache to use in constructors which do not specify one.
- protected void enrichOrdinalDocument(Document d, FacetLabel categoryPath): Child classes can implement this method to modify the document corresponding to a category path before indexing it.
- protected void ensureOpen(): Verifies that this instance has not been closed, and throws AlreadyClosedException if it has.
- protected int findCategory(FacetLabel categoryPath): Look up the given category in the cache and/or the on-disk storage, returning the category's ordinal, or a negative number in case the category does not yet exist in the taxonomy.
- TaxonomyWriterCache getCache(): Returns the TaxonomyWriterCache in use by this writer.
- Directory getDirectory(): Returns the Directory of this taxonomy writer.
- Iterable<Map.Entry<String,String>> getLiveCommitData(): Returns the commit user data iterable that was set on TaxonomyWriter.setLiveCommitData(Iterable).
- int getParent(int ordinal): getParent() returns the ordinal of the parent category of the category with the given ordinal.
- int getSize(): getSize() returns the number of categories in the taxonomy.
- long getTaxonomyEpoch(): Expert: returns current index epoch, if this is a near-real-time reader.
- protected IndexWriter openIndexWriter(Directory directory, IndexWriterConfig config): Open internal index writer, which contains the taxonomy data.
- long prepareCommit(): Prepares most of the work needed for a two-phase commit.
- void replaceTaxonomy(Directory taxoDir): Replaces the current taxonomy with the given one.
- void rollback(): Rolls back changes to the taxonomy writer and closes the instance.
- void setCacheMissesUntilFill(int i): Set the number of cache misses before an attempt is made to read the entire taxonomy into the in-memory cache.
- void setLiveCommitData(Iterable<Map.Entry<String,String>> commitUserData): Sets the commit user data iterable.
- boolean useNumericDocValuesForOrdinals(): Determines whether to store taxonomy ordinals for each document using the older binary format or the newer SortedNumericDocValues format (based on the version used to create the index).
-
-
-
Field Detail
-
INDEX_EPOCH
public static final String INDEX_EPOCH
Property name of user commit data that contains the index epoch. The epoch changes whenever the taxonomy is recreated (i.e. opened with IndexWriterConfig.OpenMode.CREATE).
Applications should not use this property in their commit data because it will be overridden by this taxonomy writer.
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
DirectoryTaxonomyWriter
public DirectoryTaxonomyWriter(Directory directory, IndexWriterConfig.OpenMode openMode, TaxonomyWriterCache cache) throws IOException
Construct a Taxonomy writer.
- Parameters:
directory - The Directory in which to store the taxonomy. Note that the taxonomy is written directly to that directory (not to a subdirectory of it).
openMode - Specifies how to open a taxonomy for writing: APPEND means open an existing index for append (failing if the index does not yet exist). CREATE means create a new index (first deleting the old one if it already existed). CREATE_OR_APPEND appends to an existing index if there is one, otherwise it creates a new index.
cache - A TaxonomyWriterCache implementation which determines the in-memory caching policy. See for example LruTaxonomyWriterCache. If null or missing, defaultTaxonomyWriterCache() is used.
- Throws:
CorruptIndexException - if the taxonomy is corrupted.
LockObtainFailedException - if the taxonomy is locked by another writer.
IOException - if another error occurred.
-
DirectoryTaxonomyWriter
public DirectoryTaxonomyWriter(Directory directory, IndexWriterConfig.OpenMode openMode) throws IOException
Creates a new instance with a default cache as defined by defaultTaxonomyWriterCache().
- Throws:
IOException
-
DirectoryTaxonomyWriter
public DirectoryTaxonomyWriter(Directory d) throws IOException
Create this with OpenMode.CREATE_OR_APPEND.
- Throws:
IOException
-
-
Method Detail
-
getCache
public TaxonomyWriterCache getCache()
Returns the TaxonomyWriterCache in use by this writer.
-
openIndexWriter
protected IndexWriter openIndexWriter(Directory directory, IndexWriterConfig config) throws IOException
Open internal index writer, which contains the taxonomy data.
Extensions may provide their own IndexWriter implementation or instance.
NOTE: the instance this method returns will be closed when close() is called.
NOTE: the merge policy in effect must not merge non-adjacent segments. See the comment in createIndexWriterConfig(IndexWriterConfig.OpenMode) for the logic behind this.
- Parameters:
directory - the Directory on top of which an IndexWriter should be opened.
config - configuration for the internal index writer.
- Throws:
IOException
- See Also:
createIndexWriterConfig(IndexWriterConfig.OpenMode)
-
createIndexWriterConfig
protected IndexWriterConfig createIndexWriterConfig(IndexWriterConfig.OpenMode openMode)
Create the IndexWriterConfig that would be used for opening the internal index writer.
Extensions can configure the IndexWriter as they see fit, including setting a merge-scheduler or deletion-policy, a different RAM size, etc.
NOTE: internal docids of the configured index must not be altered. For that, categories are never deleted from the taxonomy index. In addition, the merge policy in effect must not merge non-adjacent segments.
- Parameters:
openMode - see IndexWriterConfig.OpenMode
- See Also:
openIndexWriter(Directory, IndexWriterConfig)
-
defaultTaxonomyWriterCache
public static TaxonomyWriterCache defaultTaxonomyWriterCache()
Defines the default TaxonomyWriterCache to use in constructors which do not specify one.
The current default is LruTaxonomyWriterCache.
-
close
public void close() throws IOException
Frees used resources and closes the underlying IndexWriter, which commits any changes made to it to the underlying Directory.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Throws:
IOException
-
closeResources
protected void closeResources() throws IOException
A hook for extending classes to close additional resources that were used. The default implementation closes the IndexReader as well as the TaxonomyWriterCache instances that were used.
NOTE: if you override this method, you should include a super.closeResources() call in your implementation.
- Throws:
IOException
-
findCategory
protected int findCategory(FacetLabel categoryPath) throws IOException
Look up the given category in the cache and/or the on-disk storage, returning the category's ordinal, or a negative number in case the category does not yet exist in the taxonomy.
- Throws:
IOException
-
addCategory
public int addCategory(FacetLabel categoryPath) throws IOException
Description copied from interface: TaxonomyWriter
addCategory() adds a category with a given path name to the taxonomy, and returns its ordinal. If the category was already present in the taxonomy, its existing ordinal is returned.
Before adding a category, addCategory() makes sure that all its ancestor categories exist in the taxonomy as well. As a result, the ordinal of a category is guaranteed to be smaller than the ordinal of any of its descendants.
- Specified by:
addCategory in interface TaxonomyWriter
- Throws:
IOException
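The ancestor-first rule described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: ToyTaxonomy and its String-based paths are hypothetical stand-ins, it does not use Lucene's classes, and it is not how DirectoryTaxonomyWriter is implemented. It only shows why a parent always ends up with a smaller ordinal than its children.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of ordinal assignment (hypothetical; NOT the Lucene implementation). */
final class ToyTaxonomy {
    // Maps a full category path (its components) to the ordinal assigned to it.
    private final Map<List<String>, Integer> ordinals = new HashMap<>();

    ToyTaxonomy() {
        // The root (empty path) always exists and gets ordinal 0.
        ordinals.put(new ArrayList<>(), 0);
    }

    /** Adds the path and any missing ancestors; returns the ordinal of the path. */
    int addCategory(String... components) {
        List<String> path = Arrays.asList(components);
        Integer existing = ordinals.get(path);
        if (existing != null) {
            return existing; // already present: return the existing ordinal
        }
        // Add the parent first, so that it receives a smaller ordinal.
        addCategory(Arrays.copyOf(components, components.length - 1));
        int ordinal = ordinals.size(); // ordinals are consecutive, starting at 0
        ordinals.put(new ArrayList<>(path), ordinal);
        return ordinal;
    }

    /** Number of categories, including the root and automatically added ancestors. */
    int getSize() {
        return ordinals.size();
    }
}
```

In this model, adding the single path ("Author", "Mark Twain") to an empty taxonomy creates three entries: the root, "Author", and the path itself. This also matches the remark in getSize() that the size is often higher than the number of categories explicitly inserted.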
-
ensureOpen
protected final void ensureOpen()
Verifies that this instance has not been closed, and throws AlreadyClosedException if it has.
-
enrichOrdinalDocument
protected void enrichOrdinalDocument(Document d, FacetLabel categoryPath)
Child classes can implement this method to modify the document corresponding to a category path before indexing it.
- WARNING: This API is experimental and might change in incompatible ways in the next release.
-
commit
public long commit() throws IOException
- Specified by:
commit in interface TwoPhaseCommit
- Throws:
IOException
-
setLiveCommitData
public void setLiveCommitData(Iterable<Map.Entry<String,String>> commitUserData)
Description copied from interface: TaxonomyWriter
Sets the commit user data iterable. See IndexWriter.setLiveCommitData(java.lang.Iterable<java.util.Map.Entry<java.lang.String, java.lang.String>>).
- Specified by:
setLiveCommitData in interface TaxonomyWriter
-
getLiveCommitData
public Iterable<Map.Entry<String,String>> getLiveCommitData()
Description copied from interface: TaxonomyWriter
Returns the commit user data iterable that was set on TaxonomyWriter.setLiveCommitData(Iterable).
- Specified by:
getLiveCommitData in interface TaxonomyWriter
-
prepareCommit
public long prepareCommit() throws IOException
Prepares most of the work needed for a two-phase commit. See IndexWriter.prepareCommit().
- Specified by:
prepareCommit in interface TwoPhaseCommit
- Throws:
IOException
-
getSize
public int getSize()
Description copied from interface: TaxonomyWriter
getSize() returns the number of categories in the taxonomy.
Because categories are numbered consecutively starting with 0, the taxonomy contains ordinals 0 through getSize()-1.
Note that the number returned by getSize() is often slightly higher than the number of categories inserted into the taxonomy; this is because when a category is added to the taxonomy, its ancestors are also added automatically (including the root, which always gets ordinal 0).
- Specified by:
getSize in interface TaxonomyWriter
-
setCacheMissesUntilFill
public void setCacheMissesUntilFill(int i)
Set the number of cache misses before an attempt is made to read the entire taxonomy into the in-memory cache.
This taxonomy writer holds an in-memory cache of recently seen categories to speed up operation. On each cache miss, the on-disk index needs to be consulted. When an existing taxonomy is opened, many such slow disk reads are needed until the cache is filled, so it is more efficient to read the entire taxonomy into memory at once. We do this complete read after a certain number (defined by this method) of cache misses.
If the number is set to 0, the entire taxonomy is read into the cache on first use, without fetching individual categories first.
NOTE: it is assumed that this method is called immediately after the taxonomy writer has been created.
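The trade-off described above (a few single lookups first, then one bulk read) can be sketched with a toy model. Everything here is an assumption made for illustration: FillAfterMissesLookup, its fields, and the use of a Map as a stand-in for the on-disk index are hypothetical, and the actual bookkeeping inside DirectoryTaxonomyWriter may differ.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of "fill the whole cache after N misses" (hypothetical; NOT Lucene's code). */
final class FillAfterMissesLookup {
    private final Map<String, Integer> disk;                    // stands in for the on-disk index
    private final Map<String, Integer> cache = new HashMap<>(); // in-memory cache
    private int missesUntilFill;                                // remaining misses before a full read
    private boolean filled = false;
    int diskReads = 0;                                          // number of slow single-category lookups

    FillAfterMissesLookup(Map<String, Integer> disk, int missesUntilFill) {
        this.disk = disk;
        this.missesUntilFill = missesUntilFill;
    }

    /** Returns the ordinal of the category, or -1 if it does not exist. */
    int find(String category) {
        if (!filled && missesUntilFill <= 0) {
            cache.putAll(disk); // one bulk read replaces many single reads
            filled = true;
        }
        Integer cached = cache.get(category);
        if (cached != null) {
            return cached;
        }
        if (filled) {
            return -1; // the cache is complete, so the category does not exist
        }
        missesUntilFill--; // count this cache miss
        diskReads++;       // and pay for one slow lookup
        Integer onDisk = disk.get(category);
        if (onDisk == null) {
            return -1;
        }
        cache.put(category, onDisk);
        return onDisk;
    }
}
```

With a threshold of 0 the model never performs a single-category lookup, which corresponds to the "read on first use" case. With a threshold of 2 it performs two slow lookups and then switches to the fully loaded cache.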
-
getParent
public int getParent(int ordinal) throws IOException
Description copied from interface: TaxonomyWriter
getParent() returns the ordinal of the parent category of the category with the given ordinal.
When a category is specified as a path name, finding the path of its parent is as trivial as dropping the last component of the path. getParent() is functionally equivalent to calling getPath() on the given ordinal, dropping the last component of the path, and then calling getOrdinal() to get an ordinal back.
If the given ordinal is the ROOT_ORDINAL, an INVALID_ORDINAL is returned. If the given ordinal is a top-level category, the ROOT_ORDINAL is returned. If an invalid ordinal is given (negative or beyond the last available ordinal), an IndexOutOfBoundsException is thrown. However, it is expected that getParent will only be called for ordinals which are already known to be in the taxonomy.
TODO (Facet): instead of a getParent(ordinal) method, consider having a getCategory(categorypath, prefixlen) which is similar to addCategory except it doesn't add new categories; this method can be used to get the ordinals of all prefixes of the given category, and it can use exactly the same code and cache used by addCategory(), so it means less code.
- Specified by:
getParent in interface TaxonomyWriter
- Throws:
IOException
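The "get the path, drop the last component, look the ordinal up again" equivalence and the three special cases can be sketched as follows. This is a toy model, not the Lucene implementation: ToyParentLookup is a hypothetical class, paths are plain lists of strings, and the constant values 0 and -1 are assumptions made for this sketch.

```java
import java.util.List;

/** Toy path-based parent lookup (hypothetical; NOT the Lucene implementation). */
final class ToyParentLookup {
    static final int ROOT_ORDINAL = 0;     // assumed value for this sketch
    static final int INVALID_ORDINAL = -1; // assumed value for this sketch

    // paths.get(i) is the path of ordinal i; index 0 holds the root (empty path).
    private final List<List<String>> paths;

    ToyParentLookup(List<List<String>> paths) {
        this.paths = paths;
    }

    int getParent(int ordinal) {
        if (ordinal < 0 || ordinal >= paths.size()) {
            throw new IndexOutOfBoundsException("invalid ordinal: " + ordinal);
        }
        if (ordinal == ROOT_ORDINAL) {
            return INVALID_ORDINAL; // the root has no parent
        }
        List<String> path = paths.get(ordinal);
        // Drop the last component to obtain the parent's path.
        List<String> parentPath = path.subList(0, path.size() - 1);
        // Map the parent's path back to its ordinal; a top-level category
        // has the empty path as parent, which is the root.
        return paths.indexOf(parentPath);
    }
}
```

The linear indexOf search is only acceptable in a sketch; it is used here to keep the path-dropping step visible.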
-
addTaxonomy
public void addTaxonomy(Directory taxoDir, DirectoryTaxonomyWriter.OrdinalMap map) throws IOException
Takes the categories from the given taxonomy directory, and adds the missing ones to this taxonomy. Additionally, it fills the given DirectoryTaxonomyWriter.OrdinalMap with a mapping from the original ordinal to the new ordinal.
- Throws:
IOException
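What "a mapping from the original ordinal to the new ordinal" means can be shown with a toy merge. This is a sketch under simplifying assumptions: ToyTaxonomyMerge is hypothetical, each taxonomy is a plain list whose index is the ordinal, categories are slash-joined strings, and the map is a simple int array rather than a DirectoryTaxonomyWriter.OrdinalMap.

```java
import java.util.List;

/** Toy merge of two category lists with an old-to-new ordinal map (hypothetical; NOT Lucene's code). */
final class ToyTaxonomyMerge {
    /**
     * Appends the categories of source that are missing from dest (in source order)
     * and returns an array where map[oldOrdinal] is the ordinal of the same category in dest.
     */
    static int[] addTaxonomy(List<String> dest, List<String> source) {
        int[] map = new int[source.size()];
        for (int oldOrdinal = 0; oldOrdinal < source.size(); oldOrdinal++) {
            String category = source.get(oldOrdinal);
            int newOrdinal = dest.indexOf(category);
            if (newOrdinal < 0) {
                newOrdinal = dest.size(); // next free ordinal in the destination
                dest.add(category);
            }
            map[oldOrdinal] = newOrdinal;
        }
        return map;
    }
}
```

A search index that was built against the source taxonomy stores the old ordinals, so a map of this kind is what allows those ordinals to be rewritten when the two indexes are merged.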
-
rollback
public void rollback() throws IOException
Rolls back changes to the taxonomy writer and closes the instance. After this method is called, the instance becomes unusable (calling any of its API methods will yield an AlreadyClosedException).
- Specified by:
rollback in interface TwoPhaseCommit
- Throws:
IOException
-
replaceTaxonomy
public void replaceTaxonomy(Directory taxoDir) throws IOException
Replaces the current taxonomy with the given one. This method should generally be called in conjunction with IndexWriter.addIndexes(Directory...) to replace both the taxonomy and the search index content.
- Throws:
IOException
-
getTaxonomyEpoch
public final long getTaxonomyEpoch()
Expert: returns current index epoch, if this is a near-real-time reader. Used by DirectoryTaxonomyReader to support NRT.
- NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
-
useNumericDocValuesForOrdinals
public boolean useNumericDocValuesForOrdinals()
Description copied from interface: TaxonomyWriter
Determines whether to store taxonomy ordinals for each document using the older binary format or the newer SortedNumericDocValues format (based on the version used to create the index).
- Specified by:
useNumericDocValuesForOrdinals in interface TaxonomyWriter
-
-