public class DirectoryTaxonomyWriter extends Object implements TaxonomyWriter
A TaxonomyWriter which uses a Directory to store the taxonomy information on disk, and keeps an additional in-memory cache of some or all categories.
In addition to the permanently-stored information in the Directory, efficiency dictates that we also keep an in-memory cache of recently seen or all categories, so that we do not need to go back to disk for every category addition to see which ordinal this category already has, if any. A TaxonomyWriterCache object determines the specific caching algorithm used.
This class offers some hooks for extending classes to control the IndexWriter instance that is used. See openIndexWriter(org.apache.lucene.store.Directory, org.apache.lucene.index.IndexWriterConfig).
Modifier and Type | Class and Description |
---|---|
static class | DirectoryTaxonomyWriter.DiskOrdinalMap: DirectoryTaxonomyWriter.OrdinalMap maintained on file system |
static class | DirectoryTaxonomyWriter.MemoryOrdinalMap: DirectoryTaxonomyWriter.OrdinalMap maintained in memory |
static interface | DirectoryTaxonomyWriter.OrdinalMap: Mapping from old ordinals to new ordinals, used when merging indexes with separate taxonomies. |
Modifier and Type | Field and Description |
---|---|
static String | INDEX_EPOCH: Property name of user commit data that contains the index epoch. |
Constructor and Description |
---|
DirectoryTaxonomyWriter(Directory d) |
DirectoryTaxonomyWriter(Directory directory, IndexWriterConfig.OpenMode openMode): Creates a new instance with a default cache as defined by defaultTaxonomyWriterCache(). |
DirectoryTaxonomyWriter(Directory directory, IndexWriterConfig.OpenMode openMode, TaxonomyWriterCache cache): Construct a Taxonomy writer. |
Modifier and Type | Method and Description |
---|---|
int | addCategory(CategoryPath categoryPath): addCategory() adds a category with a given path name to the taxonomy, and returns its ordinal. |
void | addTaxonomy(Directory taxoDir, DirectoryTaxonomyWriter.OrdinalMap map): Takes the categories from the given taxonomy directory, and adds the missing ones to this taxonomy. |
void | close(): Frees used resources as well as closes the underlying IndexWriter, which commits whatever changes were made to it to the underlying Directory. |
protected void | closeResources(): A hook for extending classes to close additional resources that were used. |
void | commit() |
protected IndexWriterConfig | createIndexWriterConfig(IndexWriterConfig.OpenMode openMode): Create the IndexWriterConfig that would be used for opening the internal index writer. |
static TaxonomyWriterCache | defaultTaxonomyWriterCache(): Defines the default TaxonomyWriterCache to use in constructors which do not specify one. |
protected void | ensureOpen(): Verifies that this instance wasn't closed, or throws AlreadyClosedException if it is. |
protected int | findCategory(CategoryPath categoryPath): Look up the given category in the cache and/or the on-disk storage, returning the category's ordinal, or a negative number in case the category does not yet exist in the taxonomy. |
Map<String,String> | getCommitData(): Returns the commit user data map that was set on TaxonomyWriter.setCommitData(Map). |
Directory | getDirectory(): Returns the Directory of this taxonomy writer. |
int | getParent(int ordinal): getParent() returns the ordinal of the parent category of the category with the given ordinal. |
int | getSize(): getSize() returns the number of categories in the taxonomy. |
long | getTaxonomyEpoch(): Expert: returns current index epoch, if this is a near-real-time reader. |
protected IndexWriter | openIndexWriter(Directory directory, IndexWriterConfig config): Open internal index writer, which contains the taxonomy data. |
void | prepareCommit(): Prepare most of the work needed for a two-phase commit. |
void | replaceTaxonomy(Directory taxoDir): Replaces the current taxonomy with the given one. |
void | rollback(): Rolls back changes to the taxonomy writer and closes the instance. |
void | setCacheMissesUntilFill(int i): Set the number of cache misses before an attempt is made to read the entire taxonomy into the in-memory cache. |
void | setCommitData(Map<String,String> commitUserData): Sets the commit user data map. |
void | setDelimiter(char delimiter): Changes the character that the taxonomy uses in its internal storage as a delimiter between category components. |
static void | unlock(Directory directory): Forcibly unlocks the taxonomy in the named directory. |
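public static final String INDEX_EPOCH
Property name of user commit data that contains the index epoch. The epoch changes whenever the taxonomy is recreated (i.e. opened with IndexWriterConfig.OpenMode.CREATE).
Applications should not use this property in their commit data because it will be overridden by this taxonomy writer.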
public DirectoryTaxonomyWriter(Directory directory, IndexWriterConfig.OpenMode openMode, TaxonomyWriterCache cache) throws IOException
Construct a Taxonomy writer.
Parameters:
directory - The Directory in which to store the taxonomy. Note that the taxonomy is written directly to that directory (not to a subdirectory of it).
openMode - Specifies how to open a taxonomy for writing: APPEND means open an existing index for append (failing if the index does not yet exist). CREATE means create a new index (first deleting the old one if it already existed). CREATE_OR_APPEND appends to an existing index if there is one, otherwise it creates a new index.
cache - A TaxonomyWriterCache implementation which determines the in-memory caching policy. See for example LruTaxonomyWriterCache and Cl2oTaxonomyWriterCache. If null or missing, defaultTaxonomyWriterCache() is used.
Throws:
CorruptIndexException - if the taxonomy is corrupted.
LockObtainFailedException - if the taxonomy is locked by another writer. If it is known that no other concurrent writer is active, the lock might have been left around by an old dead process, and should be removed using unlock(Directory).
IOException - if another error occurred.
public DirectoryTaxonomyWriter(Directory directory, IndexWriterConfig.OpenMode openMode) throws IOException
Creates a new instance with a default cache as defined by defaultTaxonomyWriterCache().
Throws:
IOException
public DirectoryTaxonomyWriter(Directory d) throws IOException
Throws:
IOException
public void setDelimiter(char delimiter)
Changes the character that the taxonomy uses in its internal storage as a delimiter between category components.
If you do use this method, make sure you call it before any other method that actually queries the taxonomy. Moreover, make sure you always pass the same delimiter for all taxonomy writer and reader instances you create for the same directory.
public static void unlock(Directory directory) throws IOException
Forcibly unlocks the taxonomy in the named directory.
Caution: this should only be used by failure recovery code, when it is known that no other process or thread is in fact currently accessing this taxonomy.
This method is unnecessary if your Directory uses a NativeFSLockFactory instead of the default SimpleFSLockFactory. When the "native" lock is used, a lock does not stay behind forever when the process using it dies.
Throws:
IOException
protected IndexWriter openIndexWriter(Directory directory, IndexWriterConfig config) throws IOException
Open internal index writer, which contains the taxonomy data.
Extensions may provide their own IndexWriter implementation or instance.
NOTE: the instance this method returns will be closed when close() is called.
NOTE: the merge policy in effect must not merge non-adjacent segments. See comment in createIndexWriterConfig(IndexWriterConfig.OpenMode) for the logic behind this.
Parameters:
directory - the Directory on top of which an IndexWriter should be opened.
config - configuration for the internal index writer.
Throws:
IOException
See Also:
createIndexWriterConfig(IndexWriterConfig.OpenMode)
protected IndexWriterConfig createIndexWriterConfig(IndexWriterConfig.OpenMode openMode)
Create the IndexWriterConfig that would be used for opening the internal index writer.
Extensions can configure the IndexWriter as they see fit, including setting a merge-scheduler, or deletion-policy, different RAM size etc.
Parameters:
openMode - see IndexWriterConfig.OpenMode
See Also:
openIndexWriter(Directory, IndexWriterConfig)
public static TaxonomyWriterCache defaultTaxonomyWriterCache()
Defines the default TaxonomyWriterCache to use in constructors which do not specify one.
The current default is Cl2oTaxonomyWriterCache constructed with the parameters (1024, 0.15f, 3), i.e., the entire taxonomy is cached in memory while building it.
public void close() throws IOException
Frees used resources as well as closes the underlying IndexWriter, which commits whatever changes were made to it to the underlying Directory.
Specified by:
close in interface Closeable
Throws:
IOException
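protected void closeResources() throws IOException
A hook for extending classes to close additional resources that were used. The default implementation closes the IndexReader as well as the TaxonomyWriterCache instances that were used.
NOTE: if you override this method, you should include a super.closeResources() call in your implementation.
Throws:
IOException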
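protected int findCategory(CategoryPath categoryPath) throws IOException
Look up the given category in the cache and/or the on-disk storage, returning the category's ordinal, or a negative number in case the category does not yet exist in the taxonomy.
Throws:
IOException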
public int addCategory(CategoryPath categoryPath) throws IOException
Description copied from interface: TaxonomyWriter
addCategory() adds a category with a given path name to the taxonomy, and returns its ordinal.
Before adding a category, addCategory() makes sure that all its ancestor categories exist in the taxonomy as well. As a result, the ordinal of a category is guaranteed to be smaller than the ordinal of any of its descendants.
Specified by:
addCategory in interface TaxonomyWriter
Throws:
IOException
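The ancestor-first guarantee can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not Lucene's implementation: the class `ToyTaxonomy`, its `/`-joined string keys, and its varargs signature are all hypothetical and stand in for the real `CategoryPath`-based API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model of ordinal assignment; not the Lucene class. */
class ToyTaxonomy {
    private final Map<String, Integer> ordinals = new HashMap<>();
    private final List<String> paths = new ArrayList<>();

    ToyTaxonomy() {
        // The root (empty path) always receives ordinal 0.
        ordinals.put("", 0);
        paths.add("");
    }

    /** Adds the category and any missing ancestors; returns the category's ordinal. */
    int addCategory(String... components) {
        int ordinal = 0; // an empty path denotes the root
        StringBuilder path = new StringBuilder();
        for (String component : components) {
            if (path.length() > 0) {
                path.append('/'); // assumed delimiter, for this sketch only
            }
            path.append(component);
            String key = path.toString();
            Integer existing = ordinals.get(key);
            if (existing == null) {
                // Ancestors are visited first, so they get the smaller ordinals.
                existing = paths.size();
                ordinals.put(key, existing);
                paths.add(key);
            }
            ordinal = existing;
        }
        return ordinal;
    }

    /** Number of categories, including the root and implicitly added ancestors. */
    int getSize() {
        return paths.size();
    }
}
```

In this model, adding the single category `("Author", "Mark Twain")` to an empty taxonomy creates the ancestor `Author` first, so the size ends up larger than the number of categories explicitly inserted.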
protected final void ensureOpen()
Verifies that this instance wasn't closed, or throws AlreadyClosedException if it is.
public void commit() throws IOException
Specified by:
commit in interface TwoPhaseCommit
Throws:
IOException
public void setCommitData(Map<String,String> commitUserData)
Description copied from interface: TaxonomyWriter
Sets the commit user data map. That method is considered a transaction and will be committed even if no other changes were made to the writer instance.
NOTE: the map is cloned internally, therefore altering the map's contents after calling this method has no effect.
Specified by:
setCommitData in interface TaxonomyWriter
public Map<String,String> getCommitData()
Description copied from interface: TaxonomyWriter
Returns the commit user data map that was set on TaxonomyWriter.setCommitData(Map).
Specified by:
getCommitData in interface TaxonomyWriter
public void prepareCommit() throws IOException
Prepare most of the work needed for a two-phase commit. See IndexWriter.prepareCommit().
Specified by:
prepareCommit in interface TwoPhaseCommit
Throws:
IOException
public int getSize()
Description copied from interface: TaxonomyWriter
getSize() returns the number of categories in the taxonomy.
Because categories are numbered consecutively starting with 0, the taxonomy contains ordinals 0 through getSize()-1.
Note that the number returned by getSize() is often slightly higher than the number of categories inserted into the taxonomy; this is because when a category is added to the taxonomy, its ancestors are also added automatically (including the root, which always gets ordinal 0).
Specified by:
getSize in interface TaxonomyWriter
public void setCacheMissesUntilFill(int i)
Set the number of cache misses before an attempt is made to read the entire taxonomy into the in-memory cache.
This taxonomy writer holds an in-memory cache of recently seen categories to speed up operation. On each cache miss, the on-disk index needs to be consulted. When an existing taxonomy is opened, many such slow disk reads are needed until the cache is filled, so it is more efficient to read the entire taxonomy into memory at once. We do this complete read after a certain number (defined by this method) of cache misses.
If the number is set to 0, the entire taxonomy is read into the cache on first use, without fetching individual categories first.
NOTE: it is assumed that this method is called immediately after the taxonomy writer has been created.
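The fill-after-N-misses policy described above can be sketched with a toy cache. This is a simplified illustration under assumptions, not the writer's actual code: `ToyFillingCache`, its `Map`-backed "disk", and the `diskLookups` counter are hypothetical names introduced here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical toy model of "read everything after N cache misses". */
class ToyFillingCache {
    private final Map<String, Integer> disk; // stands in for the on-disk taxonomy
    private final Map<String, Integer> cache = new HashMap<>();
    private final int missesUntilFill;
    private int misses = 0;
    private boolean complete = false;
    int diskLookups = 0; // counts slow per-category lookups

    ToyFillingCache(Map<String, Integer> disk, int missesUntilFill) {
        this.disk = disk;
        this.missesUntilFill = missesUntilFill;
    }

    boolean isComplete() {
        return complete;
    }

    /** Returns the ordinal of the path, or -1 if it is not in the taxonomy. */
    int find(String path) {
        Integer cached = cache.get(path);
        if (cached != null) {
            return cached;
        }
        if (complete) {
            return -1; // the whole taxonomy is cached, so the path does not exist
        }
        misses++;
        if (misses > missesUntilFill) {
            // Too many misses: read the entire taxonomy into memory at once.
            cache.putAll(disk);
            complete = true;
            Integer filled = cache.get(path);
            return filled == null ? -1 : filled;
        }
        // Otherwise do one slow lookup and remember the result.
        diskLookups++;
        Integer onDisk = disk.get(path);
        if (onDisk == null) {
            return -1;
        }
        cache.put(path, onDisk);
        return onDisk;
    }
}
```

With a threshold of 0, the first miss already triggers the complete read, which matches the "read on first use" behavior described for this setter.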
public int getParent(int ordinal) throws IOException
Description copied from interface: TaxonomyWriter
getParent() returns the ordinal of the parent category of the category with the given ordinal.
When a category is specified as a path name, finding the path of its parent is as trivial as dropping the last component of the path. getParent() is functionally equivalent to calling getPath() on the given ordinal, dropping the last component of the path, and then calling getOrdinal() to get an ordinal back.
If the given ordinal is the ROOT_ORDINAL, an INVALID_ORDINAL is returned. If the given ordinal is a top-level category, the ROOT_ORDINAL is returned. If an invalid ordinal is given (negative or beyond the last available ordinal), an ArrayIndexOutOfBoundsException is thrown. However, it is expected that getParent will only be called for ordinals which are already known to be in the taxonomy.
TODO (Facet): instead of a getParent(ordinal) method, consider having a getCategory(categorypath, prefixlen) which is similar to addCategory except it doesn't add new categories. This method can be used to get the ordinals of all prefixes of the given category, and it can use exactly the same code and cache used by addCategory(), so it means less code.
Specified by:
getParent in interface TaxonomyWriter
Throws:
IOException
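The return values described for getParent() can be illustrated with a plain parent array. This is a sketch only: `ToyParents` is a hypothetical class, and the constant values 0 and -1 are assumptions chosen for this model (the text above only states that the root has ordinal 0).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of parent lookups over a parent-ordinal array. */
class ToyParents {
    static final int ROOT_ORDINAL = 0;     // the root always has ordinal 0
    static final int INVALID_ORDINAL = -1; // assumed sentinel for "no parent"

    // parents[i] holds the ordinal of the parent of category i.
    private final int[] parents;

    ToyParents(int[] parents) {
        this.parents = parents.clone();
    }

    /** Throws ArrayIndexOutOfBoundsException for ordinals outside the taxonomy. */
    int getParent(int ordinal) {
        return parents[ordinal];
    }

    /** Walks up from the given ordinal to the root, nearest ancestor first. */
    List<Integer> ancestors(int ordinal) {
        List<Integer> result = new ArrayList<>();
        int parent = getParent(ordinal);
        while (parent != INVALID_ORDINAL) {
            result.add(parent);
            parent = getParent(parent);
        }
        return result;
    }
}
```

Because a category's ordinal is always larger than its ancestors' ordinals, every entry in such an array is smaller than its own index, so the upward walk always terminates at the root.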
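public void addTaxonomy(Directory taxoDir, DirectoryTaxonomyWriter.OrdinalMap map) throws IOException
Takes the categories from the given taxonomy directory, and adds the missing ones to this taxonomy. Additionally, it fills the given DirectoryTaxonomyWriter.OrdinalMap with a mapping from the original ordinal to the new ordinal.
Throws:
IOException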
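The old-to-new ordinal mapping produced by a merge can be sketched with two in-memory lists. This is an illustrative model only, not how the class works internally: `ToyTaxonomyMerge` and its list-of-paths representation are hypothetical, with each list index standing for a category's ordinal.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model of merging one taxonomy into another. */
class ToyTaxonomyMerge {
    /**
     * Adds the paths missing from dest (in source order) and returns an array
     * where map[oldOrdinal] is the ordinal of the same path in dest.
     */
    static int[] addTaxonomy(List<String> dest, List<String> source) {
        // Index the destination paths by their current ordinals.
        Map<String, Integer> index = new HashMap<>();
        for (int i = 0; i < dest.size(); i++) {
            index.put(dest.get(i), i);
        }
        int[] map = new int[source.size()];
        for (int old = 0; old < source.size(); old++) {
            String path = source.get(old);
            Integer existing = index.get(path);
            if (existing == null) {
                // Missing category: append it and give it the next ordinal.
                existing = dest.size();
                dest.add(path);
                index.put(path, existing);
            }
            map[old] = existing;
        }
        return map;
    }
}
```

An application merging a search index alongside the taxonomy would use such a map to rewrite the category ordinals stored in the incoming documents.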
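public void rollback() throws IOException
Rolls back changes to the taxonomy writer and closes the instance. Following this method the instance becomes unusable (calling any of its API methods will yield an AlreadyClosedException).
Specified by:
rollback in interface TwoPhaseCommit
Throws:
IOException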
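public void replaceTaxonomy(Directory taxoDir) throws IOException
Replaces the current taxonomy with the given one. This method should generally be called in conjunction with IndexWriter.addIndexes(Directory...) to replace both the taxonomy as well as the search index content.
Throws:
IOException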
public final long getTaxonomyEpoch()
Expert: returns current index epoch, if this is a near-real-time reader. Used by DirectoryTaxonomyReader to support NRT.
Copyright © 2000-2013 Apache Software Foundation. All Rights Reserved.