org.apache.lucene.util
Class OpenBitSet

java.lang.Object
  extended by org.apache.lucene.search.DocIdSet
      extended by org.apache.lucene.util.OpenBitSet
All Implemented Interfaces:
Serializable, Cloneable
Direct Known Subclasses:
OpenBitSetDISI

public class OpenBitSet
extends DocIdSet
implements Cloneable, Serializable

An "open" BitSet implementation that allows direct access to the array of words storing the bits.

Unlike java.util.bitset, the fact that bits are packed into an array of longs is part of the interface. This allows efficient implementation of other algorithms by someone other than the author. It also allows one to efficiently implement alternate serialization or interchange formats.

OpenBitSet is faster than java.util.BitSet in most operations and *much* faster at calculating cardinality of sets and results of set operations. It can also handle sets of larger cardinality (up to 64 * 2**32-1)

The goals of OpenBitSet are the fastest implementation possible, and maximum code reuse. Extra safety and encapsulation may always be built on top, but if that's built in, the cost can never be removed (and hence people re-implement their own version in order to get better performance). If you want a "safe", totally encapsulated (and slower and limited) BitSet class, use java.util.BitSet.

Performance Results

Test system: Pentium 4, Sun Java 1.5_06 -server -Xbatch -Xmx64M
BitSet size = 1,000,000
Results are java.util.BitSet time divided by OpenBitSet time.
cardinality intersect_count union nextSetBit get iterator
50% full 3.36 3.96 1.44 1.46 1.99 1.58
1% full 3.31 3.90   1.04   0.99

Test system: AMD Opteron, 64 bit linux, Sun Java 1.5_06 -server -Xbatch -Xmx64M
BitSet size = 1,000,000
Results are java.util.BitSet time divided by OpenBitSet time.
cardinality intersect_count union nextSetBit get iterator
50% full 2.50 3.50 1.00 1.03 1.12 1.25
1% full 2.51 3.49   1.00   1.02

Version:
$Id$
See Also:
Serialized Form

Field Summary
protected  long[] bits
           
protected  int wlen
           
 
Fields inherited from class org.apache.lucene.search.DocIdSet
EMPTY_DOCIDSET
 
Constructor Summary
OpenBitSet()
           
OpenBitSet(long numBits)
          Constructs an OpenBitSet large enough to hold numBits.
OpenBitSet(long[] bits, int numWords)
          Constructs an OpenBitSet from an existing long[].
 
Method Summary
 void and(OpenBitSet other)
           
 void andNot(OpenBitSet other)
           
static long andNotCount(OpenBitSet a, OpenBitSet b)
          Returns the popcount or cardinality of "a and not b" or "intersection(a, not(b))".
static int bits2words(long numBits)
          returns the number of 64 bit words it would take to hold numBits
 long capacity()
          Returns the current capacity in bits (1 greater than the index of the last bit)
 long cardinality()
           
 void clear(int startIndex, int endIndex)
          Clears a range of bits.
 void clear(long index)
          clears a bit, allowing access beyond the current set size without changing the size.
 void clear(long startIndex, long endIndex)
          Clears a range of bits.
 Object clone()
           
 void ensureCapacity(long numBits)
          Ensure that the long[] is big enough to hold numBits, expanding it if necessary.
 void ensureCapacityWords(int numWords)
          Expand the long[] with the size given as a number of words (64 bit longs).
 boolean equals(Object o)
          returns true if both sets have the same bits set
protected  int expandingWordNum(long index)
           
 void fastClear(int index)
          clears a bit.
 void fastClear(long index)
          clears a bit.
 void fastFlip(int index)
          flips a bit.
 void fastFlip(long index)
          flips a bit.
 boolean fastGet(int index)
          Returns true or false for the specified bit index.
 boolean fastGet(long index)
          Returns true or false for the specified bit index.
 void fastSet(int index)
          Sets the bit at the specified index.
 void fastSet(long index)
          Sets the bit at the specified index.
 void flip(long index)
          flips a bit, expanding the set size if necessary
 void flip(long startIndex, long endIndex)
          Flips a range of bits, expanding the set size if necessary
 boolean flipAndGet(int index)
          flips a bit and returns the resulting bit value.
 boolean flipAndGet(long index)
          flips a bit and returns the resulting bit value.
 boolean get(int index)
          Returns true or false for the specified bit index.
 boolean get(long index)
          Returns true or false for the specified bit index
 boolean getAndSet(int index)
          Sets a bit and returns the previous value.
 boolean getAndSet(long index)
          Sets a bit and returns the previous value.
 int getBit(int index)
          returns 1 if the bit is set, 0 if not.
 long[] getBits()
          Expert: returns the long[] storing the bits
 int getNumWords()
          Expert: gets the number of longs in the array that are in use
 int hashCode()
           
 void intersect(OpenBitSet other)
          this = this AND other
static long intersectionCount(OpenBitSet a, OpenBitSet b)
          Returns the popcount or cardinality of the intersection of the two sets.
 boolean intersects(OpenBitSet other)
          returns true if the sets have any elements in common
 boolean isCacheable()
          This DocIdSet implementation is cacheable.
 boolean isEmpty()
          Returns true if there are no set bits
 DocIdSetIterator iterator()
          Provides a DocIdSetIterator to access the set.
 int nextSetBit(int index)
          Returns the index of the first set bit starting at the index specified.
 long nextSetBit(long index)
          Returns the index of the first set bit starting at the index specified.
 void or(OpenBitSet other)
           
 void remove(OpenBitSet other)
          Remove all elements set in other.
 void set(long index)
          sets a bit, expanding the set size if necessary
 void set(long startIndex, long endIndex)
          Sets a range of bits, expanding the set size if necessary
 void setBits(long[] bits)
          Expert: sets a new long[] to use as the bit storage
 void setNumWords(int nWords)
          Expert: sets the number of longs in the array that are in use
 long size()
          Returns the current capacity of this set.
 void trimTrailingZeros()
          Lowers numWords, the number of words in use, by checking for trailing zero words.
 void union(OpenBitSet other)
          this = this OR other
static long unionCount(OpenBitSet a, OpenBitSet b)
          Returns the popcount or cardinality of the union of the two sets.
 void xor(OpenBitSet other)
          this = this XOR other
static long xorCount(OpenBitSet a, OpenBitSet b)
          Returns the popcount or cardinality of the exclusive-or of the two sets.
 
Methods inherited from class java.lang.Object
finalize, getClass, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

bits

protected long[] bits

wlen

protected int wlen
Constructor Detail

OpenBitSet

public OpenBitSet(long numBits)
Constructs an OpenBitSet large enough to hold numBits.

Parameters:
numBits -

OpenBitSet

public OpenBitSet()

OpenBitSet

public OpenBitSet(long[] bits,
                  int numWords)
Constructs an OpenBitSet from an existing long[].
The first 64 bits are in long[0], with bit index 0 at the least significant bit, and bit index 63 at the most significant. Given a bit index, the word containing it is long[index/64], and it is at bit number index%64 within that word.

numWords are the number of elements in the array that contain set bits (non-zero longs). numWords should be <= bits.length, and any existing words in the array at position >= numWords should be zero.

Method Detail

iterator

public DocIdSetIterator iterator()
Description copied from class: DocIdSet
Provides a DocIdSetIterator to access the set. This implementation can return null or DocIdSet.EMPTY_DOCIDSET.iterator() if there are no docs that match.

Specified by:
iterator in class DocIdSet

isCacheable

public boolean isCacheable()
This DocIdSet implementation is cacheable.

Overrides:
isCacheable in class DocIdSet

capacity

public long capacity()
Returns the current capacity in bits (1 greater than the index of the last bit)


size

public long size()
Returns the current capacity of this set. Included for compatibility. This is *not* equal to cardinality()


isEmpty

public boolean isEmpty()
Returns true if there are no set bits


getBits

public long[] getBits()
Expert: returns the long[] storing the bits


setBits

public void setBits(long[] bits)
Expert: sets a new long[] to use as the bit storage


getNumWords

public int getNumWords()
Expert: gets the number of longs in the array that are in use


setNumWords

public void setNumWords(int nWords)
Expert: sets the number of longs in the array that are in use


get

public boolean get(int index)
Returns true or false for the specified bit index.


fastGet

public boolean fastGet(int index)
Returns true or false for the specified bit index. The index should be less than the OpenBitSet size


get

public boolean get(long index)
Returns true or false for the specified bit index


fastGet

public boolean fastGet(long index)
Returns true or false for the specified bit index. The index should be less than the OpenBitSet size.


getBit

public int getBit(int index)
returns 1 if the bit is set, 0 if not. The index should be less than the OpenBitSet size


set

public void set(long index)
sets a bit, expanding the set size if necessary


fastSet

public void fastSet(int index)
Sets the bit at the specified index. The index should be less than the OpenBitSet size.


fastSet

public void fastSet(long index)
Sets the bit at the specified index. The index should be less than the OpenBitSet size.


set

public void set(long startIndex,
                long endIndex)
Sets a range of bits, expanding the set size if necessary

Parameters:
startIndex - lower index
endIndex - one-past the last bit to set

expandingWordNum

protected int expandingWordNum(long index)

fastClear

public void fastClear(int index)
clears a bit. The index should be less than the OpenBitSet size.


fastClear

public void fastClear(long index)
clears a bit. The index should be less than the OpenBitSet size.


clear

public void clear(long index)
clears a bit, allowing access beyond the current set size without changing the size.


clear

public void clear(int startIndex,
                  int endIndex)
Clears a range of bits. Clearing past the end does not change the size of the set.

Parameters:
startIndex - lower index
endIndex - one-past the last bit to clear

clear

public void clear(long startIndex,
                  long endIndex)
Clears a range of bits. Clearing past the end does not change the size of the set.

Parameters:
startIndex - lower index
endIndex - one-past the last bit to clear

getAndSet

public boolean getAndSet(int index)
Sets a bit and returns the previous value. The index should be less than the OpenBitSet size.


getAndSet

public boolean getAndSet(long index)
Sets a bit and returns the previous value. The index should be less than the OpenBitSet size.


fastFlip

public void fastFlip(int index)
flips a bit. The index should be less than the OpenBitSet size.


fastFlip

public void fastFlip(long index)
flips a bit. The index should be less than the OpenBitSet size.


flip

public void flip(long index)
flips a bit, expanding the set size if necessary


flipAndGet

public boolean flipAndGet(int index)
flips a bit and returns the resulting bit value. The index should be less than the OpenBitSet size.


flipAndGet

public boolean flipAndGet(long index)
flips a bit and returns the resulting bit value. The index should be less than the OpenBitSet size.


flip

public void flip(long startIndex,
                 long endIndex)
Flips a range of bits, expanding the set size if necessary

Parameters:
startIndex - lower index
endIndex - one-past the last bit to flip

cardinality

public long cardinality()
Returns:
the number of set bits

intersectionCount

public static long intersectionCount(OpenBitSet a,
                                     OpenBitSet b)
Returns the popcount or cardinality of the intersection of the two sets. Neither set is modified.


unionCount

public static long unionCount(OpenBitSet a,
                              OpenBitSet b)
Returns the popcount or cardinality of the union of the two sets. Neither set is modified.


andNotCount

public static long andNotCount(OpenBitSet a,
                               OpenBitSet b)
Returns the popcount or cardinality of "a and not b" or "intersection(a, not(b))". Neither set is modified.


xorCount

public static long xorCount(OpenBitSet a,
                            OpenBitSet b)
Returns the popcount or cardinality of the exclusive-or of the two sets. Neither set is modified.


nextSetBit

public int nextSetBit(int index)
Returns the index of the first set bit starting at the index specified. -1 is returned if there are no more set bits.


nextSetBit

public long nextSetBit(long index)
Returns the index of the first set bit starting at the index specified. -1 is returned if there are no more set bits.


clone

public Object clone()
Overrides:
clone in class Object

intersect

public void intersect(OpenBitSet other)
this = this AND other


union

public void union(OpenBitSet other)
this = this OR other


remove

public void remove(OpenBitSet other)
Remove all elements set in other. this = this AND_NOT other


xor

public void xor(OpenBitSet other)
this = this XOR other


and

public void and(OpenBitSet other)

or

public void or(OpenBitSet other)

andNot

public void andNot(OpenBitSet other)

intersects

public boolean intersects(OpenBitSet other)
returns true if the sets have any elements in common


ensureCapacityWords

public void ensureCapacityWords(int numWords)
Expand the long[] with the size given as a number of words (64 bit longs). getNumWords() is unchanged by this call.


ensureCapacity

public void ensureCapacity(long numBits)
Ensure that the long[] is big enough to hold numBits, expanding it if necessary. getNumWords() is unchanged by this call.


trimTrailingZeros

public void trimTrailingZeros()
Lowers numWords, the number of words in use, by checking for trailing zero words.


bits2words

public static int bits2words(long numBits)
returns the number of 64 bit words it would take to hold numBits


equals

public boolean equals(Object o)
returns true if both sets have the same bits set

Overrides:
equals in class Object

hashCode

public int hashCode()
Overrides:
hashCode in class Object


Copyright © 2000-2010 Apache Software Foundation. All Rights Reserved.