@InterfaceAudience.Public @InterfaceStability.Unstable @NotThreadSafe public class BloomFilter extends Object
A BloomFilter can be used to select all the records that are wanted, but it does not guarantee to filter out all the records that are not wanted: false positives are possible, false negatives are not.
Please check this wiki for more details.
The BloomFilter here is a scanning filter used to constrain the number of records returned from the TServer. It provides put methods for different types. When you put a record into a BloomFilter, it means you expect the TServer to return records with the same value in a scan.
Here is a usage example:
BloomFilter bf = BloomFilter.bySize(numBytes);
bf.put(1);
bf.put(3);
bf.put(4);
byte[] bitSet = bf.getBitSet();
int numHashes = bf.getNumHashes();
String hashFunctionName = bf.getHashFunctionName();
// TODO: implement the interface for serializing and sending
// (bitSet, numHashes, hashFunctionName) to TServer.
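The containment guarantee described above (every value that was put matches, while a value that was never put may still match) can be illustrated with a minimal, self-contained sketch. This is not the Kudu implementation: the class `BloomSketch`, the method `mightContain`, the FNV-1a hash standing in for Murmur2, and the double-hashing index scheme are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Illustrative only: NOT the Kudu BloomFilter. All names here are hypothetical.
class BloomSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    BloomSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // FNV-1a is used as a simple stand-in for the Murmur2 hash named on this page.
    private static int fnv1a(byte[] data, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    // Derive the i-th bit index from two base hashes (double hashing).
    private int index(byte[] data, int i) {
        int h1 = fnv1a(data, 0);
        int h2 = fnv1a(data, 0x9E3779B9);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    // Set numHashes bits for the value.
    void put(byte[] data) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(data, i));
        }
    }

    // True if all bits for the value are set; this can also happen by
    // coincidence for a value that was never put (a false positive).
    boolean mightContain(byte[] data) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(data, i))) {
                return false;
            }
        }
        return true;
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        BloomSketch filter = new BloomSketch(1024, 3);
        for (String key : new String[] {"1", "3", "4"}) {
            filter.put(bytes(key));
        }
        // A value that was put always matches (no false negatives).
        System.out.println(filter.mightContain(bytes("1")));
        // A value that was never put usually does not match, but it may.
        System.out.println(filter.mightContain(bytes("2")));
    }
}
```

In scan terms, a record whose value was put is always returned, and a record whose value was never put is only usually dropped.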
Modifier and Type | Method and Description |
---|---|
static BloomFilter | byCount(int expectedCount): Generate bloom filter, default hashing is Murmur2 and false positive rate is 0.01. |
static BloomFilter | byCountAndFPRate(int expectedCount, double fpRate): Generate bloom filter, default hashing is Murmur2. |
static BloomFilter | byCountAndFPRate(int expectedCount, double fpRate, org.apache.kudu.util.BloomFilter.HashFunction hashFunction): Generate bloom filter. |
static BloomFilter | bySize(int numBytes): Generate bloom filter, default hashing is Murmur2 and false positive rate is 0.01. |
static BloomFilter | bySizeAndFPRate(int numBytes, double fpRate): Generate bloom filter, default hashing is Murmur2. |
static BloomFilter | bySizeAndFPRate(int numBytes, double fpRate, org.apache.kudu.util.BloomFilter.HashFunction hashFunction): Generate bloom filter. |
byte[] | getBitSet(): Get the internal bit set in bytes. |
String | getHashFunctionName(): Get the name of hashing used when updating or checking containment. |
int | getNumHashes(): Get the number of hashing times when updating or checking containment. |
void | put(boolean data): Update bloom filter with a boolean. |
void | put(byte data): Update bloom filter with a byte. |
void | put(byte[] data): Update bloom filter with a byte[]. |
void | put(double data): Update bloom filter with a double. |
void | put(float data): Update bloom filter with a float. |
void | put(int data): Update bloom filter with an int. |
void | put(long data): Update bloom filter with a long. |
void | put(short data): Update bloom filter with a short. |
void | put(String data): Update bloom filter with a String. |
String | toString() |
public static BloomFilter bySize(int numBytes)
Generate bloom filter, default hashing is Murmur2 and false positive rate is 0.01.
numBytes - size of bloom filter in bytes

public static BloomFilter bySizeAndFPRate(int numBytes, double fpRate)
Generate bloom filter, default hashing is Murmur2.
numBytes - size of bloom filter in bytes
fpRate - the probability that TServer will erroneously return a record that has not ever been put into the BloomFilter.

public static BloomFilter bySizeAndFPRate(int numBytes, double fpRate, org.apache.kudu.util.BloomFilter.HashFunction hashFunction)
Generate bloom filter.
numBytes - size of bloom filter in bytes
fpRate - the probability that TServer will erroneously return a record that has not ever been put into the BloomFilter.
hashFunction - hashing used when updating or checking containment, user should pick the hashing function from HashFunctions

public static BloomFilter byCount(int expectedCount)
Generate bloom filter, default hashing is Murmur2 and false positive rate is 0.01.
expectedCount - The expected number of elements, targeted by this bloom filter. It is used to size the bloom filter.

public static BloomFilter byCountAndFPRate(int expectedCount, double fpRate)
Generate bloom filter, default hashing is Murmur2.
expectedCount - The expected number of elements, targeted by this bloom filter. It is used to size the bloom filter.
fpRate - the probability that TServer will erroneously return a record that has not ever been put into the BloomFilter.

public static BloomFilter byCountAndFPRate(int expectedCount, double fpRate, org.apache.kudu.util.BloomFilter.HashFunction hashFunction)
Generate bloom filter.
expectedCount - The expected number of elements, targeted by this bloom filter. It is used to size the bloom filter.
fpRate - the probability that TServer will erroneously return a record that has not ever been put into the BloomFilter.
hashFunction - hashing used when updating or checking containment, user should pick the hashing function from HashFunctions

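The factory methods relate three quantities: the expected element count, the false positive rate, and the filter size. The sketch below shows the textbook relationships between them, which may help when choosing between the byCount and bySize variants. It is an assumption that Kudu's arithmetic follows these formulas exactly (rounding in particular may differ), and the class `BloomSizing` and its method names are hypothetical.

```java
// Illustrative only: textbook Bloom filter sizing, not necessarily Kudu's exact arithmetic.
class BloomSizing {
    // Bits needed for n expected elements at false positive rate p:
    // m = -n * ln(p) / (ln 2)^2
    static long optimalNumBits(int expectedCount, double fpRate) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-expectedCount * Math.log(fpRate) / (ln2 * ln2));
    }

    // Number of hash applications that minimizes the false positive rate:
    // k = (m / n) * ln 2, and at least 1
    static int optimalNumHashes(int expectedCount, long numBits) {
        return Math.max(1, (int) Math.round((double) numBits / expectedCount * Math.log(2)));
    }

    // Inverse view: how many elements a filter of numBytes can hold
    // while staying near the requested false positive rate.
    static long capacityForBytes(int numBytes, double fpRate) {
        double ln2 = Math.log(2);
        return (long) Math.floor(-(numBytes * 8.0) * ln2 * ln2 / Math.log(fpRate));
    }

    public static void main(String[] args) {
        long bits = optimalNumBits(1000, 0.01);
        System.out.println("bits = " + bits);
        System.out.println("bytes = " + (bits + 7) / 8);
        System.out.println("hashes = " + optimalNumHashes(1000, bits));
        System.out.println("capacity of 1200 bytes = " + capacityForBytes(1200, 0.01));
    }
}
```

Under these formulas, 1000 expected elements at a 0.01 false positive rate need roughly 1.2 KB and 7 hash applications per value.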
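public void put(byte[] data)
Update bloom filter with a byte[].

public void put(boolean data)
Update bloom filter with a boolean.

public void put(byte data)
Update bloom filter with a byte.

public void put(short data)
Update bloom filter with a short.

public void put(int data)
Update bloom filter with an int.

public void put(long data)
Update bloom filter with a long.

public void put(float data)
Update bloom filter with a float.

public void put(double data)
Update bloom filter with a double.

public void put(String data)
Update bloom filter with a String.

public byte[] getBitSet()
Get the internal bit set in bytes.

public int getNumHashes()
Get the number of hashing times when updating or checking containment.

public String getHashFunctionName()
Get the name of hashing used when updating or checking containment.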
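One practical consequence of having a put overload per type is that the same numeric value can produce different filter entries depending on which overload is called. The sketch below assumes that each value is hashed by a fixed-width byte encoding of its Java type. That encoding, the little-endian byte order, and the class `PutEncodingSketch` are assumptions for illustration and are not taken from the Kudu source.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: a hypothetical byte encoding per type, not Kudu's actual one.
class PutEncodingSketch {
    // Assumed encoding: 4 bytes, little-endian.
    static byte[] encode(int value) {
        return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
    }

    // Assumed encoding: 8 bytes, little-endian.
    static byte[] encode(long value) {
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putLong(value).array();
    }

    // Assumed encoding: UTF-8 bytes of the string.
    static byte[] encode(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The value 1 encodes to different byte sequences as int, long and String,
        // so a hash over the bytes would treat them as three distinct entries.
        System.out.println(Arrays.toString(encode(1)));
        System.out.println(Arrays.toString(encode(1L)));
        System.out.println(Arrays.toString(encode("1")));
    }
}
```

If the filter behaves this way, the safest choice is to call the put overload whose type matches the type of the column being scanned.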