package sketch
Type Members
- abstract class BloomFilter extends AnyRef
A Bloom filter is a space-efficient probabilistic data structure that offers an approximate containment test with one-sided error: if it claims that an item is contained in it, this might be in error, but if it claims that an item is not contained in it, then this is definitely true.
A Bloom filter is a space-efficient probabilistic data structure that offers an approximate containment test with one-sided error: if it claims that an item is contained in it, this might be in error, but if it claims that an item is not contained in it, then this is definitely true. Currently supported data types include:
ByteShortIntegerLongString
The false positive probability (
FPP) of a Bloom filter is defined as the probability that #mightContain(Object) will erroneously returntruefor an object that has not actually been put in theBloomFilter.The implementation is largely based on the
BloomFilterclass from Guava. - abstract class CountMinSketch extends AnyRef
A Count-min sketch is a probabilistic data structure used for cardinality estimation using sub-linear space.
A Count-min sketch is a probabilistic data structure used for cardinality estimation using sub-linear space. Currently, supported data types include:
ByteShortIntegerLongString
A
CountMinSketchis initialized with a random seed, and a pair of parameters:- relative error (or
eps), and - confidence (or
delta)
Suppose you want to estimate the number of times an element
xhas appeared in a data stream so far. With probabilitydelta, the estimate of this frequency is within the rangetrue frequency <= estimate <= true frequency + eps * N, whereNis the total count of items have appeared the data stream so far.Under the cover, a
CountMinSketchis essentially a two-dimensionallongarray with depthdand widthw, whered = ceil(2 / eps)w = ceil(-log(1 - confidence) / log(2))
This implementation is largely based on the
CountMinSketchclass from stream-lib. - class IncompatibleMergeException extends Exception