object
TwitterAlgebirdCMS extends AnyRef
Value Members
-
final
def
!=(arg0: AnyRef): Boolean
-
final
def
!=(arg0: Any): Boolean
-
final
def
##(): Int
-
final
def
==(arg0: AnyRef): Boolean
-
final
def
==(arg0: Any): Boolean
-
final
def
asInstanceOf[T0]: T0
-
def
clone(): AnyRef
-
final
def
eq(arg0: AnyRef): Boolean
-
def
equals(arg0: Any): Boolean
-
def
finalize(): Unit
-
final
def
getClass(): java.lang.Class[_]
-
def
hashCode(): Int
-
final
def
isInstanceOf[T0]: Boolean
-
def
main(args: Array[String]): Unit
-
final
def
ne(arg0: AnyRef): Boolean
-
final
def
notify(): Unit
-
final
def
notifyAll(): Unit
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
-
def
toString(): String
-
final
def
wait(): Unit
-
final
def
wait(arg0: Long, arg1: Int): Unit
-
final
def
wait(arg0: Long): Unit
Inherited from AnyRef
Inherited from Any
Illustrates the use of the Count-Min Sketch, from Twitter's Algebird library, to compute windowed and global Top-K estimates of user IDs occurring in a Twitter stream.
Note that since Algebird's implementation currently only supports Long inputs, the example operates on Long IDs. Once the implementation supports other inputs (such as String), the same approach could be used for computing popular topics for example. This blog post has a good overview of the Count-Min Sketch (CMS). The CMS is a datastructure for approximate frequency estimation in data streams (e.g. Top-K elements, frequency of any given element, etc), that uses space sub-linear in the number of elements in the stream. Once elements are added to the CMS, the estimated count of an element can be computed, as well as "heavy-hitters" that occur more than a threshold percentage of the overall total count. Algebird's implementation is a monoid, so we can succinctly merge two CMS instances in the reduce operation.