Redis HyperLogLog is used for calculating the cardinality of a set or non mathematically speaking it is a probabilistic data structure used to count unique values. Basically it is an algorithm that use randomization such that it can provide an approximation of the number of unique elements in a set by just using a constant complexity, and a fixed/small amount of memory footprint. Each key stored in hyperloglog is only about 12 kbytes per key and with a standard error of 0.81%. There is no limit to the number of items you can count in the HyperLogLog Datatype, unless you approach 2^64 items. If you are planning to use some form of unique count, e.g ipadress
Redis HyperLogLog Datatype – Operations
- PFADD: Adds or updates one or more members to HyperLogLog, Returns 1 if at least 1 HyperLogLog internal register was altered, 0 otherwise O(1)
- PFCOUNT: Returns the approximated cardinality computed by the HyperLogLog, 0 if the variable does not exist. O(1).
- PFMERGE: Merge multiple HyperLogLog values into an unique value O(N)
C# code using Redis HyperLogLog Datatype
class Program
{
static void Main(string[] args)
{
var redis = RedisStore.RedisCache;
RedisKey key = "setKey";
RedisKey alphaKey = "alphaKey";
RedisKey destinationKey = "destKey";
redis.KeyDelete(key, CommandFlags.FireAndForget);
redis.KeyDelete(alphaKey, CommandFlags.FireAndForget);
redis.KeyDelete(destinationKey, CommandFlags.FireAndForget);
redis.HyperLogLogAdd(key, "a");
redis.HyperLogLogAdd(key, "b");
redis.HyperLogLogAdd(key, "c");
Console.WriteLine(redis.HyperLogLogLength(key)); //output 3
redis.HyperLogLogAdd(alphaKey, "1");
redis.HyperLogLogAdd(alphaKey, "2");
redis.HyperLogLogAdd(alphaKey, "3");
Console.WriteLine(redis.HyperLogLogLength(alphaKey)); //output 3
redis.HyperLogLogAdd(alphaKey, "a");
Console.WriteLine(redis.HyperLogLogLength(alphaKey)); //output 4
redis.HyperLogLogMerge(destinationKey, key, alphaKey);
Console.WriteLine(redis.HyperLogLogLength(destinationKey)); //output 6 which is (a, b, c, 1, 2, 3)
Console.ReadKey();
}
}
So this covers the basic usage of Redish HyperLogLog Datatype, in the next blog post I will cover how to use Pub/Sub in Redis.
For the code please visit
https://github.com/taswar/RedisForNetDevelopers/tree/master/8.RedisHyperLogLog
For previous Redis topics
- Intro to Redis for .NET Developers
- Redis for .NET Developer – Connecting with C#
- Redis for .NET Developer – String Datatype
- Redis for .NET Developer – String Datatype part 2
- Redis for .NET Developer – Hash Datatype
- Redis for .NET Developer – List Datatype
- Redis for .NET Developer – Redis Sets Datatype
- Redis for .NET Developer – Redis Sorted Sets Datatype
View Comments (3)
Hi Taswar,
HyperLogLog is an algorithm for the count-distinct problem, approximating the number of distinct elements in a multiset. [Wikipedia].
Redis is has data in memory, and it's really fast. Why we would need an estimation.
Thanks
One of the reason why redis introduced hyperloglog is due to the memory footprint it uses for a count, is minimal. If you wish to read more on redis hyperloglog I would suggest looking at this blog post http://antirez.com/news/75
it gives a very good explanation of it.
"approach 264 items" should be "approach 2^64 items"