Redis and Azure Table Storage have a lot in common. Neither enforces a schema on the objects it stores, both offer limited support for complex queries, and both are touted as "post-SQL" storage. There is, however, one major difference between the two that tends to dampen comparisons, and that is how they deal with concurrency.
Redis deals with concurrency by queuing requests and handling them one at a time. This allows a single-instance Redis server to deliver impressive performance along with a bunch of nice features like set intersections and unions. On the downside, this limits the scalability of a single Redis instance because you are stuck with just one thread doing the actual work.
Azure Table Storage, on the other hand, is a distributed platform that abstracts away a lot of what happens under the hood behind a simple interface. It spreads data across multiple storage nodes based on the partition key, and orders entities within a partition by the row key. Any object you want stored in Azure Tables needs these two keys. This lends itself well to scalability but has implications for complex operations: Azure Table Storage offers no set-based intersections, understandably, because they are hard to perform in a consistent manner at any given time across a distributed system.
So which to use? The answer, of course, is "it depends". If you want a fast key-value store whose load will not grow beyond a certain well-defined level, you can stick with Redis and enjoy some of the set operations. If you want a system that can scale with ever-increasing load, you might want to consider switching to Azure Table Storage.
Consider a system that tracks users over time. In Redis you could do this simply by creating one sorted set per minute, with a key signature that encodes the minute under consideration, and ranking the users within the sorted set by the number of times they are seen in that minute:
Sorted Set – users.minute:37324
Now for multiple minutes you have multiple sets:
Sorted Set – users.minute:37325
and if you want the aggregate of these two minutes you can simply run a ZUNIONSTORE operation on these two sets, producing a new set that contains both minutes' worth of data. Easy!
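A minimal sketch of that scheme, using plain Python dicts (`collections.Counter`) to stand in for Redis sorted sets; the user names and sighting counts are made up for illustration, and with a real Redis client the equivalent commands would be ZINCRBY and ZUNIONSTORE (which sums scores by default):

```python
from collections import Counter

# Plain dicts stand in for Redis sorted sets: member -> score.
minutes = {}

def record_sighting(store, minute, user):
    """ZINCRBY users.minute:<m> 1 <user> -- bump the user's score by one."""
    store.setdefault(f"users.minute:{minute}", Counter())[user] += 1

def union_store(store, dest, keys):
    """ZUNIONSTORE dest <n> <keys...> -- default aggregation sums scores."""
    result = Counter()
    for key in keys:
        result.update(store.get(key, Counter()))
    store[dest] = result
    return result

# Minute 37324: Alice seen twice, Bob once; minute 37325: Alice once more.
for user in ["Alice", "Alice", "Bob"]:
    record_sighting(minutes, 37324, user)
record_sighting(minutes, 37325, "Alice")

total = union_store(minutes, "users.minutes:37324+37325",
                    ["users.minute:37324", "users.minute:37325"])
print(total)
```

The union both returns the aggregate and stores it under a new key, mirroring how ZUNIONSTORE leaves its result in the destination set for later queries.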
But now imagine you have to scale this to an arbitrary load, one that a single Redis instance could not bear. How could you go about doing so with Azure Table Storage?
One option is to craft a partition key much in the same way as before and use the row key as an index to the user name. That way you can retrieve all the users for a given minute via the partition key, or access a specific user's entry via its row key, with fairly low latency.
PartitionKey – user.minute:37324, RowKey – Alice
This creates a number of keys that, taken together, form a set.
Given another set, where a set is defined by entities sharing the same PartitionKey, say for minute 37325:

PartitionKey – user.minute:37325, RowKey – Alice
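The key layout above can be sketched with an in-memory dict standing in for the table; the `Count` property is an assumed entity field, and with the real service you would insert and query entities through the Azure Table Storage SDK rather than a local dict:

```python
# (PartitionKey, RowKey) -> entity; a stand-in for an Azure table.
table = {}

def upsert_count(table, minute, user):
    """Record one sighting of a user in a given minute."""
    pk, rk = f"user.minute:{minute}", user
    entity = table.setdefault(
        (pk, rk), {"PartitionKey": pk, "RowKey": rk, "Count": 0})
    entity["Count"] += 1

def query_partition(table, minute):
    """All users seen in one minute: a single-partition (fast) query."""
    pk = f"user.minute:{minute}"
    return [e for (p, _), e in table.items() if p == pk]

upsert_count(table, 37324, "Alice")
upsert_count(table, 37324, "Alice")
upsert_count(table, 37325, "Alice")
print(query_partition(table, 37324))
```

Because every entity for a minute shares one PartitionKey, the per-minute query stays within a single partition, which is the cheap access pattern in Table Storage.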
You can now fairly quickly query the results for each partition key, loop through the two lists manually, and accumulate the result in another structure (O(n) if you use a dictionary).
The caveat, however, is that you have no way of being certain that the results are consistent at any given point in time, especially if the second set (37325) is still being written to while you run the union. Still, it comes close to the same kind of functionality offered by Redis.
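That client-side merge can be sketched as follows, assuming each queried entity carries the user name in RowKey and a hypothetical Count property; a dictionary accumulator keeps the merge O(n) over the combined results:

```python
def merge_partitions(partitions):
    """Sum per-user counts across several partition query results."""
    totals = {}
    for entities in partitions:
        for e in entities:
            totals[e["RowKey"]] = totals.get(e["RowKey"], 0) + e["Count"]
    return totals

# Results as they might come back from two partition queries
# (user names and counts are illustrative).
minute_37324 = [{"RowKey": "Alice", "Count": 3}, {"RowKey": "Bob", "Count": 1}]
minute_37325 = [{"RowKey": "Alice", "Count": 2}, {"RowKey": "Carol", "Count": 5}]

print(merge_partitions([minute_37324, minute_37325]))
```

Unlike ZUNIONSTORE, this runs on the client against two snapshots taken at slightly different times, which is exactly where the consistency caveat above comes from.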