Caching is a tricky business. The idea sounds simple: if you’ve used a piece of data recently, keep it around so that you can use it again. But the real world is much more complicated than that, and the simplest ideas often lead you down a garden path. Pick the wrong caching approach for your applications, and you can easily wind up with worse performance than you started with. That’s why we took a more sophisticated, flexible approach.
Applications use storage in different ways, access it in different patterns, and expect different levels of performance. Some workloads are strictly sequential, writing large blocks of data out one after another. These guys don’t need to hold on to a cache page any longer than it takes to get the blocks written out to stable storage. Others skip around the disk, merrily reading and writing hither and yon. They’re tough: if there isn’t a pattern to the accesses, the best way to benefit is to have a lot of cache and hope that you hit some of the access points. Others are more predictable, fitting within a nice working set of the disk, only slowly migrating data in and out of cache.
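To make that concrete, here’s a rough sketch in Python of how one might classify an access stream as sequential or random and pick a matching cache retention hint. The function names and thresholds are made up for illustration; this is not CloudArray code.

```python
# Hypothetical sketch: classify a stream of block offsets as sequential or
# random, then choose a cache retention hint. Sequential writers can release
# pages as soon as they are flushed; random workloads keep pages as long as
# possible and hope for re-use.

def classify_accesses(offsets, block_size=4096, threshold=0.9):
    """Return 'sequential' if most accesses immediately follow the previous block."""
    if len(offsets) < 2:
        return "random"
    sequential = sum(
        1 for prev, cur in zip(offsets, offsets[1:])
        if cur == prev + block_size
    )
    return "sequential" if sequential / (len(offsets) - 1) >= threshold else "random"

def retention_hint(pattern):
    """Map an access pattern to a cache retention policy."""
    return "drop-after-flush" if pattern == "sequential" else "keep-until-evicted"

# Example: a backup-style stream vs. a database-style stream.
backup = [i * 4096 for i in range(100)]
database = [4096 * n for n in (7, 91, 3, 55, 7, 23, 88, 7)]
print(retention_hint(classify_accesses(backup)))    # drop-after-flush
print(retention_hint(classify_accesses(database)))  # keep-until-evicted
```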
Mixing up multiple workloads on multiple volumes within a single cache makes it very difficult to get reasonable benefits. A backup process, for example, writing out large sequential blocks, will stomp all over everything in its path, while a database or a mail server will suffer greatly if it’s forced to share space with other applications. Good, sophisticated caching algorithms go a long way towards mitigating the performance impacts of shared workloads, by doing things like detecting and caching metadata regions, keeping multiple levels of cache, aging cache entries, and myriad other techniques. But the best approach is the least sophisticated: if you’ve got to achieve deterministic performance levels, don’t share your cache.
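Here’s a toy illustration of that pollution effect. The simple LRU cache and the workloads below are assumptions for demonstration only, not CloudArray internals: a large sequential scan pushes a small, hot working set out of a shared cache, while a dedicated cache, even a smaller one, keeps it resident.

```python
# Toy example: a backup-style scan evicts a database's hot pages from a
# shared LRU cache; a dedicated cache protects them.

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()
        self.hits = self.misses = 0

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)        # refresh recency
            self.hits += 1
        else:
            self.misses += 1
            self.pages[page] = True
            if len(self.pages) > self.capacity:
                self.pages.popitem(last=False)  # evict least recently used

hot_set = list(range(10)) * 5        # database re-reading 10 hot pages
scan = list(range(1000, 1200))       # backup streaming 200 cold pages once

shared = LRUCache(capacity=50)       # one big cache for everything
for page in hot_set[:25] + scan + hot_set[25:]:
    shared.access(page)

dedicated = LRUCache(capacity=25)    # smaller, but reserved for the database
for page in hot_set:
    dedicated.access(page)

print("shared hits:   ", shared.hits)     # hot pages get evicted mid-run by the scan
print("dedicated hits:", dedicated.hits)  # hot set stays resident after warm-up
```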
We recognized early on that if an appliance has only a single shared cache, it’s going to have to waste a lot of time and space trying to guess what the application is doing. Instead, we decided to design tools that would give the administrator the ability to tune CloudArray to fit their applications. Policies within our management framework describe different levels of performance and capacity, ranging from SSD to SATA, from sharing terabytes across multiple applications to dedicating a single local disk to a single logical volume. Within each cache, we can focus on achieving the highest performance required by that application or application set, without having to worry about how the applications interact.
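Conceptually, per-volume policies look something like the sketch below. The policy names, tiers, and fields are illustrative assumptions, not CloudArray’s actual management interface; the point is that each volume’s cache behavior is declared by the administrator rather than guessed at runtime.

```python
# Hypothetical policy definitions: each policy describes a cache tier, size,
# and whether it is shared; each logical volume is assigned one policy.

from dataclasses import dataclass

@dataclass
class CachePolicy:
    name: str
    tier: str       # e.g. "ssd" or "sata"
    size_gb: int
    shared: bool    # True: multiple volumes share this cache

POLICIES = {
    "backup":   CachePolicy("backup",   tier="sata", size_gb=2048, shared=True),
    "archive":  CachePolicy("archive",  tier="sata", size_gb=512,  shared=True),
    "database": CachePolicy("database", tier="ssd",  size_gb=256,  shared=False),
}

VOLUME_POLICY = {
    "nightly-backup-vol": POLICIES["backup"],
    "mail-archive-vol":   POLICIES["archive"],
    "orders-db-vol":      POLICIES["database"],
}

for volume, policy in VOLUME_POLICY.items():
    kind = "shared" if policy.shared else "dedicated"
    print(f"{volume}: {policy.size_gb} GB {policy.tier.upper()} cache ({kind})")
```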
If you are working with a storage appliance that doesn’t have a flexible policy engine, fixing cache performance problems comes down to adding capacity or adding appliances: putting in an appliance for each application, or putting in enough disk that it doesn’t matter. With a smarter approach like CloudArray’s policy engine, you can give your backup workload one cache, your archives another, and drop your databases on SSD. And if one of your caches is underperforming, you can adjust it on the fly. It’s about flexibility. It’s about efficiency. It’s about performance.