Architecting for capacity efficiency: Advancing compression implementations for all-flash arrays

Leading the all-flash array market with the most cost-efficient storage platform is not easy. In essence, there are two major approaches to getting to the top:

  1. Make enough noise using creative marketing and sales engineering. Deal with reality (customers) later.
  2. Architect for capacity efficiency and build hard business guarantees on real technological capabilities.

Needless to say, we picked the latter.

With Kaminario’s K2 all-flash array Gen6, we’ve taken yet another step forward in capacity efficiency, this time on the front of compression. We’ve managed to increase compression ratios by 30% on average, and now we guarantee effective capacity with a 4:1 data reduction ratio.

Ok – but how? Implementing compression in an all-flash array is no small feat, as the products of several 3-letter storage companies might testify. It is not a bolt-on feature that can be added as an afterthought, and some early architectural decisions can prevent an array from ever taking advantage of the most important data reduction feature for all-flash arrays.

Kaminario’s VisionOS, the storage operating system that was built for flash, has been designed with data reduction as one of its main pillars (more on VisionOS in this blog). In-line compression has become crucial to all-flash economics, especially in environments where data deduplication is negligible or nonexistent. As such, the design of K2’s compression follows some very demanding guidelines:

  1. Be compute friendly – Compression does not come for free; it takes a toll on CPU resources. The stronger the compression algorithm, the more CPU cycles are spent running compression instead of executing other system tasks. The right tradeoff has to be made between the array’s performance (IOPS, throughput, latency), the required compression results, and the cost of the CPU that ultimately determines the total pool of compute resources.
  2. Be modular – The compression flow within the greater scheme of how IO flows in the array (see the detailed flow in K2’s Architecture White Paper) must be abstracted such that any change to the compression engine would be transparent to the IO flow. In addition, the compression engine should not be influenced by any other component in the IO flow.
  3. Be byte-aligned – or in other words, do not fragment where you don’t have to. The size of a compressed data segment can be measured at the granularity of bytes. The question is how that compressed segment is stored within the array. In some implementations that did not follow demanding-guideline-#2, the underlying operating system of the array cannot store data at a granularity finer than 1KB. The array’s overall capacity efficiency then suffers, because it stores inflated pieces of data instead of storing them byte-aligned.

By following these guidelines, Kaminario’s software-defined approach made it extremely easy to take advantage of hardware innovation and update the K2’s compression engine and offload it from the main CPU to a hardware accelerated processing unit. Let’s examine what has changed, using the same set of guidelines described above:

  1. Is it compute friendly? The hardware compression unit is connected via a front-loading, hot-swappable PCIe slot within a K2 storage controller. With the compression algorithm offloaded to the compression unit, we were able to use DEFLATE, a more storage-efficient algorithm than the LZ4 used in previous K2 generations. At a high level, DEFLATE is a combination of LZ77 and Huffman coding. It compresses better because it emits output at the bit level rather than LZ4’s byte-level output, and the Huffman coding takes advantage of some byte values being more common than others. Although DEFLATE is more CPU demanding than LZ4, there is no performance penalty since it runs on a dedicated processing unit.
  2. Is it modular? The compression unit’s APIs were wired into the existing abstraction layer of the compression module – and that’s it! No changes to any other sub-component whatsoever.
  3. Is it byte-aligned? The output of DEFLATE is measured in bits, and the entire compressed segment is then rounded up to the nearest byte. So yes, compressed data is still stored in a byte-aligned manner.
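The Huffman-coding point above can be demonstrated with Python's standard-library `zlib`, which implements DEFLATE (LZ4 itself is not in the standard library, so only DEFLATE is shown here): data in which a few byte values dominate compresses far better than uniformly random data, precisely because Huffman coding assigns the common values short bit codes.

```python
import random
import zlib

random.seed(0)
n = 1 << 18  # 256 KiB samples

# Uniform bytes: no value is more common than any other, so Huffman coding
# (and LZ77 matching) have nothing to exploit.
uniform = random.randbytes(n)

# Skewed bytes: 'a' dominates, so DEFLATE's Huffman stage can encode it in
# very few bits per occurrence.
skewed = bytes(random.choice(b"aaaaaaabbc") for _ in range(n))

for name, sample in (("uniform", uniform), ("skewed", skewed)):
    out = zlib.compress(sample, 6)
    print(f"{name}: {len(sample)} B -> {len(out)} B")
```

The uniform sample barely shrinks at all, while the skewed sample collapses to a small fraction of its size – the same effect DEFLATE exploits in real-world data, where byte values are rarely uniformly distributed.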
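The modularity point can be sketched as a pluggable engine behind a fixed interface. The names here (`CompressionEngine`, `DeflateEngine`, `write_path`) are hypothetical and not VisionOS APIs; the sketch only illustrates why swapping engines leaves the IO flow untouched:

```python
import zlib
from typing import Protocol


class CompressionEngine(Protocol):
    """Abstract interface the IO flow codes against; engines are swappable behind it."""

    def compress(self, segment: bytes) -> bytes: ...
    def decompress(self, blob: bytes) -> bytes: ...


class DeflateEngine:
    """Software DEFLATE backend via zlib. A hardware-offloaded engine would
    expose the same two calls, so swapping it in requires no IO-flow changes."""

    def compress(self, segment: bytes) -> bytes:
        return zlib.compress(segment, 6)

    def decompress(self, blob: bytes) -> bytes:
        return zlib.decompress(blob)


def write_path(engine: CompressionEngine, segment: bytes) -> bytes:
    # The IO flow sees only the abstract interface, never the engine internals.
    return engine.compress(segment)


blob = write_path(DeflateEngine(), b"hello world " * 1000)
print(f"stored {len(blob)} B for a 12000 B segment")
```

Under this kind of abstraction, replacing `DeflateEngine` with a hardware-backed implementation is a one-line change at the point where the engine is constructed.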

Improving K2’s compression capabilities using off-the-shelf components – hardware and algorithms – with minimal effort is a great example of architecting for capacity efficiency. A software-defined approach and a storage architecture that is modular and robust are mandatory for leading with cost efficiency in the all-flash array market.
