Data Reduction – What Does it Mean to Your Application?

A few weeks ago, Kaminario’s Doron Tal blogged about The Design Principles Behind Our K2 v5, which covered several important data reduction features of the new K2 from a storage perspective. Today, we’d like to focus on the data reduction capabilities of K2 from an application perspective and explain in more detail what they mean for you and your application.

The net effect of the data reduction features is easy to illustrate with the following diagram. Data reduction allows you to keep more data on the same amount of physical storage. In the example below, the array can hold only 15TB of data without data reduction. With data reduction, the same array can hold 90TB of user data, a 6:1 ratio.
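The arithmetic behind this example can be sketched in a few lines of Python. The function names are ours, chosen for illustration; the 15TB and 90TB figures are the ones quoted above.

```python
def reduction_ratio(logical_tb: float, physical_tb: float) -> float:
    """Data reduction ratio: user data stored per unit of physical capacity."""
    return logical_tb / physical_tb


def effective_capacity_tb(physical_tb: float, ratio: float) -> float:
    """User capacity an array can hold at a given data reduction ratio."""
    return physical_tb * ratio


# Figures from the example: 15TB of physical capacity holding 90TB of user data.
ratio = reduction_ratio(90, 15)
print(f"{ratio:.0f}:1")                  # 6:1
print(effective_capacity_tb(15, ratio))  # 90.0
```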



Having more available storage capacity allows you to consolidate more TBs of data on Kaminario. Since we don’t license the data reduction features separately, more usable capacity directly lowers the $/GB (array cost divided by GB of capacity). Beyond $/GB storage cost, Kaminario’s all-inclusive software pricing (we don’t license or charge extra for software, including data reduction) makes it practical to consolidate workloads on K2, which can help reduce both server hardware costs and software licensing costs by reducing the number of CPU cores required. While this helps cut down on costs, we also ensure consistent performance, a critical end-user need that most of our competitors can’t deliver.
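To make the $/GB effect concrete, here is a minimal sketch using the 15TB array from the earlier example. The array price is a purely hypothetical placeholder, not a Kaminario list price, and we assume decimal units (1TB = 1000GB).

```python
GB_PER_TB = 1000  # assumption: decimal units


def cost_per_gb(array_cost_usd: float, physical_tb: float,
                reduction_ratio: float = 1.0) -> float:
    """Array cost divided by the GB of user data the array can hold."""
    usable_gb = physical_tb * GB_PER_TB * reduction_ratio
    return array_cost_usd / usable_gb


ARRAY_COST_USD = 150_000  # hypothetical price, for illustration only

raw = cost_per_gb(ARRAY_COST_USD, 15)
reduced = cost_per_gb(ARRAY_COST_USD, 15, reduction_ratio=6.0)
print(f"without data reduction:  ${raw:.2f}/GB")      # $10.00/GB
print(f"with 6:1 data reduction: ${reduced:.2f}/GB")  # $1.67/GB
```

Because the array cost stays fixed, the $/GB falls in direct proportion to the data reduction ratio.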

Four Kaminario features relate to data reduction: Compression, Deduplication, Thin Provisioning and Efficient Snapshots. The last two are not strictly data reduction features; however, this post will also explain why they, along with Kaminario’s guaranteed capacity, are related.

Here’s a quick look:



Compression

The goal of compression is to reduce the data footprint by decreasing the number of bits required to store the data. Compression in Kaminario is done at the byte level and is optimized for the speed of compressing and decompressing data. On Kaminario, compression is always turned on, with no penalty for read or write operations. Data reduction from compression varies with the data, and some datasets are more compression-friendly than others. Typically, we’ve seen data reduction between 2:1 and 5:1 from compression alone.
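To see how strongly the result depends on the data, here is a small sketch using Python's standard zlib module. This is a general-purpose compressor standing in for illustration only; it is not the algorithm the K2 uses, and the sample data is made up.

```python
import os
import zlib


def compression_ratio(data: bytes) -> float:
    """Original size divided by compressed size."""
    return len(data) / len(zlib.compress(data))


# Made-up sample data: a highly repetitive text record vs. random bytes.
repetitive = b"customer_id,order_id,status=SHIPPED\n" * 10_000
random_like = os.urandom(len(repetitive))

print(f"repetitive text: {compression_ratio(repetitive):.1f}:1")
print(f"random bytes:    {compression_ratio(random_like):.2f}:1")
```

The repetitive records shrink dramatically, while the random bytes do not shrink at all, which is why quoted compression ratios are always a range.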

The following example is taken from a database implementation on the Kaminario K2 all-flash array. As the screenshot from the K2 UI shows, 60TB worth of volumes were thinly provisioned and 6.4TB of data was copied to the K2. After compression, the 6.4TB consumed only 1.7TB of physical capacity. That’s an impressive 3.8:1 data reduction ratio!


Deduplication

Unlike compression, which reduces the number of bits required to store the data, deduplication reduces the number of times duplicate data blocks are kept. Only one copy of a block is kept, while duplicate copies are eliminated (by virtue of intelligent metadata handling). Deduplication in the K2 is done on a 4KB block size. Depending on the application, deduplication can save a huge amount of storage capacity when much of the data is similar, or save nothing at all. One example of a “dedup-friendly” application is VDI. While each user in a VDI environment has their own OS, the majority of the data is similar for all users.

On the other hand, many databases (such as Oracle) will not benefit from deduplication since each block is unique. Because deduplication adds overhead to the array’s performance, Kaminario offers selective deduplication per LUN. On the same array, you can choose to dedup some volumes and not others, which is another unique Kaminario offering.
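The contrast between dedup-friendly and dedup-unfriendly data can be sketched with a toy block store. Fingerprinting blocks with SHA-256 is our assumption for illustration; the post only says duplicates are eliminated through metadata handling, so this is not a description of the K2's internals. The class and helper names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096  # 4KB, the deduplication block size quoted above


class DedupStore:
    """Toy store that keeps one physical copy of each distinct 4KB block."""

    def __init__(self):
        self.blocks = {}         # fingerprint -> block data (stored once)
        self.logical_blocks = 0  # blocks written by clients

    def write(self, data: bytes) -> list:
        """Store data block by block; return the fingerprints (the metadata)."""
        refs = []
        for off in range(0, len(data), BLOCK_SIZE):
            block = data[off:off + BLOCK_SIZE]
            fingerprint = hashlib.sha256(block).digest()
            self.blocks.setdefault(fingerprint, block)  # keep first copy only
            refs.append(fingerprint)
            self.logical_blocks += 1
        return refs

    def ratio(self) -> float:
        return self.logical_blocks / len(self.blocks)


def unique_blocks(start: int, count: int) -> bytes:
    """Build `count` distinct 4KB blocks (each filled with its own index)."""
    return b"".join(i.to_bytes(4, "big") * (BLOCK_SIZE // 4)
                    for i in range(start, start + count))


# VDI-like case: 20 desktops cloned from the same 256-block OS image.
vdi = DedupStore()
os_image = unique_blocks(0, 256)
for _ in range(20):
    vdi.write(os_image)
print(f"VDI-like data:      {vdi.ratio():.1f}:1")  # 20.0:1

# Database-like case: every block is different.
db = DedupStore()
db.write(unique_blocks(0, 256))
print(f"database-like data: {db.ratio():.1f}:1")   # 1.0:1
```

In the second case the store pays for fingerprinting every block and saves nothing, which is the situation where turning deduplication off for a volume makes sense.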

Below is an example extracted from our cloud-based call home report for a large VMware environment of more than 5,000 VMs. The 38.89TB of data was reduced to just 2.10TB. This is an amazing 18.6:1 data reduction ratio.


Thin Provisioning

Thin provisioning enables users to set up volumes with huge capacities, larger than the capacity initially installed in the system. All base volumes, snapshots and replicas (writable snapshots) within the Kaminario K2 are thin provisioned. Although thin provisioning is not a data reduction feature, it does allow users to reduce the amount of storage required for their initial purchase. Users can configure the volume sizes they will eventually need from the get-go and pay for storage capacity as their requirements grow. In other words, it allows for smart storage provisioning for the long term. This ties in nicely with Kaminario’s “non-disruptive everything,” which also includes online capacity expansion. Users can plan and configure the storage for their maximum requirements, then add capacity as storage usage grows.
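The idea can be sketched with a toy model in which physical blocks are allocated only when a logical block is first written. The class names and sizes here are hypothetical and heavily simplified; they are not the K2's implementation.

```python
BLOCK_SIZE = 4096  # assumed block size for this sketch


class ThinPool:
    """Physical capacity shared by all thin volumes."""

    def __init__(self, physical_blocks: int):
        self.physical_blocks = physical_blocks
        self.used_blocks = 0

    def allocate(self) -> None:
        if self.used_blocks >= self.physical_blocks:
            raise RuntimeError("pool exhausted: expand physical capacity")
        self.used_blocks += 1

    def expand(self, extra_blocks: int) -> None:
        """Model of adding capacity while volumes stay online."""
        self.physical_blocks += extra_blocks


class ThinVolume:
    """Volume whose logical size can exceed the pool's physical size."""

    def __init__(self, pool: ThinPool, logical_blocks: int):
        self.pool = pool
        self.logical_blocks = logical_blocks
        self.mapping = {}  # logical block address -> data, written blocks only

    def write(self, lba: int, data: bytes) -> None:
        if not 0 <= lba < self.logical_blocks:
            raise IndexError("write beyond end of volume")
        if lba not in self.mapping:
            self.pool.allocate()  # physical space is consumed on first write
        self.mapping[lba] = data

    def read(self, lba: int) -> bytes:
        if not 0 <= lba < self.logical_blocks:
            raise IndexError("read beyond end of volume")
        return self.mapping.get(lba, b"\x00" * BLOCK_SIZE)  # unwritten = zeros


pool = ThinPool(physical_blocks=100)
volume = ThinVolume(pool, logical_blocks=1_000_000)  # far larger than the pool
volume.write(0, b"first")
volume.write(999_999, b"last")
print(pool.used_blocks)  # 2
```

The volume is sized for the long term on day one, while the pool grows through `expand` only as real usage approaches the installed capacity.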

Efficient Snapshots

Finally, the last feature in the “data reduction” family is Kaminario’s snapshots. Snapshots provide a point-in-time copy of the data and track the changes made to blocks since the snapshot was created. There are several use cases for snapshots: backup/recovery, test/dev, etc. Kaminario’s snapshots can provide huge savings in physical storage requirements. Think of a 5TB dataset that needs to be duplicated for QA purposes. Without snapshots, the copy would consume an additional 5TB of capacity. Kaminario’s snapshots do not consume additional capacity until changes are made to the base volumes.

Obviously, snapshots offer more value than data reduction alone. For example, they are created instantly and they deliver performance similar to that of the base volumes. Kaminario uses redirect-on-write snapshots to ensure no performance overhead.
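Here is a minimal sketch of the redirect-on-write idea, under simplifying assumptions of our own: an append-only block store, a snapshot modeled as a frozen copy of the volume's block map, and no reclamation of unreferenced blocks. The class and method names are hypothetical, and this is not the K2's actual design.

```python
class RowVolume:
    """Toy volume with redirect-on-write snapshots."""

    def __init__(self):
        self.store = []      # append-only physical blocks
        self.mapping = {}    # live volume: logical block -> index in store
        self.snapshots = {}  # snapshot name -> frozen block map

    def write(self, lba: int, data: bytes) -> None:
        # New data always lands in a new location; the old block is left
        # untouched, so any snapshot pointing at it remains valid.
        self.store.append(data)
        self.mapping[lba] = len(self.store) - 1

    def snapshot(self, name: str) -> None:
        # Only metadata is copied; no data blocks are duplicated.
        self.snapshots[name] = dict(self.mapping)

    def read(self, lba: int, snapshot: str = None) -> bytes:
        mapping = self.mapping if snapshot is None else self.snapshots[snapshot]
        return self.store[mapping[lba]]

    def physical_blocks(self) -> int:
        return len(self.store)


vol = RowVolume()
for lba in range(5):
    vol.write(lba, f"v1-{lba}".encode())

vol.snapshot("qa")
print(vol.physical_blocks())    # 5: the snapshot consumed no data blocks

vol.write(0, b"v2-0")           # change the base volume after the snapshot
print(vol.physical_blocks())    # 6: only the changed block uses new space
print(vol.read(0))              # b'v2-0'
print(vol.read(0, "qa"))        # b'v1-0'
```

The write path is the same whether or not a snapshot exists: there is no copy of the old block to make first, which is the intuition behind the "no performance overhead" claim.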

Guaranteed Capacity

We’ve covered the major features that allow Kaminario’s K2 v5 storage to offer superior data reduction, and we’d like to touch on one last item. Kaminario is the only storage vendor that actually guarantees the capacity you will be able to store in the array. The capacity guarantee is simple: it is the minimum amount of data that you will be able to keep in the array. We don’t charge you more if our data reduction works better than expected and you can keep more data on the K2.

How does our capacity guarantee work? If you are unable to store the capacity we guaranteed, we will bring you additional hardware and expand the system at our expense.

We have reviewed four of Kaminario’s data reduction features: Compression, Deduplication, Thin Provisioning and Efficient Snapshots. These features deliver real cost savings and allow an organization to consolidate several workloads on the same Kaminario array.

For more details on the Kaminario K2 please check out our K2 v5 architecture white paper or data sheet.

