Note: This is a guest post by storage analyst Chris M. Evans. The views reflected below are his own.
At a superficial level, it’s easy to assume that all solid-state disks (SSDs) are the same. In reality, that assumption couldn’t be further from the truth. SSDs continue to evolve and are diversifying into a broad range of product categories, in much the same way that hard disk drives have done over the last 10-20 years. As a result, devices on the market vary in price, capacity, endurance and performance. Choosing the right SSD for an all-flash array means weighing all of these factors, plus one vitally important one: the way in which data is written to and read from the device.
Let’s start by establishing some definitions:
Endurance – this is a measure of the longevity of an SSD. All solid-state disks have a finite write lifetime and will eventually wear out. In the SSD hierarchy, single-level cell (SLC) devices have more endurance than multi-level cell (MLC) devices, which in turn have more endurance than triple-level cell (TLC) devices. Even within these categories, levels of endurance vary. The typical measure of endurance is DWPD, or Drive Writes Per Day, which indicates how many times the drive’s entire capacity can be written each day over a typical 5-year lifetime.
Capacity – drive capacities vary from a few hundred gigabytes to the latest drives offering multiple terabytes. Large drive capacities have been delivered through the use of MLC and TLC technology. Initially, flash drives stored a single bit in each “cell”, based on measuring the cell voltage (SLC). MLC devices store two bits by distinguishing four voltage states (00, 01, 10 and 11), and TLC devices store three bits (000 to 111) using eight states. At the same time, the manufacturing process has shrunk year on year, allowing drive densities to be increased.
Performance – drives vary significantly in performance, with SLC and MLC drives generally outperforming TLC drives. However, all SSDs deliver far greater bandwidth (MB/s) and far more I/O operations per second (IOPS) than hard drives, especially for random workloads.
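Two of these definitions can be made concrete with a short calculation. The sketch below uses hypothetical capacities and endurance ratings (the figures are illustrative, not taken from any specific product) to show how DWPD translates into total lifetime write volume, and how bits-per-cell maps to the number of voltage states a cell must distinguish:

```python
def lifetime_writes_gb(capacity_gb: float, dwpd: float, years: float = 5) -> float:
    """Total write volume (GB) a drive is rated for: DWPD states how many
    times the drive's full capacity may be written per day over its rated
    lifetime (typically 5 years)."""
    return capacity_gb * dwpd * years * 365

def voltage_states(bits_per_cell: int) -> int:
    """Distinct voltage levels a NAND cell must distinguish to store n bits."""
    return 2 ** bits_per_cell

# Hypothetical 1 TB (1000 GB) drives at two endurance ratings:
print(lifetime_writes_gb(1000, 0.3))   # low-endurance (TLC-class): 547500 GB
print(lifetime_writes_gb(1000, 3.0))   # higher-endurance (MLC-class): 5475000 GB

for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3)]:
    print(name, "-", bits, "bit(s)/cell ->", voltage_states(bits), "voltage states")
```

The doubling of states per extra bit is what makes TLC cheaper per gigabyte but harder to read reliably, which is why its endurance and performance trail SLC and MLC.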
Shrinking the manufacturing process becomes progressively harder to achieve, so vendors have looked for alternatives, one of which is 3D NAND or V-NAND (depending on the brand name). This technology builds down into the silicon substrate to create multiple layers of cells – currently either 32 or 48. To deliver reliable layering, the process geometry has actually been increased (to around 40-60nm); the overall effect, however, is a higher density than planar (non-3D) devices, with room to shrink the process dimensions again over time.
There are fundamental properties of NAND flash and the drive that have to be taken into consideration when writing data. Specifically, these relate to write amplification – the additional amount of physical writes that need to be performed to commit data to NAND on the device itself. Write amplification occurs because the drive writes data to NAND in pages (typically 4KB) but has to erase an entire collection of pages (a block, typically 256KB) before that space can be re-used. This means shuffling data around as part of garbage collection and wear leveling processes.
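The arithmetic behind write amplification is worth making explicit. The toy scenario below (hypothetical page counts, not a model of any real controller) shows how relocating still-valid pages during garbage collection inflates the physical write count:

```python
PAGE_KB = 4          # NAND program (write) unit
BLOCK_KB = 256       # NAND erase unit
PAGES_PER_BLOCK = BLOCK_KB // PAGE_KB   # 64 pages must be erased together

def write_amplification(host_pages: int, gc_pages_moved: int) -> float:
    """Write amplification factor: total physical NAND page writes
    divided by the pages the host actually asked to write."""
    return (host_pages + gc_pages_moved) / host_pages

# Toy scenario: reclaiming one 256 KB block that still holds 16 valid
# 4 KB pages. Garbage collection must copy those 16 pages elsewhere
# before the block can be erased, so 48 host writes cost 64 physical writes.
wa = write_amplification(host_pages=48, gc_pages_moved=16)
print(round(wa, 2))   # 1.33: each host write costs ~1.33 physical writes
```

The more fragmented the valid data, the more pages garbage collection must shuffle, and the faster the drive consumes its finite write lifetime.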
Realizing Cost Savings
When flash was first introduced into primary arrays, it was deployed simply as a replacement for traditional hard drives. Although this approach works, it required high-endurance drives, as no write-optimizing features were built into the array design. Modern purpose-built arrays are designed to minimize physical writes to SSD. Techniques used to achieve this include thin provisioning, de-duplication and data compression, all of which reduce the amount of physical data written compared to the logical amount of data stored in the system. (As a side note, de-dupe and compression need to be done inline to realize the write reductions – a subject for another blog post.)
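The core idea of inline de-duplication can be sketched in a few lines. This is a deliberately minimal illustration – real arrays also handle hash collisions, reference counting and compression, and none of the names below come from any actual product:

```python
import hashlib

def inline_dedupe(chunks, store=None):
    """Write only chunks whose content hash hasn't been seen before.

    Returns (logical_chunks, physical_chunks) to show the write reduction:
    duplicate chunks are detected *before* they reach the flash media,
    which is why the savings in SSD endurance require inline processing.
    """
    store = {} if store is None else store
    physical = 0
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:
            store[digest] = chunk   # only this write would reach the SSD
            physical += 1
    return len(chunks), physical

logical, physical = inline_dedupe([b"A" * 4096, b"B" * 4096, b"A" * 4096])
print(logical, physical)   # 3 logical chunks, only 2 physical writes
```

Because the duplicate is dropped before it is written, the SSD never incurs the write (or the later garbage-collection work) at all – the key difference from post-process de-duplication.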
In addition to these space-saving techniques, Kaminario’s K2 platform employs a log-structured file architecture that writes full-stripe updates as part of the proprietary K-RAID design. Log-structured file systems write data sequentially rather than overwriting data in place, which minimizes the impact on SSD endurance. This, in turn, reduces the write amplification caused by garbage collection and wear leveling. More details of these features and the K2 architecture can be found online.
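The log-structured principle itself is simple to demonstrate. The toy store below is an illustrative sketch of the general technique, not Kaminario’s actual K-RAID implementation: overwrites append a new record to a sequential log instead of rewriting flash pages in place, and stale versions are left for later garbage collection:

```python
class LogStructuredStore:
    """Toy append-only store: an overwrite appends a new record rather than
    updating in place, so the underlying SSD sees only sequential writes."""

    def __init__(self):
        self.log = []      # sequential journal of (key, value) records
        self.index = {}    # key -> position of the latest version in the log

    def write(self, key, value):
        self.index[key] = len(self.log)   # point the key at the new record
        self.log.append((key, value))     # always append, never overwrite

    def read(self, key):
        return self.log[self.index[key]][1]

store = LogStructuredStore()
store.write("blk7", "v1")
store.write("blk7", "v2")          # appended, not an in-place overwrite
print(store.read("blk7"))          # "v2"; the stale "v1" record remains
print(len(store.log))              # 2 records: one live, one awaiting cleanup
```

Writing sequentially like this lets an array batch updates into full stripes, which is exactly the pattern that keeps SSD-level write amplification low.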
With an efficient flash management architecture, there is less dependency on a high DWPD rating and no need to pay for extreme endurance in flash devices. This makes it possible to employ lower-endurance drives without increasing the risk of device failure. The capabilities of SLC, MLC and TLC drives are also directly related to cost; 3D-NAND TLC drives offer the most attractive price point, enabling Kaminario K2 systems to reach a price of under $1/GB.
It’s worth noting here that not every flash array architecture will be able to take advantage of larger, lower-cost drives; exploiting high drive capacities requires efficient metadata management, balancing the need to keep as much metadata in memory as possible against the cost of adding gigabytes of expensive DRAM (again, a subject for another post).
Flash for All
It would be naïve to assume that a single type of flash drive can deliver the requirements of every all-flash array. The displacement of hard drives from the data centre (at least for primary data) isn’t going to happen unless the right cost metric can be achieved, irrespective of the high levels of performance flash offers. In fact, general production workloads don’t need the high I/O density (IOPS/GB) provided by the first all-flash storage solutions; because of their cost, those solutions were targeted only at the most demanding workloads that could justify the price.
Kaminario K2 has moved to 3D-TLC NAND because the technology offers a price point that takes cost out of the discussion of using flash for all primary data. Existing systems can take advantage of 3D-TLC immediately, as the K2 architecture allows node configurations to be mixed within a single cluster. At just under $1/GB, why wait to use this new technology?