Note: This is a guest post by storage analyst Chris M. Evans. The views reflected below are his own.
The enterprise world revolves around the process of continuous hardware refreshes. In storage, this cycle occurs perhaps every 3-4 years, depending on the organization and its rate of data growth. Storage is refreshed because better, faster and cheaper arrays come along and the traditional vendors make refresh pricing attractive – maintenance charges after the initial warranty period can be alarmingly expensive.
Unfortunately, most IT organizations look purely at the capital cost of refreshing their infrastructure. This isn’t really surprising; it’s the easiest part to quantify, whereas putting a number (at purchase time) on what the refresh will cost 3-4 years later is no mean feat. In any event, those costs can be rolled into the price of the new acquisition, with the vendor put on the hook for managing and helping with migrations.
This process may seem like a good plan; however, whether the transformation/migration process is performed by the customer or the vendor, ultimately the customer pays, as the vendor simply incorporates the migration charge into the $/GB cost of the storage – with a potentially hidden margin added to boot.
Why do we go through this repeated cycle of deploy and migrate? There are a number of reasons (excluding the already highlighted maintenance structures). First and most important is that legacy hardware platforms were scale-up in design. This means the array itself was one large monolithic entity: expansion is limited by the performance capacity of the controllers, which were (and still are) difficult or impossible to upgrade in place. Changing controllers represents potential risk and application downtime, which places a ceiling on the practical scalability of a single array. Now it’s fair to say that a single scale-up array can be designed with a minimal configuration and expanded over time, but this only applies to the highest-class, most expensive enterprise arrays.
Second, in the days of disk-based arrays, hard drive capacity increased rapidly, with densities doubling around every 18 months to 2 years. Power, cooling and space became important factors in the cost of running storage, so it was an easy sell for a vendor to come back to a customer 3 years after initial acquisition (when the array was probably at capacity) and offer a shiny new model, with double or triple the capacity at a lower $/GB and with significantly lower environmental costs. As a side issue, the I/O density of hard drives (IOPS relative to capacity) reached the point where moving to larger drives at each refresh became a performance bottleneck, so vendors had to introduce dynamic tiering and flash caching, leading to more complexity – but that’s a story for another day.
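The I/O density problem is easy to see with simple arithmetic. The sketch below assumes a ballpark figure of ~180 random IOPS per hard drive (an illustrative number, not a vendor specification), which stays roughly flat however large the drive gets:

```python
# IOPS density: a hard drive delivers roughly the same random-I/O rate
# regardless of capacity (~180 IOPS is an assumed ballpark figure),
# so each jump to larger drives means fewer IOPS per TB of data.
DRIVE_IOPS = 180

densities = {tb: DRIVE_IOPS / tb for tb in (1, 4, 10)}
for tb, density in densities.items():
    print(f"{tb:>2} TB drive: {density:.0f} IOPS/TB")
# 1 TB drive: 180 IOPS/TB
# 4 TB drive:  45 IOPS/TB
# 10 TB drive: 18 IOPS/TB
```

Tenfold more capacity per spindle with the same IOPS per spindle is exactly the bottleneck that pushed vendors toward tiering and flash caching.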
So let’s look at the risks and issues involved with continuous data migrations caused by scale-up technology.
Cost – The cost of migrations is probably the most obvious issue, yet the least understood or calculated by IT organizations. There are plenty of estimates available online to give us an idea of what the costs really are. In 2008, a report from Incipient Inc (a storage virtualization company) put the cost of migration to new hardware at $5,000/TB. Studies by Wikibon suggest the cost of migration can be as high as 54% of the capital cost. An IBM report from 2007 highlights some of the causes of migration costs. These include the project time to execute the work (project managers, change control, technical staff to plan and move data), unplanned outages/downtime, overruns on leases (where migrations take longer than expected) and the cost of having to maintain two physical arrays on the floor (with associated costs) during the migration period. Other reports quote figures of $7K-$15K/TB for migrations and are referenced at the end of this blog. As an aside, we know technologies like Storage vMotion can help move data in certain virtual environments, but of course these are licensable features, so not always free to use.
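To put the quoted figures in perspective, here is a back-of-envelope calculation. The per-TB range comes from the reports cited above; the 500 TB array size is a hypothetical example:

```python
# Rough migration cost estimate using the $5K-$15K/TB range quoted
# in the reports above. The array capacity is a hypothetical example.
def migration_cost(capacity_tb, per_tb_low=5_000, per_tb_high=15_000):
    """Return the (low, high) estimated cost of migrating an array."""
    return capacity_tb * per_tb_low, capacity_tb * per_tb_high

low, high = migration_cost(capacity_tb=500)
print(f"Estimated migration cost: ${low:,} - ${high:,}")
# Estimated migration cost: $2,500,000 - $7,500,000
```

Even at the low end of the range, the migration bill for a modest array rivals the capital cost of the replacement hardware, which is the Wikibon 54% figure in action.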
Risk – The most obvious migration risk is an unplanned outage, but there are more: migrations may introduce performance issues (and require manual rebalancing), may reduce redundancy or DR capability during the data move, or may require significant up-leveling of operating system drivers and patches, firmware and fabric OS code. Anything that changes the existing configuration is a potential risk factor and should be minimized or avoided wherever possible.
Breaking the Cycle
Forklift upgrades are driven by a mix of financial, technical and operational concerns, so the obvious question is how to resolve these issues. One solution is to implement scale-out storage technology. In this scenario, increasing either performance or capacity is simply a case of adding more hardware resources. Efficient scale-out solutions manage the rebalancing and distribution of data to take advantage of the added hardware and connectivity. Scale-out solutions are also more financially attractive, as components of the hardware can be decommissioned and removed as the assets are amortized, removing the need for the complex co-terminous agreements that arise when traditional systems are upgraded during their lifetime.
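One common way scale-out systems achieve incremental rebalancing is consistent hashing, where adding a node moves only a fraction of the data rather than forcing a full migration. The toy ring below is a minimal sketch of that general technique – the node names and virtual-node count are invented for illustration, not any particular vendor's implementation:

```python
import hashlib
from bisect import bisect_right

def _h(key: str) -> int:
    """Hash a string to a point on the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    """Toy consistent-hash ring (illustrative, not a vendor design)."""
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes
        self.ring = []          # sorted list of (hash, node) points
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node owns many small arcs of the ring via virtual
        # nodes, so data spreads evenly and moves stay incremental.
        for i in range(self.vnodes):
            self.ring.append((_h(f"{node}:{i}"), node))
        self.ring.sort()

    def locate(self, key):
        # A key belongs to the first ring point at or after its hash.
        idx = bisect_right(self.ring, (_h(key),)) % len(self.ring)
        return self.ring[idx][1]

ring = Ring(["node-a", "node-b", "node-c"])
keys = [f"block-{i}" for i in range(1000)]
before = {k: ring.locate(k) for k in keys}
ring.add("node-d")                     # scale out: add a fourth node
moved = sum(1 for k in keys if ring.locate(k) != before[k])
print(f"{moved} of 1000 blocks moved")  # a fraction moves, not all
```

Contrast this with a scale-up array: replacing the controllers there means every block has to be copied to the new frame.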
This scenario is becoming ever more important as the transition from MLC to 3D-MLC and 3D-TLC takes hold, evolving at a rate faster than the Moore’s Law improvements seen in processor and DRAM performance. Scale-out systems that support mixed configurations allow customers to take advantage of new technology while maintaining their investment in the old. Having a single storage platform also provides other technical benefits:
Efficiency – features like global deduplication ensure data is fully optimized. Shared spare space or devices are minimized as a percentage of overall capacity. There is also less wasted space than can occur when data is spread across many small storage systems.
Performance – I/O workload can be distributed across the entire scale-out system automatically (in contrast to traditional deployments that required lots of manual rebalancing when hardware was added).
Resiliency – in flash-based systems, spreading workload across the hardware takes advantage of the endurance capabilities of all devices.
Operational – minimizing footprint means less hardware to manage from every aspect, including provisioning and maintenance.
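Global deduplication, mentioned under Efficiency above, is usually built on content addressing: every block is fingerprinted, and identical blocks are stored once no matter how many times or where they are written. A minimal sketch of the idea (class name and block size are illustrative assumptions):

```python
import hashlib

# Toy content-addressed store illustrating global deduplication:
# identical blocks are stored once, however many times written.
class DedupStore:
    def __init__(self):
        self.blocks = {}    # fingerprint -> block data
        self.refs = {}      # fingerprint -> reference count

    def write(self, data: bytes) -> str:
        fp = hashlib.sha256(data).hexdigest()
        if fp not in self.blocks:       # new content: store it once
            self.blocks[fp] = data
        self.refs[fp] = self.refs.get(fp, 0) + 1
        return fp

    def physical_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())

store = DedupStore()
for _ in range(10):                 # ten logical copies...
    fp = store.write(b"x" * 4096)   # ...of the same 4 KiB block
print(store.physical_bytes())       # 4096: stored once, not ten times
```

The "global" part is the point: on a single scale-out platform one fingerprint index spans all nodes, whereas across many small arrays each system deduplicates only its own slice of the data.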
Scale-out doesn’t solve every storage issue, but it offers a compelling approach to reducing capital and operational costs in today’s 24-hour always-on IT environments. Kaminario’s K2 platform offers the benefits of both scale-up and scale-out through the use of K-nodes. A K-node acts as a resilient member of a K2 cluster. Scale-up is achieved by adding more storage capacity to a K-node; scale-out is achieved by adding (or removing) K-nodes. Nodes of different capacities and media types can be mixed within the same cluster, enabling the use of new 3D-NAND while retaining investments in previous technology.
Learn more about the K2 architecture here and what it means for storage in your business.
Wikibon article on migration – http://wikibon.org/wiki/v/The_Cost_of_Storage_Array_Migration_in_2014
HDS paper from 2012 – https://www.hds.com/assets/pdf/reduce-costs-and-risks-for-data-migrations-whitepaper.pdf
Bloor Research 2011 report – http://www.bloorresearch.com/dlfile/data-migration-2011-2107.pdf
Dave Merrill (HDS Chief Economist) – https://community.hds.com/docs/DOC-1005246