September 28, 2015
Putting a leash on storage expansion
Understanding the real impact of deduplication is critical for any IT professional in the modern datacenter, from both a performance and a capacity perspective. Choosing a storage solution is now a longer-term commitment, so choose wisely by understanding the impact of a key data service such as deduplication.
In the past, deduplication, or dedupe, was an expensive feature available from only a small number of vendors. However, as the IT community watched the explosion of data created and used by machines, business consumers and applications, dedupe went from a nice-to-have feature to a critical data service, important not only for data management but also for storage cost mitigation. These days the feature is provided by hardware and software vendors alike, each with their own methods and levels of success. That can make deduplication seem like a generic checkbox that applies equally to all modern storage solutions. Look under the covers, however, and this isn't true at all. There is a lot of fluff, FUD and downplay around deduplication in the industry, especially from vendors without a strong story around this feature set. In this blog post I will articulate why deduplication is important for your modern datacenter and give a deduplication technology deep dive.
Currently, deduplication effectiveness boils down to three variables:
How much data is analyzed at a time
When that analysis is done
Where it is done in the IO stream
How big is your block?
When dedupe first hit the datacenter, the amount of data analyzed at a time was defined largely by two things: the individual data block size and the amount of data processing possible with the resources available in the system. Currently, the average written block size is 1024 bytes, or 1 kilobyte (KB), and the average Fibre Channel (FC) payload is 2048 bytes, or 2KB. By comparison, traditional storage arrays from vendors like EMC, NetApp and IBM use an average block size of 16KB. As the technology has advanced, so has the variance in the amount of data analyzed: these days the block size ranges from 4 to 128KB, depending on the vendor and on whether software or hardware is doing the deduplication work. For disk read/write I/O, Atlantis' deduplication filesystem (DedupeFS) uses a block size of 4KB. For comparison, EMC uses an average block size of 16KB, while CommVault's Simpana product uses an average block size of 32KB.
This is all relevant for one reason and one reason only!
The smaller the block size, the more scrutiny can be placed on the actual data analyzed. In other words, the smaller the block size, the finer-grained the analysis and the better the dedupe ratio or percentage. However, this also means the data is analyzed more frequently, which leads us to the next factor in deduplication: where the work is done.
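The effect of block size on the dedupe ratio can be illustrated with a minimal Python sketch. This is not any vendor's actual implementation; `dedupe_ratio` and the synthetic workload below are invented purely for illustration. The idea: split data into fixed-size blocks, fingerprint each block with a hash, and count how many unique blocks would actually need to be stored.

```python
import hashlib

def dedupe_ratio(data: bytes, block_size: int) -> float:
    """Split data into fixed-size blocks and count unique blocks by hash.

    Returns the deduplication ratio: total blocks / unique (stored) blocks.
    """
    seen = set()
    total = 0
    for i in range(0, len(data), block_size):
        total += 1
        seen.add(hashlib.sha256(data[i:i + block_size]).digest())
    return total / len(seen)

# Synthetic workload: 100 near-identical 4KB pages, each differing from the
# others only in its final byte (think near-identical VM images or documents).
page = bytes(range(256)) * 16                      # one 4KB page
data = b"".join(page[:-1] + bytes([n % 7]) for n in range(100))

# A 4KB block isolates each small difference to a single block, so most
# blocks match; a 32KB block drags 8 pages into one unit, so one changed
# byte "poisons" the whole block and far fewer blocks match.
print(dedupe_ratio(data, 4 * 1024))    # high ratio
print(dedupe_ratio(data, 32 * 1024))   # much lower ratio
```

The trade-off the post describes falls straight out of this: the 4KB pass computes eight times as many hashes as the 32KB pass, which is exactly the extra analysis frequency that must be paid for somewhere in the IO path.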
There are a lot of mouse traps out there.
Many vendors have evolved their products to accommodate not only an increase in IO workload but also the added workload of dedupe. Among traditional storage vendors in particular there have been a lot of changes in recent years. EMC, for example, initially suffered greatly on the smaller platforms aimed at SMB/E customers: when features like deduplication were enabled, performance tanked. I personally heard it described as the "half hit," meaning that enabling the dedupe feature cut overall array performance in half. To address this, EMC redesigned the platform starting with the VNX series, combining the available cache on the storage processors with dedicated solid state drives (SSDs), or flash drives, under the name FAST Cache. NetApp, as another example, has always leveraged non-volatile RAM (NVRAM) in its platforms; in recent iterations the NVRAM was augmented with PCIe-based flash cards called Flash Cache, and more recently still with flash drives in the array itself.

What all this means is that before the data from the application is even processed, it has to transfer across the SAN or NAS first. And while the "return" bit is sent to the application once the data reaches the storage processor, the actual deduplication work still needs to be done. That work eats into the CPU and cache resources needed to process IO requests in general, and the same is true if you are leveraging a purely software-based deduplication solution. Simply put, overall performance and responsiveness is reduced, in some cases by 50% or more.
By comparison, Atlantis USX and HyperScale don't suffer from these problems. Instead, Atlantis immediately processes read/write requests by leveraging the RAM and/or local flash drives in the hypervisor host to perform dedupe and compression.
Why is this a big deal? The difference is literally sub-millisecond response times. Consider that no IO path is faster than the one between the application and the local CPU and RAM on the motherboard. It's the IO autobahn! By leveraging these local resources, the deduplication work is done as quickly as possible, even when a smaller block size increases the number of blocks processed. That covers the how and the where, leaving the when.
Don’t wait. DO IT NOW!
Within the IO flow there are two points at which dedupe can be done: inline and post-process. Inline deduplication is done before the read/write request hits the actual storage. With this method, disk utilization is minimized, which greatly improves TCO and ROI, not to mention reducing future storage purchases and their associated operating costs. This is a potentially huge cost savings for the business. That said, the hardware and software resources consumed can increase and the effectiveness of the deduplication can drop, particularly when the work is done by a traditional storage array where these resources are minimal, and even more so when compared with the resources available in the hypervisors the storage array is attached to. With Atlantis technology and methodologies, however, the resources used for this type of data processing are minimized and used as efficiently as possible. The by-product is up to 10x greater performance overall and an expected 70 to 90% deduplication ratio, depending on data type.

The other typical point in the data flow where dedupe is done is after the data hits the hard drive, more commonly known as post-process or sometimes batch processing. Some vendors let the user select which method to use. In most cases, though, post-processing can have an adverse effect on ongoing data processing, as mentioned earlier. There is even some debate that this methodology can reduce the MTBF (mean time between failures) of individual hard drives, regardless of disk type, especially those dedicated to the deduplication process.
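The structural difference between the two approaches can be sketched in a few lines of Python. Again, this is a toy model with invented class names, not any product's design: the inline store fingerprints each block before it is "written," so duplicates never land on disk, while the post-process store lands everything first and reclaims space in a later batch pass that must re-read what is already there.

```python
import hashlib

class InlineDedupeStore:
    """Inline: hash each block *before* it touches the backing store,
    so duplicate blocks are never written to disk at all."""

    def __init__(self):
        self.disk = {}       # fingerprint -> block data (unique blocks only)
        self.index = []      # logical layout: ordered fingerprints

    def write(self, block: bytes) -> None:
        fp = hashlib.sha256(block).digest()
        if fp not in self.disk:          # only new data reaches the disk
            self.disk[fp] = block
        self.index.append(fp)

class PostProcessStore:
    """Post-process: write everything first, then reclaim space in a batch
    pass that re-reads and re-hashes what is already on disk."""

    def __init__(self):
        self.staged = []     # every block lands on disk first, duplicates and all
        self.disk = {}
        self.index = []

    def write(self, block: bytes) -> None:
        self.staged.append(block)

    def batch_dedupe(self) -> None:
        # This pass generates extra read IO against the drives; it is the
        # step that can contend with foreground workloads.
        for block in self.staged:
            fp = hashlib.sha256(block).digest()
            self.disk.setdefault(fp, block)
            self.index.append(fp)
        self.staged.clear()
```

Both end up storing the same unique blocks, but the post-process store's peak capacity equals the full undeduplicated data set, and its batch pass is the background work that inline designs avoid by paying the hashing cost up front in the write path.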
Landing the plane.
At the end of the day, almost every environment can benefit from deduplication. The larger the user base and/or the number of applications an organization utilizes, the more critical dedupe becomes. Equally important are the time it takes to do that work, the resources used, and the point in the data flow where it's performed. For my money, the more efficient and effective this process, the greater the benefit, and I'd be willing to bet hard currency that most other IT professionals would say the same.