
December 10, 2015

The Top 10 Considerations for Data Reduction

Priyadarshi Prasad - Atlantis

Unexpectedly rapid data growth is increasingly forcing customers to choose between managing their IT the traditional (and often expensive) way and looking at newer (and often smarter) alternatives that deliver more at lower cost using data reduction technologies. While every alternative generally delivers some sort of efficiency, customers might be surprised and dismayed when their chosen technology ends up delivering lower data reduction than they anticipated. Lower-than-expected data reduction often forces IT leaders to go back to their businesses and ask for more budget, an unwelcome conversation that no one looks forward to having. How can we avoid this and set the right expectations up front?
Understanding the available data reduction technologies, with their strengths and weaknesses, is a good place to start. But perhaps just as important is understanding your own data set. For example, hoping for a high level of data reduction on an already compressed file format is just that, a hope. On the other hand, there is a widespread misconception that introducing data reduction technology will necessarily, or at least in most cases, hurt application performance. This used to be true in the early days of what I call phase 1 of data reduction, when the general guidance seemed to be to turn off data reduction once CPU utilization went above a certain threshold (typically 30%). Thankfully, it is now possible to actually improve performance with data reduction technologies!
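To make the point about already compressed data concrete, here is a minimal Python sketch (not taken from the white paper) that estimates how compressible a sample is using the standard zlib module. The function name compression_ratio and the sample buffers are illustrative assumptions; random bytes stand in for an already compressed or encrypted file.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return original size / compressed size (higher means more compressible)."""
    if not data:
        return 1.0  # nothing to compress, so report "no savings"
    return len(data) / len(zlib.compress(data, level))


if __name__ == "__main__":
    # Hypothetical samples: repetitive text versus high-entropy bytes.
    text_like = b"customer_id,order_id,status\n" * 4096
    high_entropy = os.urandom(len(text_like))  # stand-in for compressed data

    print(f"text-like sample:    {compression_ratio(text_like):.1f}x")
    print(f"high-entropy sample: {compression_ratio(high_entropy):.2f}x")
```

Running a check like this on a representative sample of your own data, rather than on synthetic buffers, gives a rough first expectation before you commit to a technology. It only approximates what a storage system would achieve, since real products use different algorithms and block sizes.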
Given Atlantis’ leadership in using data reduction technologies to both improve efficiency AND accelerate performance, we thought it would be useful to put together a holistic white paper on the key data reduction technologies. You can find the paper here. The paper has been downloaded over 1,000 times in under three weeks!
Here is an excerpt from the white paper that talks about the top ten questions you should ask yourself when choosing the right data reduction technology: 
  1. Is my data set de-duplicable?
  2. Is my data set compressible? If so, is compression already turned on at the application layer?
  3. What's the granularity (page size) of deduplication? Smaller page sizes like 4K are likely to yield significantly better savings vs. 8K or 16K.
  4. Does the page size always stay granular or does it become fat (e.g. going to 32K or greater)?
  5. Is data reduction always inline or does it get into post-process mode sometimes?
  6. Is I/O offload a benefit of that particular data reduction technology?
  7. Does the data reduction technology improve my overall latency?
  8. Does the data reduction technology improve the life of my solid state storage?
  9. Will the data reduction technology make my storage management more efficient?
  10. Do storage data services benefit from the underlying data reduction technology?
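Questions 1 and 3 can be explored with a small experiment. The sketch below is an illustration under stated assumptions, not how any particular product works: it splits a buffer into fixed-size pages, fingerprints each page with SHA-256, and reports logical pages divided by unique pages. The helper names dedup_ratio and synthetic_volume are hypothetical, and the synthetic data is deliberately built from repeating 4K blocks, so it favors small pages more than real data might.

```python
import hashlib
import os
import random


def dedup_ratio(data: bytes, page_size: int) -> float:
    """Logical pages / unique pages when deduplicating fixed-size pages."""
    pages = [data[i:i + page_size] for i in range(0, len(data), page_size)]
    if not pages:
        return 1.0
    unique = {hashlib.sha256(page).digest() for page in pages}
    return len(pages) / len(unique)


def synthetic_volume(unique_blocks: int, total_blocks: int,
                     block_size: int, seed: int) -> bytes:
    """Build test data by repeating a small pool of random blocks in random order."""
    rng = random.Random(seed)
    pool = [os.urandom(block_size) for _ in range(unique_blocks)]
    return b"".join(pool[rng.randrange(unique_blocks)]
                    for _ in range(total_blocks))


if __name__ == "__main__":
    volume = synthetic_volume(unique_blocks=32, total_blocks=1024,
                              block_size=4096, seed=42)
    for page_size in (4096, 8192, 16384):
        ratio = dedup_ratio(volume, page_size)
        print(f"{page_size // 1024:>2}K pages: {ratio:.2f}x")
```

On this synthetic volume the 4K ratio is far higher than the 8K or 16K ratios, because a larger page only deduplicates when every smaller block inside it repeats in the same order. Pointing dedup_ratio at a sample of your own data is the more meaningful test.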
Some of the questions above might seem counter-intuitive. Take question 5, for example. Why is it generally beneficial for data reduction to be done completely inline, and even more so if you are dealing with solid state storage (question 8 above)? The answer lies in a fundamental characteristic of SSDs: they wear out with writes. Any process that introduces or increases write amplification is bad for them. Now consider the write amplification inherent in a post-process data reduction methodology, where data is first written to storage in full and only reduced afterward.
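One way to reason about the inline versus post-process comparison is a back-of-the-envelope model. The Python sketch below is an assumption-laden illustration, not a measurement of any product: it assumes inline reduction writes only the reduced data, while post-process reduction writes the full data first and then writes the reduced copy. The function names and the 100 GB and 4:1 figures are hypothetical, and the model ignores metadata, garbage collection, and implementations that reclaim duplicates without rewriting data.

```python
def flash_writes(logical_gb: float, reduction_ratio: float, inline: bool) -> float:
    """Estimate GB written to flash under a deliberately simple model."""
    if logical_gb < 0 or reduction_ratio < 1:
        raise ValueError("expected logical_gb >= 0 and reduction_ratio >= 1")
    reduced_gb = logical_gb / reduction_ratio
    # Assumption: post-process lands the data in full, then rewrites it reduced.
    return reduced_gb if inline else logical_gb + reduced_gb


def write_amplification(logical_gb: float, reduction_ratio: float,
                        inline: bool) -> float:
    """Flash writes divided by the data the application asked to write."""
    return flash_writes(logical_gb, reduction_ratio, inline) / logical_gb


if __name__ == "__main__":
    # Hypothetical workload: 100 GB written, 4:1 data reduction.
    for mode, inline in (("inline", True), ("post-process", False)):
        written = flash_writes(100, 4, inline)
        factor = write_amplification(100, 4, inline)
        print(f"{mode:>12}: {written:.0f} GB written, {factor:.2f}x of logical")
```

Under these assumptions the inline path writes 25 GB to flash and the post-process path writes 125 GB for the same 100 GB of application writes, which is the kind of gap that matters for SSD endurance.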

The Advantage

This allows Atlantis to offer a true all-flash hyperconverged appliance (HyperScale) that beats other disk or hybrid hyperconverged appliances in the industry on both performance (of course) and price.
Similarly, consider question 6 above: just how large can the I/O offload benefits of data reduction be? The graph below shows a real example of I/O offload in action with the Atlantis solution.
This massive I/O offload is a direct result of the inline data reduction (and in-memory metadata manipulation) that is built into the Atlantis solution.

The Benefit

Customers can take their old, slow, creaky storage infrastructure and use Atlantis to not only get more capacity out of it but also more performance.
There are many more considerations, all covered in detail in the paper. Please take a look and let us know what you think at:
@AtlantisSDS, or @Priyadarshi_Pd