Monday 12 December 2011

Veeam Vs. vRanger Part 3: Data reduction techniques


With the typical organisation experiencing substantial year-on-year growth in storage requirements (IDC research in 2009 suggests large enterprises’ storage requirements are increasing by 60% per year) and server virtualisation pushing storage demands to new highs, it is little wonder that technologies like deduplication and compression, which focus on reducing storage requirements, have been in the spotlight in recent years. As many of these technologies come with a performance trade-off, they have yet to have a huge impact on primary storage, but they have gained considerable traction in less IO-intensive storage, including backup data storage. As such, most major players in the backup software space have been quick to adopt these technologies, and Veeam and vRanger are no exception. In this post, I will compare the two products’ differing approaches to reducing the amount of storage required to hold backup data.

Veeam Backup & Replication comes as standard with integrated in-line block-level deduplication and compression. By comparison, vRanger uses a combination of compression and Active Block Mapping (ABM) to ensure it only backs up blocks that are actively in use on the disk, but it does not have dedupe integrated into the product. Quest / vRanger’s answer to Veeam’s dedupe has come in their latest release, vRanger® 5.3, which enables full integration with NetVault® SmartDisk, a software-based deduplication engine originally developed to work with NetVault Backup, Quest’s enterprise-level backup application. In addition to this, both products can also integrate with 3rd party dedupe appliances like the HP StoreOnce and Fujitsu’s ETERNUS CS800. So which approach is better?

Veeam…
Veeam’s main differentiator in the data reduction stakes is their integrated deduplication. It is certainly a big selling point, and enables them to market their product as a cost-effective all-in-one solution. It does, however, come under fire from the competition, who have indirectly referred to it as primitive. So what are its limitations?

Veeam’s dedupe is in-line, fixed-length, block-level deduplication. Block sizes are configurable, and the appropriate size depends on the target storage: Local - 1MB; LAN - 512KB; WAN - 256KB. Veeam claim dedupe ratios in excess of 10x, and a whitepaper on their website contains some decent dedupe / compression ratios from internal testing by Veeam. On a 60GB backup job with 6 VMs, their compression and dedupe together produced a ‘combined technology saving’ of just over 80%. Interestingly, compression alone on the same job saved around 62% (which is very similar to the compression ratios in vRanger), so in this example dedupe only removes roughly a further 20% of the original data beyond this. I’m not belittling this, because it still offers a substantial reduction in the amount of backup storage required (the backup ends up roughly half the size it would be with compression alone), but it’s just not as dramatic as I had expected before I started researching the numbers, and it is certainly not as substantial as the ratios that can be obtained using dedupe appliances.
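To make the arithmetic behind those percentages explicit, here is a rough sketch using the rounded figures quoted above (62% for compression alone, 80% combined, on a 60GB job). The variable and function names are my own, and the inputs are approximations of the whitepaper's numbers rather than exact values.

```python
# Rounded figures quoted in the post (approximations, not exact whitepaper values).
SOURCE_GB = 60
compression_saving = 0.62   # fraction of data saved by compression alone
combined_saving = 0.80      # fraction saved by compression plus dedupe


def backup_size_gb(source_gb, saving):
    """Size of the backup after removing the given fraction of the source data."""
    return source_gb * (1 - saving)


compressed_gb = backup_size_gb(SOURCE_GB, compression_saving)
combined_gb = backup_size_gb(SOURCE_GB, combined_saving)

# Extra saving from dedupe, measured against the ORIGINAL data size.
extra_points = combined_saving - compression_saving

# Extra saving from dedupe, measured against the COMPRESSED backup size.
relative_reduction = 1 - combined_gb / compressed_gb

print(f"Compression only: {compressed_gb:.1f} GB")
print(f"Compression + dedupe: {combined_gb:.1f} GB")
print(f"Dedupe adds {extra_points:.0%} of the original data")
print(f"Dedupe shrinks the compressed backup by {relative_reduction:.0%}")
```

The two ways of expressing the dedupe benefit differ because one is relative to the original 60GB and the other is relative to the already-compressed backup.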

The reason for this difference in performance is twofold. Firstly, the block sizes used by Veeam’s dedupe are larger than those used in dedupe appliances. Veeam have done this deliberately, as larger blocks mean fewer resources are required to run the dedupe process and backup / restore jobs run quicker, but it does reduce the effectiveness of dedupe. Secondly, because Veeam uses fixed-length blocks, when the data in a file is shifted, for example when some additional text is inserted into a Word document, the contents of all of the subsequent blocks in the file change, and those blocks are likely to be treated as different from the ones in the original file.
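The shifting problem is easy to see with a toy example. The sketch below is a generic illustration of fixed-length block dedupe, not Veeam's actual implementation: the 8-byte block size, the SHA-256 fingerprints and the function names are all assumptions chosen to keep the example small.

```python
import hashlib


def fixed_block_hashes(data, block_size):
    """Split data into fixed-length blocks and fingerprint each one."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def shared_blocks(old, new, block_size):
    """Count blocks in `new` whose fingerprint already exists in `old`."""
    known = set(fixed_block_hashes(old, block_size))
    return sum(1 for h in fixed_block_hashes(new, block_size) if h in known)


BLOCK = 8  # toy block size in bytes; real products use far larger blocks

original = bytes(range(64))          # 8 distinct blocks
appended = original + b"new text"    # data added at the end
inserted = b"X" + original           # one byte added at the start

print(shared_blocks(original, appended, BLOCK))  # 8 of 9 blocks deduplicate
print(shared_blocks(original, inserted, BLOCK))  # 0 blocks deduplicate
```

Appending leaves the existing block boundaries untouched, so every old block is recognised. Inserting a single byte at the front shifts every boundary, so none of the blocks match even though almost all of the data is unchanged. Variable-length chunking is designed to avoid exactly this.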

Another potential limitation of Veeam’s deduplication is that it can only dedupe against VMs in the same backup job. This is done very much by design, as explained in these Veeam dedupe FAQs, but the drawback is that to achieve global deduplication (dedupe across all of your VMs), you would need to back up all of your VMs in the same backup job. For smaller installations this may not be problematic, but for customers with tens or hundreds of VMs to back up it is not practical, as VMs within the same job can only be backed up serially.

Despite these limitations, I believe that for smaller installations, and for companies looking for a cost-effective all-in-one VM backup solution, it is a great feature to have included in the package. For larger installations of tens or hundreds of VMs, it can become more cost effective to consider a 3rd party appliance. Veeam are certainly not blind to this, and have partnered with some of the major players in this space to test the integration of their product with such appliances. They have produced a couple of whitepapers on the subject, one with Exagrid and the other with HP StoreOnce, and claim that they can achieve data reduction of up to 95% when using this approach.

It’s worth noting that Veeam’s integrated dedupe still has some value when using 3rd party dedupe appliances. While compression should be turned off, as dedupe doesn’t work as well with compressed data, Veeam recommend leaving dedupe on because the two deduplication techniques can provide complementary benefits. As Veeam’s inline deduplication takes place during the backup job, either on the client machine or on a dedicated Veeam backup server, it can reduce the amount of data sent over the network.

vRanger…
While vRanger does offer compression of backup data, which can achieve data reduction in the range of 50-70%, it does not have deduplication integrated into the product. Until the latest release offering integration with NetVault SmartDisk, Quest’s only answer for clients who wanted dedupe was integration with 3rd party appliances (like Veeam, Quest have tested the integration of vRanger with the HP StoreOnce appliance).

SmartDisk is a software-based, byte-level, variable-block, post-process deduplication product that was originally developed for use with NetVault Backup, and is now available for use with vRanger. Quest Software claim that SmartDisk can reduce backup data by up to 90%, which compares well with dedupe appliances, and it is potentially less expensive. Smaller and variable block sizes should mean better dedupe ratios can be achieved with SmartDisk than with Veeam’s integrated dedupe. Post-process dedupe means that the processing doesn’t need to be carried out on the host or dedicated backup server, so it doesn’t have as much of an impact on backup windows, but it may lead to more network traffic. So arguably SmartDisk is a more advanced dedupe engine than Veeam’s integrated dedupe, but it doesn’t come for free, and it does still have some limitations.

While the licensing for SmartDisk is cheaper than purchasing a dedupe appliance, at this stage Quest recommend running it on a dedicated physical server (server specs on page 47), and disk space is also required to store the data, which adds at least a few thousand dollars to the price of the solution. In addition to this, SmartDisk currently has a limit of 15TB of unique data per SmartDisk instance, so if you have more data than this you will need to deploy multiple instances, and data would not be deduplicated between them.

One of vRanger’s biggest differentiators over Veeam is its patented Active Block Mapping (ABM) technology, which ensures that data from files deleted inside the guest OS is not backed up. Quest claim that ABM can reduce backup image sizes by up to 25% and improve backup performance by an average of 33%.

Active Block Mapping works by reading entries from the Master File Table (MFT) from outside the VM to establish where the deleted data, active data and white space (zeroed blocks) are located in the NTFS file system. According to Quest, the process takes less than a second and adds no overhead. Once this information is received, vRanger is able to create a map of only the active data, so deleted and zeroed blocks can be skipped. (A more detailed explanation of ABM can be found in this tech target article, or by watching this video from Quest.) ABM is a very clever technology and, on the surface, offers similar data reduction on average to Veeam’s integrated dedupe, albeit in a very different way.
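The filtering step described above can be pictured with a small sketch. This is purely illustrative and is not how vRanger is implemented: it assumes the block map has already been built (the hard part, reading the MFT, is not shown), and the `BlockState` enum, the `blocks_to_back_up` function and the eight-block disk are all made up for the example.

```python
from enum import Enum


class BlockState(Enum):
    """Hypothetical classification of a disk block."""
    ACTIVE = "active"     # holds data belonging to a live file
    DELETED = "deleted"   # holds leftover data from a deleted file
    ZEROED = "zeroed"     # white space that has never been written


def blocks_to_back_up(block_map):
    """Return the indices of active blocks; deleted and zeroed blocks are skipped."""
    return [i for i, state in enumerate(block_map) if state is BlockState.ACTIVE]


# A made-up eight-block disk.
disk = [
    BlockState.ACTIVE, BlockState.ACTIVE, BlockState.DELETED, BlockState.ZEROED,
    BlockState.ACTIVE, BlockState.DELETED, BlockState.ZEROED, BlockState.ZEROED,
]

selected = blocks_to_back_up(disk)
skipped_fraction = 1 - len(selected) / len(disk)

print(selected)                   # indices of the blocks that would be read
print(f"{skipped_fraction:.1%}")  # share of blocks never read or transferred
```

The point of the sketch is that skipped blocks are never read at all, which is why ABM can help backup speed as well as backup size.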

Conclusions…
Comparing the integrated data reduction tools, Veeam’s compression and dedupe and vRanger’s compression and Active Block Mapping appear, on the surface, to achieve a similar overall result in terms of data reduction. Compression in both products seems fairly equivalent and will produce somewhere between 40-70% reduction depending on the type of data being stored. Veeam’s dedupe, when used in tandem with compression, manages to save approximately an additional 20% according to their white paper, and Quest claim ABM has the potential to reduce backup storage footprints by as much as 25%. It goes without saying that these are very rough numbers, and which product would be more effective at reducing backup data will depend on your environment and how the packages are set up. Despite many hours of research, I have been unable to find any independent tests that compare the effectiveness of each strategy. If data reduction is a priority for you, I would recommend trialling both products and putting them to the test in your environment. (If you do so, please share the results!)

For larger users who want to move beyond the integrated data reduction tools, both products have the ability to integrate with 3rd party dedupe appliances, and both vendors recommend that compression be turned off when using these devices. In these circumstances Veeam’s dedupe can still offer some value by reducing network traffic, but it will add little to the overall data reduction because the appliance performs its own dedupe. vRanger’s ABM can also work in tandem with a 3rd party appliance to further reduce the backup storage footprint, although I suspect its benefits may be lessened, as some of the white space and deleted data could be deduped by the appliance anyway. With their integration with SmartDisk, vRanger can also offer an alternative to a 3rd party appliance that is potentially more cost effective.


Update: 14/12/2011

Anton Gostev (@gostev) from Veeam sent the following tweet to @TDataoz (the company I work for, that 'tweeted' a link to this post) in response to this post:

@tdataoz 3 paragraphs of ABM praising and not a single mention of Veeam's unique capability of skipping swap file blocks? Doesn't smell good

Thanks for pointing this out. Indeed, it seems I missed a new feature in Veeam Backup & Replication v6 that can also assist in reducing the backup storage footprint. Veeam now include an option to skip the page file within virtual machines during both backup and replication jobs.

The page file (sometimes called a swap file) is 'a component of an operating system that provides virtual memory for the system. Recently used pages of memory are swapped out to this area on the disk to make room in physical memory (RAM) for newer memory pages.' (Introduction to VMware vSphere, Page 36) Skipping the page file will improve the efficiency of backup / replication jobs, and save some space in backup storage. A brief overview of this new feature is provided on Veeam’s blog.

To understand the potential benefits in terms of data reduction, we need to calculate the size of the swap file (.vswp) using the formula 'configured memory – memory reservation = swap file. For example a virtual machine configured with 2GB and a 1GB memory reservation will have a 1GB swap file.' (Impact of host local VM swap on HA and DRS)
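As a quick sanity check of that formula, here is a small sketch. The function name and the validation are my own additions, and the three-VM list is invented for illustration. Note that the quoted formula sizes the hypervisor's .vswp file; a page file inside the guest OS is sized by the guest and may differ, so treat the result as a rough guide only.

```python
def swap_file_gb(configured_memory_gb, memory_reservation_gb):
    """Swap file size per the quoted formula: configured memory minus reservation."""
    if memory_reservation_gb < 0 or memory_reservation_gb > configured_memory_gb:
        raise ValueError("reservation must be between 0 and the configured memory")
    return configured_memory_gb - memory_reservation_gb


# The example from the quote: 2GB configured, 1GB reserved.
print(swap_file_gb(2, 1))

# A made-up set of VMs as (configured GB, reserved GB) pairs.
vms = [(2, 1), (4, 0), (8, 8)]
total_swap_gb = sum(swap_file_gb(cfg, res) for cfg, res in vms)
print(total_swap_gb)
```

A VM with a full memory reservation has no swap file to skip, so the potential saving depends heavily on how reservations are configured.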

My apologies for missing this in my first pass. If I have missed anything else that may have an impact on data reduction, or you have any other feedback, please feel free to post a comment below.
