Monday 12 December 2011

Veeam Vs. vRanger Part 3: Data reduction techniques


With the typical organisation experiencing substantial year-on-year growth in storage requirements (IDC research from 2009 suggests large enterprises' storage requirements are increasing by 60% per year), and server virtualisation pushing storage demands to new highs, it is little wonder that technologies focused on reducing storage requirements, such as deduplication and compression, have been in the spotlight in recent years. As many of these technologies come with a performance trade-off, they have yet to make a huge impact on primary storage, but they have gained considerable traction in less IO-intensive storage, including backup data storage. As such, most major players in the backup software space have been quick to adopt them, and Veeam and vRanger are no exception. In this post, I will compare the two products' differing approaches to reducing the amount of storage required to hold backup data.

Veeam Backup & Replication comes as standard with integrated in-line, block-level deduplication and compression. By comparison, vRanger uses a combination of compression and Active Block Mapping (ABM), which ensures that only blocks actively used by the disk are backed up, but it does not have dedupe integrated into the product. Quest / vRanger's answer to Veeam's dedupe has come in their latest release, vRanger® 5.3, which enables full integration with NetVault® SmartDisk, a software-based deduplication engine originally developed to work with NetVault Backup, Quest's enterprise-level backup application. In addition, both products can integrate with 3rd party dedupe appliances such as the HP StoreOnce and Fujitsu's ETERNUS CS800. So which approach is better?

Veeam…
Veeam's main differentiator in the data reduction stakes is their integrated deduplication. It is certainly a big selling point, and it enables them to market their product as a cost-effective, all-in-one solution. It does, however, come under fire from the competition, who have indirectly referred to it as primitive. So what are its limitations?

Veeam's dedupe is in-line, fixed-length, block-level deduplication. Block sizes are configurable, and the appropriate size depends on the target storage: Local: 1MB; LAN: 512KB; WAN: 256KB. Veeam claim dedupe ratios in excess of 10x, and a whitepaper on their website contains some decent dedupe / compression ratios from Veeam's internal testing. On a 60GB backup job with 6 VMs, compression and dedupe combined produced a 'combined technology saving' of just over 80%. Interestingly, compression alone on the same job saved around 62% (very similar to the compression ratios in vRanger), so in this example dedupe only removes an extra 20% or so of the original data beyond compression. I'm not belittling this, because it still offers a substantial reduction in the amount of backup storage required (roughly half of what compression alone leaves behind), but it's just not as dramatic as I had expected before I started researching the numbers, and it is certainly not as substantial as the ratios that can be obtained using dedupe appliances.
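The savings arithmetic above is easy to misread, because the extra dedupe saving is measured against the original 60GB job, not against the already-compressed data. A minimal sketch using the rounded figures quoted from Veeam's whitepaper (treating 'just over 80%' as exactly 80% is an assumption):

```python
JOB_GB = 60.0            # size of the test backup job
COMPRESSION_ONLY = 0.62  # "around 62%" saved by compression alone
COMBINED = 0.80          # "just over 80%" saved by compression + dedupe (rounded down)

after_compression = JOB_GB * (1 - COMPRESSION_ONLY)    # data left after compression only
after_combined = JOB_GB * (1 - COMBINED)               # data left after both techniques
extra_points = (COMBINED - COMPRESSION_ONLY) * 100     # extra saving, in % of the original job
relative_cut = 1 - after_combined / after_compression  # reduction relative to compression alone

print(f"{after_compression:.1f} GB vs {after_combined:.1f} GB")  # 22.8 GB vs 12.0 GB
print(f"{extra_points:.0f} extra points of the original job, "
      f"{relative_cut:.0%} less storage than compression alone")
```

With these inputs the sketch reports about 18 extra percentage points, or roughly 47% less storage than compression alone, which is where the 'about half' figure comes from.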

The reason for this difference in effectiveness is twofold. Firstly, the block sizes used by Veeam's dedupe are larger than those used in dedupe appliances. Veeam have done this deliberately, as larger blocks mean fewer resources are required to run the dedupe process and backup / restore jobs run quicker, but it does reduce the effectiveness of dedupe. Secondly, because Veeam uses fixed-length blocks, when data in a file is shifted, for example by adding some additional text to a Word document, the contents of all subsequent blocks in the file shift too, so those blocks are likely to be considered different from the ones in the original file.
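The fixed-length block problem can be demonstrated on a plain byte stream. A minimal sketch, assuming SHA-256 fingerprints and 4 KiB blocks (far smaller than Veeam's 256KB to 1MB blocks, purely to keep the example small); it does not model how data is laid out inside a virtual disk:

```python
import hashlib
import random

def fixed_chunks(data: bytes, size: int) -> list[bytes]:
    """Cut data into fixed-length blocks at offsets 0, size, 2*size, ..."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def count_reused(original: bytes, edited: bytes, size: int) -> tuple[int, int]:
    """Return (blocks of the edited version already stored, total blocks in the edited version)."""
    stored = {hashlib.sha256(b).digest() for b in fixed_chunks(original, size)}
    new_blocks = fixed_chunks(edited, size)
    return sum(hashlib.sha256(b).digest() in stored for b in new_blocks), len(new_blocks)

rng = random.Random(42)
original = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
inserted = original[:1000] + b"some inserted text" + original[1000:]
appended = original + b"some inserted text"

print(count_reused(original, inserted, 4096))  # (0, 17): every later block boundary has shifted
print(count_reused(original, appended, 4096))  # (16, 17): earlier blocks are untouched
```

An 18-byte insertion near the start leaves none of the 17 blocks recognisable, while the same bytes appended at the end leave all 16 original blocks deduplicable.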

Another potential limitation of Veeam's deduplication is that it can only dedupe against VMs in the same backup job. This is very much by design, as explained in these Veeam dedupe FAQs, but the drawback is that to achieve global deduplication (dedupe across all of your VMs) you would need to back up all of your VMs in the same backup job. For smaller installations this may not be problematic, but for customers with tens or hundreds of VMs to back up it is not practical, as VMs in the same job can only be backed up serially.
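The cost of limiting dedupe to a single job can be shown by counting unique blocks. A minimal sketch in which the job names and block fingerprints are hypothetical placeholders: two jobs whose VMs share the same operating system blocks.

```python
def stored_blocks(jobs: dict[str, list[str]], global_index: bool) -> int:
    """Count unique blocks stored, deduplicating either per job or across all jobs."""
    if global_index:
        # One shared index: a block is stored once no matter which job saw it.
        return len({fp for fps in jobs.values() for fp in fps})
    # One index per job: shared blocks are stored once in every job that contains them.
    return sum(len(set(fps)) for fps in jobs.values())

jobs = {
    "job-a": ["win2008-os-1", "win2008-os-2", "sql-data"],
    "job-b": ["win2008-os-1", "win2008-os-2", "exchange-data"],
}
print(stored_blocks(jobs, global_index=False))  # 6
print(stored_blocks(jobs, global_index=True))   # 4
```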

Despite these limitations, I believe that for smaller installations, and for companies looking for an all-in-one, cost-effective VM backup solution, it is a great feature to have included in the package. For larger installations of tens or hundreds of VMs, it can become more cost-effective to consider a 3rd party appliance. Veeam are certainly not blind to this, and have partnered with some of the major players in this space to test the integration of their product with such appliances. They have produced a couple of whitepapers on the subject, one with ExaGrid and the other with HP StoreOnce, and claim that they can achieve data reduction of up to 95% using this approach.

It's worth noting that Veeam's integrated dedupe doesn't become entirely redundant when using 3rd party dedupe appliances. While compression should be turned off, as dedupe doesn't work as well on compressed data, Veeam recommend leaving their dedupe on, as the two deduplication techniques can provide complementary benefits. Because Veeam's in-line deduplication takes place during the backup job, either on the client machine or on a dedicated Veeam backup server, it can reduce the amount of data sent over the network.
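A rough sketch of why deduplicating before data leaves the source saves bandwidth: the sender fingerprints each block and only transmits blocks the target has not already stored. This is a generic illustration, not Veeam's implementation; the shared `target_index` set stands in for whatever lookup a real product uses, and only payload bytes are counted (a real job would also send block references).

```python
import hashlib

def send_backup(blocks: list[bytes], target_index: set[bytes]) -> int:
    """Send only blocks whose fingerprint the target has not seen; return payload bytes sent."""
    sent = 0
    for block in blocks:
        fingerprint = hashlib.sha256(block).digest()
        if fingerprint not in target_index:
            target_index.add(fingerprint)  # the target now holds this block
            sent += len(block)             # only new blocks cross the network
    return sent

index: set[bytes] = set()
vm_blocks = [b"A" * 512, b"B" * 512, b"A" * 512, b"C" * 512]
print(send_backup(vm_blocks, index))  # 1536: the repeated "A" block is sent once
print(send_backup(vm_blocks, index))  # 0: everything is already at the target
```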

vRanger…
While vRanger does offer compression of backup data, which can achieve data reduction in the range of 50-70%, they do not have deduplication integrated into their product. Until the latest release offering integration with NetVault’s SmartDisk, their only answer for clients who wanted dedupe was integration with 3rd party appliances (similarly to Veeam, Quest have tested the integration of vRanger with the HP StoreOnce appliance.)

SmartDisk is a software-based, byte-level, variable-block, post-process deduplication product that was originally developed for use with NetVault Backup and is now available for use with vRanger. Quest Software claim that SmartDisk can reduce backup data by up to 90%, which compares well with dedupe appliances at a potentially lower cost. Smaller, variable block sizes should mean better dedupe ratios can be achieved with SmartDisk than with Veeam's integrated dedupe. Post-process dedupe means that processing doesn't need to be carried out on the host or a dedicated backup server, so it has less impact on backup windows, but it may lead to more network traffic. So arguably SmartDisk is a more advanced dedupe engine than Veeam's integrated dedupe, but it doesn't come for free, and it still has some limitations.
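Variable-size blocks are what sidestep the shifting problem that affects fixed-length blocks. SmartDisk's actual chunking algorithm isn't described here, so this sketch uses a generic technique instead: content-defined chunking with a 'gear' rolling hash, in the style of FastCDC. Boundaries are chosen by the data itself, so an insertion only disturbs the chunks around it. The `GEAR` table, `MASK` and size limits are illustrative assumptions.

```python
import hashlib
import random

_rng = random.Random(2011)
GEAR = [_rng.getrandbits(32) for _ in range(256)]  # one random value per byte value
MASK = (1 << 11) - 1                               # cut ~1 in 2048 positions on random data
MIN_SIZE, MAX_SIZE = 256, 8192

def cdc_chunks(data: bytes) -> list[bytes]:
    """Split data where a rolling hash of the most recent bytes matches a bit pattern."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Shifting left pushes older bytes out of the 32-bit value, so the cut
        # decision depends on nearby content rather than on absolute offsets.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= MIN_SIZE and (h & MASK) == 0) or length >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def count_reused_cdc(original: bytes, edited: bytes) -> tuple[int, int]:
    """Return (chunks of the edited version already stored, total chunks in the edited version)."""
    stored = {hashlib.sha256(c).digest() for c in cdc_chunks(original)}
    new_chunks = cdc_chunks(edited)
    return sum(hashlib.sha256(c).digest() in stored for c in new_chunks), len(new_chunks)

rng = random.Random(42)
original = bytes(rng.getrandbits(8) for _ in range(64 * 1024))
edited = original[:1000] + b"some inserted text" + original[1000:]
reused, total = count_reused_cdc(original, edited)
print(f"{reused} of {total} variable-size chunks already stored")
```

Unlike the fixed-block case, only the chunk containing the insertion (and occasionally a neighbour) changes; the chunker re-synchronises with the original boundaries shortly afterwards.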

While the licensing for SmartDisk is cheaper than purchasing a dedupe appliance, at this stage Quest recommend running it on a dedicated physical server (server specs on page 47), and disk space is also required to store the data, which adds at least a few thousand dollars to the price of the solution. In addition, SmartDisk currently has a limit of 15TB of unique data per SmartDisk instance, so if you have more data than this you will need to deploy multiple instances, and data would not be deduplicated between them.
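Because the 15TB limit applies to unique (already deduplicated) data, sizing the number of instances is a simple ceiling division. A minimal sketch; the function name is hypothetical and it ignores growth headroom. Each instance deduplicates independently, so data common to two instances is stored in both.

```python
import math

SMARTDISK_UNIQUE_LIMIT_TB = 15  # per-instance limit on unique data quoted in this post

def instances_needed(unique_tb: float, limit_tb: float = SMARTDISK_UNIQUE_LIMIT_TB) -> int:
    """Minimum number of instances needed to hold unique_tb of deduplicated data."""
    if unique_tb <= 0:
        raise ValueError("unique_tb must be positive")
    return math.ceil(unique_tb / limit_tb)

print(instances_needed(40))  # 3
```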

One of vRanger's biggest differentiators over Veeam is its patented Active Block Mapping (ABM) technology, which ensures that data from deleted files inside the guest OS is not backed up. Quest claim that ABM can reduce backup image sizes by up to 25% and improves backup performance by an average of 33%.

Active Block Mapping works by reading entries from the Master File Table (MFT) from outside the VM to establish where the deleted data, active data and white space (zeroed blocks) are located in the NTFS file system. According to Quest, the process takes less than a second and has no overhead. Once this information is received, vRanger is able to create a map of only the active data, so deleted and zeroed blocks can be skipped (a more detailed explanation of ABM can be found in this TechTarget article, or by watching this video from Quest). ABM is a very clever technology, and on the surface it offers similar data reduction on average to Veeam's integrated dedupe, albeit in a very different way.
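The 'map of active data' idea can be sketched independently of NTFS. In this hypothetical example the per-block states are simply given as a list, whereas ABM derives them from file system metadata read from outside the VM; this is not Quest's code. The function turns the active blocks into contiguous read extents and skips everything else:

```python
from enum import Enum

class BlockState(Enum):
    ACTIVE = "active"    # belongs to a live file
    DELETED = "deleted"  # still holds data from a deleted file
    ZERO = "zero"        # white space: never written or zeroed

def read_plan(states: list[BlockState]) -> list[tuple[int, int]]:
    """Return (start_block, block_count) extents covering only the ACTIVE blocks."""
    extents: list[tuple[int, int]] = []
    for i, state in enumerate(states):
        if state is not BlockState.ACTIVE:
            continue  # deleted and zeroed blocks are never read
        if extents and extents[-1][0] + extents[-1][1] == i:
            start, count = extents[-1]
            extents[-1] = (start, count + 1)  # extend the current run
        else:
            extents.append((i, 1))            # start a new run
    return extents

A, D, Z = BlockState.ACTIVE, BlockState.DELETED, BlockState.ZERO
disk = [A, A, D, Z, A, D, D, A]
plan = read_plan(disk)
skipped = 1 - sum(count for _, count in plan) / len(disk)
print(plan)              # [(0, 2), (4, 1), (7, 1)]
print(f"{skipped:.0%}")  # 50%
```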

Conclusions…
Comparing the integrated data reduction tools, Veeam's compression and dedupe and vRanger's compression and Active Block Mapping appear, on the surface, to achieve a similar overall result in terms of data reduction. Compression in both products seems fairly equivalent and will produce somewhere between 40% and 70% reduction depending on the type of data being stored. Veeam's dedupe, used in tandem with compression, saves approximately an additional 20% according to their white paper, and Quest claim ABM has the potential to reduce backup storage footprints by as much as 25%. It goes without saying that these are very rough numbers, and which product would be more effective at reducing backup data will depend on your environment and how the packages are set up. Despite many hours of research, I have been unable to find any independent tests that compare the effectiveness of each strategy. If data reduction is a priority for you, I would recommend trialling both products and putting them to the test in your environment (if you do so, please share the results!).
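When comparing these figures, note that successive reductions stack multiplicatively rather than additively: a 25% cut followed by 62% compression is not an 87% saving. A minimal sketch, assuming each stage acts independently on whatever the previous stage left (real data need not behave this way, and the figures in this post come from different tests and 'up to' claims):

```python
def combined_saving(*savings: float) -> float:
    """Overall fraction saved when each stage removes a fraction of what is left."""
    remaining = 1.0
    for saving in savings:
        if not 0.0 <= saving < 1.0:
            raise ValueError("each saving must be in [0, 1)")
        remaining *= 1.0 - saving  # each stage only sees what earlier stages left behind
    return 1.0 - remaining

# Hypothetical inputs: a 25% ABM-style reduction followed by 62% compression.
print(f"{combined_saving(0.25, 0.62):.1%}")  # 71.5%
```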

For larger users who want to move beyond the integrated data reduction tools, both products can integrate with 3rd party dedupe appliances, and both recommend turning compression off when using these devices. In these circumstances Veeam's dedupe can still offer some value by reducing network traffic, but its contribution to reducing stored data becomes largely redundant. vRanger's ABM can also work in tandem with a 3rd party appliance to further reduce the backup storage footprint, although I suspect its benefits may be lessened, as some of the white space and deleted data could be deduped anyway. With its SmartDisk integration, vRanger can also offer an alternative to a 3rd party appliance that is potentially more cost-effective.
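Why compression should be off in front of a dedupe appliance can be illustrated with Python's zlib standing in for backup compression, and fixed 4 KiB SHA-256 blocks standing in for the appliance's dedupe (both are assumptions; real appliances typically use variable-size blocks). Two 'nightly backups' differ by a single record:

```python
import hashlib
import random
import zlib

BLOCK = 4096  # illustrative fixed block size

def blocks(data: bytes) -> list[bytes]:
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def reuse_ratio(old: bytes, new: bytes) -> float:
    """Fraction of the new backup's blocks already stored from the old backup."""
    stored = {hashlib.sha256(b).digest() for b in blocks(old)}
    new_blocks = blocks(new)
    return sum(hashlib.sha256(b).digest() in stored for b in new_blocks) / len(new_blocks)

rng = random.Random(1)
lines = [f"record {i:06d} value {rng.randrange(10**6):06d}\n" for i in range(2500)]
night1 = "".join(lines).encode()
lines[1200] = "record 001200 value XXXXXX\n"  # one record changes, same length
night2 = "".join(lines).encode()

plain = reuse_ratio(night1, night2)
compressed = reuse_ratio(zlib.compress(night1), zlib.compress(night2))
print(f"uncompressed: {plain:.0%} of blocks dedupe, compressed: {compressed:.0%}")
```

Uncompressed, only the block holding the changed record is new (16 of 17 blocks match). After compression, the change alters the compressed stream from that point onwards, so a smaller fraction of blocks match what is already stored.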


Update: 14/12/2011

Anton Gostev (@gostev) from Veeam sent the following tweet to @TDataoz (the company I work for, which had tweeted a link to this post) in response:

@tdataoz 3 paragraphs of ABM praising and not a single mention of Veeam's unique capability of skipping swap file blocks? Doesn't smell good

Thanks for pointing this out; it seems I did miss a new feature in Veeam Backup & Replication v6 that can also help reduce the backup storage footprint. Veeam now include an option to skip the page file within virtual machines during both backup and replication jobs.

The page file (sometimes called a swap file) is 'a component of an operating system that provides virtual memory for the system. Recently used pages of memory are swapped out to this area on the disk to make room in physical memory (RAM) for newer memory pages.' (Introduction to VMware vSphere, Page 36) Skipping the page file will improve the efficiency of backup / replication jobs, and save some space in backup storage. A brief overview of this new feature is provided on Veeam's blog.

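To understand the potential benefits in terms of data reduction, it is worth being clear about which file is skipped. The VMware swap file (.vswp) is sized using the formula 'configured memory – memory reservation = swap file. For example a virtual machine configured with 2GB and a 1GB memory reservation will have a 1GB swap file.' (Impact of host local VM swap on HA and DRS). However, the .vswp file sits outside the VM's virtual disks and is not part of an image-level backup in the first place. Veeam's option skips the blocks of the page file inside the guest OS (pagefile.sys on Windows), so the saving is roughly the size of the in-guest page file, which is configured within the guest operating system.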

My apologies for missing this in my first pass. If I have missed anything else that may have an impact on data reduction, or you have any other feedback, please feel free to post a comment below.

Monday 5 December 2011

Veeam Vs. vRanger Part 2: Similarities


As discussed briefly in my previous blog post comparing traditional and VMware-specific backup applications, Veeam and vRanger have both been built from the ground up specifically to back up virtual environments, and hence have some fundamental similarities. Before I get into discussing the differentiators between the two products in detail, I thought it would be worthwhile providing a quick overview of some of the key similarities…

Both Veeam and vRanger leverage the vStorage APIs for Data Protection (VADP), which allow them to back up VMs across multiple vSphere hosts without requiring agents to be installed either inside the VMs or on the ESX hosts. Compared with the traditional approach of installing a backup agent on each physical or virtual server to be protected, this substantially reduces the complexity of installation, as you don't need to spend time installing agents in each VM. Ongoing management of your backup solution is also easier: product updates can be applied centrally, and both products allow you to back up either individual VMs or entire vSphere hosts, so if you add a new VM to a protected host it will automatically be included in the backup job.

Another major benefit of this approach is that VADP allows Veeam / vRanger to perform backups from a separate backup server or VM without placing a heavy load on the vSphere hosts. By utilising the snapshot capabilities of VMware vStorage VMFS, backup snapshots can be taken without any disruption to the VMs or the applications that run on them, effectively eliminating the traditional backup window. In contrast, using an agent-based or guest-level approach to backing up VMs can lead to significant resource issues on the host, especially if backup jobs are scheduled to run simultaneously. Staggering backup jobs can offset this effect, but will increase backup windows.

Veeam and vRanger's integration with VADP also allows them to take advantage of Changed Block Tracking (CBT) to increase the speed and efficiency of incremental and differential backups. With CBT enabled, VMware maintains a -ctk.vmdk file for each virtual disk that records which areas of the disk have been modified, and each snapshot is associated with a change ID. Veeam and vRanger can query this information, so they only need to process the blocks of the virtual disk that have changed since the last full, incremental or differential backup was taken.
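A toy model of the changed-block idea, showing how one tracker can serve both incremental backups (changes since the last backup) and differential backups (changes since the last full). The `ChangeTracker` class is hypothetical and does not model VMware's API, where the backup application asks for changed areas relative to a stored change ID (via `QueryChangedDiskAreas`):

```python
from dataclasses import dataclass, field

@dataclass
class ChangeTracker:
    """Toy changed-block tracker: records the change ID at which each block was last written."""
    num_blocks: int
    change_id: int = 0
    last_changed: list[int] = field(init=False)

    def __post_init__(self) -> None:
        self.last_changed = [0] * self.num_blocks  # 0 = not written since tracking began

    def write(self, block: int) -> None:
        self.change_id += 1
        self.last_changed[block] = self.change_id

    def snapshot(self) -> int:
        """Return the change ID a backup should remember for next time."""
        return self.change_id

    def changed_since(self, since: int) -> list[int]:
        """Blocks written after the given change ID: the only ones a backup must read."""
        return [b for b, cid in enumerate(self.last_changed) if cid > since]

tracker = ChangeTracker(num_blocks=8)
full = tracker.snapshot()  # the full backup reads every block
tracker.write(2)
tracker.write(5)
tracker.write(2)
incr = tracker.snapshot()
print(tracker.changed_since(full))  # [2, 5]: first incremental
tracker.write(7)
print(tracker.changed_since(incr))  # [7]: next incremental, since the last backup
print(tracker.changed_since(full))  # [2, 5, 7]: differential, since the last full
```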

Currently, Veeam and vRanger also share some common weaknesses. Both are only capable of backing up virtual machines, so if you have any physical servers you will need a separate application to protect them. Neither product supports tape, so if you need to back up to tape you will need a second solution to push the backups generated by Veeam / vRanger to tape. In addition, for applications that are not VSS-aware, backing up at the host level can cause problems, as the applications don't know they have been backed up and so won't truncate their logs. It's worth noting that both companies have plans to address some of these issues; I will discuss this in more detail in a later post on their development roadmaps.

In summary, Veeam and vRanger share similar fundamentals: both offer some substantial advantages, and have some drawbacks, when compared with traditional agent-based backup software. Once you move past these fundamentals, however, there are some fairly substantial differences between the products. In my next post I will compare the two products' approaches to data reduction, and the methods they employ to minimise the amount of storage required to hold backup data.

Next Up: Veeam Vs. vRanger Part 3: Data reduction techniques

Tuesday 22 November 2011

Veeam Vs. vRanger


In a previous post I discussed the differences between traditional / enterprise backup applications and dedicated VMware / virtual backup applications. After some positive feedback, I have decided to delve a bit deeper and compare the two main products currently competing in the VM backup space, Veeam and vRanger.

I should start by clarifying that, as yet, I do not have any first-hand experience using either of these applications, and although I do intend to, I have not yet run an eval of either package. The information used in this comparison has been sourced from other blogs on the subject, conversations with customers using the products, online forums, and marketing collateral and support documentation from Veeam and Quest Software / vRanger, including white papers, product datasheets etc. Wherever possible I have confirmed the information relayed below in at least two places, and included links to referenced materials. As mentioned in my blog intro, I have done my utmost to ensure all of the information is accurate, but it has been a tricky process to sort through the marketing terminology and the occasional misinformation on forums and blogs, so if anyone has any additional information or comments on this comparison, please feel free to post a comment!

So, disclaimer aside, let’s get started. To make this a bit easier to digest, I have broken this comparison down into a number of smaller posts, the first of which I have included below, and the remainder of which I will put up over the coming weeks (time permitting!)

Veeam Vs. vRanger Part 1: The Companies
vRanger was first launched by Vizioncore, which was founded in 2002, and was the first specialist VMware backup application to market. Quest Software acquired Vizioncore in January 2008, on the back of its 2007 acquisitions of Invirtus, a provider of virtual machine optimization, conversion and automation products, and Provision Networks, a leader in virtual client and desktop management solutions, as part of the company's ‘plan to support increasing customer needs for managing their virtual environment.’ Since the acquisition of Vizioncore, Quest has further strengthened its hand in the virtualisation and data protection markets with additional acquisitions, including BakBone / NetVault Backup in 2011 and vKernel in November 2011 (a full list of their acquisitions is here.) Quest Software currently claims to have more than 38,000 customers using vRanger, and over 2,000 using NetVault Backup, to protect their virtual infrastructure.

Veeam was founded in 2006 by the team previously behind Aelita Software, and specialises in developing ‘innovative products for virtual infrastructure management and data protection.’ Their range of VMware solutions includes Veeam Backup & Replication, Veeam Nworks VMware enterprise monitoring tools, and Veeam ONE™, a solution for optimizing the performance, configuration and utilization of VMware environments. Veeam Software currently claims to have more than 30,000 customers worldwide, and to be adding new customers at a rate of 1,500 per month.

Rivalry between Veeam and vRanger is fierce. vRanger offer a direct comparison with Veeam on their website here, and Veeam have written a number of scathing blog posts on their website directly attacking vRanger, including Desperate Times Call For Desperate Measures and Who is the industry leader in VMware backup? The two companies recently had a very public argument over similar features, released last year, that offer instantaneous recovery of virtual machines from a backup image, and there has also been a war of words on Twitter, with each company disputing the other's claim to be the number one VMware backup solution.

Clearly there is no love lost in the quest to be the number one VMware backup solution in the market. So whose product has the edge now? What can they both do well, and what are the differentiators? These are the questions I will seek to answer in my upcoming posts…

Sunday 13 November 2011

Common Backup & DR Strategies - Comparison Table


The table below is designed to serve as a quick reference for comparing common backup and disaster recovery strategies. It is certainly not an exhaustive list, but it covers the majority of backup and disaster recovery solutions I have encountered while speaking with resellers in Australia. If you have any comments, or can suggest some alternative solutions that should be included in this table, please let me know.


  
Each strategy below is rated on what it protects against (hardware failure; human error such as accidental deletion; virus / hacker attack; fire, theft, natural disaster) and on its features and cost (file recovery time, full recovery time, recovery point, cost).

File backup to removable media (tape, external HDD, RDX, flash drive) taken offsite
Hardware failure: Yes
Human error (accidental deletion etc.): Yes
Virus / hacker attack: Yes. Data is kept offline.
Fire, theft, natural disaster: Yes
File recovery time: Slow. Media must be brought back onsite to recover files.
Full recovery time: Slow. Even if replacement hardware is available, the server will need to be rebuilt and media brought back to site before files can be restored.
Recovery point: Depends on the last backup job; typically 24 hours for an offsite nightly backup.
Cost: Affordable. Consists of upfront hardware cost, plus the cost of labour to take media offsite and potentially store it.

File backup to NAS or SAN onsite
Hardware failure: Yes
Human error (accidental deletion etc.): Yes
Virus / hacker attack: No. Data is online and on the same network, so it can be attacked by a virus or hacker.
Fire, theft, natural disaster: No
File recovery time: Fast. Files are easily accessed as data is online.
Full recovery time: Slow. Systems need to be restored before files can be brought back.
Recovery point: Depends on the last backup job, but these tend to be scheduled to run more frequently than offsite nightly backups.
Cost: Affordable. Upfront hardware cost for the disk array.

File backup to the ‘cloud’
Hardware failure: Yes
Human error (accidental deletion etc.): Yes
Virus / hacker attack: Yes / No. Data is online, but on a different network, so the chances of complete data loss are small, but still feasible (e.g. lost passwords).
Fire, theft, natural disaster: Yes
File recovery time: Slow / fast. Depends on the size of the file and bandwidth, but for small files recovery should be quick.
Full recovery time: Slow. Systems need to be restored, and data copied or physically brought back to site, before files can be brought back.
Recovery point: Depends on the last backup job; typically 24 hours for an offsite nightly backup, but easy to schedule to run more frequently.
Cost: Affordable. Low or no upfront cost, but ongoing subscription and bandwidth costs can ultimately make it more expensive than the two options above.

File-level backup to NAS or SAN onsite, then to removable media (tape, external HDD, RDX, flash drive) taken offsite (D2D2T)
Hardware failure: Yes
Human error (accidental deletion etc.): Yes
Virus / hacker attack: Yes
Fire, theft, natural disaster: Yes
File recovery time: Fast. Files are easily accessed as data is available on the LAN.
Full recovery time: Slow. Systems need to be restored before files can be brought back.
Recovery point: Depends on the last backup job.
Cost: Mid-range. Consists of upfront hardware cost, plus the cost of labour to take media offsite and potentially store it.

Image-level backup to removable media (tape, external HDD, RDX, flash drive) taken offsite
Hardware failure: Yes
Human error (accidental deletion etc.): Yes
Virus / hacker attack: Yes
Fire, theft, natural disaster: Yes
File recovery time: Slow. Media must be brought back onsite and, depending on the software, may need to be mounted to recover individual files.
Full recovery time: Slow / medium. If replacement hardware is available, once media is onsite an image-level restore should save time, as the OS and applications do not need to be installed prior to recovery.
Recovery point: Depends on the last backup job; typically 24 hours for an offsite nightly backup.
Cost: Mid-range. Consists of upfront hardware and backup software cost, plus the cost of labour to take media offsite and potentially store it.

Image-level backup to NAS or SAN onsite
Hardware failure: Yes
Human error (accidental deletion etc.): Yes
Virus / hacker attack: No
Fire, theft, natural disaster: No
File recovery time: Generally fast. Data is online so it is quick to access; the image may need to be mounted to recover files, depending on the backup software.
Full recovery time: Fast if replacement hardware is available.
Recovery point: Depends on the last backup job, but these tend to be scheduled to run more frequently; some software can run them as often as every couple of minutes.
Cost: Mid-range. Consists of upfront hardware and backup software cost.

Image-level backup to the ‘cloud’
Hardware failure: Yes
Human error (accidental deletion etc.): Yes
Virus / hacker attack: Yes / No. Data is online, but on a different network, so the chances of complete data loss are small, but still feasible (e.g. lost passwords).
Fire, theft, natural disaster: Yes
File recovery time: Slow / fast. Depends on the size of the file and bandwidth; the image may also need to be mounted depending on the backup software, but for small files recovery should be fairly quick.
Full recovery time: Slow / fast. Generally slow for a local restore, as data needs to be copied or physically brought back to site before the restore is performed. If replacement hardware is available, once data is back onsite an image-level restore should save time. It has the potential to be fast if the online backup service allows you to run up a virtual server in their datacentre and point users to that server.
Recovery point: Depends on the last backup job. This may be a nightly backup scheduled to run outside business hours to prevent network congestion, or incrementals scheduled to run as often as every couple of minutes with some software.
Cost: Mid-range. Low or no upfront cost, but ongoing subscription and bandwidth costs; generally more expensive than online file-based backup only.

Image-level backup to NAS or SAN onsite, then to removable media (tape, external HDD, RDX, flash drive) taken offsite (D2D2T)
Hardware failure: Yes
Human error (accidental deletion etc.): Yes
Virus / hacker attack: Yes
Fire, theft, natural disaster: Yes
File recovery time: Generally fast. Data is online so it is quick to access; the image may need to be mounted to recover files, depending on the backup software.
Full recovery time: Fast if replacement hardware is available.
Recovery point: Depends on the last backup job, but these tend to be scheduled to run more frequently; some software can run them as often as every couple of minutes for near CDP.
Cost: Mid-range. Consists of upfront hardware and backup software cost.

Real-time replication and failover to NAS or SAN onsite with a backup server, or to a backup server onsite
Hardware failure: Yes
Human error (accidental deletion etc.): No. With real-time replication, changes are tracked and replicated in real time, so if a user deletes a file on the production server, it is also deleted on the backup server.
Virus / hacker attack: No
Fire, theft, natural disaster: No
File recovery time: Individual files generally can't be recovered if they have been corrupted or accidentally deleted, as changes are replicated in real time.
Full recovery time: Very fast / nearly instantaneous.
Recovery point: Zero. Changes are replicated in real time, so there is no data loss since the last backup.
Cost: Mid-high range. Consists of upfront hardware and replication software cost.

Real-time replication and failover to NAS or SAN offsite with a backup server, or to a backup server offsite
Hardware failure: Yes
Human error (accidental deletion etc.): No. With real-time replication, changes are tracked and replicated in real time, so if a user deletes a file on the production server, it is also deleted on the backup server.
Virus / hacker attack: Yes / No. Data is online, but on a different network, so the chances of complete data loss are small, but still feasible.
Fire, theft, natural disaster: Yes
File recovery time: Individual files generally can't be recovered if they have been corrupted or accidentally deleted, as changes are replicated in real time.
Full recovery time: Very fast / nearly instantaneous.
Recovery point: Zero. Changes are replicated in real time, so there is no data loss since the last backup.
Cost: Generally high. Hardware required includes multiple disk arrays and backup servers, plus software to drive replication and failover, and the ongoing cost of bandwidth between the main site and the DR site.

Real-time replication and failover to NAS or SAN offsite with a backup server, or to a backup server offsite, plus backup of server(s) to disk, then to removable media
Hardware failure: Yes
Human error (accidental deletion etc.): Yes
Virus / hacker attack: Yes
Fire, theft, natural disaster: Yes
File recovery time: Fast
Full recovery time: Very fast / nearly instantaneous.
Recovery point: Zero. Changes are replicated in real time, so there is no data loss since the last backup.
Cost: Expensive. Hardware required includes multiple disk arrays, backup servers and tape drives or similar, plus software to drive replication and failover, licences for backup software, and the ongoing cost of bandwidth between the main site and the DR site.