With the typical organisation experiencing substantial year-on-year
growth in storage requirements (IDC research in 2009 suggests large
enterprises’ storage requirements are increasing by 60% per year) and server
virtualisation pushing storage demands to new highs, it is little wonder
that technologies like deduplication and compression, focused on reducing
storage requirements, have been in the spotlight in recent years. As many of
these technologies come with a performance trade-off, they have yet to have a
huge impact on primary storage, but have gained considerable traction in less
IO-intensive storage, including backup data storage. As such, most major players in
the backup software space have been quick to adopt these technologies, and
Veeam and vRanger are no exception. In this post, I will compare the two
products’ differing approaches to reducing the amount of storage required to
hold backup data.
Veeam Backup & Replication comes as standard with
integrated in-line block level deduplication and compression. By comparison,
vRanger uses a combination of compression and Active Block Mapping (ABM)
to ensure it only backs up blocks being actively used by the disk, but does not
have dedupe integrated into the product. Quest / vRanger’s answer to Veeam’s
dedupe has come in their latest release, vRanger® 5.3, which enables full
integration with NetVault® SmartDisk, a software based
deduplication engine originally developed to work with NetVault Backup, Quest’s
enterprise level backup application. In addition to this, both products can
also integrate with 3rd party dedupe appliances like the HP
StoreOnce, and
Fujitsu’s ETERNUS
CS800. So which approach is better?
Veeam…
Veeam’s main differentiator in the data reduction stakes is
their integrated deduplication. It is certainly a big selling point, and
enables them to market their product as a cost effective all in one solution. It
does however come under fire from the competition, who have indirectly referred
to it as primitive. So what are its limitations?
Veeam’s
dedupe is in-line, fixed length block level deduplication. Block sizes are configurable,
and the appropriate size depends on the target storage:
Local - 1MB; LAN - 512KB; WAN - 256KB. Veeam claim dedupe ratios in excess
of 10x, and a whitepaper
on their website contains some decent dedupe / compression ratios from internal
testing by Veeam. On a 60GB backup job with 6 VMs their compression / dedupe
combined produced a ‘combined technology saving’ of just over 80%. Interestingly,
compression alone on the same job saved around 62% (which is very similar to
the compression
ratios in vRanger), so in this example dedupe is only reducing data by an
extra 20 percentage points or so beyond this. I’m not belittling this, because it still offers
a substantial reduction in the amount of backup storage required
(about 50% less than with compression alone), but it’s just not as dramatic as I had
expected before I started researching the numbers, and is certainly not as
substantial as the ratios that can be obtained using dedupe appliances.
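To make the arithmetic behind those percentages explicit, here is a small sketch using the rounded figures quoted above (the whitepaper only gives "around 62%" and "just over 80%", so treat the results as approximate; the function names are my own):

```python
# Rounded figures from the Veeam whitepaper example quoted above.
JOB_SIZE_GB = 60
COMPRESSION_SAVING = 0.62   # "around 62%"
COMBINED_SAVING = 0.80      # "just over 80%"


def stored_size(original_gb, saving):
    """Size left on backup storage after a fractional saving is applied."""
    return original_gb * (1.0 - saving)


def relative_reduction(baseline_gb, improved_gb):
    """How much smaller improved_gb is than baseline_gb, as a fraction."""
    return (baseline_gb - improved_gb) / baseline_gb


compressed_gb = stored_size(JOB_SIZE_GB, COMPRESSION_SAVING)
combined_gb = stored_size(JOB_SIZE_GB, COMBINED_SAVING)
dedupe_gain = relative_reduction(compressed_gb, combined_gb)

print(f"Compression only:     {compressed_gb:.1f} GB stored")
print(f"Compression + dedupe: {combined_gb:.1f} GB stored")
print(f"Dedupe shrinks the compressed backup by {dedupe_gain:.0%}")
```

With these inputs the 60GB job drops to roughly 22.8GB with compression alone and roughly 12GB with both, so dedupe removes a little under half of what compression leaves behind.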
The reason for this difference in performance is twofold. Firstly,
the block sizes used by Veeam’s dedupe are larger than those used in dedupe appliances.
Veeam have done this deliberately, as larger blocks mean fewer resources are
required to run the dedupe process and backup / restore jobs run quicker, but
it does reduce the effectiveness of dedupe. Secondly, because Veeam uses fixed
length blocks, when the data in a file shifts, for example when adding some
additional text into a Word document, all of the subsequent data in the file
moves relative to the block boundaries, so those blocks are likely to be
considered different from those in the original file.
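The shifting problem with fixed length blocks can be shown with a toy example. This is only a sketch of the general technique, not Veeam’s implementation: the 256 byte block size, the SHA-256 fingerprints and the function names are all my own assumptions, chosen to keep the example small.

```python
import hashlib

BLOCK_SIZE = 256  # illustrative only; far smaller than Veeam's 256KB-1MB blocks


def block_hashes(data, block_size=BLOCK_SIZE):
    """Split data at fixed offsets and fingerprint each block."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def reusable_blocks(old, new):
    """Count blocks of `new` whose fingerprint already exists in `old`."""
    known = set(block_hashes(old))
    return sum(1 for h in block_hashes(new) if h in known)


# 4096 bytes of non-repeating content: 16 distinct blocks.
original = b"".join(i.to_bytes(4, "big") for i in range(1024))

# Same length, one byte changed in place: only one block differs.
overwritten = original[:300] + b"X" + original[301:]
# One byte added: everything after it shifts across the block boundaries.
inserted = original[:300] + b"X" + original[300:]

print(reusable_blocks(original, overwritten), "of", len(block_hashes(overwritten)))
print(reusable_blocks(original, inserted), "of", len(block_hashes(inserted)))
```

Overwriting a byte in place leaves 15 of 16 blocks reusable, but inserting a single byte leaves only the first block (1 of 17) matching, because every later block now starts one byte out of step with the original.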
Another potential limitation of Veeam’s deduplication is
that it can only dedupe against VMs in the same backup job. This is done very
much by design, as explained in these Veeam
dedupe FAQs, but the drawback is that to achieve global
deduplication (dedupe across all of your VMs), you would need to back up all of
your VMs in the same backup job. For smaller installations, this may not be
problematic, but for customers with tens or hundreds of VMs to back up, it is not
practical, as VMs in the same job can only be backed up serially.
Despite these limitations, I believe that for smaller
installations, and companies looking for an all in one cost effective VM backup
solution, it is a great feature to have included in the package. For larger
installations of tens or hundreds of VMs, it can become more cost effective to
consider a 3rd party appliance. Veeam are certainly not blind to
this, and have partnered with some of the major players in this space to test the
integration of their product with such appliances. They have produced a couple
of whitepapers on the subject, one with Exagrid
and the other with HP
StoreOnce, and claim that they can achieve data reduction of up to 95% when
using this approach.
It’s worth noting that Veeam’s integrated dedupe
still has some value when using 3rd party dedupe
appliances. While compression should be turned off, as dedupe doesn’t work as
well with compressed data, Veeam recommend leaving dedupe on, as both
deduplication techniques can provide complementary benefits. As Veeam’s inline
deduplication takes place during the backup job, either on the client machine or
on a dedicated Veeam backup server, it can reduce the amount of data sent over the
network.
vRanger…
While vRanger does offer compression of backup data, which
can achieve data reduction in the range of 50-70%,
it does not have deduplication integrated into the product. Until the latest
release offering integration with NetVault’s SmartDisk, Quest’s only answer
for clients who wanted dedupe was integration with 3rd party
appliances (similarly to Veeam, Quest have tested the integration
of vRanger with the HP StoreOnce appliance).
SmartDisk is a software based byte-level, variable-block,
post process deduplication product that was originally developed for use in
NetVault Backup, and is now available for use with vRanger. Quest Software claim
that SmartDisk can reduce backup data by up to 90%, which compares well with
dedupe appliances, and is potentially less expensive. Smaller and variable
block sizes should mean better dedupe ratios can be achieved with SmartDisk than
by using Veeam’s integrated dedupe. Post process dedupe means that processing
doesn’t need to be carried out on the host or dedicated backup server, so it doesn’t
have as much of an impact on backup windows, but it may lead to more network
traffic. So arguably SmartDisk is a more advanced dedupe engine than Veeam’s integrated
dedupe, but it doesn’t come for free, and does still have some limitations.
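To illustrate why variable blocks cope better with shifted data, here is a minimal sketch of content-defined chunking, where block boundaries are chosen from the data itself rather than from fixed offsets. I have no details of how SmartDisk actually picks its boundaries, so this is a generic illustration of the idea only; the window size, divisor, rolling sum and function names are all my own assumptions.

```python
import hashlib
import random

WINDOW = 16    # bytes considered when deciding on a boundary (assumed value)
DIVISOR = 64   # gives chunks of very roughly 64 bytes on random data (assumed value)


def variable_chunks(data):
    """Cut a chunk wherever the sum of the last WINDOW bytes hits a target value."""
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # drop the byte that left the window
        if rolling % DIVISOR == DIVISOR - 1:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


def reusable_chunks(old, new):
    """Count chunks of `new` whose fingerprint already exists in `old`."""
    known = {hashlib.sha256(c).digest() for c in variable_chunks(old)}
    return sum(1 for c in variable_chunks(new) if hashlib.sha256(c).digest() in known)


rng = random.Random(0)  # fixed seed so the example is repeatable
original = bytes(rng.randrange(256) for _ in range(20000))
inserted = original[:1000] + b"X" + original[1000:]  # one byte added near the start

total = len(variable_chunks(inserted))
reused = reusable_chunks(original, inserted)
print(f"{reused} of {total} chunks already stored")
```

Because each boundary depends only on the nearby bytes, an insertion can only disturb the chunks that overlap the few bytes around it. Every chunk further along is cut in the same place relative to its content, so it still matches what is already stored.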
While the licensing for SmartDisk is cheaper than purchasing
a dedupe appliance, at this stage Quest recommend running it on a dedicated
physical server (server
specs on page 47), and disk space is also required to store the data, which
adds at least a few thousand dollars to the price of the solution. In addition
to this, SmartDisk currently has a limit of 15TB of unique data per SmartDisk
instance, so if you have more data than this, you will need to deploy multiple
instances, and data would not be deduplicated between these.
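As a rough planning aid, the 15TB limit translates into an instance count as sketched below. The function name is my own, and the calculation assumes unique data can be spread evenly across instances, which is optimistic given that nothing is deduplicated between them.

```python
import math

SMARTDISK_LIMIT_TB = 15  # unique data per SmartDisk instance, as quoted above


def instances_needed(unique_data_tb):
    """Minimum number of SmartDisk instances for a given amount of unique data."""
    return max(1, math.ceil(unique_data_tb / SMARTDISK_LIMIT_TB))


for unique_tb in (10, 15, 40):
    print(f"{unique_tb}TB unique data -> {instances_needed(unique_tb)} instance(s)")
```

In practice the number may be higher, since blocks that are duplicated across instances count as unique data on each of them.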
One of vRanger’s biggest differentiators over Veeam is its
patented technology, Active
Block Mapping (ABM), which ensures that data from deleted files inside the
OS is not backed up. Quest claim that ABM can reduce backup image sizes by up
to 25% and improve backup performance by an average of 33%.
Active Block Mapping works by reading entries from the
Master File Table (MFT) from outside the VM to establish where the deleted
data, active data and white space (zeroed blocks) are located in the NTFS file
system. The process takes less than a second and has no overhead. Once this
information is received, vRanger is able to create a map of only the active data,
so deleted and zeroed blocks can be skipped (a more detailed explanation of ABM
can be found in this tech
target article, or by watching this video
from Quest). ABM is a very clever technology and, on the surface, offers
similar data reduction on average to Veeam’s integrated dedupe, albeit in a very
different way.
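Conceptually, the map of active data described above works something like the sketch below. This is purely my own illustration of the idea and not Quest’s implementation: the real product derives the block states from the MFT, whereas here they are simply supplied as a list, and all of the names are hypothetical.

```python
from enum import Enum


class BlockState(Enum):
    ACTIVE = "active"    # belongs to a live file
    DELETED = "deleted"  # belonged to a file that has since been deleted
    ZEROED = "zeroed"    # white space


def active_block_map(states):
    """Return the indexes of the blocks that actually need backing up."""
    return [index for index, state in enumerate(states) if state is BlockState.ACTIVE]


def backup_image(disk_blocks, states):
    """Copy only the active blocks, keyed by their position on the disk."""
    return {index: disk_blocks[index] for index in active_block_map(states)}


# A toy six block disk and the state of each block.
disk = [b"boot", b"old1", b"\x00\x00\x00\x00", b"data", b"logs", b"old2"]
states = [
    BlockState.ACTIVE, BlockState.DELETED, BlockState.ZEROED,
    BlockState.ACTIVE, BlockState.ACTIVE, BlockState.DELETED,
]

image = backup_image(disk, states)
print(f"Backed up blocks {sorted(image)} ({len(image)} of {len(disk)})")
```

Here only blocks 0, 3 and 4 end up in the backup image. Keeping the original block positions as keys means the active data could still be written back to the right place on restore.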
Conclusions…
Comparing the integrated data reduction tools, on the
surface Veeam’s compression and dedupe and vRanger’s compression and Active Block
Mapping appear to achieve a similar overall result in terms of data reduction.
Compression in both products seems fairly equivalent and will produce somewhere
between 40% and 70% reduction depending on the type of data being stored. Veeam’s
dedupe when used in tandem with compression manages to save approximately an
additional 20% according to their white paper, and Quest claim ABM has the
potential to reduce backup storage footprints by as much as 25%. It goes
without saying that these are very rough numbers,
and which product would be more effective at reducing backup data will depend
on your environment and how the packages were set up. Despite many hours of
research I have been unable to find any independent tests that compare the
effectiveness of each strategy. If data reduction is a priority for you, I
would recommend trialling both products and putting them to the test in your
environment (if you do so, please share the results!).
For larger users, if you want to move beyond the integrated
data reduction tools, both products have the ability to integrate with 3rd
party dedupe appliances, and both vendors recommend that when using these devices, compression
be turned off. In these circumstances Veeam’s dedupe can still offer some value
by reducing network traffic, but its contribution to data reduction will become
redundant. vRanger’s ABM can also work in tandem with a 3rd party
appliance to further reduce the backup storage footprint, although I suspect its
benefits may be lessened, as some of the white space and deleted data could be
deduped. With its integration with SmartDisk, vRanger can also offer an alternative
to using a 3rd party appliance that is potentially more cost
effective.
Update: 14/12/2011
Anton Gostev (@gostev) from Veeam sent the following tweet to @TDataoz (the company I work for, which 'tweeted' a link to this post) in response:
@tdataoz 3 paragraphs of ABM praising and not a single mention of Veeam's unique capability of skipping swap file blocks? Doesn't smell good
Thanks for pointing this out. Indeed, it seems I missed a new feature in Veeam Backup & Replication v6 that can also assist in reducing the backup storage footprint. Veeam now include an option to skip the page file within virtual machines during both backup and replication jobs.
The page file (sometimes called a swap file) is 'a component of an operating system that provides virtual memory for the system. Recently used pages of memory are swapped out to this area on the disk to make room in physical memory (RAM) for newer memory pages.' (Introduction to VMware vSphere, Page 36) Skipping the page file will improve the efficiency of backup / replication jobs, and save some space in backup storage. A brief overview of this new feature is provided in Veeam's blog.
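To understand the potential benefits in terms of data reduction, we can estimate the size of the swap file. For the VMware swap file (.vswp), the formula is 'configured memory – memory reservation = swap file. For example a virtual machine configured with 2GB and a 1GB memory reservation will have a 1GB swap file.' (Impact of host local VM swap on HA and DRS) Note that the .vswp file is created by the hypervisor alongside the VM, whereas the page file that Veeam skips lives inside the guest OS and is sized by the guest's own settings, so this formula gives only a rough indication of the space involved.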
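The quoted .vswp formula is simple enough to express directly. This is just a sketch of that arithmetic with a function name of my own choosing; it does not query vSphere or Veeam in any way.

```python
def vswp_size_gb(configured_memory_gb, memory_reservation_gb):
    """Swap file size per the quoted formula: configured memory minus reservation."""
    if memory_reservation_gb > configured_memory_gb:
        raise ValueError("reservation cannot exceed configured memory")
    return configured_memory_gb - memory_reservation_gb


# The example from the quote: 2GB configured, 1GB reserved.
print(vswp_size_gb(2, 1), "GB swap file")
```

A VM with no memory reservation would therefore have a swap file equal to its full configured memory, and one with a full reservation would have none.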
My apologies for missing this in my first pass. If I have missed anything else that may have an impact on data reduction, or you have any other feedback, please feel free to post a comment below.