"Data deduplication only stores unique data to reduce the amount of total disk storage required. However, depending on how you implement data deduplication, the backup and restore performance can be greatly impacted." Bill Andrews, president and CEO, ExaGrid Systems
In the last section, we considered how organisations can use disk for backup at the cost of tape. We explored the differences between scale-up and scale-out architectures and the different approaches to deduplication.
This section will evaluate the aspects of your environment which can affect the size of your backup system. It's critical to size the system correctly and also choose the right architecture to avoid costly forklift upgrades.
Just as many factors must be considered when evaluating the architectural implications of different disk backup with deduplication products, many aspects of your environment are part of the equation for sizing the system correctly.
In primary storage you can simply say, "I have 8TB to store and so I will buy 10TB." In disk-based backup with deduplication, a sizing exercise must be conducted based on a number of factors so that you avoid the risk of buying an undersized system which quickly exceeds capacity.
As discussed in the third chapter, the data types you have directly impact the deduplication ratio and therefore the system size you need. If your mix of data types is conducive to deduplication and has high deduplication ratios (e.g. 50:1), then the deduplicated data will occupy less storage space and you'll need a smaller system. If you have a mix of data that does not deduplicate well (e.g. 10:1 or less data reduction), then you will need a much larger system.
What matters is what deduplication ratio is achieved in a real-world environment with a real mix of data types.
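To make the link between deduplication ratio and system size concrete, here is a minimal sketch. It assumes that the disk required is simply the retained backup data divided by the deduplication ratio; the function name and the 160TB figure are illustrative only and do not come from any vendor tool.

```python
def deduplicated_size_tb(logical_tb: float, dedup_ratio: float) -> float:
    """Disk needed to hold `logical_tb` of retained backups at a ratio of dedup_ratio:1."""
    if dedup_ratio <= 0:
        raise ValueError("dedup_ratio must be positive")
    return logical_tb / dedup_ratio


# Hypothetical figure: 160TB of retained backup data before deduplication.
logical_tb = 160.0
for ratio in (50.0, 10.0):
    size = deduplicated_size_tb(logical_tb, ratio)
    print(f"{ratio:.0f}:1 ratio -> about {size:.1f}TB of disk")
```

Under these assumptions, the same retained data needs five times more disk at 10:1 than at 50:1, which is why the real-world ratio matters so much.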
The deduplication method has a significant impact on the deduplication ratio. Not all deduplication approaches are created equal.
The number of weeks of retention you keep affects the deduplication ratio as well. The longer the retention, the more repetitive data the deduplication system sees, so the deduplication ratio increases as retention increases. Most vendors will say that they get a deduplication ratio of 20:1, but when you do the maths, that figure typically assumes a retention period of about 16 weeks. If you keep only two weeks of retention, you may only get about a 4:1 reduction.
Example: If you have 10TB of data and you keep four weeks of retention, then without deduplication you would store about 40TB of data (10TB x 4 weeks). With deduplication, assuming a two per cent weekly change rate, you would store about 5.6TB of data, so the deduplication ratio is about 7.1:1 (40TB ÷ 5.6TB = 7.1:1).
However, if you have 10TB of data, and you keep 16 weeks of retention, then without deduplication you would store about 160TB of data (10TB x 16 weeks). With deduplication, assuming a two per cent weekly change rate, you would store about 8TB of data, which is a deduplication ratio of 20:1 (160TB ÷ 8TB = 20:1).
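The arithmetic in these two examples can be sketched in a few lines. The article does not spell out how the 5.6TB and 8TB figures are derived, so the model below is an assumption: the first full backup is reduced by about 2:1, and each further week of retention adds only the changed data (two per cent of 10TB). That assumption happens to reproduce the figures quoted above, but it is not a vendor formula, and the function and parameter names are illustrative.

```python
def retention_estimate(primary_tb: float, weeks: int,
                       weekly_change: float = 0.02,
                       first_full_reduction: float = 2.0):
    """Return (logical_tb, stored_tb, ratio) for a given retention period.

    Assumed model (not stated in the article): the first full backup is
    reduced by `first_full_reduction`:1, and every additional week stores
    only the changed data, `weekly_change` of the primary data.
    """
    if weeks < 1:
        raise ValueError("weeks must be at least 1")
    logical_tb = primary_tb * weeks
    stored_tb = (primary_tb / first_full_reduction
                 + primary_tb * weekly_change * (weeks - 1))
    return logical_tb, stored_tb, logical_tb / stored_tb


for weeks in (2, 4, 16):
    logical, stored, ratio = retention_estimate(10.0, weeks)
    print(f"{weeks} weeks: {logical:.0f}TB logical, "
          f"{stored:.1f}TB stored, about {ratio:.1f}:1")
```

With 10TB of primary data, this model gives roughly 7.1:1 at four weeks and 20:1 at 16 weeks, and a little under 4:1 at two weeks, in line with the figures quoted earlier.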
Your backup rotation will also impact the size of the disk-based backup with deduplication system you need. If you are doing rolling full backups each night, then you need a larger system than if you are doing incremental backups on files during the week and then a weekend full backup.
Rotation schemes are usually:
Database and email
Because the backup rotation scheme you use changes how much data is being sent to the disk-based backup with deduplication system, this also impacts the system size you require.
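As a rough illustration of how rotation changes the volume of data sent to the system each week, the sketch below compares nightly full backups with a weekend full plus daily incrementals. The two per cent daily change rate, the seven-day week and the scheme names are hypothetical values chosen for the example; they are not figures from this guide.

```python
def weekly_ingest_tb(primary_tb: float, scheme: str,
                     daily_change: float = 0.02, days: int = 7) -> float:
    """Data sent to the backup system in one week, before deduplication.

    `daily_change` is a hypothetical fraction of primary data that changes
    each day and is therefore picked up by an incremental backup.
    """
    if scheme == "nightly_full":
        # A full copy of the primary data is sent every night.
        return primary_tb * days
    if scheme == "weekly_full_plus_incrementals":
        # One weekend full, plus an incremental on each of the other days.
        return primary_tb + primary_tb * daily_change * (days - 1)
    raise ValueError(f"unknown scheme: {scheme}")


for scheme in ("nightly_full", "weekly_full_plus_incrementals"):
    print(f"{scheme}: {weekly_ingest_tb(10.0, scheme):.1f}TB per week")
```

Under these assumptions, 10TB of primary data produces 70TB of weekly backup traffic with nightly fulls, against about 11.2TB with a weekend full and incrementals. How much of that ends up on disk still depends on the deduplication ratio achieved.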
Sizing scenario A: You are backing up data at site A and replicating to site B for disaster recovery. For example, if site A is 10TB and site B is just for DR, then a system that can handle 10TB at site A and 10TB at site B is required.
Sizing scenario B: However, if backup data is kept at both site A (e.g. 10TB) and at site B (e.g. 6TB) and the data from site A is being replicated to site B while the data from site B is being cross-replicated to site A, then a larger system on both sides is required.
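The two scenarios can be sketched as a simple tally, assuming each site must be sized for its own backup data plus whatever is replicated to it. Sizes are expressed in the same terms as the scenarios above (the amount of data each system must be able to handle), and the function name and data structures are illustrative only.

```python
def required_capacity_tb(local_tb: dict[str, float],
                         replication: list[tuple[str, str]]) -> dict[str, float]:
    """Per-site capacity: local backup data plus data replicated in.

    `replication` is a list of (source_site, target_site) pairs.
    """
    required = dict(local_tb)
    for source, target in replication:
        required[target] = required.get(target, 0.0) + local_tb[source]
    return required


# Scenario A: site B holds no backups of its own and only receives site A's data.
scenario_a = required_capacity_tb({"A": 10.0, "B": 0.0}, [("A", "B")])

# Scenario B: both sites hold backups and cross-replicate to each other.
scenario_b = required_capacity_tb({"A": 10.0, "B": 6.0},
                                  [("A", "B"), ("B", "A")])

print(scenario_a)
print(scenario_b)
```

In this tally, scenario A needs a system that can handle 10TB at each site, while cross-replication in scenario B raises the requirement to 16TB at each site.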
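In summary, dozens of possible scenarios impact the sizing of a system, including the mix of data types, the deduplication method, the retention period, the backup rotation scheme and whether data is replicated or cross-replicated between sites.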
When working with a vendor, ensure they have a sizing calculator and that they calculate the exact size of the system you need based on all of the above.
A common mistake is to acquire a system that is full within a few short months. This happens because the system was undersized: retention was longer than planned, the rotation scheme put more data into the system, the deduplication method had a low deduplication ratio, or the data types did not deduplicate well.
The truly knowledgeable vendors understand that disk-based backup with deduplication is not simply primary storage; therefore, they have the proper tools to help you size the system correctly.
This guide explains the various backup complexities, enabling you to ask the right questions and make the right decision for your specific environment and requirements. Stay tuned for the next part of this guide, which will be live on ITProPortal shortly.