Recently a coworker was doing some research with SQL 2008 Backup Compression on a very large database. During the testing they noticed that the backup file shrunk rather than grew.
The database that the test was being run on was 360GB in size. The backup initially created a file that was 120GB in size. The file then shrank to 43GB by the time the backup was complete. The file only shrinks when the drive has sufficient space for a file that is 1/3 the size of the database. If there is not sufficient space the backup file will grow from 0.
The concern that immediately came to mind is that the backup may be writing a file to the file system then performing some compression on it. Compression of the file after it is written would limit the ability to backup to anywhere but a local drive. It would also mean that the local drive would have to be optimized for read-write instead of just write.
I recently had a chance to work with someone from Microsoft that was able to get with the right people to figure out what was going on. It turns out that this behavior is a feature meant to enhance performance.
Here is what they had to say:
The issue is that we attempt to pre-allocate the backup files so that they don’t keep getting extended during the backup process, which slows things down locally, and REALLY makes them slow (>1 order of magnitude) if your backup file is on an SMB share.
For uncompressed backups, we have a pretty good idea of how much data we’ll be backing up.
For compressed backups, the eventual size depends on how compressible the data is, which we can’t tell beforehand.
So, we chose a value which is an average result in customer databases, which is 1/3 of the volume of data to be backed up.
If the backup ends up being larger than this estimate, we’ll grow the file as needed. If it ends up being smaller, we’ll trim the file back to the size actually used.
My fears of performance issues were unfounded. I was also surprised to find out that the file could grow as well as shrink, although it makes perfect sense when I think it through. The big takeaway I see is making sure that the locations I back up to have free space equal to at least 1/3 the size of the largest database being backed up there to get best possible performance. This sounds like a good excuse to write a PowerShell script.
More generally, the SQL Server community is amazing. The mere act of participating in the community means that at some point you will come into contact with someone that can answer just about any question you could have. I am in awe.