One of the most important things you can do to ensure business continuity, disaster recovery, and happy executives is to perform Exchange backups on a regular and frequent basis. Whether you have a disk failure, a datacenter disaster, or just an end user who is a bit trigger-happy on the Delete key, being able to restore data is a critical function of administering an Exchange environment.

In this post, we aren’t going to talk about the merits of incremental versus differential backups, or when you do or do not want circular logging. Nor will we preach the importance of performing regular restores or storing backups offsite. We just want to focus on troubleshooting when backups either fail, or fail to complete in the time allotted.

Understanding how Exchange backups work

Before you can troubleshoot something effectively, you need to understand how it is supposed to function when everything is right with the world. Exchange backups could be covered by an entire series of blog posts; fortunately, Microsoft’s Exchange Team has already done that.

It’s a few years old but still completely relevant, and you should check out the three-part series “Everything You Need to Know About Exchange Backups” on the You Had Me at EHLO blog. Part one can be found at https://blogs.technet.microsoft.com/exchange/2012/06/04/everything-you-need-to-know-about-exchange-backups-part-1/ and the links to parts two and three are at the top of that page. It goes into VSS, COW, and DAG and is well worth the read. Go ahead, and then come back here for more.

When disk contention is just too much

One of the things that can really prevent your backups from completing is disk contention. Consider that you want to make a copy of ALL the data on a disk, a volume, or maybe just a database. To back up all that data, you need to read all that data. That’s a LOT of disk I/O to contend with.

If nothing else is going on, that should be pretty straightforward, but what else might be happening on that disk and fighting for disk access? Other disk-intensive operations that might trigger during backup windows include:

  • Archiving operations
  • MRM policies
  • Indexing
  • Full antivirus scanning
  • Compliance/DLP searches
  • Mailbox moves

If any of these activities are taking place at the same time that backups are attempting to run, your disk subsystem may just not be able to handle all the activity, and that can lead to backup failures. In addition to checking for competing processes, look at the disk queue length in Performance Monitor, and if you see double digits, suspect disk contention.
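As a rough illustration of the double-digit rule of thumb, here is a minimal Python sketch that scans disk queue length samples and flags the high ones. It assumes the counter was exported as CSV (for example with typeperf, using the PhysicalDisk "Avg. Disk Queue Length" counter). The server name EXCH01, the sample values, and the function name are made up for the example, so treat it as a starting point rather than a finished tool.

```python
import csv
import io

# "Double digits" from the rule of thumb above; adjust for your storage.
QUEUE_THRESHOLD = 10.0


def high_queue_samples(csv_text, threshold=QUEUE_THRESHOLD):
    """Return (timestamp, value) pairs whose queue length meets the threshold."""
    rows = csv.reader(io.StringIO(csv_text))
    next(rows, None)  # skip the header row that names the counters
    flagged = []
    for row in rows:
        if len(row) < 2:
            continue  # ignore blank or malformed rows
        try:
            value = float(row[1])
        except ValueError:
            continue  # ignore samples with no numeric value
        if value >= threshold:
            flagged.append((row[0], value))
    return flagged


# Illustrative data only: a hypothetical export with one counter column.
SAMPLE = r'''"(PDH-CSV 4.0)","\\EXCH01\PhysicalDisk(_Total)\Avg. Disk Queue Length"
"06/01/2016 02:00:00.000","3.2"
"06/01/2016 02:00:15.000","14.7"
"06/01/2016 02:00:30.000","22.1"
'''

if __name__ == "__main__":
    for timestamp, value in high_queue_samples(SAMPLE):
        print(f"{timestamp}: queue length {value} suggests disk contention")
```

If the flagged timestamps fall inside your backup window, compare them with the schedules of the operations listed above to see what is competing for the disk.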

Locks (and keys)

Certain operations can place locks on mailboxes, and if there are too many locks, or if a lock lasts too long, a backup can fail. For example, a mailbox move will lock the mailbox at a certain point in order to complete the move. Also, too many clients accessing the same mailbox concurrently can stack up read operations against that mailbox. Above 10, any subsequent requests fail immediately with the error MAPI_E_TIMEOUT (0x80040401) rather than being queued.

Depending on the backup solution you are using, you may also have mailboxes, or content within a mailbox, that are encrypted. Backup solutions that grab binary data won’t care about encryption, but the ones that try to read items (either for deduplication purposes or to back up only specific items) can fail when they cannot read the mailbox or items within it, because they don’t have the keys necessary to decrypt the data.

Review your logs to determine whether a lock or a failed read operation is causing the backup to fail. If it is, dig deeper to find the cause of the lock, or consider an alternative approach for data that cannot be decrypted.
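To make that log review a little less manual, the following Python sketch searches log lines for the timeout error mentioned above. The log lines and their layout are invented for illustration, and the function name is hypothetical. Your backup software will have its own log format, so only the error name and code come from this post.

```python
import re

# Match the error by name or by code, whichever the log happens to record.
LOCK_PATTERN = re.compile(r"MAPI_E_TIMEOUT|0x80040401", re.IGNORECASE)


def find_lock_errors(lines):
    """Return (line_number, text) for each line that mentions the timeout error."""
    return [
        (number, line.strip())
        for number, line in enumerate(lines, start=1)
        if LOCK_PATTERN.search(line)
    ]


# Hypothetical log excerpt; real backup logs will look different.
SAMPLE_LOG = [
    "02:14:07 INFO  Starting backup of database DB01",
    "02:31:52 ERROR Failed to open mailbox: MAPI_E_TIMEOUT (0x80040401)",
    "02:31:53 WARN  Skipping item, unable to read content",
    "02:45:10 INFO  Backup finished with errors",
]

if __name__ == "__main__":
    for number, text in find_lock_errors(SAMPLE_LOG):
        print(f"line {number}: {text}")
```

A match only tells you where to start looking. You still need to work out which operation held the lock at that time.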

Tools to use

In addition to reviewing the log files on the operating system and within the backup software, you can use the VSSTester script from Microsoft to diagnose backups on Exchange 2010, 2013, and 2016. As the name implies, it looks at the functions that use the Volume Shadow Copy Service, which include the Exchange Information Store Writer and the Exchange Replica Writer, both of which are leveraged by many different backup applications. The script creates a disk shadow backup of a targeted database to confirm proper operation, and then performs diagnostics to determine where problems may reside.

You can download the script from https://gallery.technet.microsoft.com/scriptcenter/VSSTesterps1-script-4ed07243 and read more on how to use it, including what logging you need to enable to get the most out of it, from https://blogs.technet.microsoft.com/exchange/2013/04/29/troubleshoot-your-exchange-2010-database-backup-functionality-with-vsstester-script/.

If you are having problems with your Exchange backups, understanding how backups work, knowing what can cause them to fail, and having the right tool to help diagnose the issue are all essential to troubleshooting and resolving it. It will take some homework before the script will be really useful to you, but if you’re running into issues, the above will sort you out in most cases.