Dealing With Exchange Server Failure: The Right Tool For The Job

Tuesday, August 30, 2016 by The Data Experts


Businesses that experience an email system outage can lose days of productivity, and costs mount with every minute of downtime while IT staff scramble to troubleshoot the issue. The Exchange server failure described below illustrates how a dedicated data management and restoration tool could have provided a faster recovery.

Multiple failures

A large manufacturing company unexpectedly lost email data after a series of failures. Its clustered Exchange system supported over 1,000 users, but because log files were stored locally, one of the cluster servers began running out of disk space and suffered performance issues. Shortly thereafter, a second node in the cluster failed and would not mount the Information Store. Microsoft’s support team helped the IT staff take the server offline; after repairs were performed and a backup was taken, the server was brought back online.
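The first failure here began with a volume filling up with log files. As a rough sketch of the kind of check that can flag this early, the Python snippet below reports when free space on a volume drops under a threshold. The function name `low_disk_space`, the 15% threshold, and the path are illustrative assumptions, not details from this incident.

```python
import shutil

def low_disk_space(path, min_free_fraction=0.15):
    """Return True when the volume holding `path` has less free space
    than `min_free_fraction` of its total capacity."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.free / usage.total < min_free_fraction

if __name__ == "__main__":
    # Hypothetical path; point this at the volume that stores the log files.
    log_volume = "."
    if low_disk_space(log_volume):
        print(f"WARNING: {log_volume} is running low on disk space")
```

Run on a schedule, a check like this gives staff time to move or truncate logs before the server starts to degrade.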

The server worked well for about a day, but users then started to notice issues with appointment scheduling and message corruption. As a remedy, another Exchange server was set up and user mailboxes were migrated to it, which seemed to stabilize the clustered servers. A few days later, the clustered server system crashed again. IT set up yet another temporary messaging server so that users could continue to send and receive messages; however, archived message data was not available. With users at least able to send and receive email, work began on getting the archived user email out of the original Information Store, but restoring the backup did not bring back any usable data. At this point, the IT team was struggling and users were frustrated after almost two weeks of email issues.

Backup headaches

Executive management began requiring daily updates from the IT Director, while frustration and tension continued to increase across the entire messaging team. The backup software vendor was called onsite, and its experts began examining backup logs for a solution. They determined that a faulty tape machine within the backup library system had corrupted the backup. Unfortunately, the same fault had also corrupted the other backups kept onsite. This left the unmountable Information Store from the clustered message servers as the only source from which archived user email could be restored. An Exchange repair application (Eseutil) was run on a copy of the 100 GB Information Store, but after 12 hours there was no indication whether the repair would succeed or whether any data would be recoverable.
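Corrupted backups like these often go unnoticed until a restore is attempted. One common safeguard is to compare a checksum of a test restore against a checksum of the file that was backed up. The following is a minimal Python sketch of that idea, not the procedure used in this case; the function names are hypothetical, and it assumes the source file is not changing while the comparison runs (for example, an offline copy).

```python
import hashlib

def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large stores do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read until handle.read() returns an empty bytes object (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def backup_matches_source(source_path, restored_path):
    """Return True when the restored file is byte-for-byte identical
    to the source it was backed up from (both hypothetical paths)."""
    return sha256_of_file(source_path) == sha256_of_file(restored_path)
```

A mismatch after a test restore is a signal to investigate the backup hardware or media before the backup is actually needed.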

Help from experts

The company did not know what to trust: hardware was failing, restoration attempts were going nowhere, and the IT team was exhausted. Finally, it brought in a professional data recovery company to start mailbox extraction. In less than 24 hours, over 1,400 mailboxes were successfully restored and merged into the existing messaging server.

In the end, the IT department rebuilt its Exchange server environment while the users worked from the temporary server, and users were later moved to the permanent system during maintenance.

In this case, the IT team did everything it could to rescue its system: it called in all of its vendors and involved Microsoft’s support services early on. From a business continuity standpoint, the team did everything right despite the challenges and roadblocks of a continually failing system.

In hindsight, a dedicated Exchange management tool could have been used to process the original Information Store and extract mailboxes directly to the temporary email server. The team could then have accessed the required data without changing any of the internal database contents. Eseutil, for example, tends to overwrite critical metadata and message tables if it discovers unreadable corruption.