The risk of NAS servers: high-complexity recovery
Think recovering a “cheap” Network Attached Storage (NAS) server is an easy task? Think again! NAS servers may be low-cost, RAID-based storage servers aimed at small to medium-sized companies, but nowadays they are technically as complex, and as difficult to recover data from, as comparable high-end (and more expensive) products from EMC, Dell, HP, etc.
Most of these “small” storage systems now offer advanced features similar to those of their high-end counterparts, including deduplication, virtualisation support and iSCSI targets. As a result, the data is organised in many different layers, which have to be recovered and restructured sequentially before the “real” data files can be accessed.
A risky shutdown
Recently, one of our German clients experienced a failure in their QNAP NAS. Their RAID6-based system consisted of 24 hard disk drives with a capacity of 6 TB each. During setup, two iSCSI LUNs were created, containing over 40 TB and over 60 TB of data respectively. Each LUN was an iSCSI target for one Windows Server 2012 R2 system and was formatted as an NTFS partition. Additionally, the deduplication feature of Windows Server 2012 R2 was used on both LUNs.
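To put these figures in perspective, the following minimal sketch works out the usable capacity of such an array. It assumes that all 24 drives formed a single RAID6 group with no hot spares and ignores file system overhead and the difference between TB and TiB; none of this is confirmed by the case description. The function name is purely illustrative, and the LUN sizes of 64 TB and 44 TB are the raw sizes reported in the analysis of this case.

```python
def raid6_usable_tb(disk_count: int, disk_size_tb: float) -> float:
    """Usable capacity of one RAID 6 group: two disks' worth of space holds parity."""
    if disk_count < 4:
        raise ValueError("RAID 6 needs at least four disks")
    return (disk_count - 2) * disk_size_tb


# Assumption: one RAID 6 group across all 24 drives, no hot spares.
usable_tb = raid6_usable_tb(24, 6)

# Raw LUN sizes reported for this case.
lun_total_tb = 64 + 44

print(f"Usable capacity: {usable_tb} TB")
print(f"Allocated to LUNs: {lun_total_tb} TB")
print(f"Remaining headroom: {usable_tb - lun_total_tb} TB")
```

Under these assumptions the two LUNs would have occupied most of the array, which gives a sense of how much data was at stake without a backup.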
One small server, one big problem
One day, the business (the IT department of a large real estate group) noticed sluggish behaviour in its NAS, so the system was forcibly shut down. As a result, both LUNs became inaccessible, even though their drive letters were still showing. Even worse for the client: there was no backup available at all!
At this point the company contacted Kroll Ontrack for help. Our analysis showed that the data recovery required six layers of data structure to be processed one after another. Because a QNAP NAS runs on Linux, our specialists first had to reconstruct the RAID6 layer to get to the Linux EXT4 file system. In this EXT4 file system, fragments of the missing LUNs were located and had to be rebuilt to access the 64 TB and 44 TB of raw LUN data. Each fragment was about 1 TB in size, so more than 100 “iSCSI pieces” had to be reviewed, put in order and reassembled to rebuild the two iSCSI LUN files.
These iSCSI fragments were originally managed by the QNAP system and combined on the fly, so that the Windows Server system saw two complete LUNs. After copying the iSCSI files block by block to temporary SSD storage, the data recovery experts managed to combine these iSCSI LUN files into a single NTFS volume. Since deduplication was also enabled on the Windows Server, the engineers then had to work on the sixth and final layer: finding out which data was affected by this feature and creating a usable NTFS volume. This NTFS volume was then copied from the temporary SSD storage onto a newly purchased RAID storage system, which the business could simply attach to its network to access the data again.
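The reassembly step can be illustrated with a small sketch. This is not the proprietary tooling used in this case, and it simplifies heavily: it assumes the fragments have already been extracted as ordinary files and that a numeric suffix in a hypothetical file name such as `lun_a.007` gives their order. In a real recovery, determining that order is the hard part. All names below are invented for illustration.

```python
import re
import shutil
import tempfile
from pathlib import Path


def fragment_index(path: Path) -> int:
    """Read the trailing number from a hypothetical name such as 'lun_a.007'."""
    match = re.search(r"(\d+)$", path.name)
    if match is None:
        raise ValueError(f"no numeric suffix in {path.name}")
    return int(match.group(1))


def reassemble_lun(fragments: list[Path], output: Path,
                   chunk_size: int = 1024 * 1024) -> int:
    """Concatenate fragments in index order into one image; return bytes written."""
    ordered = sorted(fragments, key=fragment_index)
    indices = [fragment_index(p) for p in ordered]
    # Refuse to build an image if a piece is missing or duplicated.
    if indices != list(range(len(indices))):
        raise ValueError(f"fragment sequence has gaps or duplicates: {indices}")
    written = 0
    with output.open("wb") as dst:
        for fragment in ordered:
            with fragment.open("rb") as src:
                # Stream in chunks so large fragments never sit in memory at once.
                shutil.copyfileobj(src, dst, chunk_size)
            written += fragment.stat().st_size
    return written


def demo() -> bytes:
    """Rebuild a tiny three-piece 'LUN' whose pieces are created out of order."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for index, payload in [(2, b"CCCC"), (0, b"AAAA"), (1, b"BBBB")]:
            (root / f"lun_a.{index:03d}").write_bytes(payload)
        image = root / "lun_a.img"
        reassemble_lun(list(root.glob("lun_a.0*")), image)
        return image.read_bytes()
```

The gap check matters more than the copy loop: an image built from an incomplete or wrongly ordered set of pieces would still be the expected size, but the NTFS structures inside it would not line up.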
Even with the highly specialised Kroll Ontrack proprietary recovery tools, getting this much data recovered and copied onto the new backup storage took several weeks. The final analysis of this case by Kroll Ontrack experts clearly showed that the NAS server was not set up correctly after it was purchased and this might have caused the failure.
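A rough calculation shows why the copying alone accounts for a large part of that time. The sketch below estimates how long one full pass over the 108 TB of raw LUN data (64 TB plus 44 TB) would take at a given sustained throughput. The throughput values are illustrative assumptions, not measurements from this case, and the function name is invented for the example.

```python
def copy_days(size_tb: float, throughput_mb_s: float) -> float:
    """Days needed to copy size_tb terabytes at a sustained rate in MB/s (decimal units)."""
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = size_tb * 1e12 / (throughput_mb_s * 1e6)
    return seconds / 86_400


# Assumed sustained rates; real rates depend on the disks, controllers and network.
for rate_mb_s in (100, 200, 500):
    days = copy_days(64 + 44, rate_mb_s)
    print(f"{rate_mb_s} MB/s: {days:.2f} days per full pass")
```

In this case the data was copied at least twice, first to the temporary SSD storage and then to the new RAID system, and every pass comes on top of the analysis and reconstruction work itself.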
Learnings for the future
As described, recovering the data can take a very long time because of the huge amount stored on the server. This is why it is advisable (and critical) to set up a NAS server correctly before any data is stored on it. It is also wise to plan ahead for potential data loss situations: you don’t want to start searching for help in the middle of a crisis!
This story also highlights that even though a NAS system has many advanced features, you should choose carefully which ones to use, since they are not as failure-proof as those of high-end server products. In the case discussed above, for example, using the deduplication feature made the data recovery more complex than necessary. Hard disks for these systems have become cheaper in the last few years, so it may have been better to use additional disks instead of adding complexity to the data structure. That added complexity carries the risk that even data recovery experts might not be able to help within a manageable timeframe and at reasonable cost.
Image credit: Paul-Georg Meister / pixelio.de