Linux Data Recovery and Linux File Systems - An Overview

Thursday, December 31, 2009 by The Data Experts

Earlier this year we entered a new era of hard disk storage when manufacturers rolled out a 1TB single hard disk drive.

One hard disk drive manufacturer is planning to deliver 4TB hard disk drives in 2011. While we all expect our personal libraries of music, videos, and documents to increase over the next few years, the entertainment industry is expecting to see a 1000% increase in total digital storage. According to analysis by Coughlin Associates, more than six exabytes of digital storage will be used for archiving, content conversion, and content preservation by 2012.  An exabyte is a billion gigabytes in decimal terms. Another way to visualize this is in terms of DVDs; one exabyte is equivalent to 250 million DVDs.  (That’s the 4GB size.) Six exabytes would equal 1.5 billion DVDs.
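The DVD comparison above is plain decimal arithmetic, and a short sketch can check it. The 4 GB disc capacity is the figure used in this article; the function name is only illustrative.

```python
# Decimal (SI) units: 1 EB = 10**18 bytes, 1 GB = 10**9 bytes.
BYTES_PER_GB = 10**9
BYTES_PER_EB = 10**18

def dvds_needed(exabytes, dvd_capacity_gb=4):
    """Return how many DVDs of the given capacity hold `exabytes` of data."""
    total_bytes = exabytes * BYTES_PER_EB
    return total_bytes // (dvd_capacity_gb * BYTES_PER_GB)

print(dvds_needed(1))  # 250000000  (250 million)
print(dvds_needed(6))  # 1500000000 (1.5 billion)
```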

Transmitted computer data isn’t far behind.  Networking giant Cisco Systems, Inc., published a forecast of the amount of data that will flow through the Internet in the very near future. The report projects that total Internet traffic will nearly double every two years, and that consumer IP traffic will surpass business traffic and reach 18 exabytes per month by 2011.  Global Internet video (excluding peer-to-peer usage) was estimated at approximately 120 petabytes per month in 2006.

As Internet IP traffic grows, storage needs will also grow.  We’ll move from the terabyte era to the exabyte era in a matter of years.

Your users or clients are dealing with an explosion of data growth. The challenge to IT professionals right now is managing all of this data and improving file access performance. Why do we say “improving”? Maintaining the status quo is not good enough. Large data storage devices are storing millions or even billions of files.  With data storage systems growing exponentially, the tools we use to access them need to progress as well. Beyond simply storing large data sets, what file system options are available?  And when an unforeseen data loss occurs with one of these behemoths, who can provide data recovery?

Over the years, Linux has become the operating system of choice for many IT professionals. In the Linux environment, there are many different file systems available. With all the choices, selecting the right file system for users or clients can be challenging. Read on to explore some of the things to consider. (Read here first to learn more about Linux operating systems.)

Linux File Systems

In the past, there wasn’t a lot of choice when it came to file systems.  The operating system only offered one or two choices for a file system and the file system was usually so transparent that it was taken for granted.

Over the years, programmers have contributed to the development of new and existing file systems for Linux.  Linux operating systems offer a variety of choices for the organization and management of data files on hard disk drives.  File systems are interchangeable under Linux by design, which is part of the portability of the operating system.  The layer that makes this possible is the Virtual File System (VFS) inside the Linux kernel (the fundamental core of the operating system), which presents a common interface to every file system type.  There is a great deal of discussion in Linux communities regarding the positive and negative aspects of each file system type.  The following table lists, in no particular order, the basic choices of Linux file systems and their commonly used aliases.  At the end of this article there are links to information that describes these file systems.

| Linux File System | Alias |
| --- | --- |
| Second Extended File System | EXT2 |
| Third Extended File System | EXT3 |
| File system for Silicon Graphics’ IRIX operating system | XFS |
| Journaling File System for IBM’s AIX operating system | JFS |
| Journaling File System, 64-bit file system for IBM’s AIX operating system | JFS2 |
| Journaling file system from NameSys | ReiserFS |
| FAT12, FAT16, FAT32 owned by Microsoft | FAT |
| Unix File System; similar to the Berkeley Fast File System (BSD FFS) | UFS |
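To see which of these file systems a particular Linux kernel can currently mount, one option is to read /proc/filesystems, where the kernel lists the file system types registered with it. The sketch below is a minimal illustration; the function names are hypothetical, and the list only reflects drivers that are compiled in or already loaded as modules.

```python
def parse_proc_filesystems(text):
    """Parse the contents of /proc/filesystems into (name, needs_device) pairs."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each line is "<flag>\t<name>". The flag is "nodev" for virtual
        # file systems (such as proc or sysfs) and empty for disk-based ones.
        flag, _, name = line.partition("\t")
        entries.append((name.strip(), flag.strip() != "nodev"))
    return entries

def disk_filesystems(path="/proc/filesystems"):
    """Return the names of disk-based file systems the running kernel lists."""
    with open(path) as f:
        parsed = parse_proc_filesystems(f.read())
    return [name for name, needs_device in parsed if needs_device]
```

On a Linux machine, calling `disk_filesystems()` returns a list of names such as ext3 or xfs, depending on how that kernel was built and which modules are loaded.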

The following is a list of special Linux file systems that require additional configuration or that are owned by specific companies:

| Linux File System | Alias |
| --- | --- |
| New Technology File System owned by Microsoft | NTFS |
| Veritas File System owned by Veritas/Symantec | VxFS |
| Oracle Cluster File System owned by Oracle | OCFS2 |
| Global File System; used for Linux cluster computing | GFS |
| General Parallel File System developed by IBM for clustered computing | GPFS |
| Novell Storage Services owned by Novell and ported over to SUSE Linux | NSS |
| Zettabyte File System owned by Sun Microsystems (Solaris) | ZFS |

There are a lot of choices of Linux file systems for workstations and servers. Where does one start?  Here are some things to consider.

  1. Determine your file system needs by reviewing your users’ or customers’ environment.  Here are a few business IT requirements to consider:
  • File system recoverability
  • Security requirements
  • Database file support
  • File server
  2. Will the data be stored as part of a high-performance computing operation?  Examples of high-performance computing servers would be weather modeling systems, molecular modeling databases, or human genome databases.  These types of high-end systems require a lot of processing power and memory, as well as a database and file system that can store massive amounts of raw information.
  3. Finally, determine the file system for user workstations so that business productivity is maximized.  Because of the portability of Linux, a variety of file systems can be used based on business requirements. For example, a company’s video production unit may require vast amounts of storage space for editing; however, the business administration side of the company is unlikely to require that level of performance from its file system.

File System Testing

The best way to answer the above questions is to perform research and testing. The goal should be to determine the performance and reliability of each file system under consideration.  Use applications that test and benchmark the file systems being considered (here are some utilities to do that). Then begin using the system normally, logging timing and performance.  One writer for the Linux Gazette has benchmarked the most popular file systems; read his findings here.
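Dedicated benchmark utilities are the better tool, but a rough first comparison can be scripted. The sketch below is an illustration only, not a substitute for those utilities: it times the creation, reading, and deletion of many small files in a directory on the file system under test. The function name, file count, and file size are assumptions chosen for the example.

```python
import os
import tempfile
import time

def time_small_files(directory, count=1000, size=4096):
    """Time creating, reading back, and deleting `count` small files.

    `directory` should live on the file system under test.
    Returns a dict of elapsed seconds for each phase.
    """
    payload = os.urandom(size)
    workdir = tempfile.mkdtemp(dir=directory)
    paths = [os.path.join(workdir, f"f{i:06d}") for i in range(count)]
    timings = {}

    start = time.perf_counter()
    for p in paths:
        with open(p, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # push data to the device, not just the cache
    timings["create"] = time.perf_counter() - start

    start = time.perf_counter()
    for p in paths:
        with open(p, "rb") as f:
            f.read()
    timings["read"] = time.perf_counter() - start

    start = time.perf_counter()
    for p in paths:
        os.remove(p)
    os.rmdir(workdir)
    timings["delete"] = time.perf_counter() - start
    return timings
```

Run the same call against a directory on each candidate file system and compare the numbers. Note that the read phase is likely to be served from the operating system's cache right after writing, so it says more about cached access than about the disk itself.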

Other recommended tests involve simulating high-volume file environments and then reproducing power failures. How long does it take for the volume to become ready, or “mount”?  How long does it take File System Check (FSCK) to work through the file system when there are errors?  To test file data integrity, run an MD5 hash generator on a group of files, perform the above tests, and then hash the files again to make sure they remain the same. An MD5 hash generator applies a mathematical algorithm that creates a compact signature, or “fingerprint,” of a file or set of files; comparing the fingerprints taken before and after a test shows whether any files suffered internal data corruption.
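The integrity check described above can be sketched with Python's standard hashlib module. This is a minimal example under stated assumptions: the function names are hypothetical, and the temporary directory and deliberate overwrite at the end only stand in for a real test volume and a real failure.

```python
import hashlib
import os
import tempfile

def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = md5_of_file(full)
    return manifest

def changed_files(before, after):
    """Return sorted paths that are missing, new, or whose digest differs."""
    return sorted(p for p in before.keys() | after.keys()
                  if before.get(p) != after.get(p))

# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as root:
    sample = os.path.join(root, "sample.bin")
    with open(sample, "wb") as f:
        f.write(b"hello")
    before = build_manifest(root)
    # A real test would simulate the power failure and remount here.
    with open(sample, "wb") as f:
        f.write(b"hellp")  # stand-in for silent corruption
    after = build_manifest(root)
    print(changed_files(before, after))  # ['sample.bin']
```

In practice the "before" manifest would be saved somewhere off the volume under test, so that it survives whatever the test does to the file system.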

Testing the storage and performance of large files is important because nearly all Linux file systems fragment the files that are stored.  Getting benchmarks for large file storage helps determine which file system best handles user or client needs.
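For large files, a similarly rough measurement is sequential throughput. The sketch below assumes a directory on the file system under test and writes, then reads, one large file in fixed-size chunks; the function name and default sizes are illustrative. It does not measure fragmentation directly.

```python
import os
import tempfile
import time

def large_file_throughput(directory, total_mb=256, chunk_mb=4):
    """Write then read one large file sequentially; return (write, read) MB/s."""
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    chunks = total_mb // chunk_mb
    fd, path = tempfile.mkstemp(dir=directory)

    start = time.perf_counter()
    with os.fdopen(fd, "wb") as f:
        for _ in range(chunks):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())  # include the time to reach the device
    write_secs = time.perf_counter() - start

    start = time.perf_counter()
    with open(path, "rb") as f:
        while f.read(len(chunk)):
            pass
    read_secs = time.perf_counter() - start

    os.remove(path)
    written_mb = chunks * chunk_mb
    return written_mb / write_secs, written_mb / read_secs
```

For meaningful results the test file should be much larger than the machine's memory, otherwise the read figure mostly reflects the cache rather than the disk.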

The above suggestions for testing simulate extreme cases, and it may be that users or systems will never reach the limits of the testing.  However, to make the best choice among Linux file systems, each one must be tested to learn what it can and cannot handle.

The Leader in Linux Data Recovery

Perhaps users or clients do not realize they are using a version of Linux. For example, many Digital Video Recorders (DVRs) use a Linux file system variant.  A small Network Attached Storage (NAS) device for the home or small office network may also have a version of Linux on it.  Future mobile phones may run a Linux operating system simply because of its ease of design and flexibility.  To sum up, software developers are using elements of different Linux file systems in new products.

The proliferation of Linux file systems is due to the open-source nature and general public licensing of these designs.  No one person or company owns them, so their growth and improvement are limitless.

Despite improvements, however, there will always be unforeseen data loss occurrences where either the hard disk drive will malfunction or crash, or errant data corruption will occur and the file system will no longer be mountable.  This is where a professional data recovery service is needed.

Ontrack has been successfully recovering data from Unix and Linux file systems for many years and our unique approach sets us apart from other data recovery companies.

References:

  • Dr. Mark H. Kryder (2006), “Future Storage Technologies: A Look Beyond the Horizon” (PDF), Storage Networking World conference, Seagate Technology
  • Dominic Kay, “File System Performance: The Solaris OS, UFS, Linux Ext3, and ReiserFS” (PDF), Sun Microsystems, Inc.
  • Novell Developer Wiki, “Linux File System Overview” (html link), Novell, Inc.
  • Jim Mostek, William Earl, Dan Koren, Russell Cattelan, Kenneth Preslan, Matthew O’Keefe, “Porting the SGI XFS File System to Linux” (PDF), Silicon Graphics, Inc. and Sistina Software, Inc.
  • Sheryl Calish (2004) “Guide to Linux Filesystem Mastery” (html link), Oracle Corporation
  • Mingming Cao, Suparna Bhattacharya, Ted Tso (2007), “EXT4: The Next Generation of EXT2/EXT3 Filesystem” (PDF), USENIX 2007, International Business Machines
  • Val Henson, Amit Gud, Arjan van de Ven, Zach Brown, “Chunkfs: Using divide-and-conquer to improve file system reliability and repair” (PDF), USENIX 2006, Intel Corporation, Oracle Corporation, Kansas State University
  • Steve Best (2000), “JFS Overview”, International Business Machines
  • IBM AIX Documentation, “JFS and JFS2” (html link), International Business Machines
  • Eugenia Loli-Queru (2001), “Interview with the People Behind JFS, ReiserFS, and XFS” (html link), OS News
  • Brian Wong (2004), “Design, Features, and Applicability of Solaris File Systems” (PDF), Sun BluePrints, Sun Microsystems, Inc.
  • Juan Piernas, Sorin Faibish (2007), “DualFS: A New Journaling File System for Linux” (PDF), Linux Storage and Filesystem Workshop, Pacific Northwest National Laboratory, EMC2 Corporation
  • Eric Kustarz, “ZFS: The Last Word in File Systems” (PDF), Sun Microsystems, Inc.