The idea of IT data is very abstract and usually described in an unattractive and boring way. Yet most of the data stored on hard drives is personal, important to us and quite often unique. Regardless of whether it’s holiday photos, favourite movies or business data, the value of your disk drive is equal to the value of the data you store on it.
Almost 35 years ago, a tiny company called Ontrack entered the market with a programme that allowed you to install bigger hard drives, as big as 40 MB – those were the days! Why, then, have we switched industries? Because we lost our data – but that is another story.
Where are my files? – A detective’s puzzle
Whether you store pictures, videos, ebooks or homework, or you are the administrator of a server that hosts websites and huge databases, the files themselves are not physically located on your media. What is stored is the information about the files, also known as the file directory. Once data is lost, any attempt to find where a particular file is stored on the disk is likely to fail. Information about the file is scattered throughout the disk (sometimes even over several different drives). In addition, each of these pieces of information can be recorded more than once.
“Every one of us is aware, to a greater or lesser extent, that digital technology records data in a binary system and that the data is nothing but a string of ones and zeros. That’s not quite true. What for us is a logical zero or one is in reality an electromagnetically structured molecule on the surface of the disk. Magnetic heads record the high or low values of the signal as they hover over the platter. The information is then processed by an electronic drive system.”
Encoding – “Inception” on a disk
Before the information about our files is stored in the form of a magnetic trace on the disk, it must be encoded. This is encoding for reliable storage rather than encryption for secrecy, and a modern hard drive uses at least five levels of it. It reminds me of the movie Inception, starring Leonardo DiCaprio, in which the character enters deeper and deeper levels of surreal reality.
In our HDD case, it is absolutely necessary. Why must the information be encoded so many times before it gets written to a disk? Encoding ensures that the code fragments read from the disk are unique and unambiguous. In addition, certain types of encoding are responsible for correcting as many as possible of the read errors that may appear (and always do) during the recording and subsequent reading of the data, as well as for minimising the search time.
Interestingly, one of the stages of encoding is the so-called scrambler, which is used to randomise the code sequence. In other words, the bit strings are mixed at random (actually pseudo-randomly, because the randomisation follows a fixed algorithm). Why? It turns out that the original data contains common and repetitive structures (patterns) which, if written directly to disk, would create multiple repetitive magnetic strings. These patterns would deceive the reading head searching for a specific string.
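To make the scrambler idea more concrete, here is a minimal sketch in Python. It is an illustration only, not the scheme any particular drive uses: the function names, the 16-bit shift register, its tap positions and the seed 0xACE1 are all assumptions chosen for the example. A pseudo-random bit stream is generated by a fixed algorithm and XOR-ed with the data, so a repetitive input no longer produces a repetitive output, and applying the same operation again restores the original.

```python
def lfsr_keystream(seed, nbits):
    """Return nbits pseudo-random bits from a 16-bit shift register.

    The register layout and taps are illustrative assumptions,
    not those of a real hard drive.
    """
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    out = []
    for _ in range(nbits):
        out.append(state & 1)  # emit the lowest bit
        # Feedback bit for the polynomial x^16 + x^14 + x^13 + x^11 + 1
        feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (feedback << 15)
    return out


def scramble(bits, seed=0xACE1):
    """XOR the data bits with the keystream; running it twice undoes it."""
    keystream = lfsr_keystream(seed, len(bits))
    return [b ^ k for b, k in zip(bits, keystream)]


if __name__ == "__main__":
    data = [0] * 32                    # a highly repetitive pattern
    scrambled = scramble(data)
    print("scrambled:", "".join(map(str, scrambled)))
    print("restored equals original:", scramble(scrambled) == data)
```

Because the mixing follows a known algorithm and seed, the drive can reverse it exactly when reading, which is why the text calls it pseudo-random rather than random.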
However, this is not the end of the problem. Even randomisation does not rule out some troublesome magnetic patterns. A good example of such a pattern is a series of zeroes. Those zeroes would create a no-signal zone which could be read incorrectly. Therefore, it is necessary to introduce the next code, run-length limited (RLL), which guarantees that the record will not have more consecutive zeroes than it should (in 16-bit code – 10 to 15 zeroes).
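The run-length idea can be sketched with MFM, a much older and simpler run-length limited code than the ones modern drives use. It is swapped in here purely for illustration, and the function names are hypothetical. Each data bit is written as two bits on the medium (a clock bit followed by the data bit), and the clock bit is set to 1 only between two consecutive 0 data bits. As a result, the written stream never contains more than three zeroes in a row, however many zeroes the data has.

```python
def mfm_encode(data_bits, prev=0):
    """Encode data bits with MFM (illustrative, not a modern drive code).

    prev is the data bit assumed to precede the input (an assumption).
    """
    encoded = []
    for bit in data_bits:
        # Clock bit is 1 only when both the previous and current data bits are 0
        clock = 1 if (prev == 0 and bit == 0) else 0
        encoded.extend([clock, bit])
        prev = bit
    return encoded


def mfm_decode(encoded_bits):
    """Recover the data by keeping every second bit (the data positions)."""
    return list(encoded_bits[1::2])


def longest_zero_run(bits):
    """Length of the longest run of consecutive zeroes."""
    longest = current = 0
    for b in bits:
        current = current + 1 if b == 0 else 0
        longest = max(longest, current)
    return longest


if __name__ == "__main__":
    data = [0] * 16                      # sixteen zeroes in a row
    written = mfm_encode(data)
    print("longest zero run in data:   ", longest_zero_run(data))
    print("longest zero run on 'disk': ", longest_zero_run(written))
    print("decoded equals original:", mfm_decode(written) == data)
```

The price is that twice as many bits are written as there are data bits. More efficient RLL codes reduce that overhead while still capping the zero runs.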
Protection against errors
Encoding is also necessary to ensure maximum protection against errors. Although magnetic recording is susceptible to numerous errors and damage, the average probability of an error that prevents a correct read is usually less than 10^-13, i.e. it’s very rare. In order to achieve such results, another encoding is used. Error Correction Code (ECC) calculates parity bits that can be used during decoding to detect and fix errors.
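How extra bits can both detect and fix an error is easiest to see in a tiny example. The sketch below uses the classic Hamming(7,4) code, chosen only because it is small enough to read. Drives rely on far stronger codes (such as Reed-Solomon or LDPC), and the function names here are hypothetical. Four data bits are stored together with three parity bits. On reading, the parity checks are recomputed, and their pattern points directly at the position of a single flipped bit.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword: p1 p2 d1 p3 d2 d3 d4."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Return (data_bits, error_position); position is 0 if no error was seen.

    Corrects at most one flipped bit per codeword.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3   # 1-based index of the bad bit
    if position:
        c[position - 1] ^= 1          # flip it back
    return [c[2], c[4], c[5], c[6]], position


if __name__ == "__main__":
    data = [1, 0, 1, 1]
    stored = hamming74_encode(data)
    damaged = list(stored)
    damaged[4] ^= 1                   # simulate a read error in position 5
    recovered, position = hamming74_decode(damaged)
    print("error found at position:", position)
    print("recovered equals original:", recovered == data)
```

This small code fails if two bits in the same codeword are damaged, which is one reason real drives use longer and more powerful codes.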
As you can see, the relation between our file and the information saved to the hard disk is very complex. Even if we can recover the disk image after the failure of the original media, we will still need the right tools and knowledge to fully decode the data. What’s more, if previous data recovery attempts have been made on the failed disk, it may not be possible to recover the original information. Such data resembles a Jacquard card that was turned into waste paper.
In the next post, I will describe what happens to the encoded information in the file when it is saved on the disk.