More data is generated and transferred globally than ever before. IDC analysts expect the global datasphere to grow to 163 zettabytes (ZB) by 2025. That is roughly ten times the 16.1 ZB of data generated in 2016, an increase of about 900%. So how are companies storing all of this data? And which storage system is best for storing Big Data: File, Block or Object?
Why has there been an increase in data?
More sources and devices are generating data than before. Embedded systems and a huge range of devices gather data and transfer it to Big Data applications and solutions for real-time analysis. The ongoing trend of using mobile devices, social media platforms, and online shopping produces large amounts of data every day.
Companies are also transforming themselves to deliver data to their customers, meeting a demand for news and real-time data on a scale never seen before.
According to a Gartner forecast, by the year 2020 more than half of major business processes and systems will incorporate some element of the IoT (Internet of Things). With that, the amount of data generated, transferred and analysed by Big Data applications (stored either on premises or off premises) will grow enormously.
How to handle the increase?
Due to the increase in Big Data, the demand for storage solutions that can handle and archive more digital content than ever before is increasing drastically.
However, bigger storage devices alone are not enough. Growing data volumes also call for a storage system that can hold the results of Big Data analysis. Enterprises will handle much of this storage demand both internally and in the cloud, using services like Amazon's S3 or Microsoft Azure.
Older storage methods such as File storage and Block storage struggle to provide what Big Data needs at this scale. The newer solution is Object storage (also called object-based storage).
The differences between File, Block and Object storage
File and Block storage are methods to store data on Network Attached Storage (NAS) and Storage Area Network (SAN) systems.
A NAS system exposes its storage as a network file system. When a device attaches to the NAS, it sees a mountable file system, and users with the proper access rights can reach their files. To let several users work with the same files, the NAS has to manage user privileges, file locking, and other security measures. Access to the NAS is handled via the NFS and SMB/CIFS protocols. As with any server or storage solution, a file system is responsible for placing the files on the NAS. This works very well for hundreds of thousands or even millions of files, but not for billions.
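File-level access as described above can be illustrated with a minimal sketch. It assumes a temporary local directory as a stand-in for a mounted NAS share; the path `projects/2016/report.txt` and the function name `demo_file_storage` are made up for illustration, and no real NFS or SMB/CIFS traffic is involved.

```python
import tempfile
from pathlib import Path


def demo_file_storage() -> str:
    """Write and read a file through a hierarchical path, as a client would on a mounted share."""
    # Assumption: a temporary directory plays the role of the NAS mount point.
    with tempfile.TemporaryDirectory() as mount_point:
        report = Path(mount_point) / "projects" / "2016" / "report.txt"
        report.parent.mkdir(parents=True)  # directories form the hierarchy
        report.write_text("quarterly figures")
        # The file system resolves the path component by component to find the file.
        return report.read_text()


print(demo_file_storage())
```

The client only ever names a path. Where the bytes end up on disk is the file system's business, which is also why very large file counts put pressure on it.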
Block storage works in a similar way, but unlike File storage, where data is managed at the file level, data is stored in data blocks. Several blocks (for example in a SAN system) together make up a file. Each block has an address, and the SAN application retrieves the block by sending a SCSI request to that address. The storage application then decides where the data blocks are stored inside the system and on which specific disk or storage medium. It also decides how the blocks are combined and how they can be accessed.
Blocks in a SAN do not carry metadata that relates them to the storage system or application. In other words, blocks are data segments without a description, an association or an owner. Everything is handled and controlled by the SAN software. Because of this, SAN and Block storage are often used for performance-hungry applications such as databases or transactional sites.
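The block model can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how a real SAN is implemented: `ToyBlockDevice`, `write_file`, `read_file` and the 4-byte block size are all hypothetical, and plain method calls stand in for SCSI requests.

```python
from typing import List

BLOCK_SIZE = 4  # bytes per block; tiny on purpose so the example is easy to follow


class ToyBlockDevice:
    """Hypothetical block store: fixed-size blocks addressed by number, no metadata."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks: List[bytes] = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def write_block(self, address: int, data: bytes) -> None:
        if len(data) > BLOCK_SIZE:
            raise ValueError("data does not fit into one block")
        self.blocks[address] = data.ljust(BLOCK_SIZE, b"\x00")  # pad short blocks

    def read_block(self, address: int) -> bytes:
        return self.blocks[address]


def write_file(device: ToyBlockDevice, addresses: List[int], payload: bytes) -> None:
    """Split the payload into blocks and write them to the chosen addresses."""
    chunks = [payload[i:i + BLOCK_SIZE] for i in range(0, len(payload), BLOCK_SIZE)]
    if len(chunks) > len(addresses):
        raise ValueError("not enough block addresses for this payload")
    for address, chunk in zip(addresses, chunks):
        device.write_block(address, chunk)


def read_file(device: ToyBlockDevice, addresses: List[int], length: int) -> bytes:
    """Reassemble a file by reading its blocks in order and trimming the padding."""
    data = b"".join(device.read_block(address) for address in addresses)
    return data[:length]


device = ToyBlockDevice(num_blocks=8)
layout = [5, 2, 7]  # the "storage application" decides where the blocks go
write_file(device, layout, b"hello block")
print(read_file(device, layout, len(b"hello block")))
```

The device itself knows nothing about files. Only the layer above it, which holds the `layout` list, knows which blocks belong together and in what order.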
If File and Block storage have worked well for years, why is there a need for a new solution?
The data being stored has changed. Much of what is produced now is unstructured data: content or material that will never be changed again. This is where Object storage comes into play.
Object storage bundles the data together with metadata tags and a unique identifier, and applications identify the object via this ID. The metadata is customisable, so much more identifying information can be attached to each piece of data. Every object is stored in a flat address space, which makes it much easier to locate and retrieve.
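The bundle of data, metadata and unique ID can be sketched as follows. This is a minimal in-memory illustration, not the API of any real object storage product: `ToyObjectStore`, its methods and the metadata keys are hypothetical, and a Python dictionary stands in for the flat address space.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StoredObject:
    data: bytes
    metadata: Dict[str, str]


class ToyObjectStore:
    """Hypothetical object store: one flat namespace keyed by a unique ID."""

    def __init__(self) -> None:
        self._objects: Dict[str, StoredObject] = {}  # flat address space, no directories

    def put(self, data: bytes, **metadata: str) -> str:
        object_id = uuid.uuid4().hex  # unique identifier handed back to the application
        self._objects[object_id] = StoredObject(data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> StoredObject:
        return self._objects[object_id]

    def find(self, **criteria: str) -> List[str]:
        """Return the IDs of all objects whose metadata matches every criterion."""
        return [
            object_id
            for object_id, obj in self._objects.items()
            if all(obj.metadata.get(key) == value for key, value in criteria.items())
        ]


store = ToyObjectStore()
video_id = store.put(b"...video bytes...", content_type="video/mp4", owner="marketing")
print(store.get(video_id).metadata)
print(store.find(owner="marketing") == [video_id])
```

The application keeps only the ID. Retrieval needs no path walk, and the custom metadata allows objects to be found by what they are, not by where they were filed.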
The objects inside an object storage system are distributed across the available storage disks. In its pure form, an object holds exactly one version of a file. If a user makes a change, the new version of the same file is stored as a new object. This makes object storage a very good fit for backup and archive solutions, and for content that is written once and read many times, as on online video streaming sites.
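The "a change creates a new object" behaviour can be sketched like this. It is a toy model under stated assumptions: `ToyVersionedStore`, the `obj-N` ID scheme and the name-to-IDs index are hypothetical and only illustrate the idea that existing objects are never overwritten.

```python
import itertools
from typing import Dict, List


class ToyVersionedStore:
    """Hypothetical store in which every change is saved as a new, separate object."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}        # object ID -> content, never overwritten
        self._versions: Dict[str, List[str]] = {}   # logical name -> object IDs, oldest first
        self._counter = itertools.count(1)

    def put(self, name: str, data: bytes) -> str:
        object_id = f"obj-{next(self._counter)}"    # illustrative ID scheme
        self._objects[object_id] = bytes(data)
        self._versions.setdefault(name, []).append(object_id)
        return object_id

    def get(self, object_id: str) -> bytes:
        return self._objects[object_id]

    def latest(self, name: str) -> bytes:
        return self._objects[self._versions[name][-1]]

    def history(self, name: str) -> List[str]:
        return list(self._versions[name])


versioned = ToyVersionedStore()
first = versioned.put("trailer.mp4", b"cut 1")
second = versioned.put("trailer.mp4", b"cut 2")  # the "edit" becomes a new object

print(versioned.get(first))            # the original object is untouched
print(versioned.latest("trailer.mp4"))
print(versioned.history("trailer.mp4"))
```

Because older objects stay intact, earlier states remain retrievable by ID, which is the property that makes this model attractive for backup and archiving.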
The main difference between Object storage and other systems is that Object storage is not constrained by a file system hierarchy, so there is effectively no limit on how much data can be stored. Additionally, all objects are managed via the application itself. This means that no real file system is needed, as that layer becomes obsolete. When an application sends a request to the storage solution asking where to store an object, the object is given an address inside the huge storage space and saved there by the application itself.
Because data management is so much simpler, Object storage solutions can be scaled up much more easily than systems based on File storage or Block storage. This is a huge benefit in times of exponential data growth.
Object storage is very well suited to huge amounts of data and is therefore regularly used by big cloud service providers like Amazon, Google, and others. But what about data protection and data recovery? These questions will be answered in the second part of this article…
Picture copyright: Gabi Schoenemann / pixelio.de