How to Migrate a Petabyte of Data in the Cloud

Thursday, January 26, 2017 by Michael Nuncic

The cloud seems to be on everyone's mind these days, but what does this "cloudy" concept really mean? The term comes from the IT world: in network diagrams, parts of the infrastructure that are not specified in more detail are drawn as a stylized cloud. A lot of the data we use today is stored in the cloud, where the physical location of the servers and data centers is known only to the service provider. The storage software decides on its own which hardware it uses and where it places the data. Frequently used data is typically stored on fast SSDs, whereas rarely used data or backups, for example, end up on slower, more cost-effective HDDs. For some purposes it also makes sense to keep the metadata of a file on an SSD while the associated high-resolution graphics or videos live on a slower storage medium. As a result, it may not be easy to pinpoint exactly where a file resides; you have to poke around in the "mist" of the cloud to find it.
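As a rough illustration, a placement policy along those lines might look like the sketch below; the thresholds, tier names, and function are invented for this example and do not describe any particular provider's rules.

```python
# Hypothetical tiering policy: small, latency-sensitive metadata and hot data
# go to SSD-backed storage, while bulky or rarely accessed payloads go to
# cheaper HDD-backed tiers. All names and thresholds are illustrative.

def choose_tier(reads_per_day: float, is_metadata: bool) -> str:
    if is_metadata:
        return "ssd"           # metadata stays fast and cheap to serve
    if reads_per_day >= 10:
        return "ssd"           # hot data benefits from fast media
    if reads_per_day >= 1:
        return "hdd"           # warm data on cost-effective disks
    return "hdd-archive"       # cold data and backups on the cheapest tier

print(choose_tier(reads_per_day=50, is_metadata=True))     # ssd
print(choose_tier(reads_per_day=0.1, is_metadata=False))   # hdd-archive
```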

Benefits of Cloud Storage

There are plenty of good reasons to choose cloud storage. Those who rely on the cloud do not have to maintain their own storage systems, media, or administrators. Another advantage is the ability to obtain additional storage space quickly and easily. Security is a further benefit: up-to-date backups ensure fast data recovery in the event of a problem. Exactly which of these services the provider covers should be clarified in any case before a contract is signed.

An important point, apart from the trustworthiness of the provider, whose server farms should be located in a country with a sound legal framework, is data traffic. Normally, the data travels over the Internet, which is why a stable and fast connection is essential. Security deserves emphasis here as well: end-to-end encryption for uploads and downloads should be just as much a given as strong encryption of the data stored on the provider's servers.
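For a concrete picture of what client-side encryption before upload can look like, here is a minimal sketch using Python's cryptography package; the file names and the simplistic key handling are placeholders for the example, not a recommended production setup.

```python
# Minimal sketch: encrypt a file locally before it ever leaves the premises,
# so the provider only stores ciphertext. File names and key handling are
# simplified placeholders for illustration.
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # in practice, store this key securely on-premises
cipher = Fernet(key)

with open("report.pdf", "rb") as f:
    ciphertext = cipher.encrypt(f.read())

with open("report.pdf.enc", "wb") as f:
    f.write(ciphertext)              # only the encrypted copy is uploaded
```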

How do you get data from your data center to its future location in the cloud?

The first option, already mentioned, is the Internet. For manageable amounts of data this is viable, but it quickly reaches its limits. A terabyte of data transmitted over a normal T1 connection, at almost 1.5 Mbps, takes a good two and a half months to arrive at its destination. No one can be satisfied with this, so the cloud providers offer different solutions: special tools and methods, such as data compression, reduce the data volume and speed up the transfer.
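The arithmetic behind that estimate is a simple back-of-the-envelope calculation. The snippet below uses the raw T1 line rate; real-world protocol overhead and shared links push the result toward the longer figure quoted above.

```python
# Back-of-the-envelope transfer time for 1 TB over a T1 line (1.544 Mbps).
# Uses the raw line rate only; protocol overhead and link sharing make the
# real-world transfer noticeably longer.
data_bits = 1e12 * 8                 # 1 TB expressed in bits
t1_bps = 1.544e6                     # T1 line rate in bits per second

seconds = data_bits / t1_bps
days = seconds / 86_400
print(f"{days:.0f} days")            # roughly 60 days at the raw line rate
```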

Data packets do not follow a fixed route through the Internet; they tend to take whichever links are less congested, bouncing from one location to another before arriving at their final destination. A more direct network connection shortens these paths significantly. Providers also offer bundled Internet access or dedicated direct connections.

But what should you do if even a dedicated line is not enough? If petabytes of data need to be transferred, even the fastest connection would take a seemingly endless amount of time and money, so intelligent solutions are needed for extremely large amounts of data.

Amazon, for example, offers its customers a data pick-up service called Snowmobile. A high-performance storage server, installed in a container and transported by semi-truck, is brought directly to the customer's location. Its 100 PB of storage is connected to the corporate network via a high-speed line. In the best case, the Snowmobile's entire capacity can be filled in about ten days. The fully loaded storage is then collected and driven to the nearest Amazon data center, where the data is uploaded to the cloud.
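To put that loading window in perspective, the following back-of-the-envelope calculation shows the sustained throughput implied by filling 100 PB in about ten days; it is illustrative arithmetic, not an official AWS specification.

```python
# Rough sustained throughput needed to fill 100 PB in about ten days.
# Purely illustrative arithmetic, not an official AWS figure.
capacity_bits = 100e15 * 8           # 100 PB in bits
seconds = 10 * 86_400                # ten days in seconds

throughput_gbps = capacity_bits / seconds / 1e9
print(f"{throughput_gbps:.0f} Gbps") # roughly 900+ Gbps, i.e. about 1 Tbps
```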

Lost data in the cloud? Contact Ontrack for expert data recovery service.