Google opens peephole on mystery data center practices
Google has released a video showing at least some of the security and data protection techniques used in its worldwide network of data centers.
The video plays like a souped-up advertisement for the search giant and its Google Apps suite of online business applications - there are more than a few visual allusions to the Tom Cruise vehicle, Mission Impossible - and Google has previously discussed its security practices in a Google Apps white paper (PDF). But the video does provide a small glimpse into the operation of the nearly 40 server facilities Google has erected over the past several years. It focuses on a Google data center in Moncks Corner, South Carolina, but also gives a nod to a new facility in Hamina, Finland.
In addition to protecting the grounds with around-the-clock security personnel, cameras, and fences, Google controls access to facilities, the video says, using badges encoded with a lenticular printing mechanism designed to prevent forgeries. Some facilities also use iris scanners and other biometric devices. Once employees are inside the facility, there's a second line of badge readers and in some cases biometric devices restricting access to the actual data center floor.
Only certain Google employees are allowed inside the data center, and as Google is fond of pointing out, all data is sharded and spread across myriad machines and facilities, so if an unauthorized person did gain access to a hard drive, the data could not be read by the human eye.
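Google hasn't published how its sharding actually works, but the general idea can be sketched: split a blob into chunks and scatter them across machines by hash, so no single drive holds a contiguous, readable copy. The machine names and chunk size here are illustrative, not Google's.

```python
import hashlib

# Hypothetical sketch: chunk a blob and scatter the pieces across
# machines, so no single drive holds a readable copy of the whole file.
MACHINES = [f"machine-{i}" for i in range(8)]

def shard(blob: bytes, chunk_size: int = 4) -> dict:
    """Split `blob` into chunks and assign each chunk to a machine by hash."""
    placement = {}
    for offset in range(0, len(blob), chunk_size):
        chunk = blob[offset:offset + chunk_size]
        # Hash over chunk plus offset so identical chunks still spread out.
        digest = hashlib.sha256(chunk + offset.to_bytes(8, "big")).digest()
        machine = MACHINES[digest[0] % len(MACHINES)]
        placement.setdefault(machine, []).append((offset, chunk))
    return placement

placement = shard(b"confidential customer record")
# Each machine now holds only disjoint fragments keyed by offset;
# reassembly requires knowing every fragment's location and order.
```

Reconstructing the original requires collecting every fragment from every machine and sorting by offset, which is exactly what an attacker with a single stolen drive cannot do.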
Nonetheless, when a hard drive fails or its performance degrades and it must be disposed of, Google uses multiple techniques to ensure that the data can't be read at all. It overwrites the data, then performs a complete read of the disk to verify that all data has been removed. When a disk reaches the end of its life, Google destroys it: a steel piston is pushed through the center of the drive, and the drive is then shredded into relatively small pieces. The remains are sent to recycling centers.
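The overwrite-then-verify step can be illustrated with a small sketch. This is a hedged toy model at the file level, not Google's actual procedure — real drive sanitization works against the raw block device, and the function name here is invented for illustration.

```python
import os
import tempfile

# Toy sketch (NOT Google's actual tooling): overwrite a file's bytes in
# place, then read the entire file back to verify the originals are gone.
def wipe_and_verify(path: str) -> bool:
    size = os.path.getsize(path)
    with open(path, "r+b") as f:
        f.write(b"\x00" * size)        # overwrite pass
        f.flush()
        os.fsync(f.fileno())           # push the write to stable storage
    with open(path, "rb") as f:        # complete-read verification pass
        return f.read() == b"\x00" * size

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"sensitive data")
    path = tmp.name

wiped = wipe_and_verify(path)
os.unlink(path)
```

The verification read is the important part: an overwrite that silently failed (a bad sector, a caching layer) would be caught before the drive is trusted for reuse or queued for destruction.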
The Crusher: Google gives hard drives the piston treatment
The video also alludes to Google's ability to shift data access to a new data center in the event of fire or other major failure. The company says that this process is "seamless" and "automatic", but no details are provided. This is apparently a reference to a Google-designed platform known as Spanner, which was described in a public presentation by Google fellow Jeff Dean in 2009.
Google still won't confirm the use of Spanner, but a company spokeswoman did tell us that data access shifts across "almost all" of its data centers.
According to a PowerPoint file that accompanied Dean's presentation, Spanner handles automated allocation of resources across Google's "entire fleet of machines," moving and replicating loads between its mega-data centers based on "constraints and usage patterns." This includes constraints related to bandwidth, packet loss, power, resources, and "failure modes".
Earlier that year, Google senior manager of engineering and architecture Vijay Gill appeared to describe Spanner when discussing a Google data center that had been built without chillers. "Sometimes there's a temperature excursion," Gill said, "and you might want to do a quick load-shedding - a quick load-shedding to prevent a temperature excursion because, hey, you have a data center with no chillers. You want to move some load off. You want to cut some CPUs and some of the processes in RAM."
He indicated Google could do this automatically and near-instantly, meaning without human intervention. "How do you manage the system and optimize it on a global level? That is the interesting part," he said. "What we've got here [with Google] is massive - like hundreds of thousands of variable linear programming problems that need to run in quasi-real-time. When the temperature starts to excurse in a data center, you don't have the luxury of sitting around for half an hour. You have on the order of seconds."
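Gill's description — shed load within seconds when temperature excurses — can be sketched in miniature. The real system, as he says, solves enormous linear programming problems; this illustrative stand-in uses a simple greedy rule (drop lowest-priority loads until power draw falls below a cap), and all the workload names and numbers are invented.

```python
# Illustrative stand-in for Gill's load-shedding scenario. The real
# system solves large linear programs; this greedy sketch just drops
# the lowest-priority loads until the rack is under its power cap.
def shed_load(loads, power_cap):
    """loads: list of (name, watts, priority). Returns (kept, shed) names."""
    kept, shed = [], []
    total = sum(watts for _, watts, _ in loads)
    # Consider loads from lowest priority to highest.
    for name, watts, priority in sorted(loads, key=lambda load: load[2]):
        if total > power_cap:
            shed.append(name)     # cut this load, freeing its power draw
            total -= watts
        else:
            kept.append(name)
    return kept, shed

loads = [("batch-index", 400, 1), ("ads-serving", 300, 9),
         ("log-rotation", 200, 2), ("search-frontend", 500, 10)]
kept, shed = shed_load(loads, power_cap=900)
# Low-priority batch work is cut first; user-facing serving survives.
```

A greedy rule like this runs in microseconds, which matters when, as Gill put it, "you have on the order of seconds" — though it sacrifices the global optimality a linear program would provide.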
Apparently, this chillerless data center is the one Google operates in Saint-Ghislain, Belgium.
Dean describes Spanner as a "single global namespace" in which names are completely independent of the data's location. The design is similar to BigTable, Google's distributed database platform, but it organizes data in hierarchical directories rather than rows. Dean also indicates that Google splits its distributed infrastructure into various subsections that provide redundancy by operating independently of each other. The aim, he said, is to provide access to data in less than 50 milliseconds, 99 per cent of the time.
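The location-independent hierarchical namespace Dean describes can be sketched as a directory tree whose leaves record which cells currently hold replicas — so a lookup by name alone yields the data's present locations, and the name never has to change when the data moves. The class and the cell names below are hypothetical, not Spanner's actual interface.

```python
# Hypothetical sketch of a location-independent hierarchical namespace:
# names resolve through nested directories, and each entry records which
# data-center cells currently hold replicas, so a name never encodes a
# location and data can migrate without renaming anything.
class Directory:
    def __init__(self):
        self.children = {}   # child name -> Directory
        self.replicas = []   # cells currently holding this entry's data

    def resolve(self, path):
        """Walk the tree along a slash-separated path, creating as needed."""
        node = self
        for part in path.strip("/").split("/"):
            node = node.children.setdefault(part, Directory())
        return node

root = Directory()
root.resolve("/users/alice/mailbox").replicas = ["cell-us-east", "cell-eu-west"]

# A lookup by name alone returns the current replica locations; if the
# data is later moved, only the replicas list changes, not the name.
locations = root.resolve("/users/alice/mailbox").replicas
```

Contrast this with BigTable's flat row keyspace: grouping related entries under a directory gives the system a natural unit for placement and replication decisions.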
In the video released today, Google goes on to say that its facilities are closely monitored not only with traditional video cameras, but also with video-analytics software designed to automatically detect anomalies in the video feeds. Some facilities are also equipped with thermal imaging cameras that work to detect intruders.
For years, Google provided no information about the operation of data centers. But in the spring of 2009, it released a video that showed the inside of its first "containerized" data center, and just before this, it held a small event where it detailed at least some of its custom server and data-center designs. On Friday, when we asked Google about Spanner and the Linux distro used in its data center, it declined to provide specifics.