Google publishes paper on “Spanner” database

US: Google recently published a paper describing a storage and computation system that spans all of its data centers. The database, named “Spanner”, is built on the “TrueTime API”. TrueTime uses GPS antennas and atomic clocks to keep Google’s entire network running in lockstep. A GPS antenna taps into the Global Positioning System, which relies on a constellation of satellites to provide time and location, while an atomic clock uses the properties of individual atoms to keep accurate time.

According to Google, it is the first database that can quickly store and retrieve information across a worldwide network of data centers while keeping that information “consistent”, meaning all users see the same collection of information at all times. It has been driving the company’s ad system and various other web services for years.

Spanner draws on BigTable, but it goes much further. Whereas BigTable is best used to store information across thousands of servers in a single data center, Spanner expands this idea to millions of servers and multiple data centers.

Rather than try to improve the communication between servers, Google spreads clocks across its network. It equips various master servers with GPS antennas or atomic clocks, and, working in tandem with the TrueTime API, these timekeepers keep the entire network in sync.

By using highly accurate clocks and a very clever time API, Spanner allows server nodes to coordinate without a whole lot of communication.

In short, the TrueTime API taps into those master timekeepers, and servers across the network tap into the API. TrueTime then tells the servers how much “uncertainty” there is over the current time, and they can adjust their reads and writes accordingly.

Ordinary servers tap into public atomic clocks in an effort to maintain the correct time. But this method isn’t as accurate as it could be, said Andy Gross, the principal architect of Basho, an outfit that builds an open source database called Riak that runs across thousands of servers. Google has gone a step further, installing its own atomic clocks and GPS antennas directly on its machines.
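
The idea of reporting time as an interval with explicit “uncertainty” can be sketched in a few lines of Python. This is a toy model, not Google’s implementation: the names SimulatedTrueTime, TTInterval, epsilon_s and commit_wait are hypothetical, and the fixed 4-millisecond error bound is an assumed value chosen for illustration. The method names now, after and before mirror the ones the Spanner paper gives for TrueTime, and commit_wait follows the paper’s rule that a write should not become visible until its timestamp is definitely in the past.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    """A time reading expressed as a range instead of a single instant."""
    earliest: float
    latest: float


class SimulatedTrueTime:
    """Toy stand-in for TrueTime: a local clock plus a fixed error bound.

    epsilon_s is an assumed, constant uncertainty in seconds; a real system
    would derive it from its GPS and atomic-clock time sources.
    """

    def __init__(self, epsilon_s=0.004, clock=time.time):
        self.epsilon_s = epsilon_s
        self._clock = clock

    def now(self):
        """Return an interval assumed to contain the true current time."""
        t = self._clock()
        return TTInterval(t - self.epsilon_s, t + self.epsilon_s)

    def after(self, t):
        """True only if time t has definitely passed."""
        return self.now().earliest > t

    def before(self, t):
        """True only if time t has definitely not arrived yet."""
        return self.now().latest < t


def commit_wait(tt, sleep=time.sleep):
    """Pick a commit timestamp, then wait until it is surely in the past."""
    s = tt.now().latest          # latest possible "now" becomes the timestamp
    while not tt.after(s):       # wait out the uncertainty
        sleep(tt.epsilon_s / 4)
    return s


if __name__ == "__main__":
    tt = SimulatedTrueTime()
    started = time.time()
    stamp = commit_wait(tt)
    print(f"commit timestamp {stamp:.6f}, waited {time.time() - started:.4f}s")
```

With a fixed error bound, the wait comes out at roughly twice that bound, which suggests why tighter clocks matter: the smaller the uncertainty, the less time each write spends waiting.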

The rub is that you can’t use Spanner unless you add hardware to your servers. In its paper, Google says the atomic clocks aren’t expensive, and 10gen’s Max Schireson can see other outfits installing similar equipment. But both Basho’s Gross and Cloudera’s Zedlewski believe the cost would be prohibitive for general use.

Today, there are many databases designed to store data across thousands of servers. Most were inspired either by Google’s BigTable database or a similar storage system built by Amazon known as Dynamo. They work well enough, but they aren’t designed to juggle information across multiple data centers — at least not in a way that keeps the information consistent at all times.

Source: Wired