Friday, June 10, 2011

Google infrastructure is old and not up to the mark: an edge for MS and Yahoo?

A former Google engineer who worked on a library at the heart of "nearly every Java server at Google" has dubbed the company's much-ballyhooed backend software "well and truly obsolete".
In a blog post published earlier this week, Dhanji R. Prasanna announced that he had resigned from the company, and though he praised Google in many ways, he made a point of saying that the company's famously distributed back-end is behind the times.

"Here is something you may have heard but never quite believed before: Google's vaunted scalable software infrastructure is obsolete," he wrote. "Don't get me wrong, their hardware and datacenters are the best in the world, and as far as I know, nobody is close to matching it. But the software stack on top of it is 10 years old, aging and designed for building search engines and crawlers. And it is well and truly obsolete."
As a member of the Google Wave team, Prasanna helped build the search and indexing pipelines for the ill-fated effort to reinvent communication on the web, but he also worked on Guice, a library "at the heart of nearly every single Java server at Google".
Prasanna did not immediately respond to a request to discuss his post. But he goes on to describe Google's Protocol Buffers, BigTable distributed database, and MapReduce distributed number-crunching platform as "ancient, creaking dinosaurs", compared with outside open source projects like MessagePack, JSON, and Hadoop, which is based on the ideas behind Google's MapReduce and distributed file system.
Google has previously acknowledged some shortcomings with the likes of MapReduce. But Prasanna went so far as to say that newer Google infrastructure projects such as Megastore, as well as developer tools such as Google Web Toolkit and Closure, were "sluggish, overengineered Leviathans" compared to projects like MongoDB and jQuery. He complained that Google's new projects are "designed by engineers in a vacuum, rather than by developers who have need of tools."
Google is secretive about its back-end software infrastructure. It has published research papers on platforms such as the Google File System, Google MapReduce, and BigTable, but it otherwise says very little about how these platforms are used within the company. And, yes, the platforms are closed source.
On the public mailing list for Google App Engine – an online service that lets you run your own applications atop Google's infrastructure – Google developer programs engineer Ikai Lan took issue with at least some of Prasanna's post.
"The bit about Hadoop, for instance, raised a lot of eyebrows amongst Googlers who have extensive use of both (new hires with a few years Hadoop experience)," he said. "I'd also disagree that we are not rebuilding things. In fact, Google has the opposite problem of other technology companies: instead of 'don't touch it, it works!', we err on the side of 'it can be better, we should improve it - mid flight!'"
Prasanna did not actually say that Google has failed to rebuild its platforms. At one point, he specifically mentioned Megastore, a real-time, high-replication layer built atop BigTable. But he did imply that efforts to rebuild at Google are slow.
"In the short time I've been outside Google I've created entire apps in Java in the space of a single workday," he said. "I've gotten prototypes off the ground, shown it to people, or deployed them with hardly any barriers." This, however, could just as well describe the experience of leaving almost any large corporation.


Last year, in an interview with the Association for Computing Machinery (ACM), a Google engineer acknowledged that GFS was unsuited for low-latency, real-time applications like YouTube and Gmail, and he said that Google was working to build a new version of the file system.
Googler Matt Cutts later told The Register that this "GFS 2" was part of the company's new search infrastructure codenamed Caffeine.
Several months later, at the launch of Google's Instant search interface, Eisar Lipkovitz, a senior director of engineering at the company, told us that within the company, GFS 2 is known as Colossus and that it moves the company's search indexing system off of MapReduce and onto BigTable.
A few weeks later, Google published a paper on Colossus and a new distributed data processing system known as Percolator. But according to Lipkovitz, these platforms were built specifically for search and may or may not be applied to other Google services.
For years, database guru Mike Stonebraker has criticized MapReduce and GFS, and Lipkovitz told us that Google has made "similar observations". MapReduce, he told us, is not suited to calculations that need to occur in near real time.
Google has also said that the single-master design of GFS is a major limitation. "A single point of failure may not have been a disaster for batch-oriented applications, but it was certainly unacceptable for latency-sensitive applications, such as video serving," said Google's Sean Quinlan in his interview with the ACM. Colossus does not have this limitation.
At the moment, the open source version of Hadoop is burdened with single points of failure. But Facebook is running a version that eliminates these limitations.
In a recent conversation with The Register, Dwight Merriman, the CEO of 10gen, the company that created the open source MongoDB distributed database, argued that MongoDB is superior to BigTable because it uses a document-oriented data model rather than a tabular model.
"Today, 95 per cent of the code we're writing is in an object-oriented language," he said. "We're to the point where object-oriented programming is ubiquitous enough, having a database that works well with that sort of thing is important."
He said that Megastore is an improvement on BigTable, but that it doesn't change the database's fundamental tabular setup, and he added that most of the improvements provided by Megastore are already a part of MongoDB.