The Coming Battle Over Grid Computing and Internet Services


A comment I left on Wes Maldonado’s blog has started a [conversation about grid computing](http://www.brokenbuild.com/blog/2006/06/02/why-is-the-digipede-network-good-for-windows-environments/). He posted on Digipede, a Windows-centric way to do distributed computing, and I responded that it would be “nice” not to be forced to do this type of work on an operating system that requires a GUI. That set off another post about cost effectiveness and using existing infrastructure, points I don’t disagree with. In an IT environment with a lot of computers running Windows and a problem that lends itself to distributed algorithms, Digipede seems pretty exciting. Ever since the original Distributed.net, I’ve wondered whether a company would bring a product like this to market. It is something that I would have a lot of fun playing with.

That said, my point about electricity and running supercomputer clusters on Windows still stands. My comment wasn’t intended to disparage Digipede so much as to point out the problem that Microsoft is going to have competing with companies like Google and Yahoo for the next generation of Internet services. Some have estimated that Google’s data centers have well over 100,000 COTS PCs set up in a distributed grid. Google is running Linux, which can run headless, without a video card and without installing any GUI package. Linux has been “designed” to be completely scriptable from a command line interface. Windows, however, appears to have a tight integration between the GUI layers and the NT kernel. As far as I know, it is impossible to install Windows on a machine without a video card. Obviously, the GUI layer will be paged to disk on all these machines, but the cost of a video card multiplied by several hundred thousand machines is a needless expense.

The other competitive advantage Google and Yahoo have is the scriptability of Linux and FreeBSD.
While PowerShell is a step forward for Microsoft, my view is that the UNIX environment wins on system administration scriptability. The key to building supercomputer clusters is easy system administration. Perhaps Microsoft can leverage their existing infrastructure and prove that GUI tools can do everything the UNIX ones can and more, but they are starting with less experience. Google has already proven they can do it effectively. My back-of-the-envelope estimate is that Google’s Linux sysadmins are each responsible for between 1,000 and 2,000 servers. I don’t see a Microsoft solution for that yet, and I don’t think the Digipede product is intended to compete in that type of environment. Digipede probably isn’t going to compete in the National Labs supercomputer arena either (at least not yet).

The second problem is that any kind of parallel programming is really hard. Even threads prove a huge challenge within a single application. [Map/Reduce](http://labs.google.com/papers/mapreduce-osdi04.pdf) is clever, but I don’t think it is a magic bullet either. A lot of algorithms simply don’t scale linearly with computing power, so adding more hardware just burns a hole in your wallet and in your data center’s air conditioning.

All that said, there are plenty of places where products like Digipede would fit perfectly. Mainly, I am interested to see how this all shakes out, as we see Microsoft, Google and Yahoo building their data centers close to hydroelectric power to cut costs. Sun is also a dark horse in this whole race, building out their grid infrastructure and custom chips that suck less juice. I can’t wait to see more!
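
A rough way to put numbers on the scaling point above is Amdahl’s law, which I’m bringing in purely as an illustration (it isn’t part of the Digipede discussion). The sketch below assumes a made-up job where 95% of the work can be spread across machines and the remaining 5% has to run serially. The `amdahl_speedup` helper and the 95% figure are hypothetical, not measurements from any real cluster.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Best-case speedup under Amdahl's law.

    parallel_fraction: share of the job that can run in parallel (0.0 to 1.0).
    workers: number of machines (or cores) working on the parallel share.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial share always takes the same time; only the rest shrinks.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical workload: 95% parallel, 5% serial.
    for workers in (1, 10, 100, 1000, 100000):
        speedup = amdahl_speedup(0.95, workers)
        print(f"{workers:>7} machines -> {speedup:6.2f}x")
```

With a 5% serial share the speedup can never reach 20x, no matter how many boxes you rack, which is the “hole in your wallet” effect: past a certain point the extra hardware mostly turns electricity into heat.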