If one trend has captured the hearts and minds of internet executives, entrepreneurs and developers alike over the past several years, it's cloud computing. And when it comes to market leaders, at the front of the pack is Amazon.
Its suite of offerings, known as Amazon Web Services (AWS), has attracted some of the most prominent consumer internet services, including Twitter, as well as a slew of up-and-coming startups looking for the ability to scale in their early days without Facebook-like funding. Through its cloud, companies can do everything from running resource-intensive applications to sending high volumes of email.
On paper, the potential benefits of cloud computing are undeniable, but the cloud, and its most visible providers, aren't perfect.
Last week, Amazon's EC2 service suffered a major meltdown that didn't just disrupt EC2 users; it brought some of them to their knees. Sites taken down for the count included Foursquare, Quora, Reddit and HootSuite (and its ow.ly URL shortener).
Perhaps most embarrassing, the homepage of Ruby cloud platform provider Heroku, which was recently acquired by Salesforce.com for more than $200m, served up nothing more than an ugly error message.
In the cloud, when it rains it pours.
So has Amazon's torrential rain of fail exposed the cloud as an emperor without clothes? Perhaps surprisingly, the answer is 'not really.' What it does expose: poor infrastructure and application architecture.
Noting that Amazon EC2 is available in multiple locations, and that the major failure was at one of them, Bob Warfield, writing on behalf of the Enterprise CIO Forum and HP, pointed out:
Most SaaS companies have to get huge before they can afford multiple physical data centers if they own the data centers. But if you’re using a Cloud that offers multiple physical locations, you have the ability to have the extra security of multiple physical data centers very cheaply.
The trick is, you have to make use of it, but it’s just software. A service like Heroku could’ve decided to spread the applications it’s hosting evenly over the two regions or gone even further afield to offshore regions.
In other words, the companies affected by the problems in Amazon's Northern Virginia data center put their applications (or critical pieces of their applications) into the cloud, but either never realized or chose to ignore that they had done nothing to avoid creating a single point of failure. That's a huge mistake, and it's not one you would expect a seasoned CTO to make.
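To make the single-point-of-failure argument concrete, here is a minimal Python sketch. It is an illustration only, not anything Amazon, Heroku or the affected sites actually ran. The region names are used purely as examples, the health check is a hypothetical function the caller supplies, and the availability arithmetic assumes outages in different regions are independent, which correlated real-world failures can violate.

```python
from typing import Callable, Optional, Sequence

# Example region names, for illustration only.
REGIONS = ["us-east-1", "us-west-1"]


def pick_region(regions: Sequence[str], is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first region whose (hypothetical) health check passes, or None if all are down."""
    for region in regions:
        if is_healthy(region):
            return region
    return None


def combined_availability(per_region: float, n_regions: int) -> float:
    """Probability that at least one of n regions is up, ASSUMING failures are independent."""
    return 1 - (1 - per_region) ** n_regions


if __name__ == "__main__":
    down = {"us-east-1"}  # simulate an outage in one region
    print(pick_region(REGIONS, lambda r: r not in down))
    print(round(combined_availability(0.99, 2), 6))
```

With one region at 99% availability, a second independent region raises the odds that at least one is up to roughly 99.99%. The catch, as Warfield notes, is that the application has to actually be built to use that second location.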
The lesson here is clear: the cloud, with all of its virtues, still requires thoughtful, informed and strategic technical leadership if it's to be used effectively. The problem with the cloud, then, isn't that systems and networks will go down (they always will), but that there may be far too much of an "it's in the cloud, therefore it's okay" mentality. For the sake of the companies using the cloud, and the cloud itself, that has to change.
Part of effecting that change may entail giving a little bit less responsibility to developers. Many companies that do everything (or almost everything) in the cloud rely on developers to manage a huge portion of their technology stack.
In reality, this is usually a bad idea, because different layers of the stack require different knowledge and skill sets. Most developers, for obvious reasons, haven't mastered every layer, and Amazon's blowup is a reminder that letting developers architect everything is a recipe for disaster.