The downtime experienced by parts of Amazon Web Services on Thursday 21 April 2011 has predictably raised questions about the efficacy of cloud computing. Big names such as Foursquare, Reddit and Quora were affected, as were, ironically, monitoring systems such as Hoptoad. Affected systems reported non-availability starting early in the morning (Pacific Daylight Time), and the problems continued for significant parts of the day.
The problem appears to have been localised to a single region (US East), based in Northern Virginia. The systems reported as affected were:
- Amazon CloudWatch
- Amazon Elastic Compute Cloud
- Amazon Elastic MapReduce
- Amazon Relational Database Service
- AWS CloudFormation
- AWS Elastic Beanstalk
The service level agreement for Amazon EC2 guarantees 99.95 per cent availability for each region. Each region is divided into multiple availability zones, and Amazon had previously advised launching instances in separate zones to protect against failure in a single zone. The outage appears to have affected many zones within that one region.
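To put the 99.95 per cent figure in perspective, the short Python sketch below converts an availability target into the downtime it permits. It assumes the target is measured over a 365-day window; that window and the helper name `allowed_downtime_hours` are illustrative assumptions, not wording taken from the agreement itself.

```python
def allowed_downtime_hours(availability_percent: float, period_hours: float) -> float:
    """Return the hours of downtime an availability target permits over a period."""
    return (1 - availability_percent / 100) * period_hours


# Assumption: availability is measured over a 365-day year.
HOURS_PER_YEAR = 365 * 24

yearly = allowed_downtime_hours(99.95, HOURS_PER_YEAR)
print(f"{yearly:.2f} hours per year")  # prints "4.38 hours per year"
```

On that assumption, an outage lasting significant parts of a single day would use up most or all of a year's allowance in the affected region.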
The outage may have shaken companies’ belief in Amazon’s availability zones concept, but probably not their belief in Amazon itself. Until the causes and corrections of the outage become clearer, one fix may be to deploy into multiple regions.