Amazon’s service level agreement for its cloud computing service, EC2, is 99.95% uptime for a service region (that’s a data center) during a one-year period.
That means the service will be down no more than five one-hundredths of a percent of the time, or less than four and a half hours per year. Being a tester, I set out to evaluate this claim, starting with a Google search for “Amazon EC2 Down 2012.” I found a half-dozen issues in the twenty-minute to half-hour range, plus a seven-hour outage in April.
Instead of tearing into Amazon, I’d like to be fair. They do a lot right. They have multiple redundant switches, internet connections, backup power supplies, and servers. They have a great deal of technical depth and 24/7 monitoring, and by acting swiftly they keep most outages to twenty minutes. When the large outages do happen, they are things no one could have predicted, and the folks at Amazon move to prevent them from ever happening again.
The problem is that something else will happen next time.
So if Amazon can’t prevent these sorts of “unpredictable behaviors,” what makes us think that we can in traditional organizations? For that matter, if Amazon can’t meet its modest 99.95% claim, how can the folks at Microsoft promise to go beyond ‘five nines,’ when five nines, or 99.999%, allows only about five minutes of downtime per year?
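To make that arithmetic concrete, here is a minimal Python sketch of my own; it is an illustration, not anything Amazon or Microsoft publishes, and it ignores the measurement windows and exclusions real SLAs define.

```python
# Convert an SLA uptime percentage into the downtime it allows per year.
# Illustrative only; real SLAs define measurement windows and exclusions
# that this simple arithmetic ignores.

HOURS_PER_YEAR = 365.25 * 24  # roughly 8,766 hours


def allowed_downtime_hours(uptime_percent: float) -> float:
    """Return the hours of downtime per year an uptime percentage permits."""
    return HOURS_PER_YEAR * (1 - uptime_percent / 100)


for sla in (99.95, 99.999):
    hours = allowed_downtime_hours(sla)
    print(f"{sla}% uptime allows about {hours:.2f} hours "
          f"({hours * 60:.1f} minutes) of downtime per year")

# 99.95%  -> roughly 4.4 hours per year
# 99.999% -> roughly 5.3 minutes per year
```

Notice that the single seven-hour April outage, by itself, would blow through the entire 99.95% budget for the year.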
The first part of the problem is the Turkey Problem; the second is a naive definition of the system.
The Turkey Problem
Consider the humble turkey, living on the farm. Each day, the farmer visits the turkey. He protects the turkey; he feeds the turkey, provides it shelter, and keeps it safe from natural enemies. Each day, the turkey has more reason to believe that tomorrow will be a nice, safe day.
Right up until Thanksgiving, when the farmer brings an axe.
Service level agreements have to be based on something. Like the turkey’s confidence, they are often based on past experience. Past experience has a problem: it can’t predict future events.
So even when Amazon has two redundant power systems, both can go down if a car slams into them. As systems become more complex, the failures become more nuanced. In the seven-hour car-crash outage, the data center cut over to internal power, but the automated system made a mistake: it concluded the power fault was inside the data center and shut down the generators for safety reasons.
The turkey example is not my idea; it comes from a book called “The Black Swan: The Impact of the Highly Improbable” by Nassim Nicholas Taleb.
Taleb calls these ‘black swans,’ after the famous claim in logic: even if every swan ever found has been white, a million white swans do not disprove that a black swan could exist. (It turns out that they do; English scientist John Latham described black swans from Australia in 1790.)
So if a black swan can force Amazon to miss its modest service level agreements, what can we do about it in our organizations?
I’ll talk about that more in my next post, but first we’ll have a guest post by Pete Walen on Yugo Testing.