Monday 3 September 2007

Software errors: A crash course

Just how much damage can a small software error do?

The costliest software error to date was the explosion of the unmanned Ariane 5 rocket about 37 seconds after lift-off on the morning of June 4, 1996.

This was its maiden flight, and the rocket was carrying four uninsured payloads worth about US$370 million. The mission-critical Ariane 5 project itself had taken 10 years to develop, at a whopping cost of US$7 billion.

Apparently, the problem came down to blind software reuse and the old "if it ain't broke, don't fix it" mindset. An excerpt from a study on evolutionary design by the US Department of Homeland Security explains it well.

"The Ariane 5’s flight control software reused design specifications and code from its highly successful predecessor, the Ariane 4 launch vehicle. In particular, one of the on-board modules, the Inertial Reference System, performed a data conversion of a 64-bit floating point value related to the horizontal velocity of the rocket and attempted to place the result into a 16-bit signed integer variable. This computation had never caused a problem with the Ariane 4, but the more aggressive flight path and much faster acceleration of the Ariane 5 produced a higher horizontal velocity and a corresponding data value that was too large for the 16-bit signed integer variable, causing an arithmetic overflow. A redundant backup process used the same software and failed in the same manner. The Inertial Reference System then generated some diagnostic output that was incorrectly interpreted as flight control data by other portions of the flight control system. Based on this faulty interpretation, the flight control system took actions that led to the self-destruction of the rocket."

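To make the failure described in the excerpt concrete, here is a minimal Python sketch of a range-checked conversion from a floating point value to a 16-bit signed integer. It is not the Ariane flight code. The function names and the sample readings are made up for illustration, and the saturating variant is just one possible way to handle an out-of-range value, not what the original software did.

```python
# Illustrative sketch only: hypothetical names and made-up readings,
# not the actual Ariane 5 flight software or telemetry.

INT16_MIN = -32768
INT16_MAX = 32767


def to_int16_checked(value: float) -> int:
    """Truncate a float to an integer, refusing values outside the 16-bit signed range."""
    truncated = int(value)
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a 16-bit signed integer")
    return truncated


def to_int16_saturating(value: float) -> int:
    """Truncate a float to an integer, clamping it to the 16-bit signed range."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


if __name__ == "__main__":
    # A value inside the range converts cleanly; a larger one overflows.
    for reading in (12000.0, 40000.0):
        try:
            print(reading, "->", to_int16_checked(reading))
        except OverflowError as exc:
            print("overflow:", exc)
        print(reading, "saturates to", to_int16_saturating(reading))
```

The point of the sketch is that a conversion which is safe for one range of inputs can fail once the inputs grow, so reused code needs its assumptions re-checked against the new operating conditions.
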
Wired has published a list of 'History's worst software bugs'.