A counter overflow bug happens when an unsigned integer variable storing a counter reaches its maximum value: once it's already at the maximum, adding one more causes the variable to wrap back to 0 and continue counting up from there. If the rest of the software is not expecting this counter reset, it can result in system failures. So the longer the system runs, the nearer it gets to a counter overflow event.
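The mechanics are easy to demonstrate. A minimal sketch in Python, emulating a 32-bit unsigned counter with modular arithmetic (real firmware would typically use a fixed-width type such as C's uint32_t, which wraps implicitly):

```python
# Emulate a 32-bit unsigned counter: Python ints never overflow,
# so the wraparound is modelled explicitly with modular arithmetic.
UINT32_MAX = 2**32 - 1  # 4,294,967,295

def increment(counter: int) -> int:
    """Add one tick, wrapping to 0 past the maximum (as C's uint32_t does)."""
    return (counter + 1) % (UINT32_MAX + 1)

uptime_ms = UINT32_MAX            # counter has reached its maximum value
uptime_ms = increment(uptime_ms)  # one more tick...
print(uptime_ms)                  # prints 0 -- the counter silently resets
```

For a sense of scale: a 32-bit counter of milliseconds wraps after 2^32 ms, which is about 49.7 days of continuous uptime.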
Clock drift bugs happen when an internal timer/clock is calculated incorrectly, causing it to slowly drift out of sync with real time. The longer the system runs, the more the clock drifts and the bigger the error gets. Once the real-time clock is out of sync, all sorts of downstream calculations can become inaccurate.
Both of these types of bugs typically have a workaround that requires the operator to reboot the system every X hours or days.
Both counter overflows and clock drift bugs can occur in production systems when system testing didn't include a "soak test" - where you keep the system running for a very long period, typically measured in days rather than hours. The failure to pick up the bug during system testing means that a production system gets deployed with a bug, and the only workaround becomes a regular reboot of the system.
The kicker: these types of bugs have been happening in safety-critical systems for at least the past 28 years.
One of the most publicized cases was the failure of a Patriot missile defense system in 1991, in which the system failed to track an incoming Scud missile due to a drifting, out-of-sync clock. Tragically, this particular failure resulted in the deaths of 28 U.S. soldiers at a U.S. barracks near Dhahran, Saudi Arabia. The workaround for this bug was to reboot the system every 8 hours, but unfortunately this had not been communicated to the base in time to avoid the disaster.
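The published figures for this failure make the arithmetic easy to check. A sketch using the commonly reported numbers (per the GAO report, the system's 24-bit fixed-point approximation of the 0.1-second tick lost about 0.000000095 s per tick, and the battery had been running for roughly 100 hours):

```python
# Commonly reported figures for the Dhahran Patriot failure (illustrative).
error_per_tick = 9.5e-8      # seconds lost per 0.1 s tick (GAO report figure)
ticks_per_hour = 3600 * 10   # ten ticks per second
uptime_hours = 100           # approximate uptime of the battery at the time

drift = error_per_tick * ticks_per_hour * uptime_hours
print(f"clock drift after {uptime_hours} h: {drift:.2f} s")  # ~0.34 s

scud_speed = 1676            # m/s, approximate Scud velocity
print(f"tracking error: {drift * scud_speed:.0f} m")  # over half a kilometre
```

A tracking window off by half a kilometre is more than enough to lose an incoming missile entirely.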
Another well-known case was a counter overflow in the Boeing 787 Dreamliner firmware (2015), which meant the aircraft had to be rebooted once every 248 days to avoid the overflow.
The most recent case is an internal timer bug in the Airbus A350 firmware (2019), which requires the aircraft to be rebooted once every 149 hours.
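Neither manufacturer published the exact counter widths, but both reboot intervals are consistent with simple overflow arithmetic. A sketch of that arithmetic; note the counter sizes here are inferred from the intervals, not official:

```python
# Inferred, not official: the advisory reboot intervals match these sizes.

# Boeing 787: a signed 32-bit counter of centiseconds (1/100 s ticks)
# overflows after 2**31 ticks.
days_787 = 2**31 / 100 / 86400
print(f"787 overflow after {days_787:.1f} days")      # ~248.6 days

# Airbus A350: a counter wrapping at 2**29 milliseconds is one common
# explanation for the 149-hour figure.
hours_a350 = 2**29 / 1000 / 3600
print(f"A350 overflow after {hours_a350:.1f} hours")  # ~149.1 hours
```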
Avoiding bugs in safety critical systems
Avionics systems are among the most complex things ever built. There's one leading example of a successful avionics project which was built from scratch targeting high quality and zero defects. It's the Boeing 777 - Boeing's first fly-by-wire aircraft.
For the 777, Boeing decided early on to standardize on the Ada programming language, at a time when using C was the norm. Ada compilers were formally validated against a standard test suite, and the language itself included several safety features, such as run-time constraint checking and extremely strong typing.
Ada itself came about as a way to standardize all of the programming languages in use by the U.S. Department of Defense - before Ada they were using some 450 different programming languages. Ada was originally designed to be used for embedded and real time systems.
Ronald Ostrowski, Boeing's director of Engineering, claimed that the Boeing 777 was the most tested airplane of its time. For more than 12 months before its maiden flight, Boeing tested the 777's avionics and flight-control systems continuously - 24 hours a day, 7 days a week - in laboratories simulating millions of flights.
The 777 first entered commercial service with United Airlines on 7th June, 1995. It has since received more orders than any other wide-body airliner and is one of Boeing's best-selling models; by 2018 it had become the most-produced Boeing wide-body jet, surpassing the legendary Boeing 747.
Further reading - Boeing flies on 99% Ada
Follow @dodgy_coder
"I encourage you to change all your data types to boolean. Whenever there's a data quality issue, it can only be wrong by 1 bit." - Anonymous
Tuesday, October 1, 2019
Thursday, January 3, 2013
The first right answer is the only answer
Reading through David B. Stewart's paper entitled "Twenty-Five Most Common Mistakes with Real-Time Software Development" (PDF, 131 KB).
There's an interesting nugget of advice at number 8, "The first right answer is the only answer":

"Inexperienced programmers are especially susceptible to assuming that the first right answer they obtain is the only answer. Developing software for embedded systems is often frustrating. It could take days to figure out how to set those registers to get the hardware to do what is wanted. At some point, Eureka! It works. Once it works the programmer removes all the debug code, and puts that code into the module for good. Never shall that code ever change again, because it took so long to debug, nobody wants to break it.

"Unfortunately, that first success is often not the best answer for the task at hand. It is definitely an important step, because it is much easier to improve a working system than to get the system to work in the first place. However, improving the answer once the first answer has been achieved seems to rarely be done, especially for parts of the code that seem to work fine. Indirectly, however, a poor design that stays might have a tremendous effect, like using up too much processor time or memory, or creating an anomaly in the timing of the system if it executes at a high priority.

"As a general rule of thumb, always come up with at least two designs for anything. Quite often, the best design is in fact a compromise of other designs. If a developer can only come up with a single good design, then other experts should be consulted to obtain alternate designs."

As David suggests, when dealing with complex, mission-critical or concurrent sections of code, that first success is often not the best solution. Weeks later, you might find that it's not performing as well as it should in production and you have to revisit that section of code looking for a better solution. But the best time to develop a better solution was back when the job was fresh in your mind, when you developed the original solution.

So he suggests adapting this into a new practice: "As a general rule of thumb, always come up with at least two designs for anything." There are multiple problems with this advice:
- YAGNI, you ain't gonna need it. Develop an extra solution only if and when you need it; don't create extra work where it might not be needed.
- Premature optimization is the root of all evil. Knowing when you need to optimize a solution further is almost impossible to do at design time, before any benchmarking or code profiling have been done.
- Lastly, if your solution passed the unit test, yet fails further down the line (in production for example), then that suggests there was a problem with your unit test, and not with your solution. If required, you should add some specific concurrency and/or performance testing to your unit test. This will mean you can then optimize your code, while maintaining a TDD approach.
I notice the original article is actually from 1999, which I believe is before test driven development came into prominence. I think this particular piece of advice ("come up with two designs for anything") might have been ok for some projects back then, but would now be considered flawed and certainly not advisable.
Wednesday, June 27, 2012
Being a great tester
I just wanted to post a great little nugget of information from the end of Unit 1 of Software Testing: How to Make Software Fail (CS258) on Udacity, which is being taught by John Regehr and Sean Bennett.
- The developer's attitude is "I want the code to succeed".
- The tester's attitude is "I want the code to fail".
- Although they seem contradictory, understand that you are both aiming to create more robust, higher quality software.
- The developer aims to write quality code first time.
- The tester aims to identify bugs which can then be fixed, thereby improving the robustness of the code.
Great testers ...
- Test creatively - use their knowledge of the software under test to be creative in their test methods and test structure.
- Don't ignore small glitches that are easy to overlook; they may hint at larger problems, and following them up can often lead to uncovering major issues.
Watch the snippet on YouTube: Being a great tester (2 min, 33 sec)
Subscribe to posts via RSS
Sunday, June 3, 2012
Applying hardware testing concepts to software
I've recently been testing a desktop application I built with WPF and .NET 4 which collects data from the serial port and draws various charts on screen in real time. The volume of data isn't immense, one data frame of approx 200 bytes arriving every 200 milliseconds, but there is additional processing happening in the background. One of the steps being performed is regression analysis on various data points to construct curves of best fit.
In production this process will likely run for no more than 30 minutes, but during testing I wanted to test the limits of the application by running the process for 2 hours, to be sure there were no memory leaks or performance issues. I was surprised when, after just 1 hour, the UI became totally unresponsive - and it degraded very quickly.
I did find the issue and made the fix, but this got me thinking about where else this sort of issue could occur in software - because all I'd done was apply a simple form of software performance testing, known as endurance testing, which just involved running the software for significantly longer than normal. The endurance test has something in common with its hardware equivalent, soak testing.
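The idea reduces to a very small harness: run the workload for far longer than production ever would, and fail as soon as a batch of iterations takes markedly longer than the early ones did. A hedged sketch; `process_frame` and the thresholds here are placeholders, not the original WPF application:

```python
import time

def process_frame(frame: bytes) -> int:
    """Placeholder for the real per-frame work (parsing, regression, charting)."""
    return sum(frame)  # stand-in computation

def endurance_test(chunks: int = 20, chunk_size: int = 1000,
                   slowdown_factor: float = 10.0) -> None:
    """Run the workload far longer than production; fail on sustained slowdown."""
    frame = bytes(range(200))  # one ~200-byte data frame
    baseline = None
    for chunk in range(chunks):
        start = time.perf_counter()
        for _ in range(chunk_size):
            process_frame(frame)
        elapsed = time.perf_counter() - start
        if baseline is None:
            baseline = elapsed  # first chunk sets the reference timing
        elif elapsed > baseline * slowdown_factor:
            raise AssertionError(
                f"chunk {chunk} took {elapsed:.6f}s, more than "
                f"{slowdown_factor}x the baseline of {baseline:.6f}s")

endurance_test()
print("no performance degradation detected")
```

Timing whole chunks rather than single iterations smooths out scheduler and GC jitter, so only a sustained slowdown trips the check.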
[Figure: The bathtub curve. The name derives from the cross-sectional shape of a bathtub. Image source: Wikipedia]
Applied to hardware, the bathtub curve means:
- The first part is a decreasing failure rate, known as early failures. Burn in testing aims to detect (and discard) products which fail at this stage. If the burn-in period is made sufficiently long (and artificially stressful), the system can then be trusted to be mostly free of further early failures once the burn-in test is complete.
- The second part is a constant failure rate, known as random failures. These are like the "background" level of failures, which can usually never be totally eliminated from the production process. QC managers will often aim for a low level of failure here, via various quality control measures (such as TQM or Six Sigma).
- The third part is an increasing failure rate, known as wear-out failures. These can be detected via soak testing. In electronics, soak testing involves testing a system up to or above its maximum ratings for a long period of time. Often, a soak test can continue for months, while also applying additional stresses like extreme temperatures and/or pressures, depending on the intended environment.
Applied to software, the bathtub curve might show bug count on the y-axis and total application run-time on the x-axis. Then, the three parts could mean:
- First part (early failures): In software, this could apply to bugs which break the software's functional requirements or specifications, and might be tested with various functional testing methods such as unit testing or integration testing. These are normally found early in the test cycle.
- Second part (random failures): Might apply to random bugs which are hard to detect, occur in specific conditions, and are outside the existing test coverage. An example might be a heisenbug.
- Third part (wear-out failures): Bugs which appear due to memory leaks or performance degradation. These can be tested with endurance testing. During endurance tests, memory utilization is monitored to detect potential memory leaks, and throughput/response times are monitored to detect performance degradation.
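For the wear-out part, the memory-growth check can be sketched with Python's built-in `tracemalloc` module. The leaky workload here is an artificial stand-in, not real application code:

```python
import tracemalloc

leak = []  # simulates a leak: references that are never released

def leaky_workload() -> None:
    """Stand-in workload that accidentally retains memory on every call."""
    leak.append(bytearray(10_000))

def memory_growth(workload, iterations: int = 100) -> int:
    """Return net bytes of allocation growth across many workload runs."""
    tracemalloc.start()
    workload()  # warm-up, so one-off startup allocations settle first
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(iterations):
        workload()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before

growth = memory_growth(leaky_workload)
print(f"net growth over 100 iterations: {growth} bytes")  # ~1 MB for this leak
```

A flat or slowly oscillating growth figure suggests a stable system; a figure that scales linearly with iterations, as it does here, is the signature of a leak that an endurance test would catch.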
Further Reading
Since writing the above I've found some other articles (listed below) about this concept which assume time on the bathtub curve to mean the software development life cycle's time, as opposed to my idea of being the individual run-time of a real-time application. I've listed them here for reference:
The bathtub curve for software reliability
Does the bathtub curve apply to modern software?
The Software Bathtub Curve
Improving Software Reliability using Software Engineering Approach - A Review