On Zero-Defect Software

I have recently heard companies marketing their software as having zero defects. I find this very hard to believe. Even Praxis doesn't claim its software has no bugs. First I want to talk about why I don't buy the claim. Then I'll talk about what it really means. Last, I'll discuss ways to cope with bugs.

Unless you are making use of both extensive unit and functional tests AND formal methods, this claim is little more than arrogance. To quote the Ten Commandments of Formal Methods:

"One can never have absolute correctness, and to suggest that one can is ludicrous."

Everyone likes to believe they've written flawless software. But even if it works for you, your manager, and the client, that doesn't mean it's free of defects. It could still fail tomorrow. Many bugs will never show up unless the code is ported to another machine or the environment changes significantly.
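
As a small, hypothetical illustration of a defect that only appears after porting (the function name `file_name` and both paths are made up for this sketch): the helper below assumes every path uses forward slashes. It behaves correctly on every Unix-style path its author is likely to try, and quietly returns the wrong answer on a Windows-style path.

```python
def file_name(path):
    """Return the last component of a path.

    Hidden assumption: components are separated by "/".
    Nothing in the signature reveals it, and tests run on the
    author's own machine will never exercise it.
    """
    return path.split("/")[-1]


# Works on the machine it was written on.
print(file_name("/home/alice/report.txt"))

# Same code, different environment: the whole path comes back unchanged,
# because the string contains no "/" to split on.
print(file_name("C:\\Users\\alice\\report.txt"))
```

No exception is raised in the second case, which is the point: the code "works" until the environment changes, and even then it fails silently.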

The number of possible states any program can take on becomes astronomically large very quickly. Think thousands of if statements, variables, loops, threads, sockets, files, and user input. And it's very easy to make a mistake, even if you are careful every step of the way, which most people are not.
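
To put a rough number on "astronomically large", here is a back-of-the-envelope sketch. It assumes the simplest possible model, a program whose entire state is a set of independent true/false flags (the name `state_count` is just for illustration). Real programs also have integers, strings, threads, and files, which only multiply the count further.

```python
def state_count(boolean_flags):
    """Number of distinct combinations of n independent true/false flags."""
    return 2 ** boolean_flags


# Each extra flag doubles the number of states to consider.
for n in (8, 32, 64):
    print(n, "flags ->", state_count(n), "states")
```

Under this model, 64 flags already give 18,446,744,073,709,551,616 combinations, far more than any test suite will ever visit one by one.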

Large, complex systems are written by many people. There are always assumptions that are undocumented or misunderstood by the next person. The original implementor may have made assumptions without even realizing it.

Software runs on software which runs on hardware. Even if your code really is perfect, your software is not. The compiler you used will have bugs. The OS your software runs on will have bugs. The libraries your code calls will have bugs, which means your software will have bugs. And then, of course, there's hardware which can also have bugs. Maybe a slightly more appropriate name would be zero-defect source code.

Suppose you have done very careful design, peer review, and testing at every level from the smallest functions to the highest level functional tests. You will have FEWER bugs. A lot fewer. But if you really want to get rid of defects, you'll want to use formal methods. This is a process whereby you try to prove the correctness of your code. But I don't buy that it will reduce defects to zero, for two reasons.

1) I got a brief introduction to formal methods in one of my software engineering classes. The idea is that you try to prove things such as: the value of a variable is always valid within a for loop. You carefully document pre- and postconditions, etc. But techniques of this sort will only work on a small unit of code. You can verify individual functions this way. But try proving the correctness of a multi-threaded server with IO coming from the hard disks and over various ports. There are an enormous number of simultaneously interacting parts. At the very least, you're going to need some very powerful math such as dynamic logic. Good luck!

A simple for loop in C is one thing. But most modern languages are far more complicated and farther from the machine level. For example, if you're using C++, destructor code executes without you explicitly telling it to do so. The operator overloading rules and inheritance rules can be quite complex. There is a lot of room to make a proof mistake simply due to lack of visibility of what the code is doing or lack of understanding of the compiler. Do you have the language designer or the compiler writer working on your formal tests?

2) Most important math proofs are relatively short, say between a few lines and 100 pages in the more extreme cases. These proofs are peer reviewed by many very smart people over a long period of time. And in some sense, math proofs are simpler than code proofs because they're like one long sequential function broken into many very simple steps. And still, mistakes are made. If I can make a silly mistake in a for loop, is there any reason I'm not just as likely to make a silly mistake proving the correctness of that loop?
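
To make the vocabulary in the first reason concrete, here is a minimal sketch of a precondition, a loop invariant, and a postcondition, written as runtime assertions. The function `largest` is a made-up example. Assertions only check the inputs you happen to run; a formal proof would have to show that the same conditions hold for every possible input.

```python
def largest(values):
    """Return the largest element of a non-empty list."""
    # Precondition: there is at least one element to look at.
    assert len(values) > 0

    best = values[0]
    for i in range(1, len(values)):
        # Loop invariant: before handling values[i], best is the
        # largest element among the first i elements.
        assert best == max(values[:i])
        if values[i] > best:
            best = values[i]

    # Postcondition: best is an element of the list and
    # no element is larger than it.
    assert best in values
    assert all(best >= v for v in values)
    return best


print(largest([3, 1, 4, 1, 5]))
```

This also hints at the second reason. The invariant is itself something a person has to write down correctly. Change `values[:i]` to `values[:i + 1]` and the invariant is wrong, yet it still passes on some inputs, such as a list sorted in descending order. An off-by-one in the specification is as easy to commit as an off-by-one in the loop.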

Now, having said that, I did some googling for zero-defect software, and it seems that zero-defect does not actually mean zero-defect. Huh? This article and this one explain. I was not able to find much more information, or anything at all about the origins of the phrase.

But one must wonder about the very poorly chosen words. It seems too intentional to be a mistake. It sounds like a marketing gimmick, that is, a lie. I'm sure you're familiar with the fact that the advertised size of a CRT isn't actually its true size. The marketing definitions of megabyte and gigabyte differ from the precise mathematical ones. (This has caused me confusion and headaches.) Just about any technology you buy will be sold to you in some form of a lie if marketing has done its job well. I've even heard of negative design aspects being sold as features. How would you know any better? You're just the ignorant consumer. Even the clock speed hype of previous generations of computer chips was an implicit lie. Notice that aspect isn't hyped up anymore.

I don't know about you, but I would not be very happy if I were told my software had zero defects and then something went wrong. Even if I only heard stories about problems, I would be upset.

Most software is not mission-critical. Defects are acceptable. If you are part of a team responsible for the design and implementation of code used in a nuclear power plant or a control system where lives are at stake, I hope everyone on that team has been trained to use formal methods, testing methodologies, testing tools, and whatever else is available. I hope the code also goes through a long peer review process.

In most other cases, the problem with buggy code is that it affects millions, and the model of development is faulty. Open source solves many of these issues. If the code is used by many people, it will receive a very thorough review process. So, the more users, the fewer bugs. It's possible for users to fix the bugs themselves. You don't have to wait three years for a new release to correct a bug. Releases are immediate (to some degree). Please read The Cathedral & the Bazaar.

Another kind of software, which I consider distinct from the previous two, is custom in-house software and web services. Open sourcing the code probably won't do much good since the software is only of interest to you. In this case, some level of QA is necessary, but not as much as in the mission-critical case.