Foxes in the Henhouse: Testing by Programmers

Friday, 25 January 2019

Why and how does software get broken? It’s a mistake to conclude simply that the programmers break it by writing “bad code”. They may indeed do so, but it’s generally true that software designers and business analysts contribute a much larger share of bugs than the coders do, while management decisions about who will create project work products, and particularly under what conditions, ultimately underlie most bugs.

Breaking the Software

One aim of being a software testing company that offers training courses is to help squash the idea that the tester’s job is to “break the software”. Test cases that find bugs don’t leave the software any more broken after they’re run than it was before. Our job isn’t to break the software, but to find out where it was already broken when it was delivered to us. Thus, the title of James Whittaker’s otherwise excellent book on tester “attacks”, How to Break Software, is wrong.

The real subject of this blog is who should have responsibility for finding and removing the bugs. The obvious answer is, “the testers”; but is this completely true?

Quality Control

Dynamic testing, in which we run test cases against executable software, is primarily a Quality Control (QC) activity. The role of QC is to find out whether the quality of products (and the processes that create or modify them) is adequately under control, or whether the process is injecting unacceptably high numbers of defects. The issue here is: who should be applying Quality Control, and when?

In traditional Waterfall projects, the development team may spend, say, 11 months exercising very little real control over the quality of work, with the testers given one month at the project end to try to find as many of the resulting bugs as possible. This, says Construx founder Steve McConnell, is the single major reason why a 12-month Waterfall project is likely to run for 26 months, according to industry statistics.

Now, it’s possible to run a Waterfall project that delivers software on time, within budget, and fully featured. The secret is to apply rigorous quality control at each stage of development, rather than trying to “crush quality in” at the end. This implies large amounts of static testing, using formal reviews (inspections). But because few organisations have known how to achieve this, Waterfall has got the bad rep for hugely expensive overruns that McConnell pointed to. The V-Model was essentially created to overcome this, but requires rigorous discipline of its own in order to be successful.

Waterfall, V-Model, Iterative-Incremental, Agile: Whatever your flavour of development project, the secret to success was pointed out many decades ago by W. (William) Edwards Deming, who developed “Fourteen points for the transformation of management”. The third point was the need to “Cease dependence on end-point testing to achieve quality. Eliminate the need for testing on a mass basis by building quality into the product in the first place” (lightly adapting his words to the software industry).

Foxes in the Henhouse

The idea for the title of this piece came from a blog by Joel Montvelisky on “Teaching programmers to test”. As Montvelisky pointed out, “testing” is very much something that developers are required to do in Agile projects. But while it's easy to agree with much of what he wrote, it's harder to agree with his First Principle of Test Planning for Programmers: “Don't test your own code!”

Holders of the ISTQB-BCS Foundation Certificate may feel they understand this. Developers may make poor testers of their own work, the Syllabus implies, because of “author bias”: “Development staff may participate in testing,” it says, “especially at the lower levels, but their lack of objectivity often limits their effectiveness”.

But “lack of objectivity” isn’t really the main problem. To build quality into a product, we must exercise quality control over its creation. And, said Deming, “Nobody is better placed to control the quality of the work being done than the person doing the work.”

This would mean that programmers would be in charge of the quality of their own work. As Montvelisky wrote, this sounds a little like setting foxes to guard the chicken coop, right? Yet at the same time, it has a right sort of sound to it: shouldn’t we all be responsible for the quality of our own work? The question is, what does that take?

Equipping for Quality Control

Deming provided the answer. To be able to control the quality of their own work, practitioners must be Equipped with the “three Es”:

Enthusiasm for doing quality work and creating quality products, including an understanding of what “quality” means in a given context
Enablement with the skills and tools to measure the quality of their own work
Empowerment to address the real causes of quality issues—which, as I’ve pointed out, don’t for the most part lie with the coders themselves

The ISTQB Foundation Syllabus is aimed at developers just as much as testers, yet we rarely see developers on any testing course. It’s as if they (or their managers) believe that testing isn’t part of their job spec. Or it’s as if their managers (and perhaps they themselves) believe that they already know how to do it. But a self-taught programmer who has bothered to learn about testing is a rare fox indeed.

What about university graduates? We've had opportunities to ask many CompSci or Software Engineering graduates how much they learnt about testing in their three to four years of study. With only two exceptions (they each did an elective), the answer is always the same: virtually nothing. Perhaps a half-day. Enough to get their programming assignments debugged and sort-of working.

Most developers have plenty of enthusiasm for their job, but few of them have been appropriately “enabled” with the skills to manage or even understand the quality of their own work. Enablement also depends on management setting achievable targets for them to hit, covering both scope and quality. A report of a few years ago by the Software Productivity Research organisation, based on a study of around ten thousand development projects, claimed that average Unit Testing achieved only 25% to 30% code coverage.

Despite the claims of some that coverage isn’t important, coverage this low means that developers may be releasing code into System Testing with no real idea whether the remaining 70% to 75% of it actually works.

Lines of code are where the basic rules that govern the behaviour of the software reside. When you buy a product over the Internet, every phase of the transaction is executed by lines of code. Failing to test them falls foul of Beizer’s Guarantee, said to be the only guarantee that testers have: every line of code that you don’t test is guaranteed to be a line of code in which you won’t find any bugs. And for all sorts of practical reasons, Unit Testing is the only real opportunity for ensuring that every line of code gets tested.
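To make the guarantee concrete, here is a minimal Python sketch. The shipping_cost function, its pricing rule, and the executed_lines helper are all hypothetical, invented for this illustration. The helper uses Python’s standard sys.settrace hook as a crude stand-in for a real coverage tool.

```python
import sys


def shipping_cost(weight_kg, express):
    """Hypothetical pricing rule, invented for this illustration."""
    cost = 5.0 + 1.5 * weight_kg
    if express:
        cost = cost - 10.0  # bug: the express surcharge should be added
    return cost


def executed_lines(func, *args):
    """Return the line numbers inside func that run for the given arguments."""
    code = func.__code__
    hit = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # ignore every other function
        if event == "line":
            hit.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)
    return hit


# A unit test that only tries standard delivery passes cleanly...
assert shipping_cost(2, False) == 8.0

# ...yet it never runs the line that holds the bug.
tested = executed_lines(shipping_cost, 2, False)
all_lines = tested | executed_lines(shipping_cost, 2, True)
untested = all_lines - tested
print(f"lines never executed by the test: {len(untested)} of {len(all_lines)}")
print("express price actually computed:", shipping_cost(2, True))
```

Until a test exercises the express branch, the wrong price stays hidden, which is the situation the guarantee describes. In practice a dedicated coverage tool would take the place of the hand-rolled helper.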

The same SPR study indicated that Unit Testing with only 25% to 30% code coverage is only finding about the same proportion of unit-level bugs, the sort that Unit Testing is supposed to find. All the rest are left for System Testing to find, which leaves the System Testers trying to do most of the developers’ testing job as well as their own. No wonder it’s so often so difficult and frustrating.

Hitting the Three Es: the Four Ts

In this problem, we see each aspect of Deming’s Equipping for Quality Control under attack.

The coders are showing little understanding of either code quality, or their personal responsibility for it
They lack adequate skills, tools, and processes for making objective code quality measurements
Their managers show inadequate leadership in requiring good quality control and providing the means to achieve it: Training, Tools, Targets, and Time

Covering 100% of the code won’t guarantee to find 100% of the code-level bugs, but it may find as many as 90% of them. Developers who are properly trained in Unit Testing “can” (as the ISTQB Foundation Syllabus says) “efficiently find many defects in their own code”. Yet how can you achieve this if you don’t know how to design covering test sets, and have no tools for measuring whether you’re achieving the target coverage or not?
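The gap between covering the code and finding its bugs can be shown with a small Python sketch. The is_adult function and its rule are hypothetical, chosen only for illustration. The first two checks execute every line and both pass, yet a test designed around the boundary value still exposes a defect.

```python
def is_adult(age):
    """Hypothetical rule for illustration: adults are aged 18 or over."""
    return age > 18  # bug: the comparison should be >=


# These two checks execute every line of is_adult, and both pass.
assert is_adult(30) is True
assert is_adult(5) is False

# A test designed around the boundary still finds the defect.
boundary_result = is_adult(18)
print("is_adult(18) returned", boundary_result, "but the rule expects True")
```

Coverage shows which code the tests reached. It takes test design skill, here a boundary value check, to decide which inputs are worth running through that code.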

Having targets to aim for is a Good Thing. Beizer tells us that “code coverage to the 100% level is a minimum mandatory requirement for Unit Testing” in order to avoid the implications of his guarantee. But coverage targets should be seen as “a floor below which we dare not fall, rather than a ceiling to which we should aspire.” Of course there should be 100% unit requirements coverage too; and targets should be set for quality as well as for scope.

But training, tools, and targets won’t get us anywhere if the developers aren’t given adequate time to do a good job of testing. Yes, it may take longer, though there are plenty of tools to help, particularly in Agile environments where the whole approach to code development is supposed to be “test-driven”. But a little more time spent controlling the quality of earlier work provides massive dividends in saving time on finding and fixing quality problems later on. It’s an investment.

Joel Montvelisky’s “Teaching programmers to test” blog is full of good ideas, though it’s particularly aimed at Agile environments, and we’re not all on that bus yet. But should programmers be testing their own code? Absolutely! Provided they’re appropriately Equipped, let them loose on the chickens!
