Monday, January 05, 2009

Testability and Reproducibility

Doug Szabo asked me two interesting questions, and we thought we would share my answers here.

Doug: Can you point me to some guiding literature that explains how to make code testable?


Explanation: I noticed at least one mention in Perfect Software: And Other Illusions About Testing of making code testable. I have seen that language in a couple of other software testing books, including Software Testing Techniques. I have even been told by more than one developer that they needed to "make the code more testable" before they wanted me to start testing. Since the developers who told me this didn't have the faintest clue what testing was about, I really don't know what they intended to do with their code. Apparently, neither did they. I eventually asked, "What do you need to do to make it more testable?", to which they replied that they didn't know. I was and still am confounded by the lack of explanation for what it is that makes code testable. I understand when code is very difficult to test, as when you have a "horrible loop" (Software Testing Techniques), or when multithreaded code is not first tested in single-thread contexts, but I don't understand what needs to be done to code to make it testable. Are we talking about instrumenting the code with debug statements, assert statements, or some symbol that a test tool can detect?
Jerry: I share your dismay at the lack of publications about how to make code testable. There are many, many little techniques (like initialization on entry, not exit; eliminating as many special cases as possible; and general simplification for a reader; a small sketch of the first two appears below), but they're not as important as three things:

1. All code should be open code that anyone in the project can read and critique.

2. All code must be reviewed in a technical review (see my Handbook of Walkthroughs, Inspections, and Technical Reviews) in which at least one professional tester is present and fully participating.

3. Same as 2, but for design reviews and requirements reviews.

If you do these three things, the organization will quickly learn how to make code testable.

But, yes, someone should write the book and start with these three things.
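To make a couple of those little techniques concrete, here is a minimal sketch. The discount function and every name in it are my invention, not Jerry's; it merely illustrates "initialization on entry" and "eliminating special cases":

    # Hypothetical sketch (mine, not Jerry's): an invented pricing example
    # contrasting hard-to-test code with a more testable refactoring.

    import datetime

    # Harder to test: leftover state from earlier calls, a hidden dependency
    # on the clock, and a special case buried in the logic.
    _cache = {}

    def discount_v1(price):
        if "rate" not in _cache:               # lazy init depends on call history
            _cache["rate"] = 0.10
        if datetime.date.today().month == 12:  # hidden dependency on today's date
            return price * (1 - 2 * _cache["rate"])
        return price * (1 - _cache["rate"])

    # More testable: everything is initialized on entry, the date is passed
    # in, and the December special case collapses into ordinary data.
    def discount_v2(price, rate=0.10, month=None):
        if month is None:
            month = datetime.date.today().month
        multiplier = 2 if month == 12 else 1   # special case becomes data
        return price * (1 - multiplier * rate)

    # A tester can now pin every input down and get the same answer every run:
    assert discount_v2(100.0, rate=0.10, month=6) == 90.0
    assert discount_v2(100.0, rate=0.10, month=12) == 80.0

The point is visibility: in the second version a reviewer, or the tester sitting in the review, can see and control every input that affects the output.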

Doug: Do you have some strategies for triaging bugs that do not reproduce consistently?

Explanation: I was a developer before my current role as a tester. Hey. Don't roll your eyes at me. I was an engineer (a real one, with a professional license and all that hoopla) before I became a developer, so I already knew the value of testing, even if I didn't know what software testing really was. Well, as an engineer it was crucial that tests be designed so that the results would be reproducible, and measurable against a control. When I got into software development, I tried to stick to that engineering principle with respect to testing. Unfortunately, as I worked on larger software projects, particularly those where my code talked to other processes and where I had multiple threads running, I found bugs starting to occur where the steps to reproduce did not consistently reproduce the bug. Uh oh. I knew that meant there was something wrong; I just didn't have something I could run under a debugger and know where to set a breakpoint. As a developer, it seemed like there were always enough reproducible bugs that I had plenty of excuses to avoid trying to solve those that might not reproduce. Now, as a tester, I am empathetic to developers and have a self-imposed guideline to make an entry for any non-reproducible issue, but at the same time I don't assign it to the pool of issues to fix until a set of steps that consistently reproduces it is found. What I got out of Perfect Software is that perhaps I should be passing the tough-to-reproduce issues over to the pool to fix, but then what would you recommend as a triage approach, to convince stakeholders to take those issues as seriously as the ones that do consistently reproduce?


Jerry: Great question, Doug.

First answer. Try changing "not reproducible" to "I am unable to reproduce."

Then parse that second category, "I am unable to reproduce," into sub-categories like:

a. I saw this one, but was never able to make it happen again.

b. I see this when I run with this setup, but sometimes I don't.

c. This is the same anomaly I see under several setups, but not all the time.

d. Under X setup, this happens sometimes and not other times. There may be something different from one X to the other, but I'm not seeing it.

In each case, you are unable to pinpoint the fault underlying the failure. There may be several faults producing the same failure, or different failures appearing from the same underlying fault. Since you don't really know the frequency of this failure, the way I triage it is to ask: "What would it cost us every time this failure appears in the field?"

If that number is high, then get a team working on the bug. If it's low, let the bug rest while (perhaps) new data accumulates or the bug disappears mysteriously as other changes are made to the code.
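As a back-of-the-envelope illustration of that question (the bug names, frequencies, and dollar figures below are invented, not from Doug or Jerry), you can triage by estimated field cost: a rough guess at how often the failure will appear in the field, times what each appearance would cost.

    # Hypothetical sketch: rank bugs by estimated field cost rather than by
    # how reliably we can reproduce them. All numbers are guesses a triage
    # meeting would supply; none come from the original post.

    def field_cost_per_month(occurrences_per_month, cost_per_occurrence):
        """Rough expected monthly cost of a failure escaping to the field."""
        return occurrences_per_month * cost_per_occurrence

    bugs = [
        # (id, reproducibility note,         est. occurrences/mo, est. $ each)
        ("BUG-101", "saw it once, never again",             0.1,       50000),
        ("BUG-102", "sometimes, under setup X",             5.0,         200),
        ("BUG-103", "reproduces every time",               20.0,          10),
    ]

    # Highest estimated field cost goes to the top of the fix queue.
    for bug_id, note, freq, cost in sorted(
            bugs, key=lambda b: field_cost_per_month(b[2], b[3]),
            reverse=True):
        print(f"{bug_id}: ~${field_cost_per_month(freq, cost):,.0f}/month ({note})")

Note how the ranking comes out: the once-seen bug with a large per-occurrence cost (BUG-101, roughly $5,000 a month) outranks the perfectly reproducible but cheap one (BUG-103, roughly $200 a month), which is exactly the argument to put in front of stakeholders.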