Testers – The Scientists Who Produce Knowledge about the Quality of SUT
In 2011, at a software testing conference in the US, I met Michael Bolton, who shared with me a series of CBC radio broadcasts called “How to Think About Science”. I was especially fascinated by one of them: the interview with Simon Schaffer, a professor of the history of science at the University of Cambridge and co-author of the book Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life.
The book changed the way people think about the history of science. In the interview, Simon Schaffer explained how science is made, or how knowledge is produced. Three questions are important when talking about the history of science: What do scientists say about the world (i.e., what knowledge do they produce)? How do they find out that knowledge? How do they get people to agree with it?
Those three questions prompted me to rethink my understanding of science. More importantly, I began to ask similar questions about software testing:
- What does testing really look like? Is testing a way of producing knowledge? Or is testing only a way of discovering knowledge – the knowledge about the quality of the system under test (SUT)?
- What does a tester really look like? Can testing be done by almost anyone in IT, or even by high school students? Or can software testing only be done by ingenious and smart people?
- What kind of knowledge or information does a tester produce?
- How does a tester find out that information?
- How does a tester make people agree with the information he/she provides? How does a tester explain his/her testing to others?
In this paper, I’d like to share some of my thoughts on these questions.
2. What does testing really look like?
Knowledge is institution
As the interview notes at the beginning, “…the history of science usually meant the biography of scientists or studies of the social contexts in which scientific discoveries were made. Scientific ideas were discussed, but the procedures and axioms of science itself were not in question.” But since the 1970s, a new generation of scholars has been asking new questions about science: they want to know how knowledge is produced.
Unlike other scholars, who studied the history of science mainly by relying on what scientists said, Simon Schaffer and his colleagues used field methods: they worked alongside scientists in labs and field stations and observed the process of producing knowledge to see what scientists actually did.
These observations led Simon Schaffer and his colleagues to some significant conclusions: scientists look more like skilled carpenters than like oracles; their knowledge is not the very voice of nature but a human product, something that has to be made and then maintained; knowledge is institution, and it has to be analyzed as such.
These conclusions turned my view of science upside down. I used to think that the natural sciences were about discovering the natural world: the knowledge has been out there for thousands of years, waiting to be discovered by scientists, and with more discoveries people know more about the world. As said in the interview: “Formerly, science had been seen as social only when it was wrong. Social interests distorted and corrupted knowledge. True knowledge was immaculate, untouched by human hands.” But Simon Schaffer said “No.” Our knowledge about the world is not the very voice of nature; it is made and maintained by scientists.
Then, what about software testing?
What kind of knowledge does a tester produce? I think most of you will agree that what a tester produces is knowledge about the quality of the SUT. Some people say that testing is just like a ruler: it measures the SUT and finds out what is right and what is wrong with the product. Does this mean that the knowledge testers produce is the VERY VOICE of the product? That after some experiments (test execution), testers DISCOVERED this knowledge and passed it on to other people?
Testers produce knowledge, rather than discover knowledge
As Lee Copeland said in his book A Practitioner’s Guide to Software Test Design, “testing is the process of comparing ‘what is’ with ‘what ought to be’.” The first part, “what is”, represents the real product. Who knows the real product absolutely? Testers? Developers? Customers? Probably none of them. The second part, “what ought to be”, represents the intended product. Who knows the intended product absolutely? Testers? Developers? Customers? Probably no one knows that either. In fact, information about the quality of a product is not a fixed, asocial thing that exists out there waiting to be discovered. Instead, knowledge about the quality of the SUT is a human product. Testers, and other roles in a project, produce and maintain this knowledge. If a tester says that a part of the SUT functions well and contains no bugs, a new bug found the next day will destroy that belief; if a tester says that something is a bug, on another day and in another context the bug may turn into a feature. So testers produce knowledge about the quality of the SUT, and they maintain this knowledge.
To say that testers produce knowledge rather than discover it is to admit that we testers can never know the complete, true state of the quality of the SUT. True knowledge is known by no one. The SUT, just like nature, is right there before everybody’s eyes, yet we know only something about it, and there is more that we don’t know. Testers (and other roles in a project) construct knowledge about the quality of the SUT based on observations. What we know now may change in the future. Testers produce knowledge about the SUT just as scientists produce knowledge about the world.
Some people believe that developers are more creative than testers because developers produce products, while testers can only work on the basis of those products. Yes, testers don’t construct the product as developers do, but testers do have their own product: knowledge about the quality of the SUT. I believe that testers, just like scientists, are very creative people who produce knowledge too.
3. What does a tester really look like?
What do you think of scientists? In the interview, Simon Schaffer said, “I think there were two standard images of what the scientists are: one image is that scientists are absolutely special people, that they are much more moral, much more virtuous, and much more clever, and they do things nothing like what anybody else does; and on the other hand, there is an equally powerful public image of science, which is that science is organized common sense, that it is just cookery raised to a fairly sophisticated art. Those are the two dominant public images of science in our culture, and neither of them is right.”
Science, according to many philosophers, is centrally about reasoning and action. After a lot of observation and study, Simon Schaffer and his colleagues came to a very different conclusion. To them, scientists are like skilled carpenters or engineers. “They didn’t have very much bigger brains, their skulls looked very similar to other people’s, they didn’t seem to be doing things that are terribly special in terms of methods, they didn’t seem to be much more skeptical than the rest of us, they didn’t seem to constantly make bold conjectures that they then desperately tried to falsify. They seemed to be ingenious manipulators, managers, art designers.”
To relate this to software testing, I’d like to discuss two things here: one is reasoning and action; the other is skilled carpenters.
Reasoning and action
Software testing is a practice-based science. Undoubtedly, there is much reasoning and action in software testing. In fact, testing is about continuous reasoning and action. For example, test design involves a lot of reasoning, and test execution involves a lot of action.
But interestingly, many organizations define two different roles: test designers and test executors. Are scientists labeled separately as “reasoning scientists” and “action scientists”? Does separating test designers from test executors make testing more effective and efficient? Or is there any other field in which work is more effective because reasoning roles and action roles are coordinated?
Cem Kaner pointed out in his tutorial A Tutorial in Exploratory Testing that “Testing is like CSI (Crime Scene Investigation)”. Let’s take a look at police officers who are investigating a crime. Often there are two teams: a controller team and a practitioner team. The controllers seldom go into the field to do on-site work. Most of the time, they sit in their offices. Mostly they make investigation plans, study maps, analyze clues, send out commands, and direct other officers to catch the criminals. The practitioners, on the other hand, have to go to the crime scenes, execute commands, question people, try to find more clues, and report their findings to the controllers. Does this prove that we should also have separate test designers and test executors in software testing? I don’t think so.
Two points are important for successful crime scene investigations. One is that practitioners must have sufficient exploration skills and enough freedom to explore when they are working at crime scenes. The other is that controllers must get timely feedback from practitioners and adjust investigation plans accordingly. In fact, controllers and practitioners work in parallel, each exploring a great deal in their own work and flexible enough to adapt to changes. They are actually doing collective exploration.
In software testing, if test designers and test executors communicate in real time and test plans are adjusted promptly based on current risks, the ideal description for this would be paired exploratory testing. Taken to the extreme, if test design and test execution are done by the same person, who switches frequently between designing and executing tests (continuous reasoning and action) and adjusts the test plan based on what he or she has just learned, that person is actually doing exploratory testing!
So the way to make testing more effective is not to define separate test roles, but to coordinate test design and test execution tightly and make them as mutually supportive as possible.
Are testers more like skilled carpenters?
A more accurate question would be: are excellent testers more like skilled carpenters? Perhaps anyone can test something, but only a few can be excellent testers.
To become an excellent tester, you need to practice a lot rather than only read a lot of testing books. To find valuable information, you need to make your own decisions about when, what, and how to test next, based on your past tests, rather than relying totally on what somebody else tells you to do. To use your limited resources and time efficiently, you need to adjust your testing as the context changes.
In short, in order to do effective and continuous reasoning and action, i.e. effective testing, testers need to behave more like skilled carpenters than like priests. The product of a skilled carpenter may not be the best, but it must be good enough.
4. How does a tester find out information about the quality of the SUT?
Knowledge is institution
Knowledge is institution. As Simon Schaffer said, “What we learned is that there are social institutions and work to produce what we know, and indeed to produce what anybody claims to know in any particular period.” Similarly, there are testing organizations and work to produce what we know about the quality of the SUT.
How should testing be evaluated? Managers like to evaluate testing by reading test reports, relying mostly on what testers say. People seldom engage closely with the actual work that testers do. But in the early days of a project, no one knows for sure what the product and its quality will be like. How, then, does a tester find out how things are? How does a tester get people to agree with his/her information?
How can testing be done better? Most people rely on what books, standards, or test procedures say. But books and standards also describe preconditions, and in real projects those preconditions may not be satisfied. We probably can’t get high-quality requirements; it’s hard to design all the right test cases up front; it seems infeasible to make a test plan in advance and then simply follow it; we always lack resources and time…
It’s high time to look into every detail of real testing work. We need to pair with testers: learn their testing contexts, observe their operations, understand their thinking, and identify possible improvements for better testing. In order to evaluate testing, we don’t want to rely entirely on what testers say, but to look at what they do. We’re trying to find bugs using field methods, except that this time, instead of testing a system, we’re testing the people who are doing the testing.
Knowing a testing mind
Over the past two years, I did some field studies on this topic, which I called “Knowing A Tester’s Mind”. Many books say that testing is the process of planning, designing, executing, and reporting. But when I paired with testers and observed their testing, I found many other interesting things that are seldom recorded in testing books.
If you give a tester a product to test, you will find that the first thing the tester does is learn the product. And this learning process lasts from the beginning of testing to the end. I found that testing is actually an active learning process: learning the system under test, learning the areas that work well and the areas that have problems, learning the risky areas, learning functionalities or scenarios that were new or unknown to the tester before. In fact, testing is about actively learning in order to know something. Through active learning, testers collect a lot of information, from which they can produce knowledge about the SUT. I believe the same is true for scientific work.
By observing excellent testers, I found that they don’t just follow testing procedures or guidelines, but take everything they know as heuristics, as a reference for testing; they are more like well-trained detectives with great skills, finding clues and hidden bugs; they have agile minds, continuously collecting a variety of information and sorting through it quickly; they sometimes use quick learning skills, sometimes reverse thinking, sometimes divergent thinking, and sometimes focused thinking…
But the approaches and methods used by testers are not very different from those used by workers in other fields. In general, systemic thinking, constant learning, analytical capability, repeated practice, and feedback-based strategy adjustment are needed in every field.
Some people think that testers have a very special skill: the spirit of doubting everything. But when you observe testers in close detail, you will find that they are not always making bold assumptions and then trying to verify them. They do not always question everything. They are more like skilled carpenters or engineers, and they use a lot of heuristics. The decisions they make during testing and the things they verify next depend on the testing they have just done and the information they have just collected. These testing decisions are based on analysis, not just on bold conjectures.
You may say that the testers I observed are exploratory testers doing exploratory testing. But I would say that if you observe any tester doing manual testing, he or she will explore to some degree. In other words, every manual tester is an exploratory tester and uses some kind of exploratory skill. Testing minds, rather than testing procedures or tools, are at the center of testing and help testers find valuable information about the SUT.
5. How does a tester make people agree with the information he/she provides?
Scientists in the 1660s faced a similar problem: how to make people agree with their claims. As Simon Schaffer explained, roughly before the 1600s, natural philosophy, the knowledge of nature, was understood as knowing how nature normally behaves, how it commonly is. But Robert Boyle, who established experimental philosophy and the mechanical philosophy in Britain, said no to this. He thought that the way to find out how things are was to produce singular, odd, strange, mechanically produced instances: stop observing nature as it normally was, and start producing effects that you could isolate and analyze. That was a huge shift in how to find out about nature in that period, though it may seem obvious to us now. Experimental philosophy involved using elegant, complicated machinery to find out how nature worked by performing experiments. First, design a commission. Second, invite a group of people to witness the experiment. Third, write down what happened and print the account for people to sign.
So how does a tester make people agree with the information he/she provides? By doing experiments and producing effects. If your claims about the quality of the SUT are based only on guesses or past experience, it is hard for people to agree with you. Your claims need to be based on sound testing experiments. Fortunately, a big part of testing work is to test, to do experiments, and to produce effects.
Testing for falsification
As testing proceeds, information about product quality gradually comes into focus, like a picture. As testers, we never know for sure what the absolute quality information is, but do we know how close we are to it? Are our stakeholders satisfied with the information we provide?
Imagine that the test object is composed of thousands of pieces, big or small, each with its own functions. For each of these pieces, we can’t prove by testing alone that it functions well in absolutely every situation. As we know, there are always two different ways of testing: testing for confirmation and testing for falsification. From the viewpoint of testing for confirmation, even a large amount of testing only proves that a piece of software can work in a limited set of scenarios. Our approach is, from the perspective of testing for falsification, to expose the scenarios in which the test object cannot work properly, and, from the perspective of reverse and negative testing, to observe how the system behaves under extreme or abnormal usage scenarios, so as to gain a deeper and better understanding of the SUT and its quality status.
If we stumble upon something surprising during testing, it means that some information falls outside our original knowledge. It can be said that all test cases written in advance are designed according to the tester’s knowledge (known information) about the product. Frequently, there is much information that is unknown to testers, and that unknown information matters much, much more than the known information. (This viewpoint is put forward wonderfully by Nassim Nicholas Taleb in his book The Black Swan.) By analyzing the defects that leaked to users, we can learn about those unknowns so that we can do better testing in the future. Testing is precisely about finding unknowns.
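The difference between the two ways of testing can be sketched in a few lines of Python. This is only an illustration under assumptions: percent_change is a hypothetical function under test and run_checks is a hypothetical helper, neither taken from any real project. The confirmation checks all pass, yet the falsification checks expose scenarios in which the same function does not work properly.

```python
def percent_change(old, new):
    """Hypothetical function under test: change from old to new, in percent."""
    return (new - old) / old * 100


def run_checks(checks):
    """Run (name, check) pairs and return the names of the checks that fail."""
    failures = []
    for name, check in checks:
        try:
            ok = check()
        except Exception:
            # An unexpected exception counts as a failed check.
            ok = False
        if not ok:
            failures.append(name)
    return failures


# Testing for confirmation: typical scenarios in which we expect success.
confirmation_checks = [
    ("100 -> 150 is +50%", lambda: percent_change(100, 150) == 50.0),
    ("200 -> 100 is -50%", lambda: percent_change(200, 100) == -50.0),
]

# Testing for falsification: extreme or abnormal scenarios chosen to break it.
falsification_checks = [
    ("zero baseline is handled", lambda: percent_change(0, 10) is not None),
    ("rise from a negative baseline is positive",
     lambda: percent_change(-100, -50) > 0),
]

confirmation_failures = run_checks(confirmation_checks)
falsification_failures = run_checks(falsification_checks)
print("confirmation failures:", confirmation_failures)
print("falsification failures:", falsification_failures)
```

In this sketch the confirmation checks report no failures, which says only that the function works in those two scenarios. Both falsification checks fail (a division by zero, and a wrong sign for a negative baseline), and that tells us more about the quality of the function than any number of passing checks.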
More emphasis on test execution
In some companies, there is a tendency to emphasize test design over test execution: experienced testers design tests and novice testers execute them; a lot of time is allocated to test analysis and test design, while the time for test execution is always tight; test designers get paid more than test executors.
In my opinion, it’s high time to recognize the value of test execution, for at least the following reasons:
- Test execution means doing actual experiments with the real product, which, as said above, helps make people agree with the information that testers provide.
- Test execution is one of the most effective and efficient ways of finding defects. Defects are negative information about the SUT and help testers understand the product and its quality status better. The more time spent on test design and test preparation, the less time there is for test execution, and the fewer opportunities there are to find valuable defects.
- Often, designing many tests long in advance is a kind of waste. Some tests may be inaccurate, and some are never executed. Moreover, you cannot put every detail into your test scripts, and testers have to learn to explore during test execution to find valuable defects. Only when we see or touch the real test object can we learn the details of the SUT, the related risks, and how we are going to test it.
6. How does a tester explain his/her testing to others?
Since Robert Boyle couldn’t bring everyone into his laboratory to observe his experiments, he had to find a way to explain his claims to other people. Simon Schaffer and his colleagues were very struck by the particular way in which Robert Boyle wrote: when you read his accounts, it is as if you were watching what he describes. Simon Schaffer called this virtual witnessing. This new kind of writing makes the reader present, allowing you to imagine that you were witnessing too.
Software testing is a kind of experimental philosophy, too. You cannot learn software testing well only by learning theories; you have to practice continuously. Software testing uses virtual witnessing, too. You can’t bring everyone in to observe your testing. You have to explain your testing to others in some way: by talking to people directly, by writing something down for people to read, by replaying your testing process, and so on. You cannot just show your 500 test cases to others. People will not understand your testing very well, either because the test cases are too scattered, or because they are not clear enough, or because people simply don’t have time to read through them. Then what are the possible ways of explaining testing?
Possible ways of explaining testing
James Bach talked about three important parts of a test report in his Rapid Software Testing class. Let’s discuss how we testers explain our testing based on these three parts.
Part I: What do you find during your testing?
- Defect report
As said above, we can never know for sure the true quality of the SUT. But we can get closer to the truth through negative examples rather than through positive evidence. Defects are important clues that help us get closer to the truth. Can we regard defect reports as an important kind of virtual witnessing document and write them in a way that makes our readers present, as if they were witnessing our testing?
Here, I’d like to point out one important section of a defect report: the defect-reproducing conditions.
Although Robert Boyle’s experimental philosophy was very popular at that time, another thinker, Thomas Hobbes, had a very deep objection to his claims. “Hobbes couldn’t foresee a world in which people could come to an easy agreement because they just show something. That was one major argument. Second argument that Hobbes had was that just because it happened in one place, how do we know it happened elsewhere? How do we know that? Show me, show me it happening everywhere, and show me it happening always. You can’t do that. You can’t build knowledge of a universe from what you see on one occasion”, Simon Schaffer said in the radio program.
When you find a defect in your testing, people can ask you similar questions: Will the defect happen in users’ environments? Can it be reproduced? Under what circumstances can the defect be triggered? The answers lie in this part of your defect report, the defect-reproducing conditions. So take it seriously.
- Risk lists
Let’s say you designed 1000 test cases for one feature, executed all of them, and all of them passed. What does this tell you? Can it prove that the feature works correctly? It proves nothing except that these 1000 test cases passed. If you found 2 serious defects, what does that tell you? You can say for sure that the feature does not always work well. So not all facts are equally important. Negative facts like defects and risk lists carry more significance than positive evidence.
A risk list can include many things: risky areas, tests not executed, defects not fixed, ambiguous areas, etc.
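The asymmetry between passed tests and found defects can be made concrete with a small Python sketch. Everything here is assumed for illustration: FeatureReport, its fields, and the sample defects and risks are hypothetical, not part of any real reporting tool.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureReport:
    """Hypothetical summary of the testing done on one feature."""
    executed: int
    passed: int
    defects: list = field(default_factory=list)  # negative facts found
    risks: list = field(default_factory=list)    # e.g. tests not executed

    def conclusion(self):
        # A single defect is enough to show the feature does not always work.
        if self.defects:
            return f"Feature does not always work: {len(self.defects)} defect(s) found."
        # Passing tests only show that those particular tests passed.
        return f"{self.passed} of {self.executed} tests passed; correctness is not proven."


# Invented sample data for the two situations described above.
all_passed = FeatureReport(executed=1000, passed=1000,
                           risks=["ambiguous area in the requirements"])
two_defects = FeatureReport(executed=1000, passed=998,
                            defects=["crash on empty input", "data lost on retry"])

print(all_passed.conclusion())
print(two_defects.conclusion())
```

The design choice in this sketch is that the report never turns passed tests into a claim of correctness, and that the risk list travels with the result so that readers see what is still unknown.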
Part II: What did you do to find these things?
- Test strategy
Your test strategy can clearly describe your way of testing, including both the negative and the positive ways of approaching the SUT.
You can tell your testing story to others face to face.
Part III: How good is your testing?
If possible, talk to stakeholders right after your testing. Debriefing is especially important in session-based exploratory testing.
- Recordings of your testing
Sometimes, recording your testing process and replaying the videos to stakeholders can be another effective way of explaining your testing.
- Test depth graph
A test depth graph shows people the depth and breadth of your testing, which helps them recognize how deeply and how widely you have tested.
Simon Schaffer thinks the Western natural sciences work very well partly because they organize trust extremely efficiently, not because they organize skepticism and doubt extremely efficiently. Most people within a project know the software’s quality not through their own experience but through what they are told; users may experience quality directly. Learning to explain testing is a powerful tool for organizing trust in software testing and software quality within a project.
When I was young, I had a dream: I wanted to be a scientist. I have now been a software tester for more than ten years, and I had thought I would never be a scientist in my life. After writing this article, I have a different opinion. There are so many similarities between a tester and a scientist that I think I AM a scientist now, the only difference being that testers are the scientists who produce knowledge about the quality of the SUT, not about the world!
Similarities between software testing and scientific work
Steven Shapin, Simon Schaffer, Leviathan and the Air-Pump: Hobbes, Boyle, and the Experimental Life, 2011
Nassim Nicholas Taleb, The Black Swan: Second Edition: The Impact of the Highly Improbable, 2008