All posts by jonathankohl

Five Dimensions of Exploratory Testing

(Edit: was Four Dimensions of Exploratory Testing. Reader feedback has shown that I missed a dimension.)

I’ve been working on analyzing what I do when Exploratory Testing. Describing what I actually do when testing can be difficult, but I’m doing my best to describe what I do when solving problems. One area that I’ve been particularly interested in lately is intermittent failures. These often get relegated to bug purgatory by being marked as “non-reproducible” or “unrepeatable bugs”. I’m amazed at how quickly we throw up our hands and “monitor” these bugs, allowing them to languish, sometimes for years. In the meantime, a customer somewhere is howling because they are tired of dealing with the problem. If only “one or two” customers complain, I know that means there are many others who are not speaking up and may be quietly moving to a competitor’s product.

So what do I do when I use Exploratory Testing to track down an intermittent defect? I started out trying to describe what I do in an article for Better Software called Repeating the Unrepeatable Bug, and I’ve been mulling over the concepts and ideas since. Fortunately, James Bach has just blogged about How to Investigate Intermittent Problems, an excellent, thorough post on how to turn an intermittent bug into one that can be repeated regularly.

When I do Exploratory Testing, it is frequently to track down problems after the fact. When I test software, I look at risk, I draw from my own experience testing certain kinds of technology solutions, and I use the software. As I use the software, I inevitably discover bugs or potential problem areas, and often follow hunches. I constantly go through a conjecture and refutation loop where I almost subconsciously posit an idea about an aspect of the software and design a test that attempts to refute it. (See the work of Karl Popper for more on conjectures, refutations and falsifiability.) I do this so much that I rarely think about it. Other times, I very consciously follow the scientific method: I design an experiment with controlled variables and a manipulated variable, and observe the responding variables.

When I spot an intermittent bug, I begin to build a theory about it. My initial theory is usually wrong, but I keep gathering data, altering the theory, and following the conjecture/refutation model. I draw on other people as I gather information and build the theory, and I run it by experts in particular areas of the software or system to get more insight.

When I do Exploratory Testing to track down intermittent failures, these are five dimensions that I consider:

  • Product
  • Environment
  • Patterns
  • People
  • Tools & Techniques

Product

This means having the right build, installed and configured properly. This is usually a controlled variable. This must be right, as a failure may occur at a different rate depending on the build. I record what builds I have been using, and the frequency of the failure on a particular build.
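
To give a rough idea of what that record-keeping can look like, here is a minimal sketch in Ruby. It assumes a hypothetical notes.csv file with build and result columns; the real notes could live anywhere, as long as the failure rate per build can be tallied.

  require 'csv'

  # Tally test runs and failures per build from hypothetical session notes.
  tally = Hash.new { |h, k| h[k] = { runs: 0, failures: 0 } }

  CSV.foreach('notes.csv', headers: true) do |row|
    build = row['build']
    tally[build][:runs] += 1
    tally[build][:failures] += 1 if row['result'] == 'fail'
  end

  # Print the failure rate for each build to see where the problem shows up most.
  tally.each do |build, counts|
    rate = 100.0 * counts[:failures] / counts[:runs]
    puts format('%s: %d/%d failures (%.0f%%)', build, counts[:failures], counts[:runs], rate)
  end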

Environment

Taking the environment into account is a big deal. Installing the same build on slightly different environments can have an impact on how the software responds. This is another controlled variable, and it can be a challenge to maintain, especially if the test environment is used by a lot of people. Failures can manifest themselves differently depending on the context where they are found. For example, if one test machine has less memory than another, it might exacerbate the underlying problem. Knowing this information can be helpful for tracking a problem down, so I don’t hesitate to change environments if an intermittent problem occurs more frequently in one than another, using the environment as a manipulated variable.

Patterns

When we first learn to track down bugs in a product, we learn to repeat exactly what we were doing before the bug occurred. We repeat the steps, repeat the failure, and then weed out extraneous information to produce a concise bug report. With intermittent bugs, those details may not be important. In many cases I’ve seen the same bug logged as several separate reports in a defect database, some of them going back two or three years. We seldom look for patterns; instead, we focus on actions. With intermittent bugs, it is important to weed out the details and apply an underlying pattern to the emerging theory.

For example, if a web app is crashing at a certain point, and we see SQL or database connection information in a failure log, a conjecture might be: “Could it be a database synchronization issue?” Through collaboration with others, and using tools, I could find information on where else in the application the same kind of call to a database is made, and test each scenario that makes the same kind of call to try to refute that conjecture. Note that this conjecture is based on the information we have available at the time, and is drawn from inference. It isn’t blind guesswork. The conjecture can be based on inference to the best explanation of what we are observing, or “abductive inference”.
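
As a rough illustration of the kind of tool support I mean, here is a minimal sketch in Ruby. It assumes a hypothetical app.log where each line holds a timestamp, an operation name and a message separated by pipes; it tallies which operations appear just before a database connection error, which can help support or refute the synchronization conjecture.

  # Count which operations immediately precede a database connection error
  # in a hypothetical pipe-delimited application log.
  previous_op = nil
  suspects = Hash.new(0)

  File.foreach('app.log') do |line|
    _timestamp, operation, message = line.chomp.split('|', 3)
    suspects[previous_op] += 1 if previous_op && message.to_s =~ /database connection/i
    previous_op = operation
  end

  suspects.sort_by { |_, count| -count }.each do |operation, count|
    puts "#{operation}: preceded the failure #{count} time(s)"
  end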

A pattern will emerge over time as this is repeated, and more information is drawn in from outside sources. That conjecture might be false, so I adjust and retest and record the resulting information. Once a pattern is found, the details can be filled in when the bug is repeatable. This is difficult to do, and requires patience and introspection as well as collaboration with others. This introspection is something I call “after the fact pattern analysis”. How do I figure out what was going on in the application when the bug occurred, and how do I find a pattern to explain what happened? This emerges over time, and may change directions as more information is gathered from various sources. In some cases, my original hunch was right, but getting a repeatable case involved investigating the other possibilities and ruling them out. Aspects from each of these experiments shed new light on an emerging pattern. In other cases, a pattern was discovered by a process of elimination where I moved from one wrong theory to the next in a similar fashion.

The different patterns that I apply are the manipulated variables in the experiment, and the resulting behavior is the responding variable. Once I can repeat the responding variable on command, it is time to focus on the details and work with a developer on getting a fix.

Update:
Patterns are probably the most important dimension, and reader feedback shows I didn’t go into enough detail in this section. I’ll work on the patterns dimension and explain it more in another post.

People

When we focus on technical details, we frequently forget about people. I’ve posted before about creating a user profile, and creating a model of the user’s environment. James Bach pointed me to the work of John Musa who has done work in software reliability engineering. The combination of the user’s profile and their environment I was describing is called an “operational profile”.

I also rely heavily on collaboration when working on intermittent bugs. Many of these problems would have been impossible for me to figure out without the help and opinions of other testers, developers, operations people, technical writers, customers, etc. I recently described this process of drawing in information at the right time from different specialists to some executives. They commented that it reminded them of medical work done on a patient. One person doesn’t do it all, and certain health problems can only be diagnosed with the right combination of information from specialists applied at just the right time. I like the analogy.

Tools & Techniques

When Exploratory Testing, I am not only testing manually, but also using whatever tools and techniques help me build a model to describe the problem I’m trying to solve. Information from automated tests, log analyzers, the source code or the system details might be relevant to help me build a model of what might be causing the defect. As James Bach and Cem Kaner say, ET isn’t a technique, it’s a way of thinking about testing. Exploratory Testers use diverse techniques to help gather information and test out theories.

I refer to the use of automated or diagnostic testing tools by a term I got from Cem Kaner: “Computer Assisted Testing.” Automated test results might provide me with information, while other automated tests might help me repeat an intermittent defect more frequently than manual testing alone. I sometimes automate certain features in an application and run them while I do manual work as well, which I’ve found to be a powerful combination for repeating certain kinds of intermittent problems. I prefer the term Computer Assisted Testing over “automated tests” because it doesn’t imply that the computer takes the place of a human. Automated tests still require a human brain behind them, and a human to analyze their results. They are a tool, not a replacement for human thinking and testing.
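
As one hedged example of what I mean by running automation alongside manual work, here is a minimal sketch in Ruby. It assumes a hypothetical web application with a search page at http://localhost:8080/search; the loop hammers that one feature in the background and logs anything unexpected while a human explores the rest of the application by hand.

  require 'net/http'
  require 'uri'

  # Repeatedly exercise one feature while manual testing goes on, logging
  # anything unexpected so an intermittent failure has more chances to appear.
  uri = URI('http://localhost:8080/search?q=test')

  1000.times do |run|
    begin
      response = Net::HTTP.get_response(uri)
      unless response.is_a?(Net::HTTPSuccess)
        puts "Run #{run}: unexpected response #{response.code} at #{Time.now}"
      end
    rescue StandardError => e
      puts "Run #{run}: #{e.class} - #{e.message} at #{Time.now}"
    end
    sleep 1 # keep a human-like pace alongside the manual session
  end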

Next time you see a bug get assigned to an “unrepeatable” state, review James’ post. Be patient, and don’t be afraid to stand up to adversity to get to the cause. Together we can wipe out the term “unrepeatable bug”.

The Role of a Tester on Agile Projects

I’ve stopped using this phrase: “the role of a tester on agile projects”. There has been endless debate over whether there should be dedicated testers on agile projects. In the end, endless debate doesn’t appeal to me. I’ve been on enough agile teams now to see that testers can add value in pretty much the same way they always have. They provide a service to a team and customers that is information-based. As James Bach says: “testing lights the way.” Programmer testing and tester testing complement each other, and agile projects can provide an ideal environment for an amazing amount of collaboration and testing. It can be hard to break into some agile projects, though, when many agilists seem to view conventional testing with indifference or, in some cases, disdain. (Bad experiences with the QA Police don’t help, so the testing community bears some responsibility for this poor view of testing.)

What I find interesting is hearing experiences from people who fit in and contributed on Agile teams. I like to hear stories about how a tester worked on an XP team, or how a business analyst fit in and thrived on an Agile team. One of the most fascinating stories I’ve encountered was a technical writer who worked on an XP team. It was amazing to find out how they adapted and filled a “team member role” to help get things done. It’s even more fascinating to find how the roles of different team members change over time, and how different people learn new skills and roll up their sleeves. There is a lot of knowledge in areas such as user experience work, testing and technical writing that Agile team members can learn from. In turn, those specialists can learn a tremendous amount from Agilists. This collaborative learning helps move teams forward and can really push a team’s knowledge envelope. When people share the successes and failures of what they have tried, we all benefit.

I like to hear of people who work on a team in spite of being told “dedicated testers aren’t in the white book”, or “our software doesn’t need documentation”. I’m amazed at how adaptable the smart, talented people are who are blazing trails and adding value on teams with challenging and different constraints. A lot of the “we don’t need your role” discussion sounds like the same old arguments testers, technical writers, user experience folks and others have already been hearing for years from non-Agilists. Interestingly enough, those who work on agile projects often report that in spite of initial resistance, they manage to fit in and thrive once they adapt. Those who were their biggest opponents at the beginning of a project often become their biggest supporters. This says to me that there are smart, capable people from a lot of different backgrounds who can offer something to any team they are a part of.

The Agile Manifesto says: “we value: Individuals and interactions over processes and tools.” Why then is there so much debate over the “tester role”? Many Agile pundits dismiss having dedicated testers on Agile projects. I often hear: “There is no dedicated tester role needed on an Agile team.” But isn’t excluding someone from a team because of the notion of a “role” putting a process over people? If someone who is not a developer joins an Agile team, believes in the values of the methodology and wants to work with a great team, do we turn them away? Shouldn’t we embrace the people even if the particular process we are following does not spell out their duties?

I would like to see people from non-developer roles be encouraged to try working on agile teams and share successes and failures so we can all benefit. I still believe software development has a long way to go, and we should try to improve every process. A danger of codifying and spreading a process is that it doesn’t have all the answers for every team. When we don’t have all the answers, we need to look at the motivations and values behind a process. For example, what attracted me most to XP were the values. My view on software processes is that we should embrace the values, and use them as a base to strive towards constant improvement. That means we experiment and push ideas forward, fight apathy and hubris, and, as Deming said, drive out fear.

Unhealthy Goals

Chris Morris has blogged recently on a topic that is frequently misunderstood. There is often an attitude that we should shoot as high as we can when setting project goals. The line of thinking goes: “Why not attempt to achieve perfection? Even if we don’t get there, we’ll get better results than if we set a lower goal.” Unfortunately, on software projects (and probably other projects), this line of thinking can have the opposite effect, actually causing harm to a project.

Here are some project goals from my experience that have hindered the quality of a product, and have been detrimental to the development team:

  1. 100% test coverage
  2. Zero defects
  3. 100% test automation

Goal 1, “100% test coverage”, is easily refuted, as shown in this paper by Cem Kaner: The Impossibility of Complete Testing. What’s wrong with having this as a goal? An experienced tester realizes that it is an impossible task, and will probably feel like they are now on a death march project. At best, they might feel demoralized by being unable to finish an impossible task. At worst, they might feel pressured to falsify test results to please management.

An inexperienced tester might get complacent and stop thinking about testing, instead rubber-stamping the software with a suite of regression tests. Why should we challenge our ideas about testing the software when we already have 100% coverage? Every time I have seen a product go out the door with “100% coverage”, a bug was found in the field. This ruins the credibility of the project team, especially if these numbers are used to measure performance or to market some claim of quality.

Goal 2, “Zero Defects”, is also an impossible goal. If we can’t test every possible permutation and combination that the software might ever be exercised with, how can we guarantee that our software has zero defects? But this is a good goal, you say, even if it is unachievable. Not in my experience. Every zero-defect project I have seen has caused defect reporting to become politicized. After the initial exuberance wears off and testers and developers realize they are finding a lot of defects, strange things happen. Terminology changes, so certain classes of defects are called “issues” or “variances” or “thingies” so that they aren’t measured anymore. Developers (and managers) pressure testers and other developers not to log defects. Defects are closed without being fixed. A “shadow process” emerges where the defects that really need to be fixed aren’t logged through formal channels. Instead, they are logged and fixed away from the eyes and ears of other teammates and management. Even more defects can be injected into the code because of the resulting lack of communication and collaboration.

This is important to note, as Joel writes: “Fixing bugs is only important when the value of having the bug fixed exceeds the cost of fixing it.” These are businesses we are running and working for, and everything we do needs to make sense for the financial health of the company.

Goal 3, “100% test automation”, has recently been re-popularized by the agile development community. The problem with this goal is that there are certain tests that we aren’t able to automate. In fact, there is little about testing that we can automate, especially if the end user of the software is a human. An automated test script is a very rough approximation of what a human does, because computers are not intelligent. Entire classes of tests are ignored or not thought of because they do not fit this testing paradigm, especially exploratory testing. Once automated test suites become sufficiently large, maintenance becomes an issue. There is pressure not to add or execute new test cases because there are too many automated test cases to worry about.

Thorough, accurate, meaningful testing can be sacrificed when reaching this goal becomes more important than the testing itself. Relying too much on these automated tests often allows bugs to be released that a human would have caught instantly. Less rich manual testing gets done because those cycles are taken up with maintenance.

Measuring individuals and teams by these sorts of standards is almost always counter-productive. SMART goals and other devices used on performance appraisals map nicely to numbers and percentages, but there is little in life we do that can be accurately mapped to a two-variable graph. There are lots of other factors beyond our control, known as “spurious variables”. At some point, measuring people against something that is always just beyond their reach will cause them to behave in unintended ways. There are even Dilbert cartoons about rewarding developers for the number of bugs fixed, and measuring testers on bugs found is equally counter-productive. Both can lead to a breakdown within a team and to product quality suffering.
When I talk to the people who decide on these goals, in most cases the actual goal they set isn’t the intended result they are looking for. If a senior manager says to me that they have a goal for zero defects, I ask why.

Usually there is a quality issue with the delivered software that is angering customers, and they desperately want it fixed. So I reframe the issue: “Would it be OK if the software you delivered was reliable and robust enough that your customers were happy? Why not make having happy customers who can rely on our software our goal?” Often, that is a satisfactory goal to the manager, and it is something reasonable to shoot for. It is also helpful to look at goals over the long term, and to aim for consistent attempts at improvement.

When I hear things like “100% test automation” and I question further, it is almost always about efficiency. So I reframe the issue: “We need to look for ways to be more efficient in our testing. Why don’t we analyze our testing processes (both manual and automated), choose the most efficient methods we can, and keep working for more efficiency? Why not make efficiency our goal?” In some cases, strategic manual testing may be more efficient and cost-effective than automated testing. In many projects, especially at the beginning of a test automation effort, a little test automation can go a long way.

The true motivation behind these goals is important to understand. In many cases I’ve seen the numbers line up nicely with the goals, only to have the intended but unexpressed goal fail. “That’s great that you have zero defects when shipping, but our customers are unhappy!” an executive might say. Mary Poppendieck says: “Measure Up!” This means measure what is really important. Look at the big picture. If you measure details too much, you may miss the big picture. If you do set details-oriented goals, carefully analyze resources, schedules and the people on the project instead of just picking a number to shoot for. Be sure to measure how the goals feed the big picture. If they aren’t helping contribute to the big picture (the bottom line, happy customers, a happy, healthy, productive team), drop them. It might be surprising under scrutiny to see how many of these unrealistic goals are barriers to a good bottom line, happy customers and a happy, healthy, productive team.

Many process certified projects with wonderful charts and graphs and SMART goals all over the place release crummy products that customers quietly stop using. After all the cheering over process numbers fades and the company finds it can’t sell products like it used to, people wonder why. If we measure product success instead of adherence to a process, and measure how the project feeds the bottom line instead of things like “Zero Defects”, we might end up with better results.

Occam’s Butter Knife

When I work in test automation, I always seek the most simple solution that will work for the task at hand. Extreme Programming’s STTCPW (simplest thing that could possibly work) and YAGNI (you aren’t going to need it) are principles I’ve applied to test automation. Occam’s Razor is a principle that can be interpreted as having the same meaning as STTCPW. Simple tests are something I value because I don’t:

  • Want complex test code that can be a source for bugs
  • Have a lot of time for maintenance
  • Wish to have tests I can’t throw away easily

Recently, while automating a complex functional test for a tester, I ran into a difficult problem. I was retro-fitting a test into a framework and I couldn’t figure out how to translate the model into automated test code. I struggled for a while, and paired with the tester (who was a non-programmer) and described my problem. I was thinking about a Decorator pattern, and data structures and algorithms, but couldn’t make this model fit. I was trying to get a simple solution together, but I had constraints.

The tester patiently listened as I explained my problem and diagrammed on a white board, explained what I was trying to do, and then when I was finished said: “Forgive me for stating the obvious…” and paused. I said “What may be obvious to you may be something I haven’t considered, please continue.” The tester explained to me that I had already created that model when I had automated test data generation, and asked why I couldn’t re-use it? They were right on the money. The solution didn’t involve any design patterns or clever solutions, I just had to go grab the data from a different area of the test. The solution was sitting right there in front of me, but I was looking at the problem in the wrong way.

Non-programming testers can offer a lot to development efforts. Their minds are not cluttered with design patterns, xUnit frameworks, data structures and algorithms, which allows them to see things in a different way. When I’m up to my eyeballs in code, I sometimes stop thinking like a tester and lose that “tester’s perspective”. In spite of my wish for a simple design, I had needlessly overcomplicated my automated test development. So much for my attempt to apply Occam’s Razor: I needed a tester to help me see the simple solution. Collaboration is key, and it also helps me when I’m not particularly sharp.

Testing an Application in Layers

There is often debate about test automation versus manual testing. When I think about testing, I look at an application in 3 broad layers: the code (on the machine side), the system (where the finished software lives), and the visible layer, or how the software is used from an end user’s perspective. I often call this visible layer the social context because of the environment much end-user software is used in. When we spend a lot of time in one context, testing starts to specialize because we concentrate on part of the picture.

When we view an application from the source code view, the testing is dominated by automation. When we look at the system context (as some of my operations friends do), testing involves integration in a system, and testing hardware, firmware, drivers, etc. to make sure the software gets served up correctly. Automated testing tends to get more complex the more we move from the code to the user interface. Attempting to emulate user actions is difficult, and high-volume automated functional tests can involve massive amounts of automated test code. This can be problematic to maintain. Sometimes functional tests become so complex that they involve as much test code as the software they are testing: code to serve up a component, test data generation code, and the code that attempts to emulate user actions.

Personally, I agree with Cem Kaner and call automated testing “computer-assisted testing”. The computer is a tool I use in conjunction with good manual testing. Until machines are intelligent, we can’t really automate testing. We can automate some aspects of testing to help maximize problem-solving efficiency.

Traditionally, software testers tend to have a handle on the social context, or how the software is used in a business context. As a result, much conventional testing is focused on the visible layer of the application. I tend to prefer testing at various layers in an application. I value testing components in isolation as well as testing software within a system; there are advantages and drawbacks to both. While I value isolation, testing at the visible UI layer, particularly brain-engaged manual testing, has merit. Sometimes testers focus a great deal on testing in this context when component isolation might be more efficient. Often, traditional testers hope for an automated testing tool that can do the work of a tester. I’ve yet to see this occur successfully, but there are still tasks that can be automated that are a big help to testing efforts.
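
To make the contrast concrete, here is a minimal sketch of what testing a component in isolation at the code layer can look like, using a hypothetical GmtOffset class as the unit under test; the equivalent check at the visible layer would still be a human working through the UI.

  require 'test/unit'

  # A hypothetical component, tested in isolation from the rest of the system.
  class GmtOffset
    def self.valid?(hours)
      hours.between?(-12, 14)
    end
  end

  class GmtOffsetTest < Test::Unit::TestCase
    def test_accepts_an_offset_within_range
      assert_equal(true, GmtOffset.valid?(-7))
    end

    def test_rejects_an_offset_outside_range
      assert_equal(false, GmtOffset.valid?(-25))
    end
  end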

Frequently there are bugs that are difficult to track down that crop up in the visible layer of the application. These are due to the visible application at runtime becoming greater than the sum of its parts. There is a kind of chaos theory situation that occurs due to the application being used in a way that the underlying code may not be designed to handle. By the time a minor fault at the code level bubbles up to the UI, it may have rippled through the application, causing a catastrophic failure. Unfortunately, these kinds of usage-driven faults are problems automated tests at various layers do not tend to catch. Often it is some sort of strange timing issue, as I note in this article. Other times, it’s due to actions undertaken in a social environment by an unpredictable, cognitive human. These variable actions, motivated by inductive reasoning and driven by tacit knowledge, are difficult for others to repeat.

Focusing too much on one testable interface in an application can skew our view. If we view the application by the code most of the time, we have a much different picture than if we view it through the UI. Try flipping the model of an application on its side instead of viewing it bottom up (from the code) or top down (from the UI). You may discover new areas to test that require different techniques, some of which are great candidates for automation, while others require manual, human testing.

User Profiles and Exploratory Testing

Knowing the User and Their Unique Environment

As I was working on the Repeating the Unrepeatable Bug article for Better Software magazine, I found consistent patterns in cases where I have found a repeatable case to a so-called “unrepeatable bug”. One pattern that surprised me was how often I do user profiling. Often, one tester or end-user sees a so-called unrepeatable bug more frequently than others. A lot of my investigative work in these cases involves trying to get inside an end-user’s head (often a tester) to emulate their actions. I have learned to spend time with the person to get a better perspective on not only their actions and environment, but their ideas and motivations. The resulting user profiles fuel ideas for exploratory testing sessions to track down difficult bugs.

Recently I was assigned the task of tracking down a so-called unrepeatable bug. Several people with different skill levels had worked on it with no success. With a little time and work, I was able to get a repeatable case. Afterwards, when I did a personal retrospective on the assignment, I realized that I was creating a profile of the tester who had come across the “unrepeatable” cases that the rest of the dev team did not see. Until that point, I hadn’t realized to what extent I was modeling the tester/user when I was working on repeating “unrepeatable” bugs. My exploratory testing for this task went something like this.

I developed a model of the tester’s behaviour through observation and some pair testing sessions. Then, I started working on the problem and could see the failure very sporadically. One thing I noticed was that this tester did installations differently than others. I also noticed what builds they were using, and that there was more of a time delay between their actions than with other testers (they often left tasks mid-stream to go to meetings or work on other tasks). Knowing this, I used the same builds and the same installation steps as the tester; I figured out that part of the problem had to do with a Greenwich Mean Time (GMT) offset that was set incorrectly in the embedded device we were testing. Upon installation, the system time was set behind our Mountain Time offset, so the system time was back in time. This caused the system to reboot in order to reset the time (known behavior, working properly). But, as the resulting error message told me, there was also a kernel panic in the device. With this knowledge, I could repeat the bug about every two out of five times, but it still wasn’t consistent.

I spent time in that tester’s work environment to see if there was something else I was missing. I discovered that their test device had connections that weren’t fully seated, and that they had stacked the embedded device on both a router and a power supply. This caused the device to rock gently back and forth when you typed. So, I went back to my desk, unseated the cables so they barely made a connection, and—while installing a new firmware build—tapped my desk with my knee to simulate the rocking. Presto! Every time I did this with a same build that this tester had been using, the bug appeared.

Next, I collaborated with a developer. He went from, “that can’t happen,” to “uh oh, I didn’t test if the system time is back in time, *and* that the connection to the device is down during installation to trap the error.” The time offset and the flakey connection were causing two related “unrepeatable” bugs. This sounds like a simple correlation from the user’s perspective, but it wasn’t from a code perspective. These areas of code were completely unrelated and weren’t obvious when testing at the code level.

The developer thought I was insane when he saw me rocking my desk with my knee while typing to repeat the bug. But when I repeated the bugs every time, and explained my rationale, he chuckled and said it now made perfect sense. I walked him through my detective work, how I saw the device rocking out of the corner of my eye when I typed at the other tester’s desk. I went through the classic conjecture/refutation model of testing where I observed the behavior, set up an experiment to emulate the conditions, and tried to refute my proposition. When the evidence supported my proposition, I was able to get something tangible for the developer to repeat the bug himself. We moved forward, and were able to get a fix in place.

Sometimes we look to the code for sources of bugs and forget about the user. When one user out of many finds a problem, and that problem isn’t obvious in the source code, we dismiss it as user error. Sometimes my job as an exploratory tester is to track down the idiosyncrasies of a particular user who has uncovered something the rest of us can’t repeat. Often, there is a kind of chaos-theory effect that happens at the user interface, where only a particular user has the right unique recipe to cause a failure. Repeating the failure accurately not only requires having the right version of the source code and having the test system deployed in the right way, it also requires that the tester knows what that particular user was doing at that particular time. In this case, I had all three, but emulating an environment I assumed was the same as mine was still tricky. The small differences in test environments, when coupled with slightly different usage by the tester, made all the difference between repeating the bug and not being able to repeat it. The details were subtle on their own, but each nuance, when put together, amplified the others until the application had something it couldn’t handle. Simply testing the same way we had been in the tester’s environment didn’t help us. Putting all the pieces together yielded the result we needed.

Note: Thanks to this blog post by Pragmatic Dave Thomas, this has become known as the “Knee Testing” story.


Quick Update

I’ve been blogging less lately; I’ve had a lot on the go. I’m also spending more time writing code, and will spare you the pain of having to read source code in a blog post.

Some quick points:

Software Testing and Scrum

Edit: update. I wrote an article for InformIT on this topic which was published Sept. 30, 2005.

I’ve been getting asked lately about how a software testing or QA department fits when a development team adopts Scrum. I’ll post my experiences of working as a conventional tester on a variety of Scrum projects. Stay tuned for more posts on this subject.

TDD in Test Automation Projects

I’ve written before about pairing with developers during Test-Driven Development (TDD), and I’ve been fortunate to work with very talented TDD developers who are apt to teach. I’ve learned a lot, and decided to try TDD in a programming role. Recently, I’ve taken off my tester hat and started doing TDD myself on test automation projects. I’m not completely there yet – I often need to do an architectural spike first when I’m developing something new. Once I have figured out a general design, or have learned how a particular library works, I throw away the spike code and start development by writing a test. I then write enough code to get the test to pass, write a new test, add new code, and repeat until the design is where I need it to be.
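
Here is a minimal sketch of what one of those cycles can look like in Ruby’s test::unit, using a hypothetical BuildNumber helper from a custom test library; the test is written first and fails, then just enough code is added to make it pass before the next test is written.

  require 'test/unit'

  # Step 2: just enough code to make the test below pass.
  class BuildNumber
    def self.parse(file_name)
      file_name[/build-(\d+)/, 1].to_i
    end
  end

  # Step 1: written first; it fails until BuildNumber.parse exists.
  class BuildNumberTest < Test::Unit::TestCase
    def test_parses_the_build_number_from_a_firmware_file_name
      assert_equal(1234, BuildNumber.parse('firmware-build-1234.img'))
    end
  end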

So what does this gain in test automation projects? I’m loath to have test cases that are so complex that they themselves require testing. If our test cases are so complex that they are causing problems themselves, that’s a test design smell. However, there is other software in our automation projects than just the test cases. In automation frameworks, we need special libraries, adaptors or other ways to access the application we need to test, and all sorts of utilities that help us with automation. Since these utilities are still software, they are subject to the same problems as any other software development effort. Sometimes there is nothing more frustrating than buggy test code, so we need to do what we can to make it as reliable as possible.

In my own development, I’m finding a lot of benefits from doing TDD. My designs improve, because if they aren’t testable, I know there is a problem. When I make my code testable, it suddenly becomes more usable and more reliable. Too often testers aren’t given time to refactor test code. I’ve found that refactoring usually starts when technical debt in a test harness and custom test library starts to interfere with productivity. When there are no unit tests for custom test library code, a change can take a few minutes to make and several hours to test. Having a safety net of unit tests helps immensely with refactoring. You can refactor your test code with greater confidence, and when it’s done consistently with automated unit tests, with much greater speed. It just becomes a normal part of development.

Recently, I’ve found several bugs in my test library code in the elaborative phase of TDD that I didn’t find testing manually. The opposite is also true: I tested libraries I was developing manually and found a couple of bugs that my unit tests didn’t uncover. The TDD-discovered bugs required design changes that helped my design immensely. The tests guided the design into something different (and better) than what I had in my head when I started. However, after a couple of days of only running the unit tests to satisfy the “green bar”, I found a big hole that was only uncovered by using the library the way an end user would. The balance of testing techniques is helpful. I have also adopted a practice of pairing a positive test with a negative test, a technique I learned from John Kordyback. If I do an assert_equal, for example, I also do an assert_not_equal (using test::unit style). This has really come in handy at times when one assertion would pass but the other would fail.
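
A minimal sketch of that pairing, again in test::unit style with a hypothetical status_of helper from a custom test library, looks something like this:

  require 'test/unit'

  # A hypothetical helper from a custom test library.
  def status_of(log_line)
    log_line.split('|').last.strip
  end

  class StatusOfTest < Test::Unit::TestCase
    # Positive test: the status is parsed from the log line.
    def test_status_is_parsed_from_the_log_line
      assert_equal('PASS', status_of('2005-09-30 | login | PASS'))
    end

    # Paired negative test: the helper is not just echoing another field.
    def test_status_is_not_another_field
      assert_not_equal('login', status_of('2005-09-30 | login | PASS'))
    end
  end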

TDD may not be for everyone, but I find it is a nice complement to other kinds of testing we can do. For my own work, it seems to suit the way I think about development. Even if the test cases I develop while programming are trivial and small, there is strength in numbers, and writing and using them helps assuage that tester voice in the back of my head that comes out when programming. I encourage other conventional testers who work on automation projects to give it a try. You may find that your designs improve, and you have a safety net of automated tests to give you more confidence in your automation code, especially when you need to enhance it down the road. At the very least, it helps you gain an appreciation for TDD and helps you communicate when working with TDD folks on a project.

Exploratory Testing using Personas

I get asked a lot about Exploratory Testing on agile projects. In my position paper for the Canadian Agile Network Workshop last month, I described Exploratory Testing using personas. I’m reposting it here to share some of my own experience Exploratory Testing on agile projects.

Usability Testing: Exploratory Testing using Personas

Usability tests are almost impossible to automate. Part of the reason might be that usability is a very subjective thing. Another reason is that automated tests do not run within a business context. Program usability can be difficult to test at all, but working regularly with many end users can help. We usually don’t have the luxury of many users testing full-time on projects; usually there is one customer representative. Once we get information from our test users, how do we continue testing when they aren’t around? One possible method is Exploratory Testing with personas.

I’ve noticed a pattern on some agile teams. At first the customer and those testing have some usability concerns. After a while, the usability issues seem to go away. Is this because usability has improved, or has the team become too close to the project to test objectively? On one project, the sponsor rotated the customer representative due to scheduling issues. We were concerned at first, but a benefit emerged. Whenever a new customer was brought into the team, they found usability issues when they started testing. Often, the usability concerns were the same as what had been brought up by the testers and the customer earlier on, but had been contentious issues that the team wasn’t able to resolve.

On another project using XP and Scrum, a usability consultant was brought in. They did some prototyping and brought in a group of end users to try out their ideas. Any areas the users struggled with were addressed in the prototypes. The users were also asked a variety of questions about how they used the software, and their level of computer skills, which we used to create user profiles or personas. As the developers added more functionality in each iteration, testers simulated the absent end users by Exploratory Testing with personas to more effectively test the application for usability. The team wanted to automate these tests, but could not.

Exploratory Testing was much more effective at providing rapid feedback because it relies on skilled, brain-engaged testing within a context. The personas helped provide knowledge of the business context, and the way end-users interacted with the program in their absence. The customer representative working on the team also took part in these tests.

Tension on usability issues seemed to be reduced as well. These issues were no longer mere opinions; the team had something quantifiable to back up usability concerns. Instead of offering opinions that differed from the developers’, testers could say: “when testing with the persona ‘Mary’, we found this issue.” This proved to be effective at reducing usability debates. The team compromised, with most issues being addressed and others not. Three contentious issues were still outstanding when the project had completed the UI changes. We scheduled time to revisit the end-users and had some surprising results.

Each end-user struggled with the three contentious usability issues the testers had discovered, which justified the approach, but there were three more areas we had completely missed. We realized that the users were using the software in a way we hadn’t intended. There was a flaw in our data gathering: our first sample of users had tested in our office, not their own. This time, we had them work with the software at their own desks, within their business context. Lesson learned: get the customer data while they are using the software in their own work environment.

On this project, Exploratory Testing with personas proved to be an effective way to compensate for limited full-time end user testing. It also helped to provide rapid feedback in an area that automated tests couldn’t address. It didn’t replace customer input, but worked well as a complementary testing technique alongside automation and customer acceptance testing. It helped to retain the end users’ voice in the usability of the product throughout development instead of sporadically, and helped to combat groupthink.