All posts by jonathankohl

Tracking Intermittent Bugs

Recognizing Patterns of Behavior

In my last post, the term “patterns” caused strong responses from some readers. When I use the term “pattern,” I do not mean a design pattern, or a rule to apply when testing. For my purposes, patterns are the rhythms of behavior in a system.

When we start learning to track down bugs, we learn to repeat exactly what we were doing prior to the bug occurring. We repeat the steps, repeat the failure, and then weed out extraneous details to create a concise bug report. However, with an intermittent bug, the sequence of steps may vary greatly, even to the point where they seem unrelated. In many cases, I’ve seen the same bug logged in separate defect reports, sometimes spanning two or three years. But there may be a pattern to the bug’s behavior that we are missing when we are so close to the operation of the program. This is when we need to take a step back and see if there is a pattern that occurs in the system as a whole.

Intermittent or “unrepeatable” bugs come into my testing world when:

  1. Someone tells me about an intermittent problem they have observed and asks for help.
  2. I observe an intermittent problem when I’m testing an application.

How do I know when a bug is intermittent? In some cases, repeating the exact sequence of actions that caused the problem in the first place doesn’t cause the failure to reoccur. Later on, I run into the problem again, but am once more frustrated in my attempts to repeat it. In other cases, the problem occurs in a seemingly random fashion: as I am doing other kinds of testing, failures might occur in different areas of the application. Sometimes I see error messages that are very similar; for example, a stack trace from a web server may be identical across several errors that come from different areas of the application. In still other cases, my confidence that several bugs reported in isolation are really the same bug is based on inference – a gut feeling grounded in experience (abductive inference).

When I looked back at how I have tracked down intermittent bugs, I noticed that I moved out to view the system from a different perspective. I took a “big picture” view instead of focusing on the details. To help frame my thinking, I sometimes visualize what something in the physical world looks like from high above. If I am in traffic, driving from traffic light to traffic light, I don’t see the behavior of the traffic itself. I go through some intersections, wait at others, and can only see the next few lights ahead. But if I were to observe the same traffic from a tall building, patterns would begin to emerge in the traffic’s flow that I couldn’t possibly see from below. One system behavior that I see with intermittent bugs is that the problem seems resistant to the fixes the team completes. The original fault doesn’t occur after a fix, but it pops up in another scenario not described in the test case. When several bugs are not getting fixed, or are occurring in different places, it is a sign to me that there is possibly one intermittent bug behind them. Sometimes a fault is reported by customers but is not something I can repeat in the test lab. Sometimes different errors occur over time, but a common thread appears: similar error messages, similar back-end processing, etc.

Once I observe a pattern that I am suspicious about, I work on test ideas. Sometimes it can be difficult to convince the team that it is a single, intermittent bug instead of several similar bugs. With one application, several testers occasionally saw a bug we couldn’t repeat in a certain part of the application we were testing. At first we were frustrated with not being able to reproduce it, but I started to take notes and save error information whenever I stumbled upon it. I also talked to others about their experiences when they saw the intermittent failure. Once I had saved enough information from error logs, the developer felt he had a fix. He applied it, I tested it, and I didn’t see the error again. We shipped the product thinking we had solved the intermittent problem. But we hadn’t. To my shock and dismay, after the release I stumbled across it again in a slightly different area than the one we had been testing.

It took a while to realize that the bug only occurred after working in one area of the application for a period of time. I kept notes of my actions prior to the failure, but I couldn’t reliably repeat it. I talked to the lead developer on the project, and he noticed a pattern in my failure notes. He told me that the program was using a third-party tool through an API in that area of the application. The error logs I had saved pointed to problems with memory allocation, so he had a hunch that the API was running out of allocated space and not handling the error condition gracefully. We had several other bug reports that were related to actions in that area of the application, and a sales partner who kept calling to complain about the application crashing after using it for an hour. When I asked the sales partner what area of the application they were using when it crashed, it turned out to be the same problem area.

The team was convinced we had several intermittent bugs in that area of the application, based on their experience and bug reports. But the developer and I suspected it was one bug that could be triggered by any number of actions, showing up in slightly different ways. I did more testing and discovered that it didn’t matter exactly what you were doing with the application; what mattered was how the application was handling memory in one particular area. Our theory was that the failures occurring after a period of use had to do with allocated memory filling up, causing the application to get into an unstable state. To prove our theory, we had to step back and not focus on the details of each individual case. Instead, we quickly filled up the memory by doing actions that were memory intensive. Then we could demonstrate to others on the team that various errors could occur using different types of test data and inputs within one area of the application. Once I recorded the detailed steps required to reproduce the bug, other testers and developers could consistently repeat it as well. Once we fixed that one bug, the other, supposedly unrelated, intermittent errors went away as well.
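To make the idea concrete, here is a minimal, hypothetical sketch of that approach: deliberately fill memory with a heavy action first, then exercise varied features to show that the “different” failures all stem from the same area. The AppDriver class and its methods are placeholders of my own invention, not an API from the project described above.

```ruby
# Hypothetical sketch only: none of these classes or methods are a real API.
# The idea is to accelerate a suspected memory-related intermittent failure by
# repeating a memory-intensive action, then exercising varied features.
require 'logger'

log = Logger.new('memory_stress_run.log')
app = AppDriver.connect('test-environment-1')    # placeholder test driver

# Step 1: fill allocated memory quickly with a known memory-intensive action.
50.times do |i|
  app.import_large_dataset("dataset_#{i}.dat")   # placeholder memory-hungry action
  log.info("iteration #{i}: free memory = #{app.free_memory_kb} KB")
end

# Step 2: exercise different features with different inputs; if the theory is
# right, seemingly unrelated errors should now appear in this one area.
%w[search report export print].each do |feature|
  begin
    app.run_feature(feature, 'representative input')
    log.info("#{feature} completed without error")
  rescue StandardError => e
    log.error("#{feature} failed: #{e.class} - #{e.message}")
  end
end
```

The point of a script like this is not to prove the theory on its own, but to make the failure frequent enough that others on the team can see it for themselves.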

I am sometimes told by testers that my thinking is “backwards” because I fill in the details of exact steps to repeat the bug only after I have a repeatable case. Until then, the details can distract me from the real bug.

Five Dimensions of Exploratory Testing

(Edit: was Four Dimensions of Exploratory Testing. Reader feedback has shown that I missed a dimension.)

I’ve been working on analyzing what I do when Exploratory Testing. I’ve found that describing what I actually do when testing can be difficult, but I’m doing my best to describe what I do when solving problems. One area that I’ve been particularly interested in lately is intermittent failures. These often get relegated to bug purgatory by being marked as “non-reproducible” or “unrepeatable” bugs. I’m amazed at how quickly we throw up our hands and “monitor” the bugs, allowing them to languish sometimes for years. In the meantime, a customer somewhere is howling, or quietly moving on to a competitor’s product because they are tired of dealing with it. If only “one or two” customers complain, I know that means there are many others who are not speaking up.

So what do I do when Exploratory Testing when I track down an intermittent defect? I started out trying to describe what I do with an article for Better Software called Repeating the Unrepeatable Bug, and I’ve been mulling over concepts and ideas. Fortunately, James Bach has just blogged about How to Investigate Intermittent Problems, which is an excellent, thorough post describing ideas on how to turn an intermittent bug into one that can be repeated regularly.

When I do Exploratory Testing, it is frequently to track down problems after the fact. When I test software, I look at risk, I draw from my own experience testing certain kinds of technology solutions, and I use the software. As I use the software, I inevitably discover bugs, or potential problem areas, and often follow hunches. I constantly go through a conjecture and refutation loop where I almost subconsciously posit an idea about an aspect of the software and design a test that tries to refute it. (See the work of Karl Popper for more on conjectures and refutations and falsifiability.) It seems that I do this so much, I rarely think about it. Other times, I very consciously follow the scientific method and design an experiment with controlled variables and a manipulated variable, and observe the responding variables.

When I spot an intermittent bug, I begin to build a theory about it. My initial theory is usually wrong, but I keep gathering data, altering it, and following the conjecture/refutation model. I draw information from others as I gather information and build the theory. I run the theory by experts in particular areas of the software or system to get more insight.

When I do Exploratory Testing to track down intermittent failures, these are five dimensions that I consider:

  • Product
  • Environment
  • Patterns
  • People
  • Tools & Techniques

Product

This means having the right build, installed and configured properly. This is usually a controlled variable. This must be right, as a failure may occur at a different rate depending on the build. I record what builds I have been using, and the frequency of the failure on a particular build.
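A minimal sketch of how that record-keeping might look, assuming a simple observations.csv notes file with one build name and outcome per line (the file and its format are my own assumptions, not something from any particular project):

```ruby
# Minimal sketch: tally how often the intermittent failure shows up on each build.
# Assumes a notes file, observations.csv, with lines like:
#   build_1.2.3,failed
#   build_1.2.3,passed
#   build_1.2.4,failed
require 'csv'

counts = Hash.new { |h, k| h[k] = { runs: 0, failures: 0 } }

CSV.foreach('observations.csv') do |build, outcome|
  counts[build][:runs] += 1
  counts[build][:failures] += 1 if outcome.to_s.strip == 'failed'
end

counts.each do |build, c|
  rate = (100.0 * c[:failures] / c[:runs]).round(1)
  puts "#{build}: #{c[:failures]}/#{c[:runs]} failures (#{rate}%)"
end
```

Even a crude tally like this makes it obvious when the failure rate jumps on a particular build, which is exactly the kind of pattern that is invisible from inside a single test session.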

Environment

Taking the environment into account is a big deal. Installing the same build on slightly different environments can have an impact on how the software responds. This is another controlled variable, and one that can be a challenge to maintain, especially if the test environment is used by a lot of people. Failures can manifest themselves differently depending on the context where they are found. For example, if one test machine has less memory than another, it might exacerbate the underlying problem. Knowing this information is helpful for tracking a bug down, so I don’t hesitate to change environments if an intermittent problem occurs more frequently in one than another, using the environment as a manipulated variable.

Patterns

When we start learning to track down bugs in a product, we learn to repeat exactly what we were doing prior to the bug occurring. We repeat the steps, repeat the failure, and then weed out extraneous information to have a concise bug report. With intermittent bugs, these details may not be important. In many cases I’ve seen defect reports for the same bug logged as several separate bugs in a defect database. Some of them have gone back two or three years. We seldom look for patterns; instead, we focus on actions. With intermittent bugs, it is important to weed out the details and apply an underlying pattern to the emerging theory.

For example, if a web app is crashing at a certain point, and we see SQL or database connection information in a failure log, a conjecture might be: “Could it be a database synchronization issue?” Through collaboration with others, and using tools, I could find information on where else in the application the same kind of call to a database is made, and test each scenario that makes the same kind of call to try to refute that conjecture. Note that this conjecture is based on the information we have available at the time, and is drawn from inference. It isn’t blind guesswork. The conjecture can be based on inference to the best explanation of what we are observing, or “abductive inference”.
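As a rough illustration of that kind of pattern hunting, here is a hypothetical sketch that groups saved failure logs by a crude “signature” line mentioning database trouble. The directory name and the regular expression are assumptions for the sake of the example, not part of any real project:

```ruby
# Hypothetical sketch: group saved failure logs by a signature (e.g. a stack
# trace line or SQL error) to see whether "separate" bugs share a common
# database-related thread.
signature_groups = Hash.new { |h, k| h[k] = [] }

Dir.glob('saved_errors/*.log').each do |path|
  text = File.read(path)
  # Crude signature: the first line mentioning SQL or a connection problem.
  signature = text.lines.find { |line| line =~ /SQL|connection|deadlock/i }
  signature_groups[signature&.strip || 'no database clue'] << path
end

# Print the largest groups first: many logs sharing one signature is a hint
# that several "different" bugs may be one intermittent bug.
signature_groups.sort_by { |_, files| -files.size }.each do |signature, files|
  puts "#{files.size} log(s) share: #{signature}"
  files.each { |f| puts "  #{f}" }
end
```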

A pattern will emerge over time as this is repeated, and more information is drawn in from outside sources. That conjecture might be false, so I adjust, retest, and record the resulting information. Once a pattern is found, the details can be filled in once the bug is repeatable. This is difficult to do, and requires patience and introspection as well as collaboration with others. This introspection is something I call “after the fact pattern analysis”. How do I figure out what was going on in the application when the bug occurred, and how do I find a pattern to explain what happened? The answer emerges over time, and may change direction as more information is gathered from various sources. In some cases, my original hunch was right, but getting a repeatable case involved investigating the other possibilities and ruling them out. Aspects from each of these experiments shed new light on an emerging pattern. In other cases, a pattern was discovered by a process of elimination, where I moved from one wrong theory to the next in a similar fashion.

The different patterns that I apply are the manipulated variables in the experiment, and the resulting behavior is the responding variable. Once I can repeat the responding variable on command, it is time to focus on the details and work with a developer on getting a fix.

Update:
Patterns are probably the most important dimension, and reader feedback shows I didn’t go into enough detail in this section. I’ll work on the patterns dimension and explain it more in another post.

People

When we focus on technical details, we frequently forget about people. I’ve posted before about creating a user profile and creating a model of the user’s environment. James Bach pointed me to the work of John Musa in software reliability engineering. The combination of the user’s profile and their environment that I was describing is called an “operational profile”.

I also rely heavily on collaboration when working on intermittent bugs. Many of these problems would have been impossible for me to figure out without the help and opinions of other testers, developers, operations people, technical writers, customers, etc. I recently described this process of drawing in information at the right time from different specialists to some executives. They commented that it reminded them of medical work done on a patient. One person doesn’t do it all, and certain health problems can only be diagnosed with the right combination of information from specialists applied at just the right time. I like the analogy.

Tools & Techniques

When Exploratory Testing, I am not only testing manually but also using whatever tools and techniques help me build a model to describe the problem I’m trying to solve. Information from automated tests, log analyzers, the source code, the system details – anything might be relevant to help me build a model of what might be causing the defect. As James Bach and Cem Kaner say, ET isn’t a technique, it’s a way of thinking about testing. Exploratory Testers use diverse techniques to help gather information and test out theories.

I describe the use of automated or diagnostic testing tools with a term I got from Cem Kaner: “Computer Assisted Testing.” Automated test results might provide me with information, while other automated tests might help me repeat an intermittent defect more frequently than manual testing alone. I sometimes automate certain features in an application and run that automation while I do manual work as well, which I’ve found to be a powerful combination for repeating certain kinds of intermittent problems. I prefer the term Computer Assisted Testing over “automated tests” because it doesn’t imply that the computer takes the place of a human. Automated tests still require a human brain to design them and to analyze their results. They are a tool, not a replacement for human thinking and testing.
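Here is a minimal, hypothetical sketch of that combination: a background thread drives one feature repeatedly and logs failures while the human tests manually in the foreground. AppDriver and submit_order are invented placeholders for whatever interface and feature your application actually exposes.

```ruby
# Hypothetical sketch of "computer-assisted testing": the computer hammers one
# feature on a schedule while a person explores the application manually.
require 'logger'

log = Logger.new('background_driver.log')
app = AppDriver.connect('test-environment-1')     # placeholder test driver

background = Thread.new do
  loop do
    begin
      app.submit_order(quantity: rand(1..10))     # placeholder repetitive action
      log.info("order submitted OK at #{Time.now}")
    rescue StandardError => e
      log.error("background failure at #{Time.now}: #{e.class} - #{e.message}")
    end
    sleep 5                                       # leave room for manual testing
  end
end

# ... the human tester explores the application manually while this runs ...
sleep 60 * 60                                     # let the background load run for an hour
background.kill
```

The log of timestamps and errors from a run like this is often what turns “it crashes sometimes” into “it crashes after roughly an hour of this kind of activity”.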

Next time you see a bug get assigned to an “unrepeatable” state, review James’ post. Be patient, and don’t be afraid to stand up to adversity to get to the cause. Together we can wipe out the term “unrepeatable bug”.

The Role of a Tester on Agile Projects

I’ve stopped using this phrase: “the role of a tester on agile projects”. There has been endless debate over whether there should be dedicated testers on agile projects. In the end, endless debate doesn’t appeal to me. I’ve been on enough agile teams now to see that testers can add value in pretty much the same way they always have. They provide a service to a team and customers that is information-based. As James Bach says: “testing lights the way.” Programmer testing and tester testing complement each other, and agile projects can provide an ideal environment for an amazing amount of collaboration and testing. It can be hard to break into some agile projects, though, when many agilists seem to view conventional testing with indifference or, in some cases, disdain. (Bad experiences with the QA Police don’t help, so the testing community bears some responsibility for this poor view of testing.)

What I find interesting is hearing experiences from people who fit in and contributed on Agile teams. I like to hear stories about how a tester worked on an XP team, or how a business analyst fit in and thrived on an Agile team. One of the most fascinating stories I’ve encountered was about a technical writer who worked on an XP team. It was amazing to find out how they adapted and filled a “team member role” to help get things done. It’s even more fascinating to find how the roles of different team members change over time, and how different people learn new skills and roll up their sleeves to get things done. There is a lot of knowledge in areas such as user experience work, testing, and technical writing that Agile team members can learn from. In turn, those specialists can learn a tremendous amount from Agilists. This collaborative learning helps move teams forward and can really push a team’s knowledge envelope. When people share successes and failures of what they have tried, we all benefit.

I like to hear of people who work on a team in spite of being told “dedicated testers aren’t in the white book” or “our software doesn’t need documentation”. I’m amazed at how adaptable the smart, talented people are who are blazing trails and adding value on teams with challenging and different constraints. A lot of the “we don’t need your role” discussion sounds like the same old arguments that testers, technical writers, user experience folks and others have already been hearing for years from non-Agilists. Interestingly enough, those who work on agile projects often report that in spite of initial resistance, they manage to fit in and thrive once they adapt. Those who were their biggest opponents at the beginning of a project often become their biggest supporters. This says to me that there are smart, capable people from a lot of different backgrounds who can offer something to any team they are a part of.

The Agile Manifesto says: “we value: Individuals and interactions over processes and tools.” Why then is there so much debate over the “tester role”? Many Agile pundits dismiss having dedicated testers on Agile projects. I often hear: “There is no dedicated tester role needed on an Agile team.” Isn’t excluding a “role” from a team putting a process over people? If someone joins an Agile team who is not a developer, and they believe in the values of the methodology and want to work with a great team, do we turn them away? Shouldn’t we embrace the people even if the particular process we are following does not spell out their duties?

I would like to see people from non-developer roles encouraged to try working on agile teams and share successes and failures so we can all benefit. I still believe software development has a long way to go, and we should try to improve every process. A danger of codifying and spreading a process is that it doesn’t have all the answers for every team. When we don’t have all the answers, we need to look at the motivations and values behind a process. For example, what attracted me most to XP were the values. My view on software processes is that we should embrace the values and use them as a base to strive towards constant improvement. That means we experiment and push ideas forward, fight apathy and hubris, and, as Deming said, drive out fear.

Unhealthy Goals

Chris Morris has blogged recently on a topic that is frequently misunderstood. There is often an attitude that we should shoot as high as we can when setting project goals. The line of thinking goes: “Why not attempt to achieve perfection? Even if we don’t get there, we’ll get better results than if we set a lower goal.” Unfortunately, on software projects (and probably other projects), this line of thinking can have the opposite effect, actually causing harm to a project.

Here are some project goals from my experience that have hindered the quality of a product, and have been detrimental to the development team:

  1. 100% test coverage
  2. Zero defects
  3. 100% test automation

Goal 1, “100% test coverage”, is easily refuted, as shown in this paper by Cem Kaner: The Impossibility of Complete Testing. What’s wrong with having this as a goal? An experienced tester realizes that it is an impossible task and will probably feel like they are now on a death march project. At best, they might feel demoralized at being unable to finish; at worst, they might feel pressured to falsify test results to please management.

An inexperienced tester might get complacent and stop thinking about testing, instead rubber-stamping the software with a suite of regression tests. Why should we challenge our ideas about testing the software when we already have 100% coverage? Every time I have seen a product go out the door with “100% coverage”, a bug was found in the field. This ruins the credibility of the project team, especially if these numbers are used to measure performance or to market some claim of quality.

Goal 2, “zero defects”, is also an impossible goal. If we can’t test every possible permutation and combination that the software might ever be exercised with, how can we guarantee that our software has zero defects? But this is a good goal, you say, even if it is unachievable. Not in my experience. Every zero-defect project I have seen has caused defect reporting to become politicized. After the initial exuberance wears off and testers and developers realize they are finding a lot of defects, strange things happen. Terminology changes, so certain classes of defects are called “issues” or “variances” or “thingies” so that they aren’t measured anymore. Developers (and managers) pressure testers and other developers not to log defects. Defects are closed without being fixed. A “shadow process” emerges where the defects that really need to be fixed aren’t logged through formal channels. Instead, they are logged and fixed away from the eyes and ears of other teammates and management. Even more defects can be injected into the code because of the resulting lack of communication and collaboration.

This is important to note, as Joel writes: “Fixing bugs is only important when the value of having the bug fixed exceeds the cost of the fixing it.” These are businesses we are running and working for, and everything we do needs to make sense for the financial health of the company.

Goal 3, “100% test automation”, has been recently re-popularized by the agile development community. The problem with this goal is that there are certain tests that we aren’t able to automate. In fact, there is little about testing that we can automate, especially if the end user of the software is a human. An automated test script is a very rough approximation of what a human does because computers are not intelligent. Entire classes of tests are ignored or not thought of because they do not fit this testing paradigm, especially exploratory testing. Once automated test suites become sufficiently large, maintenance becomes an issue. There is pressure not to add or execute new test cases because there are too many automated test cases to worry about.

Thorough, accurate, meaningful testing can be sacrificed when it is more important to automate tests to reach this goal. Relying too much on these automated tests often allows bugs to be released that a human would have caught instantly. Less of the rich, manual testing gets done because those cycles are taken up with maintenance.

Measuring individuals and teams by these sorts of standards is almost always counter-productive. SMART goals and other devices used on performance appraisals map nicely to numbers and percentages, but there is little in life we do that can be accurately mapped to a two-variable graph. There are lots of other factors beyond our control, known as “spurious variables”. At some point, measuring people against something that is always just beyond their reach will cause them to behave in unintended ways. There are even Dilbert cartoons about rewarding developers for the number of bugs fixed, and measuring testers on bugs found is equally counter-productive. Both can lead to a breakdown within a team and to product quality suffering.

When I talk to the people who decide on these goals, in most cases the actual goal they set isn’t the intended result they are looking for. If a senior manager says to me that they have a goal of zero defects, I ask why.

Usually there is a quality issue with the software being delivered that is angering customers, and they desperately want it fixed. I reframe the issue: “Would it be OK if the software you delivered was reliable and robust enough that your customers are happy? Why not make happy customers who can rely on our software our goal?” Often, that is a satisfactory goal for the manager, and something that is reasonable to shoot for. It is also helpful to look at goals over the long term, and to aim for consistent attempts at improvement.

When I hear things like “100% test automation” and question further, it is almost always about efficiency. So I reframe the issue: “We need to look for ways to be more efficient in our testing. Why don’t we analyze our testing processes (both manual and automated), choose the most efficient methods we can, and keep working for more efficiency? Why not make efficiency our goal?” In some cases, strategic manual testing may be more efficient and cost effective than automated testing. In many projects, especially at the beginning of a test automation effort, a little test automation can go a long way.

The true motivation behind these goals is important to understand. In many cases I’ve seen the numbers line up nicely with the goals, only to have the intended but unexpressed goal fail. “That’s great that you have zero defects when shipping, but our customers are unhappy!” an executive might say. Mary Poppendieck says: “Measure Up!” This means measure what is really important. Look at the big picture. If you measure details too much, you may miss the big picture. If you do set details-oriented goals, carefully analyze resources, schedules and the people on the project instead of just picking a number to shoot for. Be sure to measure how the goals feed the big picture. If they aren’t helping contribute to the big picture (the bottom line, happy customers, a happy, healthy, productive team), drop them. It might be surprising under scrutiny to see how many of these unrealistic goals are barriers to a good bottom line, happy customers and a happy, healthy, productive team.

Many process certified projects with wonderful charts and graphs and SMART goals all over the place release crummy products that customers quietly stop using. After all the cheering over process numbers fades and the company finds it can’t sell products like it used to, people wonder why. If we measure product success instead of adherence to a process, and measure how the project feeds the bottom line instead of things like “Zero Defects”, we might end up with better results.

Occam’s Butter Knife

When I work in test automation, I always seek the most simple solution that will work for the task at hand. Extreme Programming’s STTCPW (simplest thing that can possibly work) and YAGNI (you aren’t going to need it) are principles I’ve applied to test automation. Occam’s Razor is a principle that can be interpreted as having the same meaning as STTCPW. Simple tests are something I value because I don’t:

  • Want complex test code that can be a source for bugs
  • Have a lot of time for maintenance
  • Wish to have tests I can’t throw away easily

Recently, while automating a complex functional test for a tester, I ran into a difficult problem. I was retro-fitting a test into a framework and I couldn’t figure out how to translate the model into automated test code. I struggled for a while, and paired with the tester (who was a non-programmer) and described my problem. I was thinking about a Decorator pattern, and data structures and algorithms, but couldn’t make this model fit. I was trying to get a simple solution together, but I had constraints.

The tester patiently listened as I diagrammed on a whiteboard and explained what I was trying to do, and when I was finished said: “Forgive me for stating the obvious…” and paused. I said, “What may be obvious to you may be something I haven’t considered; please continue.” The tester explained that I had already created that model when I automated test data generation, and asked why I couldn’t re-use it. They were right on the money. The solution didn’t involve any design patterns or clever tricks; I just had to grab the data from a different area of the test. The solution was sitting right there in front of me, but I was looking at the problem in the wrong way.
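As a rough sketch of the kind of re-use the tester suggested (the class names are hypothetical, not the actual project code), the functional test can simply pull its expected values from the test data generator that already exists elsewhere in the suite:

```ruby
# Hypothetical sketch: re-use the existing test data generator instead of
# building a new model inside the functional test. TestDataGenerator and
# AppDriver are invented stand-ins.
require 'test/unit'

class ComplexFunctionalTest < Test::Unit::TestCase
  def setup
    @data = TestDataGenerator.build_accounts(count: 10)  # existing generator, re-used
    @app  = AppDriver.connect('test-environment-1')      # placeholder driver
  end

  def test_report_matches_generated_data
    report = @app.run_account_report
    # The generated data already describes what the report should contain,
    # so no separate model of the report is needed.
    assert_equal(@data.map(&:account_number).sort, report.account_numbers.sort)
  end
end
```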

Non-programming testers can offer a lot to development efforts. Their minds are not cluttered with design patterns, xUnit frameworks, data structures and algorithms, which allows them to see things in a different way. When I’m up to my eyeballs in code, I sometimes stop thinking like a tester. In spite of my wish for a simple design, I had needlessly overcomplicated my automated test development. So much for my attempt to apply Occam’s Razor – I needed a tester to help me see the simple solution. When I’m a tester working in test automation, I sometimes lose that “tester’s perspective”. Collaboration is key, and it also helps me when I’m not particularly sharp.

Testing an Application in Layers

There is often debate about test automation versus manual testing. When I think about testing, I look at an application in 3 broad layers: the code (on the machine side), the system (where the finished software lives), and the visible layer, or how the software is used from an end user’s perspective. I often call this visible layer the social context because of the environment much end-user software is used in. When we spend a lot of time in one context, testing starts to specialize because we concentrate on part of the picture.

When we view an application from the source code view, the testing is dominated by automation. When we look at the system context (as some of my operations friends do), testing involves integration in a system, and testing hardware, firmware, drivers, etc. to make sure the software gets served up correctly. Automated testing tends to get more complex the more we move from the code to the user interface. Attempting to emulate user actions is difficult, and high-volume automated functional tests can involve massive amounts of automated test code. This can be problematic to maintain. Sometimes functional tests become so complex that they involve as much code as the software they are testing: code to serve up a component, test data generation code, and the code that attempts to emulate user actions.

Personally, I agree with Cem Kaner and call automated testing “computer-assisted testing”. The computer is a tool I use in conjunction with good manual testing. Until machines are intelligent, we can’t really automate testing. We can automate some aspects of testing to help maximize problem-solving efficiency.

Traditionally, software testers tend to have a handle on the social context, or how the software is used in a business context. As a result, much conventional testing is focused on the visible layer of the application. I tend to prefer testing at various layers in an application. I value testing components in isolation as well as testing software within a system. There are advantages and drawbacks to both. While I value isolation, testing at the visible UI layer, particularly brain-engaged manual testing, has merit. Sometimes testers focus a great deal on testing in this context when component isolation might be more efficient. Often, traditional testers hope for an automated testing tool that can do the work of a tester. I’ve yet to see this occur successfully, but there are still tasks that can be automated that are a big help to testing efforts.
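A small, hypothetical sketch of what testing the same rule at two layers might look like; DiscountCalculator and UiDriver are invented stand-ins, not classes from any particular product:

```ruby
# Hypothetical sketch of testing one rule at two layers: once against a
# component in isolation (code layer), once through the visible layer.
require 'test/unit'

class DiscountAtTwoLayersTest < Test::Unit::TestCase
  # Code layer: fast, isolated, a natural candidate for automation.
  def test_discount_calculation_in_isolation
    calculator = DiscountCalculator.new(customer_type: :preferred)
    assert_equal(90.00, calculator.price_for(100.00))
  end

  # Visible layer: slower and more fragile, but closer to how a person uses it.
  def test_discount_shown_in_the_ui
    ui = UiDriver.launch('order-entry-screen')
    ui.select_customer('Preferred Customer')
    ui.enter_item(price: 100.00)
    assert_equal('$90.00', ui.displayed_total)
  end
end
```

The code-layer test is cheap to run often; the UI-layer check exercises the visible layer a person actually uses, and brain-engaged manual testing still covers what neither automated layer can.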

Frequently there are bugs that are difficult to track down that crop up in the visible layer of the application. These are due to the visible application at runtime becoming greater than the sum of its parts. There is a kind of chaos-theory situation that occurs when the application is used in a way that the underlying code may not be designed to handle. By the time a minor fault at the code level bubbles up to the UI, it may have rippled through the application, causing a catastrophic failure. Unfortunately, these kinds of usage-driven faults are problems that automated tests at various layers do not tend to catch. Often it is some sort of strange timing issue, as I note in this article. Other times, it’s due to actions undertaken in a social environment by an unpredictable, cognitive human. These variable actions, motivated by inductive reasoning and driven by tacit knowledge, are difficult for others to repeat.

Focusing too much on one testable interface in an application can skew our view. If we view the application through the code most of the time, we have a much different picture than if we view it through the UI. Try flipping the model of an application on its side instead of viewing it bottom up (from the code) or top down (from the UI). You may discover new areas to test that require different techniques, some of which are great candidates for automation, while others require manual, human testing.

User Profiles and Exploratory Testing

Knowing the User and Their Unique Environment

As I was working on the Repeating the Unrepeatable Bug article for Better Software magazine, I found consistent patterns in cases where I have found a repeatable case to a so-called “unrepeatable bug”. One pattern that surprised me was how often I do user profiling. Often, one tester or end-user sees a so-called unrepeatable bug more frequently than others. A lot of my investigative work in these cases involves trying to get inside an end-user’s head (often a tester) to emulate their actions. I have learned to spend time with the person to get a better perspective on not only their actions and environment, but their ideas and motivations. The resulting user profiles fuel ideas for exploratory testing sessions to track down difficult bugs.

Recently I was assigned the task of tracking down a so-called unrepeatable bug. Several people with different skill levels had worked on it with no success. With a little time and work, I was able to get a repeatable case. Afterwards, when I did a personal retrospective on the assignment, I realized that I was creating a profile of the tester who had come across the “unrepeatable” cases that the rest of the dev team did not see. Until that point, I hadn’t realized to what extent I was modeling the tester/user when I was working on repeating “unrepeatable” bugs. My exploratory testing for this task went something like this.

I developed a model of the tester’s behaviour through observation and some pair testing sessions. Then, I started working on the problem and could see the failure very sporadically. One thing I noticed was that this tester did installations differently than others. I also noticed what builds they were using, and that there was more of a time delay between their actions than with other testers (they often left tasks mid-stream to go to meetings or work on other tasks). Knowing this, I used the same builds and the same installation steps as the tester; I figured out that part of the problem had to do with a Greenwich Mean Time (GMT) offset that was set incorrectly in the embedded device we were testing. Upon installation, the system time was set behind our Mountain Time offset, so the system time was back in time. This caused the system to reboot in order to reset the time (known behavior, working properly). But, as the resulting error message told me, there was also a kernel panic in the device. With this knowledge, I could repeat the bug about every two out of five times, but it still wasn’t consistent.

I spent time in that tester’s work environment to see if there was something else I was missing. I discovered that their test device had connections that weren’t fully seated, and that they had stacked the embedded device on both a router and a power supply. This caused the device to rock gently back and forth when you typed. So, I went back to my desk, unseated the cables so they barely made a connection, and—while installing a new firmware build—tapped my desk with my knee to simulate the rocking. Presto! Every time I did this with a same build that this tester had been using, the bug appeared.

Next, I collaborated with a developer. He went from, “that can’t happen,” to “uh oh, I didn’t test if the system time is back in time, *and* that the connection to the device is down during installation to trap the error.” The time offset and the flakey connection were causing two related “unrepeatable” bugs. This sounds like a simple correlation from the user’s perspective, but it wasn’t from a code perspective. These areas of code were completely unrelated and weren’t obvious when testing at the code level.

The developer thought I was insane when he saw me rocking my desk with my knee while typing to repeat the bug. But when I repeated the bugs every time, and explained my rationale, he chuckled and said it now made perfect sense. I walked him through my detective work, how I saw the device rocking out of the corner of my eye when I typed at the other tester’s desk. I went through the classic conjecture/refutation model of testing where I observed the behavior, set up an experiment to emulate the conditions, and tried to refute my proposition. When the evidence supported my proposition, I was able to get something tangible for the developer to repeat the bug himself. We moved forward, and were able to get a fix in place.

Sometimes we look to the code for sources of bugs and forget about the user. When one user out of many finds a problem, and that problem isn’t obvious in the source code, we dismiss it as user error. Sometimes my job as an exploratory tester is to track down the idiosyncrasies of a particular user who has uncovered something the rest of us can’t repeat. Often, there is a kind of chaos-theory effect at the user interface, where only a particular user has the right unique recipe to cause a failure. Repeating the failure accurately not only requires having the right version of the source code and having the test system deployed in the right way, it also requires that the tester knows what that particular user was doing at that particular time. In this case, I had all three, but emulating an environment I assumed was the same as mine was still tricky. The small differences in test environments, when coupled with slightly different usage by the tester, made all the difference between repeating the bug and not being able to repeat it. The details were subtle on their own, but each nuance, when put together, amplified the others until the application hit something it couldn’t handle. Simply testing the same way we had been in the tester’s environment didn’t help us. Putting all the pieces together yielded the result we needed.

Note: Thanks to this blog post by Pragmatic Dave Thomas, this has become known as the “Knee Testing” story.


Quick Update

I’ve been blogging less lately; have had a lot on the go. I’m also spending more time writing code and will spare you the pain of having to read source code in a blog post.

Some quick points:

Software Testing and Scrum

Edit: I wrote an article for InformIT on this topic, which was published Sept. 30, 2005.

I’ve been getting asked lately about how a software testing or QA department fits when a development team adopts Scrum. I’ll post my experiences of working as a conventional tester on a variety of Scrum projects. Stay tuned for more posts on this subject.

TDD in Test Automation Projects

I’ve written before about pairing with developers during Test-Driven Development (TDD), and I’ve been fortunate to work with very talented TDD developers who are apt to teach. I’ve learned a lot, and decided to try TDD in a programming role. Recently, I’ve taken off my tester hat and started doing TDD myself on test automation projects. I’m not completely there yet – I often need to do an architectural spike first when I’m developing something new. Once I have figured out a general design, or have learned how a particular library works, I throw away the spike code and start development by writing a test. I then write enough code to get the test to pass, write a new test, add new code, and repeat until the design is where I need it to be.
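Here is a minimal sketch of that red/green cycle in test::unit style, applied to a small, hypothetical test-utility class (a log parser of the sort an automation framework might need). The tests come first and drive just enough code to pass:

```ruby
# Minimal TDD sketch: the tests below were (notionally) written first, then
# LogParser was given just enough behavior to make them pass.
require 'test/unit'

class LogParser
  def initialize(text)
    @text = text
  end

  # Return the ERROR lines from a captured log, stripped of whitespace.
  def error_lines
    @text.lines.select { |line| line.include?('ERROR') }.map(&:strip)
  end
end

class LogParserTest < Test::Unit::TestCase
  def test_finds_error_lines
    parser = LogParser.new("INFO start\nERROR disk full\nINFO done\n")
    assert_equal(['ERROR disk full'], parser.error_lines)
  end

  def test_returns_empty_list_when_no_errors
    parser = LogParser.new("INFO start\nINFO done\n")
    assert_equal([], parser.error_lines)
  end
end
```

The next failing test (say, for a new log format) would drive the next small change to LogParser, and so on.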

So what does this gain in test automation projects? I’m loath to have test cases that are so complex that they themselves require testing. If our test cases are so complex that they are causing problems themselves, that’s a test design smell. However, there are other kinds of software in our automation projects than just the test cases. In automation frameworks, we need special libraries, adaptors, ways to access the application we need to test, and all sorts of utilities that help us with automation. Since these utilities are still software, they are subject to the same problems as any other software development effort. Sometimes there is nothing more frustrating than buggy test code, so we need to do what we can to make it as reliable as possible.

In my own development, I’m finding a lot of benefits from doing TDD. My designs improve, because if they aren’t testable, I know there is a problem. When I make my code testable, it suddenly becomes more usable and more reliable. Too often testers aren’t given time to refactor test code. I’ve found that refactoring usually starts when technical debt in a test harness and custom test library starts to interfere with productivity. When there are no unit tests for custom test library code, it can take a few minutes to change the code and several hours to test it. Having a safety net of unit tests helps immensely with refactoring. You can refactor your test code with greater confidence, and when it’s done consistently with automated unit tests, with much greater speed. It just becomes a normal part of development.

Recently, I’ve found several bugs in my test library code in the elaborative phase of TDD that I didn’t find testing manually. The opposite is also true: I tested libraries I was developing manually and found a couple of bugs that my unit tests didn’t uncover. The TDD-discovered bugs required design changes that helped my design immensely. The tests guided the design into something different (and better) than what I had in my head when I started. However, after a couple of days of only running the unit tests to satisfy the “green bar”, I found a big hole that was only uncovered by using the library the way an end user would. The balance of testing techniques is helpful. I have also adopted a practice of pairing a positive test with a negative test, a technique I learned from John Kordyback. If I do an assert_equal, for example, I also do an assert_not_equal (using test::unit style). This has really come in handy at times when one assertion would pass but the other would fail.
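A small sketch of that positive/negative pairing in test::unit style; Configuration and its load method are hypothetical stand-ins for whatever the custom test library actually returns:

```ruby
# Sketch of pairing a positive assertion with a negative one.
require 'test/unit'

class ConfigurationPairingTest < Test::Unit::TestCase
  def setup
    @config = Configuration.load('test-environment-1')  # hypothetical helper
  end

  def test_environment_name_positive
    assert_equal('test-environment-1', @config.environment_name)
  end

  def test_environment_name_negative
    # The paired negative check: it catches cases where a stub or default
    # value would make the positive assertion pass for the wrong reason.
    assert_not_equal('production', @config.environment_name)
  end
end
```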

TDD may not be for everyone, but I find it is a nice complement to other kinds of testing we can do. For my own work, it seems to suit the way I think about development. Even if the test cases I develop while programming are trivial and small, there is strength in numbers, and writing and using them helps assuage that tester voice in the back of my head that comes out when programming. I encourage other conventional testers who work on automation projects to give it a try. You may find that your designs improve, and that you have a safety net of automated tests to give you more confidence in your automation code, especially when you need to enhance it down the road. At the very least, it helps you gain an appreciation for TDD, and it helps you communicate when working with TDD folks on a project.