
Software Testing 2.0?

For so many years the Quality Assurance ideal has dominated software testing. “QA”-flavored software testing often feels like equal parts Factory School and Quality School thrown together. When I was starting out as a tester, I quickly learned through hard experience that a lot of popular software testing thought was built on folklore. I wanted results, and I didn’t like process police, so I often found myself at odds with the Quality Assurance community. I read the writings of Cem Kaner, James Bach and Brian Marick, and worked on my testing skill.

When the Agile movement kicked into gear, I loved the Agile Manifesto and the values agile leaders were espousing, but the Agile Testing community rehashed a lot of the same old Factory School folklore. Instead of outsourcing testing to lesser-skilled, cheaper human testers, testing was often outsourced to automated testing tools. While there were some really cool ideas, “Agile Testing” ideals still frequently implied that testing required no skills other than programming. I was often surprised at how “Agile Testing” thinking was drawn to the old Factory School ideas, like oppositely charged magnets. As a big proponent of skilled testing, I found I was often at odds with “Agile Testers”, even though I agreed with the values and ideals behind the movement. Testing in that community did not always feel “agile” to me.

Then Test-Driven Development really got my attention. I worked with some talented developers who taught me a lot, and who wanted to work with me. They told me they wanted me on their projects because I thought like they did. I was still a systems thinker, but I came at the project from a different angle. Instead of confirming that their code worked, I creatively thought of ideas to see if it might fail. They loved those ideas because they helped them build more robust solutions, and in turn, they taught me a lot about testing through TDD. I learned that TDD doesn’t have much to do with testing in the way I’m familiar with, but it is still a testing school of thought. It is focused on the code context, while I tend to do more testing from user contexts. Since I’m not a developer, and TDD is predominantly a design tool, I wasn’t a good candidate for membership in the TDD community.

The Context-Driven Testing School is a small, influential community. Its founders all had an enormous influence on my career as a tester. This community has built up and taught skilled testing, and has influenced other communities. Everywhere I go, I meet smart, talented, thoughtful testers. In fact, I am meeting so many that I believe a new community is springing up in testing: a community born of experience, pragmatism and skill. Testers with different skillsets and ideas are converging and sharing ideas. I find this exciting.

I’m meeting testers in all sorts of roles, and often the thoughtful, skilled ones aren’t necessarily “QA” folks. For example, some of my thought-leader testing friends are developers who are influenced by TDD. Some skilled testers I meet are test automation experts, some are technical writers, some are skilled with applying exploratory testing concepts. All are smart, talented and have cool ideas. I am meeting more and more testers from around the world with different backgrounds and expertise who share a common bond of skill. I’m beginning to believe that a new wave of software testing is coming, and a new skills-focused software testing community is being formed through like-minded practitioners all over the world. This new community growing in the software development world is driven by skilled testers.

This is happening because skilled testers are sharing ideas. They are sharing their results by writing, speaking, and practicing skilled testing. Results mean something. Results build confidence in testers, and in the people who work with them. Skill prevails over process worship, methodology worship and tool worship. I’ve said before that skilled software testing seems to transcend the various methodologies and processes and adds value on any software development project. I’m finding that other testers are discovering this as well. This new wave of skilled testers could be a powerful force.

Are you frustrated with the status quo of software testing? Are you tired of hearing the same hollow maxims like “automate all tests”, “process improvement” and “best practices”? Do you feel like something is missing in the Quality Assurance and Agile communities when it comes to testing? Do you feel like you don’t fit in a community because of your views on testing? You aren’t alone. There are many others who are working on doing a better job than we have been doing for the past few years. Let’s work together to push skilled software testing as far as it will go. Together, we are creating our own community of practice. The “second version” of software testing has begun to arrive.

Reckless Test Automation

The Agile movement has brought some positive practices to software development processes. I am a huge fan of frequent communication, of delivering working software iteratively, and of strong customer involvement. Of course, before “Agile” became a movement, a self-congratulating community, and a fashionable term, there were companies following “small-a agile” practices. Years ago in the ’90s I worked for a startup with a CEO who was obsessed with iterative development, frequent communication and customer involvement. The Open Source movement was an influence on us at that time, and today we have the Agile movement helping create a shared language and a community of practice. We certainly could have used principles from Scrum and XP back then, but we were effective with what we had.

This software business involves trade-offs though, and for all the good we can get from Agile methods, vocal members of the Agile community have done testing a great disservice by emphasizing some old testing folklore. One of these concepts is “automate all tests”. (Some claimed agilists have the misguided gall to claim that manual testing is harmful to a project. Since when did humans stop using software?) Slavishly trying to reach this ideal often results in Reckless Test Automation. Mandates of “all”, “everything” and other universal qualifiers are ideals, and without careful, skillful implementation they can promote thoughtless behavior that hinders goals and needlessly costs a lot of money.

To be fair, the Agile movement says nothing officially about test automation to my knowledge, and I am a supporter of the points of the Agile Manifesto. However, the “automate all tests” idea has been repeated so often and so loudly in the Agile community that I am starting to hear it equated with so-called “Agile Testing” as I work in industry. In fact, I am now starting to do work to help companies undo problems associated with over-automation. They find they are unhappy with their results over time while trying to follow what they interpret as an “Agile Testing” ideal of “100% test automation”. Instead of an automation utopia, they find themselves stuck in a maintenance quagmire of automation code and tools, and the product quality suffers.

The problems, like the positives of the Agile movement, aren’t really new. Before Agile was all the rage, I helped a company that had spent six years developing automated tests. They had bought the lie that vendors and consultants spouted: “automate all tests, and all your quality problems will be solved”. In the end, they had three test cases developed, with an average of 18 000 lines of code each, and no one knew what their intended purpose was or what they were supposed to be testing, only that it was very bad if they failed. Trouble was, they failed a lot, and it took testers anywhere from three to five days to hand-trace the code to track down failures. Excessive use of unrecorded random data sometimes made this impossible. (Note: random data generation can be an incredibly useful tool for testing, but like anything else, it should be applied with thoughtfulness.) I talked with decision makers and executives, and the whole point of buying a tool and implementing it was to help shorten the feedback loop. In the end, the tool greatly lengthened the testing feedback loop, and worse, the testers spent all of their time babysitting and maintaining a brittle, unreliable tool instead of doing any real, valuable testing.
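
As an aside, the unrecorded random data problem has a simple remedy: record the seed. Here is a rough sketch, with invented names and values (not from that project), of logging the seed used to generate random test data so a failing run can be replayed exactly:

    import java.util.Random;

    // Sketch: record the seed used to generate random test data so that any
    // failing run can be replayed. The account-number generation is a made-up
    // stand-in for whatever inputs a real suite would feed to the product.
    public class SeededTestData {
        public static void main(String[] args) {
            // Reuse a seed passed on the command line to replay an old run;
            // otherwise pick a new one and log it alongside the test results.
            long seed = (args.length > 0) ? Long.parseLong(args[0])
                                          : System.currentTimeMillis();
            System.out.println("Test data seed: " + seed);

            Random random = new Random(seed);
            for (int i = 0; i < 5; i++) {
                int accountNumber = 100000 + random.nextInt(900000);
                System.out.println("Generated account number: " + accountNumber);
            }
        }
    }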

How did I help them address the slow testing feedback loop? First, I de-emphasized complete reliance on test automation, and encouraged more systematic, risk-based manual exploratory testing that was speedy. This helped tighten up the feedback loop, and now that we had intelligence behind the tests, bug report numbers went through the roof. Next, we reduced the automation stack and implemented new tests that were designed for quick feedback and lower maintenance. We used the tool to complement what the skilled human testers were doing. We were very strict about just what we automated. We asked a question: “What do we potentially gain by automating this test? And, more importantly, what do we lose?” The results? Feedback on builds was reduced from days to hours, and we had same-day reporting. We also had much better bug reports, and frankly, much better overall testing.

Fast-forward to the present. I am still seeing thoughtless test automation, but this time under the “Agile Testing” banner. When I see reckless test automation on Agile teams, the behavior is the same; only the tools and ideals have changed. My suggestions to work towards solutions are the same: de-emphasize thoughtless test automation in favor of intelligent manual testing, and be smart about what we try to automate. Can a computer do this task better than a human? Can a human do it with results we are happier with? How can we harness the power of test automation to complement intelligent humans doing testing? Can we get test automation to help us meet overall goals instead of thoughtlessly trying to fulfill something a pundit says in a book or presentation or on a mailing list? Are our test automation efforts helping us save time, and helping us provide the team the feedback they need, or are they hindering us? We need to constantly measure the effectiveness of our automated tests against team and business goals, not “percentage of tests automated”.

In one “Agile Testing” case, a testing team spent almost all of their time working on an automation effort. An Agile Testing consultant had told them that if they automated all their tests, it would free up their manual testers to do more important testing work. They had automated user acceptance tests, and were trying to automate all the manual regression tests to speed up releases. One release went out after the automated tests all passed, but it had a show-stopping, high-profile bug that was an embarrassment to the company. The automated tests passed, but they couldn’t notice something suspicious and explore the behavior of the application the way a human could. In this case, the bug was so obvious that a half-way decent manual tester would have spotted it almost immediately. Getting a computer to spot the problem through investigation would have required Artificial Intelligence, or a very complex fuzzy logic algorithm in the test automation suite, to replace one quick, simple, inexpensive, adaptive, yet powerful human test. The automation wasn’t freeing up time for testers; it had become a massive maintenance burden over time, so there was little human testing going on other than superficial reviews by the customer after sprint demos. Automation was king, so human testing was de-emphasized and even looked on as inferior.

In another case, developers were so confident in their TDD-derived automated unit tests that they had literally gone for months without any functional testing, other than occasional acceptance tests by a customer representative. When I started working with them, they first defied me to find problems (in a joking way), and then were completely flabbergasted when my manual exploratory testing did find problems. They would point wide-eyed to the green bar in their IDE signifying that all their unit tests had passed. They were shocked that simple manual test scenarios could bring the application to its knees, and it took quite a while to get them to do some manual functional testing alongside their automated testing. It took them a while to set their automation dogma aside, become more pragmatic, and then figure out how to incorporate important issues like state into their test efforts. When they did, I saw a marked improvement in the code they delivered once stories were completed.

In another “Agile Testing” case, the testing team had put enormous effort into automating regression tests and user acceptance tests. Before they were through, they had more lines of code in the test automation stack than in the product it was supposed to be testing. Guess what happened? The automation stack became buggy, unwieldy, unreliable, and displayed the same problems that any software development project suffers from. In this case, the automation was done by the least skilled programmers, with a much smaller staff than the development team. To counter this, we did more carefully planned, well-thought-out manual exploratory testing, and threw out buggy automation code that was regression-test focused. A lot of those tests should never have been automated in that context, because a human is much faster and far better at many kinds of tests. Furthermore, we found that the entire test environment had been optimized for the automated tests. The inherent system variability that the computers couldn’t handle (but humans could!), not to mention quick visual checks (which computers don’t do well), had been factored out wherever possible. We did not have a system in place that was anything close to what any of our customers used, but the automation worked (somewhat). Scary.

After some rework on the testing process, we found it cheaper, faster and more effective to have humans do those tests, and we focused more on leveraging the tool to help achieve the goals of the team. Instead of trying to automate the manual regression tests that were originally written for human testers, we relied on test automation to provide simulation. Running simulators and manual testing at the same time was a powerful investigative tool. Combining simulation with observant manual testing revealed false positives in some of the automated tests, problems which had unwittingly been released to production in the past. We even extended our automation to include high volume test automation, and we were able to greatly increase our test effectiveness by really taking advantage of the power of tools. Instead of trying to replicate human activities, we automated things that computers are superior at.
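
To give a flavour of what taking advantage of the power of tools can look like, here is a small, generic sketch of a high volume check. The round-trip property and the numbers are invented stand-ins, not that project’s actual tests: generate millions of inputs and verify a property that must always hold, the kind of brute-force verification computers are far better at than people.

    import java.util.Random;

    // Sketch of high volume test automation: check a round-trip property over
    // millions of generated inputs. The property here (format an integer as
    // text, then parse it back) stands in for whatever conversion the product
    // under test actually performs.
    public class RoundTripCheck {
        public static void main(String[] args) {
            long seed = System.currentTimeMillis();
            Random random = new Random(seed);
            System.out.println("Seed: " + seed); // logged so failures can be replayed

            for (int i = 0; i < 5000000; i++) {
                int original = random.nextInt();
                int roundTripped = Integer.parseInt(Integer.toString(original));
                if (roundTripped != original) {
                    System.out.println("Round-trip failed for " + original);
                    return;
                }
            }
            System.out.println("5,000,000 round-trips checked.");
        }
    }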

Don’t get me wrong – I’m a practitioner and supporter of test automation, but I am frustrated by reckless test automation. As Donald Norman reminds us, we can automate some human tasks with technology, but we lose something when we do. In the case of test automation, we lose thoughtful, flexible, adaptable, “agile” testing. In some tasks, the computer is a clear winner over manual testing. (Remember that the original “computers” were humans doing math – specifically calculations. Technology was used to automate computation because it is a task we weren’t doing so well at. We created a machine to overcome our mistakes, but that machine is still not intelligent.)

Here’s an example. On one application I worked on, it took close to two weeks for testers to do manual credit card validation. This work was error prone (we aren’t that great at number crunching, and we tire doing repetitive tasks). We wrote a simple automated test suite to do the validation, and it took about half an hour to run. We then complemented the automated test suite with thoughtful manual testing. After an hour and a half of both automated testing (pure number crunching) and manual testing (usually scenario testing), we had a lot of confidence in what we were doing. We found this combination much more powerful than pure manual testing or pure automated testing. And it was faster than the old way as well.
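
The heart of that kind of suite is simple enough to sketch. This isn’t the production code, just a simplified illustration in JUnit using the standard Luhn checksum and well-known public test numbers:

    import org.junit.Test;
    import static org.junit.Assert.*;

    // Simplified sketch of the kind of number crunching worth automating:
    // a Luhn checksum validation over known-good and known-bad card numbers.
    // The numbers below are standard public test numbers, not real cards.
    public class CardValidationTest {

        static boolean passesLuhn(String number) {
            int sum = 0;
            boolean doubleIt = false;
            // Walk the digits from right to left, doubling every second one.
            for (int i = number.length() - 1; i >= 0; i--) {
                int digit = number.charAt(i) - '0';
                if (doubleIt) {
                    digit *= 2;
                    if (digit > 9) digit -= 9;
                }
                sum += digit;
                doubleIt = !doubleIt;
            }
            return sum % 10 == 0;
        }

        @Test
        public void acceptsWellFormedNumbers() {
            assertTrue(passesLuhn("4111111111111111")); // common Visa test number
            assertTrue(passesLuhn("5500005555555559")); // common MasterCard test number
        }

        @Test
        public void rejectsCorruptedNumbers() {
            assertFalse(passesLuhn("4111111111111112")); // last digit off by one
        }
    }

This is exactly the kind of pure number crunching a computer does perfectly and tirelessly; the interesting scenario testing stayed with the humans.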

When automating, look at what you gain by automating a test, and what you lose. Remember, until computers become intelligent, we can’t automate testing, only tasks related to testing. Also, as we move further away from the code context, it usually becomes more difficult to automate tests, and the trade-offs have greater implications. It’s important to design automated tests to meet team goals, and to be aware of the potential for enormous maintenance costs in the long term.

Please don’t become reckless trying to fulfill an ideal of “100% test automation”. Instead, find out what the goals of the company and the team are, and see how all the tools at your disposal, including test automation, can be harnessed to help meet those goals. “Test automation” is not a solution, but one of many tools we can use to help meet team goals. In the end, reckless test automation leads to feckless testing.

Update: I talk more about alternative automation options in my Man and Machine article, and in chapter 19 of the book Experiences of Test Automation.

Rick Brenner

I met Rick Brenner at Consultants Camp. I’ve been following his writing ever since, and I get a lot out of what he has to say. If you haven’t read his work before, check out his article on “The Tweaking CC”. It is a good article, and the message is important.

I look forward to his email newsletter “Point Lookout” every week, and I’ve gotten much more out of what Rick has to say than out of many of the popular “management” books.

RE: Model Myopia in Testing

Chris McMahon asks:

How do I find a useful model when unit tests are passing and show 100% code coverage?

Chris asked me this a bit tongue-in-cheek because he has heard me rant about these sorts of things before. I’m glad he asked the question though, because it brings up a different kind of model myopia. In yesterday’s post, we looked at model myopia with regard to the application we are testing. This question brings up model myopia regarding our potential testing contexts.

This is a common question, particularly on agile teams that use automated unit testing and are struggling with a tricky bug. Let’s examine this situation in more detail. What does the claim in the question tell us about testing?

This claim provides information about a certain testing context, the code context. We know that within this testing context, we have a coverage measure that claims 100%, and that the tests achieving this coverage pass. Is this enough information? We have a quantitative claim, but this doesn’t tell us much from a qualitative perspective.

Why does this matter?

One project I saw had 300 automated unit tests that achieved a high level of code coverage on a small application. When we did testing in another context (the user context, through a Graphical User Interface), the application couldn’t pass basic tests. On further investigation, we found that most of the unit tests were programmed to pass, no matter what the result of the test was. The developers were measured on the number of automated unit tests they wrote and on the code coverage attained as reported by a coverage tool, and they could only check in code when the tests passed. As a measurement maxim says: show me how people are measured, and I’ll show you how they behave.
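
A simplified, hypothetical reconstruction of what those tests looked like (this is not the project’s actual code) shows why the numbers meant so little:

    import org.junit.Test;
    import static org.junit.Assert.*;

    // The anti-pattern: a "test" that exercises the code (so coverage tools
    // count it) but can never fail, because any failure is swallowed and the
    // assertion is hard-coded. All names here are invented.
    public class AlwaysGreenTest {

        @Test
        public void processesOrderTotal() {
            try {
                double total = new OrderCalculator().total(3, 9.99);
                System.out.println("total = " + total);   // looked at by nobody
            } catch (Exception ignored) {
                // any failure is silently discarded
            }
            assertTrue(true);   // the bar is always green
        }

        // Minimal stand-in so the sketch compiles on its own.
        static class OrderCalculator {
            double total(int quantity, double unitPrice) {
                return quantity * unitPrice;
            }
        }
    }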

This is an extreme example, but it underlines how dangerous numbers can be without a context. Sure, we had a number of passing tests with a high degree of code coverage on a simple application, but the numbers were almost meaningless from a management perspective because the tests were of poor quality, and many of them didn’t really test anything. I’ve seen reams of automated test code that didn’t test anything too many times to take a quantitative claim on its own. (I’ve also seen a lot of manual tests that didn’t test much either.)

Now what happens if we agree that the unit tests are sufficiently good? We don’t have ridiculous measures like SMART goals where we “pay for performance” on automated unit tests, code coverage, bug counts, etc., so we don’t have the “meet the numbers” mentality that frustrates our goals. We know that our tests are well vetted, well thought out and have helped steer our design. Our developers are intrinsically motivated to write solid tests, and they are talented and trustworthy. What now? All the tests passed. Where can the problem be?

It’s important to understand that “all the tests we could think of at a particular time passed”, not that “all the possible tests we could ever think of, express in program code and run have passed”. This is an important distinction. Brian Marick has good stories about faults of omission, and reminds us that a test is only as good as what it tests. It doesn’t test what we didn’t write in the first place. Narrow modeling is usually the culprit, and once we learn more about the problem we can write a test for it. It’s hard to write a test for a particular problem that has stumped us when we don’t have enough information yet, or to predict all possible problems up front when doing TDD.
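
Here is a tiny, invented illustration of a fault of omission: the test executes every line of the function, so a coverage tool happily reports 100%, yet it never asks about an input nobody thought of.

    import org.junit.Test;
    import static org.junit.Assert.*;

    // Invented example: full line coverage, and still a fault of omission.
    public class ShippingCostTest {

        // Flat rate plus a per-kilogram charge; no guard for invalid weights.
        static double shippingCost(double weightKg) {
            return 5.00 + weightKg * 1.25;
        }

        @Test
        public void chargesFlatRatePlusPerKilogram() {
            assertEquals(7.50, shippingCost(2.0), 0.001);  // covers every line...
            // ...but shippingCost(-3.0) happily returns a "cost" of 1.25,
            // and no test ever asks the question.
        }
    }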

In other cases, I have seen automated tests pass because the framework was being used incorrectly, which caused false positives. This is rare, but I have seen it happen.

Michael Bolton has summed up automated unit testing assumptions eloquently:

When someone says “all the unit tests passed”, we should hear “the unit tests that we thought of, for theories of error of which we were aware, and that we considered important enough to write given the time we had, and that we presume to have run on the units of the product that we assume to have been built, and that we presume actually test something, and that we presume provide a trustworthy result 100% of the time, passed.”

Testing Contexts

Another important distinction to make is that we know this information comes from one testing context, the code context. We know there are others. I split an application with a user interface into three broad contexts: the code, the system the application is installed in, and the user interface. We know that at the user interface layer, the application is greater than the sum of its parts. It will often react differently when tested from this context than it will from the code context. There are many potential testable interfaces within each context that can provide different information when testing. From which interfaces, and in which contexts, are we gathering test-generated information?

When I see a claim like “we have 100% of our automated unit tests passing with 100% code coverage”, that only tells me about one of three possible model categories for testing. If we haven’t done testing in other contexts, we are only seeing part of the picture. We might be falling into a different type of modeling myopia: looking too narrowly at one test execution model.

It’s common to only view the application through the context we are used to. I tend to use the term “interface within a context” rather than “black box” and “white box” after my experience with TDD. The interface we use most to interact with the software can narrow our focus when testing. We then model the testing we can do with a preference for this interface. Mike Kelly has a blog post that deals with bounded awareness and inattentional blindness, and how they can affect our decisions.

I see this frequently with developers who are working predominantly in the code context, and testers who are testing only in the user interface context. Frequently, each treats testing in the other context with some disdain. Testing diversity often pays off, and when we work together, bringing expertise and ideas from whatever testable interfaces we are using, we can combat model myopia.

Testing in one context does not mean testing in another context is duplicated effort or unnecessary. Often, testing using a different model reveals a bug that isn’t obvious, or easily reproducible, in one context.

Not too long ago, I had two meetings. One was with two developers who asked why I bothered to test against the UI when we had so many other interfaces that were easier to test against behind the GUI. In another meeting, a tester asked why I bothered doing any testing behind the GUI when the end user only uses the GUI anyway. In both cases, my answer was the same: “When I stop finding bugs that you appreciate and find useful in one interface that I couldn’t find more easily in the other, I’ll stop testing against that interface.”

When testing in a context fails to provide us with useful information about the product we are testing, we can lower that model’s importance to our testing. Knowing about a testing model and choosing not to use it is different from thinking we have covered all our testing models and still being stumped as to why we have problems.

Combating Model Myopia in Testing

Part 1

In my last post, I posed a testing problem. This post begins to address how I view the problem, and my motivation behind posing it.

Models

In software testing, a “model” is a representation of something we are trying to understand. It might be a representation in our mind (an idea), a diagram (like an Entity-Relationship diagram), or it might be a physical representation. A model helps us think of a system or concept in a certain way, and as a representation is not going to be perfect or complete. We can have many models for the same thing – our only limitation is our imagination.

Here’s an example:

How many ways can you model a web page? I can think of at least three categories:

  1. as an end user, or consumer, using a web page for some purpose. I see the web page as having text, colors, pictures and hyperlinks. I visualize that web page, served up in a web browser, when I think about a particular page I find useful and want to go to it. I interface with it and relate to it as it is shown through a web browser.
  2. as a web developer. I tend to develop web pages using a text editor and do all of the HTML, CSS, etc. by hand. I interface with the web page primarily through a text editor, and I visualize it and relate to it through the HTML view, which is made up of markup tags.
  3. as a web designer. I either have a visual picture in my mind of the layout and design of a web site I want to create and I follow that, or sometimes I draw it out using a tool. Even a rough sketch of a general layout on a piece of paper or on a whiteboard will convey that idea. I visualize the web page according to that layout.

These are three simple examples of modeling a web page. I added a couple of dimensions to help describe the models: a user and an interface. Each one is different, but I may switch between all three as I’m developing a web page, or when I’m testing.

Since I often test web applications, I think of these different views a lot. I think of the first two very often, especially since I did a lot of web development in the mid-to-late ’90s. When I started writing web applications, I also learned to model from a programming context. My HTML tags, page layout, and content would be generated according to program logic, so I had a fourth model category derived from my role as a web application programmer. Understanding all the components used in rendering a web page is important to me when testing a web application. Everything from the web browser to the application server, the web server, the database (if one is used), any drivers involved, and the program code design is important information that can help with testing. I actually visualize what is happening across the entire application path when data is input into a web page, sent to the web server, processed, and the page I am on is updated. Even a simple HTML page has an interesting path. On the web server we have a text document that contains HTML markup, with certain tags that denote it as such. When a web browser asks for that page, the web server sends it via HTTP to the machine that is asking for it. The web browser takes that HTTP response, interprets it, and if it comes through without an error, it processes the HTML and displays it in the browser window.
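
One way to look at that path from a different interface is to skip the browser entirely and ask the web server for the page over HTTP. This is only a rough sketch with a placeholder URL, but it shows how much of the conversation is visible below the GUI:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;

    // Sketch: fetch a page directly over HTTP, without a browser in the way.
    // The URL is a placeholder, not a real system discussed in this post.
    public class RawPageFetch {
        public static void main(String[] args) throws Exception {
            URL url = new URL("http://example.com/");   // hypothetical page under test
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();

            // The browser normally hides these details; here we can inspect them.
            System.out.println("Status: " + connection.getResponseCode());
            System.out.println("Content-Type: " + connection.getContentType());

            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(connection.getInputStream()))) {
                System.out.println("First line of markup: " + reader.readLine());
            }
        }
    }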

Why is this important?

Modeling applications like this helps me imagine areas of potential problems based on experience and technical knowledge of the various technologies used, and it also helps me narrow down or get repeatable cases for bugs I find in the application. It also helps me generate different testing ideas: instead of one “happy path” based on requirements documents, I can think of other potential uses of the software that might have the potential for problems. When I work with software users in their own environment, I am always amazed at how they use the software in unintended ways, and are perfectly happy doing so.

What other models can you think of when testing web applications? Who are the users? Not all of them will be human. What are the interfaces? Not all of them will involve a Graphical User Interface like most modern, familiar web browsers.

But I’ve Run Out of Ideas!

The problem I described was something I deliberately chose as a modeling problem. Notice that in the energy industry, when I got a failure on a search attempt that contained the word “well”, I immediately had a suspicion of what the problem might be. But if I were in another industry, “well” might not be in my common vocabulary as a noun, only as an adverb. (Note that language is important when testing.) In my description here, we have at least two modeling categories for the software: one as “enterprise search”, and one as “my energy company’s search”. The ideas that each model represents are going to be subtly different. That subtle shift between models can be the difference between tracking down the source of a bug or not. But if I didn’t have energy experience, how could I find the model that will help me track down the bug?

I call this so-called running out of ideas “testing model myopia”. If we were omniscient, we would know all the possible ways to model the software, and we’d know where all the bugs were. If we were semi-omniscient and merely knew all the possible ways to model our software, we’d have an easier time. Since we aren’t, we have to use our skill as testing thinkers to break out of mental blocks. This can be very difficult if the entire team has testing model myopia, as I described in the problem. When under project stress, with constraints on a development team such as time and resources, a tester may have to build a case in order to be able to pursue a testing idea. The predominant model of the software shared by the team can be very entrenched, and it can be threatening to have that model questioned or analyzed. Nevertheless, as testers, we are what Jerry Weinberg has described: “someone who can imagine that there might be another way.”

One of the first ways I would tackle this problem is to imagine other models of the software that we aren’t addressing when testing.

Stay tuned for Part 2.

Well, well, well

Living in Alberta, I do testing work in the energy industry, or with companies that support the energy industry with software or services. Alberta is Canada’s energy capital, the largest oil and gas producer in the country. Rudyard Kipling said when he visited Alberta that we “…seem to have all hell for a basement”.

As a kid growing up in the ’70s and early ’80s, it seemed every farmer’s field in our area had a pumpjack servicing an oil well, or a gas well feeding into a pipeline. In the energy industry, the term “well” is not only common, it is vital. Wells are one source from which oil and gas flow, so a lot of resources are put behind drilling to find new sources, or to extract more oil and gas from known producing reservoirs beneath the surface.

I was recently testing an application that serves this industry. It had a neat search function, so I typed in the name of one of the test wells I had set up. I immediately got an error, with messaging informing me that the application could not complete my search because it had “common” words in it. The example it gave was “the”, and the messaging said to try again without the common word. I immediately suspected that it was filtering out the word “well”, so I repeated the search without that term, and it worked like a charm.

I logged the issue and the developers investigated the cause. (I was off-site, so I didn’t get all of the details I normally would.) They told me that this problem was due to a database server configuration that filters out common words in queries. The software that managed the database had a feature the vendor thought would be helpful in many contexts. Sadly, that feature caused problems in our context. The assumption that the term “well” is always a common word, in every context, made the feature useless for us. Context is important, and assumptions are frequently a source of bugs.
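
To make the assumption concrete, here is a small, invented sketch of the sort of “helpful” screening that was going on. This is not the vendor’s actual code; the stop-word list and the well name are made up:

    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    // Sketch: the search layer screens queries against a general-purpose
    // stop-word list, and "well" happens to be on it. Harmless in most domains,
    // disastrous in an energy application where well names are what you search for.
    public class StopWordFilter {
        static final Set<String> STOP_WORDS =
                new HashSet<>(Arrays.asList("the", "a", "an", "and", "of", "well"));

        static boolean containsStopWord(String query) {
            for (String term : query.toLowerCase().split("\\s+")) {
                if (STOP_WORDS.contains(term)) {
                    return true;
                }
            }
            return false;
        }

        public static void main(String[] args) {
            String query = "Smith Well 07-15";   // a made-up well name
            if (containsStopWord(query)) {
                System.out.println("Search rejected: query contains a common word.");
            } else {
                System.out.println("Searching for: " + query);
            }
        }
    }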

In what other contexts would this be a problem? I can think of at least one. If I had this feature turned on in my database and I worked for a company that managed water sources, I imagine I would run into the same searching problems, since water wells are also common, particularly in rural areas. But that is a context where the problem is going to be obvious.

When I saw the error message, I used inference to generate the best explanation. My inference immediately told me that it was probably filtering out the term “well”, so I tested a conjecture (“they are treating ‘well’ as a common word”), and as far as I can tell, that turned out to be the case. But what if I were in an industry that was one step removed, where people aren’t talking about wells, reciting well IDs, or discussing drilling, extraction, and other tasks related to wells? That’s where these kinds of problems turn into what might seem to be a sporadic or “unrepeatable” bug. We know there is a problem, but it isn’t obvious to us because our context doesn’t make it obvious. It would be obvious to someone with experience in the right context; it just isn’t obvious to us. Our observations are filtered by our context, our knowledge and our biases. How do we get beyond this when dealing with a problem that seems mysterious to us?

Here is an example that I am dreaming up to fit this scenario. Let’s pretend that we have written a cool search tool that is fast, accurate and works well in corporate environments. We have sold this tool to companies in the transportation and financial industries. We have also sold it to a couple of companies in the energy industry. Our tool works like this: it is installed on application servers in the corporate head office and is managed by the IT department. It uses a database to store information gathered from disparate sources, and it updates the data found on the network, intranet, etc. on a set schedule. It might update every half hour, or on a nightly job. Our customers are thrilled with the tool; they now have a simple interface into this scattered data. But a couple of them are complaining about not being able to find a small amount of critical data. It appears to be lost. You, the reader, are the tester working for the search company, and you are helping investigate this problem.

The team is frustrated; they just can’t get a repeatable case with the information they have. You have tried all the existing test data that you’ve generated and normally use with regression tests, but the software works fine in all your tests. The team feels they have exhausted their testing ideas, but a couple of clients are still complaining, and getting less patient with your company.

Given the information you know from this post, what would you do? Feel free to email me your ideas, and I’ll post a follow up with some of your suggestions as well as my own.

Daily Standups

My friend and former colleague Jay Boehm, a Senior Development Manager, and I frequently discuss issues we face when working on development teams. Jay has been particularly interested in the daily standup that gets a lot of exposure in Scrum and XP. I gave him my usual rant about how all good things on a project flow from good communication, and some pointers on holding a successful standup, such as:

  • Keep it to 15 minutes
  • Limit discussion to:
    • What I worked on since the last standup
    • What I plan to work on today
    • Any issues preventing me from doing my work
  • Don’t use standups for employee evaluation

Jay reported back to me today:

We’ve been doing a daily stand up for about 3 weeks with my team, and I am really, really pleased with how it’s working out. I love how people talk afterwards and make plans to tackle problems. The shy ones on my team are starting to gain some confidence in their colleagues. It’s great. I’m becoming a convert.

I love hearing stories like that. Nice job Jay!

Procedural Test Scripts

Cem Kaner has sometimes called detailed manual procedural test scripts “an industry worst practice”. I tend to agree. At one time I thought they were a good idea, but they lead to all kinds of problems. One is a lack of diversity in testing; another is that they become a maintenance nightmare and rob time that could be spent actually testing the software with new ideas. Another problem is that we usually write them from requirements, which narrows our focus too much, and we write them early in a project when we know little about the product. But I’m not going to get into that in this post. Instead, I’m going to describe a recent conversation that outlines why we as testers should question why we follow this practice.

I was recently discussing the creation of procedural test scripts prior to testing with developers. They were skeptical of my view that pre-scripting detailed manual test cases is a scourge on the testing world. “How will testers know what to test then?” I replied: “Good testers will use their judgment and skill. When they don’t have enough information, they will seek it out. They will utilize different testing heuristics to help meet the particular mission of testing that is required at the time. They will use information that is available to them, or they will test in the absence of information, but they will rely on skill to get the job done.” This didn’t resonate, so I came up with an equivalent for developers. Here is the “procedural development script” that follows what is so often done in testing. (To make it more authentic, it should be written as long before the actual development work is done as possible.)

Development Procedural Script:

Purpose: write widget foo that does this business functionality

Steps:

  1. Open your Eclipse IDE. Start/Programs/Eclipse.
  2. Select your workspace for this project.
  3. In the package explorer, create a new Java source code file.
  4. Begin typing in the IDE
  5. Use the such-and-such pattern to implement this functionality, type the following:
         public void <method name>(){
         etc.
         }

The development manager chuckled and said he’d fire a developer who needed this much direction. He needs people with skill that he can trust to be able to program in Java, to use their judgment and implement what needs to be done under his guidance. He would never expect developers to need things spelled out like that.

I countered that the same is true of testing. Why do we expect our testers to work this way? If we scoff at the idea of developers needing that kind of direction, why do we use it in testing? Why do we cling to practices that promote incompetence? Testers need to have the skill to figure out what to test without having everything handed to them. If we can’t trust our testers to do skilled work without having to spell everything out first, we need to get better testers.

Developers and business folk: demand skill from your testers.

Testers: demand skill from yourselves.

Are you a tester who wants to improve your skills? Cem Kaner’s free Black Box Software Testing course is worth checking out.

Testing Debt

When I’m working on an agile project (or any process using an iterative lifecycle), an interesting phenomenon occurs. I’ve been struggling to come up with a name for it, and conversations with Colin Kershaw have helped me settle on “testing debt”. (Note: Johanna Rothman has touched on this before; she considers it to be part of technical debt.) Here’s how it works:

  • in iteration one, we test all the stories as they are developed, and are in synch with development
  • in iteration two, we remain in synch testing stories, but when we integrate what has been developed in iteration one with the new code, we now have more to test than just the stories developed in that iteration
  • in iteration three, we have the stories to test in that iteration, plus the integration of the features developed in iterations that came before

As you can see, integration testing piles up. Eventually, we have so much integration testing to do on top of story testing that we have to sacrifice one or the other, because we are running out of time. To end the iteration (often two to four weeks in length), some sort of testing needs to be cut and looked at later. I prefer keeping in synch with development, so I consciously incur “integration testing debt”, and we schedule time at the end of development to test the completed system.
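
A rough back-of-the-envelope sketch, with invented numbers, shows how quickly this piles up if integration testing grows with everything already delivered:

    // Illustrative only: invented numbers showing how integration testing
    // accumulates on top of story testing, iteration after iteration.
    public class TestingDebtSketch {
        public static void main(String[] args) {
            int storyTestsPerIteration = 20;          // hypothetical effort per iteration
            int integrationTestsPerPastIteration = 8; // hypothetical carry-over per past iteration

            for (int iteration = 1; iteration <= 5; iteration++) {
                int integrationTests = (iteration - 1) * integrationTestsPerPastIteration;
                int total = storyTestsPerIteration + integrationTests;
                System.out.println("Iteration " + iteration + ": "
                        + storyTestsPerIteration + " story tests + "
                        + integrationTests + " integration tests = " + total);
            }
        }
    }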

Colin and I talked about this, and we explored other kinds of testing we could be doing. Once we had a sufficiently large list (unit testing, “ility” testing, etc.), it became clear that “testing debt” was a more appropriate term than “integration testing debt”.

Why do we want to test that much? As I’ve noted before, we can do testing in three broad contexts: the code context (addressed through TDD), the system context and the social context. The social context is usually the domain of conventional software testers, and tends to rely on testing through a user interface. At this level, the application becomes much more complex, greater than the sum of its parts. As a result, we have a lot of opportunity for testing techniques to satisfy coverage. We can get pretty good coverage at the code level, but we end up with more test possibilities as we move towards the user interface.

I’m not talking about what is frequently called “iteration slop” or “trailer-hitched QA” here. Those occur when development is done, and testing starts at the end of an iteration. The separate QA department or testing group then takes the product and deems it worthy of passing the iteration after they have done their testing in isolation. This is really still doing development and testing in silos, but within an iterative lifecycle.

I’m talking about doing the following within an iteration, alongside development:

  • work as a sounding board with development on emerging designs
  • help generate test ideas prior to story development (generative TDD)
  • help generate test ideas during story development (elaborative TDD)
  • provide initial feedback on a story under development
  • test a story that has completed development
  • integration test the product developed to date

Of note, when we are testing alongside development, we can actually engage in more testing activities than when working in phases (or in a “testing” phase near the end). We are able to complete more testing, but that can require that we use more testers to still meet our timelines. As we incur more testing debt throughout a project, we have some options for dealing with it. One is to leave off story testing in favour of integration testing. I don’t really like this option; I prefer keeping the feedback loop as tight as we can on what is being developed now. Another is to schedule a testing phase at the end of the development cycle to do all the integration, “ility”, system testing etc. Again I find this can cause a huge lag in the feedback loop.

I prefer a trade-off. We keep as tight a feedback loop as possible on testing the stories being developed, so we stay in synch with the developers. We do as much integration, system and “ility” testing as we can in each iteration, but when we are running out of time, we incur some testing debt in these areas. As the product is developed further (and there is now much more potential for testing), we bring in more testers to help address the testing debt, and we bring on the maximum number we can near the end. We schedule a testing iteration at the end to catch up on the testing debt that we determine will help us mitigate project risk.

There are several kinds of testing debt we can incur:

  • integration testing
  • system testing
  • security testing
  • usability testing
  • performance testing
  • some unit testing

And the list goes on.

This idea is very much a work-in-progress. Colin and I have both noticed that on the development side, we are also incurring testing debt. Testing is an area with enormous potential, as Cem Kaner has pointed out in “The Impossibility of Complete Testing” (Presentation) (Article).

Much like technical debt, we can incur testing debt unknowingly. Unlike technical debt, which refactoring can pay down, I don’t know of a way to repay this other than to strategically add more testers, and to schedule time to pay it back when we are dealing with contexts other than program code. Even in the code context, we may still incur testing debt that refactoring doesn’t completely pay down.

How have you dealt with testing debt? Did you realize you were incurring this debt, and if so, how did you deal with it? Please drop me a line and share your ideas.