Living in Alberta, I do testing work in the energy industry, or with companies that support the energy industry with software or services. Alberta is Canada’s energy capital, the largest oil and gas producer in the country. Rudyard Kipling said when he visited Alberta that we “…seem to have all hell for a basement”.
As a kid growing up in the ’70s and early ’80s, it seemed every farmer’s field in our area had a pumpjack servicing an oil well, or had a gas well feeding into a pipeline. In the energy industry, the term “well” is common, but vital. Wells are one source from which oil and gas flow, so a lot of resources are put behind drilling to find new sources, or to extract more oil and gas from known producing reservoirs beneath the surface.
I was recently testing an application that serves this industry. It had a neat search function, so I typed in the name of one of my test wells I had set up. I immediately got an error, with messaging informing me that the application could not complete my search because it had “common” words in it. The example it gave was “the”, and the messaging said to try it again without a common word. I immediately suspected that it was filtering out the word “well”, so I repeated the search without the term “well”, and it worked like a charm.
I logged the issue and the developers investigated the cause. (I was off-site, so I didn’t get all of the details I normally would.) They told me that this problem was due to the database server configuration that filters out common words on queries. The software that managed the database had a feature that they tought would be helpful in many contexts. Sadly, this feature the database vendor thought would be helpful caused problems in our context. An assumption with the term “well” as always being a common word for all contexts caused this feature to be useless in our context. Context is important, and assumptions are frequently a source of bugs.
In what other contexts would this be a problem? I can think of at least one. If I had this feature turned on, my database and I worked for a company that managed water sources, I imagine I would run into problems as well with searching in this way, since water wells are also common, particularly in rural areas. But this is a context where the problem is going to be obvious.
When I saw the error message, I used inference to generate the best explanation. My inference immediately told me that it was probably filtering out the term “well”, so I tested a conjecture (“they are thinking well is a common word”) and as far as I can tell, it turned out to be the case. But what if I was in an industry that was one step away, where people aren’t talking about wells, and reciting well ids, or talking about drilling, extraction, and other tasks related to wells? That’s where these kinds of problems turn into what might seem to be a sporadic or “unrepeatable bug”. We know there is a problem, but it isn’t obvious to us because our context doesn’t make it obvious. It would be obvious to someone with experience in the right context, it just isn’t obvious to us. Our observations are filtered due to our context, our knowledge and our biases. How do we get beyond this when dealing with a problem that seems mysterious to us?
Here is an example that I am dreaming up to fit this scenario. Let’s pretend that we have written a cool search tool that is fast, accurate and works well in corporate environments. We have sold this tool to companies in the transportation and financial industries. We also sold it to a couple of companies in the energy industry. Our tool works like this: it is installed on application servers in the corporate head office and is managed by the IT department. It uses a database to store information gathered from disparate sources, and updates the data found on the network, intranet, etc, on a set time. It might update every half hour, or on a nightly job. Our customers are thrilled with the tool, they now have a simple interface into this scattered data, but a couple of them are complaining about not being able to find a small amount of critical data. It appears to be lost. You, the reader are the tester working for the search company and are helping investigate this problem.
The team is frustrated, they just can’t get a repeatable case with the information they have. You have tried all your existing test data that you’ve generated and normally use with regression tests, but the software works fine in all your tests. The team feels that they have exhausted testing ideas, but a couple of clients are still complaining, and getting less patient with your company.
Given the information you know from this post, what would you do? Feel free to email me your ideas, and I’ll post a follow up with some of your suggestions as well as my own.