Reacting to failure

Who to blame and how to fix things?


Our tests are written, we can run them whenever we like, and they produce a lovely report when they complete. Everything is good, so we can confidently update the code and run the tests again as we build the next feature, happy in the knowledge that we won't break the features that are already done.

That's the ideal situation, but real life isn't always like that. Things break. We need to think about the best way to deal with a broken test (or lots of broken tests).

One important thing to note here, however, is that a failing test is not a bad thing. We want our tests to fail. Assuming everything is working properly, a failing test is a sign that something has changed: either the code that's been introduced is incorrect, or the code is correct and the test needs to be updated. Don't look at a failing test as a problem; see it as an opportunity to improve either the code or the test suite.

Probably the most important thing about reacting to a failing test is to remember that a failure only happens if our quality assurance process hasn't worked. With good processes in place, code that fails a test will be caught quickly during development, during a code review, or in a manual test session. Using a structured series of steps, we can be more confident that the code we write isn't going to break a build. Following those steps is important.

If a code change breaks the tests then it's this process that has failed, not the developer who wrote the code. The process should be a living entity that we work to make more and more robust - whenever something fails, that should be seen as an opportunity to improve it. We can learn from things that go wrong, and add steps to make sure they don't go wrong again.

Developing a quality process to drive a well-tested application requires input from developers and testers at every stage of the project. Without thinking about how to test an application, there's no way to make sure the code is robust, and robust code is what drives the success of an application.

When a problem does happen and we find our tests failing, the first thing to check is that the test suite software itself is working. Sadly, in the age of automatically updating browsers, it's all too common to find a WebDriver component that isn't compatible with the version of the browser that's installed. This can be mitigated to an extent by using a containerised test browser with automatic updates disabled (see Advanced Testing), but that means we won't necessarily be testing on the version of the browser that most users are running.
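As a quick sanity check before digging into individual failures, something like the following sketch can flag a browser/driver mismatch early. It assumes Selenium's Python bindings and a Chrome-based setup; the capability key names vary between browsers and driver versions, so treat them as assumptions rather than a universal recipe.

```python
# Sanity check: does the installed browser match what the driver expects?
# Sketch assuming Selenium's Python bindings and Chrome; the capability keys
# ("browserVersion", "chrome" / "chromedriverVersion") are assumptions that
# differ for other browsers and driver versions.
from selenium import webdriver

driver = webdriver.Chrome()
try:
    caps = driver.capabilities
    browser_version = caps.get("browserVersion", "unknown")
    driver_version = caps.get("chrome", {}).get("chromedriverVersion", "unknown")
    print(f"Browser version: {browser_version}")
    print(f"Driver version:  {driver_version}")
    # For Chrome, the major versions usually need to match.
    if browser_version.split(".")[0] != driver_version.split(".")[0]:
        print("WARNING: browser and driver major versions differ - "
              "suspect the test tooling before the application code.")
finally:
    driver.quit()
```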

There are many separate parts to a test suite, and an update to any one of them can cause the suite to fail. We have to learn to manage our test software. It's well worth investing time in learning some DevOps skills to script and orchestrate your testing software and make this process as painless as possible.

NOTE: This assumes that the test is the problem rather than the code. Checking that the code is correct should always be the first thing we do. If the test worked previously, then it's safe to assume, to start with, that the problem lies in the application. Only when that code has been verified should we try debugging the tests themselves.

If we've confirmed that our test suite is working properly, the next thing to do is to debug the failing test itself.

There are a number of reasons why a test that was passing might start to fail. If our HTML structure, the name of a class, or an ID has changed, then our tests may not be able to locate an element on the page. This is especially true if we've used XPath-based locators. Our test report should indicate this with a message along the lines of "Element could not be found." If we've used a page object to interact with the page, then we only need to update that object to get the test working again.
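As a sketch of what that looks like, here's a minimal page object written in Python with Selenium. The page name, URL, and locators are hypothetical; the point is simply that the selectors live in one place, so a renamed ID or class means one change rather than dozens.

```python
# Minimal page object sketch (Python + Selenium). The URL and locators are
# hypothetical placeholders - when the markup changes, only these class
# attributes need updating, not every test that uses the page.
from selenium.webdriver.common.by import By


class LoginPage:
    URL = "https://example.test/login"                      # hypothetical URL
    USERNAME = (By.ID, "username")                           # hypothetical locator
    PASSWORD = (By.ID, "password")                           # hypothetical locator
    SUBMIT = (By.CSS_SELECTOR, "button[type='submit']")      # hypothetical locator

    def __init__(self, driver):
        self.driver = driver

    def open(self):
        self.driver.get(self.URL)
        return self

    def log_in(self, username, password):
        self.driver.find_element(*self.USERNAME).send_keys(username)
        self.driver.find_element(*self.PASSWORD).send_keys(password)
        self.driver.find_element(*self.SUBMIT).click()
```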

The next common problem is a test timing out. If our application code is interacting with an API, we need to check that the API is responding to requests. If it isn't, the next step is either to fix that problem so the test code can be left as is (if possible), or to mock the API in our test so that it doesn't need to interact with the real service when it runs. This makes the test much more robust, as it limits the code being tested to what we can control directly. Unfortunately, that isn't always possible.
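Where it is possible, one approach is to stand up a small local stub that returns canned responses and point the application under test at it instead of the real service. The sketch below uses only Python's standard library; the endpoint, port, and payload are hypothetical, and how the application is pointed at the stub depends entirely on your own configuration.

```python
# Sketch of a local API stub using only the standard library. The endpoint
# ("/api/status"), port, and payload are hypothetical - in a real suite the
# application under test would be configured to call this stub instead of
# the live API, so the test no longer depends on an external service.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubApiHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/api/status":
            body = json.dumps({"status": "ok"}).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_response(404)
            self.end_headers()

    def log_message(self, *args):
        pass  # keep test output quiet


def start_stub(port=8025):
    """Run the stub API on a background thread and return the server."""
    server = HTTPServer(("localhost", port), StubApiHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    stub = start_stub()
    # ... run the browser test against the application here, with its API
    # base URL pointed at http://localhost:8025 ...
    stub.shutdown()
```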

TODO: More reasons.

Finding and fixing bugs in our tests is definitely important, but what do we do after that? It's worth thinking about the impact of a bug, who should take responsibility for it, and what to do when someone breaks the build.

The first thing to note is that these issues shouldn't ever really be a major problem. If you use source control well, any breaking changes should be limited to a separate branch. Broken code shouldn't make its way back to the develop or master branches. If that happens, then we should rethink the strategy we use for branch creation, code review, and pull requests.

The next thing to understand is that writing software is hard, and everyone breaks the code sometimes. Blaming a developer for a failing test isn't helpful. What failed was the development process, not the individual. With that in mind, we should learn from the creative process used at Pixar called "plussing". The basis of plussing is that everyone involved in a project should be trying to add to it rather than take away from it. Blaming someone for making something worse is a negative reaction. By reframing the problem to see how we can add something, we accept that the code is broken and work from that point - by fixing the code ourselves, or by helping the developer responsible to fix their code. That is a more constructive approach that allows the team to work together to build great products.

Ultimately the goal is to create software that works for the client and their users, and the best way to do that is to push forwards as a team instead of a group of individuals.

With everyone on the team pulling in the same direction to make great software, we can achieve some really clever stuff. But as things get cleverer, our tests need to get cleverer to keep up. This is when we need to start looking at some more advanced testing techniques.

Previous: Acceptance Strategies Next: Advanced Testing