Using Mutation Testing to Improve Quality of Unit Tests
Code kata: using mutation testing to improve quality of unit tests.
This is a set of exercises that will demonstrate
- That having passing unit tests and high unit test coverage numbers may be giving a false sense of security due to low quality of the tests
- how to identify problem spots using mutation testing and common test smells
- how to fix these problems
What Is a Code Kata?
A code kata is an exercise in programming which helps programmers hone their skills through practice. A code kata is usually set up as a series of unit tests, which fail. Your task is to write code to make them pass. The idea is inspired by the Japanese concept of kata in the martial arts. Just like in the martial arts, you can repeat a kata multiple times to make improvements to your solutions.
Please note that this kata is a little different - all the tests initially pass. Don't worry, all the same ideas mentioned above still apply here. We will improve the tests, in the process making them fail, and then we will fix the code to make the tests pass, but by then they will be good tests.
Running This Kata
To build this kata you will need
- Java 8 or newer
- Maven 3.6.1 or newer
- an IDE of your choice
There are two modules in this project
- kata - contains the exercises, including the domain and test classes described below. You should be working with this module.
- solutions - contains solutions to the exercises in the kata as well as explanations of the test smells (see the "Unit Test Smells" section below). There is more than one way to solve the kata, so your solutions may not look exactly like the ones in this module; in fact, you may find ways to improve on the solutions here.
The domain for the kata is made up of two classes: Company and Employee:
Run all the unit tests in the `mtk.domain.CompanyTest` class. They should all pass. Check the test coverage metrics using either the Maven output or a coverage reporting function in your IDE test runner. The coverage should be close to 100%. Good news: there are tests, they all pass, and they cover all of our business logic. Looks like the software is ready to ship!
Unfortunately, that would be a terrible idea, as the code is full of bugs. To prove it, run the `mtk.CompanyRunner` class, which contains some simple business logic in its `mtk.CompanyRunner.main()` method, and look at the console output. Does it look right? How can we have all these bugs despite having all these tests?
Run the unit tests with mutations. Mutations will be introduced into your code by PIT, a mutation testing tool.
- Enable the `pitest` Maven profile for the project. This profile is bound to the `test` phase of the Maven lifecycle.
- Run the `test` task in the `kata` module. (To run it from the command line with the profile activated, execute `mvn test -P pitest`.) With the profile enabled, this task will invoke the PIT framework to first introduce changes into the application code and then execute the tests.
- Inspect the results. The results are written in HTML format into a file in the `target/pit-reports/YYYYMMDDHHMI` directory. Open this file in a browser - you should see quite a bit of red. This means that some of the code mutations managed to survive - they were not caught by the unit tests. Which means that our unit tests do not in fact test what they are supposed to.
Fix the test smells. Each test in the test class exhibits one or more test smells. Going through the tests one by one, fix each smell and make sure the test actually does what it is supposed to do. To help you, comments in some of the test methods explicitly name the smell present there. Once you remove the smell, the test should start failing. This is a good thing, because now we have tests that actually validate the behavior of our software.
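For example, a "no assertions" smell (one of the smells listed in the "Unit Test Smells" section below) and its fix might look like this. The `applyRaise` method and the scenario are hypothetical, not taken from the kata, and the tests are sketched as plain methods rather than JUnit `@Test` methods to keep the example self-contained:

```java
public class SalaryRaiseExample {
    // Production logic under test (illustration only)
    static double applyRaise(double salary, double percent) {
        return salary + salary * percent / 100.0;
    }

    // Smell: the code is exercised (coverage looks great), but nothing is
    // verified - any mutation of applyRaise() would survive this test
    static void raiseSalary_noAssertions() {
        applyRaise(1000.0, 10.0);
    }

    // Fixed: the expected result is stated explicitly and asserted
    static void raiseSalary_by10Percent_resultsIn1100() {
        if (applyRaise(1000.0, 10.0) != 1100.0) {
            throw new AssertionError("expected 1100.0");
        }
    }

    public static void main(String[] args) {
        raiseSalary_noAssertions();
        raiseSalary_by10Percent_resultsIn1100();
    }
}
```

Note that the smelly version passes no matter what `applyRaise` returns - exactly the situation the mutation report exposes.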
Fix the business logic to make the tests pass. Look at the comments in the code; they may explain the intended behavior (which does not mean the method as written behaves as intended).
Kill all mutants! The tests that have been fixed this way should catch the mutations introduced by PIT. When all the tests (and the logic under test) are fixed, no mutations should be able to survive. So the end state should be passing tests and dead mutants (and no smells).
The rest of this document offers some general pointers, which may come in handy if you are new to unit testing.
Unit Tests - Necessary, But Not Sufficient...
...to build confidence in the software system under test. While our focus here is on unit tests, it helps to put them in a broader context. The table below lists common types of tests.
The meaning of the columns:
- Category - A category of tests
- Purpose - Why is this kind of testing needed
- Who - Roles involved in creation of tests and validating test results
- Tools - Examples of tools supporting this type of testing
|Category|Purpose|Who|Tools|
|---|---|---|---|
|Unit|Validate a unit of behavior at the low (code) level, focusing on a small part of the system (e.g., a method)|Dev|JUnit|
|Acceptance|Validate that the business logic is implemented as specified for a given scenario|Dev, User|FitNesse|
|Mutation|Ensure quality of unit and acceptance tests|Dev|PITest|
|Integration|Detect issues in interactions between modules of the system|Tech Ops, Dev| |
|User Acceptance|Certify by the users that the system as a whole is operating as expected|Dev, User| |
|Production Mirror|Test the system under load identical to production|Tech Ops| |
|Chaos Engineering|Test system resiliency by injecting failures into infrastructure (service processes, networks, clients, etc.)|Tech Ops|Chaos Monkey|
|Breakpoint|Determine the maximum amount of load that the system can support|Tech Ops|The Grinder|
Unit Test Best Practices
These are some of the practices to follow to ensure that the unit tests are effective, easy to maintain, and easy to execute:
- Automated - require no human involvement to determine the outcome
- Focused - each test method tests one scenario
- Complete - test the edge cases, try to cover all meaningfully different scenarios
- Well named - test method name describes the scenario being tested
- Fast - the relevant tests execute in a few seconds or faster
- Independent - no external dependencies, no dependencies on other tests
- Test the behavior, not the implementation
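A sketch of "focused", "well named", and "complete" in practice, using a hypothetical stack class rather than the kata's domain, with plain methods standing in for JUnit `@Test` methods:

```java
public class StackExample {
    private final int[] items = new int[16];
    private int size = 0;

    void push(int x) { items[size++] = x; }

    int pop() {
        if (size == 0) throw new IllegalStateException("stack is empty");
        return items[--size];
    }

    // Focused and well named: one scenario per test, named after the scenario
    static void pop_returnsLastPushedValue() {
        StackExample stack = new StackExample();
        stack.push(42);
        if (stack.pop() != 42) throw new AssertionError();
    }

    // Complete: the edge case (empty stack) gets its own test
    static void pop_onEmptyStack_throws() {
        try {
            new StackExample().pop();
            throw new AssertionError("expected IllegalStateException");
        } catch (IllegalStateException expected) {
            // the edge case behaves as specified
        }
    }

    public static void main(String[] args) {
        pop_returnsLastPushedValue();
        pop_onEmptyStack_throws();
    }
}
```

Both tests check the observable behavior (what `pop` returns or throws), not the implementation (the internal array), so a refactoring to a linked list would leave them untouched.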
Unit Test Smells
These are signs that there is possibly something wrong with the test - either the test itself is not well written, or the code under test is not test friendly (which probably means that this code is not well factored):
- No assertions
- Irrelevant assertions
- Use of Mocks
- Expected results are calculated rather than explicitly specified
- Test code reuse (that is test logic reuse, test utilities are good)
- Test data reuse
- "Flickering" tests (tests with nondeterministic behavior)
- Interdependencies between tests (e.g., execution order)
- Long running tests
- `@Ignore`'d or commented-out tests
These smells often come together. For example, sharing test data can lead to tests' success depending on the execution order.
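To make one more of the smells concrete: "expected results are calculated rather than explicitly specified" usually means the expectation mirrors the production formula, so both share any bug. A hypothetical sketch (not from the kata):

```java
public class CalculatedExpectationExample {
    // Production logic under test (illustration only)
    static double monthlyPay(double annualSalary) {
        return annualSalary / 12.0;
    }

    // Smell: the expected value repeats the production formula; if the
    // formula is wrong, the test is wrong in exactly the same way
    static void monthlyPay_calculatedExpectation() {
        double annual = 60000.0;
        double expected = annual / 12.0; // calculated, just like the code under test
        if (monthlyPay(annual) != expected) throw new AssertionError();
    }

    // Fixed: the expected value is a literal, worked out independently
    static void monthlyPay_explicitExpectation() {
        if (monthlyPay(60000.0) != 5000.0) throw new AssertionError();
    }

    public static void main(String[] args) {
        monthlyPay_calculatedExpectation();
        monthlyPay_explicitExpectation();
    }
}
```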
Unit Test Quality
How can we measure the quality of the unit tests in a system? One metric, which is used broadly, is test coverage. Some things to keep in mind:
- Test coverage
- Percentage of LOC, methods, classes covered by tests
- Does not guarantee the covered code is actually tested
- Does identify the code that is definitely not tested
- So having a coverage target is not entirely pointless, but...
- ...don't optimize just for coverage
- How do we make sure that the tests actually test? Mutation Testing is one way to ensure relevance of unit tests.
Mutation testing is a way to validate the quality of unit tests. It means introducing changes into the code and observing the behavior of the unit tests. Assuming that all the tests were passing before the mutation, either some of the unit tests will start failing (good), or all the tests will keep on passing (bad). The latter scenario means that the unit tests do not really validate the outcomes of the code under test: the results have, for all intents and purposes, become random, yet all the tests still pass.
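A typical mutation is a small change to an operator or a constant. The sketch below hand-simulates a conditionals-boundary mutation of the kind PIT applies; the domain method is hypothetical, not from the kata:

```java
public class MutationExample {
    // Original rule: employees qualify for a bonus at 5 or more years of
    // service. A conditionals-boundary mutation would turn ">=" into ">".
    static boolean qualifiesForBonus(int yearsOfService) {
        return yearsOfService >= 5;
    }

    // Weak test: never probes the boundary, so the ">" mutant survives it
    static void weakTest() {
        if (!qualifiesForBonus(10)) throw new AssertionError();
        if (qualifiesForBonus(1)) throw new AssertionError();
    }

    // Stronger test: checks the boundary value itself and kills the mutant,
    // because the mutated code would return false for exactly 5 years
    static void boundaryTest() {
        if (!qualifiesForBonus(5)) throw new AssertionError();
    }

    public static void main(String[] args) {
        weakTest();
        boundaryTest();
    }
}
```

A surviving mutant in the PIT report is pointing at exactly this kind of gap: code that is covered but whose behavior is not pinned down by any assertion.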
Test Driven Development
Adopting Test Driven Development (TDD) will result in better tests, better interfaces, less unnecessary code, and a more confident and steady development process. Just follow these steps:
- Write a test
- Take the user's perspective: "What is the API that would make my job the easiest?"
- Think small increments
- Make the test pass
- Do whatever it takes: Duplication? Fine! Hardcoding the expected result? Fine!
- Remove duplication
- Repeat for all meaningfully different scenarios
- Reap the benefits
- Almost all code is tested
- You know when to stop coding
- User friendly interfaces
- Well factored code
- Just enough abstraction
- Just enough code
- Develop with confidence!
Sounds too good to be true? The secret is that TDD does require a lot of discipline from its practitioners: working in tiny increments, diligently following the steps above, and not cutting corners. Without that discipline, it is likely you will end up with tests (and code under test) of the usual "quality".
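The first few turns of the cycle above might look like this for a hypothetical greeting function (not part of the kata):

```java
public class TddSketch {
    // Iteration 1, "make the test pass": hardcoding the result is fine at first.
    //   static String greet(String name) { return "Hello, Alice!"; }
    // Iteration 2: a second scenario (Bob) makes the hardcoded version fail,
    // and "remove duplication" leads to the generalized implementation below.
    static String greet(String name) {
        return "Hello, " + name + "!";
    }

    public static void main(String[] args) {
        // The tests came first, written from the caller's perspective
        if (!greet("Alice").equals("Hello, Alice!")) throw new AssertionError();
        if (!greet("Bob").equals("Hello, Bob!")) throw new AssertionError();
    }
}
```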