Good unit tests are FIRST. The I in FIRST stands for Isolation and is easily confused with the R, Repeatability. Ironically, the I is itself not well isolated. I want to take a moment to focus on an often forgotten side of unit test isolation: test focus.
A good unit test focuses on testing just one thing and doesn't overlap with other tests - it has high cohesion and low coupling. Put another way: if you change one rule in the production code, only one unit test should fail. Together with well-named tests this makes it easy to find the reason for a test failure (and giving tests meaningful names is easier when each of them focuses on just one thing).
I came up with a good example in Let's Code Jumi episode 200 (to be released around October 2012 - I have a big WIP ;). I'm showing here a refactored version of the code - originally the third test was all inline in one method, and it might have been less obvious what the problem was.
Example
The system under test is RunIdSequence, a factory for generating unique RunId instances. Here are the two unit tests which were written first:
@Test
public void starts_from_the_first_RunId() {
    RunIdSequence sequence = new RunIdSequence();
    RunId startingPoint = sequence.nextRunId();
    assertThat(startingPoint, is(new RunId(RunId.FIRST_ID)));
}

@Test
public void each_subsequent_RunId_is_incremented_by_one() {
    RunIdSequence sequence = new RunIdSequence();
    RunId id0 = sequence.nextRunId();
    RunId id1 = sequence.nextRunId();
    RunId id2 = sequence.nextRunId();
    assertThat(id1.toInt(), is(id0.toInt() + 1));
    assertThat(id2.toInt(), is(id1.toInt() + 1));
}
These unit tests are well isolated. The first focuses on what the first RunId in the sequence is; the second focuses on the relative difference between subsequent RunIds. The second test is unaware of the absolute value of the first RunId, so the tests don't overlap. I can easily make just one of them fail while the other passes.
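The tests only name RunIdSequence, RunId, nextRunId() and RunId.FIRST_ID; they say nothing about the implementation. To make the example concrete, here is one minimal implementation that would pass both tests - a sketch of my own, so the value of FIRST_ID and everything else not mentioned in the tests is an assumption, not the real Jumi code:

```java
// Sketch only: of the names below, just RunIdSequence, RunId, nextRunId()
// and RunId.FIRST_ID come from the tests; the rest is assumed.
class RunId {
    public static final int FIRST_ID = 1; // assumed starting value

    private final int id;

    public RunId(int id) {
        this.id = id;
    }

    public int toInt() {
        return id;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof RunId && ((RunId) obj).id == id;
    }

    @Override
    public int hashCode() {
        return id;
    }
}

class RunIdSequence {
    private int nextId = RunId.FIRST_ID;

    public RunId nextRunId() {
        return new RunId(nextId++); // not yet thread-safe
    }
}
```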
The RunIdSequence needs to be thread-safe, so here is the third test, with the relevant bits highlighted:
@Test
public void the_sequence_is_thread_safe() throws Exception {
    final int ITERATIONS = 50;
    List<RunId> expectedRunIds = generateRunIdsSequentially(ITERATIONS);
    List<RunId> actualRunIds = generateRunIdsInParallel(ITERATIONS);
    assertThat("generating RunIds in parallel should have produced the same values as sequentially",
            actualRunIds, is(expectedRunIds));
}

private static List<RunId> generateRunIdsSequentially(int count) {
    List<RunId> results = new ArrayList<RunId>();
    // XXX: knows what is the first ID (RunId.FIRST_ID, even worse would be to use the constant 1)
    // XXX: knows how subsequent IDs are generated (increase by 1)
    for (int id = RunId.FIRST_ID; id < RunId.FIRST_ID + count; id++) {
        results.add(new RunId(id));
    }
    return results;
}

private static List<RunId> generateRunIdsInParallel(int count) throws Exception {
    final RunIdSequence sequence = new RunIdSequence();
    ExecutorService executor = Executors.newFixedThreadPool(10);
    List<Future<RunId>> futures = new ArrayList<Future<RunId>>();
    for (int i = 0; i < count; i++) {
        futures.add(executor.submit(new Callable<RunId>() {
            @Override
            public RunId call() throws Exception {
                return sequence.nextRunId();
            }
        }));
    }
    List<RunId> results = new ArrayList<RunId>();
    for (Future<RunId> future : futures) {
        results.add(future.get(1000, TimeUnit.MILLISECONDS));
    }
    Collections.sort(results, new Comparator<RunId>() {
        @Override
        public int compare(RunId id1, RunId id2) {
            return id1.toInt() - id2.toInt();
        }
    });
    executor.shutdown();
    return results;
}
This test is not isolated. It defines the same things as the two other tests, so it overlaps with them: it knows what the first RunId is and how subsequent values are generated. If either of the first two tests fails, this test will also fail, even though it is meant to focus on thread-safety, just as its name says.
Here is an improved version of the same test, with changes highlighted:
@Test
public void the_sequence_is_thread_safe() throws Exception {
    final int ITERATIONS = 50;
    List<RunId> expectedRunIds = generateRunIdsSequentially(ITERATIONS);
    List<RunId> actualRunIds = generateRunIdsInParallel(ITERATIONS);
    assertThat("generating RunIds in parallel should have produced the same values as sequentially",
            actualRunIds, is(expectedRunIds));
}

private static List<RunId> generateRunIdsSequentially(int count) {
    RunIdSequence sequence = new RunIdSequence();
    List<RunId> results = new ArrayList<RunId>();
    for (int i = 0; i < count; i++) {
        results.add(sequence.nextRunId());
    }
    return results;
}

private static List<RunId> generateRunIdsInParallel(int count) throws Exception {
    final RunIdSequence sequence = new RunIdSequence();
    ExecutorService executor = Executors.newFixedThreadPool(10);
    List<Future<RunId>> futures = new ArrayList<Future<RunId>>();
    for (int i = 0; i < count; i++) {
        futures.add(executor.submit(new Callable<RunId>() {
            @Override
            public RunId call() throws Exception {
                return sequence.nextRunId();
            }
        }));
    }
    List<RunId> results = new ArrayList<RunId>();
    for (Future<RunId> future : futures) {
        results.add(future.get(1000, TimeUnit.MILLISECONDS));
    }
    Collections.sort(results, new Comparator<RunId>() {
        @Override
        public int compare(RunId id1, RunId id2) {
            return id1.toInt() - id2.toInt();
        }
    });
    executor.shutdown();
    return results;
}
Now it no longer knows about those two design decisions. It focuses on testing just thread-safety and duplicates neither the other tests nor the production code. I can change the production code to make any one of the three tests fail while the other two still pass.
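For completeness, here is a sketch of how the production code could satisfy all three tests, for example with an AtomicInteger - again my own assumption, including the value of FIRST_ID; the real Jumi implementation may well do it differently:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: of the names below, just RunIdSequence, RunId, nextRunId()
// and RunId.FIRST_ID come from the tests; the rest is assumed.
class RunId {
    public static final int FIRST_ID = 1; // assumed starting value

    private final int id;

    public RunId(int id) {
        this.id = id;
    }

    public int toInt() {
        return id;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof RunId && ((RunId) obj).id == id;
    }

    @Override
    public int hashCode() {
        return id;
    }
}

class RunIdSequence {
    // AtomicInteger makes nextRunId() thread-safe without explicit locking
    private final AtomicInteger nextId = new AtomicInteger(RunId.FIRST_ID);

    public RunId nextRunId() {
        // getAndIncrement() is atomic, so no two threads can receive the same id
        return new RunId(nextId.getAndIncrement());
    }
}
```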
Conclusion
When I write tests that are as isolated and focused as possible, it's easy to find the cause of a test failure. If I don't know why some code exists, I can comment it out and see which tests fail - their names will tell me why the code was written. When removing a feature, I can remove the whole tests which define that feature, instead of having to update non-cohesive tests. When changing a feature, I need to update only a few tests.
P.S. I've been thinking that it might be possible to automatically measure the "unitness" of tests using mutation testing tools such as PIT. The fewer tests that fail per mutation, the better. And if a test always fails together with other tests, that's a bad sign. This could help in pinpointing tests which need some refactoring.
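As a toy sketch of what that metric could look like - this is not PIT's API, and the mutation data below is entirely made up for illustration - one could compute the average number of failing tests per killed mutation:

```java
import java.util.*;

// Toy sketch of a "unitness" metric: given, for each mutation, the set of
// tests that the mutation made fail, compute the average number of failing
// tests per killed mutation. This is not a real PIT integration; the
// mutation names and test names are invented.
class UnitnessMetric {

    public static double averageFailingTestsPerMutation(Map<String, Set<String>> failuresByMutation) {
        int killedMutations = 0;
        int totalFailures = 0;
        for (Set<String> failingTests : failuresByMutation.values()) {
            if (!failingTests.isEmpty()) {
                killedMutations++;
                totalFailures += failingTests.size();
            }
        }
        return killedMutations == 0 ? 0.0 : (double) totalFailures / killedMutations;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> failures = new HashMap<String, Set<String>>();
        failures.put("mutate FIRST_ID", new HashSet<String>(Arrays.asList(
                "starts_from_the_first_RunId")));
        failures.put("mutate increment", new HashSet<String>(Arrays.asList(
                "each_subsequent_RunId_is_incremented_by_one")));
        failures.put("remove synchronization", new HashSet<String>(Arrays.asList(
                "the_sequence_is_thread_safe")));
        System.out.println(averageFailingTestsPerMutation(failures)); // prints 1.0
    }
}
```

A value of 1.0 would mean every mutation was caught by exactly one test; values well above 1 would suggest overlapping, non-isolated tests.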