2012-05-13

Passing Contract Tests While Refactoring Them

In my last blog post I explained how I once created a new implementation to pass the same contract tests as an existing implementation, but because I had to refactor the tests at the same time (the two implementations have different concurrency models, so the contract tests must be decoupled from them), I missed a problem and wrote some dead code. Since then I've retried that refactoring and implementation using a different approach, as I explained in the comments of that blog post.

One option would have been to refactor the whole contract test class before starting to implement it, but that goes against my principles of making changes in small steps and keeping feedback loops fast. So the approach I tried is as follows:

  1. Extract a contract test from the old implementation's tests by extracting factory methods for the SUT, creating the abstract base test class and moving the old implementation's test methods there.
  2. Create a second concrete test class which extends the contract test, but override all of its test methods and mark them ignored. This avoids a flood of failing tests appearing all at once. Maybe mark each of the overridden versions with a "TODO: refactor tests" comment so as not to forget the next step (see the sketch after this list).
  3. Inspect the contract test method which you plan to implement next and see if it requires some refactoring before it will work for both implementations. Refactor the test if necessary. This gives a systematic way of updating the contract tests in small steps and avoids refactoring while tests are red.
  4. Unignore one contract test method and implement the feature in the new implementation. This lets you focus on passing just one test at a time, as is normal in TDD.
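
To make the structure concrete, here is a minimal sketch of what steps 1, 2 and 4 produce. All names in it (MessageQueue, MessageQueueContract and the two implementation test classes) are hypothetical placeholders of my own, not the actual Jumi actors code, and the implementation classes are assumed to exist elsewhere; imports are omitted as in the other listings.

    // Hypothetical interface of the system under test.
    public interface MessageQueue {
        void send(String message);
        String poll();
    }

    // Step 1: the abstract contract test, with a factory method for the SUT and
    // the test methods moved here from the old implementation's test class.
    public abstract class MessageQueueContract {

        protected abstract MessageQueue newMessageQueue();

        @Test
        public void delivers_messages_in_the_order_they_were_sent() {
            MessageQueue queue = newMessageQueue();

            queue.send("first");
            queue.send("second");

            assertThat(queue.poll(), is("first"));
            assertThat(queue.poll(), is("second"));
        }
    }

    // The old implementation's test class only provides the factory method.
    public class ThreadedMessageQueueTest extends MessageQueueContract {
        @Override
        protected MessageQueue newMessageQueue() {
            return new ThreadedMessageQueue(); // the existing, already passing implementation
        }
    }

    // Step 2: the new implementation's test class overrides every inherited test
    // method and marks it ignored. In step 4 the tests are enabled one at a time
    // by deleting the overrides, so that the contract test's version runs.
    public class SingleThreadedMessageQueueTest extends MessageQueueContract {
        @Override
        protected MessageQueue newMessageQueue() {
            return new SingleThreadedMessageQueue(); // the new, not yet finished implementation
        }

        @Ignore("TODO: refactor tests")
        @Test
        @Override
        public void delivers_messages_in_the_order_they_were_sent() {
            super.delivers_messages_in_the_order_they_were_sent();
        }
    }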

I recorded my experiment of using this strategy as a screencast:

Download as MP4

More Let's Code Screencasts

2012-05-03

Declaring Pass or Fail - Handling Broken Assumptions

When using TDD, it's a good practice to declare - aloud or in your mind - whether the next test run will pass or fail (and in what way it will fail). Then when your assumption about the outcome happens to be wrong, you'll be surprised and you can start looking more closely at why on earth the code is not behaving as you thought it would.

I had one such situation in my Let's Code screencasts where I missed a mistake - I had written code that was not needed to pass any tests - and noticed it only five months later when analyzing the code with PIT mutation testing. You can see how that untested line of code was written in Let's Code Jumi #62 at 24:40, and how it was found in Let's Code Jumi #148 at 4:15 (the rest of that episode and the start of episode 149 go into fixing it).

I would be curious to figure out a discipline which would help to avoid problems like that.

Here is what happened:

I was developing my little actors library. I already had a multi-threaded version of it working and now I was implementing a single-threaded version of it to make testing actors easier. I used contract tests to drive the implementation of the single-threaded version. Since the tests were originally written for the multi-threaded version, they required some tweaking to make them work for both implementations, with and without concurrency.

I had gotten so far that all but one of the contract tests were passing. Then I wrote the fateful line idle = false; and ran the tests - I had expected them to pass, but that one test was still failing. I investigated why the test did not pass and found out that I had not yet updated the test to work with the single-threaded implementation. After fixing the test, it started failing for another reason (a missing try-catch), so I implemented that - but I did not notice that the line I had added earlier did not contribute to passing the test. Only much later did I notice (thanks to PIT) that I was missing a test case to cover that one line.

So I've been thinking: how can I avoid mistakes like this in the future? I don't yet have an answer.

Maybe some sort of mental checklist to use when I have written production code but it doesn't make the test pass because of a bug in the test. Maybe if I undid all changes to the production code before fixing the test, that would avoid the problem? Maybe the IDE could help by highlighting suspicious code: it could have two buttons for running tests, one where the assumption is that the tests will pass and another where they are expected to fail. Then, when an assumption is broken, it would highlight all code written since the last time the tests passed and/or the assumptions were correct, which might help in inspecting the code.

Or maybe all problems like this can be found automatically with mutation testing and I won't need a special procedure to avoid introducing them?


UPDATE: In a follow-up blog post I experiment with a better way of doing this refactoring.

2012-05-01

Unit Test Focus Isolation

Good unit tests are FIRST. The I in FIRST stands for Isolation and is easily confused with the R, Repeatability. Ironically the I is itself not well isolated. I want to take a moment to focus on an often forgotten side of unit test isolation: test focus.

A good unit test focuses on testing just one thing and it doesn't overlap with other tests - it has high cohesion and low coupling. Conversely, if you change one rule in the production code, then only one unit test should fail. Together with well named tests this makes it easy to find the reason for a test failure (and giving tests meaningful names is easier when each of them focuses on just one thing).

I came up with a good example in Let's Code Jumi episode 200 (to be released around October 2012 - I have a big WIP ;). I'm showing a refactored version of the code here - originally the third test was all inline in one method, and it might have been less obvious what the problem was.

Example

The system under test is RunIdSequence, a factory for generating unique RunId instances. Here are the two unit tests which were written first:

    @Test
    public void starts_from_the_first_RunId() {
        RunIdSequence sequence = new RunIdSequence();

        RunId startingPoint = sequence.nextRunId();

        assertThat(startingPoint, is(new RunId(RunId.FIRST_ID)));
    }

    @Test
    public void each_subsequent_RunId_is_incremented_by_one() {
        RunIdSequence sequence = new RunIdSequence();

        RunId id0 = sequence.nextRunId();
        RunId id1 = sequence.nextRunId();
        RunId id2 = sequence.nextRunId();

        assertThat(id1.toInt(), is(id0.toInt() + 1));
        assertThat(id2.toInt(), is(id1.toInt() + 1));
    }

These unit tests are well isolated. The first focuses on what the first RunId in the sequence is, and the second on the relative difference between subsequent RunIds. The second test is unaware of the absolute value of the first RunId, so the tests don't overlap. I can easily make just one of them fail and the other pass.

The RunIdSequence needs to be thread-safe, so here is the third test; the problematic bits are marked with XXX comments:

    @Test
    public void the_sequence_is_thread_safe() throws Exception {
        final int ITERATIONS = 50;
        List<RunId> expectedRunIds = generateRunIdsSequentially(ITERATIONS);
        List<RunId> actualRunIds = generateRunIdsInParallel(ITERATIONS);

        assertThat("generating RunIds in parallel should have produced the same values as sequentially",
                actualRunIds, is(expectedRunIds));
    }

    private static List<RunId> generateRunIdsSequentially(int count) {
        List<RunId> results = new ArrayList<RunId>();
        // XXX: knows what is the first ID (RunId.FIRST_ID, even worse would be to use the constant 1)
        // XXX: knows how subsequent IDs are generated (increase by 1)
        for (int id = RunId.FIRST_ID; id < RunId.FIRST_ID + count; id++) {
            results.add(new RunId(id));
        }
        return results;
    }

    private static List<RunId> generateRunIdsInParallel(int count) throws Exception {
        final RunIdSequence sequence = new RunIdSequence();
        ExecutorService executor = Executors.newFixedThreadPool(10);

        List<Future<RunId>> futures = new ArrayList<Future<RunId>>();
        for (int i = 0; i < count; i++) {
            futures.add(executor.submit(new Callable<RunId>() {
                @Override
                public RunId call() throws Exception {
                    return sequence.nextRunId();
                }
            }));
        }

        List<RunId> results = new ArrayList<RunId>();
        for (Future<RunId> future : futures) {
            results.add(future.get(1000, TimeUnit.MILLISECONDS));
        }
        Collections.sort(results, new Comparator<RunId>() {
            @Override
            public int compare(RunId id1, RunId id2) {
                return id1.toInt() - id2.toInt();
            }
        });

        executor.shutdown();
        return results;
    }

This test is not isolated. It defines the same things as the two other tests, so it overlaps with them: it knows what the first RunId is and how subsequent values are generated. If either of the first two tests fails, this test will also fail, even though it is meant to focus on thread-safety, just as its name says.

Here is an improved version of the same test; the change is in generateRunIdsSequentially, which now uses a RunIdSequence itself instead of duplicating its logic:

    @Test
    public void the_sequence_is_thread_safe() throws Exception {
        final int ITERATIONS = 50;
        List<RunId> expectedRunIds = generateRunIdsSequentially(ITERATIONS);
        List<RunId> actualRunIds = generateRunIdsInParallel(ITERATIONS);

        assertThat("generating RunIds in parallel should have produced the same values as sequentially",
                actualRunIds, is(expectedRunIds));
    }

    private static List<RunId> generateRunIdsSequentially(int count) {
        RunIdSequence sequence = new RunIdSequence();

        List<RunId> results = new ArrayList<RunId>();
        for (int i = 0; i < count; i++) {
            results.add(sequence.nextRunId());
        }
        return results;
    }

    private static List<RunId> generateRunIdsInParallel(int count) throws Exception {
        final RunIdSequence sequence = new RunIdSequence();
        ExecutorService executor = Executors.newFixedThreadPool(10);

        List<Future<RunId>> futures = new ArrayList<Future<RunId>>();
        for (int i = 0; i < count; i++) {
            futures.add(executor.submit(new Callable<RunId>() {
                @Override
                public RunId call() throws Exception {
                    return sequence.nextRunId();
                }
            }));
        }

        List<RunId> results = new ArrayList<RunId>();
        for (Future<RunId> future : futures) {
            results.add(future.get(1000, TimeUnit.MILLISECONDS));
        }
        Collections.sort(results, new Comparator<RunId>() {
            @Override
            public int compare(RunId id1, RunId id2) {
                return id1.toInt() - id2.toInt();
            }
        });

        executor.shutdown();
        return results;
    }

Now it no longer knows about those two design decisions. It focuses on just testing thread-safety and doesn't duplicate the other tests or the production code. I can change the production code to make any one of the three tests fail while the other two still pass.
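
For reference, a possible implementation of RunIdSequence that satisfies all three tests could look like the following sketch. This is my own illustration, not the actual Jumi source; it assumes RunId is an immutable value object whose equality is based on the wrapped int, and it uses java.util.concurrent.atomic.AtomicInteger (import omitted, as in the listings above).

    public class RunIdSequence {

        private final AtomicInteger nextId = new AtomicInteger(RunId.FIRST_ID);

        public RunId nextRunId() {
            // getAndIncrement() is atomic, which is what makes the sequence
            // thread-safe without any explicit locking.
            return new RunId(nextId.getAndIncrement());
        }
    }

An AtomicInteger is the simplest way I know to get such a counter thread-safe; a synchronized method would work as well.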

Conclusion

When I write tests that are as isolated and focused as possible, it's easy to find the cause of a test failure. If I don't know why some code exists, I can comment it out and see which tests fail - their names will tell me why the code was written. When removing a feature, I can delete the tests which define that feature outright, instead of having to update non-cohesive tests. When changing a feature, I need to update only a few tests.

P.S. I've been thinking that it might be possible to automatically measure the "unitness" of tests using mutation testing tools such as PIT. The fewer tests that fail per mutation, the better; and if a test always fails together with other tests, that's a bad sign. Such a measure might help in pinpointing tests which need refactoring.