2012-12-26

New Testing Tools: Jumi 0.2 & Specsy 2

New versions of two of my projects have just been released. They both are testing tools for Java/JVM-based languages.

Jumi 0.2

Jumi is a new test runner for the JVM, which overcomes the JUnit test runner's limitations to better support all testing frameworks, and offers better performance and usability to end-users.

With this release, Jumi is ready for early adopters to start using it. It adds support for running multiple test classes (though they must be listed manually). The next releases will add automatic test discovery (so you can identify all tests with e.g. "*Test") and JUnit backward compatibility (so that Jumi will also run JUnit tests).

The Specsy testing framework has been upgraded to run on Jumi. We hope that more testing frameworks will implement Jumi support in the near future - please come to the Jumi mailing list if you're a tool vendor interested in implementing Jumi support.

Specsy 2

The Specsy testing framework now supports more languages than ever (Specsy 1 was Scala-only). For now it supports Scala (2.7.7 and higher), Groovy (any version) and Java (7 or higher; lambdas strongly recommended), but adding support for another JVM-based language is only a matter of writing one wrapper class.

Specsy 2 runs using the new Jumi test runner, which fixes a bunch of issues that Specsy 1.x had due to the JUnit test runner's limited expressiveness. Specsy 2 was actually released already in September, but Jumi wasn't ready for general use back then - now it is.

2012-08-09

Continuous Delivery with Maven and Go into Maven Central

In continuous delivery the idea is to have the capability to release the software to production at the press of a button, as often as it makes sense from a business point of view, even on every commit (which would then be called continuous deployment). This means that every binary built could potentially be released to production in a matter of minutes or seconds, if the powers that be so wish.

Maven's ideology is opposed to continuous delivery: with Maven it must be decided before building whether the next artifact will be a release or a snapshot version, whereas with continuous delivery it becomes known only long after the binaries have been built (e.g. after they have passed all automated and manual testing) whether a binary is fit for release.

In this article I'll explain how I tamed Maven to support continuous delivery, and how to do continuous delivery into the Maven Central repository. For continuous integration and release management I'm using Go, the CI server (not to be confused with the Go programming language or the many other uses of the name). Git's distributed nature also comes into use when creating new commits and tags during each build.

The project in question is Jumi, a new test runner for Java to replace JUnit's test runner. It's an open source library and framework which will be used by testing frameworks, build tools, IDEs and such, so the method of distribution is publishing it to Maven Central through Sonatype OSSRH.

Version numbering with Maven to support continuous delivery

The Maven Release Plugin is diametrically opposed to continuous delivery, as is the use of snapshot version numbers in Maven, so I defined the project's version numbering scheme to use the MAJOR.MINOR.BUILD format. In the source repository the version numbers are snapshot versions, but before building the binaries the CI server uses the following script to read the current version from pom.xml and append the build number to it. For example, the latest (non-published) build is build number 134 and the version number in pom.xml is 0.1-SNAPSHOT, so the resulting release version number will be 0.1.134.
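
Roughly, the idea is the following (a simplified bash sketch; it assumes the build number comes from Go's GO_PIPELINE_COUNTER environment variable and that the first <version> element in pom.xml is the project's version):

    # Read the snapshot version from pom.xml and turn it into a release version
    # by replacing the -SNAPSHOT qualifier with the CI build number.
    SNAPSHOT_VERSION=$(grep -m 1 '<version>' pom.xml | sed -e 's|.*<version>\(.*\)</version>.*|\1|')
    RELEASE_VERSION="${SNAPSHOT_VERSION%-SNAPSHOT}.$GO_PIPELINE_COUNTER"
    echo "Building version $RELEASE_VERSION"   # e.g. 0.1-SNAPSHOT + build 134 -> 0.1.134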

In the project's build script I'm then using the Versions Maven Plugin to change the version numbers in all modules of this multi-module Maven project. The changed POM files are not checked in, because in continuous delivery each binary should anyway be built only once. To make the sources traceable I tag every build, but I publish those tags only after releasing the build - more about that later.
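
The version change itself is roughly a one-liner with the Versions Maven Plugin:

    # Change the version number of every module in the multi-module project
    # (run in the directory of the root aggregate POM).
    mvn versions:set \
        -DnewVersion="$RELEASE_VERSION" \
        -DgenerateBackupPoms=false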

P.S. versions:set requires the root aggregate POM to extend the parent POM, or else you will need to run it separately for both the root and the parent POMs by giving Maven the --file option. Also, to keep all plugin version numbers in the parent POM's pluginManagement section, the root should extend the parent, or else you will need to define the plugin version numbers on the command line and can't use the shorthand commands for running Maven plugins (unless you're OK with Maven deciding by itself which version to use, which can produce unrepeatable builds).
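
In that case the invocations look roughly like this (the module paths and the plugin version below are only examples):

    # Hypothetical layout: the parent POM lives in parent/pom.xml and the root
    # aggregate POM in the repository root. The plugin coordinates are spelled
    # out in full so that Maven doesn't pick the plugin version by itself.
    mvn --file parent/pom.xml org.codehaus.mojo:versions-maven-plugin:2.1:set \
        -DnewVersion="$RELEASE_VERSION" -DgenerateBackupPoms=false
    mvn --file pom.xml org.codehaus.mojo:versions-maven-plugin:2.1:set \
        -DnewVersion="$RELEASE_VERSION" -DgenerateBackupPoms=false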

Managing the deployment pipeline with Go

Before going into further topics, I'll need to explain a bit about Go's artifact management and give an overview of Jumi's deployment pipeline.

Go is more than just a continuous integration server - it is designed for release and deployment management. (I find that Go fits that purpose better than for example Jenkins.) In Go the builds are organized into a deployment pipeline, as described in the Continuous Delivery book. All pipeline runs are logged forever* and configuration changes are version controlled (Go uses Git internally), so it also provides full auditability, especially when it's used for deployment. The commercial version has additional deployment environment and access permission features, but the free version has been adequate for me so far (though since Jumi is an open source project, I could get the enterprise edition for free). Update 2012-09-20: Since Go 12.3 the free version has the same features as the commercial version - only the maximum numbers of users and remote agents differ. Update 2014-02-25: Go is now open source and fully free!

* For experimenting I recommend cloning a pipeline with a temporary name, to keep the official pipeline's history from being filled with experimental runs. You can't remove things from the history (except by removing or hacking the database), but you can hide them by deleting the pipeline and not reusing the pipeline's name (if you create a new pipeline with the same name, the old history will become visible again). Only the build artifacts of old builds can be removed, and there's actually a feature for that, to keep the disk from filling up.

Go's central concepts are pipelines, stages and jobs. Each pipeline consists of one or more sequentially executed stages, and each stage consists of one or more jobs (which can be executed in parallel if you have multiple build agents). A pipeline can depend on the stages of other pipelines, and a stage can be triggered automatically or manually with a button press, letting you manage complex builds by chaining multiple pipelines together.

A pipeline may even depend on multiple pipelines, for example if the system is composed of multiple separately built applications, in which case one pipeline could be used to select which versions to deploy together. Further downstream pipelines or stages can then be used to deploy the selected versions together into development, testing and finally the production environment.

You can save artifacts produced by a job on the Go server and then access those artifacts in downstream jobs by fetching the artifacts into the working directory before running your scripts. Go uses environment variables to tell the build scripts about the build number, source control revision, identifiers of previous stages, custom parameters etc.
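
For example, a job's shell script can read the build metadata straight from the environment; a few of the variables that Go sets (names as I understand Go provides them):

    # Some of the environment variables set by the Go server for each job:
    echo "pipeline: $GO_PIPELINE_NAME"      # e.g. jumi
    echo "build:    $GO_PIPELINE_COUNTER"   # e.g. 134
    echo "revision: $GO_REVISION"           # the VCS revision being built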

In Go's pipeline view you can see two dependent pipelines from Jumi's deployment pipeline. Clicking the trigger button in jumi-publish would trigger that pipeline using the artifacts from a particular build of the jumi pipeline. I can trigger the downstream pipeline using any previous build - it doesn't have to be the latest build.

Running a shell command in Go requires more configuration than in Jenkins (which has a single text area for entering multiple commands), which has the positive side effect of driving you to store all build scripts in version control. I have one shell script for each Go job, and those scripts in turn call a bunch of Ruby scripts and Maven commands.

Below is a diagram showing Jumi's deployment pipeline at the time of writing. Under each pipeline's name, the first row lists its stages and the second row the jobs of each stage; each job is implemented by a shell script of the same name. For more details see the pipeline configuration.

Git
 |
 |
 |--> jumi (polling automatically)
      build         --> analyze
      build-release     coverage-report
         |
         |
         |--> jumi-publish (manually triggered)
              pre-check           --> ossrh           --> github
              check-release-notes     promote-staging     push-staging

The jumi/build stage builds the Maven artifacts and saves them on the Go server for use in later stages. It also tags the release and updates the release notes; those commits and tags are not yet published, but they are saved on the Go server.

The jumi/analyze stage runs PIT mutation testing and produces line coverage and mutation coverage reports. They can be viewed in Go on their own tab.

The jumi-publish/pre-check stage makes sure that release notes for the release have been filled in (no "TBD" line items), or else it will fail the pipeline and prevent the release.

The jumi-publish/ossrh stage uploads the build artifacts from Go into OSSRH. It doesn't yet run the last Nexus command for promoting the artifacts from the OSSRH staging repository into Maven Central (I need to log in to OSSRH and click a button), because I haven't yet written smoke tests which would make sure that all artifacts were uploaded correctly.

The jumi-publish/github stage pushes to the official Git repository the tags and commits which were created in jumi/build. It will merge automatically if somebody has pushed commits there after this build was made.

Future plans for improving this pipeline include adding a new jumi-integration pipeline between jumi and jumi-publish. It will run consumer contract tests of programs which use Jumi, against multiple versions of those programs, to catch any backward incompatibility issues. This stage might eventually take hours to execute, in which case I may break it into multiple jobs and run them in parallel. I will also reuse a subset of those tests as smoke tests in the jumi-publish pipeline, after which I can automate the final step of promoting from OSSRH to Maven Central.

Staging repository for Maven artifacts in Go

Publishable Maven artifacts are created with the Maven Deploy Plugin, and the location of the Maven repository can be configured with the altDeploymentRepository parameter. I'm using -DaltDeploymentRepository="staging::default::file:staging" to create a staging repository containing only this build's artifacts, which I then save on the Go server. (The file:staging path means the same as file://$PWD/staging but also works on Windows.)
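
So the deployment step of the build boils down to something like:

    # Deploy the artifacts into a local staging repository under ./staging
    # instead of uploading them anywhere yet.
    mvn clean deploy \
        -DaltDeploymentRepository="staging::default::file:staging"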

That staging repository can be accessed from the Go server over HTTP, so it would be quite simple to let beta users access it (optionally using HTTP basic authentication). For example, the URL to a build's staging repository could be http://build-server/go/files/jumi/134/build/1/build-release/staging/. Inside Go jobs, though, it is easiest to fetch the staging repository into the working directory; that avoids the need to configure Maven's HTTP authentication settings.

When it is decided that a build can be published, I trigger the jumi-publish pipeline, which uses the following script to upload the staging repository from a directory into a remote Maven repository. It uses curl to do HTTP PUT requests with HTTP basic authentication. (I wasn't able to find any documentation about the protocol of a Maven repository such as Nexus, but I was able to sniff it using Wireshark.)
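
In essence the script loops over every file in the staging directory and PUTs it to the remote repository; a simplified sketch (the repository URL and the credential variables are placeholders):

    # Upload the contents of the local staging repository into a remote Maven
    # repository with HTTP PUT requests (placeholder URL and credentials).
    STAGING_DIR=staging
    REPO_URL="https://oss.sonatype.org/service/local/staging/deploy/maven2"

    cd "$STAGING_DIR"
    find . -type f | while read -r file; do
        path="${file#./}"   # strip the leading "./"
        curl --fail --user "$DEPLOY_USER:$DEPLOY_PASSWORD" \
             --upload-file "$path" \
             "$REPO_URL/$path"
    done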

In addition to the above uploading, the publish script uses the Nexus Maven Plugin to close the OSSRH staging repository to which the artifacts were uploaded. It could also promote the artifacts to Maven Central, but I want to first create some smoke tests to make sure that all the necessary artifacts were uploaded. Until then I'll do a manual check before clicking the button in OSSRH to promote the artifacts to Maven Central.

Publishing to Maven Central puts some additional requirements on the artifacts. Since I'm not using the Maven Release Plugin, I need to manually enable the sonatype-oss-release profile from the Sonatype OSS Parent POM to generate all the required artifacts and sign them. If you don't need to publish artifacts to Maven Central, then you might not need to do this signing. But if you do, it's good to know that the Maven GPG Plugin accepts as parameters the name and passphrase of the GPG key to use. They can be configured in the Go pipeline using secure environment variables, which are automatically replaced with ******** in the console output. (For more security, don't save the passphrases on the Go server, but enter them manually when triggering the pipeline. Otherwise somebody with root access to the Go server could get the Go server's private key and decrypt the passphrases in Go's configuration files. Though using passphraseless SSH and GPG keys on the CI server is much simpler.)
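
With the profile enabled and the key details passed in from Go's secure environment variables, the deploy command grows to something like this (the GPG_* variable names here are only an example convention):

    # Sign the artifacts while deploying them into the local staging repository.
    # The key name and passphrase come from secure environment variables
    # configured in the Go pipeline (the variable names are just an example).
    mvn clean deploy \
        -Psonatype-oss-release \
        -DaltDeploymentRepository="staging::default::file:staging" \
        -Dgpg.keyname="$GPG_KEYNAME" \
        -Dgpg.passphrase="$GPG_PASSPHRASE"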

Tagging the release and updating release notes with Git

When I do a release, I want it to be tagged and the version and date of the release added to the release notes, which are in a text file in the root of the project. In order for the release notes to be included in the tagged revision and for the tag to be on the revision which was built, that commit needs to be made before building (an additional benefit is that all GPG signing - of the build artifacts and of the tag - happens in the build stage). Since I'm using Git, I can avoid the infinite loop which would otherwise ensue from committing on every build, by pushing the commits only if the build is released.

The release notes for the next release can be read from the release notes file with a little regular expression (get-release-notes.rb). Writing the version number and date of the release into release notes is also solvable using regular expressions (prepare-release-notes.rb), as is preparing for the next release iteration by adding a placeholder for the future release notes (bump-release-notes.rb).
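
As an illustration of the idea, if the release notes began with a placeholder heading such as "Jumi X.Y.Z (upcoming)", stamping in the version and date would be a one-liner (the file name and heading format here are made up; the real scripts are the Ruby ones mentioned above and work on the project's actual format):

    # Hypothetical release notes format: the next release has a placeholder
    # heading "Jumi X.Y.Z (upcoming)" which gets replaced with the real
    # version number and release date.
    sed -i "s/^Jumi X\.Y\.Z (upcoming)/Jumi $RELEASE_VERSION ($(date +%Y-%m-%d))/" release-notes.md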

With the help of those helper scripts, the build script shown below creates a commit that contains the finalized release notes and tags it with a GPG-signed tag (I'm including the release notes also in the tag message). It saves the release metadata into files, so that later stages of the pipeline don't need to recalculate them (for example in promote-staging.sh) and so that I can see them in a custom tab in Go (build-summary.html). Then the script does the build with Maven, and after that makes another commit which prepares the release notes for a future release.
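
In outline it goes something like this (a simplified bash sketch; the release notes file name, the tag naming and the helper scripts' arguments are assumptions):

    # Finalize the release notes and commit them, so that the tagged revision
    # contains the final notes (helper script arguments are assumptions).
    ruby prepare-release-notes.rb "$RELEASE_VERSION" release-notes.md
    git add release-notes.md
    git commit -m "Release $RELEASE_VERSION"

    # Tag the release with a GPG-signed tag; the release notes go into the tag message.
    ruby get-release-notes.rb release-notes.md > release-notes.tmp
    git tag --sign --file=release-notes.tmp "v$RELEASE_VERSION"

    # Save release metadata into files for later pipeline stages and for a
    # custom tab in Go (file names are illustrative).
    echo "$RELEASE_VERSION" > release-version.txt
    git rev-parse HEAD > release-revision.txt

    # Build the artifacts into the local staging repository.
    mvn clean deploy \
        -Psonatype-oss-release \
        -DaltDeploymentRepository="staging::default::file:staging"

    # Prepare the release notes for the next release and commit the placeholder.
    ruby bump-release-notes.rb release-notes.md
    git add release-notes.md
    git commit -m "Prepare release notes for the next release"

    # Save the new commits and tags into a bare repository, which is stored on
    # the Go server as a build artifact; nothing is pushed upstream yet.
    git init --bare staging.git
    git push staging.git HEAD:refs/heads/staging "v$RELEASE_VERSION"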

At the end of the above script you will see what lets me get away with committing on every build: I'm creating a new repository in the directory staging.git and saving it on the Go server the same way as all the other build artifacts.

Then when a release is published, the following script is used to merge those commits to the master branch and push them to the official Git repository:
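
Schematically it does the following (a bash sketch; the repository URL is a placeholder and the staging.git artifact has already been fetched from the Go server into the working directory):

    # Merge the commits made during the build into master and publish them
    # together with the release tag (placeholder repository URL).
    git clone git@github.com:example/jumi.git official
    cd official

    git fetch ../staging.git staging   # the branch created during the build
    git merge FETCH_HEAD               # creates a merge commit if others have pushed in between
    git fetch --tags ../staging.git

    git push origin master
    git push origin --tags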

Hopefully this article has given you some ideas for implementing continuous delivery using Maven.

2012-05-13

Passing Contract Tests While Refactoring Them

In my last blog post I explained how I at one time created a new implementation to pass the same contract tests as another implementation, but because I had to refactor the tests at the same time (the two implementations have different concurrency models, so the contract tests must be decoupled from them), I missed a problem (I wrote some dead code). Since then I've retried that refactoring/implementation using a different approach, as I explained in the comments of that blog post.

One option would have been to refactor the whole contract test class before starting to implement it, but that goes against my principles of doing changes in small steps and having fast feedback loops. So the approach I tried is as follows:

  1. Extract a contract test from the old implementation's tests by extracting factory methods for the SUT, creating the abstract base test class and moving the old implementation's test methods there.
  2. Create a second concrete test class which extends the contract tests, but override all the test methods from the contract test and mark them ignored. This avoids lots of failing tests appearing at once. Maybe mark each of the overridden versions with a "TODO: refactor tests" comment so as to not forget the next step.
  3. Inspect the contract test method which you plan to implement next and see if it requires some refactoring before it can work for both implementations. Refactor the test if necessary. This gives a systematic way of updating the contract tests in small steps and avoids refactoring while tests are red.
  4. Unignore one contract test method and implement the feature in the new implementation. This lets you focus on passing just one test at a time, as is normal in TDD.

I recorded my experiment of using this strategy as a screencast:

Download as MP4

More Let's Code Screencasts

2012-05-03

Declaring Pass or Fail - Handling Broken Assumptions

When using TDD, it's a good practice to declare - aloud or in your mind - whether the next test run will pass or fail (and in what way it will fail). Then when your assumption about the outcome happens to be wrong, you'll be surprised and you can start looking more closely at why on earth the code is not behaving as you thought it would.

I had one such situation in my Let's Code screencasts where I missed a mistake - I had written code that's not needed to pass any tests - and noticed it only five months later when analyzing the code with PIT mutation testing. You can see how that untested line of code was written in Let's Code Jumi #62 at 24:40, and how it was found in Let's Code Jumi #148 at 4:15 (the rest of the episode and the start of episode 149 goes into fixing it).

I would be curious to figure out a discipline which would help to avoid problems like that.

Here is what happened:

I was developing my little actors library. I already had a multi-threaded version of it working and now I was implementing a single-threaded version of it to make testing actors easier. I used contract tests to drive the implementation of the single-threaded version. Since the tests were originally written for the multi-threaded version, they required some tweaking to make them work for both implementations, with and without concurrency.

I had already gotten so far that all but one contract test was passing, when I wrote the fateful line idle = false; and ran the tests - I had expected them to pass, but that one test was still failing. So then I investigated why the test did not pass and found out that I had not yet updated the test to work with the single-threaded implementation. After fixing the test, it started failing for another reason (a missing try-catch), so I implemented that - but I did not notice that the line I had added earlier did not contribute to passing the test. Only much later did I notice (thanks to PIT) that I was missing a test case to cover that one line.

So I've been thinking, how to avoid mistakes like this in the future? I don't yet have an answer.

Maybe some sort of mental checklist to use when I have written some production code but it doesn't make the test pass because of a bug in the test. Maybe if I undid all changes to production code before fixing the test, that would avoid the problem? Maybe the IDE could help by highlighting suspicious code - the IDE could have two buttons for running tests, one where the assumption is that the tests will pass and another where they are expected to fail. Then when an assumption is broken, it would highlight all code that was written since the last time the tests passed and/or the assumptions were correct, which might help in inspecting the code.

Or maybe all problems like this can be found automatically with mutation testing and I won't need a special procedure to avoid introducing them?


UPDATE: In a follow-up blog post I'm experimenting with a better way of doing this refactoring.

2012-05-01

Unit Test Focus Isolation

Good unit tests are FIRST. The I in FIRST stands for Isolation and is easily confused with the R, Repeatability. Ironically the I is itself not well isolated. I want to take a moment to focus on an often forgotten side of unit test isolation: test focus.

A good unit test focuses on testing just one thing and it doesn't overlap with other tests - it has high cohesion and low coupling. Conversely, if you change one rule in the production code, then only one unit test should fail. Together with well named tests this makes it easy to find the reason for a test failure (and giving tests meaningful names is easier when each of them focuses on just one thing).

I came up with a good example in Let's Code Jumi episode 200 (to be released around October 2012 - I have a big WIP ;). I'm showing here a refactored version of the code - originally the third test was all inline in one method, and it might have been less obvious what the problem was.

Example

The system under test is RunIdSequence, a factory for generating unique RunId instances. Here are the two unit tests which were written first:

    @Test
    public void starts_from_the_first_RunId() {
        RunIdSequence sequence = new RunIdSequence();

        RunId startingPoint = sequence.nextRunId();

        assertThat(startingPoint, is(new RunId(RunId.FIRST_ID)));
    }

    @Test
    public void each_subsequent_RunId_is_incremented_by_one() {
        RunIdSequence sequence = new RunIdSequence();

        RunId id0 = sequence.nextRunId();
        RunId id1 = sequence.nextRunId();
        RunId id2 = sequence.nextRunId();

        assertThat(id1.toInt(), is(id0.toInt() + 1));
        assertThat(id2.toInt(), is(id1.toInt() + 1));
    }

These unit tests are well isolated. The first focuses on what the first RunId in the sequence is, the second on the relative difference between subsequent RunIds. The second test is unaware of the absolute value of the first RunId, so the tests don't overlap. I can easily make just one of them fail and the other pass.

The RunIdSequence needs to be thread-safe, so here is the third test, with the problematic bits marked with XXX comments:

    @Test
    public void the_sequence_is_thread_safe() throws Exception {
        final int ITERATIONS = 50;
        List<RunId> expectedRunIds = generateRunIdsSequentially(ITERATIONS);
        List<RunId> actualRunIds = generateRunIdsInParallel(ITERATIONS);

        assertThat("generating RunIds in parallel should have produced the same values as sequentially",
                actualRunIds, is(expectedRunIds));
    }

    private static List<RunId> generateRunIdsSequentially(int count) {
        List<RunId> results = new ArrayList<RunId>();
        // XXX: knows what is the first ID (RunId.FIRST_ID, even worse would be to use the constant 1)
        // XXX: knows how subsequent IDs are generated (increase by 1)
        for (int id = RunId.FIRST_ID; id < RunId.FIRST_ID + count; id++) {
            results.add(new RunId(id));
        }
        return results;
    }

    private static List<RunId> generateRunIdsInParallel(int count) throws Exception {
        final RunIdSequence sequence = new RunIdSequence();
        ExecutorService executor = Executors.newFixedThreadPool(10);

        List<Future<RunId>> futures = new ArrayList<Future<RunId>>();
        for (int i = 0; i < count; i++) {
            futures.add(executor.submit(new Callable<RunId>() {
                @Override
                public RunId call() throws Exception {
                    return sequence.nextRunId();
                }
            }));
        }

        List<RunId> results = new ArrayList<RunId>();
        for (Future<RunId> future : futures) {
            results.add(future.get(1000, TimeUnit.MILLISECONDS));
        }
        Collections.sort(results, new Comparator<RunId>() {
            @Override
            public int compare(RunId id1, RunId id2) {
                return id1.toInt() - id2.toInt();
            }
        });

        executor.shutdown();
        return results;
    }

This test is not isolated. It defines the same things as those two other tests, so it overlaps with them: it knows what the first RunId is and how subsequent values are generated. If one of the first two tests fails, this test will also fail, even though this test is meant to focus on thread-safety, just as its name says.

Here is an improved version of the same test; the change is in the generateRunIdsSequentially helper method:

    @Test
    public void the_sequence_is_thread_safe() throws Exception {
        final int ITERATIONS = 50;
        List<RunId> expectedRunIds = generateRunIdsSequentially(ITERATIONS);
        List<RunId> actualRunIds = generateRunIdsInParallel(ITERATIONS);

        assertThat("generating RunIds in parallel should have produced the same values as sequentially",
                actualRunIds, is(expectedRunIds));
    }

    private static List<RunId> generateRunIdsSequentially(int count) {
        RunIdSequence sequence = new RunIdSequence();

        List<RunId> results = new ArrayList<RunId>();
        for (int i = 0; i < count; i++) {
            results.add(sequence.nextRunId());
        }
        return results;
    }

    private static List<RunId> generateRunIdsInParallel(int count) throws Exception {
        final RunIdSequence sequence = new RunIdSequence();
        ExecutorService executor = Executors.newFixedThreadPool(10);

        List<Future<RunId>> futures = new ArrayList<Future<RunId>>();
        for (int i = 0; i < count; i++) {
            futures.add(executor.submit(new Callable<RunId>() {
                @Override
                public RunId call() throws Exception {
                    return sequence.nextRunId();
                }
            }));
        }

        List<RunId> results = new ArrayList<RunId>();
        for (Future<RunId> future : futures) {
            results.add(future.get(1000, TimeUnit.MILLISECONDS));
        }
        Collections.sort(results, new Comparator<RunId>() {
            @Override
            public int compare(RunId id1, RunId id2) {
                return id1.toInt() - id2.toInt();
            }
        });

        executor.shutdown();
        return results;
    }

Now it no longer knows about those two design decisions. It's focused on just testing thread-safety and doesn't duplicate the other tests nor the production code. I can change the production code to make any one of the three tests fail while the other two still pass.

Conclusion

When I try to write tests that are as isolated and focused as possible, it becomes easy to find the cause of a test failure. If I don't know why some code exists, I can comment it out and see which tests fail - their names will tell why the code was written. When removing a feature, I can remove the whole tests which define that feature, instead of having to update non-cohesive tests. When changing a feature, I need to update only a few tests.

P.S. I've been thinking that it might be possible to automatically measure the "unitness" of tests using mutation testing tools such as PIT. The fewer tests that fail per mutation, the better. And if a test always fails together with other tests, that's a bad sign. This might help in pinpointing tests which need some refactoring.