tag:blogger.com,1999:blog-2431866466276465612024-02-08T00:38:22.708+02:00Esko LuontolaButterflies are random thoughts people have.
They live. They die. They are pointless.Esko Luontolahttp://www.blogger.com/profile/03956946511109435404noreply@blogger.comBlogger27125tag:blogger.com,1999:blog-243186646627646561.post-43609118609370779352015-05-08T22:07:00.002+03:002015-09-08T23:02:46.118+03:00After Rails Girls<style>
.problem {
/* undo what ".main-inner .widget h2" does */
padding: 0 !important;
margin-top: 2em !important;
}
.chili {
height: 1em;
vertical-align: middle;
/* undo what ".post-body img" does */
padding: inherit !important;
border: inherit !important;
background: inherit !important;
box-shadow: inherit !important;
border-radius: inherit !important;
}
</style>
<p><a href="http://railsgirls.com/">Rails Girls</a> can be a very chaotic experience which throws you in the middle of an overwhelming pile of technology. But if you survived it and are willing to learn the things properly at your own pace, this post outlines some pointers.</p>
<p>It's easy to get started in making websites, but as soon as you enter the territory of programming logic, the learning curve rises sharply. You should be aware of <em>why</em> you are getting into software development. If you want to do it just because you've heard it pays well, that's probably not enough. Wanting to create a solution to some need that you and others have is <a href="http://thenextweb.com/dd/2015/06/11/8-barriers-to-overcome-when-learning-to-code/">more motivating and helps in focusing your efforts</a>. It also helps if you enjoy problem solving and figuring out how things work, because that's what you'll be doing every day.</p>
<div class="separator" style="clear: both; text-align: center;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4igsmMAydzoaY9eGANtdX-7WZpoLXek7WCxTzW7cqWvcTwBLG6ksmuud8OehV4nuT8OVGgas6enhBf7NPyprssEYkibq7HPHPS6mNtC6tql8ixeTw_f_nmyny85N6g0QrvFbT2Zayik0/s1600/The+essence+of+programming+-+Imgur.png" />
<br><cite>The essence of programming (<a href="http://imgur.com/gallery/x0ml8">source</a>)</cite></div>
<p>Here is a roadmap of things you will need to learn sooner or later. The effort required for learning them is expressed using chili peppers. <img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /> can be learned in a few days, <img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /><img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /> requires some weeks of focused learning, and <img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /><img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /><img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /> will take many months.</p>
<h2 class="problem">I want to create a web site or app <img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /></h2>
<p>You will need to learn <a href="https://developer.mozilla.org/en-US/Learn/HTML">HTML</a>, which is the language that every web site and web application is made out of. There is a good overview in MDN's <a href="https://developer.mozilla.org/en-US/Learn">Learning the Web</a> guide.</p>
<p>Though you can learn HTML with all the files on just your own computer, eventually you will need a server for hosting your web site so that others can see it too. You can get started with <a href="https://pages.github.com/">GitHub Pages</a>.</p>
<h2 class="problem">I want my site to look pretty <img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /></h2>
<p>You will need to learn <a href="https://developer.mozilla.org/en-US/Learn/CSS">CSS</a>, which is the language for describing the fonts, colors and layouts of web sites. You can learn bits of CSS easily as you go, though learning the intricacies of CSS layouts means at least <img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /><img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /> worth of <a href="http://pandawhale.com/post/18529/css-illustrated-by-peter-griffin-of-family-guy">fiddling</a>.</p>
<h2 class="problem">I want my site to do things <img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /><img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /><img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /></h2>
<p>You will need to <a href="http://norvig.com/21-days.html">learn programming</a>. The exact choice of language doesn't matter much. The hard part is learning to think like a programmer, but after you can program in one language, you can learn another language in a matter of days or weeks.</p>
<p>In order to change a web page while the user is on it, you will have to use <a href="https://developer.mozilla.org/en-US/Learn/JavaScript">JavaScript</a>. In order to generate different HTML when the user enters or reloads a web page, any general purpose programming language can be used, such as <a href="https://www.ruby-lang.org/">Ruby</a>.</p>
<h2 class="problem">I want my site to remember things <img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /><img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /></h2>
<p>Almost every application uses a database for storing information. You will need to learn the basics of SQL, which is the language for communicating with a relational database (for example <a href="http://www.postgresql.org/">PostgreSQL</a>). Another popular kind of database is a document database (for example <a href="http://couchdb.apache.org/">CouchDB</a>), in which case the query language is usually something other than SQL.</p>
<h2 class="problem">I can't understand the code I wrote last month <img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /><img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /><img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /></h2>
<p>Congratulations, you've learned enough programming to be dangerous. Anybody can write code that the computer understands, but it requires skill to write code that other programmers can understand. You should start learning the principles of writing good code. <a href="http://www.amazon.com/Clean-Code-Handbook-Software-Craftsmanship/dp/0132350882">Clean Code: A Handbook of Agile Software Craftsmanship</a> is a good starting point and my site has <a href="http://www.orfjackal.net/devlinks">additional resources</a>.</p>
<h2 class="problem">My code doesn't always work right <img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /><img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /><img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /></h2>
<p>To make sure that the code does what you think it should do, you should write automated tests for it. One way is to learn <a href="http://en.wikipedia.org/wiki/Test-driven_development">Test-Driven Development</a>, which guarantees that all the code you write is tested. The <a href="https://github.com/orfjackal/tdd-tetris-tutorial">TDD Tetris Tutorial</a> is a good starting point.</p>
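<p>To give a feel for the test-first rhythm, here is a hypothetical micro-example in Java (the <code>ScoreCounter</code> class and its behavior are invented for illustration, and the plain check in <code>main</code> stands in for a real test framework such as JUnit). The check was written first; the class was then implemented to make it pass.</p>

```java
// Hypothetical TDD micro-example: the check in main() was written first,
// and ScoreCounter was then implemented to make it pass.
public class ScoreCounter {

    private int score = 0;

    public void addPoints(int points) {
        score += points;
    }

    public int getScore() {
        return score;
    }

    public static void main(String[] args) {
        // the "test": state the expected behavior before trusting the code
        ScoreCounter counter = new ScoreCounter();
        counter.addPoints(10);
        counter.addPoints(5);
        if (counter.getScore() != 15) {
            throw new AssertionError("expected 15 but was " + counter.getScore());
        }
        System.out.println("test passed");
    }
}
```

<p>A real test framework gives you the same idea with better tooling, but the principle is identical: the expectation is written down before the code is trusted.</p>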
<h2 class="problem">I want to work on a project together with others <img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /></h2>
<p><a href="http://en.wikipedia.org/wiki/Revision_control">Version control</a> is used for sharing code changes with other developers. It's also useful when working alone, because it lets you reliably return to earlier versions of your code and also works as a backup. The most popular choice is using the <a href="http://git-scm.com/">Git</a> version control system and the <a href="https://github.com/">GitHub</a> repository hosting service.</p>
<h2 class="problem">I have a question <img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /></h2>
<p>The first step in solving a problem is doing a Google search. If you can't find an answer, you can post programming-related questions on <a href="http://stackoverflow.com/">Stack Overflow</a>, but first learn how to make an <a href="http://sscce.org/">SSCCE</a> to get your code-related questions answered. It's also good to visit <a href="http://www.meetup.com/find/tech/">meetups</a> to get in touch with the local developer community and talk with other developers, especially about questions which have multiple correct answers (e.g. "which tool or language should I use for X").</p>
<h2 class="problem">I want a job <img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /><img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /><img class="chili" alt="chili" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj60eHEJyNzV15PYqNJW5XZY9vvW7A8OKRgDvAwHizAXDOC4pN6wIiPz7CvaRTOCvevTe1-gAef-bLKBSsAMPgjiwdWhaGMnqt3HKS8KQsdPZqqy9weBW_cbHKUJn-2H3j0q4AYdB_vHJ0/s1600/chili.png" /></h2>
<p>To get a job in technology, having projects you can showcase matters more than your CV or degrees. Create a complete application, make it available for others to use, and put its code up on <a href="https://github.com/">GitHub</a>. Start a blog and regularly post about things you've learned and what you're thinking about. Attend local <a href="http://www.meetup.com/find/tech/">meetups</a> regularly and talk with other developers to make contacts and to hear about open junior developer positions.</p>Esko Luontolahttp://www.blogger.com/profile/03956946511109435404noreply@blogger.com0tag:blogger.com,1999:blog-243186646627646561.post-64920487905637003892014-12-31T10:28:00.000+02:002014-12-31T10:28:11.208+02:00When to Refactor<p style="font-style: italic;">How to maintain the balance between adding new features and refactoring existing code? Here are some rules of thumb for choosing when to refactor and how much.</p>
<p>Refactoring is the process of improving the code's design without affecting its functionality. Is it possible to over-refactor? I don't think that code can ever be "too clean", and following the <a href="http://www.jbrains.ca/permalink/the-four-elements-of-simple-design">four elements of Simple Design</a> should not result in over-engineering. But certainly some code needs more cleaning up than other code, and we rarely have enough time to do everything we want. That's why prioritization is needed.</p>
<p>When I refactor, it's usually in one of the following situations.</p>
<h2>After getting a test to pass</h2>
<p>In unit-scale TDD (as opposed to <a href="http://www.natpryce.com/articles/000780.html">system-scale TDD</a>), writing a test and making it pass takes only a few minutes (if it takes longer, you're working in too big steps). After getting a test to pass, it's good to take a moment to look at the code we just wrote and clean it up. Basically it comes down to removing duplication and improving names, i.e. Simple Design. This takes just a minute or two.</p>
<p>This is also a good time to fix any obvious design smells while they are still small. For example, <a href="http://dev.solita.fi/2013/03/01/refactoring-primitive-obsession.html">Primitive Obsession</a> gets harder to fix the more widespread it is. This usually takes just a few minutes, at most an hour. Very faint design smells I would leave lying around until they <em>ripen</em> enough for me to know how to fix them - but not for so long that they begin to rot.</p>
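<p>As a hypothetical sketch of fixing Primitive Obsession while it is still small (the <code>PhoneNumber</code> class and its validation rule are invented for illustration): a phone number passed around as a bare <code>String</code> gets its own tiny value type, so validation and equality have a single home.</p>

```java
// Hypothetical cure for Primitive Obsession: instead of passing phone
// numbers around as bare Strings, wrap them in a small value type.
public final class PhoneNumber {

    private final String value;

    public PhoneNumber(String value) {
        // illustrative validation rule: optional "+" followed by digits, spaces and dashes
        if (!value.matches("\\+?[0-9 -]+")) {
            throw new IllegalArgumentException("not a phone number: " + value);
        }
        this.value = value;
    }

    public String getValue() {
        return value;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof PhoneNumber
                && value.equals(((PhoneNumber) obj).value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    @Override
    public String toString() {
        return value;
    }
}
```

<p>While the type is used in only a couple of places, this refactoring takes minutes; once raw strings have spread through the codebase, it takes much longer.</p>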
<h2>When adding a feature is hard</h2>
<p>If the system's design does not make it easy to add a feature I'm currently working on, I would first refactor the system to make adding that feature easier. If this is the second or third instance of a similar feature <a name="note-1-ref"></a><a href="#note-1">[1]</a>, I would refactor the code to follow the <a href="http://blog.8thlight.com/uncle-bob/2014/05/12/TheOpenClosedPrinciple.html">Open-Closed Principle</a>, so that in the future adding similar features will be trivial. This might take from half an hour up to a couple of hours.</p>
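<p>A hypothetical sketch of what such a refactoring can aim for (the <code>Formatter</code> example is invented for illustration): once the varying part sits behind an interface, adding the next format means writing a new class instead of editing existing ones.</p>

```java
import java.util.Arrays;
import java.util.List;

// Open-Closed sketch: Report is closed for modification but open for
// extension - a new output format is a new Formatter implementation.
interface Formatter {
    String format(List<String> lines);
}

class PlainTextFormatter implements Formatter {
    public String format(List<String> lines) {
        return String.join("\n", lines);
    }
}

class CsvFormatter implements Formatter {
    public String format(List<String> lines) {
        return String.join(",", lines);
    }
}

public class Report {

    public static String render(List<String> lines, Formatter formatter) {
        return formatter.format(lines);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("first", "second");
        System.out.println(render(lines, new PlainTextFormatter()));
        System.out.println(render(lines, new CsvFormatter()));
    }
}
```

<p>Before this refactoring, the second format would typically have been an if/else branch inside <code>render</code>; extracting the interface is the step that makes the third and later formats trivial.</p>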
<p>When the difficulty of adding a feature hits you right away like a ton of bricks, it's obvious that the refactoring should come first. But what if a difficulty sneaks up on you <em>during</em> the implementation of the feature? Trying to refactor and implement features at the same time is a road to <a href="http://c2.com/cgi/wiki?RefactoringHell">pain and suffering</a>. Instead, retreat to the last point where all tests passed (either revert/stash your changes or disable the new feature's one failing test), after which you can better focus on keeping the tests green while refactoring.</p>
<h2>When our understanding of what would be the correct design improves</h2>
<p>When we start developing a program, we have only a partial understanding of the problem being solved, but we'll do our best to make the code reflect our current understanding of it. As the program grows over months and years, we will learn more, and inevitably there will be parts of the code that we would have designed differently if only we had known then what we know today. This is the <a href="https://www.youtube.com/watch?v=pqeJFYwnkjE">original definition</a> of the Technical Debt metaphor, and the ability to pay back the debt depends on how clean the code is.</p>
<p>For big refactorings, it is impractical to block adding new features while the design is being changed. So working towards a new design should be done <a href="http://continuousdelivery.com/2011/05/make-large-scale-changes-incrementally-with-branch-by-abstraction/">incrementally at the same time as developing new features</a>. Whenever a developer needs to change code that does not yet conform to the target design, they should refactor that part of the codebase there and then, before implementing the feature at hand. This way it might take many weeks or months for the whole codebase to be refactored, but it is done incrementally in small steps, a class or a method at a time (which should not take more than a couple of hours), so that the software keeps working at all times.</p>
<h2>When trying to understand what some piece of code does</h2>
<p>If you need to understand some code, even if you're not going to change it, refactoring the code is one means for understanding it better. Extract methods and variables, give them better names and move things around until the code says clearly what it does. You may combine this with writing unit tests, <a href="http://www.jbrains.ca/permalink/does-unit-testing-add-value-when-were-not-doing-tdd">which likewise helps to understand the code</a>.</p>
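<p>For example (a made-up snippet), extracting and naming the pieces of a cryptic condition can turn it into something that reads like a sentence:</p>

```java
// Made-up example: qualifiesForLoyaltyDiscount() was originally the inline
// condition "totalCents > 10000 && !firstPurchase"; extracting and naming
// its parts makes the intent readable without any comment.
public class Order {

    private final int totalCents;
    private final boolean firstPurchase;

    public Order(int totalCents, boolean firstPurchase) {
        this.totalCents = totalCents;
        this.firstPurchase = firstPurchase;
    }

    public boolean qualifiesForLoyaltyDiscount() {
        return isLargeOrder() && isReturningCustomer();
    }

    private boolean isLargeOrder() {
        return totalCents > 10000;
    }

    private boolean isReturningCustomer() {
        return !firstPurchase;
    }
}
```

<p>Even if you never commit this change, writing it is a way of asking the code questions and recording the answers as names.</p>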
<p>If the code has good test coverage, you might as well commit the changes you just made, in the hope that the next reader will understand the code faster <a name="note-2-ref"></a><a href="#note-2">[2]</a>. But even if the code has no tests, you can do some refactoring to understand it and then throw away your changes - your understanding will remain. If you know that you're going to throw away your changes, you can do the throwaway refactoring faster and with less care. And for complex refactorings, when you're not sure what sequence of steps would bring you safely to your goal, prodding around the code can help you get a feel for the correct refactoring sequence.</p>
<h2>TL;DR</h2>
<p>Refactoring does not have to be, nor should it be, its own development phase which takes weeks or months. Instead, it can be done incrementally in small steps, interleaved with feature development.</p>
<hr>
<h3>Notes</h3>
<p><a name="note-1"></a><a href="#note-1-ref">[1]</a>: If the shape of the code is developing into a direction that you've seen happen many times in the past, it's easy to know how to refactor it already when the second duplicate instance raises its head. But if you're uncertain of what the code should be like, it may be worthwhile to leave the second duplicate be and wait for the third duplicate before creating a generic solution, so that you can clearly see which parts are duplicated and which vary.</p>
<p><a name="note-2"></a><a href="#note-2-ref">[2]</a>: Sometimes I wonder whether a refactoring made the code better, or I just understand it better because of spending time refactoring it.</p>
<p><img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgqkjKbzqh47hFRjltojfHhveov2oA8rIzJ9TATvNaJFv_9gMuM0vV0v1PrRP5eY8hIroCmVzhacrehT1g1JFKQT1DxiDAfZ2O6lmJJPFgL0fVJTlkkQHJVH0PEZfiMhWMUzSCeOwIgGFk/s1600/not-sure-if-refactoring.jpg" alt="Not sure if refactoring made code more understandable, or I just understand the code better because I spent hours in it."></p>
<hr>
<p style="font-style: italic;">This article was first published in the <a href="http://dev.solita.fi/2014/12/01/when-to-refactor.html">Solita developer blog</a>. There you can find also other articles like this.</p>Esko Luontolahttp://www.blogger.com/profile/03956946511109435404noreply@blogger.com0tag:blogger.com,1999:blog-243186646627646561.post-68899121804566439072014-12-02T00:21:00.001+02:002014-12-02T00:47:31.067+02:00Phase Change Pattern for Mutating Immutable Objects<p>Here is a design pattern I've found useful for "mutating" immutable objects in non-functional languages, such as Java, which don't have good support for it (unlike for example <a href="http://www.scala-lang.org/node/2075">Scala's copy methods</a>). I shall call it the <cite>Phase Change Pattern</cite>, because of how it <cite>freezes</cite> and <cite>melts</cite> objects to make them immutable and back again mutable, the same way as water can be frozen and melted.</p>
<p>This pattern consists of the following parts:</p>
<ul>
<li><em>An immutable class</em> with some properties</li>
<li><em>A mutable class</em> with the same properties</li>
<li>The mutable class has a <em>freeze()</em> method for converting it to the immutable class</li>
<li>The immutable class has a <em>melt()</em> method for converting it to the mutable class</li>
<li>Both of the classes have package-private <em>copy constructors</em> that take the other class' instance as parameter, copying all fields (making immutable/mutable copies of the field values when necessary)</li>
</ul>
<p><small>The <code>freeze</code> method plays a similar role as the <code>build</code> method in the Builder pattern, but it's named after <a href="http://ruby-doc.org/core-2.1.5/Object.html#method-i-freeze">Ruby's <code>freeze</code> method</a>. The name <code>melt</code> was chosen as a metaphor based on the <a href="http://en.wikipedia.org/wiki/Phase_transition">thermodynamic phase changes</a>. I think it has better connotations than the other alternatives I considered: "build - destroy", "save - edit", "persist - dispersist" (or can "transient" be made a verb?)</small></p>
<h2>Code Example</h2>
<p>The following code is taken from the <a href="http://jumi.fi/">Jumi Test Runner</a> project, where I originally invented this pattern.</p>
<h3>Usage</h3>
<p>The classes can be used like this to create nice immutable classes that may be freely passed around:</p>
<pre>
SuiteConfiguration config = new SuiteConfigurationBuilder()
        .addToClasspath(Paths.get("something.jar"))
        .addJvmOptions("-ea")
        .freeze();
</pre>
<p>But the pattern also makes it possible, in a method that takes the immutable object as parameter, to augment it with new values:</p>
<pre>
config = config.melt()
        .addJvmOptions("-javaagent:extra-agent.jar")
        .freeze();
</pre>
<p>This is useful in situations where the code that creates the original immutable object does not know all the arguments, but some of the arguments are known only much later by some other code.</p>
<h3>Immutable Class</h3>
<p>Here is the immutable class. Its default constructor sets all fields to their default values. The copy constructor needs to make immutable copies of all mutable properties (e.g. <code>java.util.List</code>). The copy constructor takes the builder as parameter, which makes it easier to match the field names than if each property had its own constructor parameter. The cyclic dependency is a small price to pay for this convenience. There are getters for all properties. This class can also be made a value object by overriding <code>equals</code>, <code>hashCode</code> and <code>toString</code>.</p>
<pre>
@Immutable
public class SuiteConfiguration {

    public static final SuiteConfiguration DEFAULTS = new SuiteConfiguration();

    private final List<URI> classpath;
    private final List<String> jvmOptions;
    private final URI workingDirectory;
    private final String includedTestsPattern;
    private final String excludedTestsPattern;

    public SuiteConfiguration() {
        classpath = Collections.emptyList();
        jvmOptions = Collections.emptyList();
        workingDirectory = Paths.get(".").normalize().toUri();
        includedTestsPattern = "glob:**Test.class";
        excludedTestsPattern = "glob:**$*.class";
    }

    SuiteConfiguration(SuiteConfigurationBuilder src) {
        classpath = Immutables.list(src.getClasspath());
        jvmOptions = Immutables.list(src.getJvmOptions());
        workingDirectory = src.getWorkingDirectory();
        includedTestsPattern = src.getIncludedTestsPattern();
        excludedTestsPattern = src.getExcludedTestsPattern();
    }

    public SuiteConfigurationBuilder melt() {
        return new SuiteConfigurationBuilder(this);
    }

    @Override
    public boolean equals(Object that) {
        return EqualsBuilder.reflectionEquals(this, that);
    }

    @Override
    public int hashCode() {
        return HashCodeBuilder.reflectionHashCode(this);
    }

    @Override
    public String toString() {
        return ToStringBuilder.reflectionToString(this, ToStringStyle.SHORT_PREFIX_STYLE);
    }

    // getters

    public List<URI> getClasspath() {
        return classpath;
    }

    public List<String> getJvmOptions() {
        return jvmOptions;
    }

    public URI getWorkingDirectory() {
        return workingDirectory;
    }

    public String getIncludedTestsPattern() {
        return includedTestsPattern;
    }

    public String getExcludedTestsPattern() {
        return excludedTestsPattern;
    }
}
</pre>
<h3>Mutable Class</h3>
<p>Here is the mutable class. Its constructors and <code>freeze</code> method mirror the immutable class's constructors and <code>melt</code> method. The defaults are written only once, in the immutable class. This class has both getters and setters for all the properties. All the mutator methods return <code>this</code> to enable method chaining.</p>
<pre>
@NotThreadSafe
public class SuiteConfigurationBuilder {

    private final List<URI> classpath;
    private final List<String> jvmOptions;
    private URI workingDirectory;
    private String includedTestsPattern;
    private String excludedTestsPattern;

    public SuiteConfigurationBuilder() {
        this(SuiteConfiguration.DEFAULTS);
    }

    SuiteConfigurationBuilder(SuiteConfiguration src) {
        classpath = new ArrayList<>(src.getClasspath());
        jvmOptions = new ArrayList<>(src.getJvmOptions());
        workingDirectory = src.getWorkingDirectory();
        includedTestsPattern = src.getIncludedTestsPattern();
        excludedTestsPattern = src.getExcludedTestsPattern();
    }

    public SuiteConfiguration freeze() {
        return new SuiteConfiguration(this);
    }

    // getters and setters

    public List<URI> getClasspath() {
        return classpath;
    }

    public SuiteConfigurationBuilder setClasspath(URI... files) {
        classpath.clear();
        for (URI file : files) {
            addToClasspath(file);
        }
        return this;
    }

    public SuiteConfigurationBuilder addToClasspath(URI file) {
        classpath.add(file);
        return this;
    }

    ...
}
</pre>
<p>Most of the mutators have been omitted for brevity. <a href="https://github.com/orfjackal/jumi/tree/592aa6db5261b7d9257ac621a436046778571a32/jumi-core/src/main/java/fi/jumi/core/config">The full source code</a> of these classes is on GitHub.</p>Esko Luontolahttp://www.blogger.com/profile/03956946511109435404noreply@blogger.com1tag:blogger.com,1999:blog-243186646627646561.post-73109786274954184762014-10-10T12:56:00.000+03:002014-11-07T00:20:00.091+02:00Continuous Discussion: Agile, DevOps and Continuous Delivery<p>Two days ago I participated in an online panel discussing Agile, DevOps and Continuous Delivery. The panel consisted of 11 panelists and was organized by <a href="http://electric-cloud.com/powering-continuous-delivery/">Electric Cloud</a>. Topics discussed included:</p>
<ul>
<li>What are/were your big Agile obstacles?</li>
<li>What does DevOps mean to you?</li>
<li>What is Continuous Delivery? What does it take to get there?</li>
</ul>
<p>You can watch <a href="http://electric-cloud.com/blog/2014/10/c9d9-continuous-discussions-episode-1-recap/">a recording of the session</a> at Electric Cloud's blog. There is also information about the next online panel that will be arranged.</p>
Esko Luontolahttp://www.blogger.com/profile/03956946511109435404noreply@blogger.com0tag:blogger.com,1999:blog-243186646627646561.post-67443691705827305622014-07-17T17:27:00.001+03:002016-07-17T00:27:07.657+03:00Java 8 Functional Interface Naming Guide
<p>People have been <a href="http://blog.jooq.org/2014/04/04/java-8-friday-the-dark-side-of-java-8/">complaining</a> about the naming of Java 8's <a href="http://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html#approach5">functional interfaces</a> in the <a href="http://docs.oracle.com/javase/8/docs/api/java/util/function/package-summary.html">java.util.function package</a>. Depending on the types of its arguments and return value, a function may instead be called a consumer, supplier, predicate or operator, which is further complicated by two-argument functions. The number of interfaces explodes because of specialized interfaces for some primitive types. In total there are 43 interfaces in that package. Compare this to Scala, where just three interfaces (with <a href="http://www.scala-notes.org/2011/04/specializing-for-primitive-types/">compiler optimizations</a>) cover all the same use cases: Function0, Function1 and Function2.</p>
<p>To make some sense out of all this, I wrote <a href="https://github.com/orfjackal/misc-tools/blob/master/src/main/java/net/orfjackal/experimental/Java8FunctionalInterfaceNaming.java">a program that generates the interface names</a> based on the function's argument types and return type. The names it generates <a href="https://github.com/orfjackal/misc-tools/blob/master/src/test/java/net/orfjackal/experimental/Java8FunctionalInterfaceNamingTest.java">match those in the Java library</a>. For this article I also made a visualization, which you can see below.</p>
<h2>Visually</h2>
<p><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghyphenhyphenHBk_OyTtET1WGsQOuGVxs-Vpx2mwnlrWC6HHn5sFx34LrUjkNxlCfg8XE0gPbAQsh01KcudqZlWppcWXKYOGLXcTVOkACBocETiOizhmCn5Tn8ylyn0MrfS3ZvljmeCNtNHe6QrocE/s4000/Java+8+Functional+Interface+Naming+Guide.png" imageanchor="1" ><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEghyphenhyphenHBk_OyTtET1WGsQOuGVxs-Vpx2mwnlrWC6HHn5sFx34LrUjkNxlCfg8XE0gPbAQsh01KcudqZlWppcWXKYOGLXcTVOkACBocETiOizhmCn5Tn8ylyn0MrfS3ZvljmeCNtNHe6QrocE/s550/Java+8+Functional+Interface+Naming+Guide.png" /></a>
<p>Click to make it bigger.
<h2>Textually</h2>
<p>The interface name is determined by the following algorithm. The names are described as regular expressions; the generic interfaces have the shortest names, but specialized interfaces for functions taking or returning primitives include the primitive types in the interface name.
<ol>
<li>Does it return void?
<ul>
<li>Arity 0: <code>Runnable</code>
<li>Arity 1: <code>(|Int|Long|Double)Consumer</code>
<li>Arity 2:
<ul>
<li>Both arguments are generic: <code>BiConsumer</code>
<li>First argument is generic: <code>Obj(Int|Long|Double)Consumer</code>
</ul>
</ul>
<li>Does it take no arguments?
<ul>
<li>Arity 0: <code>(|Int|Long|Double|Boolean)Supplier</code>
</ul>
<li>Does it return boolean?
<ul>
<li>Arity 1: <code>(|Int|Long|Double)Predicate</code>
<li>Arity 2: <code>BiPredicate</code>
</ul>
<li>Do all arguments have the same type as the return value?
<ul>
<li>Arity 1: <code>(|Int|Long|Double)UnaryOperator</code>
<li>Arity 2: <code>(|Int|Long|Double)BinaryOperator</code>
</ul>
<li>Otherwise:
<ul>
<li>Arity 1: <code>(|Int|Long|Double)(|ToInt|ToLong|ToDouble)Function</code>
<li>Arity 2: <code>(|ToInt|ToLong|ToDouble)Function</code>
</ul>
</ol>
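<p>The decision list above can be exercised with plain lambda assignments - each of the following compiles only because the lambda's shape matches the rule its target interface falls under (the variable names are mine):</p>

```java
import java.util.function.*;

public class InterfaceNaming {
    public static void main(String[] args) {
        // Rule 1, returns void: consumers (Runnable at arity 0)
        Runnable sideEffect = () -> System.out.println("run");
        IntConsumer printInt = x -> System.out.println(x);
        BiConsumer<String, String> printBoth = (a, b) -> System.out.println(a + b);
        ObjIntConsumer<String> objAndInt = (s, i) -> System.out.println(s + i);

        // Rule 2, takes no arguments: suppliers
        Supplier<String> greeting = () -> "hello";
        BooleanSupplier alwaysTrue = () -> true;

        // Rule 3, returns boolean: predicates
        IntPredicate positive = x -> x > 0;
        BiPredicate<String, Integer> hasLength = (s, n) -> s.length() == n;

        // Rule 4, arguments and return value share one type: operators
        UnaryOperator<String> trimmed = s -> s.trim();
        IntBinaryOperator sum = (a, b) -> a + b;

        // Rule 5, everything else: functions
        Function<String, Integer> boxedLength = String::length;
        ToIntFunction<String> primitiveLength = String::length;
        IntToDoubleFunction half = x -> x / 2.0;

        System.out.println(positive.test(sum.applyAsInt(1, 2))); // prints "true"
    }
}
```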
<p>The method names are as follows. When the return type is a primitive, the method name is a bit longer (apparently because the JVM supports method overloading also based on the return type, but the Java language does not).</p>
<ul>
<li>Runnable: <code>run</code>
<li>Consumers: <code>accept</code>
<li>Suppliers: <code>get(|AsInt|AsLong|AsDouble|AsBoolean)</code>
<li>Predicates: <code>test</code>
<li>Functions and operators: <code>apply(|AsInt|AsLong|AsDouble)</code>
</ul>
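<p>A quick sketch of those method names in use (the variable names are mine; all interfaces and methods are from <code>java.util.function</code>):</p>

```java
import java.util.function.*;

public class MethodNames {
    public static void main(String[] args) {
        Runnable task = () -> {};
        IntConsumer sink = x -> {};
        Supplier<String> boxed = () -> "hi";
        IntSupplier primitive = () -> 42;
        IntPredicate even = x -> x % 2 == 0;
        Function<String, Integer> boxedLength = String::length;
        ToIntFunction<String> intLength = String::length;

        task.run();                           // Runnable: run
        sink.accept(1);                       // consumers: accept
        String s = boxed.get();               // generic supplier: get
        int i = primitive.getAsInt();         // primitive supplier: getAsInt
        boolean b = even.test(4);             // predicates: test
        Integer n = boxedLength.apply("abc"); // generic function: apply
        int m = intLength.applyAsInt("abc");  // primitive-returning: applyAsInt

        System.out.println(i + " " + b + " " + m); // prints "42 true 3"
    }
}
```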
<p>I hope this helps some of you remember which interface to look for when browsing the API.</p>
<h1>Lambda Expressions Backported to Java 7, 6 and 5</h1>
<p><i>2013-07-23</i></p>
<p>Do you want to use lambda expressions already today, but you are forced to use Java and a stable JRE in production? Now that's possible with <a href="https://github.com/orfjackal/retrolambda">Retrolambda</a>, which will take bytecode compiled with Java 8 and convert it to run on Java 7, 6 and 5 runtimes, letting you use <a href="http://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html">lambda expressions</a> and <a href="http://docs.oracle.com/javase/tutorial/java/javaOO/methodreferences.html">method references</a> on those platforms. It won't give you the improved Java 8 Collections API, but fortunately there are <a href="https://code.google.com/p/guava-libraries/">multiple</a> <a href="https://code.google.com/p/totallylazy/">alternative</a> <a href="http://functionaljava.org/">libraries</a> which will benefit from lambda expressions.</p>
<h2>Behind the Scenes</h2>
<p>A couple of days ago in a café it popped into my head to <a href="http://eng.wealthfront.com/2013/04/i-can-haz-lambda-on-java-7.html">find out</a> whether somebody had made this already, but after <a href="http://stackoverflow.com/questions/17756604/has-anybody-yet-backported-lambda-expressions-to-java-7">speaking into the air</a>, I did it myself over a weekend.</p>
<p>The original plan of copying the classes from OpenJDK didn't work (<a href="http://download.java.net/jdk8/docs/api/java/lang/invoke/LambdaMetafactory.html"><code>LambdaMetafactory</code></a> depends on some package-private classes and would have required modifications), but I figured out a better way to do it without additional runtime dependencies.</p>
<p>Retrolambda uses a <a href="http://docs.oracle.com/javase/7/docs/api/java/lang/instrument/package-summary.html">Java agent</a> to find out what bytecode <code>LambdaMetafactory</code> generates dynamically, and saves it as class files, after which it replaces the <code>invokedynamic</code> instructions to instantiate those classes directly. It also changes some private synthetic methods to be package-private, so that normal bytecode can access them without <a href="http://docs.oracle.com/javase/7/docs/api/java/lang/invoke/MethodHandle.html">method handles</a>.</p>
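<p>To illustrate the idea, here is a hand-written sketch of what the conversion amounts to. The class and method names below are illustrative, not Retrolambda's actual generated names:</p>

```java
// The kind of functional interface you would use on Java 5-7,
// since java.util.function does not exist there
interface Callback {
    void call(String message);
}

public class DesugarSketch {
    // Java 8 source: `Callback c = msg -> System.out.println("got " + msg);`
    // javac compiles the lambda body into a synthetic method roughly like
    // this one; Retrolambda relaxes its visibility from private to
    // package-private so the generated class can call it without method handles.
    static void lambda$demo$0(String msg) {
        System.out.println("got " + msg);
    }

    // Stand-in for the class that LambdaMetafactory would generate at runtime
    // and that Retrolambda captures and saves as a .class file; the
    // invokedynamic instruction is replaced with a direct instantiation of it.
    static class Lambda1 implements Callback {
        @Override
        public void call(String message) {
            lambda$demo$0(message);
        }
    }

    public static void main(String[] args) {
        Callback c = new Lambda1(); // what the converted call site does
        c.call("hello");            // prints "got hello"
    }
}
```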
<p>After the conversion you'll have just a bunch of normal .class files - but with less typing.</p>
<p>P.S. If you hear about experiences of using Retrolambda for Android development, please leave a comment.</p>
<h1>Refactoring Primitive Obsession</h1>
<p><i>2013-03-02</i></p>
<p><em>Primitive Obsession</em> means using a programming language's generic type instead of an application-specific domain object. Some examples are using an integer for an ID, a string for an address, a list for an address book etc. Others have explained <a href="http://www.jamesshore.com/Blog/PrimitiveObsession.html"><em>why</em> to fix it</a> - this article is about <em>how</em> to fix it.</p>
<p>You can see an example of refactoring Primitive Obsession in James Shore's <a href="http://www.jamesshore.com/Blog/Lets-Play">Let's Play TDD</a> episodes 13-18. For a quick overview, you may watch <a href="http://www.jamesshore.com/Blog/Lets-Play/Episode-14.html">episode #14</a> at 10-12 min and <a href="http://www.jamesshore.com/Blog/Lets-Play/Episode-15.html">episode #15</a> at 0-3 min, to see him plugging in the <code>TaxRate</code> class.</p>
<p>The sooner the Primitive Obsession is fixed, the easier it is. In the above videos it takes just a couple of minutes to plug in the <code>TaxRate</code> class, but the <code>Dollars</code> class takes over half an hour. James does the code changes manually, without automated refactorings. For a big project with rampant Primitive Obsession it will easily take many hours, even days, to fix the problem of a missing core domain type.</p>
<p>Here I'm presenting some tips on using fully automated refactorings to solve Primitive Obsession. I'm using IntelliJ IDEA's Java refactorings, but the ideas should, to some extent, be applicable also to IDEs with inferior refactoring support.</p>
<h2>The Example</h2>
<p>Let's assume that we have a project that uses lots of "thingies" which are saved in a database. The thingies each have an ID that at the moment is just an integer. To avoid the thingy IDs getting mixed with other kinds of IDs, we create the following value object:</p>
<pre><code>public final class ThingyId {
private final int id;
public ThingyId(int id) {
this.id = id;
}
public int toInt() {
return id;
}
@Override
public boolean equals(Object obj) {
if (!(obj instanceof ThingyId)) {
return false;
}
ThingyId that = (ThingyId) obj;
return this.id == that.id;
}
@Override
public int hashCode() {
return id;
}
@Override
public String toString() {
return getClass().getSimpleName() + "(" + id + ")";
}
}
</code></pre>
<p>Creating such a class is easy, but putting it to use is not so easy when the primitive ID is used in a couple of hundred places…</p>
<h2>Starting Small</h2>
<p>Refactoring Primitive Obsession is quite mechanical, but because it requires cascading changes, it's very easy to mess things up. So it's best to start small and proceed in small steps.</p>
<p>It makes sense to start from a central place from where the change can be propagated to the whole application. For example by starting to use <code>ThingyId</code> inside this one class, without changing its public interface:</p>
<video controls="controls" height="248" width="600">
<source src="http://dev.solita.fi/img/refactoring-primitive-obsession/start.webm" type="video/webm">
<source src="http://dev.solita.fi/img/refactoring-primitive-obsession/start.mp4" type="video/mp4">
</video>
<p>(Cannot see the video? <a href="http://dev.solita.fi/img/refactoring-primitive-obsession/start.gif">Watch it as GIF</a>)</p>
<p>This example refactoring had to be done manually, because the field was mutable (we must update all reads and writes to the field in one step), but the following refactorings can be done with the help of automatic refactorings.</p>
<h2>Pushing Arguments Out</h2>
<p>When there is a method which wraps one of its arguments into <code>ThingyId</code>, we can propagate it by pushing the act of wrapping outside the method. In IntelliJ IDEA this can be done with the <em>Extract Parameter</em> (Ctrl+Alt+P) refactoring:</p>
<video controls="controls" height="416" width="600">
<source src="http://dev.solita.fi/img/refactoring-primitive-obsession/push-args-out.webm" type="video/webm">
<source src="http://dev.solita.fi/img/refactoring-primitive-obsession/push-args-out.mp4" type="video/mp4">
</video>
<p>(Cannot see the video? <a href="http://dev.solita.fi/img/refactoring-primitive-obsession/push-args-out.gif">Watch it as GIF</a>)</p>
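<p>As a sketch of what <em>Extract Parameter</em> does here (the <code>describe*</code> methods are hypothetical examples, and <code>ThingyId</code> is abbreviated from the class shown earlier):</p>

```java
// The value object from the article, abbreviated for this sketch
final class ThingyId {
    private final int id;
    ThingyId(int id) { this.id = id; }
    int toInt() { return id; }
    @Override public String toString() { return "ThingyId(" + id + ")"; }
}

public class PushArgsOut {
    // Before: the method receives a primitive and wraps it internally
    static String describeBefore(int thingyId) {
        ThingyId id = new ThingyId(thingyId);
        return "found " + id;
    }

    // After Extract Parameter on the `new ThingyId(thingyId)` expression:
    // the method now receives the domain type, and the act of wrapping
    // has been pushed out into every call site
    static String describeAfter(ThingyId id) {
        return "found " + id;
    }

    public static void main(String[] args) {
        System.out.println(describeBefore(42));
        System.out.println(describeAfter(new ThingyId(42))); // callers wrap now
    }
}
```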
<h2>Pushing Return Values</h2>
<p>When there is a method which unwraps its return value from <code>ThingyId</code> to <code>int</code>, we can propagate the unwrapping outside the method. There is no built-in refactoring for that, but it can be accomplished by combining <em>Extract Method</em> (Ctrl+Alt+M) and <em>Inline</em> (Ctrl+Alt+N).</p>
<p>First extract a method that does the same as the old method, but does not unwrap <code>ThingyId</code>. Then inline the original method and rename the new method to be the same as the original method.</p>
<video controls="controls" height="400" width="600">
<source src="http://dev.solita.fi/img/refactoring-primitive-obsession/push-retval.webm" type="video/webm">
<source src="http://dev.solita.fi/img/refactoring-primitive-obsession/push-retval.mp4" type="video/mp4">
</video>
<p>(Cannot see the video? <a href="http://dev.solita.fi/img/refactoring-primitive-obsession/push-retval.gif">Watch it as GIF</a>)</p>
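<p>A sketch of the extract-and-inline sequence, with hypothetical method names and an abbreviated <code>ThingyId</code>:</p>

```java
// The value object from the article, abbreviated for this sketch
final class ThingyId {
    private final int id;
    ThingyId(int id) { this.id = id; }
    int toInt() { return id; }
}

public class PushRetval {
    // Before: the method unwraps its return value, so callers only see int
    static int currentThingyIdAsInt() {
        return loadCurrentId().toInt();
    }

    // Step 1, Extract Method: identical logic, but without the unwrapping
    static ThingyId currentThingyId() {
        return loadCurrentId();
    }
    // Step 2, Inline the old method and rename the new one: every former
    // call of currentThingyIdAsInt() becomes currentThingyId().toInt(),
    // moving the unwrapping one step toward the edges of the application

    static ThingyId loadCurrentId() {
        return new ThingyId(42); // stand-in for e.g. a database lookup
    }

    public static void main(String[] args) {
        System.out.println(currentThingyId().toInt()); // prints "42"
    }
}
```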
<h2>Pushing Return Values of Interface Methods</h2>
<p>A variation of the previous refactoring is required when the method is part of an interface. IntelliJ IDEA 12 does not support inlining abstract methods (I would like it to ask which of the implementations to inline), but since IDEA can refactor code that doesn't compile, we can copy and paste the implementation into the interface and then inline it:</p>
<video controls="controls" height="416" width="600">
<source src="http://dev.solita.fi/img/refactoring-primitive-obsession/push-retval-interface.webm" type="video/webm">
<source src="http://dev.solita.fi/img/refactoring-primitive-obsession/push-retval-interface.mp4" type="video/mp4">
</video>
<p>(Cannot see the video? <a href="http://dev.solita.fi/img/refactoring-primitive-obsession/push-retval-interface.gif">Watch it as GIF</a>)</p>
<h2>Pushing Arguments In</h2>
<p>Instead of trying to refactor a method's arguments from the method caller's side, it's better to go inside the method and use <em>Extract Parameter</em> (Ctrl+Alt+P) as described earlier. Likewise for a method's return value. This leaves us with some redundant code, as can be seen in this example. We'll handle that next.</p>
<video controls="controls" height="232" width="600">
<source src="http://dev.solita.fi/img/refactoring-primitive-obsession/push-args-in.webm" type="video/webm">
<source src="http://dev.solita.fi/img/refactoring-primitive-obsession/push-args-in.mp4" type="video/mp4">
</video>
<p>(Cannot see the video? <a href="http://dev.solita.fi/img/refactoring-primitive-obsession/push-args-in.gif">Watch it as GIF</a>)</p>
<h2>Removing Redundancy</h2>
<p>By following the above tips you will probably end up with some redundant wrapping and unwrapping such as <code>new ThingyId(thingyId.toInt())</code> which is the same as <code>thingyId</code>. Changing one such thing manually would be easy, but the problem is that there are potentially tens or hundreds of places to change. In IntelliJ IDEA those can be fixed with one command: <em>Replace Structurally</em> (Ctrl+Shift+M).</p>
<p>In the following example we use the search template "<code>new ThingyId($x$.toInt())</code>" and replacement template "<code>$x$</code>". For extra type safety, the <code>$x$</code> variable can be defined (under the <em>Edit Variables</em> menu) to be an expression of type <code>ThingyId</code>.</p>
<video controls="controls" height="392" width="600">
<source src="http://dev.solita.fi/img/refactoring-primitive-obsession/redundancy.webm" type="video/webm">
<source src="http://dev.solita.fi/img/refactoring-primitive-obsession/redundancy.mp4" type="video/mp4">
</video>
<p>(Cannot see the video? <a href="http://dev.solita.fi/img/refactoring-primitive-obsession/redundancy.gif">Watch it as GIF</a>)</p>
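<p>A minimal sketch of the redundant pattern and what the structural replacement leaves behind (<code>ThingyId</code> abbreviated):</p>

```java
// The value object from the article, abbreviated for this sketch
final class ThingyId {
    private final int id;
    ThingyId(int id) { this.id = id; }
    int toInt() { return id; }
    @Override public boolean equals(Object obj) {
        return obj instanceof ThingyId && ((ThingyId) obj).id == this.id;
    }
    @Override public int hashCode() { return id; }
}

public class RemoveRedundancy {
    public static void main(String[] args) {
        ThingyId id = new ThingyId(7);

        // Redundant round-trip left behind by the earlier refactorings:
        ThingyId before = new ThingyId(id.toInt());

        // After Replace Structurally with search template
        // "new ThingyId($x$.toInt())" and replacement template "$x$":
        ThingyId after = id;

        System.out.println(before.equals(after)); // prints "true"
    }
}
```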
<h2>Updating Constants</h2>
<p>When there are constants (or final fields) of the old type, as is common in tests, those can be updated by extracting a new constant of the new <code>ThingyId</code> type, redefining the old constant to be an unwrapping of the new constant, and finally inlining the old constant:</p>
<video controls="controls" height="216" width="600">
<source src="http://dev.solita.fi/img/refactoring-primitive-obsession/constants.webm" type="video/webm">
<source src="http://dev.solita.fi/img/refactoring-primitive-obsession/constants.mp4" type="video/mp4">
</video>
<p>(Cannot see the video? <a href="http://dev.solita.fi/img/refactoring-primitive-obsession/constants.gif">Watch it as GIF</a>)</p>
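<p>The same steps as a code sketch, with hypothetical constant names and an abbreviated <code>ThingyId</code>:</p>

```java
// The value object from the article, abbreviated for this sketch
final class ThingyId {
    private final int id;
    ThingyId(int id) { this.id = id; }
    int toInt() { return id; }
}

public class Constants {
    // Step 1: extract a constant of the new type next to the old one
    static final ThingyId DEFAULT_THINGY_ID = new ThingyId(42);

    // Step 2: redefine the old constant as an unwrapping of the new one...
    static final int DEFAULT_THINGY = DEFAULT_THINGY_ID.toInt();
    // Step 3: ...then Inline it, so each remaining use of DEFAULT_THINGY
    // becomes DEFAULT_THINGY_ID.toInt(), ready to be pushed further

    public static void main(String[] args) {
        System.out.println(DEFAULT_THINGY); // prints "42"
    }
}
```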
<h2>Finding the Loose Ends</h2>
<p>The aforementioned refactorings must be repeated many times until the whole codebase has been migrated. To find out what refactoring to do next, search for the usages of the new type's constructor and its unwrapping method (e.g. <code>ThingyId.toInt()</code>). Use an appropriate refactoring to push that usage one step further. Repeat until all the usages are at the edges of the application (e.g. saving <code>ThingyId</code> to database) and cannot be pushed any further.</p>
<p>And as always, run all your tests after every step. If the tests fail and you cannot fix them within one minute, you're about to enter <a href="http://c2.com/cgi/wiki?RefactoringHell">Refactoring Hell</a>, and the fastest way out is to revert your changes to the last point where all tests passed. Reverting is easiest with IntelliJ IDEA's <a href="http://www.jetbrains.com/idea/features/local_history.html">Local History</a>, which shows every time you ran your tests and whether they passed or failed, letting you revert all your files to that point. The other option is to commit frequently, after every successful change (preferably <a href="http://gitready.com/advanced/2009/02/10/squashing-commits-with-rebase.html">rebased before pushing</a>), and revert using <code>git reset --hard</code>.</p>
<hr>
<p><i>This article was originally <a href="http://dev.solita.fi/2013/03/01/refactoring-primitive-obsession.html">published at my company's blog</a>.</i></p>
<h1>Faster JUnit Tests with Jumi Test Runner and Class Loader Caching</h1>
<p><i>2013-02-12</i></p>
<p><a href="http://jumi.fi/">Jumi test runner</a> version 0.4 adds JUnit backward compatibility, so that Jumi can run existing JUnit tests out of the box. All testing frameworks that can be run with JUnit may also be run on Jumi, so the barrier to entry is low.</p>
<p>One advantage for JUnit users in trying out Jumi is faster test execution. In this article I'm showing some benchmark results for the current (non-optimized) Jumi version, together with estimates of how much faster it will get once I implement some planned performance optimizations.</p>
<h2>Class Loader Overhead</h2>
<p><a href="http://incodewetrustinc.blogspot.fi/2010/01/run-your-junit-tests-concurrently-with.html">According to earlier experiences of parallelizing JUnit</a>, <cite>"for a fairly optimized unit-test set, expect little or no gain - maybe 15-20%."</cite> That's because of the overhead of class loading. With Java 7 the class loaders are at least <a href="http://docs.oracle.com/javase/7/docs/technotes/guides/lang/cl-mt.html">parallel capable</a>, but I'm still expecting it to affect the test run times considerably.</p>
<p>That's why in this benchmark I'm using a project that has exactly such CPU-bound unit tests. It'll be much more interesting than looking at slow or IO-bound <a href="http://www.jbrains.ca/permalink/integrated-tests-are-a-scam-part-1">integration tests</a>, which scale much more easily to multiple CPU cores. ;)</p>
<h2>Benchmark Setup</h2>
<p>As a test subject I'm using the unit tests from <a href="http://dimdwarf.sourceforge.net/">Dimdwarf</a>'s core module (mixed Java and Scala). It has over 800 unit tests; they are all CPU-bound and take just a few seconds to run. Over half of the tests have been written using <a href="http://jdave.org/">JDave</a>, a Java testing framework that runs on JUnit. The rest have been written using <a href="http://specsy.org/">Specsy</a>, a testing framework for Java, Scala, Groovy and, with little effort, any other JVM-based language. Specsy 1 ran on JUnit, but Specsy 2 runs on Jumi, which allows test method level parallelism and solves a bunch of issues Specsy had with JUnit's limited execution model. Running JUnit tests on Jumi is limited to test class level parallelism (until JUnit itself implements Jumi support).</p>
<p>All measurements were run on Core 2 Quad Q6600 @ 3.0 GHz, 4 GB RAM, jdk1.7.0_07 64bit, Windows 7. The measurements were repeated 11 times and the median run times are reported. The program versions used were: Jumi 0.4.317, JUnit 4.8.2, IntelliJ IDEA 12.0.3 UE, Maven Surefire Plugin 2.13.</p>
<p>For those measurements which were started from IntelliJ IDEA, the Java compiler and code coverage were disabled to avoid their latency and overhead. The measurement was started when IDEA's Run Tests button was clicked, and stopped when IDEA showed all tests finished. The time was measured at 1/30-second accuracy using a screen recorder (which recorded just a small screen area, so its effect on CPU usage was barely noticeable).</p>
<p>For those measurements which were started from Maven, the time was measured starting from when the text "maven-surefire-plugin:2.13:test" shows up, until the "Results" line shows up. The time was measured using a screen recorder, same as above.</p>
<p>For the synthetic Jumi benchmarks which estimate future optimizations, the time was measured using <code>System.currentTimeMillis()</code> calls inside the Jumi test runner daemon process. Also <a href="https://github.com/orfjackal/jumi/commit/9c8e64f2995031179216171e49b2eb40d65c1c42#diff-2">Jumi was modified</a> to run the test suite inside the same JVM multiple times (because I haven't yet implemented connecting to an existing test runner daemon process). It's safe to assume that the inter-process communication latency is much smaller than 100 ms, so the results should be fairly accurate.</p>
<h2>The Results</h2>
<p>Click to see them bigger.</p>
<p style="text-align: center">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhoj0O_QrAcMuBGQmXfqMuUbl8Dy7JIPeJ5w7pU3FwrAvbntvX3Uga9Ehn-eJ2pXl-_khR1dsg_llZFRNn2xbw42XQRSorVazYQ4StWpexUch-nQ-34PACgQy83z-r-Bdo-NNB_JkNhH3E/s1600/chart.png" imageanchor="1"><img border="0" height="226" width="400" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhoj0O_QrAcMuBGQmXfqMuUbl8Dy7JIPeJ5w7pU3FwrAvbntvX3Uga9Ehn-eJ2pXl-_khR1dsg_llZFRNn2xbw42XQRSorVazYQ4StWpexUch-nQ-34PACgQy83z-r-Bdo-NNB_JkNhH3E/s400/chart.png" /></a></p>
<h3>Benchmark 1: JUnit, IDEA-JUnit, 1 thread</h3>
<p>As a baseline the tests were run with the JUnit test runner from within IDEA. JUnit doesn't support parallel execution, so this test run was single threaded. The result was 5.8 seconds.</p>
<h3>Benchmark 2: JUnit, Maven Surefire, 1 thread</h3>
<p>As another baseline the Maven Surefire Plugin was used, which gave 5.0 seconds. Surprisingly, Maven was much faster than IDEA's JUnit integration. This is probably due to the initialization that IDEA does at the beginning of a test run, presumably to discover the test classes and create a list of the tests to be run, or else due to its real-time test reporting.</p>
<p>Though <a href="http://maven.apache.org/surefire/maven-surefire-plugin/examples/junit.html">Surefire supports running tests in parallel</a>, I wasn't able to make it work - it threw a NullPointerException inside the Surefire plugin, probably due to an incompatibility with the Specsy 1 testing framework. Since JUnit was not originally designed to run tests in parallel, and since JUnit is very relaxed about what kinds of events it accepts from testing frameworks, these kinds of incompatibilities are to be expected.</p>
<h3>Benchmark 3: Jumi, IDEA-JUnit, 1 thread</h3>
<p>This was run with the Jumi 0.4 test runner, but since it doesn't yet have IDE integration, the test run was <a href="https://github.com/orfjackal/jumi/wiki/Running-Tests">bootstrapped from a JUnit test</a> which was started using IDEA. So it has the overhead of one extra JVM startup and IDEA's test initialization.</p>
<p>The result was one second slower than running JUnit tests directly with IDEA. Based on my experiences, a bit over half a second of that is due to the time it takes to start a second JVM process for the Jumi daemon. The rest is probably due to class loading on the Jumi launcher side - for example, at the moment it uses Netty for communication, which is quite a big library (about 0.5 MB of class files), so loading its classes takes hundreds of milliseconds.</p>
<h3>Benchmark 4: Jumi, IDEA-JUnit, 4 threads</h3>
<p>As expected, running on multiple threads does not make unit tests much faster. Adding threads takes Jumi from 6.8 seconds to 5.1 seconds, only 25% less time, barely cancelling out the overhead of the extra JVM creation.</p>
<p>As we'll see, the majority of the time is spent in class loading. Also this Jumi version has not yet been optimized at all, so the class loading overhead is probably even more severe than it needs to be (one idea I have is to run in parallel threads tests that use a different subset of classes - that way they shouldn't be blocked on loading the same classes that much).</p>
<h3>Benchmark 5: Jumi, IDE integration (estimated), 1 thread</h3>
<p>This benchmark measures how fast Jumi would be with IDE integration. This was measured around the code that launches Jumi and shows the test results, as it would be done by an IDE. Since this code would run in the same process as the Java IDE, the measurement was done in a loop inside one JVM instance, to eliminate the overhead of class loading on the Jumi launcher side and to let the JIT compiler warm up.</p>
<p>This gives about the same results as running JUnit tests directly in IDEA. It's not quite as fast as Surefire, maybe because Surefire's test reporting is more minimal, or because Jumi's daemon side has more class loading overhead for its own internal classes (as mentioned above when discussing benchmark 3).</p>
<h3>Benchmark 6: Jumi, IDE integration (estimated), 4 threads</h3>
<p>The absolute speedup over 1 thread is about the same as before (under 2 seconds), but with the JVM startup overhead away we are now on the winning side.</p>
<h3>Benchmark 7: Jumi, persistent daemon process (estimated), 1 thread</h3>
<p>Jumi's daemon process, which runs the tests, will eventually be reused for multiple test suite runs. The benefits of that feature were estimated <a href="https://github.com/orfjackal/jumi/commit/9c8e64f2995031179216171e49b2eb40d65c1c42#diff-2">by modifying Jumi</a> to run the same tests multiple times in the same JVM.</p>
<p>This avoids the JVM startup overhead completely on subsequent test suite runs, and the JIT compiler starts kicking in for Jumi's internal classes (the first suite run is about 1 second slower). At 4.5 seconds we are now better than JUnit also in single-threaded performance.</p>
<h3>Benchmark 8: Jumi, persistent daemon process (estimated), 4 threads</h3>
<p>With 4 threads we see the same roughly 2-second speedup over a single thread as before. The tests are still dominated by the same amount of class loader overhead as before, but we are nevertheless down by 50%, i.e. 2× faster than JUnit.</p>
<h3>Benchmark 9: Jumi, class loader caching of dependencies (estimated), 1 thread</h3>
<p>For this benchmark <a href="https://github.com/orfjackal/jumi/commit/3051283dd7a8c754045b04a4bfb957b3cf608034">Jumi was modified further</a> to reuse the class loader that loads all libraries used by the system under test. In this particular project this includes big libraries such as the Scala standard library, Google Guice, Apache MINA and CGLIB. The project's own production and test classes, as well as some testing libraries (that didn't work well with multiple class loaders), were not cached, but their class loader was re-created for each test suite run.</p>
<p>We see a 40% improvement over just the persistent daemon process, the same speedup as if we had used multiple threads. The JIT compiler starts now kicking in, so that the full speed is reached only on the third or fourth test suite run (1st run 5.2s, 2nd run 3.1s, 3rd run 3.0s, 4th run 2.7s) after it has had some time to optimize the library dependencies' code.</p>
<h3>Benchmark 10: Jumi, class loader caching of dependencies (estimated), 4 threads</h3>
<p>With 4 threads we are down to 1.4 seconds, an improvement of 75%, already 4× faster than JUnit!</p>
<h3>Benchmark 11: Jumi, class loader caching of all classes (estimated), 1 thread</h3>
<p>This is an estimate of the ideal situation of having no class loader overhead. In this benchmark <a href="https://github.com/orfjackal/jumi/commit/b9e3015827bd38980e86f07cf1516a31ff3cb294">Jumi was modified</a> to create just a single class loader for both the dependencies and application classes, and then reuse that for all test suite runs. After the first run there is no more class loading to be done, and on the third or fourth run the JIT compiler has optimized it.</p>
<p>We see that under ideal circumstances it takes only 1.6 seconds to run all the tests on a single thread. It's arguable whether the class loading overhead can be eliminated this much with techniques such as <a href="http://zeroturnaround.com/jrebel/reloading_java_classes_401_hotswap_jrebel/">reloading classes</a> without hurting reliability.</p>
<h3>Benchmark 12: Jumi, class loader caching of all classes (estimated), 4 threads</h3>
<p>With 4 threads we get down to 0.9 seconds, which is about 50% less time than the single-threaded benchmark. The ideal speedup on 4 cores would have been 75%, down to 0.4 seconds. This shows us that there is some contention that needs to be optimized inside Jumi. One possible area of improvement is the message queues - now all test threads write to the same queue, which violates the <a href="http://mechanical-sympathy.blogspot.fi/2011/09/single-writer-principle.html">single writer principle</a>, and also it doesn't take advantage of <a href="http://mechanical-sympathy.blogspot.fi/2011/10/smart-batching.html">batching</a>.</p>
<h2>Conclusions</h2>
<p>In this benchmark we only looked at speeding up a fast unit test suite, which are notoriously hard to speed up on the JVM. Slow integration tests should get much better speed improvements when run on multiple CPU cores. In the <a href="https://github.com/orfjackal/jumi/wiki">Jumi wiki</a> there are some <a href="https://github.com/orfjackal/jumi/wiki/Parallelism-Tips">tips on how to make integrations tests isolated</a>, so that they can be run in parallel.</p>
<p>Jumi's JUnit compatibility gives the ability to run JUnit tests in parallel at the test class level. For test method level parallelism the testing frameworks must implement the <a href="http://jumi.fi/api/jumi-api/fi/jumi/api/drivers/Driver.html">Jumi Driver API</a>. Right now you can get that full parallelism with <a href="http://specsy.org/">Specsy</a> for all JVM-based languages (at the moment Scala/Groovy/Java, but creating a wrapper for a new language is simple), and hopefully other testing frameworks will follow suit.</p>
<p>Once somebody* implements IDE integration for Jumi, its single-threaded speed will be on par with older test runners, and its native parallel test execution will push it ahead. In the near future, when Jumi implements the persistent daemon process, class loader caching, test order prioritization (run the most-likely-to-fail tests first, similar to JUnit Max) and other optimizations, it will be the fastest test runner ever. :)</p>
<p><small>* I'll need to rely on the expertise of others who have IDE plugin development experience. Three big IDEs is too much for one small developer... :(</small></p>
<p>You can get started with <a href="http://jumi.fi/">Jumi</a> through its <a href="https://github.com/orfjackal/jumi/wiki">documentation</a>. You're welcome to ask questions on the <a href="https://groups.google.com/d/forum/jumi-test-runner">Jumi mailing list</a> - please come tell us <a href="https://github.com/orfjackal/jumi/blob/master/ROADMAP.txt">which features</a> you would like to see implemented first. Testing framework, IDE and build tool developers are especially encouraged to get in touch, to tell about their needs and to develop tool integration.</p>
<h1>New Testing Tools: Jumi 0.2 &amp; Specsy 2</h1>
<p><i>2012-12-26</i></p>
<p>New versions of two of my projects have just been released. They are both testing tools for Java and other JVM-based languages.</p>
<h2>Jumi 0.2</h2>
<p><a href="http://jumi.fi/">Jumi</a> is a new test runner for the JVM, which overcomes the JUnit test runner's limitations to better support all testing frameworks, and offers better performance and usability to end-users.</p>
<p>This release is ready for early adopters to start using Jumi. It adds support for running multiple test classes (though they must be listed manually). The next releases will add automatic test discovery (so you can identify all tests with e.g. "*Test") and JUnit backward compatibility (so that Jumi will also run JUnit tests).</p>
<p>The <a href="http://specsy.org/">Specsy</a> testing framework has been upgraded to run using Jumi. We hope that more testing frameworks will implement Jumi support in near future - please come to the <a href="https://groups.google.com/d/forum/jumi-test-runner">Jumi mailing list</a> if you're a tool vendor interested in implementing Jumi support.</p>
<h2>Specsy 2</h2>
<p>The <a href="http://specsy.org/">Specsy</a> testing framework now supports more languages than ever (Specsy 1 was Scala-only). For now it supports Scala (2.7.7 and higher), Groovy (any version) and Java (7 or higher; lambdas strongly recommended), but it's only a matter of adding one wrapper class to support a new JVM-based language.</p>
<p>Specsy 2 runs on the new <a href="http://jumi.fi/">Jumi</a> test runner, fixing a bunch of issues that Specsy 1.x had with the JUnit test runner's limited expressiveness. Specsy 2 was actually released back in September, but Jumi wasn't ready for general use then - now it is.</p>
<h1>Continuous Delivery with Maven and Go into Maven Central</h1>
<p><i>2012-08-09</i></p>
<p>In <a href="http://continuousdelivery.com/2010/10/continuous-delivery-the-value-proposition/">continuous delivery</a> the idea is to have the capability to release the software to production at the press of a button, as often as it makes sense from a business point of view, even on every commit (which would then be called <a href="http://continuousdelivery.com/2010/08/continuous-delivery-vs-continuous-deployment/">continuous deployment</a>). This means that every binary built could potentially be released to production in a matter of minutes or seconds, if the powers that be so wish.</p>
<p>Maven's ideology is opposed to continuous delivery, because with Maven it must be decided <em>before</em> building whether the next artifact will be a release or a snapshot version, whereas with continuous delivery it will be known only long <em>after</em> building the binaries (e.g. after it has passed all automated and manual testing) whether the binary is fit for release.</p>
<p>In this article I'll explain how I tamed Maven to support continuous delivery, and how to do continuous delivery into <a href="http://search.maven.org/">the Maven Central repository</a>. For continuous integration and release management I'm using <a href="http://www.go.cd/">Go</a> (not to be confused with <a href="http://golang.org/">Go</a>, <a href="http://en.wikipedia.org/wiki/Go_(game)">go</a> or other uses of <a href="http://dictionary.reference.com/browse/go">go</a>). <a href="http://git-scm.com/">Git</a>'s distributed nature also comes into use, when creating new commits and tags during each build.</p>
<p>The project in question is <a href="http://jumi.fi/">Jumi</a>, a new test runner for Java to replace JUnit's test runner. It's an open source library and framework which will be used by testing frameworks, build tools, IDEs and such, so the method of distribution is publishing it to Maven Central through <a href="https://docs.sonatype.org/display/Repository/Sonatype+OSS+Maven+Repository+Usage+Guide">Sonatype OSSRH</a>.</p>
<h2>Version numbering with Maven to support continuous delivery</h2>
<p><a href="http://maven.apache.org/plugins/maven-release-plugin/">Maven Release Plugin</a> is diametrically opposed to continuous delivery, as is the use of snapshot version numbers in Maven, so I defined the project's <a href="https://github.com/orfjackal/jumi/wiki/Version-Numbering">version numbering scheme</a> to use the <code>MAJOR.MINOR.BUILD</code> format. In the source repository the version numbers are snapshot versions, but before building the binaries the CI server will use the following script to read the current version from pom.xml and append the build number to it. For example, the latest (non-published) build is build number 134 and the version number in pom.xml is <code>0.1-SNAPSHOT</code>, so the resulting release version number will be <code>0.1.134</code>.</p>
<script src="https://gist.github.com/3288542.js?file=get-release-version.rb"></script>
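<p>The script above does this in Ruby; stripped down to a hedged shell sketch, the logic is roughly the following. (The <code>pom.xml</code> contents and the hardcoded build number here are purely illustrative; <code>GO_PIPELINE_COUNTER</code> is one of the environment variables that Go sets for build scripts.)</p>

```shell
# Create a minimal pom.xml just for this demonstration.
cat > pom.xml <<'EOF'
<project>
  <version>0.1-SNAPSHOT</version>
</project>
EOF

# In a real build the Go server sets this; hardcoded here for the demo.
GO_PIPELINE_COUNTER=134

# Take the first <version> element and strip the -SNAPSHOT suffix...
POM_VERSION=$(sed -n 's|.*<version>\(.*\)-SNAPSHOT</version>.*|\1|p' pom.xml | head -n 1)

# ...then append the build number to get the release version.
RELEASE_VERSION="$POM_VERSION.$GO_PIPELINE_COUNTER"
echo "$RELEASE_VERSION"    # prints 0.1.134
```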
<p>In the project's <a href="https://gist.github.com/orfjackal/3288542#file-build-release-sh">build script</a> I'm then using the <a href="http://mojo.codehaus.org/versions-maven-plugin/">Versions Maven Plugin</a> to change the version numbers in all modules of this multi-module Maven project. The changed POM files are <em>not</em> checked in, because in continuous delivery each binary should anyway be built only once. To make the sources traceable I'm tagging every build, but I publish those tags only after releasing the build - more about that later.</p>
<p><small>P.S. <a href="http://mojo.codehaus.org/versions-maven-plugin/set-mojo.html">versions:set</a> requires the root aggregate POM to extend the parent POM, or else you will need to run it separately for both the root and the parent POMs by giving Maven the <code>--file</code> option. Also, to keep all plugin version numbers in the parent POM's <code>pluginManagement</code> section, the root should extend the parent, or else you will need to define the plugin version numbers on the command line and can't use the shorthand commands for running Maven plugins (unless you're OK with Maven deciding itself which version to use, which can produce unrepeatable builds).</small></p>
<h2>Managing the deployment pipeline with Go</h2>
<p>Before going into further topics, I'll need to explain a bit about Go's artifact management and give an overview of Jumi's deployment pipeline.</p>
<p><a href="http://www.go.cd/">Go</a> is more than just a continuous integration server - it is designed for release and deployment management. (I find that Go fits that purpose better than for example Jenkins.) In Go the builds are organized into a deployment pipeline, as described in <a href="http://www.amazon.com/Continuous-Delivery-Deployment-Automation-Addison-Wesley/dp/0321601912">the Continuous Delivery book</a>. All pipeline runs are logged forever* and configuration changes are version controlled (Go internally uses Git), so it also provides full auditability, especially when it's used for deployment. <s><a href="http://www.thoughtworks-studios.com/go-agile-release-management/compare">The commercial version</a> has additional deployment environment and access permission features, but the free version has been adequate for me for now (though being an open source project I could get the enterprise edition for free).</s> <s><i>Update 2012-09-20:</i> Since Go 12.3 the free version has the same features as the commercial version - only the maximum numbers of users and remote agents differ.</s> <i>Update 2014-02-25:</i> Go is now open source and fully free!</p>
<p><small>* For experimenting I recommend cloning a pipeline with a temporary name, to keep the official pipeline's history from being filled with experimental runs. You can't remove things from the history (except by removing or hacking the database), but you can hide them by deleting the pipeline and not reusing the pipeline's name (if you create a new pipeline with the same name then the old history will become visible). Only build artifacts of old builds can be removed, and there's actually a feature for that, to keep the disk from getting too full.</small></p>
<p><a href="http://www.thoughtworks-studios.com/docs/go/12.2/help/concepts_in_go.html">Go's central concepts</a> are pipelines, stages and jobs. Each pipeline consists of one or more sequentially executed stages, and each stage consists of one or more jobs (which can be executed in parallel if you have multiple build agents). A pipeline can depend on the stages of other pipelines and a stage can be triggered automatically or manually with a button press, letting you manage complex builds by chaining multiple pipelines together.</p>
<p>A pipeline may even depend on multiple pipelines, for example if the system is composed of multiple separately built applications, in which case one pipeline could be used to select which versions to deploy together. Then further downstream pipelines or stages can be used to deploy the selected versions together into development, testing and finally the production environment.</p>
<p>You can <a href="http://www.thoughtworks-studios.com/docs/go/12.2/help/managing_artifacts_and_reports.html">save artifacts</a> produced by a job on the Go server and then access those artifacts in downstream jobs by <a href="http://www.thoughtworks-studios.com/docs/go/12.2/help/managing_dependencies.html">fetching the artifacts</a> into the working directory before running your scripts. Go uses environment variables to tell the build scripts about the build number, source control revision, identifiers of previous stages, custom parameters etc.</p>
<p>Here you can see two dependent pipelines from Jumi's deployment pipeline. Clicking the button in <code>jumi-publish</code> would trigger that pipeline using the artifacts from this particular build of the <code>jumi</code> pipeline. I can trigger the downstream pipeline using any previous build - it doesn't have to be the latest build.</p>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhN_oNWGLvgr2VewKbXBN286q7e8QlAf0-Fb9gokKeyHHdPlfG1YhKWi9i0jlSs7syfBH_QQ-lHHZ1ufd5wjMOru9IqJuQaa_0SZsz_r3BtFxh5PX2dQX9kPNPQ5O8hHdjUA0T34TwA1zs/s1600/pipeline-dependencies.png" imageanchor="1"><img border="0" style="max-width: 95%; display: block; margin-left: auto; margin-right: auto;" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhN_oNWGLvgr2VewKbXBN286q7e8QlAf0-Fb9gokKeyHHdPlfG1YhKWi9i0jlSs7syfBH_QQ-lHHZ1ufd5wjMOru9IqJuQaa_0SZsz_r3BtFxh5PX2dQX9kPNPQ5O8hHdjUA0T34TwA1zs/s1600/pipeline-dependencies.png" /></a>
<p><a href="http://www.thoughtworks-studios.com/docs/go/12.2/help/managing_pipelines.html">Running a shell command</a> in Go requires more configuration than in Jenkins (which has a single text area for inputting multiple commands), but this has the positive side effect of driving you to store all build scripts in version control. I have <a href="https://github.com/orfjackal/jumi/tree/8401c1ae319996d7d9af7f5132c9c15d91f7a38d/scripts">one shell script for each Go job</a>, each of which in turn calls a bunch of Ruby scripts and Maven commands.</p>
<p>Below is a diagram showing Jumi's deployment pipeline at the time of writing. The names of <b>pipelines</b> are in bold, <u>stages</u> are underlined, and jobs are links to their respective shell scripts. For more details see <a href="https://gist.github.com/orfjackal/3288542#file-pipelines-xml">the pipeline configuration</a>.</p>
<pre>
Git
|
|
|--> <b>jumi</b> (polling automatically)
<u>build</u> --> <u>analyze</u>
<a href="https://gist.github.com/orfjackal/3288542#file-build-release-sh">build-release</a> <a href="https://gist.github.com/orfjackal/3288542#file-coverage-report-sh">coverage-report</a>
|
|
|--> <b>jumi-publish</b> (manually triggered)
<u>pre-check</u> --> <u>ossrh</u> --> <u>github</u>
<a href="https://gist.github.com/orfjackal/3288542#file-check-release-notes-sh">check-release-notes</a> <a href="https://gist.github.com/orfjackal/3288542#file-promote-staging-sh">promote-staging</a> <a href="https://gist.github.com/orfjackal/3288542#file-push-staging-sh">push-staging</a>
</pre>
<p>The <code>jumi/build</code> stage builds the Maven artifacts and saves them on the Go server for use in later stages. It also tags the release and updates the release notes; those commits and tags are not yet published, but they are saved on the Go server.</p>
<p>The <code>jumi/analyze</code> stage runs <a href="http://pitest.org/">PIT mutation testing</a> and produces line coverage and mutation coverage reports. They can be viewed in Go on their own tab.</p>
<p>The <code>jumi-publish/pre-check</code> stage makes sure that release notes for the release have been filled in (no "TBD" line items), or else it will fail the pipeline and prevent the release.</p>
<p>The <code>jumi-publish/ossrh</code> stage uploads the build artifacts from Go into OSSRH. It doesn't yet run the last Nexus command for promoting the artifacts from the OSSRH staging repository into Maven Central (I need to log in to OSSRH and click a button), because I haven't yet written smoke tests which would make sure that all artifacts were uploaded correctly.</p>
<p>The <code>jumi-publish/github</code> stage pushes to the official Git repository the tags and commits which were created in <code>jumi/build</code>. It will merge automatically if somebody has pushed commits there after this build was made.</p>
<p>Future plans for improving this pipeline include adding a new <code>jumi-integration</code> pipeline between <code>jumi</code> and <code>jumi-publish</code>. It will run <a href="http://martinfowler.com/articles/consumerDrivenContracts.html">consumer contract tests</a> of programs using Jumi, against multiple versions of those programs, to notice any backward incompatibility issues. This pipeline might eventually take hours to execute, in which case I may break it into multiple jobs and run them in parallel. I will also reuse a subset of those tests as smoke tests in the <code>jumi-publish</code> pipeline, after which I can automate the final step of promoting from OSSRH to Maven Central.</p>
<h2>Staging repository for Maven artifacts in Go</h2>
<p>Creating publishable Maven artifacts happens with the <a href="http://maven.apache.org/plugins/maven-deploy-plugin/">Maven Deploy Plugin</a>, and the location of the Maven repository can be configured with <a href="http://maven.apache.org/plugins/maven-deploy-plugin/deploy-mojo.html#altDeploymentRepository">altDeploymentRepository</a>. I'm using <code style="white-space: nowrap;">-DaltDeploymentRepository="staging::default::file:staging"</code> to create a staging repository with only this build's artifacts, which I then save on the Go server. (The <code>file:staging</code> path means the same as <code>file://$PWD/staging</code> but also works on Windows.)</p>
<p>That staging repository can be accessed from the Go server over HTTP, so it would be quite simple to let beta users access it (optionally using HTTP basic authentication). For example, the URL to a build's staging repository could be <code>http://build-server/go/files/jumi/134/build/1/build-release/staging/</code>. Inside Go jobs, though, it's easiest to fetch the staging repository into the working directory, which avoids the need to configure Maven's HTTP authentication settings.</p>
<p>When it is decided that a build can be published, I trigger the <code>jumi-publish</code> pipeline, which uses the following script to upload the staging repository from a directory into a remote Maven repository. It uses <code>curl</code> to do HTTP PUT requests with HTTP basic authentication. (I wasn't able to find any documentation about the protocol of a Maven repository such as Nexus, but I was able to sniff it using Wireshark.)</p>
<script src="https://gist.github.com/3288542.js?file=upload-maven-repository.rb"></script>
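<p>Stripped to its essence, the script above does one HTTP PUT per file in the staging directory. Here is a hedged single-file sketch of that idea; the artifact name is made up, and a <code>file://</code> destination is used so the example is self-contained, whereas a real run would PUT to the Nexus/OSSRH staging URL and add basic authentication with <code>--user "$REPO_USER:$REPO_PASS"</code> (variable names invented here).</p>

```shell
# Create a dummy artifact to upload (demonstration only).
echo "dummy jar contents" > example-0.1.134.jar

# One artifact = one HTTP PUT with curl's -T/--upload-file. Against a real
# repository the URL would be https://oss.sonatype.org/... and basic auth
# would be added with --user; file:// is used here so this runs anywhere.
mkdir -p /tmp/staging-demo
curl -s -T example-0.1.134.jar "file:///tmp/staging-demo/example-0.1.134.jar"
```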
<p>In addition to the above uploading, <a href="https://gist.github.com/orfjackal/3288542#file-promote-staging-sh">the publish script</a> uses the <a href="http://www.sonatype.com/books/nexus-book/reference/staging-sect-managing-plugin.html">Nexus Maven Plugin</a> to close the OSSRH staging repository to which the artifacts were uploaded. It could also promote it to Maven Central, but I want to first create some smoke tests to make sure that all the necessary artifacts were uploaded. Until then I'll do a manual check before clicking the button in OSSRH to promote the artifacts to Maven Central.</p>
<p>Publishing to Maven Central puts some additional requirements on the artifacts. Since I'm not using Maven Release Plugin, I need to manually enable the <code>sonatype-oss-release</code> profile in <a href="http://repo1.maven.org/maven2/org/sonatype/oss/oss-parent/7/oss-parent-7.pom">Sonatype OSS Parent POM</a> to generate all the required artifacts and to sign them. If you don't need to publish artifacts to Maven Central, then you might not need to do this signing. But if you do, it's good to know that the <a href="http://maven.apache.org/plugins/maven-gpg-plugin/">Maven GPG Plugin</a> accepts as parameters the name and passphrase of the GPG key to use. They can be configured in the Go pipeline using secure environment variables which are automatically replaced with ******** in the console output. (For more security, don't save the passphrases on the Go server, but manually enter them when triggering the pipeline. Otherwise somebody with root access to the Go server could get the Go server's private key and decrypt the passphrase in Go's configuration files. Though using passphraseless SSH and GPG keys on the CI server is much simpler.)</p>
<h2>Tagging the release and updating release notes with Git</h2>
<p>When I do a release, I want it to be tagged and the version and date of the release added to release notes, which are <a href="https://github.com/orfjackal/jumi/blob/master/RELEASE-NOTES.md">in a text file</a> in the root of the project. In order to get the release notes included in the tagged revision and for the tag to be on the revision which was built, that commit needs to be done before building (an additional benefit is that all GPG signing - the build artifacts and the tag - will be done in the build stage). Since I'm using Git, I can avoid the infinite loop, which would otherwise ensue from committing on every build, by pushing the commits only if the build is released.</p>
<p>The release notes for the next release can be read from the release notes file with a little regular expression (<a href="https://gist.github.com/orfjackal/3288542#file-get-release-notes-rb">get-release-notes.rb</a>). Writing the version number and date of the release into release notes is also solvable using regular expressions (<a href="https://gist.github.com/orfjackal/3288542#file-prepare-release-notes-rb">prepare-release-notes.rb</a>), as is preparing for the next release iteration by adding a placeholder for the future release notes (<a href="https://gist.github.com/orfjackal/3288542#file-bump-release-notes-rb">bump-release-notes.rb</a>).
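<p>As a rough shell illustration of what prepare-release-notes.rb does - the file contents and the "Upcoming" heading below are invented for the example, not the real Jumi release-notes format:</p>

```shell
# A made-up release notes file with a placeholder heading for the next release.
cat > RELEASE-NOTES.md <<'EOF'
### Upcoming

- Improved something

### Jumi 0.1.133 (2012-08-01)

- Fixed something else
EOF

# Stamp the version and date of the release into the placeholder heading.
sed -i 's|^### Upcoming|### Jumi 0.1.134 (2012-08-09)|' RELEASE-NOTES.md
```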
<p>With the help of those helper scripts, the build script, shown below, is able to create a commit that contains the finalized release notes and tag it with a GPG-signed tag (I'm including the release notes also in the tag message). It saves the release metadata into files, so that later stages of the pipeline don't need to recalculate them (for example in <a href="https://gist.github.com/orfjackal/3288542#file-promote-staging-sh">promote-staging.sh</a>), and so that I can see them in a custom tab in Go (<a href="https://gist.github.com/orfjackal/3288542#file-build-summary-html">build-summary.html</a>). Then the script does the build with Maven, and after that it does another commit which prepares the release notes for a future release.</p>
<script src="https://gist.github.com/3288542.js?file=build-release.sh"></script>
<p>At the end of the above script you will see the trick that lets me get away with committing on every build: I create a new repository in the directory <code>staging.git</code> and save it on the Go server the same way as all other build artifacts.</p>
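<p>The staging.git trick can be sketched with plain Git commands - the repository name and commit messages below are invented, and for the sake of a self-contained example the tag is a plain annotated tag (<code>-a</code>) where the real build uses a GPG-signed one (<code>-s</code>):</p>

```shell
# Build workspace: a repository where the release commit and tag are created locally.
git init -q demo-build && cd demo-build
git -c user.name=CI -c user.email=ci@example.invalid \
    commit -q --allow-empty -m "Release 0.1.134"
git -c user.name=CI -c user.email=ci@example.invalid \
    tag -a -m "Release 0.1.134" v0.1.134    # real build: GPG-signed with -s

# Bundle the unpublished commits and tags as a bare repository; this
# staging.git directory is what gets saved on the Go server as an artifact,
# and nothing is pushed to the official repository until the release is published.
git clone -q --bare . ../staging.git
```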
<p>Then when a release is published, the following script is used to merge those commits to the master branch and push them to the official Git repository:</p>
<script src="https://gist.github.com/3288542.js?file=push-staging.sh"></script>
<p>Hopefully this article has given you some ideas for implementing continuous delivery using Maven.</p>Esko Luontolahttp://www.blogger.com/profile/03956946511109435404noreply@blogger.com0tag:blogger.com,1999:blog-243186646627646561.post-18611055814136590702012-05-13T12:29:00.000+03:002012-05-13T12:29:56.952+03:00Passing Contract Tests While Refactoring Them<p>In my <a href="http://blog.orfjackal.net/2012/05/declaring-pass-or-fail-handling-broken.html">last blog post</a> I explained how I at one time created a new implementation to pass the same <a href="http://blog.thecodewhisperer.com/2005/03/02/in-brief-contract-tests">contract tests</a> as another implementation, but due to having to refactor the tests at the same time (the two implementations have a different concurrency model, so the contract tests must be decoupled from it), I missed a problem (wrote some dead code). Since then I've retried that refactoring/implementation using a different approach, as I explained in the comments of that blog post.</p>
<p>One option would have been to refactor the whole contract test class before starting to implement it, but that goes against my principles of making changes in small steps and having fast feedback loops. So the approach I tried is as follows:</p>
<ol>
<li>Extract a contract test from the old implementation's tests by extracting factory methods for the <a href="http://xunitpatterns.com/SUT.html">SUT</a>, creating the abstract base test class and moving the old implementation's test methods there.</li>
<li>Create a second concrete test class which extends the contract tests, but <em>override all test methods from the contract test and mark them ignored.</em> This avoids lots of failing tests appearing at once. Maybe mark each of the overridden versions with a "TODO: refactor tests" comment so as to not forget the next step.</li>
<li>Inspect the contract test method which you plan to implement next and see if it requires some refactoring before it would work for both implementations. Refactor the test if necessary. This gives a systematic way of updating the contract tests in small steps and avoids refactoring while tests are red.</li>
<li>Unignore one contract test method and implement the feature in the new implementation. This lets you focus on passing just one test at a time, as is normal in TDD.</li>
</ol>
<p>I recorded my experiment of using this strategy as a <a href="http://lets-code.orfjackal.net/2012/05/lets-code-jumi-special-1-contract-tests.html">screencast</a>:</p>
<p><iframe src="http://blip.tv/play/h6UUgvb3VAA.html?p=1" width="480" height="330" frameborder="0" allowfullscreen></iframe><embed type="application/x-shockwave-flash" src="http://a.blip.tv/api.swf#h6UUgvb3VAA" style="display:none"></embed></p>
<p><a href="http://download.orfjackal.net/lets-code/Let%27s%20Code%20Jumi%20Special%20%23001%20-%20Contract%20Tests.mp4">Download as MP4</a></p>
<p><a href="http://www.orfjackal.net/lets-code">More Let's Code Screencasts</a></p>Esko Luontolahttp://www.blogger.com/profile/03956946511109435404noreply@blogger.com0tag:blogger.com,1999:blog-243186646627646561.post-28554785600948151592012-05-03T22:55:00.000+03:002012-05-13T12:33:10.473+03:00Declaring Pass or Fail - Handling Broken Assumptions<p>When using TDD, it's a good practice to declare - aloud or in your mind - whether the next test run will pass or fail (and in what way it will fail). Then when your assumption about the outcome happens to be wrong, you'll be surprised and you can start looking more closely at why on earth the code is not behaving as you thought it would.</p>
<p>I had one such situation in my <a href="http://www.orfjackal.net/lets-code">Let's Code screencasts</a> where I missed a mistake - I had written code that's not needed to pass any tests - and noticed it only five months later when analyzing the code with <a href="http://pitest.org/">PIT mutation testing</a>. You can see how that untested line of code was written in <a href="http://lets-code.orfjackal.net/2011/09/lets-code-jumi-62-single-threaded.html">Let's Code Jumi #62</a> at 24:40, and how it was found in <a href="http://lets-code.orfjackal.net/2012/05/lets-code-jumi-148-mutation-testing.html">Let's Code Jumi #148</a> at 4:15 (the rest of the episode and the start of episode 149 go into fixing it).</p>
<p>I would be curious to figure out a discipline which would help to avoid problems like that.</p>
<p>Here is what happened:</p>
<p>I was developing <a href="http://jumi.fi/actors.html">my little actors library</a>. I already had a multi-threaded version of it working and now I was implementing a single-threaded version of it to make testing actors easier. I used contract tests to drive the implementation of the single-threaded version. Since the tests were originally written for the multi-threaded version, they required some tweaking to make them work for both implementations, with and without concurrency.</p>
<p>I had already gotten so far that all but one of the contract tests were passing, when I wrote the fateful line <code>idle = false;</code> and ran the tests - I had expected them to pass, but that one test was still failing. So then I investigated why the test did not pass and found out that I had not yet updated the test to work with the single-threaded implementation. After fixing the test, it started failing for another reason (a missing try-catch), so I implemented that - <strong>but I did not notice that the line I had added earlier did not contribute to passing the test.</strong> Only much later did I notice (thanks to PIT) that I was missing a test case to cover that one line.</p>
<p>So I've been thinking, <em>how to avoid mistakes like this in the future?</em> I don't yet have an answer.</p>
<p>Maybe some sort of mental checklist to use when I have written some production code but it doesn't make the test pass because of a bug in the test. Maybe if I undid all changes to production code before fixing the test, would that avoid the problem? Maybe the IDE could help by highlighting suspicious code - the IDE could have two buttons for running tests, one where the assumption is that the tests will pass and another where they are expected to fail. Then when an assumption is broken, it would highlight all code that was written since the last time the tests passed and/or the assumptions were correct, which might help in inspecting the code.</p>
<p>Or maybe all problems like this can be found automatically with mutation testing and I won't need a special procedure to avoid introducing them?</p>
<hr>
<p><b>UPDATE:</b> <a href="http://blog.orfjackal.net/2012/05/passing-contract-tests-while.html">In a following blog post</a> I'm experimenting with a better way of doing this refactoring.</p>Esko Luontolahttp://www.blogger.com/profile/03956946511109435404noreply@blogger.com4tag:blogger.com,1999:blog-243186646627646561.post-79993912821240833212012-05-01T22:13:00.001+03:002012-05-01T22:13:14.313+03:00Unit Test Focus Isolation<p>Good unit tests are <a href="http://pragprog.com/magazines/2012-01/unit-tests-are-first">FIRST</a>. The I in FIRST stands for Isolation and is easily confused with the R, Repeatability. Ironically the I is itself not well isolated. I want to take a moment to focus on an often forgotten side of unit test isolation: test focus.</p>
<p>A good unit test focuses on testing just one thing and it doesn't overlap with other tests - it has high cohesion and low coupling. Conversely, if you change one rule in the production code, then only one unit test should fail. Together with <a href="http://blog.orfjackal.net/2010/02/three-styles-of-naming-tests.html">well named tests</a> this makes it easy to find the reason for a test failure (and giving tests meaningful names is easier when each of them focuses on just one thing).</p>
<p>I came up with a good example in <a href="http://www.orfjackal.net/lets-code">Let's Code Jumi</a> episode 200 (to be released around October 2012 - I have a big WIP ;). I'm showing here a refactored version of the code - originally the third test was all inline in one method and it might have been less obvious what the problem was.</p>
<h2>Example</h2>
<p>The system under test is <a href="https://github.com/orfjackal/jumi/blob/3abf3ded13a1c009cbf997e8ac1737b60db378e8/jumi-core/src/main/java/fi/jumi/core/runs/RunIdSequence.java">RunIdSequence</a>, a factory for generating unique <a href="https://github.com/orfjackal/jumi/blob/3abf3ded13a1c009cbf997e8ac1737b60db378e8/jumi-core/src/main/java/fi/jumi/core/runs/RunId.java">RunId</a> instances. Here are the two unit tests which were written first:</p>
<pre>
@Test
public void <b>starts_from_the_first_RunId</b>() {
    RunIdSequence sequence = new RunIdSequence();

    RunId startingPoint = sequence.nextRunId();

    assertThat(startingPoint, is(new RunId(RunId.FIRST_ID)));
}

@Test
public void <b>each_subsequent_RunId_is_incremented_by_one</b>() {
    RunIdSequence sequence = new RunIdSequence();

    RunId id0 = sequence.nextRunId();
    RunId id1 = sequence.nextRunId();
    RunId id2 = sequence.nextRunId();

    assertThat(id1.toInt(), is(id0.toInt() + 1));
    assertThat(id2.toInt(), is(id1.toInt() + 1));
}
</pre>
<p>These unit tests are well isolated. The first focuses on what is the first RunId in the sequence, the second focuses on what is the relative difference between subsequent RunIds. The second test is unaware of the absolute value of the first RunId, so the tests don't overlap. I can easily make just one of them fail and the other pass.</p>
<p>The RunIdSequence needs to be thread-safe, so here is the third test, with the relevant bits highlighted:</p>
<pre>
@Test
public void <b>the_sequence_is_thread_safe</b>() throws Exception {
    final int ITERATIONS = 50;

    List<RunId> expectedRunIds = generateRunIdsSequentially(ITERATIONS);
    List<RunId> actualRunIds = generateRunIdsInParallel(ITERATIONS);

    assertThat("generating RunIds in parallel should have produced the same values as sequentially",
            actualRunIds, is(expectedRunIds));
}

private static List<RunId> generateRunIdsSequentially(int count) {
    List<RunId> results = new ArrayList<RunId>();
    <span style="color: Gray;">// XXX: knows what is the first ID (RunId.FIRST_ID, even worse would be to use the constant 1)
    // XXX: knows how subsequent IDs are generated (increase by 1)</span>
    <span style="color: Blue;">for (int id = RunId.FIRST_ID; id < RunId.FIRST_ID + count; id++) {
        results.add(new RunId(id));
    }</span>
    return results;
}

private static List<RunId> generateRunIdsInParallel(int count) throws Exception {
    final RunIdSequence sequence = new RunIdSequence();
    ExecutorService executor = Executors.newFixedThreadPool(10);

    List<Future<RunId>> futures = new ArrayList<Future<RunId>>();
    for (int i = 0; i < count; i++) {
        futures.add(executor.submit(new Callable<RunId>() {
            @Override
            public RunId call() throws Exception {
                return sequence.nextRunId();
            }
        }));
    }

    List<RunId> results = new ArrayList<RunId>();
    for (Future<RunId> future : futures) {
        results.add(future.get(1000, TimeUnit.MILLISECONDS));
    }
    Collections.sort(results, new Comparator<RunId>() {
        @Override
        public int compare(RunId id1, RunId id2) {
            return id1.toInt() - id2.toInt();
        }
    });
    executor.shutdown();
    return results;
}
</pre>
<p>This test is not isolated. It defines the same things as those two other tests, so it overlaps with them: it knows what the first RunId is and how subsequent values are generated. If one of the first two tests fails, this test will also fail, even though this test is meant to focus on thread-safety, just as its name says.</p>
<p>Here is an improved version of the same test, with changes highlighted:</p>
<pre>
@Test
public void <b>the_sequence_is_thread_safe</b>() throws Exception {
    final int ITERATIONS = 50;

    List<RunId> expectedRunIds = generateRunIdsSequentially(ITERATIONS);
    List<RunId> actualRunIds = generateRunIdsInParallel(ITERATIONS);

    assertThat("generating RunIds in parallel should have produced the same values as sequentially",
            actualRunIds, is(expectedRunIds));
}

private static List<RunId> generateRunIdsSequentially(int count) {
    <span style="color: Blue;">RunIdSequence sequence = new RunIdSequence();</span>
    List<RunId> results = new ArrayList<RunId>();
    <span style="color: Blue;">for (int i = 0; i < count; i++) {
        results.add(sequence.nextRunId());
    }</span>
    return results;
}

private static List<RunId> generateRunIdsInParallel(int count) throws Exception {
    final RunIdSequence sequence = new RunIdSequence();
    ExecutorService executor = Executors.newFixedThreadPool(10);

    List<Future<RunId>> futures = new ArrayList<Future<RunId>>();
    for (int i = 0; i < count; i++) {
        futures.add(executor.submit(new Callable<RunId>() {
            @Override
            public RunId call() throws Exception {
                return sequence.nextRunId();
            }
        }));
    }

    List<RunId> results = new ArrayList<RunId>();
    for (Future<RunId> future : futures) {
        results.add(future.get(1000, TimeUnit.MILLISECONDS));
    }
    Collections.sort(results, new Comparator<RunId>() {
        @Override
        public int compare(RunId id1, RunId id2) {
            return id1.toInt() - id2.toInt();
        }
    });
    executor.shutdown();
    return results;
}
</pre>
<p>Now it no longer knows about those two design decisions. It's focused on just testing thread-safety and doesn't duplicate the other tests nor the production code. I can change the production code to make any one of the three tests fail while the other two still pass.</p>
<h2>Conclusion</h2>
<p>When I try to write tests that are as isolated and focused as possible, it becomes easy to find the cause of a test failure. If I don't know why some code exists, I can comment it out and see which tests fail - their names will tell why the code was written. When removing features, I can remove the whole tests which define that feature, instead of having to update non-cohesive tests. When changing features, I need to update only a few tests.</p>
<p>P.S. I've been thinking that it might be possible to measure automatically the "unitness" of tests using mutation testing tools such as <a href="http://pitest.org/">PIT</a>. The fewer tests per mutation fail, the better. And if a test always fails together with other tests, then it's a bad thing. It might help in pinpointing tests which need some refactoring.</p>Esko Luontolahttp://www.blogger.com/profile/03956946511109435404noreply@blogger.com1tag:blogger.com,1999:blog-243186646627646561.post-72151710873801693522011-12-17T00:31:00.000+02:002012-08-13T00:41:43.746+03:00Why Use Inferior Tools<p>I've been developing on the JVM this test runner called <a href="http://jumi.fi/">Jumi</a>, and out of curiosity I've been <a href="http://www.orfjackal.net/lets-code">screencasting its development</a> since day one <a href="#notes">[1]</a>. One of my listeners made a <a href="http://lets-code.orfjackal.net/2011/12/lets-code-jumi-105-parameterized-tests.html">comment</a>, whose point I understood to be that with functional programming I could have avoided a bunch of problems which I now need to tackle (i.e. thread safety).</p>
<p>Jumi is <a href="http://beust.com/weblog/2010/12/17/developing-in-the-meta/">fundamental software</a> and thus it has some special requirements regarding reliability, compatibility and dependencies. That limits what solutions I can use, and causes extra work for things which would be free in another language (for example <a href="http://www.orfjackal.net/lets-code#jumi">episodes 75-102</a> would not have been necessary if I could have used Scala's case classes and pattern matching). But I can survive without those nice-to-haves.</p>
<h3>tl;dr</h3>
<p>Using leading-edge programming languages can give some nice safety guarantees, but I can cope without them. Sometimes it's more important to use old and proven tools instead of the latest and greatest.</p>
<h2>A Craftsman's Programming Language</h2>
<p>Java's development started in 1991 <a href="#notes">[2]</a> and Java became one of the dominant languages some time around 1997-2000 <a href="#notes">[3]</a>. Compared to that, Scala's design <a href="http://www.artima.com/weblogs/viewpost.jsp?thread=163733">started in 2001</a>, Clojure's design <a href="http://www.slideshare.net/thnetos/clojure-intro">started in 2007</a>, and neither of them yet even shows up in the top 50 of the <a href="http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html">TIOBE index</a>.</p>
<p>In 2001 Pete McBreen wrote in <a href="http://books.google.fi/books?id=C9vvHV1lIawC&lpg=PA88&ots=pM-tatgWnH&dq=software%20craftsmanship%20cobol&pg=PA88#v=onepage&q&f=false">Software Craftsmanship (pages 88-89)</a> that COBOL is a craftsman's language because of its long history, whereas back then Java was too new a language. Nowadays Java is considered to be in the same position as COBOL was back then <a href="#notes">[4]</a>.</p>
<p>That's why, even though I prefer writing things in Scala, Clojure or some other modern language, when there are high requirements for stability and long life I prefer a language which has been in mainstream use for over 10 years, has a proven history of stability, and can be trusted to still be in use after 10 or even 20 years.</p>
<p>It will still take many years for functional programming languages to enter the mainstream, and then many more years to see how reliable they are. Java has a long history of backward compatibility, whereas Scala has a short history of backward incompatibility (not even minor releases are binary compatible). Lisp has a long history, but it has never entered the mainstream, and Clojure is still very new. It will take at least 10 more years to see whether they will succeed.</p>
<h2>Referential Transparency and Other Language Features</h2>
<p>Some arguments in the original post were that tests can be a maintenance burden, and that by making the system simpler, fewer tests are needed. He also implied that higher-level abstractions and referential transparency would solve my problems much more simply.</p>
<p>I'm already trying to make my systems as simple as I can, given all the restrictions imposed on them. And I'm already using higher-level abstractions in Jumi - I'm using the <a href="http://en.wikipedia.org/wiki/Actor_model">actor model</a> for concurrency management, so most of the code is single-threaded. I've been bitten by shared-state concurrency often enough to have learned to avoid it.</p>
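<p>To make that concrete, here is a minimal sketch of the actor idea (hypothetical names - this is my illustration, not Jumi's actual actor library): every message is processed by one dedicated worker thread, so the state touched inside the messages needs no locking.</p>

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Minimal single-threaded actor: all messages are processed by one
// dedicated thread, so the state touched inside the messages needs no locks.
public class MiniActor {
    private static final Runnable POISON_PILL = () -> {};
    private final BlockingQueue<Runnable> mailbox = new LinkedBlockingQueue<>();
    private final Thread worker = new Thread(this::processMessages);

    public MiniActor() {
        worker.start();
    }

    private void processMessages() {
        try {
            Runnable message;
            while ((message = mailbox.take()) != POISON_PILL) {
                message.run(); // always runs on this single worker thread
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public void send(Runnable message) {
        mailbox.add(message);
    }

    public void stop() {
        mailbox.add(POISON_PILL); // processed after all pending messages
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

<p>Sending bare <code>Runnable</code>s is just the simplest possible demonstration; a real actor library would send typed event objects to typed listener interfaces instead.</p>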
<p>What I don't agree with is the claim that referential transparency would solve my problems. For one, the JVM is not referentially transparent. It would be perilous to try to ignore the existence of side-effects on an impure platform, when creating fundamental software which does not have control over the code it executes (i.e. testing frameworks and test code). Even when using the actor model <a href="#notes">[5]</a>, I could not rely on all the single-threaded objects being used from only one thread, so I started writing the thread-safety checker which the commenter objected to.</p>
<p>As a counterexample to the claim that referential transparency would bring simplicity, let's consider Scala's testing frameworks. The most popular testing frameworks on Scala (specs, specs2, ScalaTest) are implemented mostly in functional style. The core of each one - the part which controls how the tests are executed - is a couple of thousand lines of code (with all matchers and test runners included, they are over 10k lines in total).</p>
<p>In contrast, <a href="http://specsy.org/">Specsy</a> is written in imperative-style Scala and its core is only about 300 lines of code (with adapters for JUnit it totals about 600 lines, because of some impedance mismatch, but after Jumi is ready that superfluous code can be removed). Yet with those 300 lines it provides more expressiveness, better isolation of side-effects and fewer bugs than any of those other testing frameworks. <small>Quite soon I'll convert Specsy from Scala to Java, for compatibility and dependency reasons, which should increase its size by only about 20% because I'm not using many of Scala's features.</small></p>
<p>Programming languages and paradigms are insignificant factors in achieving simplicity - the way they are used is much more important.</p>
<h2>Amount of Tests</h2>
<p>If circumstances require it, I don't mind missing some nice language features, because I can to some extent get the same safety benefits by writing some more tests, and the lack of some syntactic sugar just causes more typing - and <a href="http://anarchycreek.com/2009/05/26/how-tdd-and-pairing-increase-production/">typing is not the bottleneck</a>. As is said in the debate over dynamic vs. static typing, "the compiler is a unit test." The compiler gives some guarantees of correctness, but even where it doesn't, I can cover the same problem areas with a few more tests.</p>
<p>Regarding the number of tests, I agree on what Kent Beck <a href="http://www.se-radio.net/2010/09/episode-167-the-history-of-junit-and-the-future-of-testing-with-kent-beck/">said in an interview</a>: <cite>"I don't think people really know what too much really is. -- I've never achieved it [having too many tests], and I've tried."</cite> Kent also gave an inspiring anecdote: <cite>"The guy I learned the most about testing from was a compiler writer, and he had five lines of test for every line of compiler that he wrote, and he was the most productive guy that I've ever seen."</cite></p>
<p>In episodes 103-123 of <a href="http://www.orfjackal.net/lets-code#jumi">Let's Code Jumi</a> I write a bytecode enhancer which makes sure that an object annotated with <a href="http://code.google.com/p/jsr-305/source/browse/trunk/ri/src/main/java/javax/annotation/concurrent/NotThreadSafe.java">@NotThreadSafe</a> is accessed from only one thread during its lifetime (a stronger requirement than the absence of concurrent access, but easier to check). It actually finds one concurrency bug, so my sense of insecurity was warranted. I don't think those 10 hours were wasted - I've spent much longer hunting weird bugs - and in the future it lets me worry less about which thread an object might be called from. I can then better focus on the few places that need to be thread-safe, even writing a proof of correctness if necessary. Of course my default is to make everything immutable, but sometimes mutability produces simpler systems (<a href="https://groups.google.com/d/msg/growing-object-oriented-software/dOmOIafFDcI/BGEn5F3FUaMJ">simple in design, not easy to implement</a>).</p>
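<p>The runtime part of such a check can be tiny; here is a hand-rolled sketch of the idea (my illustration, not Jumi's actual bytecode-enhanced implementation): the first thread to touch the object becomes its owner, and any access from another thread fails loudly.</p>

```java
import java.util.concurrent.atomic.AtomicReference;

// Checks that an object is used from only one thread during its lifetime.
// The first thread to call checkCurrentThread() becomes the owner;
// a call from any other thread afterwards is reported as a bug.
public class ThreadConfinementChecker {
    private final AtomicReference<Thread> owner = new AtomicReference<>();

    public void checkCurrentThread() {
        Thread current = Thread.currentThread();
        owner.compareAndSet(null, current); // first caller claims ownership
        if (owner.get() != current) {
            throw new AssertionError("expected to be used only from thread "
                    + owner.get() + " but was used from thread " + current);
        }
    }
}
```

<p>A bytecode enhancer would then only need to inject a call to a check like this at the beginning of every method of a class annotated with @NotThreadSafe.</p>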
<p>Also the <a href="https://github.com/orfjackal/jumi/blob/dde591d835d01c048123c3efdad89b7addf6d34d/end-to-end-tests/src/test/java/fi/jumi/test/BuildTest.java">build tests</a> which the commenter was complaining about - including the ones which operate on just the test data of other tests - have failed many times throughout this project, almost every time I add a new dependency. They are some of the most valuable tests for reminding me about things that I forget. When I write each of the system's requirements as a test, I no longer need to keep that requirement in mind at all times, and I can focus on just the problem at hand.</p>
<p>Though <cite>"every test is a point of coupling in a system we want as decoupled as possible,"</cite> I don't think that all coupling is bad. I try to keep my tests <em>coupled</em> as much as possible to the <em>requirements</em> of the system (see <a href="http://blog.orfjackal.net/2010/02/three-styles-of-naming-tests.html">Three Styles of Naming Tests</a>), and decoupled from the implementation, to the extent reasonable (fully decoupled code does nothing).</p>
<p><br><hr><a name="notes"></a></p>
<p>[1] I would have wanted to see something similar from how JUnit was developed, but unfortunately Kent Beck and Erich Gamma did not use a screen recorder in 1997 when they <a href="http://www.se-radio.net/2010/09/episode-167-the-history-of-junit-and-the-future-of-testing-with-kent-beck/">opened their laptops</a> on a plane going to Uppsala.</p>
<p>[2] Java was first known as Oak: <a href="http://www.oracle.com/technetwork/java/javase/overview/javahistory-timeline-198369.html">The Java History Timeline</a>, <a href="http://ibiblio.org/java/slides/hope/02.html">The Prehistory of Java</a>, <a href="http://en.wikipedia.org/wiki/Oak_(programming_language)">Oak (Wikipedia)</a></p>
<p>[3] <a href="http://www.oracle.com/technetwork/java/javase/overview/javahistory-timeline-198369.html">In 1997</a> JavaOne became the world's largest developer conference and <a href="http://www.intac.net/whats-your-programming-language_2010-06-23/">in 2000</a> Java was 5th in the TIOBE index.</p>
<p>[4] <a href="https://groups.google.com/d/msg/software_craftsmanship/xWm1CiQoa7w/6YeQPIP-nOsJ">Quote 1</a>: <cite>"If I remember correctly, McBreen's book mentions something about COBOL being the craftsman's language, I guess Java would be the modern equivalent."</cite>
<br><a href="http://blog.jr0cket.co.uk/2011/11/jax-london-2011-aftermath.html">Quote 2</a>: <cite>"Java is the new COBOL"</cite></p>
<p>[5] In this application I wrote the actor library myself because <a href="http://www.joelonsoftware.com/articles/fog0000000007.html">it's a core part of the system</a>, and because of my reliability requirements I want to know exactly how it works. And since I'm writing fundamental software, I need to minimize external dependencies. Why use a 1MB Akka Actors library when I can do just what I need in 20KB?</p>Esko Luontolahttp://www.blogger.com/profile/03956946511109435404noreply@blogger.com0tag:blogger.com,1999:blog-243186646627646561.post-58473204660623495722010-09-22T01:44:00.011+03:002010-11-02T23:13:18.585+02:00Let's Code Dimdwarf<p>I'm starting a new screencast series, <a href="http://www.orfjackal.net/lets-code">Let's Code</a>, where I will be recording myself developing some open source projects. This was inspired by James Shore's <a href="http://jamesshore.com/Blog/Lets-Play/Lets-Play-Test-Driven-Development.html">Let's Play TDD</a> series and I will try doing something similar. My goal is not to teach the basics of how to do TDD, but to show how one developer does it - in the hope that something can be learned from it. Each episode will be about 25 minutes long ("one pomodoro") and I will try to release a new episode every couple of days, but no promises about that.</p>
<p><a href="http://lets-code.orfjackal.net/2010/09/lets-code-dimdwarf-1-introducing.html">The first episode</a> can be seen at <a href="http://lets-code.orfjackal.net/">my new blog</a> where I will announce all new Let's Code episodes.</p>
<p>P.S. I'll be writing a blog article about my screencast toolchain and experiences about different video hosting providers. The video quality of YouTube and Vimeo was not good enough for high resolution text (these screencasts are 1440x1080 resolution with font size 16), but Blip.tv was just perfect since they won't re-encode my videos.</p>Esko Luontolahttp://www.blogger.com/profile/03956946511109435404noreply@blogger.com0tag:blogger.com,1999:blog-243186646627646561.post-6535175080368653012010-07-28T21:17:00.011+03:002012-08-17T11:07:45.463+03:00Design for Integrability<p>There is already the term <a href="http://www.youtube.com/watch?v=acjvKJiOvXw">Design for Testability</a> - it's easy to write tests for the software. I would like to coin a new term, <i>Design for Integrability</i> - it's easy to integrate the system with its external environment. <small>(Yes, <a href="http://dictionary.reference.com/browse/integrability">integrability</a> is a word.)</small>
<p>Designing for <i>testability</i> is closely linked with how good the design of the code is. A good way to design for testability is to <a href="http://jamesshore.com/Agile-Book/test_driven_development.html">write the tests first</a>, in <a href="http://butunclebob.com/ArticleS.UncleBob.TheThreeRulesOfTdd">short cycles</a>, which leads to all code by definition being tested. As a result, the developers will need to go through the pains of improving the design to be testable early on, because otherwise it would be hard for them to write tests for it.
<p>Designing for <i>integrability</i> is possible with a similar technique. The book <a href="http://www.growing-object-oriented-software.com/">Growing Object-Oriented Software, Guided by Tests</a> (GOOS) presents a style of doing TDD where the project is started by writing end-to-end tests, which are then used in driving the design and getting early feedback (see pages 8-10, 31-37, 84-88 and the code examples). Also the "end-to-end" of the GOOS authors is probably more end-to-end than the "end-to-end" of many others. Quoted from page 9:
<blockquote>For us, "end-to-end" means more than just interacting with the system from the outside - that might be better called "edge-to-edge" testing. We prefer to have the end-to-end tests exercise both the system <i>and</i> the process by which it's built and deployed. An automated build, usually triggered by someone checking code into the source repository, will: check out the latest version; compile and unit-test the code; integrate and package the system; perform a production-like deployment into a realistic environment; and, finally, exercise the system through its external access points. This sounds like a lot of effort (it is), but has to be done anyway repeatedly during the software's lifetime. Many of the steps might be fiddly and error-prone, so the end-to-end build cycle is an ideal candidate for automation. You'll see in Chapter 10 how early in a project we get this working.</blockquote>
<p>A system's interaction with its external environment is often one of the riskiest areas in its development, so the authors of the GOOS book prefer to expose the uncertainties early by starting with a <i>walking skeleton</i>, which contains the basic infrastructure and integrates with the external environment. On page 32 they define walking skeleton as "an implementation of the thinnest possible slice of real functionality that we can automatically build, deploy, and test end-to-end. It should include just enough of the automation, the major components, and communication mechanisms to allow us to start working on the first feature." This forces the team to address the "unknown unknown" technical and organizational risks at the beginning of the project, while there is still time to do something, instead of starting the integration only at the end of the project.
<p>Starting with the integration also means that the chaos related to solving the uncertainties moves from the end of the project to the beginning of the project. But once the end-to-end infrastructure is in place, the rest of the project will be easier. On page 37 there is a nice illustration of this:
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjniHfj35nvjLTFl5d9OrOCYfwWSdILFIUHtKsYRKLsOyS5lVTYc6QMWcSkzWr8dwiX1HglYdRgcLz1rGTA3Wm3HbeZ_T1RocQvYHqBWT4jyIT9LUPKBGpqtlg3k0K0SsJJvm_i731NZ8/s1600/Test-First+vs+Test-Later+Chaos.png" imageanchor="1" style="margin-left:1em; margin-right:1em"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhjniHfj35nvjLTFl5d9OrOCYfwWSdILFIUHtKsYRKLsOyS5lVTYc6QMWcSkzWr8dwiX1HglYdRgcLz1rGTA3Wm3HbeZ_T1RocQvYHqBWT4jyIT9LUPKBGpqtlg3k0K0SsJJvm_i731NZ8/s500/Test-First+vs+Test-Later+Chaos.png" /></a></div>
<p>I also perceive the end-to-end tests to be helpful in guiding the design of the system towards integrability. When writing software with an inside-out approach to TDD, starting with low-level components and gluing them together until all components are ready, it's possible that once the development reaches high-level components which need to interact with external systems and libraries, the design of the low-level components makes the integration hard. So then you will need to change the design to make integration easier. But when developing outside-in and starting with end-to-end tests, those integration problems will be solved before the rest of the system is implemented - when changing the design is easier.
<p>Listening to the feedback from the end-to-end tests can also improve the management interfaces of the system. Nat Pryce writes in <a href="http://www.natpryce.com/articles/000780.html">TDD at the System Scale</a> that the things that make writing reliable end-to-end tests hard are also what makes managing a system hard. He writes: "If our system tests are unreliable, that's a sign that we need to add interfaces to our system through which tests can better observe, synchronise with and control the activity of the system. Those changes turned out to be exactly what we need to better <em>manage</em> the systems we built. We used the same interfaces that the system exposed to the tests to build automated and manual support tools."
<p>By starting with end-to-end tests it's possible to get early feedback and know whether we are moving in the right direction. Also the system will be by definition <i>integrable</i>, because it has been integrated since the beginning.
<p>Note however, that what J.B. Rainsberger says in <a href="http://www.infoq.com/presentations/integration-tests-scam">Integration Tests Are a Scam</a> still applies. You should not rely on the end-to-end tests for the <a href="http://www.jbrains.ca/permalink/253">basic correctness</a> of the system, but you should have unit-level tests which in themselves provide good coverage. End-to-end tests take lots of time to execute, so it's impractical to execute them all the time while refactoring (my personal pain threshold for recompiling and running all tests after a one-liner change is less than 5-10 seconds). In the approach of the GOOS authors the emphasis is more on "test-driving" end-to-end than on "testing" end-to-end. See the discussion at <a href="http://www.m3p.co.uk/blog/2010/01/17/responding-to-brian-marick/">Steve Freeman's blog post</a> on the topic (also the comments).
<h2>Experience 1</h2>
<p>The first project where I have tried this approach is <a href="http://dimdwarf.sourceforge.net/">Dimdwarf</a> - a distributed application server for online games. I started by writing an end-to-end test the way I would like to write it (<a href="http://github.com/orfjackal/dimdwarf/blob/76e4e8f2cab4cb3fa7b065119152749ad06016a3/end-to-end-tests/src/test/java/net/orfjackal/dimdwarf/test/ClientConnectionTest.java">ClientConnectionTest</a>). I configured Maven to unpack the distribution package into a sandbox directory (<a href="http://github.com/orfjackal/dimdwarf/blob/76e4e8f2cab4cb3fa7b065119152749ad06016a3/end-to-end-tests/pom.xml">end-to-end-tests/pom.xml</a>) against which I will then run my end-to-end tests in Maven's integration-test phase. The test double applications which I deploy on the server are in the main sources of the end-to-end-tests module, and I deploy them by copying the JAR file and writing the appropriate configuration file (<a href="http://github.com/orfjackal/dimdwarf/blob/76e4e8f2cab4cb3fa7b065119152749ad06016a3/end-to-end-tests/src/test/java/net/orfjackal/dimdwarf/test/runner/ServerRunner.java#L53">ServerRunner.deployApplication()</a>). It takes about half a second for the server to start (class loading is what takes most of the time), so the tests will wait until the server prints to the logs that it is ready (<a href="http://github.com/orfjackal/dimdwarf/blob/76e4e8f2cab4cb3fa7b065119152749ad06016a3/end-to-end-tests/src/test/java/net/orfjackal/dimdwarf/test/runner/ServerRunner.java#L47">ServerRunner.startApplication()</a>). 
The server is launched in a separate process using <a href="http://github.com/orfjackal/dimdwarf/blob/76e4e8f2cab4cb3fa7b065119152749ad06016a3/end-to-end-tests/src/test/java/net/orfjackal/dimdwarf/test/runner/ProcessRunner.java">ProcessRunner</a> and its stdout/stderr are redirected to the test runner's stdout/stderr and to a <a href="http://github.com/orfjackal/dimdwarf/blob/76e4e8f2cab4cb3fa7b065119152749ad06016a3/end-to-end-tests/src/test/java/net/orfjackal/dimdwarf/test/util/StreamWatcher.java">StreamWatcher</a> which allows the tests to examine the output using <a href="http://github.com/orfjackal/dimdwarf/blob/76e4e8f2cab4cb3fa7b065119152749ad06016a3/end-to-end-tests/src/test/java/net/orfjackal/dimdwarf/test/runner/ProcessRunner.java#L48">ProcessRunner.waitForOutput()</a>. There is a similar test driver for a client, which connects to the server via a socket, and it has some helper methods for sending messages to the server and checking the responses (<a href="http://github.com/orfjackal/dimdwarf/blob/76e4e8f2cab4cb3fa7b065119152749ad06016a3/end-to-end-tests/src/test/java/net/orfjackal/dimdwarf/test/runner/ClientRunner.java">ClientRunner</a>). When the test ends, the server process is killed by calling <a href="http://java.sun.com/javase/6/docs/api/java/lang/Process.html#destroy()">Process.destroy()</a> - after all it is meant to be <a href="http://lwn.net/Articles/191059/">crash only software</a>.
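<p>The "wait until the server prints that it is ready" part can be sketched like this (hypothetical API - my illustration, not Dimdwarf's actual ProcessRunner and StreamWatcher): a background thread copies the process output into a queue, and the test polls the queue until the expected line shows up or a timeout expires.</p>

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Watches a process's output stream and lets a test wait for a specific line.
public class OutputWatcher {
    private final BlockingQueue<String> lines = new LinkedBlockingQueue<>();

    public OutputWatcher(InputStream processOutput) {
        Thread watcher = new Thread(() -> {
            try (BufferedReader in = new BufferedReader(new InputStreamReader(processOutput))) {
                String line;
                while ((line = in.readLine()) != null) {
                    lines.add(line);
                }
            } catch (IOException e) {
                // the stream was closed, i.e. the process has exited
            }
        });
        watcher.setDaemon(true); // don't keep the JVM alive just for watching
        watcher.start();
    }

    // Returns true when a line containing the expected text arrives before the
    // timeout expires; lines which don't match are consumed and discarded.
    public boolean waitForOutput(String expected, long timeout, TimeUnit unit)
            throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (true) {
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false;
            }
            String line = lines.poll(remaining, TimeUnit.NANOSECONDS);
            if (line != null && line.contains(expected)) {
                return true;
            }
        }
    }
}
```

<p>In a real test runner the InputStream would come from Process.getInputStream(), and the watched lines would probably also be echoed to the test runner's own output for debugging.</p>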
<p>Getting the end-to-end tests in place went nicely. It took 9.5 hours to write the end-to-end test infrastructure (and the tests are now deterministic), plus about 8 hours to build the infrastructure for starting up the server, some reorganizing of the project, and enough of the network layer to get the first end-to-end test to pass (the server sends a login failure message when any client tries to log in). The walking skeleton does not yet integrate with all third-party components that will be part of the final system. For example the system has not yet been integrated with an on-disk database (although the system can be almost fully implemented without it, because the system anyway relies primarily on its internal in-memory database).
<p>It takes 0.8 seconds to run just one end-to-end test, which is awfully slow compared to the unit tests (I could run the rest of the 400+ tests in the same time if JUnit would just run the tests in parallel on 4 CPU cores). In addition, it takes 13 seconds to package the project with Maven, so the end-to-end tests won't be of much use while refactoring - but they were very helpful in getting the deployment infrastructure ready. I will probably write end-to-end tests for all client communication and a couple of tests for some internal features (for example that persistence works over restarts and that database garbage collection deletes the garbage). The test runner infrastructure should also be helpful in writing tests for non-functional requirements, such as robustness and scalability.
<h2>Experience 2</h2>
<p>In another project I was coaching a team of 10 university graduate students during a 7-week course (i.e. they had been 3-5 years at university studying computer science - actually I was also a graduate student and had been there for 8 years). We were building a web application using <a href="http://www.scala-lang.org/">Scala</a> + <a href="http://liftweb.net/">Lift</a> + <a href="http://couchdb.apache.org/">CouchDB</a> as our stack. The external systems to which the application connects are its own database and an external web service. We started by writing an end-to-end test which starts up the application and the external web service in their own HTTP servers using <a href="http://jetty.codehaus.org/jetty/">Jetty</a>, and puts some data - actually just a single string - into the external web service; the application fetches the data from the web service and saves it to the database, after which the test connects to the application using Selenium's <a href="http://code.google.com/p/selenium/wiki/HtmlUnitDriver">HtmlUnitDriver</a> and checks whether the data is shown on the page. All applications were run inside the same JVM, and the CouchDB server was assumed to be already running on localhost without any password.
<p>It took a bit over one week (30 h/week × 10 students) to get the walking skeleton up and <s>walking</s> crawling. I was helping with some things, such as getting Maven configured and tests running, but otherwise I was trying to keep away from the keyboard and focus on instructing others on how to do things. I also code reviewed (and refactored) almost all code. Before getting started with the walking skeleton, we had spent about 2 weeks learning TDD, Scala, Lift and CouchDB, and evaluating some JavaScript testing frameworks.
<p>The end-to-end tests had lots of nondeterminism and were flaky. Parsing the HTML pages produced by the application made writing tests hard, especially when some of that HTML was generated dynamically with JavaScript and updated with Ajax/Comet. There were conflicts with port numbers and database names, which were discovered when the CI server ran two builds in parallel. There were also issues with the testing framework, <a href="http://www.scalatest.org/">ScalaTest</a>, which by default creates only one instance of the test class and reuses it for all tests - we spent some time hunting weird bugs until we noticed it (the solution is to mix in the <a href="http://www.scalatest.org/scaladoc/doc-1.2/org/scalatest/OneInstancePerTest.html">OneInstancePerTest</a> trait). It would have been better to start the application-under-test in its own process, because reusing the JVM might also have been the cause of some of the side-effects between the tests, and during the project we did not get all the deployment infrastructure ready (for example some settings were passed via <a href="http://java.sun.com/javase/6/docs/api/java/lang/System.html#setProperty(java.lang.String, java.lang.String)">System.setProperty()</a>).
<p>We were also faced with far too many <a href="http://code.google.com/p/specs/issues/detail?id=146">bugs</a> (2-3 in total, I think) in the <a href="http://code.google.com/p/specs/">Specs</a> framework, which prompted me to write a ticket for a "<a href="http://en.wikipedia.org/wiki/Do_it_yourself">DIY</a>/<a href="http://en.wikipedia.org/wiki/Not_Invented_Here">NIH</a> testing framework", later named <a href="http://github.com/orfjackal/specsy">Specsy</a>, which I have been working on slowly since then. Because none of the "after blocks" in Specs really worked after every test execution, I had to use <a href="http://java.sun.com/javase/6/docs/api/java/lang/Runtime.html#addShutdownHook(java.lang.Thread)">shutdown hooks</a> to write a hack which deletes the temporary CouchDB databases after the tests are finished and the JVM exits. We used to have hundreds of stale databases with randomly generated names, because the code which was supposed to clean up after an integration test was not being executed.
<p>The test execution times also increased towards the end of the project. One problem was that Scala is slow to compile and the end-to-end tests did a full build with Maven, which took over a minute. Another (smaller) problem was that some of the meant-to-be unit tests were needlessly using the database when it should have been faked (IIRC, it took over 10 seconds to execute the non-end-to-end tests). Let's hope that the Scala compiler will be parallelized in the near future (at least it's on a <a href="http://lamp.epfl.ch/~magarcia/ScalaCompilerCornerReloaded/">TODO list</a>), so that compile speeds become more tolerable.
<p>All in all, I think the end-to-end tests were effective in finding problems with the design of the system and the tests themselves. Writing good, reliable tests demands a lot from the development team. The system should now have quite good test coverage, so that its development can continue - starting with some cleaning up of the design and improving the tests.Esko Luontolahttp://www.blogger.com/profile/03956946511109435404noreply@blogger.com0tag:blogger.com,1999:blog-243186646627646561.post-4246079612176569712010-05-08T00:24:00.015+03:002014-07-17T17:28:18.361+03:00Choice of Words in Testing Frameworks (...and how many get it wrong, including RSpec)<blockquote>One [word] to rule them all... and in the darkness bind them.</blockquote>
<p>I want my testing framework to be able to express the ideas in my mind in the best possible way. This includes giving the tests the best possible names. Unfortunately, lots of testing frameworks force the developer to start or end his test names with predefined words - such as "describe", "it", "should", "given", "when", "then". This can be harmful, because they incline the user to always structure his test names the same way, even in situations where that way of structuring tests is suboptimal.</p>
<h2>Predefined words produce twisted sentences</h2>
<p>Here are some trivial counterexamples of <a href="http://blog.orfjackal.net/2010/02/three-styles-of-naming-tests.html#specification">specification-style</a> test names, to prove that requiring test names to start with a predefined word will sometimes lower their quality. Here is a specification of Fibonacci numbers which is taken straight from <a href="http://en.wikipedia.org/wiki/Fibonacci_number">their Wikipedia article</a>:</p>
<pre>Fibonacci numbers:
- The first two Fibonacci numbers are 0 and 1
- Each remaining number is the sum of the previous two
</pre>
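<p>When the framework imposes no predefined words, those sentences can be kept nearly as-is. A framework-free Java sketch (the fibonacci() implementation here is mine, added just to make the example complete; run with -ea so the asserts are enabled):</p>

```java
// The specification above, expressed as free-form method names.
public class FibonacciNumbersSpec {

    // Iterative Fibonacci: fibonacci(0) == 0, fibonacci(1) == 1, ...
    static int fibonacci(int n) {
        int a = 0, b = 1;
        for (int i = 0; i < n; i++) {
            int next = a + b;
            a = b;
            b = next;
        }
        return a;
    }

    void theFirstTwoFibonacciNumbersAre0And1() {
        assert fibonacci(0) == 0;
        assert fibonacci(1) == 1;
    }

    void eachRemainingNumberIsTheSumOfThePreviousTwo() {
        for (int n = 2; n < 20; n++) {
            assert fibonacci(n) == fibonacci(n - 1) + fibonacci(n - 2);
        }
    }
}
```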
<p>RSpec requires the test fixtures to start with the word "describe" and the tests with "it", in addition to which RSpec's documentation encourages the tests to start with "it should". Let's try to twist the above specification of Fibonacci numbers into that style. Here is my best attempt which still holds the same information:</p>
<pre>Define Fibonacci numbers:
- it should have the first two numbers be 0 and 1
- it should have each remaining number be the sum of the previous two
</pre>
<p>Urgh. A totally unnatural way of conveying the same information. Lots of unnecessary words must be added to make the test names full sentences.</p>
<p>If we use <a href="http://blog.orfjackal.net/2010/02/three-styles-of-naming-tests.html#example">example-style</a>, then it's possible to write the test names, but valuable information is lost:</p>
<pre>Define Fibonacci numbers:
- it should calculate the first two numbers
- it should calculate each remaining number
</pre>
<p>The problem appears to be that "it" is forced to be the subject of the sentence. We can get around that restriction by adding more "describe" elements, but then the tests become awfully verbose without adding any new information:</p>
<pre>Define Fibonacci numbers:
- Describe the first number:
- it should be 0
- Describe the second number:
- it should be 1
- Describe each remaining number:
- it should be the sum of the previous two
</pre>
<p>Here is another example, written in a slightly different style:</p>
<pre>Stack:
- An empty stack
- is empty
- After a push, the stack is no longer empty
- When objects have been pushed onto a stack
- the object pushed last is popped first
- the object pushed first is popped last
- After popping all objects, the stack is empty
</pre>
<p>And the same using RSpec's predefined words:</p>
<pre>Describe stack:
- Describe an empty stack
- it should be empty
- it should, after a push, be no longer empty
- Describe a stack onto which objects have been pushed
- it should pop first the object pushed last
- it should pop last the object pushed first
- it should, after popping all objects, be empty
</pre>
<p>It was necessary to change the order of some of the sentences for them to make sense, and writing the "it should, after..." tests was not natural - their word order should have been changed to make them more natural, but then the effect would have come before the cause, which is not good either. Also, the subject of some sentences had to be changed from "the object pushed last" to "it" (i.e. the stack), and the subject of the old sentence became the object of the new sentence.</p>
<p><em>The testing framework should obey the developer, not the other way around!</em> The developer is the one who knows best how to make a sentence convey his intent. A testing framework which forces the developer to write his sentences in a predefined style is immoral!</p>
<h2>Predefined words do not improve the test names</h2>
<p>What about <a href="http://blog.orfjackal.net/2010/02/three-styles-of-naming-tests.html#example">example-style</a> test names? Predefined words are equally bad for them. Here are Uncle Bob's <a href="http://butunclebob.com/ArticleS.UncleBob.TheBowlingGameKata">Bowling Game Kata</a>'s test names in RSpec format:</p>
<pre>Describe bowling game:
- it should score gutter game
- it should score all ones
- it should score one spare
- it should score one strike
- it should score perfect game
</pre>
<p>Adding "it should" does not improve the test names. It does not make the intent any clearer; it just adds lots of duplication and becomes background noise. The framework will not magically make a person who writes example-style tests suddenly start writing specification-style tests.</p>
<p>What about <a href="http://blog.orfjackal.net/2010/02/three-styles-of-naming-tests.html#implementation">implementation-style</a> tests? I have seen lots of implementation-style tests written in an "it should have" pseudo-specification-style like this:</p>
<pre>Describe person:
- it should have name
- it should not allow null name
- it should have age
- it should have address
- it should save
- it should load
- it should calculate pay
</pre>
<p>Writing implementation-style tests is still perfectly possible. A framework alone can't make the developer better. He must first understand the <a href="http://blog.dannorth.net/introducing-bdd/">philosophy</a> behind the framework and how to write expressive tests, before his tests will get any better.</p>
<h2>Many testing frameworks get it wrong</h2>
<p><a href="http://behaviour-driven.org/">behaviour-driven.org</a> says that <cite>"Getting the words right" was the starting point for the development of <a href="http://en.wikipedia.org/wiki/Behavior_Driven_Development">BDD</a></cite>, so it is absurdly ironic that lots of BDD frameworks get the words <em>wrong</em>.</p>
<p>First and foremost, RSpec gets its words wrong by forcing the tests to start with "describe" and "it", as described above. And because RSpec has become popular and was one of the first BDD frameworks, lots of other BDD frameworks copy RSpec and use the same predefined words. They just repeat mindlessly what others have done, without stopping to think <em>why</em> the things were done that way. They become <a href="http://rtpscrolls.blogspot.com/2006/11/angry-monkeys-and-cargo-cults.html">angry monkeys and cargo-cults</a>, which annoy me very much.</p>
<p>Many BDD acceptance testing frameworks force the use of words "given", "when", "then". For example Cucumber does this. Decomposing actions into those three parts gives you <a href="http://blog.objectmentor.com/articles/2008/11/27/the-truth-about-bdd">state transition tables</a>. This is a very explicit way of defining actions, but also very verbose. Added verbosity does not always make things easier to read; on the contrary, it can make it harder to see what is really important. As said in a famous quote:</p>
<blockquote>In anything at all, perfection is finally attained not when there is no longer anything to add, but when there is no longer anything to take away.
<br>- Antoine de Saint-Exupéry</blockquote>
<p>And another one:</p>
<blockquote>Many attempts to communicate are nullified by saying too much.
<br>- Robert Greenleaf</blockquote>
<p>I have even found a framework which adds predefined words as suffixes to the test names. In the Specs framework for Scala, the top-level test names end with "should", which is even printed in all the reports (unlike RSpec's "it"). Thankfully there is a workaround to avoid that suffix.</p>
<h2>Frameworks should not limit the developer</h2>
<p>A good testing framework will allow the developer to choose for himself how he writes his test names. Some examples of testing frameworks which do not force the use of predefined words are JUnit 4 and JDave. But those two still force a fixed level of test nesting - JUnit 4 has no nesting and JDave has one level. Of the frameworks that I know, the least restrictive one is <a href="http://github.com/orfjackal/gospec">GoSpec</a>, which I wrote myself with that as a goal.</p>
<p>When designing GoSpec, my goals were to allow unlimited levels of nesting, and to not force the test names to start with any predefined words. In Scala and other similarly expressive languages it would be easy to cope without any predefined words. For example, I like Specs' <code>"test name" in { ... }</code> style, which can also be written using the unpronounceable symbol <code>"test name" >> { ... }</code>. Unfortunately <a href="http://golang.org/">Go</a>'s syntax is not as flexible, so I settled for prefixing each test name with "Specify". I chose that word so that starting a sentence with it would be <em>totally unnatural</em>, so that developers would be inclined to just ignore it. Also, all the examples of using the framework are written so that they do not include that word in the test names.</p>
<p>In a future article I will write about my current ideals for a unit testing framework. One of the primary goals is allowing the developer to use any style that is best for the situation, but there are also other goals (for a sneak peek, see the project goals in GoSpec's <a href="http://github.com/orfjackal/gospec#readme">README</a>).</p>Esko Luontolahttp://www.blogger.com/profile/03956946511109435404noreply@blogger.com3tag:blogger.com,1999:blog-243186646627646561.post-83551782946379369112010-04-10T23:25:00.008+03:002021-08-05T04:59:35.616+03:00Direct and Indirect Effects of TDD<p>Some time ago <a href="http://twitter.com/unclebobmartin">@unclebobmartin</a> tweeted about the direct and indirect effects of <a href="http://jamesshore.com/Agile-Book/test_driven_development.html">TDD</a>:</p>
<blockquote>TDD guarantees test coverage.
<br><small><a href="http://twitter.com/unclebobmartin/status/8460599273">3:17 PM Jan 31st</a></small></blockquote>
<blockquote>TDD helps with, but does not guarantee, good design & good code. Skill, talent, and expertise remain necessary.
<br><a href="http://twitter.com/unclebobmartin/status/8460822044"><small>3:25 PM Jan 31st</small></a></blockquote>
<p>I agree with the above, but I also felt that something was still missing, because I did not see a direct relation from good test coverage to good design - it's possible to get high test coverage even with test-last approaches, but that does not help the design the way TDD does. So I started thinking about what the direct effects of TDD are, and how they indirectly help with the design.</p>
<p>Here are some effects that I've noticed TDD to have, divided into direct and indirect effects. If you have noticed some more direct or indirect effects, please leave a comment.</p>
<p><b>Direct effects</b>, given just following the <a href="http://butunclebob.com/ArticleS.UncleBob.TheThreeRulesOfTdd">three rules of TDD</a>:</p>
<ul>
<li>Guarantees code coverage</li>
<li>Amplifies the pain caused by bad code</li>
</ul>
<p><b>Indirect effects</b>, given skilled enough programmers using TDD:</p>
<ul>
<li>Enables changing the code without breaking it</li>
<li>Improves the quality of the code</li>
</ul>
<h2>Direct: Guarantees code coverage</h2>
<p>From the <a href="http://butunclebob.com/ArticleS.UncleBob.TheThreeRulesOfTdd">three rules of TDD</a> it's easy to see that no application logic will come into existence unless some test first covers it. So if somebody just follows these rules with discipline, code coverage is guaranteed. That result is independent of the skills of the developer*.</p>
<p style="font-size: 70%;">* Although it will be hard for a very unskilled and undisciplined developer to follow the rules, but that's beside the point. ;)</p>
<h2>Direct: Amplifies the pain caused by bad code</h2>
<p>In TDD, after writing a failing test, the next step is to make it pass with the simplest possible change - the code does not need to be elegant, because it's meant to be refactored later. Generally there is very little thinking <em>before</em> writing the code (just a rough idea of where the project is heading); instead, most of the code design is meant to be done <em>after</em> writing the code. Also, to be able to test something, the code needs to be <a href="http://googletesting.blogspot.com/2008/08/by-miko-hevery-so-you-decided-to.html">testable</a> - in other words, to have low coupling and high cohesion.</p>
<p>The above leads to TDD requiring existing code to be changed continuously - it's like <a href="http://blog.orfjackal.net/2009/10/tdd-is-not-test-first-tdd-is-specify.html">being in maintenance mode</a> all the time. So if the code is not maintainable, it will be hard to change. Also, if the code is not testable, it will be hard to write tests for it. If the code were not written iteratively and no tests were written for it, the developer's life would be much easier. In other words, TDD amplifies the pain caused by bad code, because it's not possible to avoid changing the code, nor to avoid writing tests for it.</p>
<p>Amplifying pain might seem like a bad idea, but actually that's one of TDD's best assets. :) Keep on reading...</p>
<h2>Indirect: Enables changing the code without breaking it</h2>
<p>The direct effect of code coverage makes it possible to notice when something breaks. But that alone is not enough for changing the system safely. It requires skill to be able to modify the code in <a href="http://www.infoq.com/presentations/responsive-design">small, safe steps</a>. The developer needs the ability to do even big design changes by combining multiple small refactorings*, so that the tests pass after every refactoring. This requires skill and discipline. An unskilled or undisciplined developer would get stuck in <a href="http://c2.com/cgi/wiki?RefactoringHell">refactoring hell</a>, or would give up and abandon the test suite.</p>
<p style="font-size: 70%;">* Programming can be thought of as the process of solving a big problem by combining multiple small elementary pieces (such as conditionals, statements and libraries). In refactoring the elementary pieces are small transformations of the source code's structure which preserve its observed behaviour (rename, extract method, move field etc.). In this sense also mathematics requires similar thinking (for example prove a conjecture by combining theorems and axioms). This kind of problem solving requires first <i>creativity</i> and <i>intuition</i> to get an idea of the solution, and then <i>discipline</i> and <i>attention to detail</i> to implement the solution; two opposite personality traits.</p>
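<p>As a toy illustration (my own example, not from the original post; all the function names are made up), here is a "big" design change done as two small extract-function refactorings, where the observable behaviour can be checked to stay the same after each step:</p>

```go
package main

import "fmt"

// Before refactoring: one monolithic function.
func totalPriceV1(prices []int, vatPercent int) int {
	sum := 0
	for _, p := range prices {
		sum += p
	}
	return sum + sum*vatPercent/100
}

// After two behaviour-preserving refactorings
// (extract "subtotal", then extract "addVat"):
func subtotal(prices []int) int {
	sum := 0
	for _, p := range prices {
		sum += p
	}
	return sum
}

func addVat(amount, vatPercent int) int {
	return amount + amount*vatPercent/100
}

func totalPriceV2(prices []int, vatPercent int) int {
	return addVat(subtotal(prices), vatPercent)
}

func main() {
	prices := []int{100, 200, 50}
	// The tests are re-run after each extraction;
	// the observable behaviour must not change.
	fmt.Println(totalPriceV1(prices, 24) == totalPriceV2(prices, 24)) // true
}
```

<p>The point is that between the two versions there is never a moment when the tests fail; each elementary transformation keeps the code green.</p>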
<h2>Indirect: Improves the quality of the code</h2>
<p>The direct effect of amplified pain and the indirect effect of making safe changes together make it possible to improve the quality of the code. The important point is "listening to the tests". When something is painful while doing TDD, you should be sensitive enough to notice the pain and then react to it by fixing whatever is causing it.</p>
<p><a href="http://www.amazon.com/Growing-Object-Oriented-Software-Guided-Tests/dp/0321503627">Growing Object-Oriented Software, Guided by Tests</a> says on page 245 under the subheading <cite>"What the Tests Will Tell Us (If We're Listening)"</cite>, commenting on somebody who was suffering from unreadable tests, test classes up to 1000 lines long, and refactorings leading to massive changes in test code:</p>
<blockquote>
Test-driven development can be unforgiving. Poor quality tests can slow development to a crawl, and poor internal quality of the system being tested will result in poor quality tests. By being alert to the internal quality feedback we get from writing tests, we can nip this problem in the bud, long before our unit tests approach 1000 lines of code, and end up with tests we can live with. Conversely, making an effort to write tests that are readable and flexible gives us more feedback about the internal quality of the code we are testing. We end up with tests that help, rather than hinder, continued development.
</blockquote>
<p>Also Michael Feathers says in <a href="http://www.infoq.com/interviews/feathers-freeman-design">an interview</a>:</p>
<blockquote>
It's something that people don't talk about enough and it seems like particularly in TDD, there is a really great thing that you notice that if something hurts when you are doing TDD, it often means that it's an indication of something wrong with the design. Since people are so drawn and they say "OK, this is kind of painful. It must mean that the TDD sucks." In fact, there is a way of going and getting feedback about a thing you are really working on, if you pay attention to the pain.
</blockquote>
<p>Noticing the pain as soon as possible and then fixing the problem - whether it is a rigid design, fragile tests or something else - requires skill. Not everybody is alert to the pain, but instead they keep on writing bad code until making changes becomes too expensive and a rewrite is needed. Not everybody fixes the problem when they feel the pain, but instead they implement a quick hack and leave an even bigger mess for the next developer. But for those who have the necessary skills and discipline, TDD can be a powerful tool and they can use it to write better code.</p>Esko Luontolahttp://www.blogger.com/profile/03956946511109435404noreply@blogger.com5tag:blogger.com,1999:blog-243186646627646561.post-25600985196254445192010-02-13T01:21:00.042+02:002015-11-01T21:16:38.895+02:00Three Styles of Naming Tests<p>I have now used <a href="http://en.wikipedia.org/wiki/Test-driven_development">TDD</a> for about 3 years, during which time I've come to notice three different styles of naming and organizing tests. In this article I'll explain the differences between what I call <i>specification-style</i>, <i>example-style</i> and <i>implementation-style</i>.</p>
<h2><a name="specification"></a>Tests as a specification of the system's behaviour</h2>
<p><i>Specification-style</i> originates from <a href="http://dannorth.net/introducing-bdd/">Behaviour-Driven Development</a> (BDD) and it's the style that I use 90% of the time. It can be found among practitioners of BDD (for some definition of BDD), so it could also be called "BDD-style". However, just using a BDD framework does not mean that you write your tests in this style. For example the examples on <a href="http://cukes.info/">Cucumber</a>'s front page (no pun intended) are more in example-style than in specification-style. (By the way, I don't buy into the customer-facing requirements-analysis side of BDD, because in my opinion <a href="http://www.infoq.com/interviews/Interaction-Design-Alan-Cooper">interaction design</a> is much better suited for it.)</p>
<p>In specification-style the tests are considered to be a <a href="http://blog.orfjackal.net/2009/10/tdd-is-not-test-first-tdd-is-specify.html">specification</a> of the system's behaviour. The test names should be <a href="http://dannorth.net/introducing-bdd">sentences</a> which describe what the system should do - what the system's features are. Just by reading the names of the tests, it should be obvious what the system does, even to such an extent that somebody could implement a similar system just by looking at the test names, without reading their bodies.</p>
<p>When a test fails, there are three options: (1) the implementation is broken and should be fixed, (2) the test is broken and should be fixed, (3) the test is no longer needed and should be removed. If the test has been written in specification-style, then knowing what to do is simple. Just read the name of the test and decide whether that piece of behaviour is still needed. If it is, keep the same test name but change the implementation or the test code. If it is not, for example if the specified behaviour conflicts with some new desired behaviour, then you can remove the test and double-check all the other tests in the same file, in case some of them should also be updated.</p>
<p>Here are some examples of specification-style tests (using <a href="http://golang.org/">Go</a> and <a href="http://github.com/orfjackal/gospec">GoSpec</a>). A test for Fibonacci numbers could look like this:
<pre>
func FibSpec(c gospec.Context) {
fib := NewFib().Sequence(10)
c.Specify("The first two Fibonacci numbers are 0 and 1", func() {
c.Expect(fib[0], Equals, 0)
c.Expect(fib[1], Equals, 1)
})
c.Specify("Each remaining number is the sum of the previous two", func() {
for i := 2; i < len(fib); i++ {
c.Expect(fib[i], Equals, fib[i-1] + fib[i-2])
}
})
}
</pre>
<p>If you look at the <a href="http://en.wikipedia.org/wiki/Fibonacci_number">Wikipedia entry for Fibonacci numbers</a>, you will notice that the above test names are directly taken from there. This is how Wikipedia defines the Fibonacci numbers: <cite>"By definition, the first two Fibonacci numbers are 0 and 1, and each remaining number is the sum of the previous two. Some sources omit the initial 0, instead beginning the sequence with two 1s."</cite> The test names should document the same specification.</p>
<h3>Each test focuses on a single piece of behaviour</h3>
<p>One more example, in the same language and framework, this time about stacks (ignore the comments for now). This is the style in which I typically organize my tests:</p>
<pre>
func StackSpec(c gospec.Context) {
stack := NewStack()
c.Specify("An empty stack", func() { <span style="color: Gray">// Given</span>
c.Specify("is empty", func() { <span style="color: Gray">// Then</span>
c.Expect(stack.Empty(), IsTrue)
})
c.Specify("After a push, the stack is no longer empty", func() { <span style="color: Gray">// When, Then</span>
stack.Push("foo")
c.Expect(stack.Empty(), IsFalse)
})
})
c.Specify("When objects have been pushed onto a stack", func() { <span style="color: Gray">// Given, (When)</span>
stack.Push("one")
stack.Push("two")
c.Specify("the object pushed last is popped first", func() { <span style="color: Gray">// (When), Then</span>
x := stack.Pop()
c.Expect(x, Equals, "two")
})
c.Specify("the object pushed first is popped last", func() { <span style="color: Gray">// (When), Then</span>
stack.Pop()
x := stack.Pop()
c.Expect(x, Equals, "one")
})
c.Specify("After popping all objects, the stack is empty", func() { <span style="color: Gray">// When, Then</span>
stack.Pop()
stack.Pop()
c.Expect(stack.Empty(), IsTrue)
})
})
}
</pre>
<p>(Note that GoSpec isolates the child specs from their siblings, so that they can safely mutate common variables. This was one of the <a href="http://github.com/orfjackal/gospec#readme">design principles</a> for GoSpec, and it enables the framework to be used the way I prefer to write specification-style tests. The other important ones are: allow unlimited nesting of tests, and do not force the test names to begin or end with some predefined word.)</p>
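<p>For reference, a minimal stack satisfying the above spec might look like this (a sketch of mine, not from the original post; only the <code>NewStack</code>, <code>Empty</code>, <code>Push</code> and <code>Pop</code> names come from the test):</p>

```go
package main

import "fmt"

// Stack is a minimal LIFO stack of strings.
type Stack struct {
	items []string
}

func NewStack() *Stack { return &Stack{} }

// Empty reports whether the stack holds no objects.
func (s *Stack) Empty() bool { return len(s.items) == 0 }

// Push puts x on top of the stack.
func (s *Stack) Push(x string) { s.items = append(s.items, x) }

// Pop removes and returns the object pushed last.
func (s *Stack) Pop() string {
	last := len(s.items) - 1
	x := s.items[last]
	s.items = s.items[:last]
	return x
}

func main() {
	stack := NewStack()
	stack.Push("one")
	stack.Push("two")
	fmt.Println(stack.Pop())   // two
	fmt.Println(stack.Pop())   // one
	fmt.Println(stack.Empty()) // true
}
```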
<p>Each test has typically three parts: <a href="http://agileinaflash.blogspot.com/2009/03/arrange-act-assert.html">Arrange, Act, Assert</a>. In BDD vocabulary they are often identified by the words Given, When, Then.</p>
<p>I've found it useful to arrange the tests so that the Arrange and Act parts are in the parent fixture, and then have multiple Asserts, each in its own test. Organizing the tests like this follows the spirit of the <a href="http://www.artima.com/weblogs/viewpost.jsp?thread=35578">One Assertion Per Test</a> principle (more precisely, one concept per test). When each test tests only one behaviour, the reason for a test failure is obvious. When a test fails, you will know exactly what is wrong (it <a href="http://agileinaflash.blogspot.com/2009/02/first.html">isolates</a> the reason for failure) and you will know whether the behaviour specified by the test is still needed, or whether it is obsolete and the test should be removed.</p>
<p>Quite often I use the words Given, When and Then in the test names, because they are part of BDD's ubiquitous language. But I always put more emphasis on making the tests readable and choosing the best possible words. So when it is obvious from the sentence, I may choose to
<ul>
<li>omit the Given/When/Then keywords,</li>
<li>group the Given and When parts together,</li>
<li>group the When and Then parts together, or even</li>
<li>group all three parts together.</li>
</ul>
</p>
<p>In the above stack example, I have marked with comments which of the specs is technically a Given, When or Then. As you can see, there is a distinct structure, but also much flexibility. The "should" word I dropped a long time ago, after my second TDD project, because it just added noise to the test names without adding value. <em>The value is in focusing on the behaviour, not in using some <a href="http://blog.orfjackal.net/2010/05/choice-of-words-in-testing-frameworks.html">predefined words</a>.</em></p>
<h3>The specification should be decoupled from the implementation</h3>
<p>Specification-style tests focus on the desired behaviour, or feature, at the problem domain's level of abstraction, and try to be as decoupled from the implementation as possible. The tests should not contain any implementation details (for example method names and parameters), because those implementation details are what will be designed <em>after</em> the test's name has been written. If the test's name already fixes the use of some implementation details (for example whether a method accepts null parameters), then refactoring the code will be harder, because it will force us to update the tests. Coupling tests to the implementation leads to implementation-style tests.</p>
<p>When the tests focus on the desired behaviour, refactoring won't require changing the name of the test, only its body (when the refactoring affects the implementation's public interface). If you're doing a rewrite, you may even be able to reuse the old test names - which is helpful, because thinking of the test name is usually what takes the most time in writing a test, since that is when you think about what the system should do. (If choosing the name does not take the most time, then you're not thinking about it enough, or you're writing too complex test code, which is a test smell indicating that the production code is too complex.)</p>
<p>For example, have a look at <a href="http://github.com/orfjackal/dimdwarf/blob/42440f7d365053d9ee5f8d9e5ae4ad24bfdf66de/dimdwarf-core/src/test/java/net/orfjackal/dimdwarf/db/inmemory/SequentialDatabaseAccessSpec.java">SequentialDatabaseAccessSpec</a> and <a href="http://github.com/orfjackal/dimdwarf/blob/42440f7d365053d9ee5f8d9e5ae4ad24bfdf66de/dimdwarf-core/src/test/java/net/orfjackal/dimdwarf/db/inmemory/ConcurrentDatabaseAccessSpec.java">ConcurrentDatabaseAccessSpec</a>. I wrote these tests 1½ years ago, and in the near future the subsystem that they specify will be rewritten: <a href="http://blog.orfjackal.net/2009/06/new-architecture-for-dimdwarf.html">the application's architecture</a> will change from shared state to message passing, and the programming language will change mostly from Java to Scala. Here are the names of those tests:</p>
<pre>
SequentialDatabaseAccessSpec:
When database connection is opened
- the connection is open
- only one connection exists per transaction
- connection can not be used after prepare
- connection can not be used after commit
- connection can not be used after rollback
- connection can not be used after prepare and rollback
When entry does not exist
- it does not exist
- it has an empty value
When entry is created
- the entry exists
- its value can be read
When entry is updated
- its latest value can be read
When entry is deleted
- it does not exist anymore
- it has an empty value
ConcurrentDatabaseAccessSpec:
When entry is created in a transaction
- other transactions can not see it
- after commit new transactions can see it
- after commit old transactions still can not see it
- on rollback the modifications are discarded
- on prepare and rollback the locks are released
When entry is updated in a transaction
- other transactions can not see it
- after commit new transactions can see it
- after commit old transactions still can not see it
- on rollback the modifications are discarded
- on prepare and rollback the locks are released
When entry is deleted in a transaction
- other transactions can not see it
- after commit new transactions can see it
- after commit old transactions still can not see it
- on rollback the modifications are discarded
- on prepare and rollback the locks are released
If two transactions create an entry with the same key
- only the first to prepare will succeed
- only the first to prepare and commit will succeed
If two transactions update an entry with the same key
- only the first to prepare will succeed
- only the first to prepare and commit will succeed
- the key may be updated in a later transaction
If two transactions delete an entry with the same key
- only the first to prepare will succeed
- only the first to prepare and commit will succeed
</pre>
<p>When the above components are rewritten using a new architecture, a new language and a different programming paradigm, most of those test names will stay the same, because they are based on the problem domain of transactional database access, and not on any implementation details such as the architecture, programming language, or *gasp* individual classes and methods.</p>
<p>In the above tests there will be only minor changes:
<ul>
<li>The first fixture of SequentialDatabaseAccessSpec may be removed, or moved to a different test, because in the new architecture opening a database connection will be quite different (implicit instead of explicit). Actually, it should have been put into its own test class, named DatabaseConnectionSpec, already when it was written, because its focus is quite different from the rest of SequentialDatabaseAccessSpec.</li>
<li>In ConcurrentDatabaseAccessSpec, the test saying "on prepare and rollback the locks are released" will be removed, because the new architecture will not need any locks. The use of locks is an implementation detail and these specs were not fully decoupled from it.</li>
</ul>
</p>
<h3>What is a "unit test"?</h3>
<p>The above example also raises the question of the size of a "unit" in a unit test. For me "a unit" is always "a behaviour". It is <em>never</em> "a class" or "a method" or a similar implementation detail. Although following the <a href="http://www.butunclebob.com/ArticleS.UncleBob.PrinciplesOfOod">Single Responsibility Principle</a> often leads to one class dealing with one behaviour, that is a side-effect of following SRP and not something that affects the way the tests are structured.</p>
<p>For those two test classes in the above example, the number of production classes being exercised by the tests is about 15 concrete classes (excluding JDK classes) from 2 subsystems (transactions and database). The lower-level components of those 15 classes have been tested individually (the transaction subsystem has its own tests, as do the couple of data structures which were used as <a href="http://www.infoq.com/presentations/responsive-design">stepping stones</a> for the in-memory database), because the higher-level tests will not cover the lower-level components thoroughly, and I anyway wrote those tests to drive the design of the lower-level components with TDD. So the new code produced by those two test classes is about 5 production classes (originally about 3 production classes, but they were split to follow SRP).</p>
<p>From TDD's point of view, it's very important to be able to run all tests <a href="http://agileinaflash.blogspot.com/2009/02/first.html">quickly</a>, in a couple of seconds (more than 10-20 seconds will make TDD painful). On my machine SequentialDatabaseAccessSpec takes about 15 ms to execute (1.2 ms/test) and ConcurrentDatabaseAccessSpec about 120 ms (5.5 ms/test). I prefer tests which execute in 1 ms or less. If it takes much longer, then I'll try to decouple the system so that I can test it in smaller parts using <a href="http://code.google.com/testing/TotT-2008-06-12.pdf">test doubles</a>. So to me the "unit" in a "unit test" is <i>one behaviour</i>, with the added restriction that its tests can be <i>executed quickly</i>.</p>
<h3>More on specification-style</h3>
<p>To learn more about how to write tests in specification-style, do the tutorial at <a href="http://github.com/orfjackal/tdd-tetris-tutorial">http://github.com/orfjackal/tdd-tetris-tutorial</a> and also have a look at its reference implementation and tests.</p>
<p><i>Update 2010-03-17:</i> I just started reading <a href="http://www.amazon.com/Growing-Object-Oriented-Software-Guided-Tests/dp/0321503627">Growing Object-Oriented Software, Guided by Tests</a> and I'm happy to notice that also the authors of that book prefer specification-style. In chapter 21 <cite>"Test Readability"</cite>, page 249, under the subheading <cite>"Test Names Describe Features"</cite>, they have the following tests names:</p>
<pre>
ListTests:
- holds items in the order they were added
- can hold multiple references to the same item
- throws an exception when removing an item it doesn't hold
</pre>
<p>If you notice more books which use specification-style, please leave a comment.</p>
<p><i>Update 2015-11-01:</i> Also the excellent presentation <a href="http://www.infoq.com/presentations/unit-testing-tips-tricks">What We Talk About When We Talk About Unit Testing</a> by Kevlin Henney recommends specification-style.</p>
<h2><a name="example"></a>Tests as examples of system usage</h2>
<p><i>Example-style</i> is perhaps the most popular among TDD'ers, maybe because many books, tutorials and proponents of TDD use this style. It's also quite easy to name tests in example-style, while it is still much better than implementation-style. I use this style maybe 10% of the time, usually to cover a corner case for which I find it too hard to come up with a specification-style name, or for which a specification-style name would be too verbose without added value as documentation. In some situations one style just fits better than the other.</p>
<p>In example-style the tests are considered to be examples of system usage, or scenarios of using the system. The test names tell what the scenario is, and you will need to read the body of the test to find out how the system will behave in that scenario. The test names are not a direct specification; to arrive at the specification, you need to read the tests and reverse-engineer and generalize the behaviour exercised in them.</p>
<p>A famous example which is written in example-style is Uncle Bob's <a href="http://butunclebob.com/ArticleS.UncleBob.TheBowlingGameKata">Bowling Game Kata</a>. There the test names are:</p>
<pre>
testGutterGame
testAllOnes
testOneSpare
testOneStrike
testPerfectGame
</pre>
<p>Now that you have read the test names, can you tell me the scoring rules of bowling? You can't? Exactly! <em>That</em> is what sets example-style apart from specification-style. In example-style you would need to <i>reverse-engineer</i> the scoring rules of bowling from the <i>test code</i>. In specification-style the <i>test names</i> would tell the scoring rules <i>directly</i>.</p>
<p>My domain knowledge about bowling is not good enough for me to write good specification-style tests for it, but it might look something like below. I took the scoring rules from the Bowling Game Kata's page 2 and reworded them somewhat.</p>
<pre>
The game has 10 frames
In each frame the player has 2 opportunities (rolls) to knock down 10 pins
When the player fails to knock down some pins
- the score is the number of pins knocked down
When the player knocks down all pins in two tries
- he gets spare bonus: the value of the next roll
When the player knocks down all pins on his first try
- he gets strike bonus: the value of the next two rolls
When the player does a spare or strike in the 10th frame
- he may roll extra balls to complete the frame
</pre>
<p>Here is another example of example-style, this time from JUnit's <a href="http://junit.cvs.sourceforge.net/viewvc/junit/junit/src/test/java/org/junit/tests/assertion/AssertionTest.java?revision=1.5&view=markup">AssertionTest</a> class.</p>
<pre>
arraysExpectedNullMessage
arraysActualNullMessage
arraysDifferentLengthMessage
arraysDifferAtElement0nullMessage
arraysDifferAtElement1nullMessage
arraysDifferAtElement0withMessage
arraysDifferAtElement1withMessage
multiDimensionalArraysAreEqual
multiDimensionalIntArraysAreEqual
oneDimensionalPrimitiveArraysAreEqual
oneDimensionalDoubleArraysAreNotEqual
...
</pre>
<p>This illustrates well a situation where example-style is useful: corner cases. In English it would be possible to describe the behaviour specified by AssertionTest in one sentence. Even though there are lots of corner cases, they are all semantically very similar. Writing these tests in specification-style would be impractically verbose. Here are the specification-style tests for a generic assertion:</p>
<pre>
When the expected and actual value are equal
- the assertion passes and does nothing
When the expected and actual value differ
- the assertion fails and throws an exception
- the exception has the actual value
- the exception has the expected value
- the exception has an optional user-defined message
</pre>
<p>Repeating that specification for every corner case is not practical, because it would just be more verbose without any added documentation value. That's why in this case it would make more sense to write one use case in specification-style and the rest of the use cases in example-style (this is how I did it with <a href="http://github.com/orfjackal/gospec/blob/f8e31e61f40ded7c6fb7c9850c6e2bfd8b09e88b/src/gospec/new_matchers_test.go">GoSpec's matchers</a>). Or since this particular problem domain is quite simple, just leave out the specifications and use only example-style.</p>
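<p>To make mixing the two styles concrete, here is a sketch with a home-grown <code>assertEq</code> - not JUnit's real implementation, and all names are my own invention. The general rule is documented once in specification-style, and the semantically similar corner cases get terse example-style names:</p>

```java
// A home-grown assertion used only for illustration. One test documents
// the general rule in specification-style; the corner cases get short
// example-style names whose behaviour you read from the test body.
public class AssertEqualsSpec {

    static void assertEq(Object expected, Object actual) {
        boolean equal = (expected == null) ? actual == null : expected.equals(actual);
        if (!equal) {
            throw new AssertionError("expected:<" + expected + "> but was:<" + actual + ">");
        }
    }

    // specification-style: tells the rule directly
    public static void failsWithAMessageContainingTheExpectedAndActualValues() {
        try {
            assertEq("foo", "bar");
        } catch (AssertionError e) {
            if (e.getMessage().contains("foo") && e.getMessage().contains("bar")) {
                return; // the specification is met
            }
            throw new AssertionError("message did not contain both values: " + e.getMessage());
        }
        throw new AssertionError("should have failed");
    }

    // example-style: corner cases, read the bodies for the expected behaviour
    public static void expectedNull() { mustFail(null, "x"); }
    public static void actualNull()   { mustFail("x", null); }
    public static void bothNull()     { assertEq(null, null); } // equal, passes

    private static void mustFail(Object expected, Object actual) {
        try {
            assertEq(expected, actual);
        } catch (AssertionError expectedFailure) {
            return;
        }
        throw new AssertionError("should have failed");
    }

    public static void main(String[] args) {
        failsWithAMessageContainingTheExpectedAndActualValues();
        expectedNull();
        actualNull();
        bothNull();
        System.out.println("ok");
    }
}
```

<p>The one specification-style test carries the documentation value; the example-style tests merely pin down the corner cases without repeating it.</p>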
<h2><a name="implementation"></a>Tests reflecting the implementation of the system</h2>
<p><i>Implementation-style</i> is typical in test-last codebases and with people new to TDD who are still thinking about the implementation before the test. I never use this style. It was only in my very first TDD project that I also wrote implementation-style tests (for example it had tests for setters), but at least I knew about BDD and was aware of my shortcomings and tried to aim for specification-style. (It took about one year and seven projects to fine-tune my style of writing tests, after which I wrote <a href="http://github.com/orfjackal/tdd-tetris-tutorial">tdd-tetris-tutorial</a>.)</p>
<p>In implementation-style the tests are considered to be verifying the implementation - i.e. the tests are considered to be just tests. There is a direct relation from the implementation classes and methods to the test cases. By reading the test names you will be able to guess what methods a class has.</p>
<p>Typically the test names start with the name of the method being tested. Since nearly always more than one test case is needed to cover a method, people tend to append the method parameters to the test name, or append a sequential number, or *gasp* put all test cases into one test method.</p>
<p>As an example of implementation-style, here are some of the test cases from Project Darkstar's <a href="https://sgs-server.dev.java.net/source/browse/sgs-server/trunk/sgs-server/src/test/java/com/sun/sgs/test/impl/service/data/TestDataServiceImpl.java?rev=6703&view=markup">TestDataServiceImpl</a> class. I know for sure that Darkstar has been written test-last, in addition to which most of its tests are <a href="http://www.infoq.com/presentations/integration-tests-scam">integration tests</a> (it takes 20-30 minutes to run them all, which makes it painful for me to make my changes with TDD).</p>
<pre>
testConstructorNullArgs
testConstructorNoAppName
testConstructorBadDebugCheckInterval
testConstructorNoDirectory
...
testGetName
testGetBindingNullArgs
testGetBindingEmptyName
testGetBindingNotFound
testGetBindingObjectNotFound
testGetBindingAborting
testGetBindingAborted
testGetBindingBeforeCompletion
testGetBindingPreparing
testGetBindingCommitting
testGetBindingCommitted
...
</pre>
<p>From the above test names it's possible to guess that there is a class DataServiceImpl which has a constructor which takes as parameters at least an app name, a debug check interval and some directory. It's not clear which are the valid values for them and whether null arguments are allowed or not. Also we can guess that the DataServiceImpl class has methods getName and getBinding, the latter of which probably takes a name as a parameter. With getBinding it's possible that "something is not found" or "an object is not found". The getBinding method's behaviour also appears to depend on the state of the current transaction. It's not clear how it should behave in any of those states.</p>
<p>Implementation-style is bad compared to example-style and specification-style, because implementation-style is not useful as documentation - it does not tell how the system should behave or how to use it - which in turn makes it hard to know what to do when a test fails. Also implementation-style couples the tests to the implementation, which makes it hard to refactor the code; if you rename a method, you need to also rename the tests. If you do big structural refactorings, you must rewrite the tests. And when you rewrite the tests, the old tests are of little benefit in knowing which new tests to write.</p>
<h2>Summary</h2>
<p><i>Specification-style</i> test names describe how the system will behave in different situations. By reading the test names it will be possible to implement the system. When a test fails, the test name will tell which behaviour is specified by that test, after which it's possible to decide whether that test is still needed. The test names use the problem domain's vocabulary and do not depend on implementation details.</p>
<p><i>Example-style</i> test names describe which special cases or scenarios the system should handle. You will need to read the body of the test to find out how the system should behave in those situations.</p>
<p><i>Implementation-style</i> test names tell what methods and classes the system has. It will be very hard to find out from the tests which situations the system should handle and how it should behave in those situations. Refactorings also require you to change the tests.</p>Esko Luontolahttp://www.blogger.com/profile/03956946511109435404noreply@blogger.com8tag:blogger.com,1999:blog-243186646627646561.post-35641991310723300672009-11-10T19:17:00.012+02:002010-01-12T03:25:25.663+02:00Text-Based Communication Considered Harmful<p>Discussing important matters is best done face-to-face or at least over the phone. Text-based communication methods such as email and text messages are limited in expression, prone to misunderstandings and often lacking delivery guarantees.</p>
<p>I was having a discussion with another person over Facebook's chat and there was some visible lag in the way that his messages were popping up on my screen. At the end of the conversation I had asked a question but did not receive an answer before he quit, so I was puzzled. It turned out that he had indeed written an answer and apparently it had shown up on his screen, but it had never reached the server and never appeared on my screen. Luckily it was possible to resolve the situation through text messages, but given the limitations of text messaging, not everything could be resolved (160 characters is not enough for everyone) and there was lots of room for misunderstandings (in his reply one word had two interpretations, although from the context it was possible to guess what it meant). A telephone call or a face-to-face discussion would have been needed.</p>
<p>Another case: There was an event where I should have been. One evening, a couple of days before the event, I received a text message from a friend asking whether I would be coming there. Knowing his habit of confirming things like this a couple of days beforehand, I replied "yes" without thinking anything special about it. But actually the event had been moved to the beginning of the week and I had not heard about it, and my friend was in reality asking whether I would be there in 15 minutes, because it was already starting. But his text message mentioned neither the time, nor that it was today, nor that the situation was urgent. So I was completely oblivious and missed the appointment that day. On the other hand, if he had made a phone call, the reality would have become apparent from his tone of voice and the background noises. Then I could at least have said right away that I wouldn't be able to make it, and they could have prepared a Plan B.</p>
<h3>What is wrong with text-based communication</h3>
<p>The biggest problem with text-based communication is that it does not convey the tone of voice nor facial expressions. This makes it hard for the reader to interpret the intentions and motives of the writer correctly. It also makes it easier to have <a href="http://xkcd.com/438/">arguments</a>, especially when combined with relative anonymity. I can't even count how many misunderstandings and arguments I've seen happening over the <a href="http://xkcd.com/386/">internet</a> during the last 10 years. It happens even to <a href="http://www.infoq.com/news/2009/02/spolsky-vs-uncle-bob">experienced people</a> all the time.</p>
<p>High latency makes it easier to have misunderstandings. When speaking face-to-face or over the phone, the latency is practically zero. But when writing a message, it will take many seconds, minutes, hours or even days before the writer receives a reply. This leads to people writing longer messages, so as to minimize the number of messages which need to be sent. But this has the negative effect of reducing the feedback per statement - the reply will give feedback about the message as a whole, but not about every statement, which in turn makes it possible for some of the statements to be misunderstood without anybody noticing.</p>
<p>And because many statements are made before receiving feedback about them, if there is a fundamental misunderstanding in the first statement of the message, the rest of the message only amplifies that misunderstanding, because the rest of the message relies on the same false assumptions. For example, if the writer criticizes the other person without justification, then a long message will amplify the critique and the other person will get offended. But if he spoke just one statement and then got a reply that the critique is unwarranted, it would be only a small displeasure which could be resolved quickly by apologizing, before the other person has time to get offended.</p>
<p>The overhead of writing hinders effective communication by reducing the amount of communication. When more effort is needed to communicate, people try to reduce the amount of communication and they will use fewer words. This is especially true for text messages, which are very hard to write using a phone's number keys. Also, if the cost of sending a message is non-zero, such as when sending a text message, people will try to send fewer messages, both by reducing the number of words they use and by squeezing more information into one message. But using fewer words has the negative effect of reducing the amount of detail in the communication, which in turn leads to communication which leaves out important things or makes the words subject to misinterpretation.</p>
<p>Still one more problem is unreliable delivery. Most of the communication mediums do not guarantee that the recipient of the message will receive and read it. Email messages can disappear into thin air, get caught in a spam filter, or take many days before arriving. If you're lucky, you will get an "Undelivered Mail Returned to Sender" message, but even that is not guaranteed. Text messages likewise can just disappear or take a long time to arrive. The email and text message infrastructures are fundamentally wrong, and there is no easy way to solve these problems with them.
<p>Apparently also Facebook's chat does not guarantee delivery and gives the user misleading visual feedback. I haven't read the code, but it might be that when somebody sends a message, it is immediately processed on the client-side and shown in the chat log, after which the message update is sent asynchronously to the server. The right way would be to send the message first to the server and show the message in the chat log only after the server notifies the client about a new message. (If Facebook already uses the latter approach, then they must have some buggy code, because otherwise the issue I mentioned above would not have happened.)
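<p>A minimal sketch of the acknowledge-first approach described above - all class and method names here are hypothetical, this is in no way Facebook's actual code - where the chat log is updated only from the server's confirmation, never optimistically on send:</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;

// Hypothetical chat client: a message is shown in the chat log only after
// the server acknowledges having stored it, so a lost message is visibly
// missing instead of misleadingly appearing delivered.
public class ChatClient {
    private final List<String> chatLog = new ArrayList<>();

    // 'network' stands in for the asynchronous send; it receives the message
    // and a callback which the server triggers once the message is stored.
    public void send(String message, BiConsumer<String, Runnable> network) {
        network.accept(message, () -> chatLog.add(message));
    }

    public List<String> chatLog() {
        return chatLog;
    }

    public static void main(String[] args) {
        ChatClient client = new ChatClient();
        client.send("hello", (msg, ack) -> ack.run());         // delivered, acked
        client.send("lost",  (msg, ack) -> { /* dropped */ }); // never acked
        // only the delivered message is shown, so the sender notices the loss
        System.out.println(client.chatLog()); // prints [hello]
    }
}
```

<p>With the optimistic approach both messages would appear in the sender's log, which is exactly the misleading feedback described above.</p>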
<h3>How spoken communication avoids many of these issues</h3>
<p>Just as it is said, <a href="http://agilemanifesto.org/principles.html">"the most efficient and effective method of conveying information [--] is face-to-face conversation."</a> Face-to-face communication is superior to <a href="http://www.agilemodeling.com/essays/communication.htm#Figure1">other forms of communication</a>. If face-to-face conversation is 90% effective, then phone calls would be maybe 50% effective, emails about 30% effective and text messages some 5% effective (statistics based on the <a href="http://en.wikipedia.org/wiki/Stetson-Harrison_method">Stetson-Harrison method</a>).
<p>When speaking face-to-face, it's possible to see from that person's facial expressions and tone of voice whether his intent was to give advice, to insult, to joke or something else. The words can be exactly the same, but the way they are said can completely change the meaning. But over the internet none of those cues exist, and people tend to misunderstand the writer's intent. Often the words come out very direct, even insulting. People are used to softening their words with their tone of voice and expressions, so in spoken communication the words don't come out that directly, but very few writers are good enough to express the same softness and feeling in their writing.</p>
<p>When one person says something that the other person does not understand, the speaker can notice it in face-to-face discussion just by looking at the other person's facial expressions, and then he can refine his words and offer further explanation, and the misunderstanding will be fixed before it even happens. The same error-correction mechanism works also when speaking over the phone: a short pause, an <a href="http://en.wikipedia.org/wiki/Interjection">interjection</a> or a <a href="http://en.wikipedia.org/wiki/Filled_pause">filled pause</a> (<i>uh, er, um</i>) can signify that the other person did not understand something. It's also common for a person to repeat what the other said, so confirming that they have understood each other. Because spoken communication has little overhead, people are inclined to talk things through until all apparent issues have been solved.
<p>In written communication these error-correction mechanisms don't exist. No facial expressions can be seen over the keyboard. It's not possible to detect half-second pauses in the other person's writing. People don't use filled pauses in their writing, because their use happens naturally without thinking, whereas written text always goes through <i>some</i> thinking (though rarely <i>enough</i> thinking). Because there is some overhead to writing, people are less inclined to ask clarifying questions and to repeat the other person's thoughts in their own words, to make sure that everything was understood correctly. Written communication removes such things as "unnecessary", which in turn leads to written communication being more error-prone.</p>
<h3>Conclusions</h3>
<p>The next time you need to communicate something important, first try to say it face-to-face, secondly make a phone call, thirdly write a long email message or letter (it's easier to explain yourself through many words), and only as a last resort use a text message or other space-limited text-based means of communication.</p>Esko Luontolahttp://www.blogger.com/profile/03956946511109435404noreply@blogger.com1tag:blogger.com,1999:blog-243186646627646561.post-10524032405442989702009-10-18T02:15:00.004+03:002011-01-17T23:59:55.455+02:00Tidy rewritten histories with Git<p>I imported some of my old projects from CVS to Git. I had the CVS repository of <a href="http://www.cs.helsinki.fi/group/squid/">an old student project</a> as a tarball. That one repository contained the sources of two programs - the main project and one small utility. I was able to import them into two separate Git repositories and also rewrite their version history so that it would seem as if the utility program had always been a separate project and been using Maven (neither of which was true).</p>
<p>Importing the CVS repository to Git did not succeed with <a href="http://www.kernel.org/pub/software/scm/git-core/docs/git-cvsimport.html">git cvsimport</a> (it failed with "fatal error - cmalloc would have returned NULL"), but <a href="http://cvs2svn.tigris.org/cvs2git.html">cvs2git</a> worked and it was also orders of magnitude faster. It was necessary to edit the example options file provided with cvs2git - the CVS repository path and author names had to be configured. If some of the authors have non-ascii characters in their names, it's best to save the options file in UTF-8 format and use the <code>u'Námè'</code> format for the author names. See <a href="http://cvs2svn.tigris.org/cvs2git.html">cvs2git's usage instructions</a> for details on how to do the conversion.</p>
<p>Now that I had a Git repository with the history of both of the programs, it was time to separate the utility program's version history with <a href="http://www.kernel.org/pub/software/scm/git/docs/git-filter-branch.html">git filter-branch</a> (the main project's history did not need to be modified). It's best to take a temporary clone of the original repository before messing with filter-branch. That way it's easier to revert all changes and try again by just deleting and recreating the temporary repository.</p>
<p>I made a clone of the repository and in that clone I used <code>--subdirectory-filter</code> to remove everything else except the source codes of the utility program:</p>
<pre>git filter-branch --subdirectory-filter src/hourparser -- --all</pre>
<p>Originally it did not use Maven, but I wanted to modify the history to look like it had always used Maven. So then I used <code>--tree-filter</code> to move all the source files to the right directory structure. I also removed the manifest file, because Maven will generate it automatically. When removing files, it's best to use <code>--prune-empty</code>, or you may have problems for example during rebasing (I learned it the hard way). Also make sure that the last command in the filter will always exit successfully with error code 0, or otherwise the whole filtering process will fail.</p>
<pre>git filter-branch --prune-empty --tree-filter '
mkdir -p src/main/java/hourparser
mv *.java src/main/java/hourparser
rm -rf META-INF
' -- --all</pre>
<p>After that was done, I had to insert the pom.xml and other Maven files to the version history. That I was able to do by making multiple commits with the initial project files and all the version number incrementing changes to them (the version number in pom.xml needs to be changed when a release is made) so that those commits were last in the history. Then I used <a href="http://www.kernel.org/pub/software/scm/git/docs/git-rebase.html">git rebase</a> to reorder the commits, so that the changes to pom.xml would be in the right places in the history. Changing the initial commit was more complicated, but I was able to do it by creating a new repository with that initial commit, and then rebasing the rest of the history from the other repository on top of it.</p>
<p>After this I had the right commits in place, but their dates were not consistent. The commits for the Maven files were dated in 2009, but everything else was dated 2005. That I was able to fix by exporting the repository into patches, editing the authors and author dates in the patches with a text editor, and finally importing the patches into a blank repository. Temporary patches are a powerful tool in editing the history.</p>
<pre>git format-patch -M -C -k --root master
<i>[edit the patches and move them to a new directory]</i>
git init
git am -k 00*</pre>
<p>After all this the authors and author dates were fine, but the committer and commit date information still needed fixing. I was able to change the committers to be the same as the author with the following command:</p>
<pre>git filter-branch -f --env-filter '
export GIT_COMMITTER_NAME="$GIT_AUTHOR_NAME"
export GIT_COMMITTER_EMAIL="$GIT_AUTHOR_EMAIL"
export GIT_COMMITTER_DATE="$GIT_AUTHOR_DATE"
' -- --all</pre>
<p>After this I could publish Git repositories of <a href="http://github.com/orfjackal/ikayaki">the main project</a> and <a href="http://github.com/orfjackal/hourparser">the utility project</a> with nice clean histories.</p>Esko Luontolahttp://www.blogger.com/profile/03956946511109435404noreply@blogger.com0tag:blogger.com,1999:blog-243186646627646561.post-66228773952790578862009-10-10T23:36:00.019+03:002011-08-09T11:02:06.462+03:00TDD is not test-first. TDD is specify-first and test-last.<p>Recently there has been <a href="http://xkcd.com/386/">some discussion</a> about TDD at the <a href="http://blog.objectmentor.com/articles/2009/10/08/tdd-triage">Object Mentor blog</a>. In one of my comments I brought forth the idea in this article's title. It was such a nice oxymoron that I decided to elaborate here on what I mean by saying that "TDD is not test-first".</p>
<h2>The TDD Process</h2>
<p>Because <a href="http://butunclebob.com/ArticleS.UncleBob.TheThreeRulesOfTdd">Test-Driven Development</a> has the word "test" in its name, and the people doing TDD speak about "writing tests", there is much confusion about TDD, because frankly, the big benefits of TDD have <a href="http://behaviour-driven.org/Introduction">very little to do with testing</a>. That's what brought about <a href="http://dannorth.net/introducing-bdd">Behaviour-Driven Development</a> (BDD) which is the same as TDD done right, but without the word "test". Because BDD does not talk about testing, it helps many to focus on the things that TDD is really about.</p>
<p>Here is a diagram of how I have come to think about the TDD process:</p>
<div class="separator" style="clear: both; text-align: center;">
<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgF7UERqgwllGLsUWETTG1NHPW4y1MSlIByy0Vzx1_HB2rhBct5YBtlP6H5CAHDQwCcx_5i_0yKafVAWxjqPC8U4-iJK6MwqV4GHhsx4t9JdSPDhyss0OY8WZv4eyUUmkATiBVG_kjU310/s600/TDD+cycle+in+traditional+terms+v2.png" /></div>
<p>When you look at that diagram, it probably seems quite similar to traditional software development methods, even quite waterfallish. Let's remind ourselves what <a href="http://en.wikipedia.org/wiki/Waterfall_model">a waterfall</a> looks like:</p>
<div class="separator" style="clear: both; text-align: center;">
<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEip8OmIb0jVtScXuHjUJrPJPIx-Npo5SNOq52cl7ZWVjrgezjM_j6UVXs5AfJ0JbvpOywFZq0MNNh8QjzV4Tn-Ejqpm0bsTU2CHzCofAyKDycGUYO9fFJJKSICkWXwYCrsMySR3r1J06kc/s400/800px-Waterfall_model.svg.png" /></div>
<p>The waterfall model is "Specify - Design - Implement - Verify - Maintenance". The TDD process is otherwise the same, except that it loops very quickly (one cycle usually takes a couple of minutes), it has a new "Cleanup" step, <i>all</i> of it is considered "Design", and <i>all</i> of it is also considered "Maintenance".</p>
<h3>Step 1: Specify</h3>
<p>The first step in TDD is to write <s>a test</s> <i>a specification of the desired behaviour</i>. Here the developer thinks about <b>what</b> the system should do, before thinking about <b>how</b> it should be implemented. The developer focuses on just one thing at a time - separate the <i>what</i> from the <i>how</i>.</p>
<p>When the developer has decided what is the next important behaviour that the system does not yet do, he will document the specification of that behaviour. The specifications are documented in a very formal language (i.e. a programming language), so formal that they can be executed and verified automatically (not to be confused with formal verification).</p>
<p>Writing this executable specification will save lots of time, because the developer does not need to do the verification manually. It will also communicate the original developer's intent to other developers, because anybody can have a look at the specification and see what the original developer had in mind when he wrote some code. It will even help the original developer, when he returns to code that he wrote a couple of weeks ago, to remember what he was thinking at the time of writing it. And best of all, anybody can verify the specifications at any moment, so any change that breaks the system will be noticed early.</p>
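<p>As a minimal sketch of what such an executable specification can look like - the names are invented for this example, not taken from any real project, and a test framework would normally run the spec methods automatically - the spec methods are conceptually written before the little class which makes them pass:</p>

```java
// A minimal executable specification and the implementation which fulfils it.
public class CounterSpec {

    // the specification: states what the system should do
    public static void startsFromZero() {
        require(new Counter().value() == 0, "a new counter starts from zero");
    }

    public static void incrementsByOne() {
        Counter counter = new Counter();
        counter.increment();
        require(counter.value() == 1, "incrementing adds one");
    }

    // the implementation: the simplest code which meets the specification
    static class Counter {
        private int value = 0;
        void increment() { value++; }
        int value() { return value; }
    }

    private static void require(boolean condition, String behaviour) {
        if (!condition) throw new AssertionError("specification not met: " + behaviour);
    }

    public static void main(String[] args) {
        startsFromZero();
        incrementsByOne();
        System.out.println("specification met");
    }
}
```

<p>Running the spec methods is the push-button verification referred to in step 3: if any behaviour is broken, the failing method names the behaviour directly.</p>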
<h3>Step 2: Implement</h3>
<p>After the specification has been written, it's time to think about <i>how</i> to implement it, and then just implement it. The developer will focus on passing just one tiny specification at a time. This is <i>the easiest step</i> in the whole TDD process.</p>
<p>If this step isn't easy, then the developer tried to take too big a step and specified too much new behaviour. In that case he should go back and write a smaller specification. With experience, the developer will learn what size of steps is neither too big (which would make the step hard) nor too small (which would make progress slow).</p>
<p>If this step isn't easy, it could also be that the code that needs to be changed is not maintainable enough for this change. In that case the developer should first clean up and reorganize the code, so that making the change will be easy. If the code is already very clean, then only a little reorganizing is needed. If the code is dirty, then it will take more time. Little by little, as the code is being changed, the codebase will get cleaner and stay clean, because otherwise the TDD process will soon grind to a halt.</p>
<h3>Step 3: Verify</h3>
<p>Now the developer has implemented a couple of lines of code, which he believes will match the specification. Then he needs to verify that the code fulfills its specification. Thanks to the executable specifications, he can just click a button and after a couple of seconds his IDE will report whether the specification has been met.</p>
<p>This step is so quick and easy that it totally changes the way that code can be written. It will make the developers <i>fearless</i> in making changes to code that they do not know, because they can trust that if they break something, they will find it out in a couple of seconds. So whenever they see some bad code, they can clean it up right away, without fear of breaking something. This difference is so overwhelming that it even led Michael Feathers (in his book "Working Effectively with Legacy Code") to define "legacy code" as code without such executable specifications.</p>
<h3>Step 4: Cleanup</h3>
<p>When the code meets all its specifications, it's time to clean up the code. As Uncle Bob says, <a href="http://programmer.97things.oreilly.com/wiki/index.php/Speed_Kills">"the only way to go fast is to go well"</a>. We need to keep the code at top shape, so that making future changes will be easier. We can do this by following <a href="http://www.informit.com/articles/article.aspx?p=1235624&seqNum=6">the boy scout rule</a>: <i>Always check-in code cleaner than when you checked it out.</i></p>
<p>So when the developer has written some code that works, he will spend a few seconds or minutes in removing duplicated code, choosing <a href="http://agileinaflash.blogspot.com/2009/02/meaningful-names.html">more descriptive names</a>, dividing big methods into <a href="http://www.infoq.com/presentations/10-Ways-to-Better-Code-Neal-Ford">many smaller methods</a> and so on. Every now and then the developer will notice new structures emerging from the code, so he adjusts his original plans about the design and extracts a new class or reorganizes some existing classes.</p>
<h3>Steps 1-4: Design</h3>
<p>The specification, implementation and cleanup steps all include designing the code, although in each step the focus in designing slightly different aspects of the code. As Kent Beck says in his book "Extreme Programming Explained" (2nd Ed. page 105), "far from <i>design nothing</i>, the XP strategy is <i>design always</i>."</p>
<p>In the specification step, the developer is first designing the behaviour of the system, what the system should do. When he is writing the specification, he is designing how the API of the code being implemented will be used.</p>
<p>In the implementation step, the developer is designing the structure of the code, how the code should be structured so that it will do what it should do. In this step the amount of design is quite low, because the goal is to just make the simplest possible change that will achieve the desired behaviour. It is acceptable to write dirty code just to meet the specification, because the code will be cleaned immediately after writing it.</p>
<p>In the cleanup step, the developer is designing what is <i>the right way</i> to structure the code, how to make the code cleaner and more maintainable. This is where the majority of the design takes place, which also makes the cleanup step <i>the hardest step</i> in the whole TDD process. Thanks to the automatic verification of the specifications, it is possible to evolve the design and architecture of the system in small, <a href="http://www.infoq.com/presentations/responsive-design">safe steps</a>. When improving the design of the system, the system will be working at all times, so it is possible to do even big changes incrementally, without <a href="http://www.informit.com/articles/article.aspx?p=1235624">a grand redesign</a>.</p>
<h3>Steps 1-4: Maintenance</h3>
<p>When using TDD, we are at all times in maintenance mode, because we are all the time changing existing code. Only the first cycle, the first couple of minutes, is purely greenfield code.</p>
<p>This continuous maintenance forces the system to be maintainable, because if it were not maintainable, the TDD process would grind to a halt very soon. On the other hand, waterfall does not force the system to be maintainable, because the maintenance mode comes only after everything else has been done, which means that with waterfall it's possible to write unmaintainable code.</p>
<p>Maybe this is one of the reasons why TDD produces better, more maintainable code. If some piece of code is not maintainable, it will become apparent very quickly, even before that piece of code has been completed. This early feedback in turn will drive the developer to change the code to be more maintainable, because he can feel the pain of changing unmaintainable code.</p>
<hr>
<p><i>Updated 2009-10-15:</i></p>
<p>Somebody posted this at Reddit and <a href="http://www.reddit.com/r/programming/comments/9t69s/tdd_is_not_testfirst_tdd_is_specifyfirst_and/">in the comments</a> there appears to be some confusion about the kinds of specs that I'm referring to in this article and which are useful in TDD. To find out in what style my specs are written, have a look at <a href="http://github.com/orfjackal/tdd-tetris-tutorial">the TDD tutorial</a> which I have created. To see TDD in action in a non-trivial application, have a look at <a href="http://dimdwarf.sourceforge.net/">my current project</a>.</p>
<p>And of course the executable specs are not the only kind of specification that a real-life project needs. Just as I said above, they are "<i>a</i> specification of the desired behaviour", not <i>the only</i> specification. TDD specs are written at the level of individual components, which makes them useful for driving the design of the code in the components. They are the <i>lowest level</i> specifications that a system has. But before diving into the code, the project should first have high-level requirements and specifications describing from a user's point of view what the system should do. A high-level architectural description is also useful.</p>
<p>I'm also into user interface design, so whenever the system being built will be used by human users, the first thing I'll do in such a project is to gather the goals and tasks of the users, based on which I will design a user interface specification in the form of a paper prototype. But that would be the topic for a whole other article...</p>

<h2>New Architecture for Dimdwarf</h2>

<p>In a previous post I gave <a href="http://blog.orfjackal.net/2009/05/introduction-to-dimdwarf.html">an introduction to Dimdwarf</a> - how the project got started and what its goals are. In this post I will explain the planned architecture for Dimdwarf, which should be scalable enough to evolve the system into a distributed application server.</p>
<h3>Background</h3>
<p>In January 2009 I got tipped by dtrott at GitHub about a white paper called <a href="http://www.vldb.org/conf/2007/papers/industrial/p1150-stonebraker.pdf">The End of an Architectural Era</a>. It discusses how traditional RDBMS are outdated and that it's time to create database systems which are designed for current needs and hardware. In the paper they describe how they built a distributed high-availability in-memory DBMS which beats a commercial RDBMS in the TPC-C benchmark by almost two orders of magnitude. It keeps all the data in main memory (today's servers have lots of it), thus avoiding the greatest bottleneck in RDBMSs - writing transaction logs on hard disk (today's HDDs are not significantly faster than in the past). It uses a single-threaded execution model, which makes the implementation simpler and avoids the need for locking, yielding a more reliable system with better performance. Failover is achieved by replicating the data on multiple servers. Scalability is achieved by partitioning the data on multiple servers.</p>
<p>I thought that some of these ideas could be used in Darkstar, so I posted <a href="http://www.projectdarkstar.com/forum/?topic=773.0">a thread about it</a> on Darkstar forums. After thinking about it for a day, I came up with a proposal for how to apply the ideas to Darkstar's multi-node database. And after still a couple more days, the architecture appeared to be so simple that I added the making of a multi-node version to Dimdwarf's roadmap. Using ideas from that paper, it should be relatively simple to implement a distributed application server.</p>
<h3>Issues with the current Dimdwarf architecture</h3>
<p>Currently Dimdwarf uses locking-based concurrency in its implementation. For example its database and task scheduler contain shared mutable data and use locking for synchronization. As a result, their code is at times quite complex, especially in the database which needs to keep track of consistent views to the data for all active transactions. Also committing the transactions (two-phase commit protocol is used) requires some careful coordination and locking.</p>
<p>There have been some concurrency bugs in the system (one way to find them is to start 20-50 test runs in parallel to force more thread context switches), both in the database <a href="http://github.com/orfjackal/dimdwarf/commit/c45b605fd59267462cde12cf4e63a31812c87158">[1]</a><a href="http://github.com/orfjackal/dimdwarf/commit/7c55e97c89dd7185c98f5ee2f27ec0133c756674">[2]</a> and the task scheduler <a href="http://github.com/orfjackal/dimdwarf/commit/df20d1f56f3cf9203c44ea003bc0f54ebf06f913">[3]</a>. While all found concurrency bugs have been fixed, their existence in the first place is a <a href="http://martinfowler.com/bliki/CodeSmell.html">code smell</a> indicating that the system is too complex and needs to be simplified. As it is said in <cite>The Art of Agile Development</cite>, in the <a href="http://jamesshore.com/Agile-Book/no_bugs.html">No Bugs</a> chapter, one must "eliminate bug breeding grounds" and solve the underlying cause:</p>
<blockquote>
Don't congratulate yourself yet—you've fixed the problem, but you haven't solved the underlying cause. Why did that bug occur? Discuss the code with your pairing partner. Is there a design flaw that made this bug possible? Can you change an API to make such bugs more obvious? Is there some way to refactor the code that would make this kind of bug less likely? Improve your design.
</blockquote>
<p>Some of the tests for the concurrent code are long and complex, which in turn is a test smell indicating that the system is too complex. Lots of effort had to be put into making the tests repeatable <a href="http://github.com/orfjackal/dimdwarf/commit/a01e06166d49995fd703206a5d023cf370705ed2">[4]</a><a href="http://github.com/orfjackal/dimdwarf/commit/589ef34ad224b0ef2e1c71ee650759f43ea07348">[5]</a><a href="http://github.com/orfjackal/dimdwarf/commit/e6ca71c3455b124b061feebe8a4eaf126f6b5bc6">[6]</a><a href="http://github.com/orfjackal/dimdwarf/commit/df20d1f56f3cf9203c44ea003bc0f54ebf06f913">[7]</a><a href="http://github.com/orfjackal/dimdwarf/commit/8f0e21b46718f0ab54d042a44d915911c73bd221">[8]</a><a href="http://github.com/orfjackal/dimdwarf/commit/e2c1af2ee61d41a1b1614a82bdbe4900661d28c1">[9]</a><a href="http://github.com/orfjackal/dimdwarf/commit/7ef69c7927ade13147b1a1ea63add6cb406dabf9">[10]</a>, for example using CountDownLatch instances to force concurrent threads to proceed in a predictable order. Some of the tests even need <a href="http://butunclebob.com/ArticleS.TimOttinger.ApologizeIncode">comments</a>, because the test code is so complex and non-obvious.</p>
<p>All of this indicates that something is wrong with the current architecture. Even though Dimdwarf applications have a simple single-threaded programming model, the Dimdwarf server itself is far from being simple. Of course, the problem being solved by Dimdwarf is complex, but that does not mean that the solution also needs to be complex. It's just a matter of <a href="http://www.win.tue.nl/~wstomv/quotes/software-requirements-specifications.html#Brilliance">skill</a> to create a simple solution to a complex problem.</p>
<div class="separator" style="clear: both; text-align: center;">
<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiZ18WMEFMhDybraQqrkNPwhXNxEU1RfR_30bkaeTTKHpyO_9WEZoQbrIkFjA2Esfu7ktplP4UfWee4pxh-dt0OBYZHj-ZVY2oTD5Z00q4fPwydRN6huN9oQ5GCoYeyk7xyLXdGI6nY8dk/s600/What+Is+Skill.png" /></div>
<h3>Ideas for the new architecture</h3>
<p>The paper <a href="http://www.vldb.org/conf/2007/papers/industrial/p1150-stonebraker.pdf">The End of an Architectural Era</a> gave me lots of ideas on how to simplify Dimdwarf's implementation. The database that was described in the paper, <i>H-Store</i>, is in many ways similar to Dimdwarf and Darkstar. For example all its transactions are local, so as to avoid expensive two-phase commits over the network, and it executes the application logic inside the database itself. But H-Store has also some new ideas that could be applied to Dimdwarf, the main points being:</p>
<ul>
<li>The system is single-threaded, which makes its implementation simpler and avoids the need for locking.</li>
<li>All data is stored in memory, which avoids slow disk I/O. High-availability is achieved through replication on multiple servers.</li>
</ul>
<h4>Single-threadedness</h4>
<p>Each H-Store server node is single-threaded, and to take advantage of multiple CPU cores, many server nodes need to be run on the same hardware. This results in good performance and a simpler implementation, because it is possible to use simple non-thread-safe data structures and no locking. I liked the idea and thought about how to apply it to Dimdwarf.</p>
<p>I considered having only one thread per Dimdwarf server node, but it would not work because of one major difference between Dimdwarf and H-Store: data partitioning. In H-Store the data is partitioned over server nodes so that each transaction has all the data that it needs on one server node. Dimdwarf also has data partitioning and strives to make the data locally available, but in Dimdwarf the data will move around the cluster as the players of an MMO game move in the game, so the data partitioning needs to be changed all the time. In H-Store the data access patterns are stable, but in Dimdwarf they fluctuate.</p>
<p>What does data partitioning have to do with the server being single-threaded? When a transaction tries to read data that is not available locally, it will need to request the data from another server node. While waiting for the data, that server node will be blocked and unable to proceed. Also the other server node, which has the data, is already executing some transaction, so it will not be able to reply with the requested data until the current transaction has ended. If Dimdwarf were completely single-threaded, the latencies would be too high (and low latency is one of the primary goals). Because Dimdwarf cannot guarantee full data locality, it needs to have some internal concurrency to be able to respond quickly to requests from other servers.</p>
<p>But there is one way to make Dimdwarf's internals mostly single-threaded: one main thread and multiple worker threads. The main thread will do all database access, communication with other server nodes, committing of transactions and other core services. All actions in the main thread must execute quickly, on the order of thousands per second. The worker threads will execute the application logic. The application logic is divided into <i>tasks</i>, each task running in its own transaction. It is recommended that tasks be short, on the order of ten milliseconds or less, but much longer tasks will also be allowed (if they do not write data that is modified concurrently by other tasks).</p>
<p>The communication between the main thread and worker threads, and also the communication between server nodes, will happen through message passing (like in <a href="http://en.wikipedia.org/wiki/Erlang_(programming_language)">Erlang</a>). This will allow each component to be single-threaded, which will simplify the implementation and testing. It will also make low server-to-server response times possible, because each server node's main thread will execute only very short actions, so it will be able to respond quickly to incoming messages. It will also make it easier to take advantage of multiple cores by increasing the number of worker threads. Also no data copying needs to be done when a worker thread requests data from the main thread, because inside the same JVM it's possible to pass just a reference to some immutable data structure instead of copying the whole data structure over a socket.</p>
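<p>To make the message-passing idea more concrete, here is a minimal sketch of a main thread serving read requests from worker threads over queues. All names here (<code>MainThreadSketch</code>, <code>ReadRequest</code> and so on) are hypothetical illustrations, not Dimdwarf's actual API:</p>

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: the main thread owns the database and serves
// worker threads through message passing, so its data structures
// need no locking.
class MainThreadSketch {

    interface Message {}
    record ReadRequest(String entryId, BlockingQueue<Message> replyTo) implements Message {}
    record ReadReply(String entryId, byte[] data) implements Message {}

    final BlockingQueue<Message> mailbox = new ArrayBlockingQueue<>(1000);
    final Map<String, byte[]> database = new HashMap<>();

    // Called only from the main thread's event loop.
    void processOneMessage() throws InterruptedException {
        Message m = mailbox.take();  // block until a message arrives
        if (m instanceof ReadRequest req) {
            // Inside the same JVM no copying is needed: only a reference
            // to the data is passed back to the worker.
            req.replyTo().put(new ReadReply(req.entryId(), database.get(req.entryId())));
        }
        // ...commit requests and server-to-server messages would be
        // handled here as similar short, quickly executing actions
    }
}
```

<p>A worker thread would simply put a <code>ReadRequest</code> into the mailbox and block on its own reply queue; the main thread stays responsive because every action it processes is short.</p>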
<h4>In-memory database</h4>
<p>The second main idea, keeping all data in memory, requires the data to be replicated over multiple server nodes. H-Store implements its replication by relying on deterministic database queries. H-Store executes the same queries (actually "transaction classes" containing SQL statements and program logic) on multiple server nodes in the same order. It does not replicate the actual modified data over the network, but replicates the tasks that do the modifications, and trusts that the tasks execute deterministically, which will result in the same data modifications to be made on the master and backup server nodes.</p>
<p>The determinism of tasks is too high a requirement for Dimdwarf, as it cannot trust that the application programmers are careful enough to write deterministic Java code. Determinism is much easier to reach with SQL queries and very little program logic than with untrusted imperative program code. So Dimdwarf will need to execute a task on one server node and replicate the modified data to a backup server node. Fortunately Dimdwarf's goals (an application server optimized for low latency, for the needs of online games) allow relaxing transaction durability, so we can do the replication asynchronously. This helps to minimize the latency from the user's point of view, but permits the loss of recent changes (within the last second) in case of a server failure.</p>
<h4>Other ideas</h4>
<p>The paper has also other good ideas, for example that the database should be "self-everything" - self-healing, self-maintaining, self-tuning etc. Computers are cheaper than people, so computers should do most of the work without need for human intervention. The database should be able to optimize its performance automatically, without the need for a DBA manually tuning the server parameters. The database should monitor its own state and heal itself automatically, without the need for a server administrator to keep an eye on the system constantly.</p>
<p>I also read the paper <a href="http://research.microsoft.com/en-us/um/people/lamport/pubs/time-clocks.pdf">Time, Clocks, and the Ordering of Events in a Distributed System</a>, about which I heard from waldo at the Darkstar forums. That paper taught me how to maintain a global ordering of events in a distributed system using <a href="http://en.wikipedia.org/wiki/Lamport_timestamps">Lamport timestamps</a>. Dimdwarf will apply it so that together with each server-to-server message there is a timestamp of when the message was sent, and the receiving server node will update its clock's timestamp to be equal to or greater than the message's send timestamp. The timestamp contains a sequentially increasing integer and a server node ID. This scheme may also be used to generate cluster-wide unique ID numbers for database entries.</p>
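<p>The timestamp scheme described above can be sketched in a few lines of Java (a hypothetical illustration; Dimdwarf's actual implementation may differ):</p>

```java
// Hypothetical sketch of a Lamport clock: a sequentially increasing
// counter combined with a server node ID.
class LamportClock {
    private final long nodeId;
    private long counter = 0;

    LamportClock(long nodeId) {
        this.nodeId = nodeId;
    }

    // Called when sending a message; the returned timestamp
    // travels along with the message.
    synchronized long tick() {
        return ++counter;
    }

    // Called when a message arrives: advance the local clock so it is
    // equal to or greater than the message's send timestamp.
    synchronized long receive(long messageTimestamp) {
        counter = Math.max(counter, messageTimestamp) + 1;
        return counter;
    }

    // Cluster-wide unique IDs: the counter is unique per node, and the
    // node ID makes the pair unique across the whole cluster.
    synchronized String uniqueId() {
        return (++counter) + "@" + nodeId;
    }
}
```

<p>Comparing such timestamps gives a total order of events that is consistent with the happened-before relation, which is all that is needed for global ordering.</p>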
<h3>Overview of Dimdwarf-HA</h3>
<p>Dimdwarf will come in two editions - a single-node Dimdwarf and a multi-node <i>Dimdwarf-HA</i>. Here I will give an overview of the architecture for Dimdwarf-HA, but the same architecture will work for both editions. In the single-node version all components just run on the same server node and possibly some of the components may be disabled or changed.</p>
<p>An <i>application</i> will run on one <i>server cluster</i>. The server cluster will contain multiple <i>server nodes</i> (the expected cluster size is up to some tens of server nodes per application). There are a couple of different types of server nodes: <i>gateway nodes</i>, <i>backend nodes</i>, <i>directory nodes</i> and one <i>coordinator node</i>. A <i>client</i> will connect to a gateway node, and the gateway will forward messages from the client to a backend node for processing (and send the replies back to the client). The backend nodes contain the database and execute all application logic. The directory nodes contain information about which backend nodes each database entry is on, and they may also contain information needed by the system's database garbage collector. The coordinator node does things that are best done by a single authoritative entity, for example signaling all nodes when the garbage collection algorithm's stage changes.</p>
<p>The system will automatically decide which services will run on which server nodes. Automatic load balancing will try to share the load evenly over all server nodes in the cluster. When some server nodes fail, the other server nodes will perform automatic failover and recover the data from backup copies.</p>
<p>A backend node contains one <i>main thread</i> and multiple <i>worker threads</i>. The threads and server nodes communicate through message passing. The main thread takes messages from an event queue one at a time, processes them, and sends messages to its worker threads and to other server nodes. The worker threads, which execute the application logic, communicate only with their main thread. The same is true for all plugins and other components that run inside a server node: the main thread is the only one that can send messages to other server nodes, and all inter-component communication goes through the main thread.</p>
<p>The database is stored as an in-memory data structure in the main thread. Since it is the only thread that can access the database directly, the data structures don't need to be thread-safe and can be simpler. This makes the system much easier to implement and to test, which will result in more reliable software.</p>
<p>The main thread will do things like give database entries to the worker threads to read, request database entries from other server nodes, commit transactions to the database, ensure that each database entry is replicated on enough backup nodes, execute parts of the database garbage collection algorithm, etc. All actions in the main thread should execute very quickly, thousands per second, so that the system stays responsive and has low latency at all times. All slow actions must be executed in the worker threads or in plugins that have their own thread. For example the main thread will do no I/O, but if the database needs to be persisted in a file, it will be done asynchronously in a background thread.</p>
<p>The worker threads do most of the work. When a task is given to a worker thread, the worker thread will deserialize the task object and begin executing it. When the task tries to read objects that have not yet been loaded from the database, the worker thread will request the database entry from the main thread, and after receiving it the worker thread will deserialize it and continue executing the task. When the task ends, the worker thread will serialize all loaded objects and send to the main thread everything that needs to be committed (modified data, new tasks, messages to clients).</p>
<p>The system is <a href="http://lwn.net/Articles/191059/">crash-only software</a>:</p>
<blockquote>Crash-only software is software that crashes safely and recovers quickly. The only way to stop it is to crash it, and the only way to start it is to recover. A crash-only system is composed of crash-only components which communicate with retryable requests; faults are handled by crashing and restarting the faulty component and retrying any requests which have timed out. The resulting system is often more robust and reliable because crash recovery is a first-class citizen in the development process, rather than an afterthought, and you no longer need the extra code (and associated interfaces and bugs) for explicit shutdown. All software ought to be able to crash safely and recover quickly, but crash-only software must have these qualities, or their lack becomes quickly evident.</blockquote>
<p>Dimdwarf will probably use a <code>System.exit(0)</code> call in a bootstrapper's shutdown hook and will fall back to using <code>kill -9</code> if necessary. As one of Dimdwarf's goals is to be a reliable high-availability application server, it needs to survive crashes well. Creating it as crash-only software is a good way to make any deficiencies apparent, so that they can be noticed and fixed early.</p>
<h4>Executing tasks</h4>
<p>When a client sends a message to a gateway node, the gateway will determine, based on the client's session, which backend node should process the message. If the client sends multiple messages, they are guaranteed to be processed in the order that they were sent. The gateway will create a task for processing the message and will send that task to a backend node for execution. The system will try to execute tasks on a node that has most of the data needed by the tasks locally available, and a layer of gateway nodes allows changing the backend node without the client knowing about it. <small>(In Darkstar there are no gateway nodes, but the tasks are executed on the node to which the client is connected. Changing the node requires co-operation from clients.)</small></p>
<p>The backend node receives the task and begins executing it in one of its worker threads. As the worker thread executes the task, it will request the database entries to be read from the main thread. If a database entry is not available locally, it needs to be requested from another backend node over the network. When the worker thread finishes executing the task, it will commit the transaction by sending a list of all modified data to the main thread. The main thread checks that there were no transaction conflicts, saves the changes to its database and replicates the data by sending the modifications of the transaction to another backend node for backup. If some messages to clients were created during the transaction, the messages are sent to the gateway nodes to which those clients are connected, and the gateway nodes will forward the messages to the clients.</p>
<p>If committing the transaction failed due to a transaction conflict, the task will be retried until it passes. If a task fails due to a programming error that throws an exception, then the task will be added to a list of failed tasks together with debug information (such as all database entries read and written by the task), so that a programmer may debug the reason for task failure. A failed task may then be cancelled or retried after fixing the bug.</p>
<p>Tasks may schedule new tasks for later execution. When a task commits, the commit contains a list of newly scheduled tasks, in addition to modified database entries and messages to clients. The system will analyze the parameters of a task and use heuristics to predict which database entries the task will modify. Then when the scheduled time for the task comes, it will be executed on a backend node that locally contains most of the data the task will access. The backend node will also try to ensure that concurrently executing worker threads will not modify the same database entries (tasks that modify the same entries will be run sequentially on the same worker thread). The decision of which backend node a task should be executed on is made on a per-task basis, so each task that originated from a particular user may possibly be executed on a different backend node. <small>(This is different from Darkstar, which has a notion of an "identity" that owns a task, and the task will be executed on the server node to which the task owner's identity is assigned. Also Darkstar supports repeated tasks, but Dimdwarf will probably simplify this by implementing task repetition at the application code level, because then the system won't need native support for cancelling tasks; supporting one-time tasks will be enough.)</small></p>
<h4>Database entries</h4>
<p>Each database entry has the following: a unique ID, an owner, a modification timestamp and the data. Each database entry is owned by one server node, and only that server node is allowed to write the entry. The other server nodes may only read the entry. For some other node to write the entry, it first needs to request the ownership of the entry, and only after becoming the new owner can it write the entry.</p>
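<p>The fields listed above could be modeled roughly like this (an illustrative sketch; the names are invented and the real entry will likely carry more metadata):</p>

```java
// Hypothetical model of a database entry: unique ID, owner,
// modification timestamp and the serialized data.
record DatabaseEntry(
        long entryId,               // cluster-wide unique ID
        long ownerNodeId,           // only the owner node may write the entry
        long modificationTimestamp, // compared on commit to detect conflicts
        byte[] data) {              // serialized object graph

    // A write produces a new version; a node must first become the
    // owner before it is allowed to do this.
    DatabaseEntry modifiedBy(long newOwnerNodeId, long newTimestamp, byte[] newData) {
        return new DatabaseEntry(entryId, newOwnerNodeId, newTimestamp, newData);
    }
}
```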
<p>The database uses <a href="http://en.wikipedia.org/wiki/Multiversion_concurrency_control">multiversion concurrency control</a>, so that each task works with a snapshot view of the database's data. When a task commits its modifications, the system will check the modification timestamps of the database entries to make sure that no other task modified them concurrently. This does not require locking, which may in some cases improve and in other cases lower performance (if there is much contention and the system's heuristics do not compensate for it well enough). The transaction isolation level is <a href="http://en.wikipedia.org/wiki/Snapshot_isolation">snapshot isolation</a>.</p>
<p>When a task running in a worker thread needs to read a database entry, it will send a read request to the main thread. The main thread will check whether the requested entry is in its local database. If it is, the main thread will respond to the worker thread with the requested data. If the entry is not in the local database or cache, the main thread will ask a directory node which backend node is the current owner of the entry. Then the main thread will ask that backend node to send it a copy of the database entry. When it receives the copy, it will forward it to the worker thread that originally requested it.</p>
<p>When a task running in a worker thread commits, it will create a list of all database entries that were modified during the task. This also includes tasks that were created, messages that were sent to clients and whatever other data needs to be committed. When the main thread receives the commit request, it will check that none of the database entries were modified concurrently. This is done by comparing the last modified timestamps of the database entries. The main thread will also make sure that the task read a consistent snapshot view of the database. If there is a transaction conflict, the commit request is discarded and the task is retried. If there are no transaction conflicts, the main thread will store the changes to its database, send any messages for clients to the gateway nodes, and send the modified database entries to the current server node's backup node for replication. It will also send the updated database entries to other server nodes that have previously requested a copy of that database entry, so that they will have the latest version of the entry.</p>
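<p>The timestamp comparison at commit time can be sketched as follows (hypothetical names; the real check must also verify snapshot consistency, ownership and so on):</p>

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the optimistic conflict check: an entry may
// be committed only if nobody modified it after the task read it.
class ConflictCheck {
    // entryId -> modification timestamp currently in the database
    final Map<Long, Long> currentTimestamps = new HashMap<>();

    // snapshotTimestamps: entryId -> timestamp the task saw when reading
    boolean canCommit(Map<Long, Long> snapshotTimestamps) {
        for (Map.Entry<Long, Long> read : snapshotTimestamps.entrySet()) {
            Long current = currentTimestamps.get(read.getKey());
            if (current != null && !current.equals(read.getValue())) {
                return false;  // concurrent modification: discard commit, retry task
            }
        }
        return true;
    }
}
```

<p>Because the check runs in the main thread, no locks are needed: commits are serialized naturally by the main thread's event loop.</p>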
<p>When committing, the current server node needs to be the owner of all modified database entries. If it is not, the main thread will need to request the ownership of the entries from their current owner. First it needs to find out who the current owner is. Each database entry contains information about which server node owns that entry version. The information can also be obtained from the directory nodes. When the ownership of a database entry is transferred, the old owner will tell all other server nodes that it knows to have a copy of the database entry about the ownership transfer. Then those server nodes can decide to ask the new owner to send them updated versions of the database entry, in case it's an entry that they read often.</p>
<p>It is not possible to delete database entries manually. A database garbage collector will periodically check for unreachable database entries and delete entries that are no longer used. The garbage collector algorithm will probably be based on the paper <a href="http://www.research.ibm.com/people/d/dfb/papers/Paz05Efficient.pdf">An Efficient On-the-Fly Cycle Collection</a>. A number of different algorithms can be implemented to find out which of them suits Dimdwarf and different types of applications best.</p>
<h4>Failover</h4>
<p>Each backend node has one or more other backend nodes assigned as its backups. The server node that is the owner of a database entry is called the <i>master node</i> and it contains the <i>master copy</i> of the database entry. The server nodes that contain <i>backup copies</i> of the database entry are called <i>backup nodes</i>.</p>
<p>When the master node modifies some master copies, the master node sends to its backup nodes a list of all updates done during the transaction. Then the backup nodes update their backup copies to reflect the latest version from the master node. To ensure consistency, the updates of a transaction are always replicated as an atomic unit.</p>
<p>When a server node crashes, the first server node to notice it will signal the other server nodes about the crash and they will coordinate the failover. One of the crashed node's backup nodes takes up the responsibility of replacing the crashed node and promotes its backup copies to master copies. The whole cluster is notified about the failover, i.e. which backup node replaced which master node, so that the other server nodes may update their cached information about where each master copy is.</p>
<p>If there are multiple backup nodes, they may coordinate with each other to determine which one of them has the latest backup copies of the failed node's database entries. Also, because the owner of a master copy may change at any time, the backup nodes need to be notified about ownership transfers, so that they will not think that they are still the backup node of some database entry whose ownership has been transferred to a new master node with different backup nodes. A suitable failover algorithm needs to be designed. It might be necessary to have additional checks of which node in the cluster has the latest backup copy, maybe by collecting that information in the directory nodes.</p>
<p>Although server nodes other than backup nodes may also contain copies of a database entry, those copies will not be promoted to master copies, because they are not guaranteed to contain a consistent view of the data that was committed. If a transaction modifies database entries X and Y, at failover the same version of both of them needs to be recovered. The backup node is guaranteed to have the same version of both X and Y, because the master node always sends it a list of all updates within a transaction, but other nodes may receive an updated copy of either X or Y if they are interested in only one of them.</p>
<p>The other server node types (gateway, directory, coordinator) may also have backup nodes if they contain information that would be slow to rebuild.</p>
<h4>Session and application contexts</h4>
<p>When a client connects to a gateway node, a <i>session</i> is created for it. The sessions are decoupled from authentication and user accounts. The application will need to authenticate the users itself and decide how to handle cases where the same user connects to the server multiple times.</p>
<p>Each session has a map of objects associated with it. It can be used to bind objects to a session, for example to store information about whether the session has been authenticated. It will also be used by the dependency injection container (<a href="http://code.google.com/p/google-guice/">Guice</a>) to implement a session scope. The whole application has a similar map of objects, which will be used to implement an application scope.</p>
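<p>The session's object map could look roughly like this (a hypothetical sketch; the real implementation will persist these objects in the database and tie them to scopes in the dependency injection container):</p>

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a session with a map of associated objects,
// e.g. for storing whether the session has been authenticated.
class Session {
    private final String sessionId;
    private final Map<String, Object> attributes = new ConcurrentHashMap<>();

    Session(String sessionId) {
        this.sessionId = sessionId;
    }

    String sessionId() {
        return sessionId;
    }

    void put(String key, Object value) {
        attributes.put(key, value);
    }

    Object get(String key) {
        return attributes.get(key);
    }
}
```

<p>An application scope would be a similar map shared by the whole application; the container can then back its session and application scopes with these maps.</p>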
<p>Objects in session and application scopes will be persisted in the database. It will also be possible to have non-persisted scopes, such as a server node specific singleton scope, in case the application code needs additional services that can not be implemented as normal tasks.</p>
<h4>Session messages and multicast channels</h4>
<p>When application code knows the session ID of some client, it can send messages to that client. As in Darkstar, there are two mechanisms for sending messages: session messages for one client and multicast channels for multiple clients.</p>
<p>Messages from a session are guaranteed to be processed in the same order as they were sent. The cluster might use an algorithm similar to <a href="http://trac.bookofhook.com/bookofhook/trac.cgi/wiki/Quake3Networking">the Quake 3 networking model</a>: the gateway will forward to the backend nodes the list of client messages which have not yet been acknowledged as executed. On the backend side, the processing of session messages will modify a variable in the session's database entry to acknowledge the last executed message. Transactions will make sure that all session messages are processed exactly once and in the right order.</p>
<p>Multicast channels will operate the same way as session messages, except that the messages are sent to multiple sessions and it will be possible to have channels with an unreliable transport. When application code sends a message to a channel, the system will list all sessions that are subscribed to that channel. It will partition the sessions based on the gateway node to which the clients are connected and will forward the message to those gateway nodes. The gateway nodes in turn will forward the messages to individual clients.</p>
<p>Receiving messages from clients through session messages or channels is done using message listeners, similar to Darkstar. The application code will implement a message listener interface and register it to a session or channel. Then the method of that listener will be called in a new task when messages are received from clients.</p>
<h4>Supporting services</h4>
<p>A Dimdwarf cluster also requires some additional services: A <i>tracker</i> keeps a list of all server nodes in a cluster, so that it is possible to connect to a cluster without knowing the IPs and ports of the server nodes. A <i>bootstrap</i> process runs on each physical machine and has the power to start and kill server nodes on that machine. The trackers and bootstrap processes can be used by management tools to control the cluster.</p>
<p>There will be command line tools for managing the cluster: commands for installing an application in a new cluster, adding and removing servers in the cluster, upgrading the application version, shutting down a cluster, and so on.</p>
<p>Application upgrades will happen on-the-fly. First, server nodes with the new application code are started alongside the existing server nodes. Then the new server nodes begin to mirror the data in the old server nodes, the same way as backup nodes do. Finally, in one cluster-wide move, the new server nodes take over and begin executing the tasks instead of the old server nodes. The serialized data in the database will be upgraded on-the-fly as it is read by tasks on the new server nodes.</p>
<p>Dimdwarf may be extended by writing plugins. There will be a need for advanced management, monitoring and profiling tools. For example, I'm planning on creating a commercial profiler that will give detailed information about all tasks and server-to-server messages, so that it will be possible to know exactly what is happening in the cluster and in which order. It will be possible to record all events in the cluster and then use the profiler to step through the recorded events, moving forwards and backwards in time.</p>
<h3>Converting confused SVN repositories into Git repositories (2009-05-09)</h3>
I've been spending this evening converting the repositories of my old projects from SVN to Git. I used to have the repositories hosted on my home server, but now I've moved them to GitHub (see <a href="http://github.com/orfjackal">my GitHub profile</a>). Here I have outlined the procedures that I used to convert my source code repositories.<br />
<br />
<b>Preparations</b><br />
<br />
First I installed <a href="http://github.com/nirvdrum/svn2git/tree/master"><span style="font-family: 'Courier New', Courier, monospace;">svn2git</span></a>, because it handles tags and branches much better than the basic <a href="http://www.kernel.org/pub/software/scm/git/docs/git-svn.html"><span style="font-family: 'Courier New', Courier, monospace;">git svn clone</span></a> command. I run Git under Cygwin, so first I had to install the <span style="font-family: 'Courier New', Courier, monospace;">ruby</span> package using Cygwin Setup. And since Cygwin's Ruby does not come with RubyGems, I downloaded and installed it manually using <a href="http://www.cygwin.com/ml/cygwin/2007-05/msg00439.html">these instructions</a>.<br />
<br />
When RubyGems was installed, I was able to type the following commands to finally install <span style="font-family: 'Courier New', Courier, monospace;">svn2git</span> from GitHub:<br />
<br />
<span style="font-family: 'Courier New', Courier, monospace;">gem sources -a http://gems.github.com</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">gem install nirvdrum-svn2git</span><br />
<br />
Most of my SVN repositories were already running on my server, so accessing them was easy. But for some projects I had just a tarballed version of the repository. For those it was best to run <span style="font-family: 'Courier New', Courier, monospace;">svnserve</span> locally, because <span style="font-family: 'Courier New', Courier, monospace;">git-svn</span> was not able to connect to an SVN repository through the file system. So I unpacked the repository tarballs into a directory X (so that the individual repositories are subdirectories of X), after which I started svnserve with the command "<span style="font-family: 'Courier New', Courier, monospace;">svnserve --daemon --foreground --root X</span>". Then I could access the repositories through "<span style="font-family: 'Courier New', Courier, monospace;">svn://localhost/name-of-repo</span>" URLs.<br />
<br />
You will also need to write <a href="http://www.kernel.org/pub/software/scm/git/docs/git-svn.html">an authors file</a> which lists all usernames in the SVN repositories and what their corresponding Git author names should be. The format is as follows, one user per line:<br />
<br />
<span style="font-family: 'Courier New', Courier, monospace;">loginname = Joe User &lt;user@example.com&gt;</span><br />
<br />
I placed the <span style="font-family: 'Courier New', Courier, monospace;">authors.txt</span> file into my working directory, where I could easily point to it when doing the conversions.<br />
<br />
<b>Simple conversions</b><br />
<br />
When the SVN repository uses the standard layout and nothing weird has happened in its version history, the following commands can be used to convert the repository.<br />
<br />
First make an empty directory and use svn2git to clone the SVN repository:<br />
<br />
<span style="font-family: 'Courier New', Courier, monospace;">mkdir name-of-repo</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">cd name-of-repo</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">svn2git svn://localhost/name-of-repo --authors ../authors.txt --verbose</span><br />
<br />
When that is finished, check that all branches, tags and version history were imported correctly:<br />
<br />
<span style="font-family: 'Courier New', Courier, monospace;">git branch</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">git tag</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">gitk --all</span><br />
<br />
You will probably want to publish the repository, so create a new repository (in this example I use GitHub) and push your repository there. Remember to include all branches and tags:<br />
<br />
<span style="font-family: 'Courier New', Courier, monospace;">git remote add origin git@github.com:username/git-repo-name.git</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">git push --all</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">git push --tags</span><br />
<br />
After that it is best to clone the published repository from the central server the way you normally would (<span style="font-family: 'Courier New', Courier, monospace;">cd /my/projects ; git clone git@github.com:username/git-repo-name.git</span>), and delete the original repository which was used when importing from SVN, to get rid of all the SVN-related files in the <span style="font-family: 'Courier New', Courier, monospace;">.git</span> directory.<br />
<br />
You might also want to add a <span style="font-family: 'Courier New', Courier, monospace;">.gitignore</span> file to your project. For my projects I use the following to keep Maven's build artifacts and IntelliJ IDEA's workspace file out of version control:<br />
<br />
<span style="font-family: 'Courier New', Courier, monospace;">/*.iws</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">/target/</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">/*/target/</span><br />
<br />
<b>Complex conversions</b><br />
<br />
I had one SVN repository where the repository layout had been changed in the middle of the project. At first all project files had been in the root of the repository ("<span style="font-family: 'Courier New', Courier, monospace;">/</span>"), after which they had been moved into <span style="font-family: 'Courier New', Courier, monospace;">/trunk</span>. As a result, when I imported the SVN repository using the standard layout options, the history stopped at the point where that move was made, because before that point in history there was no <span style="font-family: 'Courier New', Courier, monospace;">/trunk</span>. I wanted to import a clean history, so that this mess would not be reflected in the resulting Git repository's history.<br />
<br />
What I did was first import the latter part of the history, which used the standard layout:<br />
<br />
<span style="font-family: 'Courier New', Courier, monospace;">mkdir messy-repo.2</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">cd messy-repo.2</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">svn2git svn://localhost/messy-repo/trunk --rootistrunk --authors ../authors.txt --verbose</span><br />
<br />
Then I imported the first part of the history, which used the trunkless layout. This import also includes the latter part of the history, but with all files moved under a <span style="font-family: 'Courier New', Courier, monospace;">/trunk</span> directory:<br />
<br />
<span style="font-family: 'Courier New', Courier, monospace;">mkdir messy-repo.1</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">cd messy-repo.1</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">svn2git svn://localhost/messy-repo --rootistrunk --authors ../authors.txt --verbose</span><br />
<br />
Then I created a new repository where I would be combining the history from those two repositories. I cloned it from the repository with the first part of the clean history.<br />
<br />
<span style="font-family: 'Courier New', Courier, monospace;">git clone file:///tmp/svn2git/messy-repo.1/.git messy-repo.combined</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">cd messy-repo.combined</span><br />
<br />
Then I would start a branch "old_master" from the current master, just to be sure not to lose it. I would also make a tag "after_mess" for the commit that changed the SVN repository layout, and a tag "before_mess" for the commit just before that, where all project files were still cleanly in the repository root.<br />
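Concretely, that bookkeeping looks roughly like this - demonstrated here on a tiny throwaway repository, where the last commit stands in for the layout-changing commit and its parent for the last clean commit:

```shell
# Throwaway repository standing in for messy-repo.combined:
# the last commit "changed the layout", its parent is still clean.
set -e
cd "$(mktemp -d)"
git init -q
git config user.email you@example.com
git config user.name "You"
echo a > a.txt; git add a.txt; git commit -q -m "last clean commit"
echo b > b.txt; git add b.txt; git commit -q -m "layout-changing commit"

git branch old_master        # safety copy of the current master
git tag after_mess HEAD      # the commit that changed the layout
git tag before_mess HEAD^    # the commit just before the mess

git log --oneline --decorate
```

In the real conversion the two tags would of course point at the actual layout-changing commit and its parent, found by inspecting the history.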
<br />
Did I mention that the layout-changing commit also added one file, in addition to changing the repository layout? So I had to recover that change from the otherwise pointless commit. First I had to get a patch with the desirable changes. So I hand-copied the desired file from SVN, checked out the version in Git just before the mess, made the desired change to the working copy, committed it and tagged it so that it would not be lost.<br />
<br />
<span style="font-family: 'Courier New', Courier, monospace;">cd messy-repo.combined</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">git checkout before_mess</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">git add path/to/the/DesiredFile.java</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">git commit -m "Recovered the desired file from the mess"</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">git tag desired_changes</span><br />
<br />
Then I would make a patch with just that one change:<br />
<br />
<span style="font-family: 'Courier New', Courier, monospace;">git format-patch -M -C -k -1 desired_changes</span><br />
<br />
That created the file <span style="font-family: 'Courier New', Courier, monospace;">0001-desired-changes.patch</span>.<br />
<br />
I also needed clean patches for the latter part of the version history, so I created patches for all changes in the <span style="font-family: 'Courier New', Courier, monospace;">messy-repo.2</span> repository.<br />
<br />
<span style="font-family: 'Courier New', Courier, monospace;">cd messy-repo.2</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">git format-patch -M -C -k --root master</span><br />
<br />
Then I would hand-edit the <span style="font-family: 'Courier New', Courier, monospace;">0001-desired-changes.patch</span> file to contain the same date and time as the original commit that messed up the repo. I would also remove the patch for that commit from the patches produced from <span style="font-family: 'Courier New', Courier, monospace;">messy-repo.2</span>.<br />
<br />
Then it was time to merge the patches into the first part of the history:<br />
<br />
<span style="font-family: 'Courier New', Courier, monospace;">cd messy-repo.combined</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">git checkout before_mess</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">git am -k 0001-desired-changes.patch</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">git am -k patches-from-repo-2/00*</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">git branch fixed_master</span><br />
<span style="font-family: 'Courier New', Courier, monospace;">git checkout fixed_master</span><br />
<br />
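Applying patches with <span style="font-family: 'Courier New', Courier, monospace;">git am</span> sets the committer dates to the current time. A minimal sketch of repairing them afterwards with <span style="font-family: 'Courier New', Courier, monospace;">git filter-branch</span>, demonstrated on a throwaway repository with an illustrative commit and date:

```shell
# Demonstrate in a throwaway repository how `git filter-branch`
# can copy each commit's author date over its committer date.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email you@example.com
git config user.name "You"

# Simulate a replayed patch: author date in the past, committer date "now".
echo content > file.txt
git add file.txt
GIT_AUTHOR_DATE="2009-05-09T01:00:00 +0300" git commit -q -m "imported commit"

# Rewrite every commit so that the committer date equals the author date.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
    --env-filter 'export GIT_COMMITTER_DATE="$GIT_AUTHOR_DATE"' -- --all

git log -1 --format='author:    %ad%ncommitter: %cd'
```

Alternatively, <span style="font-family: 'Courier New', Courier, monospace;">git am</span> has a <span style="font-family: 'Courier New', Courier, monospace;">--committer-date-is-author-date</span> option that avoids the problem up front.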
That way all the history was saved, and even the author dates were unchanged (commit dates did, however, change to the current time when the patches were applied - it's possible to <a href="http://blog.orfjackal.net/2009/10/tidy-rewritten-histories-with-git.html">rewrite the commit dates using <span style="font-family: 'Courier New', Courier, monospace;">git filter-branch</span></a>). After that I could just clean up the branches and push everything to the central repository as usual.
<h3>Version number management for multi-module Maven projects (2009-05-08)</h3>
<p>I've been thinking about how to best organize the Maven modules in <a href="http://dimdwarf.sourceforge.net/">Dimdwarf</a>. My requirements are that (1) the version number of the public API module must stay the same unless there are changes to the public API, (2) opening and developing the project should be easy, so that I can open the whole project with all its modules by opening a single POM in IntelliJ IDEA, and (3) all code for the project should be stored in one Git repository, so that the version history of all modules is combined and checking out the whole project can be done with one command.</p>
<p>The project structure is currently as follows (these nice graphs were produced with <a href="http://www.yworks.com/en/products_yed_about.html">yEd</a>).</p>
<div class="separator" style="clear: both; text-align: center;">
<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgGGy27hPK6R4J4K2B7jByzWCgEZV7TQP3mwJvY7OLQ5dYwiayHQqRqJ87LDtF4cUJsgyZ08XX7cN3y2a5mYF8xv1gC0MIKKYa85V5sPsvVeGj8lZnM-FywlqsSvJHhycFlZZYGs8O-fF4/s1600/dimdwarf-current.png" /></div>
<p>I have one POM module, "dimdwarf", at the root of the project directory. It is the parent of all other modules (that's where dependencyManagement and the common plugins are configured) and it also has all the other modules as its submodules. The "dimdwarf-api" module is what all users of my framework will depend on, so I want its version numbers to change very rarely - only when the API is changed, not every time I release a new version of the server implementation. The "dimdwarf-aop" and "dimdwarf-agent" modules handle the bytecode manipulation and are needed as part of the bootstrap process. "dimdwarf-core" does not use the AOP classes directly, but it has a dependency on "dimdwarf-aop" for testing purposes. The module "dimdwarf-dist" assembles all the other modules together and builds a redistributable ZIP file.</p>
<p>Yesterday I was looking for a solution for reaching my requirements. StackOverflow did not have any existing questions which would have touched exactly this problem, but in one of the answers there was a link to <a href="http://out-println.blogspot.com/2008/10/maven-modules-with-independent-versions.html">Oliver's blog post</a> which matched my situation perfectly (also read <a href="http://out-println.blogspot.com/2009/03/follow-up-on-maven-modules-with.html">the follow-up</a>). He proposed a solution that checks for consistency in the project structure and fails the build if the modules have dependencies with a wrong version.</p>
<p>After thinking about that for a while, I came up with a possibly better way to manage the version numbers. It would be a tool (possibly implemented as a Maven plugin) that helps in updating the module version numbers. The tool would be called a "module version bumper" or similar. Its commands should be run in the directory that contains the project's "workspace POM" (one that has all modules of the project as its submodules, but which none of the modules depend on), so that the tool can find all modules that are part of the project.</p>
<p>For the version bumper to work with Dimdwarf, the project structure needs to be refactored:</p>
<div class="separator" style="clear: both; text-align: center;">
<img src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgKdCqRviNS3DU7_IpRwBf0dm6_o8r-MgXsgonHeN0REyMZfLd-5323X7KPwPXgbe9K5vB-twIpAOexDVZjXnjqiZlEG7CFbmHO3kcVe7f3C4k_mq87tD411ZGPkm5MNuxWWLtioW3p_qo/s1600/dimdwarf-future.png" /></div>
<p>All the common settings (dependencyManagement, plugins etc.) are in the "parent" POM file, which the other modules then extend. I decided to make "dimdwarf-api" independent from it, because I don't want library version upgrades to be reflected in the API's version number. (I could also have created a "parent-common" and a "parent-deps" which extends "parent-common", but let's keep it simple for now and tolerate some duplication in the API's POM.) The workspace POM, "dimdwarf", no longer has the added responsibility of also being the parent POM, which helps the project get rid of cyclic dependencies between the POMs.</p>
<p>To explain how the version bumper would work, let's start with an example of the workflow of making changes to the project. In the beginning, version 1.0.0 of Dimdwarf has recently been released and all modules have "1.0.0" as their version number.</p>
<p><code> parent 1.0.0<br />
dimdwarf-api 1.0.0<br />
dimdwarf-api-internal 1.0.0<br />
dimdwarf-core 1.0.0<br />
dimdwarf-aop 1.0.0<br />
dimdwarf-agent 1.0.0<br />
dimdwarf-dist 1.0.0<br />
dimdwarf 1.0.0</code></p>
<p>I notice a bug in the "dimdwarf-aop" module, so I need to make changes to it. Since "dimdwarf-aop" now has a <i>release version</i> (i.e. one that does not end with "-SNAPSHOT"), I need to bump its version to be the next <i>development version</i> (i.e. a "-SNAPSHOT" version higher than the previous release version).</p>
<p>In the project's root directory, I run the version bumper tool's command: "<code>mvn version-bump dimdwarf-aop</code>". This command reads the version number of all modules in the project and determines that "1.0.0" is the highest version number in use. Since it is a release version number, the tool prompts me for the next development version, offering "1.0.1-SNAPSHOT" as the default. I accept the default. Then the tool changes that to be the version number of "dimdwarf-aop" and of all modules that depend on "dimdwarf-aop" at runtime ("dimdwarf-core" has only a test-time dependency, so it is not changed). So now the version numbers are as follows, with changes highlighted in blue:</p>
<p><code> parent 1.0.0<br />
dimdwarf-api 1.0.0<br />
dimdwarf-api-internal 1.0.0<br />
dimdwarf-core 1.0.0<br />
dimdwarf-aop <span style="color: blue;">1.0.1-SNAPSHOT</span><br />
dimdwarf-agent <span style="color: blue;">1.0.1-SNAPSHOT</span><br />
dimdwarf-dist <span style="color: blue;">1.0.1-SNAPSHOT</span><br />
dimdwarf <span style="color: blue;">1.0.1-SNAPSHOT</span></code></p>
<p>Then I make some changes in "dimdwarf-aop" to fix the bug and commit it to version control.</p>
<p>Some days after that, I begin making some bug fixes to the "dimdwarf-core" module. I change the code, but forget that I have not bumped that module's version to the next development version. I commit the changes to version control (I use Git), but thankfully I have a <a href="http://www.kernel.org/pub/software/scm/git/docs/githooks.html">pre-commit hook</a> that verifies that all modules with changes use a development version (or a release version that is strictly higher than the version in the previous commit - otherwise you couldn't commit a new release). The commit fails with a message:</p>
<p><code>The following files were changed in module "dimdwarf-core" which has the release version "1.0.0". Update the module to use a development version with the command "mvn version-bump dimdwarf-core" or recommit with the --no-verify option to bypass this version check.<br />
dimdwarf-core/src/main/java/x/y/z/SomeFile.java<br />
dimdwarf-core/src/main/java/x/y/z/AnotherFile.java</code></p>
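The core check behind such a pre-commit hook could be sketched roughly as follows. This is only a sketch under assumptions: the hook wiring is omitted (a real hook would find the changed modules from the staged files), the POM parsing is crude and line-based, and the module directory here is a made-up example:

```shell
# Sketch of the version check behind such a pre-commit hook.
# A real hook would find the changed top-level modules with:
#   git diff --cached --name-only | cut -d/ -f1 | sort -u
set -e

# A development version ends with -SNAPSHOT.
is_development_version() {
    case "$1" in
        *-SNAPSHOT) return 0 ;;
        *)          return 1 ;;
    esac
}

# Extract the first <version> element from a POM file (crude, line-based).
pom_version() {
    sed -n 's:.*<version>\(.*\)</version>.*:\1:p' "$1" | head -n 1
}

# Demonstrate on a throwaway module directory with a release version.
workdir=$(mktemp -d)
mkdir "$workdir/dimdwarf-core"
cat > "$workdir/dimdwarf-core/pom.xml" <<'EOF'
<project>
  <artifactId>dimdwarf-core</artifactId>
  <version>1.0.0</version>
</project>
EOF

version=$(pom_version "$workdir/dimdwarf-core/pom.xml")
if is_development_version "$version"; then
    echo "dimdwarf-core is at development version $version - OK to commit"
else
    echo "dimdwarf-core has release version $version - bump it first"
fi
```

In the real hook, a failed check would end with a non-zero exit status so that git aborts the commit (unless the commit is retried with <code>--no-verify</code>).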
<p>I realize my mistake, so I run the command "<code>mvn version-bump dimdwarf-core</code>". This command reads the version number of all modules in the project and determines that "1.0.1-SNAPSHOT" is the highest version number in use. Since it is a development version number, the tool prompts me for the development version for "dimdwarf-core" module, offering "1.0.1-SNAPSHOT" as the default. I accept the default. Then the tool changes that to be the version number of "dimdwarf-core" and of all modules that depend on "dimdwarf-core" at runtime (only "dimdwarf-dist" and "dimdwarf" depend on it, but since they already have version "1.0.1-SNAPSHOT", they don't need to be updated). So now the version numbers are as follows:</p>
<p><code> parent 1.0.0<br />
dimdwarf-api 1.0.0<br />
dimdwarf-api-internal 1.0.0<br />
dimdwarf-core <span style="color: blue;">1.0.1-SNAPSHOT</span><br />
dimdwarf-aop 1.0.1-SNAPSHOT<br />
dimdwarf-agent 1.0.1-SNAPSHOT<br />
dimdwarf-dist 1.0.1-SNAPSHOT<br />
dimdwarf 1.0.1-SNAPSHOT</code></p>
<p>Now I want to publish the new release, so I run a tool that changes all the development versions to release versions (is there already a Maven plugin that does it?). After that the version numbers are:</p>
<p><code> parent 1.0.0<br />
dimdwarf-api 1.0.0<br />
dimdwarf-api-internal 1.0.0<br />
dimdwarf-core <span style="color: blue;">1.0.1</span><br />
dimdwarf-aop <span style="color: blue;">1.0.1</span><br />
dimdwarf-agent <span style="color: blue;">1.0.1</span><br />
dimdwarf-dist <span style="color: blue;">1.0.1</span><br />
dimdwarf <span style="color: blue;">1.0.1</span></code></p>
<p>I commit the changes to version control and tag it as "dimdwarf-1.0.1". I check out the tag into a clean directory, build it and deploy all the 1.0.1 artifacts to the central Maven repository (the already deployed 1.0.0 version may not be redeployed). I also collect the newly built redistributable ZIP file from the <i>/dimdwarf-dist/target</i> directory and upload it to the web site for download.</p>
<p>So that is my idea for managing version numbers in multi-module Maven projects. What do you think, would a workflow such as this work in practice? Do you think that there will be problems with this version numbering scheme (mixed development and release versions) when using continuous integration or when deploying to a Maven repository (where overwriting previously deployed versions is not allowed)? Would somebody with experience in Maven plugin development be willing to help in implementing this?</p>