2010-07-28

Design for Integrability

There is already the term Design for Testability - the software is designed so that it's easy to write tests for it. I would like to coin a new term, Design for Integrability - the system is designed so that it's easy to integrate it with its external environment. (Yes, integrability is a word.)

Designing for testability is closely linked with how good the design of the code is. A good way to design for testability is to write the tests first, in short cycles, which by definition leads to all code being tested. As a result, the developers will need to go through the pains of improving the design to be testable early on, because otherwise it would be hard for them to write tests for it.

Designing for integrability is possible with a similar technique. The book Growing Object-Oriented Software, Guided by Tests (GOOS) presents a style of doing TDD where the project is started by writing end-to-end tests, which are then used to drive the design and get early feedback (see pages 8-10, 31-37, 84-88 and the code examples). Also, the "end-to-end" of the GOOS authors is probably more end-to-end than the "end-to-end" of many others. Quoted from page 9:

For us, "end-to-end" means more than just interacting with the system from the outside - that might be better called "edge-to-edge" testing. We prefer to have the end-to-end tests exercise both the system and the process by which it's built and deployed. An automated build, usually triggered by someone checking code into the source repository, will: check out the latest version; compile and unit-test the code; integrate and package the system; perform a production-like deployment into a realistic environment; and, finally, exercise the system through its external access points. This sounds like a lot of effort (it is), but it has to be done anyway repeatedly during the software's lifetime. Many of the steps might be fiddly and error-prone, so the end-to-end build cycle is an ideal candidate for automation. You'll see in Chapter 10 how early in a project we get this working.

A system's interaction with its external environment is often one of the riskiest areas in its development, so the authors of the GOOS book prefer to expose the uncertainties early by starting with a walking skeleton, which contains the basic infrastructure and integrates with the external environment. On page 32 they define a walking skeleton as "an implementation of the thinnest possible slice of real functionality that we can automatically build, deploy, and test end-to-end. It should include just enough of the automation, the major components, and communication mechanisms to allow us to start working on the first feature." This forces the team to address the "unknown unknown" technical and organizational risks at the beginning of the project, while there is still time to do something about them, instead of starting the integration only at the end of the project.

Starting with the integration also means that the chaos related to solving the uncertainties moves from the end of the project to its beginning. But once the end-to-end infrastructure is in place, the rest of the project will be easier. There is a nice illustration of this on page 37.

I also perceive the end-to-end tests to be helpful in guiding the design of the system towards integrability. When writing software with an inside-out approach to TDD - starting with low-level components and gluing them together until all components are ready - it's possible that by the time the development reaches the high-level components which need to interact with external systems and libraries, the design of the low-level components makes the integration hard. Then you will need to change the design to make the integration easier. But when developing outside-in and starting with end-to-end tests, those integration problems will be solved before the rest of the system is implemented - when changing the design is still easy.

Listening to the feedback from the end-to-end tests can also improve the management interfaces of the system. Nat Pryce writes in TDD at the System Scale that the things that make writing reliable end-to-end tests hard are also the things that make managing a system hard. He writes: "If our system tests are unreliable, that's a sign that we need to add interfaces to our system through which tests can better observe, synchronise with and control the activity of the system. Those changes turned out to be exactly what we need to better manage the systems we built. We used the same interfaces that the system exposed to the tests to build automated and manual support tools."
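
As an illustration of what such an interface might look like (this sketch is mine, not from Nat Pryce's post): the system exposes a small status interface, the tests use it to wait until the system has quieted down instead of sleeping for arbitrary times, and the same interface is then available to monitoring tools. All the names below are made up.

    trait SystemStatus {
      // exposed both to the tests and to operations/monitoring tools
      def pendingMessages: Int
    }

    object EndToEndTestSync {
      // Block until the system reports that it has no work in progress,
      // instead of guessing with Thread.sleep() in every test.
      def waitUntilIdle(status: SystemStatus, timeoutMillis: Long = 5000): Unit = {
        val deadline = System.currentTimeMillis() + timeoutMillis
        while (status.pendingMessages > 0) {
          if (System.currentTimeMillis() > deadline) {
            throw new AssertionError("the system did not become idle within " + timeoutMillis + " ms")
          }
          Thread.sleep(10)
        }
      }
    }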

By starting with end-to-end tests it's possible to get early feedback and know whether we are moving in the right direction. Also, the system will by definition be integrable, because it has been integrated since the beginning.

Note, however, that what J.B. Rainsberger says in Integration Tests Are a Scam still applies. You should not rely on the end-to-end tests for the basic correctness of the system; instead you should have unit-level tests which in themselves provide good coverage. End-to-end tests take lots of time to execute, so it's impractical to run them all the time while refactoring (my personal pain threshold for recompiling and running all tests after a one-liner change is somewhere below 5-10 seconds). In the approach of the GOOS authors the emphasis is more on "test-driving" end-to-end than on "testing" end-to-end. See the discussion in Steve Freeman's blog post on the topic (including the comments).

Experience 1

The first project where I have tried this approach is Dimdwarf - a distributed application server for online games. I started by writing an end-to-end test the way I would like to write it (ClientConnectionTest). I configured Maven to unpack the distribution package into a sandbox directory (end-to-end-tests/pom.xml), against which the end-to-end tests are then run in Maven's integration-test phase. The test double applications which I deploy on the server are in the main sources of the end-to-end-tests module, and I deploy them by copying the JAR file and writing the appropriate configuration file (ServerRunner.deployApplication()).

It takes about half a second for the server to start (class loading is what takes most of the time), so the tests wait until the server prints to the logs that it is ready (ServerRunner.startApplication()). The server is launched in a separate process using ProcessRunner, and its stdout/stderr are redirected both to the test runner's stdout/stderr and to a StreamWatcher, which allows the tests to examine the output using ProcessRunner.waitForOutput(). There is a similar test driver for a client, which connects to the server via a socket and has some helper methods for sending messages to the server and checking the responses (ClientRunner). When the test ends, the server process is killed by calling Process.destroy() - after all, it is meant to be crash-only software.
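
As a rough sketch of the core idea - launch the server in its own process, echo its output, and block until a known log line appears - something like the following could work. The class and the "Server started" message below are illustrative, not the actual Dimdwarf code.

    import java.io.{BufferedReader, InputStreamReader}

    // Launch a command in its own process, echo its output to the test
    // runner's stdout, and block until an expected line appears.
    class ProcessRunnerSketch(command: String*) {
      private val process = new ProcessBuilder(command: _*)
        .redirectErrorStream(true) // merge stderr into stdout
        .start()
      private val output = new BufferedReader(
        new InputStreamReader(process.getInputStream))

      def waitForOutput(expected: String): Unit = {
        var line = output.readLine()
        while (line != null) {
          println(line) // also show the server's output in the test log
          if (line.contains(expected)) {
            return
          }
          line = output.readLine()
        }
        throw new AssertionError("process ended before printing: " + expected)
      }

      def kill(): Unit = process.destroy() // good enough for crash-only software
    }

    // Usage in a test, assuming the server logs "Server started" when it is ready:
    // val server = new ProcessRunnerSketch("java", "-jar", "dimdwarf-server.jar")
    // server.waitForOutput("Server started")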

Getting the end-to-end tests in place went nicely. It took 9.5 hours to write the end-to-end test infrastructure (and the tests are now deterministic), plus about 8 hours to build the infrastructure for starting up the server, do some reorganizing of the project, and write enough of the network layer to get the first end-to-end test to pass (the server sends a login failure message when any client tries to log in). The walking skeleton does not yet integrate with all the third-party components that will be part of the final system. For example, the system has not yet been integrated with an on-disk database (although the system can be almost fully implemented without it, because it anyway relies primarily on its internal in-memory database).

It takes 0.8 seconds to run just one end-to-end test, which is awfully slow compared to the unit tests (I could run the other 400+ tests in the same time, if only JUnit would run the tests in parallel on 4 CPU cores). On top of that, it takes 13 seconds to package the project with Maven, so the end-to-end tests won't be of much use while refactoring, but they were very helpful in getting the deployment infrastructure ready. I will probably write end-to-end tests for all client communication and a couple of tests for some internal features (for example that persistence works over restarts and that database garbage collection deletes the garbage). The test runner infrastructure should also be helpful in writing tests for non-functional requirements, such as robustness and scalability.

Experience 2

In another project I was coaching a team of 10 university graduate students during a 7-week course (i.e. they had been studying computer science at university for 3-5 years - actually I was a graduate student too, and had been there for 8 years). We were building a web application using Scala + Lift + CouchDB as our stack. The external systems to which the application connects are its own database and an external web service. We started by writing an end-to-end test which starts up the application and the external web service in their own HTTP servers using Jetty, and puts some data - actually just a single string - into the external web service; the application fetches the data from the web service and saves it to the database, after which the test connects to the application using Selenium's HtmlUnitDriver and checks whether the data is shown on the page. All the applications were run inside the same JVM, and the CouchDB server was assumed to be already running on localhost without any password.
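
The checking step of such a test might look roughly like the sketch below. The wiring which starts the Lift application and the fake web service in embedded Jetty is project-specific and elided here, and the URL and the expected string are made up.

    import org.openqa.selenium.htmlunit.HtmlUnitDriver

    object DataShownOnPageCheck {
      // Drive the application through a headless browser and check that the
      // data which came from the external web service shows up on the page.
      def dataIsShownOnThePage(): Unit = {
        val driver = new HtmlUnitDriver()
        try {
          driver.get("http://localhost:8080/") // front page of the application under test
          assert(driver.getPageSource.contains("hello world"),
            "expected the data from the external web service to be shown on the page")
        } finally {
          driver.quit()
        }
      }
    }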

It took a bit over one week (30 h/week × 10 students) to get the walking skeleton up and walking - or rather, crawling. I was helping with some things, such as getting Maven configured and the tests running, but otherwise I was trying to keep away from the keyboard and focus on instructing others on how to do things. I also code reviewed (and refactored) almost all of the code. Before getting started with the walking skeleton, we had spent about 2 weeks learning TDD, Scala, Lift and CouchDB, and evaluating some JavaScript testing frameworks.

The end-to-end tests had lots of non-determinism and were flaky. Parsing the HTML pages produced by the application made writing tests hard, especially when some of that HTML was generated dynamically with JavaScript and updated with Ajax/Comet. There were conflicts with port numbers and database names, which were discovered when the CI server ran two builds in parallel. There were also issues with the testing framework, ScalaTest, which by default creates only one instance of the test class and reuses it for all tests - it took some time hunting weird bugs until we noticed it (the solution is to mix in the OneInstancePerTest trait). It would have been better to start the application under test in its own process, because reusing the JVM might also have been the cause of some of the side effects between the tests, and during the project we did not yet get all of the deployment infrastructure ready (for example some settings were passed via System.setProperty()).
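
To show the symptom: with a mutable fixture field, tests pass or fail depending on which tests ran earlier in the same shared instance, and mixing in OneInstancePerTest fixes it by giving every test a freshly constructed instance. The test below is only an illustration, not code from the project.

    import org.scalatest.{FunSuite, OneInstancePerTest}
    import scala.collection.mutable.ListBuffer

    // Without OneInstancePerTest, ScalaTest reuses one instance of this class
    // for all of its tests, so items added in one test are still in the buffer
    // when the next test runs. With the trait mixed in, the fixture field is
    // recreated for each test.
    class ShoppingCartTest extends FunSuite with OneInstancePerTest {
      val cart = new ListBuffer[String]()

      test("adding an item") {
        cart += "apple"
        assert(cart.size == 1)
      }

      test("the cart starts empty") {
        // with a shared instance this would fail whenever the other test ran first
        assert(cart.isEmpty)
      }
    }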

We were also faced with far too many bugs (2-3 in total, I think) in the Specs framework, which prompted me to write a ticket for a "DIY/NIH testing framework", later named Specsy, which I have been working on slowly since then. Because none of the "after blocks" in Specs reliably ran after every test execution, I had to use shutdown hooks to write a hack which deletes the temporary CouchDB databases once the tests are finished and the JVM exits. We used to have hundreds of stale databases with randomly generated names, because the code which was supposed to clean up after an integration test was not being executed.
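
Such a hack could look roughly like this simplified sketch. The registry object and its methods are made up, but it assumes CouchDB's standard HTTP API, where DELETE /dbname drops a database, and a password-less server on localhost as in the project.

    import scala.collection.mutable.ListBuffer
    import java.net.{HttpURLConnection, URL}

    // Registers temporary test databases and deletes them in a JVM shutdown
    // hook, because the test framework's after-blocks were not reliably run.
    object TempDatabaseCleanup {
      private val createdDatabases = new ListBuffer[String]()

      Runtime.getRuntime.addShutdownHook(new Thread {
        override def run(): Unit = {
          createdDatabases.foreach(deleteDatabase)
        }
      })

      def register(databaseName: String): Unit = synchronized {
        createdDatabases += databaseName
      }

      private def deleteDatabase(name: String): Unit = {
        // assumes a password-less CouchDB on localhost
        val connection = new URL("http://localhost:5984/" + name)
          .openConnection().asInstanceOf[HttpURLConnection]
        connection.setRequestMethod("DELETE")
        connection.getResponseCode // force the request to be sent
        connection.disconnect()
      }
    }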

The test execution times also increased towards the end of the project. One problem was that Scala is slow to compile and the end-to-end tests did a full build with Maven, which took over a minute. Another (smaller) problem was that some of the meant-to-be unit tests were needlessly using the database when it should have been faked (IIRC, it took over 10 seconds to execute the non-end-to-end tests). Let's hope that the Scala compiler will be parallelized in the near future (at least it's on a TODO list), so that the compile times become more tolerable.
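
The usual fix for such tests is to hide the database behind a small interface and use an in-memory fake in the unit tests, so that only a few integration tests touch the real CouchDB. A minimal sketch of the idea (the trait and its methods are illustrative, not the project's actual code):

    import scala.collection.mutable

    // The production implementation would talk to CouchDB over HTTP;
    // unit tests use an in-memory fake instead.
    trait DocumentStore {
      def save(id: String, document: String): Unit
      def load(id: String): Option[String]
    }

    class InMemoryDocumentStore extends DocumentStore {
      private val documents = mutable.Map[String, String]()
      def save(id: String, document: String): Unit = documents(id) = document
      def load(id: String): Option[String] = documents.get(id)
    }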

All in all, I think the end-to-end tests were effective in finding problems with the design of the system and of the tests themselves. Writing good, reliable tests demands much from the development team. The system should now have quite good test coverage, so its development can continue - starting with some cleaning up of the design and improving the tests.