2010-02-13

Three Styles of Naming Tests

I have now used TDD for about 3 years, during which time I've come to notice three different styles of naming and organizing tests. In this article I'll explain the differences between what I call specification-style, example-style and implementation-style.

Tests as a specification of the system's behaviour

Specification-style originates from Behaviour-Driven Development (BDD) and it's the style that I use 90% of the time. It can be found among practitioners of BDD (for some definition of BDD), so it could also be called "BDD-style". However, just using a BDD framework does not mean that you write your tests in this style. For example the examples on Cucumber's front page (no pun intended) are more in example-style than in specification-style. (By the way, I don't buy into the customer-facing requirements-analysis side of BDD, because in my opinion interaction design is much better suited for it.)

In specification-style the tests are considered to be a specification of the system's behaviour. The test names should be sentences which describe what the system should do - what the system's features are. Just by reading the names of the tests, it should be obvious what the system does, even to such an extent that somebody could implement a similar system just by looking at the test names and not their bodies.

When a test fails, there are three options: (1) the implementation is broken and should be fixed, (2) the test is broken and should be fixed, (3) the test is no longer needed and should be removed. If the test has been written in specification-style, then knowing what to do is simple. Just read the name of the test and decide whether that piece of behaviour is still needed. If it is, then you keep the same test name, but change the implementation or test code. If it is not, for example if the specified behaviour conflicts with some new desired behaviour, then you can remove the test and double-check all other tests in the same file, in case some of them should also be updated.

Here are some examples of specification-style tests (using Go and GoSpec). A test for Fibonacci numbers could look like this:

func FibSpec(c gospec.Context) {
    fib := NewFib().Sequence(10)

    c.Specify("The first two Fibonacci numbers are 0 and 1", func() {
        c.Expect(fib[0], Equals, 0)
        c.Expect(fib[1], Equals, 1)
    })
    c.Specify("Each remaining number is the sum of the previous two", func() {
        for i := 2; i < len(fib); i++ {
            c.Expect(fib[i], Equals, fib[i-1] + fib[i-2])
        }
    })
}

If you look at the Wikipedia entry for Fibonacci numbers, you will notice that the above test names are directly taken from there. This is how Wikipedia defines the Fibonacci numbers: "By definition, the first two Fibonacci numbers are 0 and 1, and each remaining number is the sum of the previous two. Some sources omit the initial 0, instead beginning the sequence with two 1s." The test names should document the same specification.

Each test focuses on a single piece of behaviour

One more example, in the same language and same framework, this time on stacks (ignore the comments for now). This is how I typically organize my tests:

func StackSpec(c gospec.Context) {
    stack := NewStack()

    c.Specify("An empty stack", func() { // Given

        c.Specify("is empty", func() { // Then
            c.Expect(stack.Empty(), IsTrue)
        })
        c.Specify("After a push, the stack is no longer empty", func() { // When, Then
            stack.Push("foo")
            c.Expect(stack.Empty(), IsFalse)
        })
    })

    c.Specify("When objects have been pushed onto a stack", func() { // Given, (When)
        stack.Push("one")
        stack.Push("two")

        c.Specify("the object pushed last is popped first", func() { // (When), Then
            x := stack.Pop()
            c.Expect(x, Equals, "two")
        })
        c.Specify("the object pushed first is popped last", func() { // (When), Then
            stack.Pop()
            x := stack.Pop()
            c.Expect(x, Equals, "one")
        })
        c.Specify("After popping all objects, the stack is empty", func() { // When, Then
            stack.Pop()
            stack.Pop()
            c.Expect(stack.Empty(), IsTrue)
        })
    })
}

(Note that GoSpec isolates the child specs from their siblings, so that they can safely mutate common variables. This was one of the design principles for GoSpec, and it enables the framework to be used the way that I prefer writing specification-style tests. The other important ones are: allow unlimited nesting of tests, and do not force the test names to begin or end with some predefined word.)

Each test typically has three parts: Arrange, Act, Assert. In BDD vocabulary they are often identified by the words Given, When, Then.

I've found it useful to arrange the tests so that the Arrange and Act parts are in the parent fixture, with multiple Asserts each in its own test. Organizing the tests like this follows the spirit of the One Assertion Per Test principle (more precisely, one concept per test). When each test tests only one behaviour, the reason for a test failure becomes obvious. When a test fails, you will know exactly what is wrong (it isolates the reason for failure) and you will know whether the behaviour specified by the test is still needed, or whether it is obsolete and the test should be removed.
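
The arrangement described above can be sketched without any test framework. The following is only a minimal illustration and not how GoSpec is implemented: the Stack type, the Spec struct and the RunSpecs helper are hypothetical names introduced for this sketch. The Arrange and Act parts live in one fixture function which is run again for every child, so each child holds a single assertion and cannot disturb its siblings.

```go
package main

import "fmt"

// Stack is a hypothetical minimal stack, defined only for this sketch.
type Stack struct{ items []string }

func (s *Stack) Push(x string) { s.items = append(s.items, x) }

func (s *Stack) Pop() string {
	last := len(s.items) - 1
	x := s.items[last]
	s.items = s.items[:last]
	return x
}

func (s *Stack) Empty() bool { return len(s.items) == 0 }

// Spec is a hypothetical pairing of a behaviour sentence with one assertion.
type Spec struct {
	Name  string
	Check func(s *Stack) bool
}

// RunSpecs runs the shared fixture again for every spec, so that sibling
// specs may mutate the stack freely. It returns the names of failed specs.
func RunSpecs(fixture func() *Stack, specs []Spec) []string {
	var failed []string
	for _, spec := range specs {
		if !spec.Check(fixture()) {
			failed = append(failed, spec.Name)
		}
	}
	return failed
}

// PushedStack is the shared Arrange/Act part (Given, When).
func PushedStack() *Stack {
	s := &Stack{}
	s.Push("one")
	s.Push("two")
	return s
}

// PushedStackSpecs holds one assertion (Then) per behaviour.
func PushedStackSpecs() []Spec {
	return []Spec{
		{"the object pushed last is popped first", func(s *Stack) bool {
			return s.Pop() == "two"
		}},
		{"the object pushed first is popped last", func(s *Stack) bool {
			s.Pop()
			return s.Pop() == "one"
		}},
		{"after popping all objects, the stack is empty", func(s *Stack) bool {
			s.Pop()
			s.Pop()
			return s.Empty()
		}},
	}
}

func main() {
	failed := RunSpecs(PushedStack, PushedStackSpecs())
	fmt.Println("When objects have been pushed onto a stack - failed specs:", failed)
}
```

If one of these checks fails, the returned name is the behaviour sentence itself, which is the point of keeping one assertion per spec.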

Quite often I use the words Given, When and Then in the test names, because they are part of BDD's ubiquitous language. But I always put more emphasis on making the tests readable and choosing the best possible words. So when it is obvious from the sentence, I may choose to

  • omit the Given/When/Then keywords,
  • group the Given and When parts together,
  • group the When and Then parts together, or even
  • group all three parts together.

In the above stack example, I have marked with comments which of the specs is technically a Given, When or Then. As you can see, there is a distinct structure, but also much flexibility. The "should" word I dropped a long time ago, after my second TDD project, because it was just adding noise to the test names without adding value. The value is in focusing on the behaviour, not in using some predefined words.

The specification should be decoupled from the implementation

Specification-style tests focus on the desired behaviour, or feature, at the problem domain's level of abstraction, and try to be as decoupled from the implementation as possible. The tests should not contain any implementation details (for example method names and parameters), because those implementation details are what will be designed after the test's name has been written. If the test's name already fixes the use of some implementation details (for example whether a method accepts null parameters), then refactoring the code will be harder, because it will force us to update the tests. Coupling tests to the implementation leads to implementation-style tests.

When the tests focus on the desired behaviour, then when refactoring, you won't need to change the name of the test, but only the body of the test (when refactoring affects the implementation's public interface). If you're doing a rewrite, then you may even be able to reuse the old test names - which is helpful, because thinking of the test name is what usually takes the most time in writing a test, because that is when you think about what the system should do. (If choosing the name does not take the most time, then you're not thinking about it enough, or your test code is too complex, which is a test smell indicating that the production code is too complex.)

For example have a look at SequentialDatabaseAccessSpec and ConcurrentDatabaseAccessSpec. These are tests which I wrote 1½ years ago and in the near future the subsystem that those tests specify will be rewritten, as the application's architecture will be changed from being based on shared-state to message-passing and also the programming language will be mostly changed from Java to Scala. Here are the names of those tests:

SequentialDatabaseAccessSpec:

When database connection is opened
  - the connection is open
  - only one connection exists per transaction
  - connection can not be used after prepare
  - connection can not be used after commit
  - connection can not be used after rollback
  - connection can not be used after prepare and rollback

When entry does not exist
  - it does not exist
  - it has an empty value

When entry is created
  - the entry exists
  - its value can be read

When entry is updated
  - its latest value can be read

When entry is deleted
  - it does not exist anymore
  - it has an empty value


ConcurrentDatabaseAccessSpec:

When entry is created in a transaction
  - other transactions can not see it
  - after commit new transactions can see it
  - after commit old transactions still can not see it
  - on rollback the modifications are discarded
  - on prepare and rollback the locks are released

When entry is updated in a transaction
  - other transactions can not see it
  - after commit new transactions can see it
  - after commit old transactions still can not see it
  - on rollback the modifications are discarded
  - on prepare and rollback the locks are released

When entry is deleted in a transaction
  - other transactions can not see it
  - after commit new transactions can see it
  - after commit old transactions still can not see it
  - on rollback the modifications are discarded
  - on prepare and rollback the locks are released

If two transactions create an entry with the same key
  - only the first to prepare will succeed
  - only the first to prepare and commit will succeed

If two transactions update an entry with the same key
  - only the first to prepare will succeed
  - only the first to prepare and commit will succeed
  - the key may be updated in a later transaction

If two transactions delete an entry with the same key
  - only the first to prepare will succeed
  - only the first to prepare and commit will succeed

When the above components are rewritten using a new architecture, new language and different programming paradigm, most of those test names will stay the same, because they are based on the problem domain of transactional database access, and not any implementation details such as the architecture, programming language, or *gasp* individual classes and methods.

In the above tests there will be only minor changes:

  • The first fixture of SequentialDatabaseAccessSpec may be removed, or moved to some different test, because in the new architecture opening a database connection will be quite different (implicit instead of explicit). Actually it should have been put into its own test class, named DatabaseConnectionSpec, when it was first written, because its focus is very different from that of the rest of SequentialDatabaseAccessSpec.
  • In ConcurrentDatabaseAccessSpec, the test saying "on prepare and rollback the locks are released" will be removed, because the new architecture will not need any locks. The use of locks is an implementation detail and these specs were not fully decoupled from it.

What is a "unit test"?

The above example also raises the question about the size of a "unit" in a unit test. For me "a unit" is always "a behaviour". It never is "a class" or "a method" or similar implementation detail. Although following the Single Responsibility Principle often leads to one class dealing with one behaviour, that is a side-effect of following SRP and not something that would affect the way the tests are structured.

For those two test classes in the above example, the tests exercise about 15 concrete production classes (excluding JDK classes) from 2 subsystems (transactions and database). The lower-level components among those 15 classes have been tested individually (the transaction subsystem has its own tests, as do the couple of data structures which were used as stepping stones for the in-memory database), because the higher-level tests will not cover the lower-level components thoroughly, and in any case I wrote those tests to drive the design of the lower-level components with TDD. So the new code produced by those two test classes is about 5 production classes (originally it was about 3 production classes, but they were split to follow SRP).

From TDD's point of view, it's very important to be able to run all tests quickly, in a couple of seconds (more than 10-20 seconds will make TDD painful). On my machine SequentialDatabaseAccessSpec takes about 15 ms to execute (1.2 ms/test) and ConcurrentDatabaseAccessSpec about 120 ms (5.5 ms/test). I prefer tests which execute in 1 ms or less. If it takes much longer, then I'll try to decouple the system so that I can test it in smaller parts using test doubles. So to me the "unit" in a "unit test" is one behaviour, with the added restriction that its tests can be executed quickly.

More on specification-style

To learn more about how to write tests in specification-style, do the tutorial at http://github.com/orfjackal/tdd-tetris-tutorial and also have a look at its reference implementation and tests.

Update 2010-03-17: I just started reading Growing Object-Oriented Software, Guided by Tests and I'm happy to notice that the authors of that book also prefer specification-style. In chapter 21 "Test Readability", page 249, under the subheading "Test Names Describe Features", they have the following test names:

ListTests:

- holds items in the order they were added
- can hold multiple references to the same item
- throws an exception when removing an item it doesn't hold

If you notice more books which use specification-style, please leave a comment.

Update 2015-11-01: Also the excellent presentation What We Talk About When We Talk About Unit Testing by Kevlin Henney recommends specification-style.

Tests as examples of system usage

Example-style is perhaps the most popular among TDD'ers, maybe because many books, tutorials and proponents of TDD use this style. It's also quite easy to name tests in example-style, and the result is still much better than implementation-style. I use this style maybe 10% of the time, usually to cover a corner case that I find too hard to name in specification-style, or for which the specification-style name would be too verbose without adding value as documentation. For some situations one style fits better than the other.

In example-style the tests are considered to be examples of system usage, or scenarios of using the system. The test names tell what the scenario is, and you will need to read the body of the test to find out how the system will behave in that scenario. The test names are not a direct specification; to arrive at the specification, you will need to read the tests and reverse-engineer and generalize the behaviour exercised by those tests.

A famous example which is written in example-style is Uncle Bob's Bowling Game Kata. There the test names are:

testGutterGame
testAllOnes
testOneSpare
testOneStrike
testPerfectGame

Now that you have read the test names, can you tell me the scoring rules of bowling? You can't? Exactly! That is what sets example-style apart from specification-style. In example-style you would need to reverse-engineer the scoring rules of bowling from the test code. In specification-style the test names would tell the scoring rules directly.

My domain knowledge about bowling is not good enough for me to write good specification-style tests for it, but they might look something like the ones below. I took the scoring rules from the Bowling Game Kata's page 2 and reworded them somewhat.

The game has 10 frames

In each frame the player has 2 opportunities (rolls) to knock down 10 pins

When the player fails to knock down some pins
  - the score is the number of pins knocked down

When the player knocks down all pins in two tries
  - he gets spare bonus: the value of the next roll

When the player knocks down all pins on his first try
  - he gets strike bonus: the value of the next two rolls

When the player does a spare or strike in the 10th frame
  - he may roll an extra ball to complete the frame
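
To make the contrast concrete, here is a rough sketch of how some of the above sentences could be attached to executable checks. Everything in it is an assumption made for illustration: the Score and Game functions and the Spec struct are hypothetical and are not taken from the Bowling Game Kata, and Score expects the rolls of one complete game.

```go
package main

import "fmt"

// Score is a hypothetical scorer for the rolls of one complete game.
func Score(rolls []int) int {
	score, i := 0, 0
	for frame := 0; frame < 10; frame++ {
		switch {
		case rolls[i] == 10: // strike: bonus is the value of the next two rolls
			score += 10 + rolls[i+1] + rolls[i+2]
			i++
		case rolls[i]+rolls[i+1] == 10: // spare: bonus is the value of the next roll
			score += 10 + rolls[i+2]
			i += 2
		default: // open frame: just the pins knocked down
			score += rolls[i] + rolls[i+1]
			i += 2
		}
	}
	return score
}

// Game pads the given opening rolls with gutter balls up to 21 rolls,
// which is the maximum number of rolls in one game.
func Game(rolls ...int) []int {
	game := make([]int, 21)
	copy(game, rolls)
	return game
}

// Spec is a hypothetical pairing of a scoring rule with one example game.
type Spec struct {
	Name  string
	Rolls []int
	Want  int
}

// BowlingSpecs names each check after the rule it specifies.
func BowlingSpecs() []Spec {
	return []Spec{
		{"when the player fails to knock down some pins, the score is the number of pins knocked down",
			Game(3, 4), 7},
		{"when the player knocks down all pins in two tries, he gets spare bonus: the value of the next roll",
			Game(6, 4, 3), 16},
		{"when the player knocks down all pins on his first try, he gets strike bonus: the value of the next two rolls",
			Game(10, 3, 4), 24},
		{"when the player does a spare in the 10th frame, he may roll an extra ball to complete the frame",
			append(make([]int, 18), 7, 3, 5), 15},
	}
}

func main() {
	for _, spec := range BowlingSpecs() {
		status := "ok"
		if Score(spec.Rolls) != spec.Want {
			status = "FAILED"
		}
		fmt.Println(status, "-", spec.Name)
	}
}
```

Reading only the names printed by this sketch already gives the scoring rules, whereas a name such as testOneSpare would require reading the body.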

Here is another example of example-style, this time from JUnit's AssertionTest class.

arraysExpectedNullMessage
arraysActualNullMessage
arraysDifferentLengthMessage
arraysDifferAtElement0nullMessage
arraysDifferAtElement1nullMessage
arraysDifferAtElement0withMessage
arraysDifferAtElement1withMessage
multiDimensionalArraysAreEqual
multiDimensionalIntArraysAreEqual
oneDimensionalPrimitiveArraysAreEqual
oneDimensionalDoubleArraysAreNotEqual
...

This nicely shows a situation where example-style is useful: corner cases. In English it would be possible to describe the behaviour specified by AssertionTest with one sentence. Even though there are lots of corner cases, they all are semantically very similar. Writing these tests in specification-style would be impractically verbose. Here are the specification-style tests for a generic assertion:

When the expected and actual value are equal
  - the assertion passes and does nothing

When the expected and actual value differ
  - the assertion fails and throws an exception
  - the exception has the actual value
  - the exception has the expected value
  - the exception has an optional user-defined message

Repeating that specification for every corner case is not practical, because it would just be more verbose but without any added documentation value. That's why in this case it would make more sense to write one use case with specification-style and the rest of the use cases in example-style (this is how I did it with GoSpec's matchers). Or since this particular problem domain is quite simple, just leave out the specifications and use only example-style.
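
As a rough sketch of that mix, the corner cases can be kept as a table of named examples next to a single assertion function. This is not the code of GoSpec's matchers or of JUnit: CheckEqual, Example and FailedExamples are hypothetical names invented for this illustration, and the failure messages are made up.

```go
package main

import "fmt"

// CheckEqual is a hypothetical assertion for int slices. It returns nil when
// the slices are equal, and otherwise an error describing the first difference.
func CheckEqual(expected, actual []int) error {
	if len(expected) != len(actual) {
		return fmt.Errorf("lengths differ: expected %d, actual %d", len(expected), len(actual))
	}
	for i := range expected {
		if expected[i] != actual[i] {
			return fmt.Errorf("element %d differs: expected %d, actual %d", i, expected[i], actual[i])
		}
	}
	return nil
}

// Example is one example-style case: a scenario name, its inputs, and the
// failure message that is expected ("" means that the assertion passes).
type Example struct {
	Name     string
	Expected []int
	Actual   []int
	Message  string
}

// Examples lists the corner cases; the names tell the scenario only.
func Examples() []Example {
	return []Example{
		{"emptySlicesAreEqual", []int{}, []int{}, ""},
		{"oneElementSlicesAreEqual", []int{1}, []int{1}, ""},
		{"slicesDifferentLength", []int{1}, []int{1, 2}, "lengths differ: expected 1, actual 2"},
		{"slicesDifferAtElement0", []int{1, 2}, []int{3, 2}, "element 0 differs: expected 1, actual 3"},
		{"slicesDifferAtElement1", []int{1, 2}, []int{1, 3}, "element 1 differs: expected 2, actual 3"},
	}
}

// FailedExamples returns the names of the examples whose outcome differs
// from the expected failure message.
func FailedExamples(examples []Example) []string {
	var failed []string
	for _, ex := range examples {
		got := ""
		if err := CheckEqual(ex.Expected, ex.Actual); err != nil {
			got = err.Error()
		}
		if got != ex.Message {
			failed = append(failed, ex.Name)
		}
	}
	return failed
}

func main() {
	fmt.Println("failed examples:", FailedExamples(Examples()))
}
```

Adding one more corner case is then a single line in the table, which is why the verbosity of specification-style is not worth it here.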

Tests reflecting the implementation of the system

Implementation-style is typical in test-last codebases and with people new to TDD who are still thinking about the implementation before the test. I never use this style. It was only in my very first TDD project that I also wrote implementation-style tests (for example it had tests for setters), but at least I knew about BDD and was aware of my shortcomings and tried to aim for specification-style. (It took about one year and seven projects to fine-tune my style of writing tests, after which I wrote tdd-tetris-tutorial.)

In implementation-style the tests are considered to be verifying the implementation - i.e. the tests are considered to be just tests. There is a direct relation from the implementation classes and methods to the test cases. By reading the test names you will be able to guess what methods a class has.

Typically the test names start with the name of the method being tested. Since nearly always more than one test case is needed to cover a method, people tend to append the method parameters to the test name, or append a sequential number, or *gasp* put all test cases into one test method.

As an example of implementation-style, here are some of the test cases from Project Darkstar's TestDataServiceImpl class. I know for sure that Darkstar has been written test-last, in addition to which most of its tests are integration tests (it takes 20-30 minutes to run them all, which makes it painful for me to make my changes with TDD).

testConstructorNullArgs
testConstructorNoAppName
testConstructorBadDebugCheckInterval
testConstructorNoDirectory
...
testGetName
testGetBindingNullArgs
testGetBindingEmptyName
testGetBindingNotFound
testGetBindingObjectNotFound
testGetBindingAborting
testGetBindingAborted
testGetBindingBeforeCompletion
testGetBindingPreparing
testGetBindingCommitting
testGetBindingCommitted
...

From the above test names it's possible to guess that there is a class DataServiceImpl which has a constructor which takes as parameters at least an app name, a debug check interval and some directory. It's not clear which values are valid for them and whether null arguments are allowed or not. We can also guess that the DataServiceImpl class has methods getName and getBinding, the latter of which probably takes a name as parameter. With getBinding it's possible that "something is not found" or "an object is not found". The getBinding method's behaviour also appears to depend on the state of the current transaction. It's not clear how it should behave in any of those states.

Implementation-style is bad compared to example-style and specification-style, because implementation-style is not useful as documentation - it does not tell how the system should behave or how to use it - which in turn makes it hard to know what to do when a test fails. Also implementation-style couples the tests to the implementation, which makes it hard to refactor the code; if you rename a method, you need to also rename the tests. If you do big structural refactorings, you must rewrite the tests. And when you rewrite the tests, the old tests are of little benefit in knowing which new tests to write.

Summary

Specification-style test names describe how the system will behave in different situations. By reading the test names it will be possible to implement the system. When a test fails, the test name will tell which behaviour is specified by that test, after which it's possible to decide whether that test is still needed. The test names use the problem domain's vocabulary and do not depend on implementation details.

Example-style test names describe which special cases or scenarios the system should handle. You will need to read the body of the test to find out how the system should behave in those situations.

Implementation-style test names tell what methods and classes the system has. It will be very hard to find out from the tests which situations the system should handle and how it should behave in those situations. Refactorings require you to also change the tests.

8 comments:

  1. Interesting post, thanks.

    I am not sure how to classify the approach advocated by Roy Osherove in 'The Art of Unit Testing'. He recommends that we use test method names of the following form:

    [method-name]_[state-under-test]_[expected-behaviour]

    Where the items in square brackets are defined as follows:

    [method-name]: the name of the method you are testing

    [state-under-test]: the condition used to produce the expected behaviour

    [expected-behaviour]: What you expect the tested method to do under the specified conditions

    e.g.: isValidFileName_validFile_returnTrue, factorial_three_returnsSix

    The test method names we get if we apply the guideline match your criteria for implementation-style tests, but Roy Osherove says the following about the guideline:

    Removing even one of these parts from a test name can cause the reader of a test to wonder what is going on, and to start reading the test code. Our main goal is to release the next developer from the burden of reading the test code in order to understand what the test is testing. If the developer sticks to this naming convention, it will be easy for the other developers to jump in and understand the tests.

  2. That seems like implementation-style coupled with method-level documentation. It might work in simple situations, but in more complex situations where the expected behaviour is produced by multiple methods and objects, it will be hard to fill in the method name part of that form of names. For example, how would you use that style to name the tests in the following two test classes?

    http://github.com/orfjackal/specsy/blob/7f79b7c0c131516077de/src/test/scala/net/orfjackal/specsy/CapturingOutputTest.scala
    http://github.com/orfjackal/specsy/blob/7f79b7c0c131516077de/src/test/scala/net/orfjackal/specsy/ExecutionModelTest.scala

  3. One could of course argue that those two test classes which I mentioned are testing nearly the whole system, so according to somebody's terminology they are not "unit tests". Although according to the Growing Object-Oriented Software book's terminology they are not integration tests, because they don't use any third party code (except the standard Java library), and neither are they end-to-end tests (those tests don't exercise the full application). Acceptance test might be the right word.

    Here is an example which should qualify as "unit test" by everybody, because the system under test is only one class. Is it possible to write equally informative test names for the below class using Roy Osherove's style?

    http://github.com/orfjackal/specsy/blob/7f79b7c0c131516077de/src/main/scala/net/orfjackal/specsy/core/OutputCapturer.scala
    http://github.com/orfjackal/specsy/blob/7f79b7c0c131516077de/src/test/scala/net/orfjackal/specsy/core/OutputCapturerTest.scala

  4. Great article.

    After doing TDD for several years, I've internalized many things. This article restores my ability to explain one reason why test-last is inferior.

    I'm also newly introduced to BDD and ATDD. This article helps me understand the substantive difference between my familiar example-style and the specification-style. The structural difference is obvious, but now I'm starting to grok what all the BDD fuss is about.

  5. I really enjoyed this article. Very well written. Thanks for putting it together.

  6. Thanks for this great post. Learned a lot from it.
    Olaf

  7. Really liked this post, well explained.
    I really share the thoughts on this subject.

  8. Hi Esko!

    This was a very well put post. It clearly explained why naming is an important consideration for tests as well. This will definitely go down as a reference bookmark.
