Interaction-based testing and stable expectations

Yesterday I participated in a discussion about how to test a particular class (it was actually an interview question). Here is the class under test:

public class ClassUnderTest
{
    private SomeService service;

    public ClassUnderTest(SomeService service)
    {
        this.service = service;
    }

    public void Foo(int fooInput)
    {
        if (fooInput > 0)
            this.service.DoSomething();
    }
}

What SomeService does is not relevant here; it just does something. As you can see, ClassUnderTest is not very cooperative when it comes to testing its functionality: it exposes no properties, and its only method returns nothing. How can we test it?
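SomeService was left unspecified in the discussion, so for the samples below let’s assume some minimal shape for it, for instance a class with a virtual method (virtual because Rhino Mocks, used further down, can only intercept interface or virtual members):

public class SomeService
{
    // Hypothetical placeholder; what it actually does is irrelevant here.
    public virtual void DoSomething()
    {
    }
}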

It looks like the only validation option left is to check that “something” is done, and the only way to check it is through interaction testing. Here are a couple of tests that achieve this (I used the “method-condition-outcome” naming convention, but the names could be rephrased in BDD style):

[Test]
public void Foo_PositiveInput_SomethingIsDone()
{
    // Arrange
    var service = MockRepository.GenerateMock<SomeService>();
    var cut = new ClassUnderTest(service);

    // Act
    cut.Foo(1);

    // Assert
    service.AssertWasCalled(s => s.DoSomething());
}

[Test]
public void Foo_NonPositiveInput_SomethingIsNotDone()
{
    // Arrange
    var service = MockRepository.GenerateMock<SomeService>();
    var cut = new ClassUnderTest(service);

    // Act
    cut.Foo(0);

    // Assert
    service.AssertWasNotCalled(s => s.DoSomething());
}

During the discussion I expressed a lack of enthusiasm about this solution, noting that I tend to use state-based testing whenever possible. But is that possible in the example above? And what is the disadvantage of the interaction-based tests in this case?

Let’s have a closer look. First, the above tests essentially duplicate the internal logic of the Foo implementation. If we merged the two tests into one, as sketched below, the logic of the assert part would reflect pretty much what’s inside the implementation of Foo: “for positive input service.DoSomething should be called, otherwise not”.
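Here is what such a merged test might look like, sketched with NUnit’s TestCase attribute; notice that the assert block restates the if (fooInput > 0) branch of Foo almost literally:

[TestCase(1, true)]
[TestCase(0, false)]
public void Foo_Input_SomethingIsDoneOnlyForPositiveInput(int input, bool callExpected)
{
    // Arrange
    var service = MockRepository.GenerateMock<SomeService>();
    var cut = new ClassUnderTest(service);

    // Act
    cut.Foo(input);

    // Assert: mirrors the branching inside Foo itself
    if (callExpected)
        service.AssertWasCalled(s => s.DoSomething());
    else
        service.AssertWasNotCalled(s => s.DoSomething());
}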

Since the logic of the tests follows the logic of the code and was probably written by the same developer at the same time, there is very little in the tests that validates the actual functionality. If I believe that Foo should call DoSomething for positive input, I will most likely get it right in the test that validates that Foo calls DoSomething for positive input. But what if my assumption is wrong? What if DoSomething should be called only for input greater than 100, and for other input values Foo should call DoSomethingElse? No, these tests can’t validate that. I’d say they can’t validate it by design, because they are designed to reflect the internal logic of the method they validate. So the correctness of the Foo implementation must be verified by other means, and these tests will ensure the implementation stays correct assuming the original algorithm is right.

And this becomes the main value of such tests: they serve as regression tests, protecting correct code from being improperly changed (perhaps by another developer who had to extend the original code and didn’t get it right). However, this sort of protection is not refactoring-friendly. Let’s see why.

Imagine that the SomeService class is refactored: DoSomething is still there, but its semantics have changed and no longer fit the implementation of Foo. Instead, Foo is supposed to call a new method, DoSomethingElse:

public void Foo(int fooInput)
{
    if (fooInput > 0)
        this.service.DoSomethingElse();
}

The system works properly with the new change, but one of the tests for ClassUnderTest now fails! Of course, the test has no knowledge of what is right or wrong semantically; it only cares that a certain method is called. Depending on the system’s complexity, tracking down the cause of the failure can take some time, and even if it’s an easy fix, we are left with a test that fails for the wrong reason.

In their article “Mock Roles, not Objects”, Steve Freeman et al. warn about mirroring target code logic in the tests: “Some uses of Mock Objects set up behaviour that shadows the target code exactly, which makes the tests brittle. This is particularly common in tests that mock third-party libraries. The problem here is that the mock objects are not being used to drive the design, but to work with someone else’s. At some level, mock objects should shadow a scenario for the target code, but only because the design of that code should be driven by the test. Complex mock setup for a test is actually a hint that there is a missing object in the design.”

So interaction-based tests can be more brittle, as we saw in our example. But what are the alternatives in that example? If we don’t verify that DoSomething is called for positive Foo input, then what else can we verify?

To try to answer this question, let’s first figure out what kind of testing we are dealing with: unit or integration. For black-box unit testing, I am afraid there is not much to validate: no properties are exposed, and the only method is void. If you are a code coverage junkie, you can maybe write a couple of tests validating that the class can be instantiated and used without raising unexpected exceptions, but the value of such work is outside the scope of this blog post. Let’s focus instead on integration testing. How can we validate that the class does what it should?
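For completeness, such a coverage-driven test would be trivially short; it exercises the code path but asserts next to nothing about behaviour:

[Test]
public void Foo_NonPositiveInput_DoesNotThrow()
{
    // Smoke test only: instantiation and invocation raise no exceptions
    var cut = new ClassUnderTest(MockRepository.GenerateMock<SomeService>());
    Assert.DoesNotThrow(() => cut.Foo(0));
}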

As soon as we leave the idealistic constraints of black-box unit testing and focus on validating overall functionality, we no longer need to express the validation goal in terms of ClassUnderTest methods, let alone their internal implementation. We focus on the outcome of Foo in terms of what it does to the rest of the system. Does it change data in a database, send an email, draw a graph on a screen?

In most cases business logic operations result in some (persistent or in-memory) state change. We can then strive to catch these changes and use them in the test assertions. This helps increase test trustworthiness and tolerance to refactoring. However, determining the state changes that correspond to a certain action is not always easy, and even when it is, it may increase test complexity: you will have to add code that retrieves the state that is expected to change. And apart from this, there are still scenarios where the result of an action is not easily converted into state available for validation in test code. Nat Pryce, in an old blog post, gave an example of such a scenario: tests for a graphical simulator. Really, how would you verify that a cell was drawn on a display using state-based testing? Other scenarios that may fall into this category include calling external services, sending messages, etc. They may (and usually do) leave traces that can be used for state-based testing, but the tests become complex and overloaded with state retrieval details. Clearly, interaction-based tests may have their place here.
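Still, when the changed state is within reach, a state-based test can be straightforward. Here is a sketch under an assumed scenario: suppose DoSomething persisted a record through a repository that SomeService receives as a dependency (the IRecordRepository interface and the SomeService constructor below are both assumptions made for illustration). A hand-rolled in-memory fake lets the test assert on the resulting state rather than on which methods were called:

public interface IRecordRepository
{
    void Save(string record);
    int Count { get; }
}

// In-memory fake: real enough to capture state, simple enough for tests.
public class InMemoryRecordRepository : IRecordRepository
{
    private int count;

    public void Save(string record) { this.count++; }

    public int Count { get { return this.count; } }
}

[Test]
public void Foo_PositiveInput_RecordIsPersisted()
{
    // Arrange
    var repository = new InMemoryRecordRepository();
    var cut = new ClassUnderTest(new SomeService(repository));

    // Act
    cut.Foo(1);

    // Assert on the resulting state, not on the calls made along the way
    Assert.AreEqual(1, repository.Count);
}

A test like this survives renaming DoSomething during refactoring: it only cares that a record ends up persisted.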

So what can we test with interaction-based tests? Mark Seemann, in his forthcoming book “Dependency Injection in .NET”, classifies dependencies as either stable or volatile. According to him, stable dependencies “are already there, tend to be very backwards compatible and invoking them have deterministic outcomes”. An example of a stable dependency is the .NET BCL. “Other examples may include specialized libraries that encapsulate algorithms relevant to your application. If you are developing an application that deals with chemistry, you may reference a third-party library that contains chemistry-specific functionality.” Volatile dependencies, on the other hand, do not provide a sufficient foundation for applications.

I believe separating things into stable and volatile is also helpful when deciding what to test using interaction-based testing; we only need to replace the word “dependency” with “expectation”. We can use interaction-based testing as long as our expectations are stable; otherwise the test becomes brittle. So, coming back to the original tests for Foo, we should ask ourselves: is DoSomething a stable expectation for the given scenario, or is it only valid for the current implementation and unlikely to survive refactoring? If the latter, the expectation is volatile, and we should inspect the call graph, find a stable expectation, and use it in the interaction-based tests. Examples of stable expectations are calls to external services and APIs, and output to a display. As long as the feature is unchanged, the system will make the same outbound calls and draw the same pictures on the screen. These activities are stable and can become foundations for interaction-based tests.

So how should the original Foo test look if we find that DoSomething internally sends an email? It could look like this:

[Test]
public void Foo_Should_Send_Notification_Mail_On_Positive_Input()
{
    // Arrange
    var mailServer = MockRepository.GenerateMock<MailServer>();
    // The mock must be injected so that SomeService actually uses it;
    // we assume here that SomeService accepts it as a dependency.
    var cut = new ClassUnderTest(new SomeService(mailServer));

    // Act
    cut.Foo(1);

    // Assert
    mailServer.AssertWasCalled(s => s.SendMail());
}

Note that we no longer care about Foo’s exact implementation; that implementation is volatile. We only expect something that is stable in the scope of the given feature, like sending mail. (More precise test code could verify that the mail was sent to the right person with the right subject, but we should be careful not to overspecify the expectations, otherwise they will no longer be stable.)
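For illustration, with Rhino Mocks such a check could be expressed through argument constraints; the SendMail signature used here is an assumption:

// Assumed signature: SendMail(string recipient, string subject, string body)
mailServer.AssertWasCalled(s => s.SendMail(
    Arg<string>.Is.Equal("user@example.com"),
    Arg<string>.Is.Equal("Something was done"),
    Arg<string>.Is.Anything));

Every constraint added this way is one more detail that can legitimately change, so each should itself be a stable part of the feature.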

When I was about to finish this post, I found another old blog post, by Jeremy D. Miller, where he also uses email sending as an example of an operation that is worth testing with interaction-based tests: “We could use a state-based strategy to test the email. We could run the test, then run around and ask the expected recipients to check their inbox. […] You could also check an audit trail, but that’s not the real functionality being tested. The easiest approach in the case of the email tests is to use interaction testing to verify that the email was sent to the SMTP service.”

So I believe interaction-based testing can be an efficient way to validate that operations result in proper actions, as long as the tests don’t focus on volatile interactions and instead select stable expectations related to the feature itself rather than its implementation details.
