Wednesday, April 22, 2009

Test Code Is Just Code

Some Anti-Patterns to avoid and some techniques for making sure Test Code doesn't slow you down

Maintaining Test Code can become Costly
One of the key objections to Test Driven Development is that the tests will become a barrier to change: that the maintenance and reshaping of the tests needed as new business requirements come into play will become prohibitively expensive. There is evidence that this does happen. Over time many Agile teams spend more and more time fixing tests that have been broken by new requirements as opposed to working on those new requirements. I've seen too many teams in exactly this situation, and it is usually due to difficult-to-understand test code that is hard and costly to maintain.

We already know how to keep code in good shape
This problem has, however, been solved, and many high performing teams are able to introduce new functionality or make rapid changes to existing code without needing to start deleting tests or abandon the XP principle of TDD (Test Driven Development). The key is that they behave towards test code as they do towards any other code: Test Code is Just Code. All the things we have learnt about keeping production code in shape, through high discipline, techniques like refactoring, and the use of patterns, apply to test code. And yes, sometimes this even means writing tests to make sure our test code behaves as expected.

Common Traps
I want to say more later on why people seem to treat test code differently, but first I want to describe some of the common traps that I've seen people fall into and some ways of avoiding them. Here are five I see again and again. They can cause a lot of pain, and I've met developers who have been put off TDD by falling into them.

Cut & Paste
The first one is cut & paste. For some reason, when it comes to unit tests people suddenly start cutting and pasting all over the place, and you find a file with 20 tests each of which repeats exactly the same few lines of code. I don't think I need to describe why this is bad and I expect we've all seen the outcome: at some point later those tests all start breaking at the same time. If we are unlucky, a few tweaks have happened to the cut & pasted code in each one, so we spend a lot of effort figuring out how to make each one pass again. There is no rule that says we can't have methods in test fixtures, so ExtractMethod still applies, and using SetUp sensibly often helps. The same rationale for avoiding cut & paste, and the same solutions we know from production code, apply to test code.
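As a rough illustration of ExtractMethod and a sensible SetUp in a test fixture, here is a minimal sketch using Python's unittest. The Basket class and the add_standard_items helper are hypothetical names invented for this example; they are not taken from any real project.

```python
import unittest


class Basket:
    """Hypothetical production class, included only so the example runs."""

    def __init__(self):
        self.lines = []

    def add(self, name, price, quantity=1):
        self.lines.append((name, price, quantity))

    def total(self):
        return sum(price * quantity for _, price, quantity in self.lines)


class BasketTest(unittest.TestCase):
    def setUp(self):
        # Shared starting state lives in one place instead of in every test.
        self.basket = Basket()

    def add_standard_items(self):
        # Extracted helper: the lines that would otherwise be pasted into each test.
        self.basket.add("book", 10)
        self.basket.add("pen", 2, quantity=3)

    def test_total_of_standard_items(self):
        self.add_standard_items()
        self.assertEqual(self.basket.total(), 16)

    def test_total_after_adding_one_more_item(self):
        self.add_standard_items()
        self.basket.add("bag", 5)
        self.assertEqual(self.basket.total(), 21)
```

If the way items are added ever changes, only the helper needs to change, not every test. The tests can be run with python -m unittest.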

Poor Encapsulation
The classic example of a failure of encapsulation in test code is when you start to see database specific code creeping into every test. Perhaps you have "Naked SQL" in your tests? I'm going to focus on the database here, but the same ideas apply to other areas as well, for instance security, auditing and messaging.

The pain this trap causes often shows itself at the worst time, when you are optimising the database or trying to sort out a referential integrity issue. Suddenly the tests become a millstone around your neck and start breaking on a very frequent basis, and because of Cut & Paste this can often be a very large number of tests. I've seen teams avoid addressing things like foreign keys in the database because it has become too difficult to get the test code to create and delete things in an appropriate order.

The solution is to introduce proper encapsulation around the test code that looks after the database. The builder and fluent interface patterns are very useful here. If we can hide the implementation and separate concerns, we can also often open the door to more advanced testing techniques. Of course you'll want to make sure that builder code is properly tested, that it really puts things into the database in the order you expect and deletes them properly as well, but that should be work you only need to do once.
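To make the builder idea concrete, here is a small sketch using Python's built-in sqlite3 module and an in-memory database. The schema, the CustomerBuilder class and its method names are all assumptions made up for illustration. The point is only that the SQL and the create/delete ordering live in one place rather than in every test.

```python
import sqlite3

# Hypothetical two-table schema with a foreign key between the tables.
SCHEMA = """
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE purchase (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    item TEXT NOT NULL
);
"""


def open_test_database():
    connection = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    connection.execute("PRAGMA foreign_keys = ON")
    connection.executescript(SCHEMA)
    return connection


class CustomerBuilder:
    """Fluent builder that hides the SQL and the insert/delete order from tests."""

    def __init__(self, connection):
        self._connection = connection
        self._name = "Test Customer"
        self._items = []
        self._customer_id = None

    def named(self, name):
        self._name = name
        return self

    def with_purchase(self, item):
        self._items.append(item)
        return self

    def build(self):
        # Parent row first, then the child rows that reference it.
        cursor = self._connection.execute(
            "INSERT INTO customer (name) VALUES (?)", (self._name,))
        self._customer_id = cursor.lastrowid
        self._connection.executemany(
            "INSERT INTO purchase (customer_id, item) VALUES (?, ?)",
            [(self._customer_id, item) for item in self._items])
        return self._customer_id

    def delete(self):
        # Reverse order on the way out: children first, then the parent.
        self._connection.execute(
            "DELETE FROM purchase WHERE customer_id = ?", (self._customer_id,))
        self._connection.execute(
            "DELETE FROM customer WHERE id = ?", (self._customer_id,))
```

A test would then read something like CustomerBuilder(connection).named("Ada").with_purchase("book").build(), and a change to the foreign keys means changing the builder once rather than every test.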

Bloated SetUp
The next example is Bloated SetUp. Hopefully the name is self-evident: you end up with a SetUp method in your test fixture that becomes huge. As with any bloated method, it becomes code that is hard to understand and error prone. Perhaps changing the order of calls in SetUp fixes one test but causes five others to fail, and the reasons are just not obvious? If you've seen this then you are probably suffering from Bloated SetUp.

This pain point has two main causes. The first is poor coding practice such as described above, and the second is the structure of the production code itself. It is often dependencies within the production code that force us to do too much work in SetUp: we have to create all the things each class under test requires and, in turn, their dependencies as well. The pain grows as SetUp code then gets Cut & Pasted into other test fixtures.

As opposed to the first two examples, the solution here lies not just in our approach to the test code; we also need to look at the way the production code is structured. The first step is to make sure we use a well understood pattern such as DI (Dependency Injection) to express and understand the dependencies in the production code. The second step is to make use of stubbing or mocking techniques in our test code so that we can isolate the class under test and exercise it without needing to instantiate the whole tree of dependencies. A lot has been written on mocking and DI already, some of the best articles are here & here. Mocking has its own traps, not least of which is only having the real wire-up of all the components done for the first time in production (more later), but used well it is an invaluable technique for keeping test code easy to maintain and understand.
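A minimal sketch of the two steps, assuming constructor injection and Python's standard unittest.mock library. OrderService, its repository and its notifier are hypothetical names chosen for the example.

```python
import unittest
from unittest.mock import Mock


class OrderService:
    """Hypothetical class under test; its collaborators are passed in (constructor injection)."""

    def __init__(self, repository, notifier):
        self._repository = repository
        self._notifier = notifier

    def place(self, order_id, item):
        self._repository.save(order_id, item)
        self._notifier.order_placed(order_id)


class OrderServiceTest(unittest.TestCase):
    def setUp(self):
        # Two stand-ins replace what could otherwise be a whole tree of real dependencies.
        self.repository = Mock()
        self.notifier = Mock()
        self.service = OrderService(self.repository, self.notifier)

    def test_placing_an_order_saves_it_and_sends_a_notification(self):
        self.service.place(42, "book")
        self.repository.save.assert_called_once_with(42, "book")
        self.notifier.order_placed.assert_called_once_with(42)
```

SetUp stays at three lines however deep the real dependency tree is, because the class under test only ever sees the two collaborators it declares.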

Too hard to write the Test
The previous pain point described a situation where the solution lay in looking at both our test code and the production code. This one is about testing pain that comes entirely from the production code. Finding a test hard to write can tell us a lot about the production code, and a pain point people often refer to is "it's just too hard to write a test for that". My view on this is that code that is hard to test is poorly written code. To write a new unit test, and hence make a change to make it pass, the code we are testing needs to be easy to read and understand. If we can't work out which class to write the test against, that means we have poor separation of concerns or a lack of cohesion in our production code. If at first you can't write the new test then refactor the production code until you can; it's the production code that is at fault and not the TDD technique.

Code Integration Test Pain
The last example I want to talk about is integration testing. As this is such an overloaded term, I first need to describe what I mean by code level integration testing. In essence it is making sure that when we wire up all the components in our solution they behave as expected together. If we've used patterns like DI, a DI framework such as Spring and a test technique such as Mocking, then we need to make sure we are fully testing the wired up solution as well. Unfortunately this can get forgotten. Often we end up with a special test wire-up class or a Bloated SetUp doing the wire-up job. Either way there is a bug waiting to happen if the production wire-up code never gets called until actual deployment.

In essence SetUp for our code integration tests ought to be calling the production wire-up code. I often hear that this is impossible because the production wire-up code is hard wired to a specific environment. If that is the case the team is probably heading for trouble. Too many teams don't spend enough time thinking about how to handle configuration for multiple environments such as Dev, QA and Production. If you can solve that problem first then exercising the full production wire-up code should not be an issue: just point it at the appropriate environmental configuration.
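Here is one way that could look, as a sketch only. The wire_up function, the CONFIGURATIONS table and the classes are hypothetical, and a real project would more likely load its per-environment settings from files or a DI container's configuration.

```python
import unittest

# Hypothetical per-environment settings, kept inline so the example is self-contained.
CONFIGURATIONS = {
    "dev": {"audit_prefix": "[dev]"},
    "production": {"audit_prefix": "[production]"},
}


class AuditLog:
    def __init__(self, prefix):
        self.prefix = prefix
        self.entries = []

    def record(self, message):
        self.entries.append(f"{self.prefix} {message}")


class OrderService:
    def __init__(self, audit_log):
        self.audit_log = audit_log
        self.orders = {}

    def place(self, order_id, item):
        self.orders[order_id] = item
        self.audit_log.record(f"placed {order_id}")


def wire_up(environment):
    """Production wire-up: the one place where the components are connected."""
    config = CONFIGURATIONS[environment]
    return OrderService(AuditLog(prefix=config["audit_prefix"]))


class WireUpIntegrationTest(unittest.TestCase):
    def setUp(self):
        # The same wire-up code that production runs; only the configuration name differs.
        self.service = wire_up("dev")

    def test_wired_components_work_together(self):
        self.service.place(1, "book")
        self.assertEqual(self.service.orders, {1: "book"})
        self.assertEqual(self.service.audit_log.entries, ["[dev] placed 1"])
```

Because the test fixture has no wire-up logic of its own, a mistake in wire_up shows up in the test run rather than at deployment.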

Why do people treat test code differently?
Test Code is only throw away code if the production code is throw away.

I think people often think of test code as throw away: once the code is in production the tests become superfluous, so there is little use devoting time to keeping them in good shape. When people say this to me I like to ask if they plan on ever changing the production code or doing another release. The answer is always yes. Test Code is an intrinsic part of the solution. Like the launch tower for a rocket, it provides an essential stepping stone in the solution and one that it's worth expending time and effort to keep in good shape.

There are very few engineering disciplines where regular monitoring and testing do not form part of ongoing maintenance activities; in fact I can't think of a single one. We are lucky with software because, if we look after our tests, they are a very low cost way of making sure things can be evolved with minimal risk. We ought to treat testing as an intrinsic part of software engineering instead of a one-off 'gate keeper' activity or a problem for the QA team to solve. Test Code is as much a part of a modern software solution as the production code itself.

Wednesday, February 25, 2009

Words of wisdom from Ron Jeffries

Via Patrick Logan

Think you are doing Scrum and/or XP? Then read this.

My take away is that yet again people want to latch on to a silver bullet instead of working out what is really wrong with their organisation.

Acceptance Testing Silverlight with White

I've used White to acceptance test WPF applications a few times so was pleased to discover it just works with Silverlight. It seems that IE passes the accessibility (UI Automation) API calls into the contained Silverlight app, so to an outside caller the app just looks like extra controls within IE.

There were a few things I had to make sure were in place though:

1. You need to make sure you start IE in a way White can 'find'. If you just pass the URL as the executable name, IE (or your default browser) will start but White will not then be able to find that application and window. Obviously you'll need to alter the values to match your install and set up.

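// Adjust these three values to match your own install and application.
const string internetExplorer = @"C:\Program Files\Internet Explorer\IEXPLORE.EXE";
const string url = "http://localhost:1138/";
const string windowTitle = "silverlightApp - Microsoft Internet Explorer";

// Launch IE explicitly (ProcessStartInfo lives in System.Diagnostics) with the URL
// as its argument, so White owns the process and can find its window by title.
var ie = new ProcessStartInfo(internetExplorer, url);
var application = Core.Application.Launch(ie);
var window = application.GetWindow(windowTitle, Core.Factory.InitializeOption.NoCache);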

2. Make sure you set the automation name property.


<TextBox Name="textBox1" Text="some text" AutomationProperties.Name="textbox"/>


3. Use the same Name when you try to find the control.

var textBox = window.Get<TextBox>("textbox");

That is it. You'll need to make sure the names you use don't clash with the names of the IE controls. I'd suggest using GUI Spy for that, but I've not been able to find a working link to download that tool for about 6 months now.

I'm using IE version 6.0 and Silverlight 2. I've not tried this with any other browsers.

Tuesday, February 17, 2009

The Tyranny of corporate licences

I often hear enterprise architects say "we can use product X as it is free". As I know X is actually quite pricey, I ask "wow, XYZ is giving you X for free?" To which they reply "No, we have a corporate license and that comes out of someone else's budget."

So this is a special kind of free which I'd not come across before, one that actually means "free, as in we don't care about the cost to the company as a whole". This sort of individualism is revealed as doubly corrosive when you talk to the developers who find product X is way more complex than they need and actually increases development costs by requiring specialist skills and damaging developer productivity.

This disconnect between people who actually have to use software and those who make the purchasing decisions is not at all unusual. The fact there is a corporate license leads to some nasty behaviors as well: the software is argued to be "free" when someone raises cost as an objection, and "highly expensive" when someone questions how much value it really adds (i.e. an erroneous equation of expensive and high ROI).

The other thing that gets thrown into the mix is standards. I just ask people to go and look at whether the standards are being driven and defined by companies that make big complex enterprise software products. They can then draw their own conclusion on how much value those particular standards have and whether they truly help avoid lock-in.

All of these things allow an easy defense for those purchasers who failed to exercise due diligence in their decision, and create a situation that is almost impossible for the people who have to use those products day-to-day to challenge. I suggest projects that use a license pay a proportion of its costs and that any loss (or gain) the developers see is carefully tracked.

Before you comment you may want to check that your friendly enterprise software vendor hasn't placed you under a gagging order, no really, I'm not joking.

Tuesday, February 03, 2009

problems with ./configure and make on osx

I've been trying out various AMQP implementations, most still require a build from source on OSX.

So the usual

./configure

make


I'm sure others have seen this error before

/bin/sh: line 0: cd: someDirNameHere: No such file or directory

More of a reminder for me than anything else, the solution is usually:

export CONFIG_SHELL=/bin/bash
./configure
make



Monday, January 05, 2009

Five kinds of technical debt

Martin has described a concept called Technical Debt and this seems to have entered the vocabulary of our projects. It is a useful term but I worry it is being abused somewhat.

It seems that the term is being used to describe a multitude of evils. I've found it very useful to try and categorise the different kinds of debt. In fact I'd go further: I think it is dangerous not to understand the type of Technical Debt we are dealing with and, even more importantly, what the root cause of the Debt was (or continues to be).

The other thing I've seen with technical debt is that it can be transparent or opaque. Transparent debt is 'on the books' so to speak: we incur it knowingly and the team and the client are part of the decision. Opaque debt is off the books: either developers don't realise they are incurring debt or the team decides, for whatever reason, to hide the debt from the client.

Anyway, here are five different kinds of "Technical Debt". I'm sure we won't all agree with this list and it certainly isn't meant to be exhaustive, but I do think there is merit in better categorising and understanding these issues:

Ignore
Skip doing something we know is really needed. So take a quicker path ignoring, say, the need to build something horizontally scalable. Of course there are sensible ways to do this, such as splitting a story into two parts and playing another part later. Unfortunately this doesn't always happen, instead developers sometimes take a unilateral decision to ignore a requirement and hence incur debt that is not visible to the team as a whole (at least not until later). I've heard people use 'yagni' to justify this behaviour.

Get Surprised
Your client comes up with some new requirement that means a change in direction we cannot easily accommodate with existing code. This often shows itself as a 'code base of two halves': pre and post change of direction. We may live with two ways of doing things in the code base and this might never become debt at all, or at least not until some cross cutting requirement forces the issue.

Disregard some data
A classic example of this is build time, or more specifically the time it takes to run all the unit tests. Project teams often see this time starting to creep up, but it gets ignored until such time as it starts to really impact productivity. They are slowly incurring more debt and, as is the nature of debt, the longer you wait to repay it the tougher it gets.

Repeat a mistake
We incur Technical Debt, often without even knowing it, when we repeat previous mistakes. The one I see the most is Hibernate hiding from the developers that there is a database involved, to the extent they forget about the cost of a database call. I think this one could also be called 'ignore advice'. Either way we are storing up trouble.

Guess
This usually happens in relation to a non-functional requirement, such as performance, and again it is a largely hidden incurrence of debt. Basically we just guess that something will work as expected and don't bother to check; we just assume it will be ok. It is easy to avoid guessing, either by using a spike or just a 'back of the envelope' calculation.

And finally...

Phantom Debt
This is 'technical debt' as invoked by a developer who has decided they don't like part of the code base and hence want to rewrite it. Technical Debt certainly sounds better than 'egotistical refactoring session' ;-)

(Many Thanks to all my colleagues for the internal discussion and feedback I got for the first version of these ideas, no doubt we will continue to argue about this)

Friday, August 08, 2008

Don't delete unit tests even if you have acceptance test coverage

I think it is well understood that the sooner you find a defect the less it will cost. A defect costs the most once it is in production, whether that be a car that needs to be recalled or a software bug that stops your company's orders being processed.

If you use TDD and CI then you get to find some defects at the development stage where it is clearly a lot cheaper to fix them than finding those same bugs in QA, UAT or worse still in production.

What is really odd then is that people delete unit tests. While there can be legitimate reasons, such as code no longer being used, this is not why most tests get deleted. Some bogus reasons for deleting tests are:

a) "They keep failing when I change things" - Well that's because you just broke some existing functionality! And guess what: it's a lot cheaper to fix that defect now at development time than once it gets into production. Of course maybe someone else has to deal with production bugs so you could just be selfish and ignore the cost to them and your company and go ahead and delete that failing test.

b) "They are very hard to maintain" - This is more insidious and is a symptom of something that actually happens quite a lot: developers forget that tests are code too. So for some reason once developers are working on unit testing all those things we know are bad practice are allowed. Unsurprisingly the end result is poorly factored code that is hard to maintain. So should you delete those tests? No! You should do the same as you would when you find hard to maintain code anywhere else in the codebase (that is slowing you down), after all test code is still just code.

c) "I have coverage of that code from acceptance tests" - So what if you have coverage of that code from a higher level acceptance test? Is it ok to delete the unit test then? I think the answer must still be no, and the reason is to do with the cost of finding and fixing a defect. A while ago I was working on a big complex distributed application with great acceptance level tests but relatively poor coverage around the messaging layers. These acceptance tests failed frequently and developers would spend a lot of time trying to track down the problem. We added unit tests to flag those issues much earlier and in a far clearer way. So while we were adding test coverage to code that was already tested, we did so to lower the cost of finding and fixing defects.

Deleting test code should only be done if it will not increase the cost of finding a defect.

Thursday, July 24, 2008

Patterns for Concurrency over Concurrent Language Features

The Multi-core Problem
I continue to see lots of posts and articles about multicore, and rightly so: multicore CPUs present some real challenges for the software industry. I think by now most people get it: by default most software is written in a way that can only make use of one CPU. If I buy a new faster CPU that code will run faster, but if I buy an additional CPU or a multi-core CPU then that code will not run faster. This is because it will use only one of the available CPUs while the rest sit idle. In fact it might even run slower, as the individual cores in a multi-core CPU often run slower than a single core CPU.

At the moment this problem is masked to some degree. If you've a 2 CPU machine you probably run enough pieces of software at the same time such that, along with the operating system, both cores are kept reasonably busy. When we have 8, 16 or 32 cores the problem will be a lot more evident. I'm guessing the issue will really surface when large organizations start to upgrade their desktop machines and find out that instead of running faster, as has always been the case before, the software on those machines actually runs no faster or even slower. Not a good return on their investment.


Evolution or Revolution
A lot of people are suggesting ways in which we get around this problem, the approaches advocated seem to fit broadly in two categories.

Evolutionary - We need to learn how to use existing threading libraries and features properly from within existing programming languages.

Revolutionary - We need to start using different programming languages to write our code (e.g. Erlang).

What surprises me is the concentration on programming languages and language level features, I believe both approaches will fail unless we look at the architecture and patterns we use to deliver software.

Evolutionary
Let's look first at the evolutionary approach. Many people are recommending we learn how to use threads, locks, semaphores et al correctly, but this is no use if you just try and bolt multithreading onto patterns you would choose for a non-concurrent design. All too often everything is designed and written ignoring the need for concurrency, with the hope the bottlenecks can be tackled by adding a 'little bit' of concurrency later.

To give a concrete example, consider the observer pattern. All too often this is implemented in a way that means the order in which the observers are invoked is fixed. A common approach to 'bolting on' concurrency to a non-concurrent design is to service each observer on a different thread. In fact this often looks like it's working when tested on a single core machine (where the order is deterministic) but soon becomes a source of bugs once it's running on a multi-core machine and the order in which the observers are called can change.

Another example is taking work from a queue: why not just have multiple threads consuming from the queue? If you've designed for this you'll be fine, but often that does not happen, and it turns out that when the order in which items are taken from the queue changes, the outcome changes, e.g. the cancellation of an order overtaking the creation of that order. So the cancel is processed first and discarded (the order does not exist yet!) and then the order creation arrives...
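One possible way of designing for this, sketched here under my own assumptions rather than as the only answer, is to give each worker thread its own queue and route every event for a given order to the same worker. Events for different orders can then be processed concurrently while events for one order keep their relative order. OrderBook, process and the event tuples are hypothetical names for this example.

```python
import queue
import threading

STOP = object()  # sentinel telling a worker there is no more work


class OrderBook:
    """Hypothetical shared state: open orders, plus cancels that arrived for unknown orders."""

    def __init__(self):
        self._lock = threading.Lock()
        self.open_orders = set()
        self.rejected_cancels = []

    def apply(self, action, order_id):
        with self._lock:
            if action == "create":
                self.open_orders.add(order_id)
            elif order_id in self.open_orders:
                self.open_orders.remove(order_id)
            else:
                # A cancel that overtook its create would end up here.
                self.rejected_cancels.append(order_id)


def process(events, worker_count=4):
    book = OrderBook()
    inboxes = [queue.Queue() for _ in range(worker_count)]

    def worker(inbox):
        while True:
            event = inbox.get()
            if event is STOP:
                return
            book.apply(*event)

    threads = [threading.Thread(target=worker, args=(inbox,)) for inbox in inboxes]
    for thread in threads:
        thread.start()
    for action, order_id in events:
        # Same order id -> same queue, so a cancel can never overtake its create.
        inboxes[order_id % worker_count].put((action, order_id))
    for inbox in inboxes:
        inbox.put(STOP)
    for thread in threads:
        thread.join()
    return book
```

With a single shared queue and several consumers, the create and the cancel for one order could be picked up by different threads and applied in either order. Here each order's events stay in a single FIFO queue, so the cancel is always applied after the create.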

Concurrency is also something that is hard to compartmentalise and isolate to one part of software, the impact of concurrency tends to be pervasive. So you need to design for multi-core from the beginning and choose an architecture and patterns that create a suitable abstraction around the concurrency required for your problem domain. You may then find someone has already created or documented approaches and patterns that meet your need and you never actually need to start creating your own threads, locks, etc. This is where choosing from existing concurrency libraries comes in.

Revolutionary
So what about the revolutionary approach? I think here it's even clearer we need to change the patterns and architecture we use. I really hope people don't think that just by coding in Erlang, say, software will magically scale to multiple cores. What's different about Erlang is that it has language features that provide really great support for a particular architectural approach. If you ignore those features and just write sequential code and design the solution as you always have, you should not be surprised when it doesn't scale to many cores.

Patterns
So the evolutionary and revolutionary approaches are both valid if we adjust the architectures and the patterns we use to take account of concurrency. If we do that we'll create software that can better scale to multiple cores. If we try and ignore concurrency when making design decisions and instead expect languages or language features to just solve the problem for us we'll be no better off than we are today.

There is a reason we don't normally write assembly code directly, we have much higher level abstractions and languages. Using threads directly is not that dissimilar to assembly language, it directly exposes the underlying hardware level implementation. Therefore we should not be surprised that using threads directly is difficult and often unproductive, we need higher level abstractions for concurrency. Some languages come with features that make using certain patterns for concurrency much easier to implement than others, Erlang is the obvious example. Other languages have libraries that provide implementations of those same patterns, for example Retlang. Other people have written code that scales incredibly well across multiple CPUs in C and C++ as well. It's not about the language, it's about having a design that takes account of concurrency and then choosing patterns that provide suitable abstractions around it.
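As a small illustration of a higher level abstraction, here is a sketch using Python's standard concurrent.futures module, where the calling code never creates a thread or a lock itself. The word_count and count_all functions are hypothetical. Note that in CPython a thread pool will not by itself spread CPU-bound work across cores; ProcessPoolExecutor offers the same interface for that case.

```python
from concurrent.futures import ThreadPoolExecutor


def word_count(text):
    # Independent unit of work: no shared state, so it is safe to run concurrently.
    return len(text.split())


def count_all(documents, max_workers=4):
    # The executor owns the threads; map() hands back results in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(word_count, documents))
```

The design decision that matters is making each unit of work independent. Once that is true, the choice of pool or language becomes a detail that can be changed later.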

Conclusion
When it comes to software we have multiple patterns available and we try and choose the best ones for the problem at hand. We need to develop and use similar patterns to help us with concurrency, and in fact many exist already. Many of the patterns described by http://www.enterpriseintegrationpatterns.com/ play equally well at the software level as they do at the message level, and many patterns in EDA (e.g. Event Collaboration Patterns) also lend themselves to concurrent solutions. There are plenty of other examples.

It's here, at the level of patterns, that we should be focusing our efforts around solving the multi-core problem, and not at the language level.