Friday, August 08, 2008

Don't delete unit tests even if you have acceptance test coverage

I think it is well understood that the sooner you find a defect, the less it will cost. A defect costs the most once it reaches production, whether that is a car that needs to be recalled or a software bug that stops your company's orders from being processed.

If you use TDD and CI, you get to find some defects at the development stage, where it is clearly a lot cheaper to fix them than in QA, UAT or, worse still, production.

What is really odd, then, is that people delete unit tests. While there can be legitimate reasons, such as the code no longer being used, this is not why most tests get deleted. Some bogus reasons for deleting tests are:

a) "They keep failing when I change things" - Well, that's because you just broke some existing functionality! And guess what: it's a lot cheaper to fix that defect now, at development time, than once it gets into production. Of course, maybe someone else has to deal with production bugs, so you could just be selfish, ignore the cost to them and your company, and go ahead and delete that failing test.

b) "They are very hard to maintain" - This is more insidious and is a symptom of something that actually happens quite a lot: developers forget that tests are code too. For some reason, once developers are writing unit tests, all the things we know to be bad practice are allowed. Unsurprisingly, the end result is poorly factored code that is hard to maintain. So should you delete those tests? No! You should do what you would do with any hard-to-maintain code elsewhere in the codebase that is slowing you down and refactor it; after all, test code is still just code.

c) "I have coverage of that code from acceptance tests" - So what if you have coverage of that code from a higher-level acceptance test? Is it OK to delete the unit test then? I think the answer must still be no, and the reason is to do with the cost of finding and fixing a defect. A while ago I was working on a big, complex distributed application with great acceptance-level tests but relatively poor unit test coverage around the messaging layers. These acceptance tests failed frequently and developers would spend a lot of time trying to track down the problem. We added unit tests to flag those issues much earlier and in a far clearer way. So while we were adding test coverage to code that was already tested, we did so to lower the cost of finding and fixing defects.

Deleting test code should only be done if it will not increase the cost of finding a defect.

Thursday, July 24, 2008

Patterns for Concurrency over Concurrent Language Features

The Multi-core Problem
I continue to see lots of posts and articles about multicore, and rightly so, as multicore CPUs present some real challenges for the software industry. I think by now most people get it: by default, most software is written in a way that can only make use of one CPU. If I buy a new, faster CPU, that code will run faster, but if I buy an additional CPU or a multi-core CPU, that code will not run faster. This is because it will use only one of the available CPUs while the rest sit idle. In fact, it might even run slower, as the individual cores in a multi-core CPU often run slower than a single-core CPU.

At the moment this problem is masked to some degree: if you have a two-CPU machine, you probably run enough pieces of software at the same time that, along with the operating system, both CPUs are kept reasonably busy. When we have 8, 16 or 32 cores, the problem will be a lot more evident. I'm guessing the issue will really surface when large organizations start to upgrade their desktop machines and find that, instead of running faster as has always been the case before, the software on those machines runs no faster or even slower. Not a good return on their investment.


Evolution or Revolution
A lot of people are suggesting ways to get around this problem, and the approaches advocated seem to fall broadly into two categories.

Evolutionary - We need to learn how to use existing threading libraries and features properly from within existing programming languages.

Revolutionary - We need to start using different programming languages to write our code (e.g. Erlang)

What surprises me is the concentration on programming languages and language-level features. I believe both approaches will fail unless we also look at the architecture and patterns we use to deliver software.

Evolutionary
Let's look first at the evolutionary approach. Many people are recommending we learn how to use threads, locks, semaphores et al. correctly, but this is no use if you just try to bolt multithreading onto patterns you would choose for a non-concurrent design. All too often everything is designed and written ignoring the need for concurrency, in the hope that the bottlenecks can be tackled by adding a 'little bit' of concurrency later.

To give a concrete example, consider the observer pattern: all too often it is implemented in a way that means the observers are always invoked in the same fixed order, and the observers come to rely on that order. A common approach to 'bolting on' concurrency to a non-concurrent design is to service each observer on a different thread. This often looks like it's working when tested on a single-core machine (where the order usually turns out the same every time), but it soon becomes a source of bugs once it runs on a multi-core machine and the order in which the observers are called can change.
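One way to design for this, rather than bolting threads on afterwards, is to make the lack of ordering between observers explicit. The minimal Java sketch below assumes a hypothetical `ConcurrentSubject` class (not an existing library type): each observer gets its own single-threaded queue, so observers run concurrently with one another, but each observer still sees events in the order they were published.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical subject, for illustration only. Each observer gets its own
// single-threaded "mailbox": observers run concurrently with one another,
// but every observer receives events in the order they were published.
// Nothing is promised about ordering *between* observers, so observers
// must be written not to depend on each other.
final class ConcurrentSubject<T> implements AutoCloseable {
    private final List<Consumer<T>> observers = new ArrayList<>();
    private final List<ExecutorService> mailboxes = new ArrayList<>();

    synchronized void subscribe(Consumer<T> observer) {
        observers.add(observer);
        mailboxes.add(Executors.newSingleThreadExecutor());
    }

    // Holding the lock while enqueuing means every observer sees the same
    // sequence of events, even if several threads publish at once.
    synchronized void publish(T event) {
        for (int i = 0; i < observers.size(); i++) {
            Consumer<T> observer = observers.get(i);
            mailboxes.get(i).execute(() -> observer.accept(event));
        }
    }

    // Stop accepting events and wait for queued notifications to drain.
    @Override
    public synchronized void close() {
        mailboxes.forEach(ExecutorService::shutdown);
        try {
            for (ExecutorService mailbox : mailboxes) {
                mailbox.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With this design, publishing 1 and then 2 delivers 1 before 2 to every observer, but one observer may finish handling 2 before another has even started on 1; the pattern states that up front instead of leaving it to be discovered on a multi-core machine.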

Another example is taking work from a queue: why not just have multiple threads consuming from it? If you've designed for this you'll be fine, but often that has not happened, and it turns out that when the order in which items are taken from the queue changes, the outcome changes, e.g. the cancellation of an order overtakes the creation of that order. So the cancel is processed first and discarded (the order does not exist yet!) and then the order creation arrives...
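A common pattern for this case is to partition the queue by key: messages for different orders are processed in parallel, but all messages for one order go to the same single-threaded consumer, so a cancel can never overtake its create. The following is a minimal Java sketch under those assumptions; `OrderEvent`, `OrderBook` and `PartitionedDispatcher` are hypothetical names made up for illustration, and the sketch assumes the messages for a given order reach the dispatcher in the right order in the first place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical message: a "create" or "cancel" action on one order.
final class OrderEvent {
    final String orderId;
    final String action;

    OrderEvent(String orderId, String action) {
        this.orderId = orderId;
        this.action = action;
    }
}

// Hypothetical handler: a cancel for an order it has not seen yet is
// discarded, which is exactly the bug described above.
final class OrderBook implements Consumer<OrderEvent> {
    final Map<String, String> status = new ConcurrentHashMap<>();
    final List<String> discardedCancels = new CopyOnWriteArrayList<>();

    @Override
    public void accept(OrderEvent e) {
        if (e.action.equals("create")) {
            status.put(e.orderId, "open");
        } else if (status.containsKey(e.orderId)) {
            status.put(e.orderId, "cancelled");
        } else {
            discardedCancels.add(e.orderId);
        }
    }
}

// Hypothetical dispatcher: several worker threads, but every event for a
// given order is routed to the same single-threaded worker, so events for
// one order are handled in the order they were submitted.
final class PartitionedDispatcher implements AutoCloseable {
    private final List<ExecutorService> workers = new ArrayList<>();
    private final Consumer<OrderEvent> handler;

    PartitionedDispatcher(int partitions, Consumer<OrderEvent> handler) {
        for (int i = 0; i < partitions; i++) {
            workers.add(Executors.newSingleThreadExecutor());
        }
        this.handler = handler;
    }

    void submit(OrderEvent event) {
        // floorMod keeps the index non-negative for negative hash codes.
        int partition = Math.floorMod(event.orderId.hashCode(), workers.size());
        workers.get(partition).execute(() -> handler.accept(event));
    }

    // Stop accepting work and wait for every partition to drain.
    @Override
    public void close() {
        workers.forEach(ExecutorService::shutdown);
        try {
            for (ExecutorService worker : workers) {
                worker.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The design choice here is to give up global ordering, which this domain does not need, in exchange for parallelism, while keeping per-order ordering, which it does need.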

Concurrency is also hard to compartmentalise and isolate to one part of the software; its impact tends to be pervasive. So you need to design for multi-core from the beginning and choose an architecture and patterns that create a suitable abstraction around the concurrency required by your problem domain. You may then find that someone has already created or documented approaches and patterns that meet your needs, and that you never actually need to create your own threads, locks, etc. This is where choosing from existing concurrency libraries comes in.

Revolutionary
So what about the revolutionary approach? I think here it's even clearer that we need to change the patterns and architecture we use. I really hope people don't think that just by coding in Erlang, say, their software will magically scale to multiple cores. What's different about Erlang is that it has language features that provide really great support for a particular architectural approach. If you ignore those features, write sequential code and design the solution as you always have, you should not be surprised when it doesn't scale to many cores.

Patterns
So the evolutionary and revolutionary approaches are both valid if we adjust the architectures and patterns we use to take account of concurrency. If we do that, we'll create software that scales better to multiple cores. If we try to ignore concurrency when making design decisions and instead expect languages or language features to just solve the problem for us, we'll be no better off than we are today.

There is a reason we don't normally write assembly code directly: we have much higher-level abstractions and languages. Using threads directly is not that dissimilar to writing assembly language, as it directly exposes the underlying hardware-level implementation. So we should not be surprised that using threads directly is difficult and often unproductive; we need higher-level abstractions for concurrency. Some languages come with features that make certain concurrency patterns much easier to implement than others, Erlang being the obvious example. Other languages have libraries that provide implementations of those same patterns, for example Retlang for .NET. Other people have also written code in C and C++ that scales incredibly well across multiple CPUs. It's not about the language; it's about having a design that takes account of concurrency and then choosing patterns that provide suitable abstractions around it.

Conclusion
When it comes to software, we have multiple patterns available and we try to choose the best ones for the problem at hand. We need to develop and use similar patterns to help us with concurrency, and in fact many exist already. Many of the patterns described at http://www.enterpriseintegrationpatterns.com/ work equally well at the software level as they do at the messaging level, and many patterns in event-driven architecture (EDA), e.g. Event Collaboration patterns, also lend themselves to concurrent solutions. There are plenty of other examples.

It's here, at the level of patterns, that we should be focusing our efforts to solve the multi-core problem, not at the language level.

Tuesday, July 22, 2008

Tools: Enablement or Control

I've used a lot of tools during my software development career, and it seems they fall primarily into two categories: tools that seek to control and tools that seek to enable.

Enabling Tools
  • Concentrate on addressing one key issue rather than the whole development life cycle
  • Don't dictate a particular work flow or way of working
  • Are often written by people who use them
  • Require a degree of trust between end users (so someone can ignore the agreed approach if they want)
  • Tend to be quick and easy to use
  • Are open and easy to integrate with other tools
  • Are adaptable to many situations and ways of working
  • Leading to: The ability to adapt to changing situations and constantly refine the process, resulting in incremental improvements in productivity over time
Controlling Tools
  • Come with a built in vision on how a whole series of activities should be done
  • Dictate work flow and the ordering of activities
  • Assume no trust between end users (you must follow the tool's defined approach)
  • Tend to be complex and hard to use
  • Only work in one situation: either you change to work their way or you spend all your time fighting the tool
  • Are often closed, proprietary and hard to integrate with other tools
  • Leading to: A lack of flexibility and consequently much frustration when trying to respond to changing circumstances; ultimately this constrains the productivity of the team
Another thing I've noticed over the years is that no two projects are alike and no one approach fits every situation. If you don't have trust and good working relationships among your team members, then the tools are the least of your worries.

So I prefer enabling tools to controlling tools: I'd much rather trust people and be flexible than constrain both the people and the process.

Monday, July 21, 2008

Marketing Driven Development

Marketing Driven Development is a process where the features and priorities for a new release of software are driven primarily by a marketing department. A feature is valued more for the marketing campaign that can be built around it than for any usefulness or value it may have for an end user.

Marketing Driven Development is primarily a reaction to the fear of commoditization.

Its consequences often include:
  • Well-known bugs (or, to use the correct term, "sub-optimal features") remain for many versions, as fixing them is never a priority
  • Usability does not improve, as this is rarely seen as valuable by marketing, and anyway the product was sold as easy to use the first time around
  • The software tends to be expensive to buy, maintain and customize; in other words, Marketing Driven Development does seem to succeed in avoiding commoditization.

Thursday, July 17, 2008

.NET: A fluent Facade around System.CodeDom

I've been doing quite a lot of work with System.CodeDom recently (more on exactly what later) and have frankly found the API quite hard to use.

So I've found myself growing a fluent interface facade around the API. For example, the following code creates a class with one method that always returns the integer four:

[TestFixture]
public class TestsForWriteUp
{
    [Test]
    public void testShouldCreateClassWithOneMethod()
    {
        ClassBuilder builder = new ClassBuilder(new ResolveTypeFromAssemblies());
        builder.AddMethod("theMethod", typeof (int)).Number(4).Return();

        Compiler compiler = new Compiler("CodeGenTests.dll");
        ReturnsAnInt dynamicallyBuildCode = compiler.compile<ReturnsAnInt>(builder);
        Assert.AreEqual(4, dynamicallyBuildCode.theMethod());
    }
}

public interface ReturnsAnInt
{
    int theMethod();
}

So I get hold of a ClassBuilder, add a method to it, then add the number four and finally return. The reason things look a bit back to front is that the facade uses a stack under the covers, so what I'm really saying is:
  • Push an expression (a constant integer with the value four) onto the stack, and then
  • Pop whatever is at the top of the stack and return it.
Finally, I take the builder and dynamically compile its contents into an assembly, in this case in a way that gives me back an instance of a particular type that I can immediately start working with. (I can also save the output to a DLL if needed.)

The new ResolveTypeFromAssemblies() injected into the ClassBuilder allows it to resolve any types that are used; in this case it just checks the loaded assemblies. The "CodeGenTests.dll" passed into the compiler lets it know that the generated code depends upon that assembly (here, for the ReturnsAnInt interface).
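To make the stack-based style concrete, here is a minimal sketch of how such a builder might work internally. It is written in Java, is not the facade described in this post, and emits C#-like source strings rather than System.CodeDom objects; `StackCodeBuilder` and its methods are hypothetical names chosen to mirror the calls above (`ret()` stands in for `Return()`, since `return` is a Java keyword).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical builder for illustration only: not the post's facade, and it
// produces C#-like source text instead of System.CodeDom objects.
// Expression calls push onto a stack; operator and statement calls pop their
// operands off it, as in Reverse Polish Notation.
final class StackCodeBuilder {
    private final Deque<String> stack = new ArrayDeque<>();
    private final List<String> statements = new ArrayList<>();

    // Push a constant integer expression.
    StackCodeBuilder number(int value) {
        stack.push(Integer.toString(value));
        return this;
    }

    // Push a reference to a variable, parameter or field.
    StackCodeBuilder varUsage(String name) {
        stack.push(name);
        return this;
    }

    // Pop the right operand, then the left one, and push the combined expression.
    StackCodeBuilder binaryOperator(String op) {
        String right = stack.pop();
        String left = stack.pop();
        stack.push("(" + left + " " + op + " " + right + ")");
        return this;
    }

    // Pop the value, then the target, and emit an assignment statement.
    StackCodeBuilder varAssignment() {
        String value = stack.pop();
        String target = stack.pop();
        statements.add(target + " = " + value + ";");
        return this;
    }

    // Pop whatever is on top of the stack and emit a return statement.
    StackCodeBuilder ret() {
        statements.add("return " + stack.pop() + ";");
        return this;
    }

    // The statements emitted so far, in order.
    List<String> statements() {
        return new ArrayList<>(statements);
    }
}
```

For example, `new StackCodeBuilder().number(4).ret()` produces the single statement `return 4;`, and `varUsage("a").varUsage("b").binaryOperator("+").ret()` produces `return (a + b);`. Because each operator only ever looks at the top of the stack, no call needs to hold references to the expressions it combines, which is one reason a stack keeps this kind of builder simple.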

Actually, that first example is not very interesting; how about a slightly more complex one:

[Test]
public void testShouldCreateAddMethod()
{
    // builder and compiler are assumed to be fields, set up as in the first test
    MethodBuilder methodBuilder = builder.AddMethod("theMethod", typeof (int));
    methodBuilder.DeclareParameter(typeof (int), "a").DeclareParameter(typeof (int), "b");
    methodBuilder.VarUsage("a").VarUsage("b").BinaryOperator(CodeBinaryOperatorType.Add).Return();

    TakesTwoArgs takesTwoArgs = compiler.compile<TakesTwoArgs>(builder);
    Assert.AreEqual(3, takesTwoArgs.theMethod(1, 2));
    Assert.AreEqual(9, takesTwoArgs.theMethod(4, 5));
}

public interface TakesTwoArgs
{
    int theMethod(int a, int b);
}

Still pretty simple: declare a method, add two parameters, sum them and then return the result. Again, note the back-to-front invocation style. This is considerably less code than you'd need to accomplish the same thing using the System.CodeDom classes directly.

The next example shows the use of a field, a constructor with an argument, and how to create an instance using that constructor. Hopefully it is much terser than raw CodeDom but still easy to understand.

[Test]
public void testShouldCreateConstructorAndField()
{
    string fieldName = "seed";
    builder.DeclareVariable(typeof (int), fieldName);
    builder.AddConstructor().DeclareParameter(typeof (string), "input")
        .FieldUsage(fieldName).VarUsage("input").UnaryOperator(typeof(int), "Parse").VarAssignment();
    builder.AddMethod("theMethod", typeof (int)).FieldUsage(fieldName).Return();

    Parameter argument = new Parameter(typeof(string), "42");
    ReturnsAnInt returnsAnInt = compiler.compile<ReturnsAnInt>(builder, argument);
    Assert.AreEqual(42, returnsAnInt.theMethod());
}

Again things look back to front; if you read the line ending in .VarAssignment() from right to left, you'll probably find it easier to see what is happening. I use a stack because it greatly simplifies the underlying code: in effect, I'm using Reverse Polish Notation.

Finally, here is the C# version of the code that is compiled:

namespace GeneratedCode {
    using System;
    using CodeGen;


    public class DynamicallyCompiledCode : CodeGen.CompiledBase, CodeGenTests.TestsForWriteUp.ReturnsAnInt, CodeGen.ICompiled {

        public int seed;

        public DynamicallyCompiledCode(string input) {
            this.seed = int.Parse(input);
        }

        public virtual int theMethod() {
            return this.seed;
        }
    }
}

I'd be interested in comments: is this something people would find useful? It grew out of a formula compiler, a dynamic proxy generator and a compiler compiler, so it is reasonably complete. I'll be posting more soon about the compiler compiler and how it uses this facade around CodeDom to bootstrap itself. (BTW, I never meant to write the compiler compiler; it just grew out of some work I've been doing around DSLs.)

Wednesday, April 23, 2008

Cutting the Cost of Change over Just Cutting Cost

Many businesses talk about cutting cost, especially at the moment, and IT is often an easy target. The problem with focusing on just cutting cost is that you can increase your future cost of change.

The most obvious example is letting go of people who are experts in your legacy systems. You certainly cut cost, as you are no longer paying their salaries, but at the same time you've just constrained your ability to deliver changes to those legacy systems.

Now, if a business has been focused on cutting cost for a while, chances are that any initiatives to pull functionality out of legacy systems (where change might be costly) into newer systems (where, hopefully, change is cheaper) will not have happened.

So you've been cutting opex, which funds the ability to change existing systems, and often also cutting capex, which funds the creation of new systems or the movement of existing functionality into systems that require less opex.

So the end result of cost cutting is that the cost of change increases over time: you are removing the capability to change systems while at the same time restricting the ability to replace them. Eventually you reach a point where the cost of change is high enough that you can't make the business case for even small changes. At that point you might decide to invest in replacing the legacy systems that have become so expensive to change, but chances are that you no longer understand what those systems do and that replacing them is beyond what the business can fund. Even if you can afford to replace them, what do you do in the meantime? It might take months or years to build the replacement.

So what is the alternative? One might be to focus on cutting the cost of change as opposed to just cutting cost. That means investing incrementally to move functionality off legacy systems into new systems, or keeping software in reasonable shape in the first place. Consider viewing systems as things that constantly evolve, as opposed to things you grow to a certain point and then throw over the fence into support. Think about architectural approaches that reduce coupling and dependencies between systems. These are just suggestions and there are a lot of other things you could do; chances are the people on the ground will have some good ideas about where to start.

If you can cut the cost of change, you can get more done for less money, which is itself cutting your cost. The cost of change may well be the thing to measure and drive down over time, as opposed to just cost per se.

Tuesday, April 08, 2008

Great post on "Bad Behaviour" in projects

These all ring very true.