Thursday, November 30, 2006

Dangers of Serialization

I get quite worried when I see serialization used for messaging; it tends to make the solution fragile and harder to change. My concerns partly flow on from the earlier post about schemas giving a false sense of security. When you have a schema there is always a temptation to auto-generate a class from it and then just use serialization. There are a few issues with this:
  • Schema changes require the class to be regenerated and the code recompiled
  • It's harder to work out which fields from the message you actually care about on the subscriber side
  • There are more cases where changes in the schema break the code (or rather cause the deserialization to stop working)
  • There is also a temptation to just serialize domain objects and start passing those around instead of defining proper business documents (there are so many reasons why this is bad that it's a topic for a whole other post)
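To make the fragility point concrete, here is a minimal Python sketch. The `Order` class stands in for a class generated from an order schema, and both it and `deserialize_order` are hypothetical. The binding is deliberately strict: every child element must map onto a field. Real binding tools differ, and many can be told to ignore unknown elements, so treat this as an illustration of the failure mode rather than of any particular library.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Order:
    """Hypothetical stand-in for a class generated from the order schema."""
    id: str
    price: str


def deserialize_order(xml_text):
    """Strict binding: every child element must map onto a field of Order."""
    root = ET.fromstring(xml_text)
    # Each child becomes a keyword argument, so an unexpected element
    # (or a missing one) makes the Order constructor raise TypeError.
    return Order(**{child.tag: child.text for child in root})


v1 = "<order><id>42</id><price>9.99</price></order>"
v2 = "<order><id>42</id><price>9.99</price><currency>GBP</currency></order>"

print(deserialize_order(v1))  # Order(id='42', price='9.99')
try:
    deserialize_order(v2)
except TypeError as err:
    print("v2 rejected:", err)
```

Adding an optional `<currency>` element, a change that is harmless to a subscriber that only needs the price, is enough to make this deserializer fail; removing or renaming a field fails in the same way.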
I much prefer the 'get named field' approach, where the subscriber extracts only the fields it cares about. At face value it seems like more work, but there are quite a lot of benefits:
  • You can ask your message subscribers to write tests for fetching the values they care about. Then you know exactly which parts of the message each subscriber cares about, and whether recent changes have broken anything. (More on testing and Schematron soon.)
  • Subscribers are far less exposed to schema changes: reordered elements, extra fields and/or extra elements don't break anything, as you can still just get the parts of the message you care about
  • You can improve performance later on by pre-fetching or caching values, so you don't hit the underlying message every time, rather than deserializing the whole message up front
  • I'd also argue you achieve better encapsulation since, when using serialization, the fact that this mechanism is involved tends to 'creep' into other parts of the code, i.e. people start passing the serialized objects around instead of just using them for messaging.
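A minimal sketch of the 'get named field' style, using Python's standard `xml.etree.ElementTree`. The `OrderView` class and its field names are hypothetical. The subscriber pulls out only `id` and `price` by name and caches each value on first access, which is one possible way to do the pre-fetching/caching mentioned in the list.

```python
import xml.etree.ElementTree as ET


class OrderView:
    """Hypothetical subscriber-side view of an order message.

    Reads only the fields this subscriber cares about, by name, and caches
    each value the first time it is asked for.
    """

    def __init__(self, xml_text):
        self._root = ET.fromstring(xml_text)
        self._cache = {}  # element path -> text (None if the element is absent)

    def _get(self, path):
        if path not in self._cache:
            self._cache[path] = self._root.findtext(path)
        return self._cache[path]

    @property
    def order_id(self):
        return self._get("id")

    @property
    def price(self):
        return self._get("price")


# Elements are reordered and there is an extra <currency>; neither matters here.
msg = "<order><currency>GBP</currency><price>9.99</price><id>42</id></order>"
view = OrderView(msg)
print(view.order_id, view.price)  # 42 9.99
```

Because the lookups are by name, the reordered message with an extra element still yields the values this subscriber needs, and a missing field shows up as `None` rather than as a failed parse of the whole message.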
Of course this assumes a document-centric approach to integration and messaging, as opposed to RPC, say. It does seem people are using Web Services to re-implement RPC-style integration over a new technology stack, but that rant will have to wait for another day...

Finally, having said all of that, if you have messages (or parts of messages) which hardly ever change, and when they do change you expect to make code changes anyway, then serialization can probably help.

Schema Validation offers a false sense of security

I'm coming to think that schemas can offer a false sense of security when used to validate XML messages at runtime.

Suppose I have an order message with an element called price. I could have the message subscribers validate the order messages against a schema. However, this doesn't seem to address the most likely error: that we make a mistake in the code. Say we try to access an element called prike. The schema validation didn't help at all; we still can't consume the message.

Now suppose someone sends a message with the price element misspelled: the validation fails and we still can't consume the message.

Finally, by validating against the schema every time, we've made change a lot harder: if the order schema changes, chances are I have to ship it to all of my subscribers, even if the change was a simple addition that doesn't actually break anything.

It's much better to write some tests using the schemas. For example, I always validate my test messages against the schema before I use them in tests, and I write a test to check that my test messages exercise all the attributes and elements in the schema. I can also write tests for interoperability across multiple versions of a schema, etc. etc. We could get even cleverer and have these tests run for *all* my different applications whenever a new version of a schema is checked in.
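One way to write the "do my test messages exercise all the elements and attributes in the schema" test is sketched below in Python. It is a rough sketch under simplifying assumptions: the schema has no target namespace, every element and attribute is declared with a `name` attribute (no `ref` resolution), and names are compared without regard to where they appear. The function names, schema and messages are hypothetical. Validating the test messages themselves needs an XSD validator, which Python's standard library does not include; a third-party library such as lxml (`lxml.etree.XMLSchema`) is one option.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # XSD namespace as ElementTree spells it


def declared_names(xsd_text):
    """Names of elements and attributes declared (via name="...") in an XSD."""
    schema = ET.fromstring(xsd_text)
    elements = {e.get("name") for e in schema.iter(XS + "element") if e.get("name")}
    attributes = {a.get("name") for a in schema.iter(XS + "attribute") if a.get("name")}
    return elements, attributes


def used_names(message_texts):
    """Element tags and attribute names that appear in the test messages."""
    elements, attributes = set(), set()
    for text in message_texts:
        for node in ET.fromstring(text).iter():
            elements.add(node.tag)
            attributes.update(node.attrib)  # adds the attribute names (dict keys)
    return elements, attributes


def uncovered(xsd_text, message_texts):
    """Declared names that no test message exercises: (elements, attributes)."""
    decl_elements, decl_attributes = declared_names(xsd_text)
    used_elements, used_attributes = used_names(message_texts)
    return decl_elements - used_elements, decl_attributes - used_attributes


# Hypothetical order schema and test messages, for illustration only.
ORDER_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="order">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="id" type="xs:string"/>
        <xs:element name="price" type="xs:decimal"/>
        <xs:element name="discount" type="xs:decimal" minOccurs="0"/>
      </xs:sequence>
      <xs:attribute name="currency" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

TEST_MESSAGES = ['<order currency="GBP"><id>1</id><price>9.99</price></order>']

print(uncovered(ORDER_XSD, TEST_MESSAGES))  # ({'discount'}, set())
```

In a test suite you would assert that both returned sets are empty, so that adding an element to the schema without a test message that uses it fails the build.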

So I'm thinking most of the time schema validation at runtime is just going to tell you about a bug in some code that should have been caught by a unit test.

(Of course runtime validation does still have its place, for example where I receive messages from many different third parties.)
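Here is a minimal Python sketch of the kind of unit test meant here, using the standard `unittest` module (run it with `python -m unittest`). `read_price` and `SAMPLE_ORDER` are hypothetical, and the sample message is assumed to have already been checked against the schema. If the subscriber code had the `prike` typo, `findtext` would return `None` and this test would fail at build time, instead of the problem surfacing when a real message arrives.

```python
import unittest
import xml.etree.ElementTree as ET

# Hypothetical test message, assumed to have been validated against the
# order schema already (for example as part of the schema tests).
SAMPLE_ORDER = "<order><id>42</id><price>9.99</price></order>"


def read_price(xml_text):
    """Hypothetical subscriber code: fetch the one field it cares about."""
    # With a typo such as findtext("prike") this would return None,
    # and the test below would fail.
    return ET.fromstring(xml_text).findtext("price")


class ReadPriceTest(unittest.TestCase):
    def test_reads_price_from_a_valid_order(self):
        self.assertEqual(read_price(SAMPLE_ORDER), "9.99")
```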

Monday, November 20, 2006

Alistair Cockburn on what engineering has in common with manufacturing

I've heard a lot of people talk about this topic recently; this article from Alistair Cockburn seems particularly well thought out and well reasoned.

What engineering has in common with manufacturing and why it matters - AC

I certainly buy the idea of looking for the bottlenecks - in fact when I'm being an iteration manager this is often what I'm doing. Do we have enough story backlog? Are the developers finishing the stories quickly enough? Are the Business signing stories off? etc. etc.

That said, I'm still not convinced bespoke (Agile) software development is 'like' manufacturing, especially when you look at the much longer lifecycle of the products a typical manufacturing production line is set up to serve. For example, when we buy a car we are (usually) buying something that has been researched, designed and developed over a period of years, often including the provision and tooling of a multi-million pound production line. While we can customize some aspects (fabric, paint, engine size), the core of the car remains the same.

Anyway, it's interesting to see the different levels of abstraction people are trying to use in order to map the concept of (lean) manufacturing onto software development, some more successfully than others.

Friday, November 03, 2006

Intentional software, WYSIWYG and testing

Charles Simonyi has blogged about some parallels between what they are doing at Intentional and the development of WYSIWYG. It's an interesting read, in particular the familiar idea that change in software often creates problems: we've all fixed one bug only to introduce another. However, I think we already have a solution to that problem, test driven development. If I write tests first and then change something, I have the safety net of a failing test to tell me I broke something.

If we make our acceptance tests express the intentions of the business person, we can verify both that our code is behaving correctly and that it captures the correct intent. This can be as simple as giving our tests meaningful names, or something like Behavior Driven Development, which is a more evolved form of the same idea.

All sounds nice, huh? However, when we look at more complex applications, and in particular at some of the acceptance tests, things come a bit unstuck. I've seen projects where the complexity of the acceptance tests becomes a barrier to change, and the tests are all too often in a form a business person can't understand, let alone express an intent in.

Of course, this can be because someone didn't make the economic choice of just testing things manually; sometimes that's just cheaper and easier. On the other hand, if I had a tool which let business people easily capture their intentions as tests... Of course, there's nothing new about this kind of idea: Martin Fowler mentioned something similar an age ago, and we also found ourselves writing Java code akin to a DSL for expressing acceptance tests on a recent project.
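As a rough illustration of the "code akin to a DSL" idea, here is a small fluent given/when/then style in Python (rather than the Java the project used). Everything in it is hypothetical: the `OrderScenario` class, its method names and the ten-percent-off-for-ten-or-more pricing rule are invented for the example. Prices are in whole pence to avoid floating-point comparisons.

```python
def price_order(quantity, unit_price_pence):
    """Hypothetical business rule: 10% off orders of ten or more items."""
    total = quantity * unit_price_pence
    if quantity >= 10:
        total = total * 90 // 100
    return total


class OrderScenario:
    """Hypothetical fluent DSL so each acceptance test reads like the rule it checks."""

    def __init__(self):
        self.quantity = 0
        self.unit_price_pence = 0
        self.total = None

    def given_an_order_for(self, quantity, unit_price_pence):
        self.quantity = quantity
        self.unit_price_pence = unit_price_pence
        return self

    def when_it_is_priced(self):
        self.total = price_order(self.quantity, self.unit_price_pence)
        return self

    def then_the_total_is(self, expected_pence):
        # Raise explicitly (rather than `assert`) so the check survives python -O
        if self.total != expected_pence:
            raise AssertionError(f"expected {expected_pence}p, got {self.total}p")
        return self


# Ten items at 500p get the discount: 5000p less 10% is 4500p.
OrderScenario().given_an_order_for(10, 500).when_it_is_priced().then_the_total_is(4500)
```

The design choice is simply that each method name carries a piece of the business statement, so the chained call reads close to how someone on the business side might phrase the rule, while the checking still happens in ordinary code.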