Monday, January 15, 2007


Consider how an engineer designs a bridge. There's a lot of analysis and math, plus there's a big game of "what's the worst thing that could happen":

"What if there were a fire here, fueled by the natural gas line, how hot could it get? What would it melt? What if a big truck smacked into this support? (Engineers LOVE using technical terms like "smacked into" or "walloped".) How many beams could we lose here and still have the bridge stand? What would happen to the bridge in an earthquake? What if it were a deep earthquake instead of a shallow one? Where's the worst place a bomb could be planted? How fast can the wind get here, are there any harmonics we need to worry about? What about Godzilla? Can we conclusively rule out Godzilla?"

Have you ruled out Godzilla for your code?

It's about designing for the unexpected. Making code that not only works now, but keeps working when the conditions change (or at least calls definitive attention to itself). For an example, let's come up with a simple-minded method to add a string to the start of a linked list:

   1: void AddString(char * s)
2: {
3: Node * p = new Node(s);
4: p->next = this->head;
5: this->head = p;
6: }
This sample represents level 0 of robustness - what you can crank out without thinking. At first glance this might seem to be enough, especially if you're the only one calling this function. You know you'll only pass in good parameters. This breeds a false sense of security, though...someone else might eventually call this code, or you might call it nine months from now when you've forgotten all the assumptions you made.

The next level represents some basic debug-build robustness. (For those unfamiliar with the idiom, an assert is a function that fires when its expression is false, stopping the program in the debugger so the programmer can figure out what went wrong. Typically it's only checked on the slower, debug builds used for testing - not the fast release builds that are shipped to customers.)
   7: void AddString(char * s)
8: {
9: Assert(s != null && s[0] != '\0'); // don't allow null or empty strings
10: Node * p = new Node(s);
11: Assert(p != null); // the allocation might have failed
12: p->next = this->head;
13: this->head = p;
14: }
This is great for catching bugs during development but it isn't much use to the end user running your software she just downloaded. For that, you need some checks that are done for release builds:
  15: void AddString(char * s)
16: {
17: if (s == null || s[0] == '\0') // don't allow null or empty strings
18: {
19: // Error: do something.
20: }
22: Node * p = new Node(s);
24: if (p == null) // the allocation might have failed
25: {
26: // Maybe you weren’t listening: DO something.
27: }
29: p->next = this->head;
30: this->head = p;
31: }
(It isn’t easy to figure out what to do when you detect an error. It's a complicated subject that deserves an article of its own.) Sometimes you'll see this type of code with the asserts still included, just to break to the debugger if "impossible" situations are encountered.

How can you make it even more robust? You can introduce the idea of pre and post conditions for your functions. These are the contracts you're agreeing to with your function and you can include code to enforce these contracts (back to asserts to keep things short):
  32: void AddString(char * s)
33: {
34: Assert(IsValidListWithNoCycles(this->head)); // we're assuming this is true when the function starts
35: int startSize = GetListLength(this->head);
37: Node * p = new Node(s);
38: p->next = this->head;
39: this->head = p;
41: Assert(IsValidListWithNoCycles(this->head)); // this better still be true
42: Assert(GetListLength(this->head) + 1 == startSize); // and we can check that we added something
43: }
What more can be done? How about adding another thread to watchdog the data structure?
  44: void ListWatcherThreadFunction(Container * c)
45: {
46: while (1) // or check for an exit flag
47: {
48: Assert(IsValidListWithNoCycles(c->head));
49: Sleep(); // or wait or whatever
50: }
51: }
This might very well be overkill, but it might be warranted in some situations. Are there any other ideas for making this more robust?

Of course there are some drawbacks to adding this robustness. You add complexity and you add more lines of code to maintain (and debug). You might introduce performance problems, but those really should be measured and any really slow sections of code can be made "debug build only".

When you do a code review with your team (you do have code reviews, right?) what level of robustness do you expect to see? Has your team ever worked out what's appropriate for the project? Maybe you should call a truce in the vicious battles over bracket placement (let me settle the arguments for you: my way is the right way) and devote some time to debating what kind of robustness should be designed into the product.

And whatever you do, do NOT forget the hazards of Godzilla.

No comments: