
Self-documentation of code

Richard examines the inadequacy of documentation for free software and asks whether there’s a better way of producing it

Inadequate or missing documentation is a recurring issue, and it applies just as often to proprietary software as it does to free software. Documentation of code has two main purposes: to make the code readable for other programmers, and to make the code usable. Good documentation of free software is vital for users. Contributing to a project’s documentation (or translating it into a minority language) is also a good way to get involved for those who don’t know where to start, or how to program, and want to know how it’s done. The problem is a shortage of recruits.

Good code is self-documenting and looks right even when you don’t know its purpose. Good code just works, and even when it fails, is easy to fix. Code that doesn’t feel right at a glance is almost certainly better rewritten from scratch, or you’ll never get to the end of the problem. Code that doesn’t contain some level of internal documentation is hard to follow and maintain. Or, as Linus Torvalds puts it in his own coding style guidelines for the Linux kernel, “You know you’re brilliant, but maybe you’d like to understand what you did two weeks from now.”

He also suggests that readers of his coding guidelines begin by “printing out a copy of the GNU coding standards, and not read[ing] it. Burn them, it’s a great symbolic gesture.” Torvalds is sceptical of rules and guidelines, but his own document on coding style is a tacit acceptance of the fact that a bit of documentation makes a lot of things a lot easier. We like the source code, but reading the source code isn’t always the best way to discover the coder’s intentions, just as the coder’s intentions are not always the same as what the code actually does.

Code is about style, and good code not only solves the immediate problem but leaves a solution in place for the next sequence of problems that might arise. In the broadest sense, documentation and usability are interchangeable notions. A program without a clear description of its purpose and parameters is incomplete. Good documentation is even more vital at the user level, yet it isn’t always reliable, whatever its source. Documentation suffers from bugs and errors just as good and bad code does, and probably more so, because the programmer, the person most likely to know what the code actually does, isn’t always involved in producing the end-user documentation.

“Stacks of papers, diagrams and rules are absolutely worthless if you can’t just understand the fact that documentation is nothing more than a guide-line… Once you realise that documentation should be laughed at, peed upon, put on fire, and just ridiculed in general, then, and only then, have you reached the level where you can safely read it and try to use it,” Linus Torvalds once wrote. “I’m continually amazed and absolutely scared silly by your blind trust in paperwork, whether it be standards or committees or vendor documentation.”

The internals of the Linux kernel are of primary concern to system level programmers, and as such are not typical. System and userland code have very different purposes, and system level and userland documentation will also tend to have very different audiences, but documentation gives both the user and the coder an entry point into the code and its purpose. Projects like Gimp or LibreOffice require specialist documentation because there are so many variables, but typically, software written to a Unix model is modular and has a single purpose, and requires little documentation other than a lucid description of its purpose and the effect of its inputs and outputs.
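As a rough illustration of how a single-purpose tool can carry its own description, here is a minimal Python sketch using the standard argparse module. The tool name wordcount, its option and the help wording are all invented for the example; the point is only that the statement of purpose and the description of each input live in the code itself, and the help text is generated from them.

```python
import argparse


def build_parser():
    """Describe the tool's purpose and inputs in one place."""
    parser = argparse.ArgumentParser(
        prog="wordcount",  # hypothetical tool name, for illustration only
        description="Count the words in a text file and print the total.",
    )
    parser.add_argument("path", help="text file to read")
    parser.add_argument(
        "-l", "--lines", action="store_true",
        help="count lines instead of words",
    )
    return parser


def count(text, lines=False):
    """Return the number of lines or whitespace-separated words in text."""
    return len(text.splitlines()) if lines else len(text.split())


if __name__ == "__main__":
    # Print the generated help text rather than parsing real arguments.
    print(build_parser().format_help())
```

Because the help text is built from the same declarations the program uses to read its arguments, the two are less likely to drift apart than a separately maintained manual page would be.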

The person who knows the code best is the programmer, and there is an argument that software can be truly self-documenting if the programmer writes its headers to an agreed coding style with flexible syntax rules, and its I/O to strict guidelines (as KDE programs might be). A tool can then parse the program headers and I/O descriptors and generate formatted documentation to a given standard.
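To make the idea concrete, here is a small Python sketch of such a generator, built on the standard ast module. It is not taken from any particular project: the sample program, its header layout and the output format are assumptions made for the example. It reads the program header and the description attached to each function, and reports anything left undocumented.

```python
import ast

# Hypothetical program to document; the header layout is an assumption.
SAMPLE_SOURCE = '''"""wordcount: count the words in a text stream.

Input: text on standard input.
Output: a single integer on standard output.
"""

def count_words(text):
    """Return the number of whitespace-separated words in text."""
    return len(text.split())

def main():
    import sys
    print(count_words(sys.stdin.read()))
'''


def generate_docs(source):
    """Build a plain-text reference from the docstrings found in source."""
    tree = ast.parse(source)
    header = ast.get_docstring(tree)
    lines = [header if header else "(no program header)", "", "Functions:"]
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            args = ", ".join(a.arg for a in node.args.args)
            # Flag functions whose author left no description.
            doc = ast.get_docstring(node) or "UNDOCUMENTED"
            lines.append(f"  {node.name}({args}): {doc}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate_docs(SAMPLE_SOURCE))
```

In this sketch main() comes out marked UNDOCUMENTED. A project with rules on internal documentation could, in principle, use a check of this kind when deciding whether to accept new code.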

This is not an original idea, and is not as unrealistic or as prescriptive as it may seem. It has been done before. The immediate results may be imperfect, and practice can only improve the outcome, but the approach can only work if a project has a good set of rules on internal documentation and on the acceptance of code.
