Books, blogs, comments and code samples

2008-11-25 • Documentation, C++, DocBook, Python

Fastware, slow progress

Few would argue with Scott Meyers’ claim to have written one of “the most important C++ books … Ever”. There is not (and could never be) a K&R for C++, but every C++ programmer should have access to the current edition of Meyers’ classic book, Effective C++, which makes sense of a subtle and complex language.

Naturally, then, I was interested to discover Meyers has started writing a new book, working title Fastware! Or maybe I should say that he will be starting on a new book just as soon as he can settle on a suitable toolchain. His new Fastware Project blog explores the issues.

Why are the tools he used so successfully to produce Effective C++ no longer adequate?

… my writing has been stalled for quite some time as I’ve wrestled with the question of what it means to write a book these days. For conventional print books, things are easy for an author, because the game is pretty well understood: ink is black, paper is white, standard font size is around 10 point, page dimensions are generally around 9"x6" with maybe a margin of around 1" on all sides …

But I don’t think the ink-on-paper world is the one I want to write for any more. I still want to write something that is recognizably a book, but I want to think of ink on paper as but one of many possible output devices. Others include computer screens (big with color support), portable ebook readers like Kindle (smaller and currently with no color support), and portable devices that happen to support text (e.g., iPhones — very small with color support).

— Scott Meyers, Two Projects in One

Here’s an example of the distinctions: Meyers corrects any errors he discovers in Effective C++ each time it gets reprinted, trying to ensure that no pages are renumbered as a result of these changes. So if you’re directed e.g. to page 44 of the 3rd edition of Effective C++, you’ll find what you’re looking for no matter which print run your copy happened to come from. Page numbers are the canonical way of referring to positions in a book, so they’d better persist.

For web based presentation of the same content, page breaking problems should be easier to avoid, but a fixed URL scheme is crucial; further, readers should be able to discover and use subsection links within a page (e.g. clicking on a subsection header could copy its permanent link to the clipboard).

On the Fastware blog, Meyers ponders more interesting examples, such as the representation of audio content and animations in different output formats. One key topic he has yet to discuss in depth is perhaps the most important of all: what to do about code samples?

Code and document editors

Consider the problem of including code in a book, or indeed an online article. Obviously, there’s a formatting problem. You can’t just paste (e.g.) C++ from your programming IDE into an HTML page or a .tex file: you’ll lose the formatting, and the angle brackets and ampersands need escaping. Not so very hard to solve, maybe, but still an issue to overcome. Experienced programmer and author Pete Goodliffe complains:

Putting code examples into blogger is hateful. HTML-conversion of templated C++ code is not my idea of a good time!

(Templated C++ code isn’t my idea of a good time either, but I think Pete Goodliffe means all those <angle> <brackets> are giving him grief.)

The formatting problem is amplified when it comes to comments. One advantage of web publishing is that readers can add comments, and may even offer code samples in their comments. One danger of web publishing is that crackers may try to attack your server by including code in their comments! As a result, comments must be sanitised, which often mangles code samples. This problem is far from being solved, as I realised when a reader posted some code in a comment on this site, which the comment handler unfortunately ruined.
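To see how sanitising can mangle a code sample, here is a minimal Python sketch. Both functions are hypothetical illustrations, not the comment handler used on this site: strip_tags stands in for a naive sanitiser that deletes anything resembling an HTML tag, and escape_all shows the gentler alternative of escaping every special character with the standard library's html.escape.

```python
import html
import re


def strip_tags(comment):
    """A naive sanitiser (hypothetical): delete anything that looks like a tag."""
    return re.sub(r"<[^>]*>", "", comment)


def escape_all(comment):
    """Alternative: keep every character, but make it inert when served as HTML."""
    return html.escape(comment)


# A made-up reader comment containing templated C++.
comment = "Try std::vector<int> v; for (int i = 0; i < n; ++i) v.push_back(i);"

print(strip_tags(comment))   # the template argument disappears
print(escape_all(comment))   # the brackets become &lt; and &gt; entities
```

The first function silently turns std::vector<int> into std::vector, while the escaped version survives a round trip through html.unescape. A real comment system needs more than this (allowed tags, attribute handling and so on), so treat it as an illustration of the trade-off only.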

Document editors and code

Pasting from a code editor into a document editor can lead to problems. So can writing code directly in a document editor — even if you’re Bjarne Stroustrup and the code is in C++, a language you invented and implemented. Here’s what happened when I tried compiling some code copied directly from Stroustrup’s paper Abstraction, libraries, and efficiency in C++ (PDF).

The code reads:

string s;
in >> s; // “in” is an input stream connected to a data source
cout << “I read “ << s.length() << “characters”;

The compiler barfs:

bs.cpp:3: error: stray '\226' in program
bs.cpp:3: error: stray '\128' in program
bs.cpp:3: error: stray '\156' in program
bs.cpp:3: error: stray '\226' in program
bs.cpp:3: error: stray '\128' in program
bs.cpp:3: error: stray '\156' in program
bs.cpp:3: error: stray '\226' in program
bs.cpp:3: error: stray '\128' in program
bs.cpp:3: error: stray '\156' in program
bs.cpp:3: error: stray '\226' in program
bs.cpp:3: error: stray '\128' in program
bs.cpp:3: error: stray '\157' in program
...
bs.cpp:3: error: 'I' was not declared in this scope
bs.cpp:3: error: expected `;' before 'read'

Not a bad haul of errors for code which the author found “trivial to write”!

Of course, Stroustrup has been caught out by some (not so!) smart quoting applied by his authoring tools.

(Any programming editor should expose this problem by failing to highlight “I read ” and “characters” as string literals, and if you’re reading this article served directly from wordaligned.org, rather than in a feed reader, then the JavaScript prettifier should do just this. The poorly spaced output bug may not be spotted until the code actually executes.)
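The numbers in those error messages aren’t random. Assuming the pasted file was saved as UTF-8, they are the bytes of the curly quote characters, written in decimal. A minimal Python 3 sketch (the helper name non_ascii is my own invention) makes the connection explicit:

```python
# The offending line, typographic ("smart") quotes included.
line = 'cout << “I read “ << s.length() << “characters”;'


def non_ascii(text):
    """Return (column, code point, UTF-8 bytes in decimal) for each non-ASCII character."""
    return [
        (column, f"U+{ord(ch):04X}", list(ch.encode("utf-8")))
        for column, ch in enumerate(text, start=1)
        if ord(ch) > 127
    ]


for column, code_point, utf8_bytes in non_ascii(line):
    print(f"column {column}: {code_point} -> {utf8_bytes}")
```

The three opening quotes (U+201C) each encode to the bytes 226, 128, 156 and the single closing quote (U+201D) to 226, 128, 157, matching the sequence the compiler reported. A check like this could be run over every code sample before publication.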

Serious problems with code examples

Whether or not you can paste source code directly into the source text of your book or article depends on the format used for that source. I can and do copy code directly into the articles I post to wordaligned.org. A documentation toolchain based around Markdown takes care of conversion to HTML, and a clever JavaScript program handles syntax highlighting. The more serious problems here are twofold:

Whenever you cut and paste code between documents, more than one version of that code exists. You’ve introduced a branch.
Once code leaves its normal development environment, it can no longer be executed in the usual way.

Both of these problems can be overcome, and many conscientious authors have put together their own solutions, but I think it’s fair to say there is no single, accepted way of solving them.

Programmers are familiar with build systems, and this is exactly Scott Meyers’ approach to book production:

… it’s crucial that I have a single master source for each book, and it’s also crucial that the various target versions of the book can be automatically built from the single master source. If this sounds like the usual requirement for cross-platform software development, it should, because that’s exactly how I think of it.

He doesn’t mention whether his build system includes regression tests, which, in this case, would involve extracting the various code examples, building them, and testing the output, ideally (especially for a language as close to the platform as C++) using multiple compilers on multiple platforms.
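As a rough sketch of what such a regression test might look like, assume the master source is Markdown-style text in which code samples are indented by four spaces. The Python below pulls out each sample and tries to compile it. The function names and the tiny article string are hypothetical, and Python’s built-in compile() stands in for the real check. For C++ samples the check would instead have to invoke each compiler on each platform, which isn’t shown.

```python
def extract_code_blocks(text, indent="    "):
    """Collect runs of indented lines, Markdown-style, as separate code blocks."""
    blocks, current = [], []
    for line in text.splitlines():
        if line.startswith(indent):
            current.append(line[len(indent):])
        elif line.strip() == "" and current:
            current.append("")  # a blank line may belong to the current block
        elif current:
            # An unindented, non-blank line ends the block.
            blocks.append("\n".join(current).strip("\n"))
            current = []
    if current:
        blocks.append("\n".join(current).strip("\n"))
    return blocks


def syntax_errors(blocks):
    """Return (block index, message) for each block that fails to compile as Python."""
    failures = []
    for index, source in enumerate(blocks):
        try:
            compile(source, f"<block {index}>", "exec")
        except SyntaxError as error:
            failures.append((index, error.msg))
    return failures


# A made-up article: the second sample is missing a colon.
article = """Some prose introducing an example.

    def double(x):
        return 2 * x

More prose, then a sample with a typo.

    def triple(x)
        return 3 * x
"""

blocks = extract_code_blocks(article)
failing = [index for index, _ in syntax_errors(blocks)]
print(len(blocks), "blocks; failing:", failing)
```

Because the samples are extracted from the master source on every build, there is only one copy of each to maintain, which also addresses the branching problem.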

Python’s Doctest module

I haven’t cracked these problems myself and have often come to regret this flaw in the toolchain I use for code examples on this site. I’ve published syntax errors, off-by-one bugs, logical inversions — and this despite the fact that my programming and documentation editors are one and the same.

When writing about Python, life should be a little easier since there is no separate compilation phase, and perhaps this explains my laziness. Python’s doctest module also allows a degree of sanity checking, at least for any interpreted Python code. For example, a bug has somehow crept into the string reversal shown below:

>>> 'wordaligned.org'[::-1]
'gro.dengliadrow'
>>>

Doctest exposes this problem by reading in the file, finding anything which looks like an interpreted Python session, and playing it back, checking for errors.

python -c 'import doctest; doctest.testfile("code-samples")'
**********************************************************************
File "code-samples", line 104, in code-samples
Failed example:
    'wordaligned.org'[::-1]
Expected:
    'gro.dengliadrow'
Got:
    'gro.dengiladrow'
**********************************************************************
1 items had failures:
   1 of   1 in code-samples
***Test Failed*** 1 failures.

Note, though, that to properly check this article, with its intentional mistake, doctest alone is no longer up to the job, since I need to confirm that the mistake fails as expected.
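One way to confirm that a deliberate mistake fails as intended is to drive doctest from a small script and compare the failure count with the number of mistakes planted. This is a sketch under assumptions: count_failures and the inline article text are made up for illustration, and counting failures is cruder than checking which example failed.

```python
import doctest


def count_failures(text, name="article"):
    """Play back every interpreter session found in text; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(text, globs={}, name=name, filename=name, lineno=0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the usual failure report; only the counts matter here.
    results = runner.run(test, out=lambda message: None)
    return results.failed, results.attempted


# Hypothetical article text: one correct session, one planted mistake.
article = """
A correct example:

>>> 'abc'[::-1]
'cba'

The deliberately broken example:

>>> 'wordaligned.org'[::-1]
'gro.dengliadrow'
"""

failed, attempted = count_failures(article)
# The article passes only if exactly the one intentional mistake fails.
assert (failed, attempted) == (1, 2)
```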

Subtle problems with code examples

So, formatting problems should be simple to solve. Testing code examples for correctness is hard. There are more subtle problems, too.

How do you annotate sample code? Authors frequently use comments for such annotations, resulting in code examples which are far from exemplary. Here’s another Stroustrup program, taken from Learning Standard C++ as a New Language (PDF).

#include <iostream> // get standard I/O facilities
#include <string>   // get standard string facilities

int main()
{
    using namespace std; // gain access to standard library
    cout << "Please enter your first name\n";
    string name;
    cin >> name;
    cout << "Hello " << name << '\n';
}

It’s seductively easy to adopt this style of annotation, and it’s employed in many of the best programming texts (including Effective C++). Nonetheless, it’s poor programming style. I’d like to see authors find a better way.

Can code in a book ever differ from production code? I’m more convinced by Jon Bentley’s up front note in the preface to his excellent and code-packed book, Programming Pearls:

The programs use a terse programming style: short variable names, few blank lines, and little or no error checking. This is inappropriate in large software projects, but it is useful to convey the key ideas of algorithms.

He’s right: this particular book would be less accessible if the code (e.g.) checked inputs rigorously, or employed the variable naming conventions Bentley prefers for large software projects. The difference is that the code in his book exists to illustrate the key ideas dealt with more fully in the text; whereas, in a software project, the code is the text.

Bentley balances code and text superbly, switching between pseudocode, real code and prose to find a solution matching the precise needs of his book. My only complaint is that you can’t download the code examples from the website referenced in the book (www.programmingpearls.com, don’t go there!) because someone has snatched the domain. Other books have different goals and it would be a severe failing if, for example, Effective C++ were to include code examples which weren’t exception safe. How often does “exercise for the reader” sound like “excuse for the writer”?

Content and presentation

Testing code samples is really a diversion from the points Meyers raises on his blog: he’s more concerned with multiple format presentation. How exactly do you arrange for suitable syntax highlighting in print, on a computer screen, in a podcast?

Traditionally, authors deliver content to publishers, and publishers control the presentation of that content. Both sides do what they’re best at. In the world of computing books, this model often doesn’t hold. Programmers are capable of driving and configuring the software involved with book production, and indeed of writing new software if what exists isn’t good enough. Many programmers are ahead of the game when it comes to understanding the opportunities with newer formats. (I could also add that programmers like to be self-sufficient, enjoy tinkering, value control, and think they know better!) Reading Meyers’ blog, it becomes apparent that he takes immense care over the final presentation of his traditional book content, and exercises complete control over it, in fact delivering camera-ready copy to his publishers.

I write my books with a goal of their remaining useful for at least five years, and there are generally at least one or two reprints each year, so camera-ready copy for one of my books should have to be produced at least 10 times. It’s often more than that. More Effective C++, which I wrote in 1996, is now in its 26th printing.

Perfectionism and programming

Perfectionism and programming is a rare combination. Donald Knuth’s famous typesetting program, TeX, is reputedly as close as a substantial program ever gets to perfection, and it continues to set the standard for printed material (and looks likely to play a role in the print version of Fastware). TeX was born from a dissatisfaction with the available tools. Knuth simply couldn’t accept the content of The Art of Computer Programming being spoiled by ugly presentation

I had spent 15 years writing those books, but if they were going to look awful I didn’t want to write any more 1

and he had the determination, passion and ability to take a ten-year detour and do something about it.

The outlook for Fastware!

Will Meyers find himself similarly diverted? I don’t know, but I’m enjoying reading his thoughts.

On reflection, I suggest Effective C++ succeeds because of its narrow scope. It doesn’t aim to teach programming, or even programming in C++: it’s a concise survival guide for those who work with the language. As such, its traditional printed form serves it well. It’s not an entertainment (you wouldn’t want to listen to it on a long journey) and nor is it strictly a reference (you won’t be using it to cut and paste code from): rather, it’s packed with material you need to read carefully and understand.

Fastware!, a language-agnostic, multi-format book, will clearly be something very different.


Notes

DocBook aims to solve the multiple output formats problem. Loosely speaking, it comprises an XML vocabulary to describe a book’s content, together with XSLT transforms and other tools to convert that content into web pages, PDFs and so on. You can read about my experience with DocBook here.

I also recommend More Effective C++ and Effective STL, both by Scott Meyers.

1 This Knuth quotation appears in the links section of the Wikipedia page on TeX.

Word Aligned

sweating the small stuff