Problems with DocBook


Most things went rather surprisingly well, but I did encounter a small number of hitches.


My first unpleasant surprise with the DocBook toolchain came when I tried to generate the printable PDF output on a Windows XP machine. Rather naively, perhaps, I'd assumed that since all the tools were Java-based I'd be able to run them on any platform with a JVM. Not so.

The first time I tried a Windows build, I got a two-page traceback which sliced through methods in org.apache.fop.pdf and org.apache.xerces.parsers, arriving finally at the cause:

Caused by: java.lang.IllegalArgumentException: Invalid ICC Profile Data
       at java.awt.color.ICC_Profile.getInstance(
       at java.awt.color.ICC_Profile.getInstance(
       at java.awt.color.ICC_Profile.getDeferredInstance(
       at java.awt.color.ICC_Profile.getInstance(
       at java.awt.color.ColorSpace.getInstance(
       at java.awt.image.ColorModel.<init>(
       at java.awt.image.ComponentColorModel.<init>(
       ... 34 more

I had several options here: web search for a solution, raise a query on an email list, swap out the defective component in the toolchain, roll up my sleeves and debug the problem, or restrict the documentation build to Linux only.
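For the "debug the problem" option, one cheap first step would be to check whether the JVM can load its own colour profiles at all, since the trace ends inside java.awt.color rather than in FOP. Below is a minimal, hypothetical diagnostic built on that assumption (the class and method names are made up for illustration). It asks java.awt.color for each standard colour space, forces the ICC profile data to load, and builds a ComponentColorModel, which is the constructor at the bottom of the trace. A failure here would point at the JRE installation rather than the DocBook tools. A clean run would not rule the tools out.

```java
import java.awt.Transparency;
import java.awt.color.ColorSpace;
import java.awt.color.ICC_ColorSpace;
import java.awt.image.ComponentColorModel;
import java.awt.image.DataBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical diagnostic: can this JVM load its built-in ICC colour profiles?
class IccProfileCheck {

    // The standard colour spaces defined by java.awt.color, keyed by name.
    static Map<String, Integer> standardSpaces() {
        Map<String, Integer> spaces = new LinkedHashMap<>();
        spaces.put("CS_sRGB", ColorSpace.CS_sRGB);
        spaces.put("CS_LINEAR_RGB", ColorSpace.CS_LINEAR_RGB);
        spaces.put("CS_CIEXYZ", ColorSpace.CS_CIEXYZ);
        spaces.put("CS_PYCC", ColorSpace.CS_PYCC);
        spaces.put("CS_GRAY", ColorSpace.CS_GRAY);
        return spaces;
    }

    // Returns null if the colour space loads cleanly, otherwise a description
    // of the runtime exception thrown (e.g. IllegalArgumentException).
    static String check(int colorSpaceId) {
        try {
            ColorSpace cs = ColorSpace.getInstance(colorSpaceId);
            // Profiles may be loaded lazily, so force the profile data to be read.
            if (cs instanceof ICC_ColorSpace) {
                ((ICC_ColorSpace) cs).getProfile().getData();
            }
            // Build the kind of colour model whose constructor appears in the trace.
            new ComponentColorModel(cs, false, false,
                    Transparency.OPAQUE, DataBuffer.TYPE_BYTE);
            return null;
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> entry : standardSpaces().entrySet()) {
            String problem = check(entry.getValue());
            System.out.println(entry.getKey() + ": " + (problem == null ? "OK" : problem));
        }
    }
}
```

Running it on the failing machine and on a working one, then comparing the output, would show quickly whether the two JREs disagree.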

I discovered this problem quite early on, before the technical author left; otherwise the Linux-only build restriction might have been an acceptable compromise, since several other Product components were by now tied to Linux. (Bear in mind that the documentation build outputs were entirely portable: it was only the build itself which didn't work on all platforms.) My actual solution was, though, another compromise: I swapped the JAI libraries for the more primitive JIMI ones, apparently with no adverse effects.

The incident did shake my confidence, though. It may well be true that open source tools allow you the ultimate level of control, but you don't usually want to exercise it! At this stage I had only tried building small documents with a few images. I remained fearful that similar problems might recur when the manual grew larger and more laden with screenshots.


We all know that healthy software tools are in active development, but this does have a downside. Some problems actually arose from the progression of the tools I was using. For example, I started out with the version of the DocBook XSL stylesheets I found in the Hibernate distribution (version 1.65.1).

These were more than good enough for my needs, but much of the DocBook documentation I was using referred to more recent distributions. In this case, switching to the most recent stable distribution of the XSL stylesheets resulted in improvements all round. Apache FOP is less mature though: the last stable version (as of December 2005) is 0.20.5—hardly a version number to inspire confidence—and the latest release, 0.90 alpha 1, represents a break from the past. I anticipate problems if and when I migrate to a modern version of FOP, though again, I also hope for improvements.


XML is verbose and DocBook XML is no exception. As an illustration, here is a section of a DocBook document:

<section id="hello_world">
    <title>Hello World</title>
    <para>Here is the canonical C++ example program.</para>
    <programlisting><![CDATA[#include <iostream>

int main() {
    std::cout << "Hello world!" << std::endl;
    return 0;
}]]></programlisting>
</section>

XML claims to be human readable, and on one level, it is. On another level, though, the clunky angle brackets and obtrusive tags make the actual text content in the master document hard to read: the syntax obscures the semantics.
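One rough way to put a number on that verbosity is to parse a section and compare the length of its text content with the length of the source. The sketch below does this with the JDK's built-in DOM parser. The class name, the sample (a well-formed version of the section shown above, assuming a CDATA block for the listing) and the metric itself are illustrative. Because whitespace between elements counts as text and the code listing dominates this particular sample, the figure it prints understates the markup overhead, if anything.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustrative measure of markup overhead in a DocBook source fragment.
class MarkupOverhead {

    static final String SECTION =
          "<section id=\"hello_world\">\n"
        + "    <title>Hello World</title>\n"
        + "    <para>Here is the canonical C++ example program.</para>\n"
        + "    <programlisting><![CDATA[#include <iostream>\n"
        + "\n"
        + "int main() {\n"
        + "    std::cout << \"Hello world!\" << std::endl;\n"
        + "    return 0;\n"
        + "}\n"
        + "]]></programlisting>\n"
        + "</section>\n";

    // Fraction of the source characters that survive as text (including
    // whitespace between elements) once the tags have been parsed away.
    static double textFraction(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        String text = doc.getDocumentElement().getTextContent();
        return (double) text.length() / xml.length();
    }

    public static void main(String[] args) throws Exception {
        System.out.printf("text is %.0f%% of the source%n", 100 * textFraction(SECTION));
    }
}
```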


The DocBook toolchain gave us superb control over some aspects of the documentation task. In other areas the controls existed but were tricky to locate and operate.

For example, controlling the chunking of the HTML output was straightforward and could all be done using build-time parameters, with no modifications needed to the document source. Similarly, controlling file and anchor names in the generated HTML was easy, which meant the integration between the Product and the online version of the manual was both stable and clean.
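To show what build-time parameters look like in practice, here is a minimal sketch that runs the DocBook XSL chunking stylesheet through the standard Java XSLT API (JAXP) with a few parameters set. The parameter names (base.dir, chunk.section.depth, use.id.as.filename) come from the DocBook XSL stylesheets. The class and method names, the values and the command-line layout are hypothetical. The sketch also assumes an XSLT processor that understands the stylesheets' chunk-writing extensions, which may mean putting Saxon or Xalan on the classpath rather than relying on the JDK's built-in processor.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical build step: chunked HTML from DocBook via stylesheet parameters.
class ChunkedHtmlBuild {

    // DocBook XSL parameter names; the values are examples only.
    static Map<String, String> chunkParams(String outputDir) {
        Map<String, String> params = new LinkedHashMap<>();
        // base.dir is the output directory for chunks and must end with a slash.
        params.put("base.dir", outputDir.endsWith("/") ? outputDir : outputDir + "/");
        // Split down to first-level sections (chapters are chunked regardless).
        params.put("chunk.section.depth", "1");
        // Name each chunk after its id, e.g. <section id="hello_world"> -> hello_world.html
        params.put("use.id.as.filename", "1");
        return params;
    }

    static void applyParams(Transformer t, Map<String, String> params) {
        for (Map.Entry<String, String> p : params.entrySet()) {
            t.setParameter(p.getKey(), p.getValue());
        }
    }

    public static void main(String[] args) throws TransformerException {
        if (args.length != 3) {
            System.err.println("usage: ChunkedHtmlBuild <manual.xml> <html/chunk.xsl> <output-dir>");
            return;
        }
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new File(args[1])));
        applyParams(t, chunkParams(args[2]));
        // chunk.xsl writes the chunk files itself via processor extensions.
        t.transform(new StreamSource(new File(args[0])), new StreamResult(System.out));
    }
}
```

Because the chunking and naming choices live in the build rather than in the manual, changing them never touches the document source, and use.id.as.filename keeps file names tied to ids the author controls.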

Some of the printed output options don't seem so simple, especially for someone without a background in printing. In particular, I still haven't got to grips with fine control of page-breaking logic, and have to hope no one minds too much about tables which split awkwardly over pages.

Copyright © 2006 Thomas Guest