Part Three - Refactor

Introduction

Of course, I did have a go at re-implementing the codec using the third implementation strategy:

Devise a code generator which, given a section format, will generate a program to encode/decode that particular format.

In fact, my solution using the code generator approach reads in a list of section formats at build time. These formats are then parsed – still at build time – and semantic actions associated with the parser create a generated C++ file. This generated file then gets built into the actual DVB codec.

Advantages of the Code Generator Approach

The main advantages are:

Problems which used to be caught at run time can now be caught at build time. For example, we can check that our section formats are syntactically valid at build time.
Since the section format parsing has been done at build time, the dvbcodec itself runs more efficiently. Without any deliberate attempt to optimise, we have achieved a five-fold speed-up, and I would guess the memory requirement is much reduced too (I haven't measured this).
The refactor caused me to split up a Context class, which was taking on more responsibility than it should have. As a result, the class which maintains context during decoding (the current position in the input data, a container mapping field names to field values, and so on) has been separated from the class which produces the formatted output. This is a better partitioned and more flexible solution: for example, you could generate XML structured output simply by providing a variant implementation of this decode output class.

A common advantage of code generators (reducing the amount of code to be written and maintained) was, in this case, absent. In fact, the generator-based dvbcodec contains very slightly more code than the original.

Disadvantages of the Code Generator Approach

The generated code approach has some very real disadvantages:

The codec worked just fine as it was: any refactor is risky, even with unit tests in place.
The generator approach puts more strain on the build system. Of course, we generally prefer to work harder at build time if it gives us more confidence about what will happen at run time, but this is still an important consideration. Some integrated development environments don't accommodate custom build steps well.
Writing code to write code is a form of metaprogramming with the various associated metaproblems: what might once have been achieved directly now requires thinking on a different level. This is particularly tricky when – as in this case – C++ is being used to write C++.

To illustrate this third point, compare the following direct lines of C++, used to produce formatted output:


std::string const value =
    context.readFieldValue(field_name,
                           decodeUnsignedLong(bitwidth));

context.decodeOut()
    << context.indent()
    << field_name << " "
    << bitwidth
    << " = 0x" << value << "\n";

with the lines used to generate code which has the equivalent functionality:


cpp_
    << indent()
    << "context.decodeOut() << context.indent() << "
    << quote(field_name
            + " "
            + bitwidth 
            + " = 0x")
    << " << context.readFieldValue("
    << quote(field_name) + ", "
    << decodeUnsignedLong(bitwidth) << ") << \"\\n\";\n";

Now you know why I had to separate decode context from decode output for the generated version. You'll also realise why it's more common to use a language like Python (with its handy support for triple-quoted strings, raw strings and so on) to generate C++.

One common disadvantage – that the generated code isn't very nice to look at – was, in this case, absent. I was genuinely surprised and delighted by the clarity of the generated section decoder. The extract below compares nicely with the PAT section syntax.


// GENERATED FILE. DO NOT EDIT.
// Generated by: dvbcodecgenerator 
// On: Nov 25 2004
/**
 * Copyright (c) 2004, Thomas Guest. All rights reserved.
 * @file
 * @brief Generated section decoders.
 */    

/**
 * @brief Generated program_association_section decoder
 */
void
decode_program_association_section(DecodeOut & out)
{
    out.putSectionName("program_association_section()");
    {
        out.enterBlock();
        out.putField("table_id", 8);
        out.putField("section_syntax_indicator", 1);
        out.putField("'0'", 1);
        out.putField("reserved", 2);
        out.putField("section_length", 12);
        out.putField("transport_stream_id", 16);
        out.putField("reserved", 2);
        out.putField("version_number", 5);
        out.putField("current_next_indicator", 1);
        out.putField("section_number", 8);
        out.putField("last_section_number", 8);
        out.putLoopControl("for (i=0; i<N; i++)");
        out.enterBlock();
        while(!out.testLoopExit())
        {
            out.putField("program_number", 16);
            out.putField("reserved", 3);
            if (out.putIf("program_number", "==", '0'))
            {
                out.enterBlock();
                out.putField("network_PID", 13);
                out.leaveBlock();
            }
            else if (out.putElse())
            {
                out.enterBlock();
                out.putField("program_map_PID", 13);
                out.leaveBlock();
            }
        }
        out.putField("CRC_32", 32);
        out.leaveBlock();
    }
}

On Balance

On balance, I prefer the version of dvbcodec based on the generated section decoder. The big win is that more errors can be caught at build time. Either way, it was educational and interesting to explore both options.