A language people use and bitch about

2016-08-24, Comments

Every so often a new essay hits the top of the programming news sites:

  • why I’m rewriting X in C (and abandoning C++)
  • why we will never use C++ for Y (which has always been written in C)
  • why C++ is a horrible language (it isn’t C)

I’m a sucker for these articles and have some sympathy with them. Every so often I work with C code rather than C++, and find the experience refreshing: rapid compilation, clear build diagnostics, comprehensible APIs, easy debugging, and an agreeable reduction in layering and punctuation.

It’s always easy to return to C because it hasn’t really changed. Well, it has changed, conservatively, and C programmers seem conservative in adopting new features.

By contrast, C++ does change. Sure, C++0X was a long time coming — finally realised as C++11 — but since that giant step, smaller improvements continue apace. As a result C++ code soon looks stale. Every time I visit a source file it seems I have to freshen it, using the new way to prevent copying, or return containers, or manage resources, or pass functions, or … you get the idea.

The truth is, every substantial C++ codebase I’ve worked on reads like a history of C++. Somewhere low down there’ll be hand-written intrusive linked lists; then there’ll be a few layers of classes, some nested for good measure; then templates, boost, functions disguised as classes for use in algorithms; and finally the good stuff, in which auto, lambda, move semantics and initialisers allow us to code as we’ve always wanted.

Bjarne Stroustrup defends C++ on his website:

As someone remarked: There are only two kinds of programming languages: those people always bitch about and those nobody uses.

If that “someone” wasn’t Stroustrup, it is now — he has become the source of his own quotation. His point is that C++ is well-used and must therefore be bitched about.

C provides an immediate counter-example, however. It is undeniably used, but its traps and pitfalls are known and respected, rather than bitched about.

Ultimately, though, I find C too limited. The lack of standard containers and algorithms combined with the overheads of heap management and ubiquitous error checking make programming in C heavy going.

C++ is hard to learn and hard to keep up with. The language has grown, considerably, but the progression is necessary. It’s the more recent extensions which fully realise the power of the standard template library, for example. Stick with C++. The reward will be code which is expressive and efficient.

Negative Sequence Indices in Python

2016-08-01, Comments

Supply a negative index when accessing a sequence and Python counts back from the end. So, for example, my_list[-2] is the penultimate element of my_list, which is much better than my_list[len(my_list)-2] or even *(++my_list.rbegin()).

That final example uses one of C++’s reverse iterators. It gets the penultimate element of a collection by advancing an iterator from the reversed beginning of that collection. If you’re addicted to negative indices you can use them with C++ arrays, sort of.

Negative array indices in C++
#include <cstring>
#include <iostream>

int main()
{
    char const * domain = "wordaligned.org";
    char const * end = domain + std::strlen(domain);
    std::cout << end[-3] << end[-2] << end[-1] << '\n';
    return 0;
}

Compiling and running this program outputs the string “org”.

Going back to Python, the valid indices into a sequence of length L are -L, -L+1, … , 0, 1, … L-1. Whenever you write code to calculate an index used for accessing a sequence, and especially if you’re catching any resulting IndexErrors, it’s worth checking if the result of the calculation can be negative, and if — in this case — you really do want the from-the-back indexing behaviour.
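Here’s one way that bites (an illustrative sketch): str.find() returns -1 on failure, and -1 is a perfectly valid index.

>>> domain = 'wordaligned.org'
>>> i = domain.find('q')
>>> i
-1
>>> domain[i]
'g'

No IndexError here: rather than a failure we silently get the last character.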

 
 
 
 
The string 'wordaligned.org' indexed from the front and from the back:

   w   o   r   d   a   l   i   g   n   e   d   .   o   r   g
   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14
 -15 -14 -13 -12 -11 -10  -9  -8  -7  -6  -5  -4  -3  -2  -1

The power of negative indices increases with slicing. Take a slice of a sequence by supplying begin and end indices.

>>> domain = 'wordaligned.org'
>>> domain[4:9]
'align'
>>> domain[4:-4]
'aligned'
>>> digits = list(range(10))
>>> digits
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> digits[3:4]
[3]
>>> digits[1:-1]
[1, 2, 3, 4, 5, 6, 7, 8]

Omitting an index defaults it to the corresponding end of the sequence. Omit both indices and both ends are defaulted, giving a sliced copy of the whole sequence.

>>> domain[-3:]
'org'
>>> domain[:4]
'word'
>>> digits[:]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

I prefer the list(digits) form for copying digits but you should certainly be familiar with the digits[:] version.

You can supply any indices as slice limits, even ones which wouldn’t be valid for item access. Imagine laying your sequence out on an indexed chopping board, slicing it at the specified points, then taking whatever lies between these points.

>>> digits[-1000000]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> digits[1000000]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> digits[-1000000:1000000]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Sequence slicing also takes a step parameter.

>>> digits[::2]
[0, 2, 4, 6, 8]
>>> digits[1::2]
[1, 3, 5, 7, 9]

This parameter too can be negative. The sign of the step affects which limiting values the begin and end slice parameters default to. It’s subtle behaviour, but you soon get used to it.

>>> digits[0:10:-2]
[]
>>> digits[::-2]
[9, 7, 5, 3, 1]
>>> digits[-2::-2]
[8, 6, 4, 2, 0]

How do you reverse a string? Slice it back to front!

>>> domain[::-1]
'gro.dengiladrow'

To recap: the default slice limits are the ends of the sequence. With a positive step the slice runs forwards from the first element; with a negative step it runs backwards from the last. (No explicit values give quite the same defaults: 0 and -1, say, would each lose an element.) The default step is 1 whichever way round the limits are. When slicing, s[i:j:k], i and j may take any integer value, and k can take any integer value except 0.

The zero value creates another interesting edge case. Here’s a function to return the last n items of a sequence.

>>> def tail(xs, n):
...     return xs[-n:]
...

It fails when n is 0.

>>> tail(digits, 3)
[7, 8, 9]
>>> tail(digits, 2)
[8, 9]
>>> tail(digits, 1)
[9]
>>> tail(digits, 0)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
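One fix (a sketch rather than the only option) is to calculate the slice start from the front of the sequence: len(xs) - n is len(xs) when n is 0, so the slice is empty.

>>> def tail(xs, n):
...     return xs[len(xs) - n:]
...
>>> tail(digits, 0)
[]
>>> tail(digits, 3)
[7, 8, 9]

If n exceeds len(xs) the start goes negative again, but the result, the whole sequence, is probably what’s wanted.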

By the way, we’ve already seen slicing working well with lists and strings. It also works nicely with range objects.

>>> r = range(10)
>>> r[::2]
range(0, 10, 2)
>>> r[1::2]
range(1, 10, 2)

Python Streams vs Unix Pipes

2016-07-28, Comments

I chanced upon an interesting puzzle:

Find the smallest number that can be expressed as the sum of 5, 17, 563, 641 consecutive prime numbers, and is itself a prime number.

Small primes graphic

Infinite series and Python

The prime numbers famously form an infinite series and my first thought was to tackle this puzzle using Python iterators and generators. Courtesy of the Python Cookbook, I already had a couple of useful recipes:

def primes():
    '''Generate the sequence of prime numbers: 2, 3, 5 ... '''
    ....

def stream_merge(*ss):
    '''Merge a collection of sorted streams.
    
    Example: merge multiples of 2, 3, 5
    >>> from itertools import count, islice
    >>> def multiples(x): return (x * n for n in count(1))
    >>> s = stream_merge(multiples(2), multiples(3), multiples(5))
    >>> list(islice(s, 12))
    [2, 3, 4, 5, 6, 6, 8, 9, 10, 10, 12, 12]
    '''
    ....

Both these functions merit a closer look for the cunning use they make of standard containers, but we’ll defer this inspection until later. In passing, note that stream_merge()’s docstring suggests we might try using it as a basis for primes():

  1. form the series of composite (non-prime) numbers by merging the streams formed by multiples of prime numbers;

  2. the primes remain when you remove these composites from the series of natural numbers.

This scheme is hardly original — it’s a variant of Eratosthenes’ famous sieve — but if you look carefully you’ll notice the self-reference. Unfortunately recursive definitions of infinite series don’t work well with Python[1], hence primes() requires a little more finesse. We’ll take a look at it later.

Moving on then, to solve the original puzzle we need a consecutive sum filter. This takes a stream of numbers and yields a stream of consecutive sums of these numbers:

import itertools

def consecutive_sum(s, n):
    '''Generate the series of sums of n consecutive elements of s
    
    Example: 0, 1, 2, 3, 4 ... => 0+1, 1+2, 2+3, 3+4, ...
    >>> from itertools import count, islice
    >>> list(islice(consecutive_sum(count(), 2), 10))
    [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
    '''
    lo, hi = itertools.tee(s)
    csum = sum(next(hi) for _ in range(n))
    while True:
        yield csum
        csum += next(hi) - next(lo)

Here we visualise the summed elements as lying within a sliding window: each time we slide the window an element gets added to the top and an element gets removed from the bottom, and we adjust csum accordingly.

So, now we have:

  • the series of prime numbers, primes()
  • a stream_merge() connector
  • a consecutive_sum() filter

The remaining stream adaptors come from the standard itertools module. Note that stream_merge() works here since all the consecutive sum series are strictly increasing. Note also that the stream of prime numbers can be treated as consecutive_sum(s=primes(), n=1), handling the “and is itself a prime number” requirement.

>>> lens = 1, 5, 17, 563, 641
>>> N = len(lens)
>>> from itertools import tee, groupby
>>> ps = tee(primes(), N)
>>> csums = [consecutive_sum(p, n) for p, n in zip(ps, lens)]
>>> solns = (n for n, g in groupby(stream_merge(*csums)) 
             if len(list(g)) == N)
>>> next(solns)
7002221

Here’s a picture of how these stream tools link up to solve this particular puzzle. The great thing is that we can reconnect these same tools to solve a wide range of puzzles, and indeed more practical processing tasks. To use the common analogy, we direct data streams along pipes.

Stream connections

Infinite series in Other Languages

Python is the high-level language I find most convenient most of the time, which explains why I reached for it first. It’s an increasingly popular language, which helps explain why I didn’t need to write the tricky parts of my solution from scratch: they’d already been done. Python is also a language which makes compromises. Having used Python to find a solution to the puzzle I wondered if there wasn’t some other language better suited to this kind of problem.

Haskell makes no compromises when it comes to functional programming. Its lazy evaluation and guilt-free recursion make it a perfect fit for this kind of puzzle — but my Pythonic approach of teeing, filtering and merging streams made me think first of the Unix Shell. Now, I use Bash every day and page through its manual at least once a week. Scripting appeals and I’m comfortable at the command line. How hard could it be to solve this puzzle using Bash? After all, I already knew the answer! Let’s give it a go.

Partial sums

Here’s a simple shell function to generate partial sums. I’ve used awk, a little language I gave up on a long time ago in favour of more rounded scripting languages like Perl and then Python. Now I look at it again, it seems to fill a useful gap. Awk processes a file sequentially, applying pattern-action rules to each line, a processing pattern which I’ve reinvented less cleanly many times. Despite my rediscovery of awk, I’ll be keeping its use strongly in check in what follows.

$ psum() { awk '{ print s += $1 }'; }

Much like Perl, awk guesses what you want to do. Here, it conjures the summation variable, s, into existence, assigning it a default initial value of 0. (Good guess!) Since we’re doing arithmetic, awk converts the first field of each input line into a number. We can test psum by using jot to generate the sequence 1, 2, 3, 4, 5 (on a Linux platform use seq instead of jot).

$ jot 5 | psum
1
3
6
10
15

Consecutive sums

You may be wondering why we’ve bothered creating this partial sum filter since it’s the sums of consecutive elements we’re after, rather than the sum of the series so far. Well, notice that if P[i] and P[i+n] are two elements from the series of partial sums of S, then their difference, P[i+n] - P[i], is the sum of the n consecutive elements from S.

So to form an n-element consecutive sum series we can tee the partial sums streams, advance one of these by n, then zip through them in parallel finding their differences. An example makes things clear:

$ mkfifo pipe
$ jot 5 | psum | tee pipe | tail -n +2 | paste - pipe
3       1
6       3
10      6
15      10
        15

Here, jot 5 generates the sequence 1, 2, 3, 4, 5, which psum progressively accumulates to 1, 3, 6, 10, 15. We then tee this partial sum series through two pipes: the first, pipe, is an explicitly created named pipe created by mkfifo, the second is implicitly created by the pipeline operator, |. The remainder of the command line delays one series by one (note that tail numbers lines from 1, not 0, so tail -n +1 is the identity filter) then pastes the two series back together[2].

By appending a single awk action to the pipeline we get a consecutive sum series.

$ jot 5 | psum | tee pipe | tail -n +2 | paste - pipe | awk '{print $1 - $2}'
2
3
4
5
15

The output 2, 3, 4, 5 is the series of consecutive sums of length 1 taken from the original series 1, 2, 3, 4, 5. The trailing 15 and the 1 missed from the start are edge case problems, and easily corrected.

Accumulating an increasing series of numbers in order to find the differences between elements lying a given distance apart on this series isn’t a very smart idea on a computer with a fixed word-size, but it’s good to know (e.g.) that awk doesn’t stop counting at 32 bits.

$ let "N=1<<32" && echo $N | tee >(awk '{print $1 * $1}')
4294967296
18446744073709551616

Exactly if and when awk stops counting, I’m not sure. The documentation doesn’t say and I haven’t looked at the source code.

Bug Fixes

Let’s capture these tiny functions and name them. Here, then, are revised psum() and sdiff() filters. The edge case problems should now be fixed.

$ psum()  { awk 'BEGIN { print 0 }{print s += $1 }'; }
$ delay() { let "n = $1 + 1" && tail +$n; } 
$ sdiff() { mkfifo p.$1 && tee p.$1 | delay $1 | paste - p.$1 | \
            awk 'NF == 2 {print $1 - $2 }'; }

A quick test:

$ jot 5 | psum | sdiff 3
6
9
12

The output is, as expected, the series of sums of consecutive triples taken from 1, 2, 3, 4, 5 (1+2+3, 2+3+4, 3+4+5).

There’s a more pernicious bug, though. Sdiff can’t handle an infinite series, so it’s of limited use as a pipeline tool. For example, if we stream it the series 0, 1, 2, … (generated here as the partial sums of the series 1, 1, 1, …) nothing gets output and we have to interrupt the process.

# This command appears to hang
$ yes 1 | psum | sdiff 1
^C

I found a way to work around this problem by redirecting the teed stream to a subshell which itself redirects into a named pipe.

$ sdiff() { mkfifo p.$1 && tee >(delay $1 >p.$1) | paste - p.$1 | \
            awk 'NF == 2 {print $2 - $1 }'; }
$ yes 1 | psum | sdiff 1
1
1
1
1
^C

As you can see this gets data flowing once again, but I’m already out of my comfort zone: I couldn’t tell you why exactly this works but the preceding version doesn’t.

(You can of course write a consecutive sum function directly in awk: here’s a version which leans hard on the “do what I mean” style of programming, accessing and deleting array entries which haven’t even been set.

csum() {
    awk -v n=$1 '
        { lo = NR - n; w[NR] = $1; s += $1 - w[lo]; delete w[lo] }
        NR >= n { print s }'
}

For this article I’ve avoided pushing the consecutive sum job awk’s way because I’m more interested in figuring out how this pipe-connection works for non-trivial cases. Any advice would be welcome!)

Merging Streams

The Unix shell merges streams rather more succinctly than Python. Sort -m does the job directly. Note that a standard sort cannot yield any output until all its inputs are exhausted, since the final input item might turn out to be the one which should appear first in the output. Merge sort, sort -m, can and does produce output without delay[3].

$ yes | sort
^C
$ yes | sort -m
y
y
y
y
y
^C

Generating Primes

No doubt it’s possible to generate the infinite series of prime numbers using native Bash code, but I chose to reuse the Python Cookbook recipe for the job.

primes
#!/usr/bin/env python
import itertools

def primes():
    '''Generate the prime number series: 2, 3, 5 ... '''
    D = {}
    for n in itertools.count(2):
        p = D.pop(n, None)
        if p is None:
            yield n
            D[n * n] = n
        else:
            x = n + p
            while x in D:
                x += p
            D[x] = p

for p in primes():
    print(p)

This is a subtle little program which makes clever use of Python’s native hashed array container, the dictionary. Here the values are primes and the keys are composite multiples of these primes. The loop invariant, roughly speaking, is that the dictionary values are the primes less than n, and the corresponding keys are the lowest multiples of these primes greater than or equal to n. It’s a lazy, recursion-free variant of Eratosthenes’ sieve.
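To watch that invariant in action, here’s a quick trace (an illustrative sketch, not part of the recipe) which snapshots D after each value of n:

import itertools

def sieve_states():
    '''Yield (n, snapshot of D) pairs from the dictionary-based sieve.'''
    D = {}
    for n in itertools.count(2):
        p = D.pop(n, None)
        if p is None:
            D[n * n] = n
        else:
            x = n + p
            while x in D:
                x += p
            D[x] = p
        yield n, sorted(D.items())

for n, state in itertools.islice(sieve_states(), 8):
    print(n, state)

After n reaches 8, for example, D holds {9: 3, 10: 2, 25: 5, 49: 7}: one entry per prime yielded so far, each keyed by that prime’s smallest multiple not yet reached.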

For the purposes of this article the important things about this program are:

  • it generates an infinite series of numbers to standard output[4], making it a good source for a shell pipeline;
  • by making it executable and adding the usual shebang incantation, we can invoke this Python program seamlessly from the shell.

Pipe Connection

Recall the original puzzle:

Find the smallest number that can be expressed as the sum of 5, 17, 563, 641 consecutive prime numbers, and is itself a prime number.

First, let’s check the connections by solving a simpler problem which we can manually verify: to find prime numbers which are also the sum of 2 consecutive primes. As we noted before, this is the same as finding prime numbers which are the consecutive sums of 1 and 2 primes.

In one shell window we create a couple of named pipes, c.1 and c.2, which we’ll use to stream the consecutive sum series of 1 and 2 primes respectively. The result series comprises the duplicates when we merge these pipes.

Shell 1
$ mkfifo c.{1,2}
$ sort -mn c.{1,2} | uniq -d

In another shell window, stream data into c.1 and c.2:

Shell 2
$ for i in 1 2; do (primes | psum | sdiff $i > c.$i) & done

In the first window we see the single number 5, which is the first and only prime number equal to the sum of two consecutive primes.

Prime numbers equal to the sum of three consecutive primes are more interesting. In each shell window recall the previous commands and switch the 2s to 3s (a simple command history recall and edit, ^2^3^, does the trick). The merged output now looks like this:

$ sort -mn c.1 c.3 | uniq -d
23
31
41
...

We can check the first few values:

23 = 5 + 7 + 11
31 = 7 + 11 + 13
41 = 11 + 13 + 17

At this point we’re confident enough to give the actual puzzle a try. Start up the solutions stream.

$ mkfifo c.{1,5,17,563,641}
$ sort -mn c.{1,5,17,563,641} | uniq -c | grep "5 "

Here, we use a standard shell script set intersection recipe: uniq -c groups and counts repeated elements, and the grep pattern matches those numbers common to all five input streams.

Now we can kick off the processes which will feed into the consecutive sum streams, which sort is waiting on.

$ for i in 1 5 17 563 641; do (primes | psum | sdiff $i > c.$i) & done

Sure enough, after about 15 seconds elapsed time[5], out pops the result:

$ sort -mn c.{1,5,17,563,641} | uniq -c | grep "5 "
    5 7002221

15 seconds seems an eternity for arithmetic on a modern computer (you could start up a word processor in less time!), and you might be inclined to blame the overhead of all those processes, files, large numbers, etc. In fact it took around 6 seconds for the Python program simply to generate prime numbers up to 7002221, and my pure Python solution ran in 9 seconds.

Pipe Teeing

5 × primes | psum
$ for i in 1 5 17 563 641; do (primes | psum | sdiff $i > c.$i) & done

This shell solution works but I’m not happy with it. The source loop spawns 5 separate instances of primes and pipes them all through psum. I don’t care about giving the computer more to do than necessary, but I do wish I could figure out the details of this dataflow plumbing. What I’d like to do is split the output of primes | psum five ways, connecting each resulting stream to an instance of sdiff. Something like the following:

This doesn’t work
$ mkfifo t.{1,5,17,563,641}
$ primes | psum | tee t.{1,5,17,563,641} >/dev/null &
$ for i in 1 5 17 563 641; do (sdiff $i < t.$i > c.$i) & done
$ sort -mn c.{1,5,17,563,641} | uniq -c | grep " 5 "

When I run this sequence of commands nothing happens: by which I mean something blocks and nothing gets through to sort. Again, I’m out of my depth and not sure what’s going on — any advice would be welcome. The shell makes it all too easy to experiment, and I reckon I’ve already exhausted the obvious permutations of wax, tac, crunch, wane, amp, zap, tic, pretzel.

Portability

One of the most convenient things about Python is its portability. I don’t mean “portable so long as you conform to the language standard” or “portable if you stick to a subset of the language” — I mean that a Python program works whatever platform I use without me having to worry about it. In fact I’m guilty of taking this portability for granted, and was surprised to discover a program of mine failed on Windows (the fcntl module is Unix only).

Non-portability put me off the Unix shell when I first encountered it: there seemed too many details, too many platform differences — which shell are you using? which extensions? which implementation of the core utilities, etc, etc? Readily available and well-written documentation didn’t help much here: generally I want the shell to just do what I mean, which is why I switched so happily to Perl when I discovered it.

Since then this situation has, for me, improved in many ways. Non-Unix platforms are declining as are the different flavours of Unix. Bash seems to have become the standard shell of choice and Cygwin gets better all the time. GNU coreutils predominate. As a consequence I’ve forgotten almost all the Perl I ever knew and am actively rediscovering the Unix shell.

Writing this article, though, I was reminded of the platform dependent behaviour which used to discourage me. On a Linux platform close to hand I had to use seq instead of jot, and awk formatted large integers in a scientific form with a loss of precision.

Loss of precision
$ echo '10000000001 0' | awk '{print $1 - $2}'
1e+10

On OS X the same command outputs 10000000001. I couldn’t tell you which implementation is more correct. The fix is to explicitly format these numbers as decimal integers, but the danger is that the shell silently swallows these discrepancies and you’ve got a portability problem you don’t even notice.

Precision recovered
$ echo '10000000001 0' | awk '{printf "%d\n", $1 - $2}'
10000000001

Stream Merge

I mentioned stream_merge() at the start of this article, a general purpose function written by Raymond Hettinger which I originally found in the Python Cookbook. As with the prime number generator, you might imagine the merge algorithm to be recursively defined:

  1. to merge a pair of streams, take items from the first which are less than the head of the second, then swap;

  2. to merge N streams, merge the first stream with the merged (N-1) rest.

Again the Python solution does it differently, this time using a heap as a priority queue of items from the input streams. It’s ingenious and efficient. Look how easy it is in Python to shunt functions and pairs in and out of queues.

from heapq import heapify, heappop, heapreplace

def stream_merge(*ss):
    '''Merge a collection of sorted streams.'''
    pqueue = []
    for i in map(iter, ss):
        try:
            pqueue.append((i.next(), i.next))
        except StopIteration:
            pass
    heapify(pqueue)
    while pqueue:
        val, it = pqueue[0]
        yield val
        try:
            heapreplace(pqueue, (it(), it))
        except StopIteration:
            heappop(pqueue)

A more sophisticated version of this code has made it into the Python standard library, where it goes by the name of heapq.merge (I wonder why it wasn’t filed in itertools?)
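For the record, here’s the standard library version handling the docstring example from earlier (Python 3 assumed):

>>> from heapq import merge
>>> from itertools import count, islice
>>> def multiples(x): return (x * n for n in count(1))
...
>>> list(islice(merge(multiples(2), multiples(3), multiples(5)), 12))
[2, 3, 4, 5, 6, 6, 8, 9, 10, 10, 12, 12]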

Alternative Solutions

As usual Haskell wins the elegance award, so I’ll quote in full a solution built without resorting to cookbookery which produces the (correct!) answer in 20 seconds.

module Main where

import List

isPrime x = all (\i -> 0/=x`mod`i) $ takeWhile (\i -> i*i <= x) primes

primes = 2:filter (\x -> isPrime x) [3..]

cplist n = map (sum . take n) (tails primes)

meet (x:xs) (y:ys) | x < y = meet xs (y:ys)
                   | y < x = meet (x:xs) ys
                   | x == y =  x:meet xs ys

main = print $ head $
  (primes `meet` cplist 5) `meet` (cplist 17 `meet` cplist 563) `meet` cplist 641


[1] CPython, more precisely — I don’t think anything in the Python language itself prohibits tail recursion. Even using CPython, yet another recipe from the online Python Cookbook explores the idea of an @tail_recursion decorator.

[2] Tail is more commonly used to yield a fixed number of lines from the end of the file: by prefixing the line count argument with a + sign, it skips lines from the head of the file. The GNU version of head can similarly be used with a - prefix to skip lines at the tail of a file. The notation is {compact,powerful,subtle,implementation dependent}.

[3] Sort -m is a sort which doesn’t really sort — its inputs should already be sorted — rather like the +n option turning tail on its head.

[4] The series is infinite in theory only: at time n the number of items in the dictionary, D, equals the number of primes less than n, and each key in this dictionary is larger than n. So resource use increases steadily as n increases.

[5] I used a MacBook laptop to run these scripts.

  Model Name:               MacBook
  Model Identifier:         MacBook1,1
  Processor Name:           Intel Core Duo
  Processor Speed:          2 GHz
  Number Of Processors:     1
  Total Number Of Cores:    2
  L2 Cache (per processor): 2 MB
  Memory:                   2 GB
  Bus Speed:                667 MHz

Productivity++ != Better

2016-07-14, Comments

When it comes to productivity there seems to be an assumption that it’s a good thing, and that more equals better. I’d like to question that. What matters is producing the right stuff, whatever that is, and if you’re not sure, producing less would be better.

It can be hard to cancel work which is in progress or which has already been completed, even though, as software developers, version control can faithfully store and restore our work should we need it.

More tests does not equal better either. An increasing unit test count would seem uncontroversial, but let’s not forget we’re aiming for coverage and completeness whilst avoiding redundancy. Good unit tests and good design go hand in hand. Good unit tests avoid extended setup and good design reduces the need for combinatorial test suites. More system tests will slow down development, especially if they’re flaky.

I went to a wonderful and detailed talk at ACCU 2014 by Wojciech Seliga. He described his time as a development manager at Atlassian and the various steps taken to improve their CI builds: to speed them up and get them passing, that is. One particularly effective step was the daring elimination of an entire suite of fragile system tests. Late one evening, after a drink or two, a senior engineer just deleted them. Those tests haven’t failed since (!).

I wonder how that engineer felt. Nervous maybe, a bit reckless perhaps. But productive? Can destruction be productive?

On a similar note, one of the best changes made to the code base I work on has been to kill off unwanted configurations. We no longer support 32 bit builds: eliminating them has removed an entire class of integer conversion issues and slashed CI turnaround times. A reductive act, but one which allows us to produce more of what’s wanted.

What is wanted though?

The trouble is, if we’re not sure what to do, we like to do something, and we like to be seen to do something. We like to produce. We like to deliver. We don’t like to sit around in meetings talking about what to do. In these circumstances, though, production is questionable. More than once I have seen clean architecture crumble then decay, smothered by unwanted features.

Project dashboards can add to the problem. Graphics showing who added the most lines of code are open to misinterpretation and gaming. Who had the best ideas? Who reduced, removed or simplified? Does your dashboard display these metrics?

Remember, we are software developers, not software producers.

Go! Steady. Ready?

2016-04-18, Comments

I’m looking forward to talking about Go at the ACCU 2016 Conference this Friday. If you’d like to find out what connects an Easter egg with flying horses and the runner in the green vest, then come along. Expect live coding, concurrent functions and answers.

>>> import antigravity

muybridge-horse

Tyntesfield 10k 2014

8 Queens Puzzle++

2016-04-05, Comments

Yesterday I wrote about a Python solution to the 8 Queens puzzle.

import itertools

n = 8
sqs = range(n)

Qs = (Q for Q in itertools.permutations(sqs)
      if n == len({Q[i]+i for i in sqs})
           == len({Q[i]-i for i in sqs}))

It’s possible to reproduce this strategy in C++:
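Here’s a sketch of how that reproduction might look (assuming C++11; counting the solutions is my own framing, the point being the next_permutation driven do ... while loop):

#include <algorithm>
#include <iostream>
#include <numeric>
#include <set>
#include <vector>

int main()
{
    std::size_t const n = 8;
    std::vector<int> sqs(n);
    std::iota(sqs.begin(), sqs.end(), 0);    // 0, 1, ... 7 -- must start sorted

    int solutions = 0;
    do {
        // Collect the two diagonals each queen sits on.
        std::set<int> ne, nw;
        for (std::size_t i = 0; i != n; ++i) {
            ne.insert(sqs[i] + static_cast<int>(i));
            nw.insert(sqs[i] - static_cast<int>(i));
        }
        // All diagonals distinct means no two queens attack each other.
        if (ne.size() == n && nw.size() == n)
            ++solutions;
    } while (std::next_permutation(sqs.begin(), sqs.end()));

    std::cout << solutions << " solutions\n";  // 92 for the standard board
    return 0;
}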

The std::next_permutation algorithm stands alone in the C++ standard library. Used here, it pairs well with the similarly uncommon do ... while loop. The solution depends on the vector sqs starting off in sorted order, and by the end of the loop the vector will have been returned to this state.

8 Queens Puzzle

2016-04-04, Comments

♛♛♛♛♛♛♛♛

Here’s one of my favourite recipes, by Raymond Hettinger, lightly adapted for Python 3.

from itertools import permutations

n = width_of_chessboard = 8
sqs = range(n)

Qs = (Q for Q in permutations(sqs)
      if n == len({Q[i]+i for i in sqs})
           == len({Q[i]-i for i in sqs}))

We start by assigning sqs to the range 0 through 7.

>>> sqs = range(8)
>>> list(sqs)
[0, 1, 2, 3, 4, 5, 6, 7]

The range has 8 indices. If each index represents a column on a standard 8x8 chessboard and the value at that index represents a row on the same chessboard, then our range represents 8 positions on the board. Using the built-in enumerate function to generate these (index, value) pairs we see that sqs encodes the diagonal (0, 0) to (7, 7):

>>> list(enumerate(sqs))
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7)]

Next, permute the values — the rows.

>>> from itertools import permutations
>>> rooks = permutations(sqs)
>>> next(rooks)
(0, 1, 2, 3, 4, 5, 6, 7)
>>> next(rooks)
(0, 1, 2, 3, 4, 5, 7, 6)
>>> list(rooks)[34567]
(6, 7, 0, 1, 3, 4, 5, 2)

Itertools.permutations generates values lazily. The snippet above shows the first two results, then skips forward 34568 places. Permutations(sqs) generates all possible arrangements of 8 pieces on a chessboard such that each row has exactly one piece on it and so does each column. That is, it generates all possible ways of placing 8 rooks on a chessboard so that no pair attacks each other.

In the final program, we filter these rook positions to generate solutions to the more challenging — and more interesting — eight Queens puzzle.

Consider our starting point, the diagonal (0, 0) to (7, 7)

>>> diagonal = range(8)
>>> {r-c for c,r in enumerate(diagonal)}
{0}
>>> {r+c for c,r in enumerate(diagonal)}
{0, 2, 4, 6, 8, 10, 12, 14}

Here, a set comprehension collects the distinct values taken by the difference between the row and column along this diagonal, which in this case gives {0}. That is, if we placed 8 bishops along this ↗ diagonal they would all attack each other along this diagonal. The sum of the row and column takes 8 distinct values, however, meaning no pair attacks along a ↖ diagonal.

Comparison operators chain in Python, so the expression:

n == len({Q[i]+i for i in sqs}) == len({Q[i]-i for i in sqs})

is True if both sets have 8 elements, that is, if the squares in Q are on distinct ↖ and ↗ diagonals; or, equivalently no pair of bishops placed on the squares in Q would attack each other. Since we already know Q positions 8 rooks so that no pair attacks each other, and a chess Queen combines the moves of a rook and a bishop, we can see that Qs generates every possible way of placing 8 Queens on a chessboard so that no pair attacks each other: which is to say, we’ve solved the 8 Queens puzzle.

Qs = (Q for Q in permutations(sqs)
      if n == len({Q[i]+i for i in sqs})
           == len({Q[i]-i for i in sqs}))

This is beautiful code and there’s one final twist.

Qs is a generator expression primed to permute squares into neighbourly rooks filtered by amicable bishops yielding unthreatening Queens. Until asked, however, it does nothing.

♕♕♕♕♕♕♕♕

Easy as Py

2016-03-23, Comments

What makes Python Simple?

I consider Python a simple language. Here’s why.

Easy to Read

I can read and understand Python code (unless it’s wilfully perverse). Syntactic whitespace and the associated removal of punctuation results in a regular, open layout. The combination of built-in containers, extensive standard libraries and high-level constructs allows for clear, compact code: code which fits in your head.

Easy to Write

I can write Python code which is free of syntax errors and which does what I want. Of course it helps that I’ve been actively using the language for 15 years, but I’ve been using C++ for longer and still make mistakes with it: ask me to declare a pointer to a member function, for example, or to knock up a variadic template function, and I’ll need a moment or two.

Transparent

I also consider C a simple language. C offers a transparent abstraction of a register machine, with a stack, a heap, and addressable memory. If you can imagine the operation of such a machine, you can figure out C. Python is less transparent but reveals its workings if pressed. Dicts form a part of the language seen by users, and under the hood they provide the dynamic context which supports a running program. The read-eval-print loop makes it easy to poke and reshape your program. You can disassemble code to see what the virtual machine sees.

Consistent improvement

The language has got better since I first started using it. It has also got bigger, and this growth would, at first, seem at odds with simplicity. However, consider — as an example — the point when list comprehensions were introduced. Language support for building a list from an iterable results in compact declarative code. Simple code. What’s more, the square brackets which now delimit list comprehensions are the same square brackets that were previously used to delimit lists. The syntax may have been new but it didn’t surprise. Now consider the introduction of set and dict comprehensions, which follow logically and naturally from list comprehensions, almost as if they were discovered rather than invented.
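A quick illustration of that shared shape (my own example):

squares_list = [n * n for n in range(5)]     # [0, 1, 4, 9, 16]
squares_set  = {n * n for n in range(5)}     # same syntax, braces build a set
squares_dict = {n: n * n for n in range(5)}  # {0: 0, 1: 1, 2: 4, 3: 9, 4: 16}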

There are many other examples where additions to the language have unified and simplified.

Vision

I’m not a Python insider and cannot comment on the exact balance of benevolence and dictatorship which goes into the language enhancement process. I would say Python doesn’t suffer from being designed by a committee. It sticks to its strengths and its direction, to its vision.

Sausages, sausages, sausages - slice, slice, slice

2016-03-21, Comments

A friend asked for help reaching the next level of a puzzle game. The test which stalled her involves machine placement in a sausage factory.

… each sausage was branded with a letter for quality control purposes, thus: ypbtkizfgxptclcoirdsuhjwulqkoszrabfc

The string was then drawn through seven machines which rearranged the sausages in flavour enhancing ways.

Machine A: The Reversifier

Reverses the order of the sausages, so they get tastier as you go along.

Machine G: Secondhalffirstifier

move the second half of the string to the beginning, as the earlier sausages are too spicy to eat early in the morning.

He attached these machines in a certain sequence, though one of them was out for repair so only six were used. He then fed a string of sausages through and was surprised to discover the string that came out at the other end said lickyourlips. What order were the machines in?

It’s nicely phrased, but what’s really wanted is the sequence of simple transformations that takes input “ypbtkizfgxptclcoirdsuhjwulqkoszrabfc” and produces output “lickyourlips”.

It’s no doubt possible to work backwards and figure out a solution using no more than logic, pencil and paper. For example, only two of the machines change the length of the string, and — looking at the before and after lengths — these must both be used. It’s rather easier to write a short program to find a solution.

First we must simulate the seven sausage machines, A-G, which perform the following sequence operations.

  1. reverse the order of a sequence
  2. remove every other element of a sequence
  3. remove every third element of a sequence
  4. pairwise reverse elements of a sequence
  5. move even numbered elements to the front of a sequence
  6. move the last element of a sequence to the front
  7. swap the front and back half of a sequence

None of these is difficult, especially in a high-level language which builds in support for sequence operations. What I found noteworthy is that a solution can be found without any loops or if statements. What’s more, every operation can be handled using nothing more than slice operations.

Here’s my solution. The machines consist of slice operations, helped by a couple of conditional expressions and recursive calls. The solution can then be brute-forced: there are only 5040 ways of permuting 6 out of 7 machines.

I’ve used reduce to apply a chain of functions to a string of sausages — an explicit loop might be clearer, but I want a loop-free solution. For this same reason I use recursion in the pairwise swapper and the element dropper. Generally in Python, recursion is a poor choice. In this case I know I’m starting with a string of just 36 elements which cannot get any longer; there’s no risk of exceeding the system recursion limit.
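As a rough sketch of the shape of the thing (the machine implementations below are my own guesses at the seven behaviours, not the original listing), the whole search is a reduce inside a permutations loop:

from functools import reduce
from itertools import permutations

def reversify(s):      return s[::-1]
def drop_second(s):    return s[::2]
def drop_third(s):     return s[:2] + drop_third(s[3:]) if s else s
def pairwise_swap(s):  return s[1:2] + s[0:1] + pairwise_swap(s[2:]) if s else s
def evens_first(s):    return s[::2] + s[1::2]
def last_to_front(s):  return s[-1:] + s[:-1]
def swap_halves(s):    return s[len(s) // 2:] + s[:len(s) // 2]

machines = [reversify, drop_second, drop_third, pairwise_swap,
            evens_first, last_to_front, swap_halves]

def run(chain, sausages):
    return reduce(lambda s, machine: machine(s), chain, sausages)

solutions = [chain for chain in permutations(machines, 6)
             if run(chain, 'ypbtkizfgxptclcoirdsuhjwulqkoszrabfc') == 'lickyourlips']

Whether these particular machines reproduce the puzzle’s exact behaviours isn’t the point here; the shape of the code, slices, conditional expressions, recursion and reduce, is what matters.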

The sequence reversal s[::-1] is idiomatic but alarming to the uninitiated. Slices have [start:stop:stride] fields, any of which may be defaulted. Usually start and stop default to the start and end of the sequence, but in this case the negative stride reverses them.

To rotate the last element of a sequence to the front, prefer:

return s[-1:] + s[:-1]

to:

return [s[-1]] + s[:-1]

because the latter raises an IndexError for an empty sequence.
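A quick check with an empty list shows the difference:

>>> s = []
>>> s[-1:] + s[:-1]
[]
>>> [s[-1]] + s[:-1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range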

Slicing is a formidable tool for sequence manipulation, especially when combined with the option of using negative indices to count back from the end. Slices allow you to reverse, rotate and partition sequences, to pairwise swap elements, and to drop every nth element.

The miniature recipes presented here don’t even use slice assignment, which gives me an excuse to reproduce this elegant prime sieve function, which does.
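In that spirit, here’s a sketch of the familiar slice-assignment sieve (my own version of the recipe): the inner step strikes out a whole run of multiples with a single slice assignment.

def sieve(n):
    '''Return a list of the primes no greater than n.'''
    candidates = list(range(n + 1))
    candidates[:2] = [0, 0]                  # 0 and 1 are not prime
    for i in range(2, int(n ** 0.5) + 1):
        if candidates[i]:
            # strike out multiples of i, from i*i upwards, in one assignment
            candidates[i*i::i] = [0] * len(candidates[i*i::i])
    return [p for p in candidates if p]

>>> sieve(30)
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]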

Gofmt knows best

2016-03-07, Comments

Everyone’s favourite

Gofmt’s style is no one’s favorite, yet gofmt is everyone’s favorite. — Rob Pike

Code formatting tools are nothing new but Go notches them up a level. Your Go installation comes with an automated formatter, gofmt, and this tool has been used to lay out the entire code base. As a result, Go code looks uniform and familiar. The recommendation is to gofmt your own code, ideally on save and certainly before submitting for review.

(add-hook 'before-save-hook 'gofmt-before-save)

Formatting tools for other languages require configuration. You must instruct the tool about your preferences for spaces and braces. Gofmt is inflexible and this is a strength.

The language designers did Go a favour by supplying and promoting gofmt early on, before alternative style guides had been developed and adopted. There can be no more arguments about tabs vs spaces, for example. Code reviews must focus on content rather than layout.

There are also more subtle benefits. Certain kinds of manual layout are fragile: the insertion of space to vertically align assignment statements, for example. A code formatter gets it right every time.

Go knows best

As writers — and programmers are, fundamentally, writers — maybe we don’t want our work to look uniform or familiar. Maybe we prefer more control over what we produce. Maybe sometimes, we’d like to surprise or provoke.

Does the regularity gofmt imposes render our code bland, even boring?

I’ve heard this point extended to be a criticism of Go in general: that the language restricts what you can do because the language designers know best, and that it’s for your own good.

Well, if I’ve ever submitted extravagantly formatted code for review, it’s been slapped back. Substantial programs are developed by teams who adopt idiom and share style rules for good reason. It’s how we work together. Imagine how much easier it would be if the third party code in your code base was also written to your preferred style. Go gives you that, so long as you prefer its native style.

Gofmt does not completely override a programmer’s choices. Two programs which are semantically identical but laid out differently will probably still be laid out differently after a pass through gofmt.

When formatters fail

Formatting sometimes fails. Some examples follow.

Decluttered Java

Here’s some creatively formatted Java. Of course it’s a joke, and a funny one, but a language like Python, and indeed Go, gets the last laugh. It is possible to get rid of braces and semicolons and your code will look better for it.

Tetrominoes

Here’s an equally witty but perfectly sensible use of non-standard layout.

Mathias is referring to some tetrominoes — tetris shapes — coded in elm by Joseph Collard.

-- A Tetromino is a list of Locations. By definition, a valid tetromino
-- should contain exactly 4 different locations
type Tetromino = [Location]

line : Tetromino
line = [(0,0), (1,0), (2,0), (3,0)]

square : Tetromino
square = [(0,0), (1,0),
          (0,1), (1,1)]

spiece : Tetromino
spiece = [       (1,0), (2,0),
          (0,1), (1,1)]

jpiece : Tetromino
jpiece = [       (1,0),
                 (1,1),
          (0,2), (1,2)]

For brevity, I’ve left out the Z and L pieces. I’ve also omitted comments from the original, which I would suggest are redundant since the code has been painstakingly formatted to indicate the shape of the pieces.

The same tetrominoes in hand-formatted Go read:

package tetris

type Tetronimo [4][2]int

var (
    Line   = Tetronimo   {{0, 0}, {1, 0}, {2, 0}, {3, 0}}
    
    Square = Tetronimo   {{0, 0}, {1, 0}, 
                          {0, 1}, {1, 1}}
    
    Spiece = Tetronimo   {        {1, 0}, {2, 0}, 
                          {0, 1}, {1, 1}}
    
    Jpiece = Tetronimo   {        {1, 0}, 
                                  {1, 1}, 
                          {0, 2}, {1, 2}}
)

Gofmt spoils them:

package tetris

type Tetronimo [4][2]int

var (
	Line = Tetronimo{{0, 0}, {1, 0}, {2, 0}, {3, 0}}
    
	Square = Tetronimo{{0, 0}, {1, 0},
		{0, 1}, {1, 1}}
    
	Spiece = Tetronimo{{1, 0}, {2, 0},
		{0, 1}, {1, 1}}
    
	Jpiece = Tetronimo{{1, 0},
		{1, 1},
		{0, 2}, {1, 2}}
)

Obfuscation

Here’s a tiny rectangular program submitted by Thaddaeus Frogley to the International Obfuscated C Code Contest in 2001. It’s meant to be hard to read and the compact layout helps, by hindering.

/*(c) 2001 Thad */
#include<string.h>
#include <stdio.h>
#define abc stdout
int main(int a,ch\
ar*b){char*c="??="
"??(??/??/??)??'{"
"??!??>??-";while(
!((a=fgetc(stdin))
==EOF))fputc((b=s\
trchr(c,a))?fputc(
fputc(077,abc),abc
),"=(/)'<!>""-"??(
b-c??):a, abc);??>

In the author’s own words:

I wanted to create a piece of code that could be considered to be “art” on multiple levels.

He’s succeeded. It’s impossible to translate this program into Go so I cannot gofmt it. If, instead, I try clang-format, the resulting code no longer compiles! Obfuscation has bamboozled the formatter.

/*(c) 2001 Thad */
#include <stdio.h>
#include <string.h>
#define abc stdout
int main(int a, ch\
ar *b) {
  char *c = "??="
            "??(??/??/??)??'{"
            "??!??>??-";
  while (!((a = fgetc(stdin)) == EOF))fputc((b=s\
trchr(c,a))?fputc(
fputc(077,abc),abc
),"=(/)'<!>""-"??(
b-c??):a, abc);
? ? >

Depth First Search

Finally, here’s a recursive depth first search from the boost graph library.

template <class IncidenceGraph, class DFSVisitor, class ColorMap,
          class TerminatorFunc>
void depth_first_visit_impl
  (const IncidenceGraph& g,
   typename graph_traits<IncidenceGraph>::vertex_descriptor u,
   DFSVisitor& vis,  // pass-by-reference here, important!
   ColorMap color, TerminatorFunc func)
{
  typedef typename graph_traits<IncidenceGraph>::vertex_descriptor Vertex;
  typedef typename property_traits<ColorMap>::value_type ColorValue;
  typedef color_traits<ColorValue> Color;
  typename graph_traits<IncidenceGraph>::out_edge_iterator ei, ei_end;
  
  put(color, u, Color::gray());          vis.discover_vertex(u, g);
  
  if (!func(u, g))
    for (boost::tie(ei, ei_end) = out_edges(u, g); ei != ei_end; ++ei) {
      Vertex v = target(*ei, g);           vis.examine_edge(*ei, g);
      ColorValue v_color = get(color, v);
      if (v_color == Color::white()) {     vis.tree_edge(*ei, g);
      depth_first_visit_impl(g, v, vis, color, func);
      } else if (v_color == Color::gray()) vis.back_edge(*ei, g);
      else                                 vis.forward_or_cross_edge(*ei, g);
      call_finish_edge(vis, *ei, g);
    }
  put(color, u, Color::black());         vis.finish_vertex(u, g);
}

Clients of the function supply a visitor, vis, which is called back as vertices and edges are discovered and classified. These callbacks are carefully placed on the right hand side, visually distinguishing the core traversal from associated events. Note too the alignment, with edge events indented to the right of vertex events. Again, a code formatter tramples over this elegantly shaped code:

template <class IncidenceGraph, class DFSVisitor, class ColorMap,
          class TerminatorFunc>
void depth_first_visit_impl(
    const IncidenceGraph &g,
    typename graph_traits<IncidenceGraph>::vertex_descriptor u,
    DFSVisitor &vis, // pass-by-reference here, important!
    ColorMap color, TerminatorFunc func) {
  typedef typename graph_traits<IncidenceGraph>::vertex_descriptor Vertex;
  typedef typename property_traits<ColorMap>::value_type ColorValue;
  typedef color_traits<ColorValue> Color;
  typename graph_traits<IncidenceGraph>::out_edge_iterator ei, ei_end;
  
  put(color, u, Color::gray());
  vis.discover_vertex(u, g);
  
  if (!func(u, g))
    for (boost::tie(ei, ei_end) = out_edges(u, g); ei != ei_end; ++ei) {
      Vertex v = target(*ei, g);
      vis.examine_edge(*ei, g);
      ColorValue v_color = get(color, v);
      if (v_color == Color::white()) {
        vis.tree_edge(*ei, g);
        depth_first_visit_impl(g, v, vis, color, func);
      } else if (v_color == Color::gray())
        vis.back_edge(*ei, g);
      else
        vis.forward_or_cross_edge(*ei, g);
      call_finish_edge(vis, *ei, g);
    }
  put(color, u, Color::black());
  vis.finish_vertex(u, g);
}

Conclusion

Creative code formatting does have a place. Generally, though, gofmt gets it right: an automated tool can and should take care of formatting our code base, and the benefits are amplified if the tool is inflexible. With formatting solved we can focus our creative energy where it counts.

Sledgehammers vs Nut Crackers

2016-02-23, Comments

Awk

I get awk and can read awk programs but find the language of little use. Its focus is narrow and its syntax can be surprising. Python matches it at home and smashes it away. Nonetheless, a number of the advent of code puzzles fit the awk processing model — line based instructions, the interpretation of which builds state contributing to the final result — and I enjoyed writing awk solutions. There’s satisfaction in using a tool which is up to the job, no more and no less: in using a nutcracker, rather than a sledgehammer, to crack a nut.

Puzzle

For example, the puzzle set on day 6 consists of a list of instructions for switching and toggling a 1000 x 1000 grid of lights. The instructions read:

turn on 296,50 through 729,664
turn on 212,957 through 490,987
toggle 171,31 through 688,88
turn off 991,989 through 994,998
....

and the question is, after following these instructions, how many lights are lit?

Each instruction is a single line; the actions — turn on, turn off, toggle — can be matched by patterns; and to follow these actions requires no more than an array and arithmetic: awk fits nicely.

Code

Comments

Here, the BEGIN action sets the field separator FS to the regular expression [ ,] which picks out the textual and numeric fields from each instruction. Awk is highly dynamic: use a variable as a number and it becomes a number, in this case the coordinates of a lighting grid; and similarly, the fields “on”, “off” and “toggle” are matched and treated as strings.

The grid of lights appears to be represented as a two dimensional array, accessed lights[x,y] rather than lights[x][y]. In fact, the array — like all arrays in awk — is an associative container, which maps from strings — not numbers — to values. The key x,y becomes a string which joins the values of x and y with a separator defaulted to "\034".

Niggles

The escape character at the end of line 5 is needed to continue the long line. I’d prefer to use parentheses to wrap the expression over more than one line, as I would in Python, but this trick doesn’t seem to work. I was somewhat surprised there was no built in sum() function to count up the number of lights turned on by the END. It would have been cute to pass on(), off() and toggle() as functions into switch(), separating traversal from action, but I couldn’t find a way to do this in awk.

My awk script solved the puzzle in 45 seconds. A Python solution took 17 seconds. I didn’t try optimising either.

Don’t use a sledgehammer to crack a nut!

This advice, commonly given to programmers, demands explanation. If it’s intended to imply a sledgehammer is more likely to pulverise the nut than open it, then fine, that’s true — but the analogy fails in this case: a solution written in Python would have been equally correct.

Alternatively, if we mean you shouldn’t use a powerful language when a less powerful one would do, then the question becomes: why not? Python is a general purpose programming language. It can crack nuts, peel bananas, serve web pages and so much more. If you know Python why bother with Awk?

At the outset of this post I admitted I don’t generally bother with awk. Sometimes, though, I encounter the language and need to read and possibly adapt an existing script. So that’s one reason to bother. Another reason is that it’s elegant and compact. Studying its operation and motivation may help us compose and factor our own programs — programs far more substantial than the scripts presented here, and in which there will surely be places for mini-languages of our own.

Advent of Code

2016-02-22, Comments

I enjoyed solving Eric Wastl’s excellent Advent of Code programming puzzles over Christmas last year. The puzzles are nicely themed and the site itself is a delight: lighting up an ascii art Christmas tree, row by row, became an addiction.

Advent of Code is a series of small programming puzzles for a variety of skill levels. They are self-contained and are just as appropriate for an expert who wants to stay sharp as they are for a beginner who is just learning to code.

First time through, I blasted through all the puzzles using Python, leaning on its standard containers, algorithms and text munging capabilities: “an expert who wants to stay sharp”, if you like. Now I’m tackling the problems again using languages and techniques I’m less comfortable with: “a beginner who is just learning to code”, which I like.

Second time round is fun, slow and instructive. I’m posting solutions on github.

ascii xmas tree

Code Reviews - the rules

2015-08-05, Comments

The rule is: no code gets checked in without a review.

It’s not always easy to get a reviewer to sign off a changelist. Does the code build? In all configurations and on all platforms? Do the tests pass? Are all new code paths covered? Does the commit message describe the change? Does the formatting match the style guide? Does the code match its surroundings? How about documentation, compiler warnings, license requirements?

Is the change really necessary? Could it have been realised more simply?

Certainly the reviewer’s task is easier if the task has been paired on. Small and self-contained changelists are more straightforward. Removing code, too, should be less contentious.

Depending on infrastructure, some checklist items can be automated. Ideally the changelist has already been through CI, for example, ticking the builds-cleanly and passes-its-tests boxes.

So far, so what? (So obvious!)

Here’s another rule, and one I’ve not seen written down before.

When I review code I might well consider how I would have made the change. That doesn’t mean I’ll insist the submitter does things my way. In the absence of automated formatters there will be more than one acceptable way to lay out the code. Sometimes there’s little reason to prefer an explicit loop over an algorithm + lambda combination, or vice-versa. Short names work for me but not for everyone. It’s hard to argue against test coverage, but is more always better?

In such cases I won’t try to impose my own style on the changelist. Instead, the question becomes: does the code match the standards we, as a team, have set? Or, do these changes merit a place in our codebase?

It’s a simple principle but not an obvious one. It helps me review fairly and also to learn from others on my team.

There is more than one way to do it!

Programming Paired and Shared

2015-05-27, Comments

We pair program where I work. Two people, one desk. It can get intense.

Floor cleaner Emacs

Editor wars claim few casualties — until an emacs and a vim user are forced to share a keyboard and screen, that is. Anyone for notepad?

Sharing the typing isn’t why we pair. Where I work we also do code reviews. Whilst pair programming is encouraged, code reviews are mandated: code cannot be checked in until approved by a reviewer. That reviewer could be the person you paired with, and it soon becomes apparent that reviews conducted with a pair go more smoothly than ones where the reviewer is new to the task. It’s hard for someone to review a tricky changeset without the context of its development.

The context of its development turns out to be paramount for understanding a codebase, not just a changeset. Pair programming helps provide that context.

The term pairing serves better than pair programming. The former suggests sharing; the latter, typing. The benefits come from sharing all aspects of the work: decisions and responsibility; research, design, development; how to test; where to cut corners and where to go beyond the immediate requirement; when to take a break. Anyone for table football?

Edd and Bridget vs Bridget and Edd

Jokey Code?

2015-05-11, Comments

Choose Talks

I usually leave it as late as possible before deciding which sessions to attend at the ACCU conference. There’s ample time to study the schedule when you’re there and anyway, it’s subject to change.

This year I stuck with this policy with a couple of notable exceptions. First, there was my own talk, where I noted that computer programmers are writers who impose formal constraints on their texts, and who can learn from other writers who practice the same discipline. Second, there was Peter Hilton’s talk, “How to name things — the hardest problem in programming”, in which he argued more generally that programmers had much to learn from writers.

I had to see Peter Hilton’s talk and it did not disappoint. It engaged me at the time, in discussions afterwards, and it continues to make me think. I won’t post a full response here but I do want to consider one aspect: humour.

Tell Jokes

Peter Hilton argued we should pay attention to tips from the likes of George Orwell and Stephen King because their advice is better written and also because it’s funnier. Why does this humorous aspect matter? Well, perhaps we’re more likely to listen to someone who makes us laugh. Witty advice sounds less pompous.

It goes deeper than this, though. Another point from the talk:

Improve your general vocabulary. Read books, especially funny novels.

Why funny novels? At this point in the talk Peter Hilton disingenuously suggested such books were easier to read. A point he made later goes deeper:

Tell jokes … Puns are important for naming, because they rely on double-meanings. Spotting double-meanings is the essential skill for avoiding ambiguous names.

Interesting! I agree that word play and word power are linked: but do you need to be a punster to avoid ambiguity? I’m not sure. In the words of Chris Oldwood:

I’ve been writing more functional code lately. I recently tried a few numerical recipes with currying functions, but all I got was a NaN.

all I got was a NaN

Laughable Code

Is there a place for humour in code? Rarely, I’d say. Code is read, re-read and then read again: most jokes become tired and then irritating under such scrutiny. Peter Hilton, though, described his amusement on discovering this function, which configures and starts Apache Camel:

/** Configure and start Apache Camel */
def mountCamel() {
    Logger.info("Starting Camel...")
    val context = new DefaultCamelContext()
    configuredRoutes foreach { route =>
        context.addRoutes(route)
    }
    context.start()
}

The obvious alternative, startCamel(), just isn’t funny enough, apparently. I’m glad the author resisted the temptation to call it humpCamel().

I’m reminded of a colleague with a fondness for franglais who would, for example, check in a graphics routine called do_le_render(). Mildly amusing first time round, maybe, but less so each time it got revisited to fix les bugs.

Ark of Desert - Camel

I don’t see many jokes in the code I read and I don’t think it’s because the authors lack a sense of humour. Just as good documentation should inform rather than entertain, good code should express complex ideas as plainly as possible: humour doesn’t get a look in.

There are exceptions. We’ve already seen the name “camel” used for something which isn’t a camel: libraries, products and projects can benefit from short, memorable and quirky names. In unit tests, too, code is less complex, which can leave space for quips and in-jokes. When an integer is expected it often turns out to be 42. Binary input data is laid out to form strange messages when viewed as hexadecimal. If some text — any old text — is needed, lorem ipsum would indicate a lack of imagination. Even so, well-chosen names and values from the domain under test would probably be more helpful when a failing test needs fixing.

Charming code

I’m down on jokes in code but I do think code can be a pleasure to read and that well written programs can delight. Whilst I agree with Peter Hilton’s recommendation to read widely and well, I was surprised he didn’t recommend reading code, and when I discussed this with him afterwards he asked, where is it? Where is it! That will be the subject of another article, but off the top of my head, freetype, ffmpeg, llvm, the Go and Python standard libraries. If you read through any of these you’ll enjoy their clarity and internal consistency — their style; and should you code against them, these same attributes show through their interfaces.

I haven’t found any jokes in the SQLite source but if you decide to integrate it in your application, when you check the license you’ll find, instead, a blessing. This may seem funny — unusual, certainly — but it’s actually quite serious.

May you do good and not evil
May you find forgiveness for yourself and forgive others
May you share freely, never taking more than you give.

We may not issue blessings with our own code but maybe we can find other ways to surprise and delight.