Power programming

2010-01-26, , , Comments

Powerful or dangerous?

Recently I wrote about one of the Google Code Jam challenges, where, perhaps surprisingly, the best answer — the most elegant and obviously correct answer, requiring the fewest lines of code, with virtually zero space overhead, and running the quickest — the very best answer was coded in C++.

Why should this be surprising? C++ is a powerful language.

In my experience there is almost no limit to the damage that a sufficiently ingenious fool can do with C++. But there is also almost no limit to the degree of complexity that a skillful library designer can hide behind a simple, safe, and elegant C++ interface.

— Greg Colvin, “In the Spirit of C”

Yes. And yes! But in this article I wanted to discuss something C++ can’t do. Let’s start with another challenge from the same round of the 2009 Google Code Jam.

Decision Trees

Decision trees — in particular, a type called classification trees — are data structures that are used to classify items into categories using features of those items. For example, each animal is either “cute” or not. For any given animal, we can decide whether it is cute by looking at the animal’s features and using the following decision tree.

(0.2 furry
  (0.81 fast
    (0.3)
    (0.2)
  )
  (0.1 fishy
    (0.3 freshwater
      (0.01)
      (0.01)
    )
    (0.1)
  )
)

Decision Trees, Google Code Jam 2009

Cute beaver!

The challenge goes on to describe the structure more formally, then steps through an example calculation. What is the probability, p, that a beaver is cute?

For example, a beaver is an animal that has two features: furry and freshwater. We start at the root with p equal to 1. We multiply p by 0.2, the weight of the root and move into the first sub-tree because the beaver has the furry feature. There, we multiply p by 0.81, which makes p equal to 0.162. From there we move further down into the second sub-tree because the beaver does not have the fast feature. Finally, we multiply p by 0.2 and end up with 0.0324 — the probability that the beaver is cute.

Decision tree calculation

The challenge itself involves processing input comprising a number of test cases. Each test case consists of a decision tree followed by a number of animals. A solution should parse the input and output the calculated cuteness probabilities.

Python, Surprise me!

2009-12-15Comments

A Simple Function

Here’s a simple function which converts the third item of a list into an integer and returns it, returning -1 if the list has fewer than three entries or if the third entry fails to convert.

def third_int(xs):
    '''Convert the third item of xs into an int and return it.
        
    Returns -1 on failure.
    '''    
    try:
        return int(xs[2])
    except IndexError, ValueError:
        return -1

Unfortunately this simple function is simply wrong. Evidently some exceptions aren’t being caught.

>>> third_int([1, 2, 3, 4])
3
>>> third_int([1])
-1
>>> third_int(('1', '2', '3', '4',))
3
>>> third_int(['one', 'two', 'three', 'four'])
Traceback (most recent call last):
    ....
ValueError: invalid literal for int() with base 10: 'three'

How ever did a ValueError sneak past the except clause?

Next permutation: When C++ gets it right

2009-11-19, , , , Comments

The Next Number Problem

Suppose you have a fixed list of digits chosen from the range 1..9. What numbers can you make with them? You’re allowed as many zeros as you want. Write the numbers in increasing order.

Exactly this puzzle came up in the recent Google Code Jam programming contest:

You are writing out a list of numbers. Your list contains all numbers with exactly Di digits in its decimal representation which are equal to i, for each i between 1 and 9, inclusive. You are writing them out in ascending order.

For example, you might be writing every number with two ‘1’s and one ‘5’. Your list would begin 115, 151, 511, 1015, 1051.

Given N, the last number you wrote, compute what the next number in the list will be.

The competition has closed now, but if you’d like to give it a go sample input files can be found on the website, where you can also upload your results and have them checked.

Here’s a short section from a trial I ran on my computer. Input numbers are in the left-hand column: the corresponding output numbers are in the right-hand column.

50110812884911623516 → 50110812884911623561
82454322474161687049 → 82454322474161687094
82040229261723155710 → 82040229261723157015
43888989554234187388 → 43888989554234187838
76080994872481480636 → 76080994872481480663
31000989133449480678 → 31000989133449480687
20347716554681051891 → 20347716554681051918

Choice of Algorithm

Like many of the code jam challenges, you’ll need to write a program which runs fast enough; but choosing the right algorithm is more important than choosing the right language. Typically a high-level interpreted language like Python allows me to code and test a solution far more quickly than using a low-level language like C or C++.

In this particular case, though, like most successful candidates, I used C++. Here’s why.

Python on Ice

2009-10-28Comments

A moratorium on Python changes is probably a good thing—the last edition of my book nearly made my head explode. — @dabeaz

Python?

“Python?”

“Yes, Python. It’s a high-level language, we used it for the prototype. We can use it for parts of the system where performance isn’t critical. Connecting components together. The web server.”

“But what will we do when Python changes? It’s a developing language, right? How can we maintain our system.”

Not an issue, I explained. Python takes backwards compatibility very seriously. Besides, we choose which version of Python to deploy with, we choose when we migrate — maybe never. Look, you can download the source for every version of Python ever released. All you need is a C compiler. C is the porting layer, if you like, and C isn’t going anywhere in a hurry.

In all honesty, I expected more maintenance issues with the C++ parts of our product, where the language may not have changed in a decade but compilers are only just catching up with it; and in fact I didn’t have to argue for long to persuade senior management, not on this issue at least. They’d already seen how quickly I could get things up and running using Python. Even though the company had more experience with C, C++, Java, and even .Net, I convinced them Python had a role on the server-based system we were developing.

Nonetheless, I didn’t think it the right time to mention Python 3. Why confuse things?

Steady on Subversion

2009-10-13Comments

Subversion banner

My secret shame

In a world of distributed version control systems I’m ashamed to confess I still use Subversion. We use it at work, exclusively. I use it at home, by default. Worse still, in a small way, I help promote and perpetuate this antiquated version control system: if you want to mirror a Subversion repository or set up a Subversion pre-commit hook, you may well find some faded notes I wrote on these subjects.

Whisper the words. I still like Subversion.

What do I like most? The command to revert a change. Merge it backwards.

$ svn merge --change -666

Branches and tags are one and the same. For someone who grew up with CVS, that’s quite a relief. Anyone can grasp the versioned file tree model. No one wants their version control system to surprise them. My boss, who doesn’t get to program as much as he’d like, has discovered a shiny Subversion client — and he doesn’t even use Windows. The sales team, who do use Windows, can use Subversion to collaborate on their office documents. TortoiseSVN interfaces to the Word diff tool, a nice touch. And software developers can surely find stable Subversion plug-ins for whatever tools they use.

The Subversion documentation is solid and has been for some while. I’m surprised anyone ever arrives at my website seeking tips, but arrive they do, and in ever-increasing numbers.

Subversion does enough. The hard parts of my job are deciding what software to write, writing it, and working as a team. Version control should be frictionless, the easy bit. Which it is.

Favicon

2009-09-16, , Comments

On the subject of this site, I wanted to mention the recent addition of a favicon Little chap favicon. Per pixel, it’s cost me more effort than any other feature; but then it’s accessed more than any other asset. It’s meant to be a piece from a jigsaw puzzle. I got the idea when re-reading Life A User’s Manual. I like puzzles and piecing things together.

Life A User's Manual

Perec’s great masterpiece is packed with interwoven stories and trickery, but at its heart is the epic battle between the millionaire, Bartlebooth, and the puzzle-maker Gaspard Winckler. Bartlebooth begins his campaign by learning how to paint, which takes him 10 years. For the next 20 years he travels the world, painting a water colour picture of a different port every couple of weeks. He sends the paintings back home to Paris. On receipt, Winckler glues each picture to a board which he then cuts, making a series of jigsaw puzzles for Bartlebooth to solve on his return. Once Bartlebooth completes each puzzle, an ingenious process is used to glue its pieces together and re-join the cut fibres of the paper; then the picture itself is lifted from the board, returned to the port it depicts, and washed clean in the sea; and finally the paper is returned in something close to its original state to Bartlebooth.

Thus, after 50 years of work, there will be nothing to show.

French jigsaw pieces

In the book’s preamble Perec describes familiar die-cut jigsaws, classifying the best known pieces of such puzzles as “little chaps”, “double crosses” and “crossbars”. Such diversions are eschewed by the true puzzler:

The art of jigsaw puzzling begins with wooden puzzles cut by hand, whose maker undertakes to ask himself all the questions the player will have to solve, and, instead of allowing chance to cover his tracks, aims to replace it with cunning, trickery and subterfuge. All the elements occurring in the image to be reassembled — this armchair covered in gold brocade, that three-pointed black hat with its rather ruined black plume, or that silver-braided bright yellow livery — serve by design as points of departure for trails that lead to false information.


My thanks to Tim Beard for scanning a couple of pages from his edition of La Vie, Mode d’Emploi. I wanted to know what the “little chaps” etc. were before they got translated into English. I realise my favicon Little chap favicon could be improved but I don’t know how to go about it. Anyone?

YouTube Favicon - improvedYouTube Favicon - improvedYouTube Favicon - improved

Code Rot

2009-09-03, Comments

Those of us who have to tiptoe around non-standard or ancient compilers will know that template template parameters are off limits.

Hubert Matthews (PDF)

Dvbcodec Fail

Long ago, way back in 2004, I wrote an article for Overload describing how to use the Boost Spirit parser framework to generate C++ code which could convert structured binary data to text. I went on to republish this article on my own website, where I also included a source distribution.

Much has changed since then. The C++ language may not have, but compiler and platform support for it has improved considerably. Boost survives — indeed, many of its libraries will feed into the next version of C++. Overload thrives, adapting to an age when printed magazines about programming are all but extinct. My old website proved less durable: I’ve changed domain name and shuffled things around more than once. But you can still find the article online if you look hard enough, and recently someone did indeed find it. He, let’s call him Rick, downloaded the source code archive, dvbcodec-1.0.zip, extracted it, scanned the README, typed:

$ make

… and discovered the code didn’t even build.

At this point many of us would assume (correctly) the code had not been maintained. We’d delete it and write off the few minutes it took to evaluate it. Rick decided instead to contact me and let me know my code was broken. He even offered a fix for one problem.

Code Rot

Sad to say, I wasn’t entirely surprised. I no longer use this code. Unused code stops working. It decays.

I’m not talking about a compiled executable, which the compiler has tied to a particular platform, and which therefore progressively degrades as the platform advances. (I’ve heard stories about device drivers for which the source code has long been lost, and which require ever more elaborate emulation shims to keep them alive.) I’m talking about source code. And the decay isn’t usually literal, though I suppose you might have a source listing on a mouldy printout, or an unreadable floppy disk.

No, the code itself is usually a pristine copy of the original. Publishers often attach checksums to source distributions so readers can verify their download is correct. I hadn’t taken this precaution with my dvbcodec-1.0.zip but I’m certain the version Rick downloaded was exactly the same as the one I created 5 years ago. Yet in that time it had stopped working. Why?

A useful octal escape sequence

2009-08-09, Comments

Previously I grumbled about octal integer literals, suggesting:

  1. they aren’t much use, not when you’ve got hexademical and binary literals.
  2. they’re risky. As Doug Napoleone noted in a comment, 011 is all too easily read as eleven rather than nine.

Python 3 attempts to patch the readability issue: octal nine must be written as 0o11 or 0O11. Choose your fonts with care, O and 0 look similar!

>>> 0o11, 0O11
(9, 9)

Octal numbers can also appear in escape sequences embedded in string and bytes literals. The syntax hasn’t changed for Python 3 and the rules are as in C: the escape sequence \OOO embeds a character/byte with octal value OOO. Up to three octal digits are allowed in an octal escape sequence.

Smart mollusc Smart mollusc Smart mollusc

Like octal integers, these octal escape sequences may appear to be of limited use — a syntactic oddity rarely seen in the wild — but in fact there’s one particular use case which is so common we don’t even notice it.

char const terminator = '\0';

Converting integer literals in C++ and Python

2009-08-06, Comments

An integral literal in a C program can be decimal, hexadecimal or octal.

int percent = 110;
unsigned flags = 0x80;
unsigned agent = 007;

This snippet would be equivalent to (e.g.):

int percent = 0156;
unsigned flags = 128;
unsigned agent = 0x7;

So programmers can choose the best of these options when including numbers in their code.

Python adopted this same C syntax, but has recently gone on to extend and modify it. Some Python 2.6 numbers:

Python 2.6
>>> 0x80, 110, 007, 0O7, 0o7, 0b10000000
(128, 110, 7, 7, 7, 128)

I’m pleased to see support for binary literals, which are useful for (e.g.) bitmasks. I’ve never really seen the point of octals; nonetheless, they’ve been enhanced for Python 3. Python 2.6 backports the new improved octal literal syntax whilst retaining support for classic C-style octals. Python 3 drops C-style octals.

Inner, Outer, Shake it all abouter

2009-07-29, , Comments

C++ programmers enjoy three levels of access control: private, protected and public. Some programmers use protected instead of private just in case someone might want to derive from their class some day. Others keep everything as private as possible, hiding nested classes in anonymous namespaces; these inner classes never seem to work quite the way you’d want, but if you get tangled up a friend can cut through the knots!

Python is less sophisticated. Prefix class members with a double underscore and their names are disguised to the world outside. Prefix module members with a single underscore to indicate they won’t be exported from that module. Many Python programmers use single underscore prefix in classes too (no mangling but better looking).

ACCU Button

Once you’ve used Python for a while you may well question the benefits of the C++ model. Recently a C++ question came up on the accu-general mailing list. It involved nested classes, operator<<() and code which refused to compile. You’ll have to trawl through the list archives if you want the exact question: I didn’t give it much attention since it seemed an example of the kind of struggle with the language which causes me to throw in the towel. I would like to quote from one of the answers though.

Surely your issue is that f() is a friend of Inner only. f() is not a friend of Outer. Inner is private to Outer. Therefore in the global scope, outside Outer, f() cannot access Inner via Outer::Inner, as that is private.

Wow, some brain twister!

Giant Klein bottle and Cliff Stoll

Time to get back to basics.

Encapsulation is about allocating responsibility and easing utility rather than protecting data, which is a side effect. — @KevlinHenney

§

My thanks to Jason McGuiness for allowing me to quote from his expert answer to a tricky C++ question. The photo shows Cliff Stoll holding the world’s largest glass klein bottle, which was produced by the company he owns, operates and mismanages, Acme Klein Bottles. Klein bottles get a mention here because they don’t have an outside or an inside, they just have a side. You can solve every computing problem with an extra dimension, except the problem of too many dimensions.

Blackmail made easy using Python counters

2009-07-27, , Comments

The Obsessive Blackmailer

An obsessive blackmailer writes anonymous messages by by cut-and-pasting letters from newspapers. Being obsessive, the blackmailer only writes messages which can be composed entirely from a single newspaper.

word aligned

Devise an algorithm which determines whether a given message can be written using a given newspaper.

Modeling the Problem

This is a nice little problem but I’m about to spoil it since I’m using it here as a study in Python’s evolution. So if you’d like to try it yourself, look away now.

Could OCR conquer the calligraphylion?

2009-07-14, , Comments

OCR outline

Optical character recognition (OCR) algorithms typically process an image in stages:

  • convert the image to monochrome
  • identify blocks of text
  • find lines of text within those blocks
  • separate out words, then characters
  • extract character outlines
  • match outlines to archetypes
  • match candidate words to dictionary

OCR matching

Undogfooding

2009-07-08, , , Comments

David Jones perfectly captures the look and feel of Sir Tony Hoare’s presentation at Europython 2009

Tony Hoare is clearly old skool. His slides had the calm and aged patina of the OHP era, and I thought they were all the better for that. If you have a message, then that message can be conveyed without all the flash and shine that PowerPoint tempts you with (although, being a Microsoft man, of course his slides were in PowerPoint).

My first turtle sketch

Note the parenthetical comment: Tony Hoare works for Microsoft and he uses Microsoft software, an activity developers refer to as “eating your own dogfood”. Also eating dogfood at Europython, Gregor Lingl employed his very own turtle to guide the audience through a nifty presentation belying the reptile’s slow-and-steady reputation. I’ve always enjoyed sketching code using the Python interpreter, and sketching pictures with a turtle feels very pythonic.

My first turtle sketch
Python 3.1
>>> from turtle import *
>>> shape('turtle')
>>> circle(100)
>>> fillcolour('red')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'fillcolour' is not defined
>>> fillcolor('red')
>>> begin_fill(); circle(100); end_fill()
>>> def grey(pc): c = pc/100; fillcolor(c, c, c)
... 
>>> clear()
>>> def doit(r): begin_fill(); grey(r); circle(r); end_fill()
... 
>>> for r in range(100): doit(r)
... 
^C^C
Traceback (most recent call last):
...
>>> clear()
>>> for r in range(100, 0, -5): doit(r)
...
>>> fillcolor('white')
>>> forward(150)

Gregor Lingl’s session at Europython wasn’t the only one delivered using software developed by the presenter. What would be the opposite of eating your own dogfood, I wondered? Abusing someone else’s software, perhaps.

Tony Hoare’s vision, car crashes, and Alan Turing

2009-07-02, Comments

Europython Logo

Yesterday afternoon, at Europython 2009, David Jones addressed the subject “What sucks about Python?” Despite this provocative title, David Jones had lots of good things to say about Python, and the two topics which really roused the audience (the global interpreter lock and the over-crowded Python packaging space) had more to do with the Python-the-platform than Python-the-language. He also failed to mention the thing I miss most when working with Python: via Peter Norvig’s Python IAQ, quoting Bjarne Stroustrup:

“If I were to design a language from scratch, I would follow the Algol68 path and make every statement and declaration an expression that yields a value.”

Of course what really matters to an engineer is Python-the-platform rather than Python-the-language. Python famously comes with batteries included, and, stretching the metaphor, it also excels at integrating with the other batteries used in modern software applications. The packaging confusion is a side-effect of Python-the-platform’s success. More on over-stretched batteries later …

The day had started with Bruce Eckel’s keynote on Software Archeology. Bruce Eckel is a relaxed and engaging speaker but I found his presentation rather flimsy. Its substance could (and did!) fit on to a couple of David Jones’ slides and the remainder dwelt a little too long on Bruce Eckel, his Java and C++ books, blogs, and www.mindviewinc.com.

During the rest of the day, I had a choice of 4 or 5 different presentations at any one time, generally on the subject of Python modules or frameworks. The common theme I took away is that people turn to Python to get things done, and they’re reluctant to turn back. As the afternoon drew on everyone regathered in the Adrian Boult Hall to listen to a brilliant series of lightning talks which developed on this same theme. We also heard a wonderful short story about the best way to wreck cars and software systems.

Sir Tony Hoare

Partitioning with Python

2009-06-17, , Comments

Sums and Splits

On the subject of hunting for eodermdromes, here are a couple of semi-related partitioning problems.

  1. for a positive integer, N, find the positive integer sequences which sum to N
  2. for a sequence, S, find the distinct partitions of that sequence

As an example of the first, the 16 distinct integer sequences which sum to 5 are:

5
4 + 1
3 + 1 + 1
3 + 2
2 + 1 + 2
2 + 1 + 1 + 1
2 + 2 + 1
2 + 3
1 + 1 + 3
1 + 1 + 2 + 1
1 + 1 + 1 + 1 + 1
1 + 1 + 1 + 2
1 + 2 + 2
1 + 2 + 1 + 1
1 + 3 + 1
1 + 4

and of the second, the 8 distinct ways of partitioning the sequence ABCD are:

ABCD
A BCD
AB CD
ABC D
A B CD
A BC D
AB C D
A B C D

Note that I’ve counted 2 + 1 + 2, 2 + 2 + 1, and 1 + 2 + 2 as distinct sums totalling 5. That happens to be the formulation of the problem which interested me.