Hiding iterator boilerplate behind a Boost facade
Filling in missing methods. Python
Here’s another wholesome recipe served up by Raymond Hettinger.
def total_ordering(cls):
'Class decorator that fills-in missing ordering methods'
convert = {
'__lt__': [('__gt__', lambda self, other: other < self),
('__le__', lambda self, other: not other < self),
('__ge__', lambda self, other: not self < other)],
'__le__': [('__ge__', lambda self, other: other <= self),
('__lt__', lambda self, other: not other <= self),
('__gt__', lambda self, other: not self <= other)],
'__gt__': [('__lt__', lambda self, other: other > self),
('__ge__', lambda self, other: not other > self),
('__le__', lambda self, other: not self > other)],
'__ge__': [('__le__', lambda self, other: other >= self),
('__gt__', lambda self, other: not other >= self),
('__lt__', lambda self, other: not self >= other)]
}
roots = set(dir(cls)) & set(convert)
assert roots, 'must define at least one ordering operation: < > <= >='
root = max(roots) # prefer __lt__ to __le__ to __gt__ to __ge__
for opname, opfunc in convert[root]:
if opname not in roots:
opfunc.__name__ = opname
opfunc.__doc__ = getattr(int, opname).__doc__
setattr(cls, opname, opfunc)
return cls
If you have a class, X, which implements one or more of the ordering operators, <, <=, >, >= then total_ordering(X) adapts and returns the class with the missing operators filled-in. Alternatively, use standard decorator syntax to adapt a class. If we apply @total_ordering to a Point class
@total_ordering
class Point:
def __init__(self, x, y):
self.x = x
self.y = y
def __lt__(self, other):
return (self.x, self.y) < (other.x, other.y)
then we can compare points however we like
>>> p = Point(1,2) >>> q = Point(1,3) >>> p < q, p > q, p >= q, p <= q (True, False, False, True)
Here’s a nice touch: the freshly-baked methods even have documentation!
>>> help(Point) Help on class Point in module __main__: class Point | Methods defined here: | | __ge__(self, other) | x.__ge__(y) <==> x>=y | | __gt__(self, other) | x.__gt__(y) <==> x>y | | __init__(self, x, y) | | __le__(self, other) | x.__le__(y) <==> x<=y | | __lt__(self, other)
Writing class decorators may not be the first thing a new Python programmer attempts, but once you’ve discovered the relationship between Python’s special method names and the more familiar operator symbols, I think this recipe is remarkably straightforward.
convert = {
'__lt__': [('__gt__', lambda self, other: other < self),
('__le__', lambda self, other: not other < self),
('__ge__', lambda self, other: not self < other)],
'__le__': [('__ge__', lambda self, other: other <= self),
('__lt__', lambda self, other: not other <= self),
('__gt__', lambda self, other: not self <= other)],
'__gt__': [('__lt__', lambda self, other: other > self),
('__ge__', lambda self, other: not other > self),
('__le__', lambda self, other: not self > other)],
'__ge__': [('__le__', lambda self, other: other >= self),
('__gt__', lambda self, other: not other >= self),
('__lt__', lambda self, other: not self >= other)]
}
Before moving on to something more challenging, look again at one of the recipe’s key ingredients, the convert dict, which helps create the missing ordering functions from existing ones. As you can see, there’s much repetition here, and plenty of opportunities for cut-and-paste errors.
This block of code is an example of what programmers term boilerplate. By using the total ordering decorator, we can avoid boilerplating our own code.[1]
Filling in missing methods. C++
Python is dynamic and self-aware, happy to expose its internals for this kind of tinkering. It takes real wizardry to achieve similar results with a less flexible language, such as C++ — but it can be done.
Equality and Equivalence
If A <= B and B <= A then A and B must be equal, right?
Wrong, actually.
We could rank actors according to their Bacon number, for example. Hugh Grant and Daniel Day-Lewis both have a Bacon number of 2, but that doesn’t make them equal!
I think most programmers get the distinction between ordering and equality, but it’s easy to forget.
Part of the problem is the less-than-or-equal-to and greater-than-or-equal-to operators both mention equal. The standard programming representation of these operators includes a single equals symbol, whil the representation of equality has two equals symbols. Symbolically we might assume:
<= ∧ >= ⇒ ==
Wrong!
Another part of the problem is that we tend to think of numbers as archetypal objects: when in doubt, do as the ints do. For integers, it’s true, equality and equivalence are the same. The same is true of real numbers, but what about their floating point representations? A NaN doesn’t even equal itself. Complex numbers have no standard comparison operators.
>>> 3+4j == 4j+3 True >>> 3+4j <= 4j+3 Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: no ordering relation is defined for complex numbers
To avoid trouble, remember that the equal in less-than-or-equal-to should really be equivalent.
Oh, and please don’t confuse equality with assignment.
Binary search revisited
Recap
Recently I wrote about C++’s standard binary search algorithms (yes, four of them!) which do such a fine job of:
- specifying exactly what kind of range a binary search requires
- separating the core algorithm from the details of the range it’s working on
- delivering precise results
To support these claims I included an implementation of a file iterator, suitable for use with std::binary_search() etc. to efficiently locate values in very large files.
Now, there are a couple of issues with this approach:
- we had to write a lot of code to make a file iterator suitable for use with standard algorithms
- this file iterator only works on highly structured files, where each value occupies a fixed number of bytes
In this follow up article we’ll consider each of these issues in a little more depth by working through two very different solutions to a related problem.
Man or man(1)?
How careless, we’d forgotten to configure log rotation. So our application had gone with a default designed for a less verbose age, rotating files as soon as they exceeded a megabyte in size, and never throwing any of them away. Oh, and it was putting these log files at the root of the file system where they’d somehow gone unnoticed for some time. As a consequence, the file system had become clogged up with squillions of files.
$ cd / $ ls ... server.log.736624 server.log.736625 server.log.736626 server.log.736627 ...
Binary search returns … ?
In an article inspired by Jon Bentley’s classic book, Programming Pearls, Mike Taylor invites his readers to implement the binary search algorithm. To spice things up, he requests we work:
- without reference to any existing implementation
-
without calling any library routine, such as
bsearch - without writing tests.
Mike Taylor doesn’t formally specify the problem. He’s confident his readers will know what a binary search is, and if not, the description he quotes from Programming Pearls should suffice:
Binary search solves the problem [of searching within a pre-sorted array] by keeping track of a range within the array in which T [i.e. the sought value] must be if it is anywhere in the array. Initially, the range is the entire array. The range is shrunk by comparing its middle element to T and discarding half the range. The process continues until T is discovered in the array, or until the range in which it must lie is known to be empty.
So could our binary search implementation simply return a binary result, true if T is in the array, false otherwise? Well, Yes. And No. A binary search can provide more information, as Mike Taylor hints when he mentions bsearch.
Jon Bentley and Mike Taylor are primarily interested in how often programmers make a mess of what appears to be a simple assignment and in how to avoid this mess. In this article, I’d like to point out that the problem specification needs attention too.
Think, quote, escape
Evidently Jamie had got in before me. Somehow he’d unpacked the new 2U server and balanced it on the side of his desk. He looked hassled. I didn’t ask. The server’s fans whirred noisily. On Jamie’s second monitor I could see the familiar chatter of Linux installing itself. I stepped over the cardboard and polystyrene, sat, woke my machine.
Moments later I heard the install disk eject. Jamie typed something, cursed. The disk clattered into the bin. The room grew quiet as the server shut down.
We were a small, new team. Nonetheless, we’d put something substantial together. It built on a Java framework and ran on Linux. We’d tweaked the framework, meaning we had to build it from source before building our own stuff, so a clean build took almost half an hour. Just over half an hour later, Jamie had burned a new install disk. He placed it in the server’s DVD drive. The fans roared up again. Ten minutes later, more cursing, another disk in the bin.
I walked across. Jamie was glaring at some code. A Perl list? His cursor was poised over an item in this list, a single-quoted string, inside which was a sed command, whose arguments themselves needed quoting, which was evidently meant to edit the contents of a double-quoted string in some configuration file. Think: quote and escape. I could see there were four DVDs in the bin. Jamie must have been in for some time. No wonder he looked hassled.
So, what’s up?
A trade show in the States. The salesman was out there. A bare machine had already been delivered and he needed the software, so Jamie had to cut a DVD which would directly install the operating system together with our application. Please don’t expect a salesman to download an ISO image and set up enough of a network to boot from it. A courier was booked for midday to collect the DVD. That’s what’s up. So leave me alone and let me get on with it.
He didn’t actually say that last bit. It’s true, though, he was the Linux expert. I stubbornly watched as he changed some double quotes for single ones, added a couple more backslashes, checked in the file, kicked off another build.
Back at my desk, I reviewed the version control change logs. Evidently Jamie was working on a post-install script which took the form of a list of actions which would be evaluated and executed in Perl. Looking at the file diffs, the sticking point seemed to be a sed command to edit the X display settings. Embedding sed within Perl was proving tricky.
Jamie’s edit-build-burn-install-check cycle seemed crazy to me. Why not recreate the broken post-install step as a standalone operation? Soon enough I’d found a way to reproduce the problem. After reading documentation and experimenting I figured out how to nest and escape the various strings. I admit, it took me longer than I expected. By this time Jamie solved the problem by trial and error anyway — and as proof he had an install disk which he knew worked. He may well have spent less time actually concentrating on the issue than I had; the build-burn-install phases of his process all ran as background activities.
In the event we needed to revise the software anyway. So the salesman had to download a new build at the last minute. Jamie stayed late the next day to talk him through the installation.
Beware the March of IDEs!
The March of IDEs
Visual Studio can be one of the programmer’s best friends, but over the years it has become increasingly pushy, domineering, and suffering from unsettling control issues. Should we just surrender to Visual Studio’s insistence on writing our code for us? Or is Visual Studio sapping our programming intelligence rather than augmenting it?
— Charles Petzold, Does Visual Studio Rot the Mind?
March the 15th is a good day to revisit a talk Charles Petzold gave to at the NYC .NET Developer’s Group some four and half years ago. The speaker is probably best known as the author of Programming Windows, now in its fifth edition, and during his talk he confesses to a love/hate relationship with the software most people use for Windows programming — Microsoft Visual Studio (VS).
If you haven’t read the talk, read it now; and if you have, give it another look. Charles Petzold frets about the addictive nature of VS: if you become hooked on intellisense will you become dependent and mentally flabby? And what about the code VS generates in directories which it chooses?
Beware! Your IDE is taking over: you’ll end up working for it.
Pi seconds is a nanocentury
π seconds is a nanocentury
— Jon Bentley, quoting Tom Duff, Programming Pearls
No, this isn’t a mysterious and fundamental connection between circles and the solar system. Actually, it’s incorrect:
- nanocentury ~3.15576 seconds
- ratio of a circle’s circumference to its diameter ~3.14159
Rather, Tom Duff’s rule of thumb is a useful mnemonic. One year is roundabout 3.14×107 seconds. The accuracy of this figure is more than good enough for back of an envelope calculations. (Such calculations would be easier if we adopted a decimal time system — 100 seconds in a minute, 100 minutes in an hour, say — but time is one of the few currencies with internationally agreed denominations. I can’t see it changing, and 100 days in a year doesn’t make sense anyway.)
Bike charts by Google
I’ve liked the Google chart API ever since I first discovered it. Pack a text definition of an image into a URL http://chart.apis.google.com/chart?YOUR-IMAGE-HERE and you’ll be served up a freshly cooked PNG. It’s free. There’s not even a watermark.
http://chart.apis.google.com/chart? # A chart, please
&chs=320x160 # sized 320x160 pixels
&cht=gom # of type swingometer
&chd=t:70 # with 70% swing
&chl=Nice! # labeled "Nice!"
Gone are the days when the documentation fitted on a single web-page. The API has fattened up and filled out. Every time I visit something new has been added: mathematical formulae written in TeX; a playground where you can sketch a chart directly; a validation option which tells you where you went wrong — much more helpful than a bare 404.
When you comment on a comment
@ianbicking these days, I very rarely bother reading anything where I cannot comment. — @drjtwit
§
You’ll notice there are no comments here. I hate discussing things via blog comments. If you’d like to talk, drop me a line. If you’d like to discuss things in public, post on your blog.
§
When you leave a comment on a comment, how often do you wonder what your rights are? Not too often, I’d guess. Over the years, it has become an accepted fact that content contributed to a website simply belongs to that website. If the website, or blog for today’s web, goes away then all of your contributions disappear along with it. A real world analogy would be sending in letters or artwork to a magazine. There’s usually that disclaimer which says the publication can do whatever with your submission. And, of course, they can’t return anything to you. It belongs to the magazine now.
— Daniel Ha, A commenter’s rights
§
In a recent blog post Dan Twining writes about blog comments and asks what I think of Disqus, the commenting service used here at Word Aligned. The question comes at an interesting time.
Power programming
Powerful or dangerous?
Recently I wrote about one of the Google Code Jam challenges, where, perhaps surprisingly, the best answer — the most elegant and obviously correct answer, requiring the fewest lines of code, with virtually zero space overhead, and running the quickest — the very best answer was coded in C++.
Why should this be surprising? C++ is a powerful language.
In my experience there is almost no limit to the damage that a sufficiently ingenious fool can do with C++. But there is also almost no limit to the degree of complexity that a skillful library designer can hide behind a simple, safe, and elegant C++ interface.
— Greg Colvin, “In the Spirit of C”
Yes. And yes! But in this article I wanted to discuss something C++ can’t do. Let’s start with another challenge from the same round of the 2009 Google Code Jam.
Decision trees
Decision trees — in particular, a type called classification trees — are data structures that are used to classify items into categories using features of those items. For example, each animal is either “cute” or not. For any given animal, we can decide whether it is cute by looking at the animal’s features and using the following decision tree.
(0.2 furry (0.81 fast (0.3) (0.2) ) (0.1 fishy (0.3 freshwater (0.01) (0.01) ) (0.1) ) )
The challenge goes on to describe the structure more formally, then steps through an example calculation. What is the probability, p, that a beaver is cute?
For example, a beaver is an animal that has two features:
furryandfreshwater. We start at the root withpequal to1. We multiplypby0.2, the weight of the root and move into the first sub-tree because the beaver has thefurryfeature. There, we multiplypby0.81, which makespequal to0.162. From there we move further down into the second sub-tree because the beaver does not have the fast feature. Finally, we multiplypby0.2and end up with0.0324— the probability that the beaver is cute.
The challenge itself involves processing input comprising a number of test cases. Each test case consists of a decision tree followed by a number of animals. A solution should parse the input and output the calculated cuteness probabilities.
Python, Surprise me!
A Simple Function
Here’s a simple function which converts the third item of a list into an integer and returns it, returning -1 if the list has fewer than three entries or if the third entry fails to convert.
def third_int(xs):
'''Convert the third item of xs into an int and return it.
Returns -1 on failure.
'''
try:
return int(xs[2])
except IndexError, ValueError:
return -1
Unfortunately this simple function is simply wrong. Evidently some exceptions aren’t being caught.
>>> third_int([1, 2, 3, 4])
3
>>> third_int([1])
-1
>>> third_int(('1', '2', '3', '4',))
3
>>> third_int(['one', 'two', 'three', 'four'])
Traceback (most recent call last):
....
ValueError: invalid literal for int() with base 10: 'three'
How ever did a ValueError sneak past the except clause?
Next permutation: When C++ gets it right
The Next Number Problem
Suppose you have a fixed list of digits chosen from the range 1..9. What numbers can you make with them? You’re allowed as many zeros as you want. Write the numbers in increasing order.
Exactly this puzzle came up in the recent Google Code Jam programming contest:
You are writing out a list of numbers. Your list contains all numbers with exactly Di digits in its decimal representation which are equal to i, for each i between 1 and 9, inclusive. You are writing them out in ascending order.For example, you might be writing every number with two ‘1’s and one ‘5’. Your list would begin 115, 151, 511, 1015, 1051.
Given N, the last number you wrote, compute what the next number in the list will be.
The competition has closed now, but if you’d like to give it a go sample input files can be found on the website, where you can also upload your results and have them checked.
Here’s a short section from a trial I ran on my computer. Input numbers are in the left-hand column: the corresponding output numbers are in the right-hand column.
50110812884911623516 → 50110812884911623561 82454322474161687049 → 82454322474161687094 82040229261723155710 → 82040229261723157015 43888989554234187388 → 43888989554234187838 76080994872481480636 → 76080994872481480663 31000989133449480678 → 31000989133449480687 20347716554681051891 → 20347716554681051918
Choice of Algorithm
Like many of the code jam challenges, you’ll need to write a program which runs fast enough; but choosing the right algorithm is more important than choosing the right language. Typically a high-level interpreted language like Python allows me to code and test a solution far more quickly than using a low-level language like C or C++.
In this particular case, though, like most successful candidates, I used C++. Here’s why.
Python on Ice
A moratorium on Python changes is probably a good thing—the last edition of my book nearly made my head explode. — @dabeaz
Python?
“Python?”
“Yes, Python. It’s a high-level language, we used it for the prototype. We can use it for parts of the system where performance isn’t critical. Connecting components together. The web server.”
“But what will we do when Python changes? It’s a developing language, right? How can we maintain our system.”
Not an issue, I explained. Python takes backwards compatibility very seriously. Besides, we choose which version of Python to deploy with, we choose when we migrate — maybe never. Look, you can download the source for every version of Python ever released. All you need is a C compiler. C is the porting layer, if you like, and C isn’t going anywhere in a hurry.
In all honesty, I expected more maintenance issues with the C++ parts of our product, where the language may not have changed in a decade but compilers are only just catching up with it; and in fact I didn’t have to argue for long to persuade senior management, not on this issue at least. They’d already seen how quickly I could get things up and running using Python. Even though the company had more experience with C, C++, Java, and even .Net, I convinced them Python had a role on the server-based system we were developing.
Nonetheless, I didn’t think it the right time to mention Python 3. Why confuse things?
Steady on Subversion
My secret shame
In a world of distributed version control systems I’m ashamed to confess I still use Subversion. We use it at work, exclusively. I use it at home, by default. Worse still, in a small way, I help promote and perpetuate this antiquated version control system: if you want to mirror a Subversion repository or set up a Subversion pre-commit hook, you may well find some faded notes I wrote on these subjects.
Whisper the words. I still like Subversion.
What do I like most? The command to revert a change. Merge it backwards.
$ svn merge --change -666
Branches and tags are one and the same. For someone who grew up with CVS, that’s quite a relief. Anyone can grasp the versioned file tree model. No one wants their version control system to surprise them. My boss, who doesn’t get to program as much as he’d like, has discovered a shiny Subversion client — and he doesn’t even use Windows. The sales team, who do use Windows, can use Subversion to collaborate on their office documents. TortoiseSVN interfaces to the Word diff tool, a nice touch. And software developers can surely find stable Subversion plug-ins for whatever tools they use.
The Subversion documentation is solid and has been for some while. I’m surprised anyone ever arrives at my website seeking tips, but arrive they do, and in ever-increasing numbers.
Subversion does enough. The hard parts of my job are deciding what software to write, writing it, and working as a team. Version control should be frictionless, the easy bit. Which it is.














