Accidental Emacs
Nice to see Emacs getting a bit of press recently. I’ve used it for almost 20 years now and it dominates my time at the keyboard. It isn’t perfect and I’m reluctant to recommend it but I wouldn’t want to be without it. Let me explain.
The best thing about Emacs is that it can do everything (including the things it can’t do yet). The worst thing about Emacs is finding out how it does anything. I wouldn’t call it discoverable. In fact, on several occasions, I’ve learned about Emacs by accident: you press the wrong key combination (easy to do when you’re holding down a couple of keys and stretching for a third) and, look, something interesting happens!
The rest of this article describes a few of these happy accidents: modes I never knew about and tricks I wish I’d learned earlier.
Scatter pictures with Google Charts
In a recent post on his blog Matt Cutts asks:
I almost wanted to call this post “Stupid Google Tricks” :-) What fun diagrams can you imagine making with the Google Charts Service?
Here’s a stupid trick: you can use the Python Imaging Library to convert a picture into a URL which Google charts will render as the original picture.
Here’s the original picture:
here’s the version served up by Google charts:
here’s the code:
import Image
import string
def scatter_pixels(img_file):
"""Return the URL of a scatter plot of the supplied image
The image will be rendered square and black on white. Adapt the
code if you want something else.
"""
# Use simple chart encoding. To make things really simple
# use a square image where each X or Y position corresponds
# to a single encode value.
simple = string.uppercase + string.lowercase + string.digits
rsimple = simple[::-1] # Google charts Y reverses PIL Y
w = len(simple)
W = w * 3
img = Image.open(img_file).resize((w, w)).convert("1")
pels = img.load()
black_pels = [(x, y) for x in range(w) for y in range(w)
if pels[x, y] == 0]
xs = "".join(simple[x] for x, _ in black_pels)
ys = "".join(rsimple[y] for _, y in black_pels)
sqside = 3.0
return (
"http://chart.apis.google.com/chart?"
"cht=s&" # Draw a scatter graph
"chd=s:%(xs)s,%(ys)s&" # using simple encoding and
"chm=s,000000,1,2.0,%(sqside)r,0&"# square black markers
"chs=%(W)rx%(W)r" # at this size.
) % locals()
and here’s the url it generates:
http://chart.apis.google.com/chart?cht=s&chd=s:DDEEEEEEE…&chs=186x186
Smallprint. Google charts may return a 400 error for an image with a long URL (meaning lots of black pixels in this case). The upper limit on URL length doesn’t seem to be documented but a quick trawl through topics on the google charts group suggests others have bumped into it too. Connoisseurs of whacky pictures should pay CSS Homer Simpson a visit.
Takewhile drops one
Here’s some naughty code.
from itertools import takewhile
def take_some(pred, xs):
while True:
for x in takewhile(pred, xs):
yield x
This code abuses the “iterator building block” foundations of Python’s itertools module. Once you’ve chopped a stream’s head off using takewhile you can’t resume processing its tail … Or can you?
Stop the clock, squash the bug
Which clock is the best?
We can easily rule the one which has stopped …
Or can we? In “The Rectory Umbrella” Lewis Carroll argues otherwise.
Which is better, a clock that is right only once a year, or a clock that is right twice every day?
“The latter,” you reply, “unquestionably.”
Very good, now attend. I have two clocks: one doesn’t go at all, and the other loses a minute a day: which would you prefer? “The losing one,” you answer, “without a doubt.” Now observe: the one which loses a minute a day has to lose twelve hours, or seven hundred and twenty minutes before it is right again, consequently it is only right once in two years, whereas the other is evidently right as often as the time it points to comes round, which happens twice a day.
Hunting down globals with nm
It was an old library, in need of attention — we all knew that — but it worked well, and we saw no reason to change it. Until, that is, we wanted more than one of it. The problem being, it was riddled with globals. A typical file looked something like this:
#include <string.h>
#define MSG_BUF_SIZE 256
char const * g_libname = "TOO.MANY.GLOBALS";
void initialise()
{
static int s_initialised = 0;
if (s_initialised == 0)
{
s_initialised = 1;
....
}
}
char g_msg_buf[MSG_BUF_SIZE];
static void clear_message()
{
memset(g_msg_buf, 0, sizeof(g_msg_buf));
}
....
In the snippet above, the g_msg_buf has external linkage. Other files in the library accessed it freely. The local static int, s_initialised, is better contained, but still stood in our way. How could we initialise two library instances?
Don’t worry, we’re not about to discuss the evils of globals and singletons. We all know what needs doing here: initialising the library should return clients a handle, and each client would use its returned handle for subsequent access to the library. Internally, the handle would be a pointer to a struct, the details of which would be private to the library, packaging its internal state.
Programming Nirvana, Plan B
Caging the Effects Monster
Simon Peyton Jones gave an outstanding keynote on functional programming at ACCU 2008. A language researcher at Microsoft in Cambridge, he’s perhaps best known as the man behind the Haskell programming language and GHC, the leading Haskell compiler. He also happens to be a superb presenter who positively exudes enthusiasm. Not many compiler writers connect so well with an audience.
Fun with Erlang, ACCU 2008
I’ve just got back from a one day course on Erlang given by its inventor, Joe Armstrong, at day 0 of the ACCU 2008 conference. Actually, he crammed as much as he could from a three day course into a single day. I’m not too disappointed we didn’t reach the stated aim of the course, of developing and running a networked application and changing it on the fly: I’m happy enough to have written some Erlang code and been exposed to some new ideas.
Armstrong is affable and enthusiastic and not afraid to voice his opinions. He’s a good teacher. I do recommend his book, “Programming Erlang: Software for a Concurrent World”, but found the tone a bit matey in places. In person he’s much more direct and engaging.
Erlang is a functional programming language which builds in support for multiple processes — these are not operating system processes; and in some ways, Erlang is the operating system. You define functions and other rules and controls using pattern matching. When patterns are used to dispatch message responses in a receive statement, the code reads well, and functions can be defined using patterns in an elegant and concise way — no if-this-then-that-else-other.
-module(accum).
-export([evens_and_odds/1]).
-import(lists, [reverse/1]).
evens_and_odds(L) -> evens_and_odds(L,[], []).
evens_and_odds([H|T], E, O) when H rem 2 =:= 0 -> evens_and_odds(T, [H|E], O);
evens_and_odds([H|T], E, O) -> evens_and_odds(T, E, [H|O]);
evens_and_odds([], E, O) -> {E, O}.
(I know this example should be coded using lists:partition, it’s just here to show the pattern syntax.)
Erlang is no academic pure functional language, though. It originated at Ericsson over 20 years ago and has been used to develop extremely reliable distributed concurrent systems. Hence the current interest: Erlang can take advantage of multiple processor cores on multiple machines, which is why it’s been adopted by up and coming projects like CouchDB. Given its proprietary origins, I think we’re lucky to find the language available under an open source license. (Armstrong has some stories to tell about that!) On the other hand, Armstrong admitted that some of the documentation was weak — at Ericsson you could always ask one of the Erlang developers if you didn’t understand something. Personally, I’d be wary of the OTP platform, a full-featured distributed application framework built on top of Erlang.
Processes communicate by messages and by generating exceptions. You can link processes together. This doesn’t mean designing a distributed system is easy, but I’d say it gives us an appropriate language for such systems. Or as Armstrong puts it:
“You cannot describe concurrent systems in sequential languages.”
By the way, I’ll be back at the conference on Thursday, to hear what Simon Peyton Jones has to say about functional programming and Haskell. See some of you then.
White black knight then black white knight
At the end of yesterday’s article I admitted defeat. I’d developed a script to render chess positions, using a suitable font as a source of scalable bitmasks to represent the pieces. Sadly, you could clearly see the board through the pieces, which meant white pieces on black squares looked wrong. I couldn’t see an easy fix.
Happily one of my readers was more resourceful:
You can make white pieces by drawing a “black” piece in white, then overlaying that with a “white” piece in black.
This is a clever trick which I wish I’d thought of! The redrawn pictures do look better.
We need three more lines of code and comments apiece.
def chess_position_font(fen, font_file, sq_size):
....
for sq, piece in filter(not_blank, zip(sqs, pieces)):
if is_white_piece(piece):
# Use the equivalent black piece, drawn white,
# for the 'body' of the piece, so the background
# square doesn't show through.
filler = unichr_pieces[piece.lower()]
put_piece(sq, filler, fill='white', font=font)
put_piece(sq, unichr_pieces[piece], fill='black', font=font)
return board
Note, in passing, that I don’t think comments can or should be entirely eliminated from source code — here’s a case where they help.
Even after this hack, the pictures aren’t pixel perfect. But I do like the grey mane you get when a white knight occupies a dark square.
Drawing Chess Positions
Dominus Connects
In a recent article Mark Dominus describes how he grew frustrated with his graphical editor and wrote a script to draw connectors:
Here’s what I did instead. I wrote a program that would read an input like this:
>-v-< '-+-`and produce a jpeg file that looks like this:
I haven’t tried running the software, which, Dominus admits, isn’t his most polished. What interests me is: the way he devises a mini-language for describing these connectors, then combines hand-built and standard tools to produce the required result; and how quickly he ditches the Gimp and settles on this approach. Clearly he’s done this sort of thing before.
Chessboards Revisited
Recently I wrote about a rather easier graphics problem, of drawing chessboards. My real mission, though, was to promote scripted graphics. A chessboard would make a good starting point, I thought. I planned to go on to describe a more advanced drawing problem, of putting pieces on the board — a problem requiring more pixel bashing and more thought about inputs.
This article tackles that follow-on problem. What I didn’t realise — but really should have guessed — is that it’s a problem which has been solved many times before in many different domains. You can find LaTeX packages and emacs modes for it. There’s even a MediaWiki macro. So if you need to draw chess positions please investigate what’s already out there.
That said, the rest of this article follows on from its predecessor. We’ll settle on a suitable notation for describing chess positions and use this as a basis for creating ASCII, Unicode + CSS, and PNG graphics. We’ll also discuss the advantages of using an interpreted, dynamic language for image processing.
Ima Lumberjack, (s)he’s OK
Introducing the team
Ima, the new recruit, paired with Alyssa, our most experienced programmer. The two of them worked on the user interface. Noah and Seymour looked after the persistent storage layer. Once Seymour got Noah into unit testing — well, you just couldn’t stop them. Cy managed the build, installation, porting, tools, and generally helped sweep up any regressions. What a star! That left me, Lem and Eva free to work on the core of the product. Ben led the team, but you know Ben — Ben kept a hand in.
He started it
This all started when Simon Sebright posted a question on an email reflector:
Just wondering what people think of the approaches of expressing the gender-neutrality of developers in particular when reading books/articles on the subject. The approach I have encountered most recently is the subtle mixing of “he” and “she” from paragraph to paragraph. Personally, I find this really grating, although better than he/she. My preference is for the 3rd person plural, i.e. they found that joining accu was useful for their career.
I don’t want to revive that discussion here, except to say that it caused me to rethink my personal approach to this problem, that I’ve always considered it unacceptable to use “he” to mean “he or she”, that I’ll avoid using “they” in place of “he” or “she” — oh, and that the best technical writers avoid any mention of gender without sounding stilted.
Sometimes though, however technical your subject, you may want to introduce a character to bring a story to life. At which point being gender-neutral becomes harder. You need a person. A person needs a name.
“Isobel might be more interested in becoming a programmer if she sees a few more women around when she visits her father at work.”
Name taking
One technique I like is to give these fictional characters fictional names — and a carefully invented name can be gender-neutral. Here are a few examples taken from some great technical writing:
- Alyssa P Hacker, Cy D Fect, Lem E Tweakit, Eva Lu Ator, Ben Bitdiddle, employees of Microshaft, a thriving high-technology company in the Boston area.
- Ima Lumberjack, who’s implemented a web 2.0 app for managing his sawmill, who’s not OK about Python 3000, and who wishes he’d been a girlie just like his dear Papa.
- Noah Shortcut, Seymour Checks and Mr Deadline, who fall into archetypal rôles in a late-running project. People, communicate!
Inventing names isn’t easy, which is why I shall continue to shamelessly plunder these ones and any more I come across. Call it a homage.
Drawing Chessboards
I wanted a picture of a chessboard. Rather than boot up some drawing software and cut and paste black and white squares I decided to write a program to create the picture.
If you’d like to know why anyone would ever create work for themselves in this way, skip to the end of this article, where you’ll find justification and a more challenging follow-on problem. Otherwise, please read on from top to bottom in the usual way.
The Python Imaging Library
Fredrik Lundh’s Python Imaging Library (commonly known as PIL) must surely rank as one of the most popular Python libraries which doesn’t come as standard[1]. It’s a fabulous tool which I’ve used to create the graphic above (though note that the double border around this graphic and subsequent ones is applied by a CSS style property). Here’s how.
Tracing function calls using Python decorators
Introduction
Let’s suppose you want to trace calls to a function. That is, every time the function gets called, you want to print out its name and the values of any arguments passed by the caller. To provide a concrete example, here’s a function which calculates the dot product of two vectors.
def dot(v, w):
"""Calculate the dot product of two vectors.
Example:
>>> dot((1, 2, 3), (3, 2, 1))
10
"""
return sum(vv * ww for vv, ww in zip(v, w))
To trace calls to the function you could just edit it and insert a print statement.
def dot(v, w):
print "dot(v=%r, w=%r)" % (v, w)
return sum(vv * ww for vv, ww in zip(v, w))
When you no longer want calls traced you can remove the print statement or even comment it out. This approach works well enough for a while but you soon discover there are more functions you want to trace; and you’ll eventually end up with lots of functions being traced and lots of commented-out tracing code. You might even end up with broken commented-out code:
def dot(vec1, vec2):
# print "dot(v=%r, w=%r)" % (v, w)
return sum(v1 * v2 for v1, v2 in zip(vec1, vec2))
At this point, you realise that calling a function and tracing these calls are orthogonal operations. Isn’t there a less invasive way to do this?
Sugar Pie
Question: in the code snippet bleow, what does the result stream, rs, approximate?
from itertools import count, ifilter, izip
from random import random as xy
from math import hypot
pt = lambda: (xy(), xy())
on = ifilter(lambda n: hypot(*pt()) < 1., count(1))
rs = (4. * j / i for i, j in izip(on, count(1)))
The code isn’t wilfully obscure but I’ll admit it’s unusual. Although written in a functional style, the source of the stream, pt, is utterly impure, generating a sequence of random results: it sprinkles points in a unit square. Despite this random input the results stream always tends to the same value. Well, in theory it should!
Here’s a picture of a round pie on a square baking tray being dusted with sugar.
Thanks again to Marius Gedminas for pointing me at
math.hypot, the best way to find the length of a 2D vector. (The
previous version of this note used abs(complex(*pt()), which it
claimed to be better than math.sqrt(x * x + y * y)).
The Price of Coffee
Kettle
I work on the Triangle, a fashionable part of Bristol crammed with clothes shops, bike shops, book shops and coffee shops. The larger book shops, Borders and Blackwells, contain their own coffee shops. It’s not hard to part with money. Unless you want a kettle, that is — you won’t find a kettle shop — so when ours broke I went out for a large Americano. I’d been trying to draw something using Neo Office and needed a pick-me-up.
The coffee came to £2.30.
Top Ten Percent
Recently I discussed two different approaches to finding the first N items, in order, from a collection of size S:
- full sort then slice
- heap-based partial sort
These two operations are respectively of complexity:
- S·log(S)
- S·log(N)
If we fix N and increase S, partial sort gets the job done most efficiently, while clearly conveying the intent of the code.
What if we fix the ratio S/N? As an example, fixing S/N at 10 would correspond to finding the top 10% of the collection. In this case, a third approach will beat both full and partial sort as S increases.










