Evolving Python in and for the real world

2007-06-22

When I first became interested in Python, and in particular interested enough to discover the PEPs, I wondered what exactly Python 3000 was. From what I could make out, Python 3000 seemed to be of more philosophical than practical importance. I assumed the term provided a framework for discussing how Python might have looked given the luxury of hindsight: if only we didn’t have to worry about backwards compatibility; if only we could break existing code; if only …

Warts, wrinkles, backwards compatibility

I guess I was wrong. Python 3000 is real and it is happening. The benevolent dictator for life is exercising his prerogative and pushing through some dictatorial changes. Python’s warts and wrinkles aren’t being quietly tolerated or even deprecated. Instead, the language as a whole will be forked.

There’s been plenty of reaction to this in the usual places.

Undoubtedly, if you need to maintain two parallel versions of a Python application, things will get messy; and simply leaping from Python 2.x to Python 3.x is going to require a lot more care than, say, hopping from 2.5 to 2.6.

That said, I think the move is the right thing for Python. For one thing, Python builds on a stable subset of a stable platform (i.e. C) and you can easily download and build any released version of Python [1], which in turn means you can avoid fiddling with code you don’t want to touch simply by bundling it with the version of Python it was developed against.

It’s a brave move too. All of a sudden everyone loves reduce, now that it is being moved out of the builtins. All of a sudden everyone has a suggestion to make.

I applaud Python 3000 because software should be soft, including the language it’s written in. Every week Isobel has to learn 10 spellings, which usually centre on the same vowel sounds spelled in different ways: try, sigh, pie, why? Why? Backwards compatibility [2]. She has to cope with inconsistent English, but I’d rather she didn’t have to learn about two types of classes, “classic” and “new-style”. I don’t want her to have to prefer xrange to range, itertools.imap to map and so on. And however important it might be for a programmer to appreciate integral arithmetic, I’d prefer the division symbol to indicate division, not (when both operands are integers) division rounded down to a whole number.
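For the curious, here is a minimal sketch of how Python 3 treats each of the wrinkles mentioned above. It assumes a Python 3 interpreter; several of these lines behave differently, or fail, under Python 2. The class name Speller is purely illustrative.

```python
# Sketch: Python 3 behaviour for the wrinkles discussed above (assumes Python 3).
from functools import reduce  # reduce now lives in functools, not the builtins

# 1. Every class is "new-style": there is no separate "classic" kind.
class Speller:  # hypothetical example class
    pass

assert Speller.__mro__ == (Speller, object)

# 2. range is lazy, so there is no need for a separate xrange.
numbers = range(10**9)           # no list of a billion ints is built
print(numbers[5], len(numbers))  # 5 1000000000

# 3. map returns a lazy iterator, much like Python 2's itertools.imap.
squares = map(lambda n: n * n, numbers)
print(next(squares), next(squares), next(squares))  # 0 1 4

# 4. "/" means true division; "//" is the explicit floor-division operator.
print(7 / 2)    # 3.5
print(7 // 2)   # 3
print(-7 // 2)  # -4 (rounds towards negative infinity)

# 5. reduce survives, just not as a builtin.
print(reduce(lambda a, b: a + b, [1, 2, 3, 4]))  # 10
```

The old integer-division behaviour is still available, but you have to ask for it explicitly with `//`.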

I suspect the decision to push towards Python 3000 was one more easily taken by a dictator than by a committee, but I do find it encouraging that someone so intimately involved with the language can step back from the code-face and realise the importance of putting things right. As a programmer, I know how it feels to be responsible for a bug in code which got shipped — however successful that code might be. This too must motivate Guido van Rossum.

Design by Committee

Committees can make bold decisions too. The best true story I’ve ever read on the subject of computer language evolution has to be Bjarne Stroustrup’s “The Design and Evolution of C++” (also known as the D&E book). It’s not intended for students who want to learn how to program in C++, but I regard it as essential reading for anyone who really wants to make sense of the language (just as Isobel will need to appreciate the roots of English if she ever wants to really get to grips with it). The D&E book’s story ends in 1994, but C++ has continued to evolve. Stroustrup has started work on bringing the story up to date, and a draft of his paper “Evolving a language in and for the real world: C++ 1991-2006” is available online. On the subject of committees, responsibility, and real world pressures, he writes:

Changing the definition of a widely used language is very different from simple design from first principles. Whenever we have a “good idea”, however major or minor, we must remember that:

  • there are millions of lines of code “out there” — most will not be rewritten however much gain might result from a rewrite
  • there are millions of programmers “out there” — most won’t take out time to learn something new unless they consider it essential
  • there are decade old compilers still in use — programmers can’t use a language feature that doesn’t compile on every platform they support
  • there are many millions of outdated textbooks out there — many will still be in use in five years’ time

The committee considers these factors and obviously that gives a somewhat conservative bias. Among other things, the members of the committee are indirectly responsible for well over a million lines of code (as members of their organizations).

Yet despite these pressures, the C++ committee agreed to delay standardising C++ by a whole year so that the fabulous STL could be formally incorporated into the standard. I think it’s not overstating things to say that the subsequent success of the STL has exposed problems in the C++ language itself, and that addressing these problems is the focus of its next formal revision, C++09. It’s also clear, though, that the C++ standards committee will make every effort to ensure C++09 is backwards compatible, meaning that it’s going to include all the old baggage and introduce a whole lot more.

Although I’m far from being a C++ language lawyer, I know C++ well enough to wield it safely. I also know C++ well enough to see that C++09 will squarely hit all the right targets: it’s going to allow us to write more compact, more readable, more efficient C++, and compiler error messages are going to make a lot more sense (once implementations are brought up to date, that is). That said, I think the ever-growing C++ legacy is going to become a huge barrier to entry: if you haven’t grown up with C++, it’s going to be increasingly hard to use, and it’s going to be a less-favoured choice for greenfield development.

A prediction

I would be surprised if Isobel or any of her classmates ever learn to program in C++. I expect a good few of them to learn Python.

The real world?

Stroustrup says the task facing a “real world” language like C++ isn’t as easy as a “simple design from first principles”. Well, I strive for simplicity but, paradoxically, there’s nothing easy about it. In my experience simplicity must be discovered, not designed. Thus a simple design derives from – and may indeed break – a more complex one. That’s what Python 3000 is doing.


[1] Even in this digital age, not quite any released version. The README on the Python downloads FTP site says:

If you find an older Python release (e.g. 0.9.8), we’re interested in getting a copy! mailto:webmaster@python.org

[2] The written form of human languages can be radically revised. In 1928 Kemal Atatürk didn’t just change the way Turkish was spelled: he changed the alphabet too.