Python on Ice

2009-10-28

A moratorium on Python changes is probably a good thing—the last edition of my book nearly made my head explode. — @dabeaz

Python?

“Python?”

“Yes, Python. It’s a high-level language, we used it for the prototype. We can use it for parts of the system where performance isn’t critical. Connecting components together. The web server.”

“But what will we do when Python changes? It’s a developing language, right? How can we maintain our system?”

Not an issue, I explained. Python takes backwards compatibility very seriously. Besides, we choose which version of Python to deploy with, we choose when we migrate — maybe never. Look, you can download the source for every version of Python ever released. All you need is a C compiler. C is the porting layer, if you like, and C isn’t going anywhere in a hurry.

In all honesty, I expected more maintenance issues with the C++ parts of our product, where the language may not have changed in a decade but compilers are only just catching up with it; and in fact I didn’t have to argue for long to persuade senior management, not on this issue at least. They’d already seen how quickly I could get things up and running using Python. Even though the company had more experience with C, C++, Java, and even .Net, I convinced them Python had a role on the server-based system we were developing.

Nonetheless, I didn’t think it the right time to mention Python 3. Why confuse things?

Twisted 8.1 on Python 2.5

I won’t go into detail about the product. Data flowed through it, redirected dynamically using tees and filters, and robots were attached to the resulting streams to monitor them. A web UI presented controls and a view of the system. We used C and C++ for managing the bulk of the flow. The robots we coded in both Python and C++. We connected and coordinated everything using Twisted, a Python networking engine. Our initial deployment used Python 2.5 and Twisted 8.1. We’ve since upgraded to Python 2.6 and Twisted 8.2.

Python 3 on Word Aligned

At work there was no question of using Python 3 even though it became available when we started development. Twisted hadn’t been released against Python 3 (it still hasn’t) and even if it had been, we wouldn’t have trusted it immediately. Here at Word Aligned, though, I switched to Python 3 pretty much as soon as it was officially released. Since the start of 2009, any Python code published on this site has been written in Python 3.

Since then I’ve come to question my decision. I want people to visit my site and I want them to stay long enough to read any code here. Python is perfect because it’s readable and accessible. Anyone who’s ever written a program, whatever the language, can understand Python. But many times I’ve felt the need to explain my Python 3 code, not to Java, C#, C++ and C users, not even to Perl and Ruby users, but to Python users!

Note that in Python 3 … whereas in Python 2 … available in Python 3.1 only … you’d need to write … from __future__ import print_function

I wouldn’t have felt the need to say any of this if I’d stuck with Python 2.

Python 3 absent from Europython

[Image: Code on the Europython 2009 bag]

At Europython 2009 I was struck by the absence of Python 3 from the agenda. None of the sessions covered Python 3, used Python 3, or even mentioned Python 3 (unless you count David Jones’ talk on Loving Old Versions of Python). The only Python 3 code I saw appeared on the conference bag: a lightly obfuscated script which printed out the conference destination. Note that print is being used in a way which works with both 2.x and 3.x, that is, with parentheses and taking a single parameter. Very few systems resolve /usr/bin/env python as Python 3, though[1], which is lucky since even this simple script raises an exception under Python 3 (and transforming the code using 2to3 makes it worse).
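To illustrate the point about print, here is a minimal sketch of a call which behaves the same under 2.x and 3.x. It is not the script from the bag, and the names and message are made up for the example.

```python
# Build the whole line first, then hand print a single parenthesised value.
# Under 3.x this is a function call with one argument; under 2.x it is the
# print statement applied to a parenthesised expression. Same output either way.
parts = ["Python", "on", "ice"]   # made-up message
line = " ".join(parts)
print(line)

# By contrast, print("Python", "on", "ice") is NOT portable:
#   3.x prints the three words separated by spaces
#   2.x prints the tuple ('Python', 'on', 'ice')
```

The trick only stretches as far as a single argument. As soon as you want several values, a separator or a different output stream, the two versions part company.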

This Python 3 silence was at last broken during the question and answer session which followed the final keynote on the final day of the conference. An audibly nervous member of the audience asked Python Software Foundation supremo Steve Holden a question:

Audience member: A source of confusion is the Python 2 Python 3 thing. How are you going about getting people to move from Python 2 to Python 3?

Steve Holden: I’m not trying to get people to move to Python 3. [Audience applauds].

Steve Holden went on to round out this answer, saying that 2.6 is the recommended production version of Python. Anyone who took Python 3.0 into production, he said, would have been “kicked in the teeth by the fact that the IO subsystem performed execrably slowly, it was really dreadful” — a fact the 3.0 release notes failed to mention, but which has been fixed in Python 3.1. For teaching purposes, or for greenfield development which doesn’t need to reuse other people’s code, by all means try Python 3, he said. Python 3 is the future of Python. There’s a migration strategy in place.

And what about the overhead on the core Python development team, who now have two versions to maintain? Well, Steve Holden said, there are tools to automate patching and merging, but yes, there’s an overhead.

(To hear the question and full response, there’s a video at blip.tv. Fast forward to 52 minutes and 40 seconds.)

Python 3 Literature

[Images: Python Essential Reference, 4th edition cover; Dive into Python 3 cover]

How have book authors reacted to Python 3? Mark Pilgrim has dived in with aplomb. His introductory book, Dive into Python 3, uses Python 3 as Python 3 was intended. For example, you won’t find % characters used in string formatting; {up to date braces are used exclusively}. It’s an engaging, painstakingly-written book, and (bonus!) the online version is an object lesson in how to craft HTML.
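For anyone who hasn’t met the two styles side by side, here is a small sketch. The values are invented for illustration and the snippet isn’t taken from either book.

```python
name, version = "Python", 3.1   # made-up values

# Percent formatting, the long-standing Python 2 idiom
old_style = "%s %.1f" % (name, version)

# Brace formatting with str.format. Explicit indices such as {0} are used
# here because, to my knowledge, auto-numbered {} fields need 3.1 or later.
new_style = "{0} {1:.1f}".format(name, version)

print(old_style)
print(new_style)
```

Both lines print the same text. Percent formatting still works in Python 3, so using braces exclusively is a stylistic choice, not one the language forces.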

David Beazley attacks the problem in a different way, but then his subject is different. His comprehensive Python Essential Reference aims to cover the core language and its standard library in its entirety: think of it as the reference any serious Python programmer would like to have within reach. David Beazley’s approach is to concentrate on the common subset of Python 2 and 3, omitting features of 2 which aren’t in 3 and avoiding features of 3 which haven’t been backported to 2. His book succeeds but it does raise some awkward questions. Will Pythonistas find themselves maintaining parallel code-bases, and end up twisting their code until it fits into the intersection of two flavours of the language?

[Image: Safe Python Programming zone]

Or will they simply avoid Python 3?

David Beazley eventually covers new Python 3 features in an appendix, by which time the strain has started to show:

Finally, even though Python 3.0 is described as the latest and greatest, it suffers from numerous performance and behavioral problems […] in the opinion of this author, Python 3.0 is really only suitable for experimental use by seasoned Python veterans.

The Cost of Python 3

David Beazley may have ended up feeling like a Python 3 beta tester, but, as discussed at the start of this article, most Python users have a free choice. We can live a little longer with Python 2 warts in exchange for a proven platform and an excellent set of supporting libraries. We can try and write to a language subset. We can use Python 2 and import much of the future. Or we can dive into Python 3.

The people who must find the language fork tough are the Python suppliers. Our choice, as consumers, means work for them: we’ve mentioned the core Python team, who must surely spend more time patching and testing; think of Python library writers (such as the wizards behind Twisted).

There’s another important class of supplier: the people working on alternative Python implementations, the ones which work on Java, or .Net, the ones which have no global interpreter lock, the ones which can run deeply recursive functions. Pythonistas are understandably excited about Unladen Swallow, a development branch of CPython 2.6. Just look at the project goals!

We want to make Python faster, but we also want to make it easy for large, well-established applications to switch to Unladen Swallow.

  1. Produce a version of Python at least 5x faster than CPython.
  2. Python application performance should be stable.
  3. Maintain source-level compatibility with CPython applications.
  4. Maintain source-level compatibility with CPython extension modules.
  5. We do not want to maintain a Python implementation forever; we view our work as a branch, not a fork.

In summary, if Unladen Swallow touches down safely, it will become CPython, and anyone using CPython will benefit from a high-level language capable of performing at native speeds.

I’d like to highlight the final goal, the one about maintaining, branching and forking. Unladen Swallow is a branch taken from CPython 2.6[2]. If it succeeds, much hard and unglamorous work will be needed to merge it to the latest CPython 2, and patch it across to CPython 3. Python implementers must pay a high price, in terms of increased workload, for the Python 2, Python 3 fork.

Could Python have evolved in a more linear way, by deprecating then removing features, while adding in new ones? I guess not: a mature language wouldn’t dare break backwards compatibility.

Would it?

Evolution of Python

My employers were right to characterise Python as a developing language. On the subject of incremental change, I wanted to highlight again the evolution of the multiset in Python, from Python 1.4, released in 1996, through to Python 3.1, just 4 months old.

Evolution of the Multiset in Python
def multiset_14(xs):
    multiset = {}
    for x in xs:
        if multiset.has_key(x):
            multiset[x] = multiset[x] + 1
        else:
            multiset[x] = 1
    return multiset

def multiset_15(xs):
    multiset = {}
    for x in xs:
        multiset[x] = multiset.get(x, 0) + 1
    return multiset

import collections

def multiset_25(xs):
    multiset = collections.defaultdict(int)
    for x in xs:
        multiset[x] += 1
    return multiset

def multiset_31(xs):
    return collections.Counter(xs)

Lovely!

multiset_14() won’t work in Python 3, and multiset_31() won’t work in Python 2 (not yet, anyway).
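As a quick sanity check, here is a sketch which runs the three later versions under Python 3.1 or newer and confirms they count the same things. The sample sentence is made up, and multiset_14() is left out because has_key() no longer exists in Python 3.

```python
import collections

def multiset_15(xs):
    multiset = {}
    for x in xs:
        multiset[x] = multiset.get(x, 0) + 1
    return multiset

def multiset_25(xs):
    multiset = collections.defaultdict(int)
    for x in xs:
        multiset[x] += 1
    return multiset

def multiset_31(xs):
    return collections.Counter(xs)

words = "the cat sat on the mat the end".split()   # made-up sample data

# Convert each result to a plain dict so only the contents are compared
as_dicts = [dict(f(words)) for f in (multiset_15, multiset_25, multiset_31)]
print(as_dicts[0] == as_dicts[1] == as_dicts[2])

# Counter also brings extras the hand-rolled versions lack
print(multiset_31(words).most_common(1))
```

The first line printed is True, and the second shows that “the” is the most common word, with a count of 3.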

Accepting Python 3

The main goal of the Python development community at this point should be to get widespread acceptance of Python 3000. — Guido van Rossum, 2009-10-21

Unlike Steve Holden, Guido van Rossum is trying to get people to move to Python 3, or at least to accept it. Here’s how:

I propose a moratorium on language changes. This would be a period of several years during which no changes to Python’s grammar or language semantics will be accepted. The reason is that frequent changes to the language cause pain for implementers of alternate implementations (Jython, IronPython, PyPy, and others probably already in the wings) at little or no benefit to the average user (who won’t see the changes for years to come and might not be in a position to upgrade to the latest version for years after).

Wow!

I’m not close enough to Python development to know exactly what’s involved here, but a scan of the email thread suggests this proposal has been widely accepted. I think it’s clear from the rest of this article that I sympathise with the motivation behind it. Yet I can’t help feeling uneasy about putting Python on ice. Yes, there have been changes to the language grammar over the past fifteen years. I wouldn’t say they’ve been frequent, and there aren’t many I’d want to do without, even if I only get to use them (in production) a year or two after they’ve been released. Yes, these changes cause pain to implementers[3], but that’s not the whole story. Who said implementing a language would be easy? Perhaps much of the pain comes from implementing the changes twice, once for 2 and once for 3.

Soon there’ll be a PEP stating more formally what exactly a moratorium on language changes will mean. That is,

A Python Enhancement Proposal which Proposes: Stop Enhancing Python!


[1] I wonder if anyone can guess why this code fails, just by looking at it? As a hint: some Python 2 codecs are byte rather than text oriented, and Python 3 prohibits this kind of confusion.
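Here is a sketch of the general kind of failure the hint describes. It is not the code from the bag, the choice of rot_13 and the sample text are my own assumptions, and the behaviour shown is that of recent Python 3 releases, which may differ in detail from 3.0 and 3.1.

```python
import codecs

text = "Hello"   # made-up sample text

# rot_13 maps text to text, so in Python 3 it goes through codecs.encode()
scrambled = codecs.encode(text, "rot_13")
print(scrambled)

# Python 3 strings have no decode() method at all
has_decode = hasattr(text, "decode")
print(has_decode)

# and str.encode() accepts only genuine text encodings
try:
    text.encode("rot_13")
    rejected = False
except LookupError as error:
    rejected = True
    print("LookupError:", error)
```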

[2] More details of the Unladen Swallow branch approach.

In order to achieve our combination of performance and compatibility goals, we opt to modify CPython, rather than start our own implementation from scratch. In particular, we opt to start working on CPython 2.6.1: Python 2.6 nestles nicely between 2.4/2.5 (which most interesting applications are using) and 3.x (which is the eventual future). Starting from a CPython release allows us to avoid reimplementing a wealth of built-in functions, objects and standard library modules, and allows us to reuse the existing, well-used CPython C extension API. Starting from a 2.x CPython release allows us to more easily migrate existing applications; if we were to start with 3.x, and ask large application maintainers to first port their application, we feel this would be a non-starter for our intended audience.

[3] C++ compiler writers, gear up for C++0x!