Python, Surprise me!

2009-12-15

A Simple Function

Here’s a simple function which converts the third item of a list into an integer and returns it, returning -1 if the list has fewer than three entries or if the third entry fails to convert.

def third_int(xs):
    '''Convert the third item of xs into an int and return it.

    Returns -1 on failure.
    '''
    try:
        return int(xs[2])
    except IndexError, ValueError:
        return -1

Unfortunately this simple function is simply wrong. Evidently some exceptions aren’t being caught.

>>> third_int([1, 2, 3, 4])
3
>>> third_int([1])
-1
>>> third_int(('1', '2', '3', '4',))
3
>>> third_int(['one', 'two', 'three', 'four'])
Traceback (most recent call last):
    ....
ValueError: invalid literal for int() with base 10: 'three'

How ever did a ValueError sneak past the except clause?

The Real Surprise

There’s nothing mysterious or surprising going on here, but I’ll delay answering this question for a moment. For me, the real surprise about Python is that, generally, I get it right first time. Python similarly caught Eric S. Raymond by surprise. His first surprise was that it took him just 20 minutes to get used to syntactically significant whitespace. And just 100 minutes later …

My second [surprise] came a couple of hours into the project, when I noticed (allowing for pauses needed to look up new features in Programming Python) I was generating working code nearly as fast as I could type. When I realized this, I was quite startled. An important measure of effort in coding is the frequency with which you write something that doesn’t actually match your mental representation of the problem, and have to backtrack on realizing that what you just typed won’t actually tell the language to do what you’re thinking. An important measure of good language design is how rapidly the percentage of missteps of this kind falls as you gain experience with the language.

— Eric S. Raymond, Why Python?

I certainly don’t generate working code as fast as I can type, and I’m not even a particularly quick typist, but I rarely make syntactic errors when writing Python — and I don’t often need to consult the documentation on such matters. As Chuck Allison memorably puts it: “the syntax is so clean it squeaks”.

Parentheses Required(?)

There are some oddities and gotchas though. I don’t object to the explicit self in methods, but I do sometimes forget to write it — especially if I’ve just switched over from C++.

A side-effect of the whitespace thing is that you can’t just wrap a long line. The line ending needs to be escaped.

if 1900 < year < 2100 and 1 <= month <= 12 \
    and 1 <= day <= 31 and 0 <= hour < 24 \
    and 0 <= minute < 60 and 0 <= second < 60: # Looks like a valid date
    return 1

Alternatively, parenthesize.

if (1900 < year < 2100 and 1 <= month <= 12
    and 1 <= day <= 31 and 0 <= hour < 24
    and 0 <= minute < 60 and 0 <= second < 60): # Looks like a valid date
    return 1

In the above, the parentheses aren’t required to group terms, but instead serve to implicitly continue the line of code past a couple of newline characters.
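Here's a minimal sketch which wraps both fragments in functions, so the two continuation styles can be compared side by side. The function names and sample values are made up for illustration.

```python
def plausible_backslash(year, month, day, hour, minute, second):
    # Each line ending is escaped with a backslash.
    return 1900 < year < 2100 and 1 <= month <= 12 \
        and 1 <= day <= 31 and 0 <= hour < 24 \
        and 0 <= minute < 60 and 0 <= second < 60


def plausible_parens(year, month, day, hour, minute, second):
    # The open parenthesis keeps the expression going across newlines.
    return (1900 < year < 2100 and 1 <= month <= 12
            and 1 <= day <= 31 and 0 <= hour < 24
            and 0 <= minute < 60 and 0 <= second < 60)


# Hypothetical sample inputs: one plausible, one bad month, one bad year.
samples = [
    (2009, 12, 15, 9, 30, 0),
    (2009, 13, 15, 9, 30, 0),
    (1850, 1, 1, 0, 0, 0),
]
results = [(plausible_backslash(*s), plausible_parens(*s)) for s in samples]
```

Both functions agree on every sample. Square brackets and braces continue a line in the same way, so a long list or dict literal needs no backslashes either.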

[Figure: Wikipedia Tree]

Parentheses serve more than one role in Python’s syntax. As in all C-family languages, they can group expressions. They also get involved building tuples, (1, 2, 3) or ('red', 0xff0000) for example. Beware the special case: a one-tuple needs a trailing comma, ("singleton",). This isn’t something I forget or accidentally omit, but it can make things fiddly. Here’s a tuple-tised tree, where we represent a tree as a tuple whose first element is a node value, and any subsequent elements are sub-trees. Careful with those commas!

tree = (2, (7, (2,), (6, (5,), (11,))), (5, (9, (4,))))

Actually, tuples are just comma-separated lists of expressions (no parentheses required), so we might equally well have written:

tree = 2, (7, (2,), (6, (5,), (11,))), (5, (9, (4,)))

Here, the superfluous outermost parentheses have been omitted; the inner ones are still required for grouping.
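A small sketch, assuming a modern Python 3 (it relies on starred unpacking and yield from), confirms that both spellings build the same tuple and walks the tree. The helper node_values is a hypothetical name of my own, not a library function.

```python
with_parens = (2, (7, (2,), (6, (5,), (11,))), (5, (9, (4,))))
without_parens = 2, (7, (2,), (6, (5,), (11,))), (5, (9, (4,)))


def node_values(tree):
    """Yield node values depth-first, each parent before its children."""
    value, *subtrees = tree  # first element is the value, the rest are sub-trees
    yield value
    for subtree in subtrees:
        yield from node_values(subtree)


same = with_parens == without_parens
values = list(node_values(without_parens))

# Dropping the comma from a leaf gives a bare int, not a one-tuple.
not_a_tuple = (4)
```

Here same is True and values holds the nine node values in depth-first order. A leaf written as (4) rather than (4,) would make node_values fail with a TypeError, because an int cannot be unpacked.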

How about we always append a trailing comma to our tuples so the one-tuple no longer looks different?

tree = 2, (7, (2,), (6, (5,), (11,))), (5, (9, (4,))),

That’s allowed and fine. Unless we need an empty tuple, that is, in which case the parentheses are required. And a comma would be wrong.

>>> ()
()
>>> (),
((),)
>>> ,
   ....
SyntaxError: invalid syntax
>>> (,)
   ....
SyntaxError: invalid syntax
>>> tuple()
()

Python 3 introduces a nice new syntax for set literals, reusing the braces which traditionally enclose dicts.

>>> ls = { 1, 11, 21, 1211, 111221, 312211 }

Again, beware the edge case: {} is an empty dict, not an empty set.

>>> zs = {}
>>> type(zs)
<class 'dict'>
>>> zs = set()
>>> type(zs)
<class 'set'>

Python 3 allows non-ASCII characters in identifiers, but not any old character, so we cannot get away with:

>>> ∅ = set()
      ^
SyntaxError: invalid character in identifier
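One way to check a name without provoking a syntax error is str.isidentifier(), which reports whether a string follows the identifier rules. The candidate names below are my own examples.

```python
# Candidate names: some valid identifiers, some not.
candidates = ["zs", "π", "naïve", "∅", "2fast"]

# Map each candidate to whether it follows the identifier rules.
valid = {name: name.isidentifier() for name in candidates}
```

Letters such as π and ï are accepted; the empty-set sign is a mathematical symbol rather than a letter, so it is rejected, as is any name starting with a digit.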

Parentheses are used for function calls too, and also for generator expressions. Here’s a lazy list of squares of numbers less than a million.

>>> sqs = (x * x for x in range(1000000))

Here’s the sum of these numbers.

>>> sum((x * x for x in range(1000000)))
333332833333500000

Actually, we can omit the generator-expression parentheses in the sum. The function call parentheses magically turn the enclosed x * x for x in range(1000000) into a generator expression. As usual, Python does what we want.

>>> sum(x * x for x in range(1000000))
333332833333500000
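The sketch below pokes at two details behind this: a generator expression is consumed as it is iterated, and the bare form only works when it is the sole argument of the call. The variable names are illustrative.

```python
n = 1000000

squares = (x * x for x in range(n))  # nothing is computed yet
total = sum(squares)                 # iterating consumes the generator
again = sum(squares)                 # already exhausted, so nothing is left to add

# Closed form for 0*0 + 1*1 + ... + (n-1)*(n-1), as a cross-check.
closed_form = (n - 1) * n * (2 * n - 1) // 6

# With a second argument, the generator expression needs its own parentheses.
offset_total = sum((x * x for x in range(4)), 100)  # 100 + 0 + 1 + 4 + 9

# Without them the call does not even compile.
try:
    compile("sum(x * x for x in range(4), 100)", "<example>", "eval")
    bare_form_accepted = True
except SyntaxError:
    bare_form_accepted = False
```

If the squares are needed more than once, build a list instead, or recreate the generator each time.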

Serious about Syntax

If you’ve read this far you may well be thinking: “So what?” I haven’t shown any gotchas, merely a few quirks and corner cases. As already mentioned, the real surprise is that Python fails to surprise. Part of this, as I hope I’ve shown here, can be attributed to the interpreter, which positively invites you to experiment; but mainly Python’s clean and transparent design takes the credit. Repeating Eric S. Raymond: you don’t have to “actually tell the language to do what you’re thinking”.

Since I first started using Python the syntax has grown considerably, yet the extensions and additions seem almost as if they’d been planned from the start [1]. Generator expressions complement list comprehensions. The yield statement fits nicely with iteration.

Even more remarkably, Python 3 has chosen to break backwards compatibility, so it can undo those few early choices which now seem mistakes. Which brings us back to the broken function at the top of this article. Here it is again, docstring omitted for brevity.

def third_int(xs):
    try:
        return int(xs[2])
    except IndexError, ValueError:
        return -1

I really did write a function like this, and I really did get it wrong in just this way. The code is syntactically valid, but I should have written

def third_int(xs):
    try:
        return int(xs[2])
    except (IndexError, ValueError):
        return -1

The parentheses in the except clause are crucial. The formal syntax of this form of try statement is

try1_stmt ::=  "try" ":" suite
               ("except" [expression [("as" | ",") target]] ":" suite)+
               ["else" ":" suite]
               ["finally" ":" suite]

In the corrected version of third_int(), the parentheses group IndexError, ValueError into a single expression, a tuple, and the except clause matches any object with class (or base class) IndexError or ValueError. The broken version is very different, as becomes clear if we use the alternative "as" form.

def third_int(xs):
    try:
        return int(xs[2])
    except IndexError as ValueError:
        return -1

Here, the except clause matches an object with class or base class IndexError and assigns that object to the target, which is called ValueError (and which shadows the “real” ValueError in the rest of the function definition). If int() raises a ValueError, it will not be matched.
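The "as" spelling is also valid in Python 3, so the difference between the two versions can be checked directly. This is a sketch: third_int_shadowed and escapes are hypothetical names introduced for the comparison.

```python
def third_int(xs):
    """Corrected version: the tuple catches either exception."""
    try:
        return int(xs[2])
    except (IndexError, ValueError):
        return -1


def third_int_shadowed(xs):
    """Broken version, spelled with "as": only IndexError is caught."""
    try:
        return int(xs[2])
    except IndexError as ValueError:  # binds the caught IndexError to a local name
        return -1


def escapes(func, xs):
    """Return the name of the exception class that escapes func(xs), or None."""
    try:
        func(xs)
    except Exception as exc:
        return type(exc).__name__
    return None


words = ['one', 'two', 'three', 'four']
corrected_result = third_int(words)
shadowed_escape = escapes(third_int_shadowed, words)
```

The corrected function returns -1 for the list of words, while the shadowed version lets the ValueError from int() escape. Both still return -1 for a list that is too short.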

Won’t Get Fooled Again

Oh, I get it, now. It is a bit subtle, but I won’t make that mistake again.

Wait, there’s more! In Python 3k, my broken implementation is properly broken — a syntax error.

Python 3.1
>>> def third_int(xs):
...     try:
...         return int(xs[2])
...     except IndexError, ValueError:
  File "<stdin>", line 4
    except IndexError, ValueError:
                     ^
SyntaxError: invalid syntax

The Python 3k syntax of this form of try statement reads:

try1_stmt ::=  "try" ":" suite
               ("except" [expression ["as" target]] ":" suite)+
               ["else" ":" suite]
               ["finally" ":" suite]

You can’t use a comma to capture the target any more. It’s an advance and a simplification. Why am I not surprised?
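The rejection can be checked without a Python 3.1 prompt by handing the source to compile(). One caveat, which is my addition rather than anything from the original article: Python 3.14 accepts the unparenthesized form again (PEP 758), this time meaning the tuple of both exceptions, so the sketch records the interpreter version alongside the result.

```python
import sys
import textwrap

# The broken function from the top of the article, kept as a string so that
# this example still loads on interpreters which reject the syntax.
comma_form = textwrap.dedent('''
    def third_int(xs):
        try:
            return int(xs[2])
        except IndexError, ValueError:
            return -1
''')

try:
    compile(comma_form, "<example>", "exec")
    rejected = False
except SyntaxError:
    rejected = True

# Assumption: interpreters before 3.14 reject the comma form; 3.14+ accept it.
expected_rejection = sys.version_info < (3, 14)
```

On Python 3.0 through 3.13, rejected comes out True. Either way, the old meaning of the comma (capture the exception in a target) is gone.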


[1] With the possible exception of conditional expressions, that is.