Zippy triples served with Python

2007-11-20, , , Comments

The Problem

Here’s a problem I encountered when writing the HTML generator for this site. Logically, Word Aligned is a time-ordered collection of articles. I wanted each article to link to its predecessor and successor. So the general problem is: How do you iterate through a collection yielding (previous, this, next) triples?

Specification

Some test cases make things clearer. Let’s name the function we’re developing prev_this_next().

>>> long_weekend = 'Fri', 'Sat', 'Sun', 'Mon',
>>> yesterday_today_tomorrow = prev_this_next(long_weekend)
>>> yesterday_today_tomorrow.next()
(None, 'Fri', 'Sat')
>>> yesterday_today_tomorrow.next()
('Fri', 'Sat', 'Sun')
>>> yesterday_today_tomorrow.next()
('Sat', 'Sun', 'Mon')
>>> yesterday_today_tomorrow.next()
('Sun', 'Mon', None)
>>> yesterday_today_tomorrow.next()
Traceback (most recent call last):
    ...
StopIteration
>>> for ptn in prev_this_next(range(5)):
...     print ptn
... 
(None, 0, 1)
(0, 1, 2)
(1, 2, 3)
(2, 3, 4)
(3, 4, None)
>>> print "\n".join(map(repr, prev_this_next("XYZ")))
(None, 'X', 'Y')
('X', 'Y', 'Z')
('Y', 'Z', None)

You’ll notice we’ve specified behaviour at the boundaries: the first item in the collection has no predecessor, thus the first triple returned has its first item set to None; and similarly the final triple has its third item set to None. We might equally well have chosen to return a user supplied default, or to wrap the collection at its ends. For now, let’s go with the simple behaviour shown.

You’ll also have noticed I’m writing Python — fair enough, since this web site is generated off-line using Python. The long_weekend example drives the Python iterator protocol by hand, calling yesterday_today_tomorrow.next() until a StopIteration exception terminates the iteration. It’s quite rare to use iterators in this way: more commonly, you just loop through them using for, or plug them into container operations. The second and third test cases show more typical usage.

First Implementation

If this were C++, we’d prefer our collection to support bi-directional iteration: think of a doubly-linked std::list, or a plain old random access std::vector. Then we could just decrement/increment each iterator from the collection to find its neighbours.

In Python, we might decide to assume a random access container and write something like this:

def get_default(items):
    "Return an item getter function."
    n_items = len(items)
    def inner(index):
        "Return items[index] or None if index is out of range."
        if index < 0 or index >= n_items:
            return None
        else:
            return items[index]
    return inner

def prev_this_next(items):
    get = get_default(items)
    for ix, item in enumerate(items):
        yield get(ix - 1), item, get(ix + 1)

This code isn’t elegant but it does pass our tests. Incidentally, an attempt to implement get_default using EAFP, as shown below, would fail. Can you see why?

def inner(index):
    try:
        return items[index]
    except IndexError:
        return None

This fails because accessing items[-1] doesn’t raise an IndexError (unless items is empty); it’s a convenient way to access the final element of items.

Even with the correct version of get_default, if our collection of items is a stream — by which I mean a lazily-evaluated iterable — this code raises an exception. We don’t know how long the stream will be (indeed, it could be infinite) and we can’t just access elements from it at random. For C++ programmers, think of sequentially reading a file using an input iterator.

Stream Test Case

Let’s adapt one of our test cases to expose this flaw.

>>> long_weekend = iter(('Fri', 'Sat', 'Sun', 'Mon'))
>>> yesterday_today_tomorrow = prev_this_next(long_weekend)
...

Running this code raises an exception:

Traceback (most recent call last):
    ...
TypeError: object of type 'tupleiterator' has no len()

Stream Solution

Thinking of this problem in terms of streams gives us a solution which is both more general and more simple. All we have to do is tee up three independent iterators into the stream, stagger them, then zip them back together. The itertools module supplies the components. We connect.

import itertools

def prev_this_next(items):
    extend = itertools.chain([None], items, [None])
    prev, this, next = itertools.tee(extend, 3)
    try:
        this.next()
        next.next()
        next.next()
    except StopIteration:
        pass
    return itertools.izip(prev, this, next)

This code works on any iterable, infinite, finite or empty, lazy or eager. Some more testcases:

>>> [ptn for ptn in prev_this_next(list())]
[]
>>> [ptn for ptn in prev_this_next(set([1]))]
[(None, 1, None)]
>>> ptn = prev_this_next(itertools.count())
>>> itertools.islice(ptn, 100, 101).next()
(99, 100, 101)

Triples Times Two

Now suppose you want to peel items from an iterable, three at a time. Let’s call this function three_at_a_time() and let’s specify its behaviour with some simple tests:

>>> t = three_at_a_time((1, 2, 3, 4, 5, 6))
>>> t.next()
(1, 2, 3)
>>> t.next()
(4, 5, 6)
>>> t.next()
Traceback (most recent call last):
    ...
StopIteration
>>> ttt = three_at_a_time((1, 2, 3, 4))
>>> [t for t in ttt]
[(1, 2, 3)]
>>> ttt = three_at_a_time(itertools.count())
>>> ttt = itertools.islice(ttt, 0, 9, 3)
>>> [t for t in ttt]
[(0, 1, 2), (9, 10, 11), (18, 19, 20)]
>>> "".join(chain(*three_at_a_time("Word Aligned")))
'Word Aligned'

Note that any trailing single element or pair at the end of the collection is discarded. We might equally have decided to pad the collection with a user-supplied default or throw an exception.

Here’s one implementation.

def three_at_a_time(items):
    it = iter(items)
    return itertools.izip(it, it, it)

Here’s another.

def n_at_a_time(items, n):
    it = iter(items)
    return itertools.izip(* [it] * n)

three_at_a_time = lambda items: n_at_a_time(items, 3)