Hiding iterator boilerplate behind a Boost facade

2010-07-07, , Comments

SS Great Britain

Filling in missing methods. Python

Here’s another wholesome recipe served up by Raymond Hettinger.

Total ordering class decorator
def total_ordering(cls):
    'Class decorator that fills-in missing ordering methods'    
    convert = {
        '__lt__': [('__gt__', lambda self, other: other < self),
                   ('__le__', lambda self, other: not other < self),
                   ('__ge__', lambda self, other: not self < other)],
        '__le__': [('__ge__', lambda self, other: other <= self),
                   ('__lt__', lambda self, other: not other <= self),
                   ('__gt__', lambda self, other: not self <= other)],
        '__gt__': [('__lt__', lambda self, other: other > self),
                   ('__ge__', lambda self, other: not other > self),
                   ('__le__', lambda self, other: not self > other)],
        '__ge__': [('__le__', lambda self, other: other >= self),
                   ('__gt__', lambda self, other: not other >= self),
                   ('__lt__', lambda self, other: not self >= other)]
    }
    roots = set(dir(cls)) & set(convert)
    assert roots, 'must define at least one ordering operation: < > <= >='
    root = max(roots)       # prefer __lt__ to __le__ to __gt__ to __ge__
    for opname, opfunc in convert[root]:
        if opname not in roots:
            opfunc.__name__ = opname
            opfunc.__doc__ = getattr(int, opname).__doc__
            setattr(cls, opname, opfunc)
    return cls

If you have a class, X, which implements one or more of the ordering operators, <, <=, >, >= then total_ordering(X) adapts and returns the class with the missing operators filled-in. Alternatively, use standard decorator syntax to adapt a class. If we apply @total_ordering to a Point class

@total_ordering
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __lt__(self, other):
        return (self.x, self.y) < (other.x, other.y)

then we can compare points however we like

>>> p = Point(1,2)
>>> q = Point(1,3)
>>> p < q, p > q, p >= q, p <= q
(True, False, False, True)

Here’s a nice touch: the freshly-baked methods even have documentation!

>>> help(Point)
Help on class Point in module __main__:

class Point
 |  Methods defined here:
 |  
 |  __ge__(self, other)
 |      x.__ge__(y) <==> x>=y
 |  
 |  __gt__(self, other)
 |      x.__gt__(y) <==> x>y
 |  
 |  __init__(self, x, y)
 |  
 |  __le__(self, other)
 |      x.__le__(y) <==> x<=y
 |  
 |  __lt__(self, other)

Writing class decorators may not be the first thing a new Python programmer attempts, but once you’ve discovered the relationship between Python’s special method names and the more familiar operator symbols, I think this recipe is remarkably straightforward.

convert = {
    '__lt__': [('__gt__', lambda self, other: other < self),
               ('__le__', lambda self, other: not other < self),
               ('__ge__', lambda self, other: not self < other)],
    '__le__': [('__ge__', lambda self, other: other <= self),
               ('__lt__', lambda self, other: not other <= self),
               ('__gt__', lambda self, other: not self <= other)],
    '__gt__': [('__lt__', lambda self, other: other > self),
               ('__ge__', lambda self, other: not other > self),
               ('__le__', lambda self, other: not self > other)],
    '__ge__': [('__le__', lambda self, other: other >= self),
               ('__gt__', lambda self, other: not other >= self),
               ('__lt__', lambda self, other: not self >= other)]
}

Before moving on to something more challenging, look again at one of the recipe’s key ingredients, the convert dict, which helps create the missing ordering functions from existing ones. As you can see, there’s much repetition here, and plenty of opportunities for cut-and-paste errors.

This block of code is an example of what programmers term boilerplate. By using the total ordering decorator, we can avoid boilerplating our own code.[1]

Filling in missing methods. C++

Python is dynamic and self-aware, happy to expose its internals for this kind of tinkering. It takes real wizardry to achieve similar results with a less flexible language, such as C++ — but it can be done.

In a previous article we developed a random access file iterator in C++. At its heart, this iterator simply repositioned itself using file-seeks and dereferenced itself using file-reads. There wasn’t much to it.

Unfortunately we had to fill-out the iterator with the various members required to make it comply with the standard random access iterator requirements (which was the whole point, since we wanted something we could use with standard binary search algorithms).

We had to expose standard typedefs:

typedef std::random_access_iterator_tag iterator_category;
typedef item value_type;
typedef std::streamoff difference_type;
typedef item * pointer;
typedef item & reference;

Worse, we had to implement a full set of comparison, iteration, step and access functions. Please, page down past the following code block! I only include it here so you can see how long it goes on for.

Iterator boilerplate
public: // Comparison
    bool operator<(iter const & other) const
    {
        return pos < other.pos;
    }
    
    bool operator>(iter const & other) const
    {
        return pos > other.pos;
    }
    
    bool operator==(iter const & other) const
    {
        return pos == other.pos;
    }
    
    bool operator!=(iter const & other) const
    {
        return pos != other.pos;
    }
    
public: // Iteration
    iter & operator++()
    {
        return *this += 1;
    }
    
    iter & operator--()
    {
        return *this -= 1;
    }
    
    iter operator++(int)
    {
        iter tmp(*this);
        ++(*this);
        return tmp;
    }
    
    iter operator--(int)
    {
        iter tmp(*this);
        --(*this);
        return tmp;
    }
    
public: // Step
    iter & operator+=(difference_type n)
    {
        advance(n);
        return *this;
    }
    
    iter & operator-=(difference_type n)
    {
        advance(-n);
        return *this;
    }
    
    iter operator+(difference_type n)
    {
        iter result(*this);
        return result += n;
    }
    
    iter operator-(difference_type n)
    {
        iter result(*this);
        return result -= n;
    }
    
public: // Distance
    difference_type operator-(iter & other)
    {
        return pos - other.pos;
    }
    
public: // Access
    value_type operator*()
    {
        return (*this)[0];
    }

value_type operator[](difference_type n)
    {
        std::streampos restore = getpos();
        advance(n);
        value_type const result = read();
        setpos(restore);
        return result;
    }
    ....

How tiresome! Most of these member functions are directly and unsurprisingly implemented in a standard way. It would be nice if we could write (and test!) what we actually needed to and have a decorator fill in the rest.

Library - Ephesus

Enter Boost iterators

Actually, we can! I’m grateful to proggitor dzorz for telling me how.

A nicer solution would use boost::iterator_facade and just implement dereference, equal, increment, decrement, advance and distance_to.

Like many programmers I have mixed feelings about C++ — when it’s good it’s very very good, but when it’s bad it’s horrid — and these feelings are only amplified by the Boost library. Boost is superb, so long as you stick to the good parts.

So which parts are good? It depends. On you, who you work with, and the platforms you’re working on.

In my previous article I used an ingenious iterator adaptor from the Boost.Spirit parser library to disastrous effect. If only I’d looked a little more carefully I’d have discovered something more useful in a more obvious place. Boost.Iterator could have helped.

As dzorz points out, boost::iterator_facade can work with any C++ iterable. Implement whatever subset of

  • dereference
  • equal
  • increment
  • decrement
  • advance
  • distance_to

is appropriate and iterator_facade will fill in the boilerplate required to standardise your iterator.

In our case, we’ll need the full set. That’s because we’re after a random access iterator. Other iterators need rather less. Here’s a table showing the relationship between core operations and iterator concepts.

iterator_facade Core Operations

Using boost::iterator_facade

The Boost.Iterator documentation is well-written but daunting. Read it from top-to bottom and you’ll get:

  • rationale and theory
  • plans for standardisation (which don’t seem correct [2])
  • usage notes
  • some subtle points on the implementation and its predecessor
  • a namecheck for the curiously recurring template pattern
  • a fat reference section detailing the boilerplate which this library allows you to forget
  • a tutorial

If you’re tempted to skip to the end of the page, you’ll see this code block.

#include <boost/type_traits/is_convertible.hpp>
#include <boost/utility/enable_if.hpp>
  
  ....
  
private:
  struct enabler {};
  
public:
  template <class OtherValue>
  node_iter(
      node_iter<OtherValue> const& other
    , typename boost::enable_if<
          boost::is_convertible<OtherValue*,Value*>
        , enabler
      >::type = enabler()
  )
    : m_node(other.m_node) {}

According to the surrounding documentation this is “magic”. I find it scary.

Luckily it turns out the library is straightforward to use. What you really want, as a newcomer, are the usage notes and the tutorial example.

The tutorial walks through the process of skinning a singly-linked list with a forwards iterator facade. This is a different use case to ours: the tutorial shows a basic class which implements what it should, and the facade allows it to be treated as a forwards iterator. In our case we’ve already created a full-blown random access iterator. We can retrospectively apply iterator_facade to strip our class back to basics.

Templates and Traits

Where we had:

template <typename item>
class text_file_iter
{
public: // Traits typedefs, which make this class usable with
        // algorithms which need a random access iterator.
    typedef std::random_access_iterator_tag iterator_category;
    typedef item value_type;
    typedef std::streamoff difference_type;
    typedef item * pointer;
    typedef item & reference;
....
};

We now need (my thanks here to Giuseppe for correcting the code I originally posted here):

template <typename value>
class text_file_iter
    : public boost::iterator_facade<
      text_file_iter<value>
    , value
    , std::random_access_iterator_tag
    , value
    , std::streamoff
    >
....
};

(Yes, the class accepts itself as a template parameter. That’s the curious recursion.)

Constructors, destructors and operators

We still need iterator constructors and destructors — these are unchanged — but we can eliminate every single operator shown in the “Iterator boilerplate” code block above.

Here’s what we need instead, to ensure iterator_facade can do its job. The read() member function we had before doesn’t need changing.

    ....
private: // Everything Boost's iterator facade needs
    friend class boost::iterator_core_access;
    
    value dereference() const
    {
        return read();
    }
    
    bool equal(iter const & other) const
    {
        return pos == other.pos;
    }
    
    void increment()
    {
        advance(1);
    }
    
    void decrement()
    {
        advance(-1);
    }
    
    void advance(std::streamoff n)
    {
        in.seekg(n, std::ios_base::cur);
        pos = in.tellg();
    }
    
    std::streamoff distance_to(iter const & other) const
    {
        return other.pos - pos;
    }

And that really is all there is to it. I’m impressed.

Notice, by the way, that friend is used to expose the primitive, private member functions for use by the boost::iterator_core_access class. This follows the example set by the tutorial. I’ve written enough C and Python to question C++’s sophisticated access rules — you have public, protected and private, but that’s still not enough, so you need friend declaration to cut through it all — which tempts me to simply make dereference(), equal() etc. public, but then the facade wouldn’t be a proper facade. Users should treat the final class exactly as they would any other random access iterator, and designating these members as private means they’ll have to.

Wrinkles

You’ll notice the dereference() member function has a const signature. However, the read() member function is non-const.

    // Return the item at the current position
    value_type read()
    {
        value_type n = 0;
        
        // Reverse till we hit whitespace or the start of the file
        while (in && !isspace(in.peek()))
        {
            in.unget();
        }
        in.clear();
        
        in >> n;
        return n;
    }

Here, in is a data member of type std::ifstream, and clearly the read call modifies it. That’s what this alarming compiler error is trying to tell us.

boost_binary_search_text_file.cpp: In member function 'value text_file_iter<value>::read() const [with value = long long int]':
boost_binary_search_text_file.cpp:90:   instantiated from 'value text_file_iter<value>::dereference() const [with value = long long int]'
/opt/local/include/boost/iterator/iterator_facade.hpp:516:   instantiated from 'static typename Facade::reference boost::iterator_core_access::dereference(const Facade&) [with Facade = text_file_iter<long long int>]'
/opt/local/include/boost/iterator/iterator_facade.hpp:634:   instantiated from 'Reference boost::iterator_facade<I, V, TC, R, D>::operator*() const [with Derived = text_file_iter<long long int>, Value = long long int, CategoryOrTraversal = boost::random_access_traversal_tag, Reference = long long int, Difference = long long int]'
/usr/include/c++/4.2.1/bits/stl_algo.h:4240:   instantiated from 'bool std::binary_search(_ForwardIterator, _ForwardIterator, const _Tp&) [with _ForwardIterator = text_file_iter<long long int>, _Tp = main::number]'
boost_binary_search_text_file.cpp:203:   instantiated from here
boost_binary_search_text_file.cpp:174: error: passing 'const std::basic_ifstream<char, std::char_traits<char> >' as 'this' argument of 'typename std::basic_istream<_CharT, _Traits>::int_type std::basic_istream<_CharT, _Traits>::peek() [with _CharT = char, _Traits = std::char_traits<char>]' discards qualifiers
boost_binary_search_text_file.cpp:176: error: passing 'const std::basic_ifstream<char, std::char_traits<char> >' as 'this' argument of 'std::basic_istream<_CharT, _Traits>& std::basic_istream<_CharT, _Traits>::unget() [with _CharT = char, _Traits = std::char_traits<char>]' discards qualifiers
boost_binary_search_text_file.cpp:178: error: passing 'const std::basic_ifstream<char, std::char_traits<char> >' as 'this' argument of 'void std::basic_ios<_CharT, _Traits>::clear(std::_Ios_Iostate) [with _CharT = char, _Traits = std::char_traits<char>]' discards qualifiers
boost_binary_search_text_file.cpp:180: error: passing 'const std::basic_ifstream<char, std::char_traits<char> >' as 'this' argument of 'std::basic_istream<_CharT, _Traits>& std::basic_istream<_CharT, _Traits>::operator>>(long long int&) [with _CharT = char, _Traits = std::char_traits<char>]' discards qualifiers

Related to this, the Reference template parameter (shown in bold in the listing below) is actually a value, rather than the (default) value &. As we originally implemented it, our file iterator reads values lazily, only when clients request them. We have no reference to return.

template <typename value>
class text_file_iter
    : public boost::iterator_facade<
      text_file_iter<value>
    , value
    , std::random_access_iterator_tag
    , value
    , std::streamoff
    >
};

I faced a dilemma here. Either I could modify my original file iterator, including a current value data member, which I would take care to update every time we repositioned the file read position. Then our references could be real references and read() would naturally be const, simply returning a reference to this member. Or, I could make the in input stream data member mutable.

Mutable makes me uneasy for the same reason that friend does — if you can shake off the rigours of const-correctness so easily, then why bother with it? — and for this reason the first option appealed. However, a read-only file is an unusual container: we do not change it, but we cannot supply const references to its elements without reading them in, and that will mean changes to the file input stream. The easier option, involving the smallest code change, was to make in mutable. So that’s what I did.

Less code, more software

By employing two of my least favourite C++ keywords I now had a class which provided the functions it should, and, thanks to the magic worked by Boost’s iterator facade, I also had a class which I could use as a standard random access iterator. Most of the code changes were the deletion of boilerplate — very satisfying. I added code too, since I decided to invest a little more effort in tests. I didn’t have any doubts about the Boost library’s correctness but I thought I might not have been using it correctly. Happily these tests all passed.

Performance

Let’s not forget why we originally wanted a random access file iterator: we had a large sorted file, too large to read into memory, and we wanted to test for the presence of the number in this file.

For test purposes I created a file just over 4GB in size. A simple linear search through this file took around 180 seconds on my (aging laptop) computer, and was light on memory use. By creating a random access file iterator, boilerplate and all, we took advantage of the standard binary search algorithm and reduced the time to around 4 milliseconds.

How would our version using Boost iterator facade do? I wasn’t expecting it to be faster than the original, but I wouldn’t have been surprised if it gave it a close run: using Boost doesn’t usually involve compromise. In fact, over repeated runs of my performance tests there was no significant difference between the two iterator versions — or at least there wasn’t once a helpful reader had discovered and fixed a bug in my program, which was causing it to run correctly but slowly.

To trust a facade I guess you need some knowledge of what lies behind it.

§

Note: During my original performance tests, reported in the first version of this article, the Boost iterator performed woefully, far slower even than a linear search. By this time I’d lost patience, and it was left up to a reader, Giuseppe, to point out my mistake. I’d been using a boost::random_access_traversal_tag template parameter, with the result that std::distance() was using repeated increments rather than calling distance_to() to get an immediate result, and consequently ran very slowly. I should have used std::random_access_iterator_tag. I modified my code accordingly and confirmed that the Boost version does indeed perform on a par with the original version.


My thanks to Chris P for permission to use his photograph of the Library of Celsus at Ephesus, or at rather its beautiful facade. Ephesus is famous for the Temple of Artemis, one of the seven wonders of the ancient world, of which only fragments remain. Thanks too to Dave Hamster for the boilerplate photo — actually a detail from the hull of the SS Great Britain.

If you’d like to continue this experiment the code and the tests I used are available via anonymous SVN access from http://svn.wordaligned.org/svn/etc/search_text_file.


[1]: As of Python 2.7 and 3.2, the standard library will include a version of this recipe. It’s in the functools module. For some reason, your class “should supply an __eq__()” method.

[2]: According to the Boost.Iterator documentation:

Both iterator_facade and iterator_adaptor as well as many of the specialized adaptors mentioned below have been proposed for standardization, and accepted into the first C++ technical report; see our [Standard Proposal For Iterator Facade and Adaptor (PDF)][tr1proposal] for more details.

I assumed this meant there’d be tr1::iterator_(facade|adaptor) classes, but I don’t think that’s the case. Unlike other (good) bits of Boost, the Iterator library doesn’t seem likely to be part of the next C++ release.