Set.insert or set.add?

2010-11-17, Comments

Get set, go!

Suppose you have an element e to put in a set S.

Should you:

S.add(e)

or:

S.insert(e)

?

It depends on which language you’re using. I use C++ and Python and I usually get it wrong.

>>> S.insert(e)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'set' object has no attribute 'insert'

Try again!

error: 'class std::set<int, std::less<int>, std::allocator<int> >' 
has no member named 'add'

Maybe my IDE should auto-complete the correct member function but it doesn’t, or at least I haven’t configured it to, so instead I’ve worked out how to remember.

Now, neither C++ nor Python pins down how a set should be implemented — read the language standard and reference manual respectively and all you’ll get is an interface and some hints. Read between the lines of these references, though, or study the implementations, and you’ll soon realise a Python set is an unordered container designed for fast membership, union, intersection, and differencing operations — much like the mathematical sets I learned about at school — whereas a C++ set is an ordered container, featuring logarithmic access times and persistent iterators.

Think: C++ set ≈ binary tree; Python set ≈ hashed array.

It’s apparent which method is correct for which language now. To put something into a binary tree you must recurse down the tree and find where to insert it. Hence std::set::insert() is correct C++. To put something into a hashed array you hash it and add it right there. Hence set.add() is proper Python.

How long is a string?

I’m suggesting programmers should know at least some of what goes on in their standard language library implementations. Appreciating an API isn’t always enough. You insert into trees and add to hashes: so if your set is a tree, call S.insert(), and if it’s a hash, S.add().

Such logical arguments don’t always deliver.

Question: Suppose now that S is a string and you’re after its length. Should you use S.length() or S.size()?

Answer: Neither or both.

string [how long?]

In Python a string is a standard sequence and as for all other sequences len(S) does the trick. In C++ a string is a standard container and as for all other containers S.size() returns the number of elements; but, being std::string, S.length() does too.

Oh, and the next revision of C++ features an unordered_set (available now as std::tr1::unordered_set) which is a hashed container. I think unordered_set is a poor name for something which models a set better than std::set does but that’s the price it pays for coming late to the party. And you don’t std::unordered_set::add elements to it, you std::unordered_set::insert them.


My thanks to the|G|™ for permission to use his string.