Singly Linked Lists in C++

2013-02-07, , Comments

In a recent post I wrote about removing items from a singly linked list. I presented a couple of alternative implementations, and in the comments section readers suggested yet more versions.

My implementations were written in C: the post was inspired by something Linus Torvalds had said about low-level programming skills, and I’m guessing he meant C programming skills. The fact is, C programmers are condemned to reimplement these basic functions on this basic structure because the C standard library has nothing to say about singly linked lists. Until recently the C++ standard library was similarly silent on the subject, only offering a doubly linked list.

C++ introduces … the linked list!

That’s all changed now with the introduction of std::forward_list. The class interface doesn’t mention links or pointers but a quick glance through its contents makes it clear that if you imagine the container to be a templated version of a classic singly-linked list, you won’t go far wrong.

This gives forward_list some members you won’t find elsewhere in the std:: namespace. For example, std::forward_list::before_begin(), which returns an iterator just before the beginning of the list — much as the more familiar end() is just past the end.

Why is before_begin() necessary? You can’t dereference it and you can’t reverse through a forward list till you get to it. Well, since forward list iterators can only go forwards, instead of the familiar sequence insert(), erase() and emplace() methods we have insert_after(), erase_after() and emplace_after(), not to mention splice_after(). The before-the-beginning iterator allows you to use these operations to modify the node at the head of the list.

Quick question: what’s the complexity of std::list::size()?

Trick question: and how about std::forward_list::size()?

Remove_if for forward lists

Using pointers-to-pointers to modify linked lists gives this elegant and compact C implementation of remove_if(), which de-lists all nodes which match a supplied predicate.

void remove_if(node ** head, remove_fn rm)
{
    for (node** curr = head; *curr; )
    {
        node * entry = *curr;
        if (rm(entry))
        {
            *curr = entry->next;
            free(entry);
        }
        else
            curr = &entry->next;
    }
}

How does the C++ standard library support this algorithm?

Std::remove_if() operates on an iterator range, remove_if(first, last, pred). All it requires is that the iterators are forward iterators so it should just work on a forward_list.

Hang on though: what if pred(*first) is true? How can a node be removed from a linked list without reference to its predecessor? Actually, the node isn’t removed — the value it contains gets overwritten by the value in the first node (if any!) for which the predicate returns false. In fact, remove_if doesn’t remove anything from the container! What it does is return an iterator, call it new_last, such that the range (first, new_last) holds the elements which have been kept, and (new_last, last) holds those which have been “removed”, which is to say they can still be dereferenced but their value is implementation dependent.

Remove_if usually pairs up with erase:

container.erase(remove_if(first, last, pred), last);

There is no std::forward_list::erase(iterator) — remember, we can only erase after — so the usual remove_if algorithm is of little use for forward lists.

Forward_list::remove_if()

As usual, we should prefer member functions to algorithms with the same names. C++’s forward_list has its very own remove_if which manipulates pointers rather than moves values, and which really does remove and destroy items.

Forward_list::remove_if() operates on the list as a whole, not an iterator range — as we’ve seen, an iterator cannot remove itself. I took a look at a couple of implementations of this function to see how it’s done.

Here’s LLVM’s libc++ implementation:

template <class _Tp, class _Alloc>
template <class _Predicate>
void
forward_list<_Tp, _Alloc>::remove_if(_Predicate __pred)
{
    iterator __e = end();
    for (iterator __i = before_begin(); __i.__ptr_->__next_ != nullptr;)
    {
        if (__pred(__i.__ptr_->__next_->__value_))
        {
            iterator __j = _VSTD::next(__i, 2);
            for (; __j != __e && __pred(*__j); ++__j)
                ;
            erase_after(__i, __j);
            if (__j == __e)
                break;
            __i = __j;
        }
        else
            ++__i;
    }
}

There’s no need for any special treatment of the first list node here, since we have its predecessor, the before_begin() node. The function does double back though, figuring out the next range to erase before going back to erase it — and the erase function isn’t pretty.

template <class _Tp, class _Alloc>
typename forward_list<_Tp, _Alloc>::iterator
forward_list<_Tp, _Alloc>::erase_after(const_iterator __f, const_iterator __l)
{
    __node_pointer __e = const_cast<__node_pointer>(__l.__ptr_);
    if (__f != __l)
    {
        __node_pointer __p = const_cast<__node_pointer>(__f.__ptr_);
        __node_pointer __n = __p->__next_;
        if (__n != __e)
        {
            __p->__next_ = __e;
            __node_allocator& __a = base::__alloc();
            do
            {
                __p = __n->__next_;
                __node_traits::destroy(__a, _VSTD::addressof(__n->__value_));
                __node_traits::deallocate(__a, __n, 1);
                __n = __p;
            } while (__n != __e);
        }
    }
    return iterator(__e);
}

For comparison, here’s how GCC’s libstdc++ does the same thing.

template<typename _Tp, typename _Alloc>
    template<typename _Pred>
      void
      forward_list<_Tp, _Alloc>::
      remove_if(_Pred __pred)
      {
    _Node* __curr = static_cast<_Node*>(&this->_M_impl._M_head);
        while (_Node* __tmp = static_cast<_Node*>(__curr->_M_next))
          {
            if (__pred(*__tmp->_M_valptr()))
              this->_M_erase_after(__curr);
            else
              __curr = static_cast<_Node*>(__curr->_M_next);
          }
      }

Here, erasing (after a) node reads:

template<typename _Tp, typename _Alloc>
    _Fwd_list_node_base*
    _Fwd_list_base<_Tp, _Alloc>::
    _M_erase_after(_Fwd_list_node_base* __pos)
    {
      _Node* __curr = static_cast<_Node*>(__pos->_M_next);
      __pos->_M_next = __curr->_M_next;
      _Tp_alloc_type __a(_M_get_Node_allocator());
      allocator_traits<_Tp_alloc_type>::destroy(__a, __curr->_M_valptr());
      __curr->~_Node();
      _M_put_node(__curr);
      return __pos->_M_next;
    }

This version walks through the list removing nodes which match the predicate as it finds them. Don’t be confused by &this->_M_impl._M_head: it’s not the node at the head of the list, it’s the one before.

Lessons from C++

Maybe this code wouldn’t persaude Linus Torvalds to rethink his opinion of C++, but if you can see past the angle brackets, underscores and allocators, it’s simple enough.

It’s also:

  • subtle, so I’m glad someone else has written and checked it
  • generic, so I can put what I want in a list without casting or indirection
  • standard, so I know what to expect
  • supported

The before-begin node idea serves equally well in C, enabling list modifiers which have no need of double indirection or special case code for the list head.

void remove_after(node * prev, remove_fn rm)
{
    while (prev->next != NULL)
    {
        node * curr = prev->next;
        if (rm(curr))
        {
            prev->next = curr->next;
            free(curr);
        }
        else
            prev = curr;
    }
}

Pass this function the before-begin node to remove all items from the list which match the predicate.