Singly Linked Lists in C++
In a recent post I wrote about removing items from a singly linked list. I presented a couple of alternative implementations, and in the comments section readers suggested yet more versions.
My implementations were written in C: the post was inspired by something Linus Torvalds had said about low-level programming skills, and I’m guessing he meant C programming skills. The fact is, C programmers are condemned to reimplement these basic functions on this basic structure because the C standard library has nothing to say about singly linked lists. Until recently the C++ standard library was similarly silent on the subject, only offering a doubly linked list.
C++ introduces … the linked list!
That’s all changed now with the introduction of std::forward_list. The class interface doesn’t mention links or pointers but a quick glance through its contents makes it clear that if you imagine the container to be a templated version of a classic singly-linked list, you won’t go far wrong.
This gives forward_list some members you won’t find elsewhere in the std:: namespace. For example, std::forward_list::before_begin(), which returns an iterator just before the beginning of the list — much as the more familiar end() is just past the end.
Why is before_begin() necessary? You can’t dereference it and you can’t reverse through a forward list till you get to it. Well, since forward list iterators can only go forwards, instead of the familiar sequence insert(), erase() and emplace() methods we have insert_after(), erase_after() and emplace_after(), not to mention splice_after(). The before-the-beginning iterator allows you to use these operations to modify the node at the head of the list.
Quick question: what’s the complexity of std::list::size()?
Trick question: and how about std::forward_list::size()?
Remove_if for forward lists
Using pointers-to-pointers to modify linked lists gives this elegant and compact C implementation of remove_if(), which de-lists all nodes which match a supplied predicate.
void remove_if(node ** head, remove_fn rm)
{
for (node** curr = head; *curr; )
{
node * entry = *curr;
if (rm(entry))
{
*curr = entry->next;
free(entry);
}
else
curr = &entry->next;
}
}
How does the C++ standard library support this algorithm?
Std::remove_if() operates on an iterator range, remove_if(first, last, pred). All it requires is that the iterators are forward iterators so it should just work on a forward_list.
Hang on though: what if pred(*first) is true? How can a node be removed from a linked list without reference to its predecessor? Actually, the node isn’t removed — the value it contains gets overwritten by the value in the first node (if any!) for which the predicate returns false. In fact, remove_if doesn’t remove anything from the container! What it does is return an iterator, call it new_last, such that the range (first, new_last) holds the elements which have been kept, and (new_last, last) holds those which have been “removed”, which is to say they can still be dereferenced but their value is implementation dependent.
Remove_if usually pairs up with erase:
container.erase(remove_if(first, last, pred), last);
There is no std::forward_list::erase(iterator) — remember, we can only erase after — so the usual remove_if algorithm is of little use for forward lists.
Forward_list::remove_if()
As usual, we should prefer member functions to algorithms with the same names. C++’s forward_list has its very own remove_if which manipulates pointers rather than moves values, and which really does remove and destroy items.
Forward_list::remove_if() operates on the list as a whole, not an iterator range — as we’ve seen, an iterator cannot remove itself. I took a look at a couple of implementations of this function to see how it’s done.
Here’s LLVM’s libc++ implementation:
template <class _Tp, class _Alloc>
template <class _Predicate>
void
forward_list<_Tp, _Alloc>::remove_if(_Predicate __pred)
{
iterator __e = end();
for (iterator __i = before_begin(); __i.__ptr_->__next_ != nullptr;)
{
if (__pred(__i.__ptr_->__next_->__value_))
{
iterator __j = _VSTD::next(__i, 2);
for (; __j != __e && __pred(*__j); ++__j)
;
erase_after(__i, __j);
if (__j == __e)
break;
__i = __j;
}
else
++__i;
}
}
There’s no need for any special treatment of the first list node here, since we have its predecessor, the before_begin() node. The function does double back though, figuring out the next range to erase before going back to erase it — and the erase function isn’t pretty.
template <class _Tp, class _Alloc>
typename forward_list<_Tp, _Alloc>::iterator
forward_list<_Tp, _Alloc>::erase_after(const_iterator __f, const_iterator __l)
{
__node_pointer __e = const_cast<__node_pointer>(__l.__ptr_);
if (__f != __l)
{
__node_pointer __p = const_cast<__node_pointer>(__f.__ptr_);
__node_pointer __n = __p->__next_;
if (__n != __e)
{
__p->__next_ = __e;
__node_allocator& __a = base::__alloc();
do
{
__p = __n->__next_;
__node_traits::destroy(__a, _VSTD::addressof(__n->__value_));
__node_traits::deallocate(__a, __n, 1);
__n = __p;
} while (__n != __e);
}
}
return iterator(__e);
}
For comparison, here’s how GCC’s libstdc++ does the same thing.
template<typename _Tp, typename _Alloc>
template<typename _Pred>
void
forward_list<_Tp, _Alloc>::
remove_if(_Pred __pred)
{
_Node* __curr = static_cast<_Node*>(&this->_M_impl._M_head);
while (_Node* __tmp = static_cast<_Node*>(__curr->_M_next))
{
if (__pred(*__tmp->_M_valptr()))
this->_M_erase_after(__curr);
else
__curr = static_cast<_Node*>(__curr->_M_next);
}
}
Here, erasing (after a) node reads:
template<typename _Tp, typename _Alloc>
_Fwd_list_node_base*
_Fwd_list_base<_Tp, _Alloc>::
_M_erase_after(_Fwd_list_node_base* __pos)
{
_Node* __curr = static_cast<_Node*>(__pos->_M_next);
__pos->_M_next = __curr->_M_next;
_Tp_alloc_type __a(_M_get_Node_allocator());
allocator_traits<_Tp_alloc_type>::destroy(__a, __curr->_M_valptr());
__curr->~_Node();
_M_put_node(__curr);
return __pos->_M_next;
}
This version walks through the list removing nodes which match the predicate as it finds them. Don’t be confused by &this->_M_impl._M_head: it’s not the node at the head of the list, it’s the one before.
Lessons from C++
Maybe this code wouldn’t persaude Linus Torvalds to rethink his opinion of C++, but if you can see past the angle brackets, underscores and allocators, it’s simple enough.
It’s also:
- subtle, so I’m glad someone else has written and checked it
- generic, so I can put what I want in a list without casting or indirection
- standard, so I know what to expect
- supported
The before-begin node idea serves equally well in C, enabling list modifiers which have no need of double indirection or special case code for the list head.
void remove_after(node * prev, remove_fn rm)
{
while (prev->next != NULL)
{
node * curr = prev->next;
if (rm(curr))
{
prev->next = curr->next;
free(curr);
}
else
prev = curr;
}
}
Pass this function the before-begin node to remove all items from the list which match the predicate.