Tripped up by Rstrip

2021-05-14

Here’s a simple function which is supposed to strip any ".jpg" extension from a file name.

def photo_name(photo_file):
    "Return the name of the photo"
    return photo_file.rstrip('.jpg')

assert photo_name('cat.jpg') == 'cat'
assert photo_name('selfie.jpg') == 'selfie'
assert photo_name('emoji.gif') == 'emoji.gif' # A GIF is not a photo

As shown, it passes a few simple tests.

Unfortunately it turns out to be broken.

>>> photo_name('dog.jpg')
'do'
>>> photo_name('tile.png')
'tile.pn'

There’s no mystery here. A check of the documentation shows that the optional chars parameter to str.rstrip specifies a set of trailing characters to be removed from the source string.

So, in the example above, '.jpg' means: strip trailing characters in the set {'.', 'j', 'p', 'g'}. In the case of 'dog.jpg' that includes the final 'g' of 'dog'. Similarly the final 'g' of 'tile.png' gets stripped.
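To make the set semantics concrete, here is a minimal sketch of what `rstrip(chars)` does when `chars` is given. The helper name `rstrip_chars` is hypothetical and exists only for illustration; the checks inside confirm it agrees with the built-in on the examples above.

```python
def rstrip_chars(s, chars):
    """Hypothetical re-implementation of s.rstrip(chars), for illustration only.

    chars is treated as a set: trailing characters are removed one at a
    time for as long as each one appears anywhere in chars.
    """
    char_set = set(chars)
    end = len(s)
    # Walk backwards from the end while the current character is in the set
    while end > 0 and s[end - 1] in char_set:
        end -= 1
    return s[:end]

for name in ['cat.jpg', 'dog.jpg', 'tile.png', 'emoji.gif']:
    # The sketch should agree with the built-in method on every example
    assert rstrip_chars(name, '.jpg') == name.rstrip('.jpg')

# Because chars is a set, its order is irrelevant: 'gpj.' behaves like '.jpg'
assert 'dog.jpg'.rstrip('gpj.') == 'dog.jpg'.rstrip('.jpg')
```

The final assertion makes the point directly: reordering the characters changes nothing, which would not be true if `chars` were a suffix.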

That said, it’s a common misunderstanding, and one which people have been making for the eighteen years since the optional chars argument was added to rstrip. As of Python 3.9, the documentation takes care to explain:

The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a suffix; rather, all combinations of its values are stripped.

and points out that the new removesuffix method might be what you’re really after.
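A corrected `photo_name` might look like the sketch below. `str.removesuffix` needs Python 3.9 or later; `photo_name_pre39` is a hypothetical name for an equivalent that works on older versions by checking `endswith` and slicing.

```python
def photo_name(photo_file):
    "Return the name of the photo, with any trailing '.jpg' removed"
    # removesuffix strips the exact string '.jpg', once, only if present
    return photo_file.removesuffix('.jpg')  # Python 3.9+

def photo_name_pre39(photo_file):
    "Hypothetical equivalent for Python versions without str.removesuffix"
    suffix = '.jpg'
    if photo_file.endswith(suffix):
        return photo_file[:-len(suffix)]
    return photo_file

for func in (photo_name, photo_name_pre39):
    assert func('cat.jpg') == 'cat'
    assert func('selfie.jpg') == 'selfie'
    assert func('emoji.gif') == 'emoji.gif'  # A GIF is not a photo
    assert func('dog.jpg') == 'dog'          # Previously came back as 'do'
    assert func('tile.png') == 'tile.png'    # Previously came back as 'tile.pn'
```

Both versions pass the original tests and the two cases which tripped up the `rstrip` version.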

Out of interest I tracked the documentation back to Python 2.3, when the function description was less clear.

If given and not None, chars must be a string; the characters in the string will be stripped from the end of the string this method is called on. Changed in version 2.2.2: Support for the chars argument.

Luckily, misuse of rstrip to remove extensions will usually be spotted soon enough, even if, as shown, it can evade cursory inspection and testing.
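The original tests passed because none of the stems ended in a character from the set {'.', 'j', 'p', 'g'}. A slightly more searching test deliberately includes such stems. The helper `find_failures` below is hypothetical, a sketch of the idea rather than a recommended test framework.

```python
def photo_name(photo_file):
    "The original, buggy version"
    return photo_file.rstrip('.jpg')

def find_failures(func):
    """Hypothetical helper: return (input, expected, actual) triples where func is wrong.

    'cat' and 'selfie' are controls; 'dog', 'jeep' and 'taj' end in 'g', 'p'
    and 'j', characters which rstrip('.jpg') will also remove.
    """
    stems = ['cat', 'selfie', 'dog', 'jeep', 'taj']
    failures = []
    for stem in stems:
        photo_file = stem + '.jpg'
        actual = func(photo_file)
        if actual != stem:
            failures.append((photo_file, stem, actual))
    return failures

for photo_file, expected, actual in find_failures(photo_name):
    print(f'{photo_file!r}: expected {expected!r}, got {actual!r}')
```

Run against the buggy version this reports three failures: 'dog.jpg' gives 'do', 'jeep.jpg' gives 'jee' and 'taj.jpg' gives 'ta'.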

It’s worth reviewing why the confusion persists.

  1. using str.rstrip() to remove trailing whitespace is a common thing to want to do
  2. removing a suffix is also a common thing to want to do
  3. removing a set of trailing chars from a string is less common (except the special case of stripping whitespace)
  4. the chars parameter to str.rstrip() is not a set; it is an ordered sequence
  5. s.rstrip('.jpg') (for example) will remove any '.jpg' suffix from s, so it sort of works
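For contrast with points 1 and 3, here are cases where the set semantics are exactly what you want. `strip_line_ending` is a hypothetical helper name used only for illustration.

```python
def strip_line_ending(line):
    "Hypothetical helper: strip trailing carriage returns and line feeds"
    # Treating '\r\n' as a set is exactly right here: one call handles
    # Windows ('\r\n'), Unix ('\n') and classic Mac ('\r') line endings
    return line.rstrip('\r\n')

# With no argument, rstrip() removes trailing whitespace of any kind
assert '  hello \t\n'.rstrip() == '  hello'

for line in ['record\r\n', 'record\n', 'record\r', 'record']:
    assert strip_line_ending(line) == 'record'

# Trailing punctuation is another case where order does not matter
assert 'Really?!?!'.rstrip('?!') == 'Really'
```

Note that `strip_line_ending` also removes repeated endings, so 'record\n\n' becomes 'record'. Whether that is acceptable depends on the input.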