Tripped up by Rstrip
Here’s a simple function which is supposed to strip any ".jpg" extension
from a file name.
```python
def photo_name(photo_file):
    "Return the name of the photo"
    return photo_file.rstrip('.jpg')

assert photo_name('cat.jpg') == 'cat'
assert photo_name('selfie.jpg') == 'selfie'
assert photo_name('emoji.gif') == 'emoji.gif'  # A GIF is not a photo
```
As shown, it passes a few simple tests.
Unfortunately it turns out to be broken.
```python
>>> photo_name('dog.jpg')
'do'
>>> photo_name('tile.png')
'tile.pn'
```
There’s no mystery here. A check of the documentation shows
that the optional chars parameter to str.rstrip specifies a set of
trailing characters to be removed from the source string.
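So, in the example above, '.jpg' means: strip trailing
characters in the set {'.', 'j', 'p', 'g'}, stopping at the first character
not in that set. In the case of 'dog.jpg', stripping carries on past the '.'
and removes the final 'g' of 'dog' too, halting only at the 'o'. Similarly
the final 'g' of 'tile.png' gets stripped, while the 'n' before it survives.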
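To make the set semantics concrete, here is a rough re-implementation of what rstrip does when it is given an explicit chars string. The name rstrip_chars is hypothetical, and this is a sketch for illustration, not CPython's actual implementation; it does not cover the chars=None (whitespace) case.

```python
def rstrip_chars(s, chars):
    """Hypothetical sketch of s.rstrip(chars) for an explicit chars string."""
    strip_set = set(chars)  # order and repetition within chars don't matter
    end = len(s)
    # Walk backwards from the end while the character is in the set
    while end > 0 and s[end - 1] in strip_set:
        end -= 1
    return s[:end]

# '.jpg' and 'gpj.' name the same set, so they strip exactly the same characters
assert 'dog.jpg'.rstrip('.jpg') == 'dog.jpg'.rstrip('gpj.') == 'do'
assert rstrip_chars('dog.jpg', '.jpg') == 'do'
assert rstrip_chars('tile.png', '.jpg') == 'tile.pn'
```

Reordering the argument makes no difference, which is a quick way to convince yourself that it is not being treated as a suffix.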
That said, it’s a common misunderstanding, and one which people have been making
for the eighteen years since the optional chars argument was added
to rstrip. As of Python 3.9 the documentation takes care to
explain:
The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a suffix; rather, all combinations of its values are stripped.
and points out that the new removesuffix method might be what you’re really after.
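A minimal sketch of the fix, assuming Python 3.9 or later for str.removesuffix. For older versions, the hypothetical photo_name_legacy below does the same job with an explicit endswith check.

```python
def photo_name(photo_file):
    "Return the name of the photo"
    # str.removesuffix (Python 3.9+) removes '.jpg' once, and only if present
    return photo_file.removesuffix('.jpg')

def photo_name_legacy(photo_file):
    "Hypothetical pre-3.9 equivalent using an explicit suffix test"
    suffix = '.jpg'
    if photo_file.endswith(suffix):
        return photo_file[:-len(suffix)]
    return photo_file

assert photo_name('dog.jpg') == 'dog'
assert photo_name('tile.png') == 'tile.png'
assert photo_name_legacy('dog.jpg') == 'dog'
```

Unlike rstrip, both versions remove at most one copy of the suffix and never touch the characters before it.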
Out of interest I tracked the documentation back to Python 2.3, when the function description was less clear.
If given and not None, chars must be a string; the characters in the string will be stripped from the end of the string this method is called on. Changed in version 2.2.2: Support for the chars argument.
Luckily, misuse of rstrip to remove extensions will usually get
spotted soon enough, even if, as shown, it can evade cursory inspection and testing.
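The cursory tests pass because 'cat' and 'selfie' don't end in '.', 'j', 'p' or 'g', and neither does 'emoji.gif'. A sharper test deliberately builds stems that do end in one of those characters. The helper failing_cases and the stem prefix 'x' are hypothetical choices for this sketch, which assumes the rstrip-based photo_name from the start of this post.

```python
def photo_name(photo_file):
    "The rstrip-based version under test (deliberately buggy)"
    return photo_file.rstrip('.jpg')

def failing_cases(func, suffix='.jpg'):
    """Hypothetical test helper: for each character in suffix, build a stem
    ending in that character and report the file names func gets wrong."""
    failures = []
    for ch in suffix:
        stem = 'x' + ch       # e.g. 'xg', whose final 'g' rstrip will eat
        name = stem + suffix
        if func(name) != stem:
            failures.append(name)
    return failures

print(failing_cases(photo_name))  # → ['x..jpg', 'xj.jpg', 'xp.jpg', 'xg.jpg']
print(failing_cases(lambda s: s.removesuffix('.jpg')))  # → []
```

Every stem ending in a character from the strip set trips up rstrip, while removesuffix (Python 3.9+) handles them all.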
It’s worth reviewing why the confusion persists.
- using str.rstrip() to remove trailing whitespace is a common thing to want to do
- removing a suffix is also a common thing to want to do
- removing a set of trailing chars from a string is less common (except in the special case of stripping whitespace)
- the chars parameter to str.rstrip() is not a set; it is a string, an ordered sequence, which looks just like a suffix
- s.rstrip('.jpg') (for example) will remove any '.jpg' suffix from s, so it sort-of works