Tripped up by Rstrip
Here’s a simple function which is supposed to strip any ".jpg" extension from a file name.
    def photo_name(photo_file):
        "Return the name of the photo"
        return photo_file.rstrip('.jpg')

    assert photo_name('cat.jpg') == 'cat'
    assert photo_name('selfie.jpg') == 'selfie'
    assert photo_name('emoji.gif') == 'emoji.gif'  # A GIF is not a photo
As shown, it passes a few simple tests.
Unfortunately it turns out to be broken.
    >>> photo_name('dog.jpg')
    'do'
    >>> photo_name('tile.png')
    'tile.pn'
There’s no mystery here. A check of the documentation shows that the optional chars parameter to str.rstrip specifies a set of trailing characters to be removed from the source string. So, in the example above, '.jpg' means: strip trailing characters in the set {'.', 'j', 'p', 'g'}. In the case of 'dog.jpg' that includes the final 'g' of 'dog'. Similarly the final 'g' of 'tile.png' gets stripped.
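To make the set semantics concrete, here is a minimal sketch of roughly what rstrip does when given a chars argument. The helper name rstrip_chars is hypothetical, introduced only for illustration; it is not how CPython implements the method, and it assumes a non-empty chars string.

```python
def rstrip_chars(s, chars):
    """Illustrative stand-in for s.rstrip(chars), assuming chars is non-empty."""
    strip_set = set(chars)  # order and repetition within chars are irrelevant
    end = len(s)
    # Walk back from the end while the last remaining character is in the set.
    while end > 0 and s[end - 1] in strip_set:
        end -= 1
    return s[:end]


for name in ('cat.jpg', 'dog.jpg', 'tile.png', 'emoji.gif'):
    # The sketch agrees with the built-in on the examples from this post.
    assert rstrip_chars(name, '.jpg') == name.rstrip('.jpg')
    print(name, '->', rstrip_chars(name, '.jpg'))
```

The loop stops at the first character that is not in the set, which is why 'cat.jpg' happens to survive intact while 'dog.jpg' loses its final 'g'.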
That said, it’s a common misunderstanding, and one which has been made for the eighteen years since the optional chars argument got added to rstrip. As of Python 3.9 the documentation takes care to explain:
"The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a suffix; rather, all combinations of its values are stripped."
and points out that the new removesuffix method might be what you’re really after.
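For the photo_name function, a version built on removesuffix might look like the sketch below. str.removesuffix exists from Python 3.9 onwards; the endswith fallback for older interpreters is my own addition rather than anything the documentation prescribes.

```python
def photo_name(photo_file):
    "Return the name of the photo"
    suffix = '.jpg'
    if hasattr(str, 'removesuffix'):  # Python 3.9 and later
        return photo_file.removesuffix(suffix)
    # Fallback for older versions: slice only when the whole suffix is present.
    if photo_file.endswith(suffix):
        return photo_file[:-len(suffix)]
    return photo_file


assert photo_name('cat.jpg') == 'cat'
assert photo_name('dog.jpg') == 'dog'            # no longer 'do'
assert photo_name('tile.png') == 'tile.png'      # no longer 'tile.pn'
assert photo_name('emoji.gif') == 'emoji.gif'    # A GIF is not a photo
```

Both branches treat '.jpg' as a single suffix string, so a trailing 'g' that belongs to the name itself is left alone.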
Out of interest I tracked the documentation back to Python 2.3, when the function description was less clear.
"If given and not None, chars must be a string; the characters in the string will be stripped from the end of the string this method is called on. Changed in version 2.2.2: Support for the chars argument."
Luckily misuse of rstrip to remove extensions will usually get spotted soon enough, even if, as shown, it can evade cursory inspection and testing.
It’s worth reviewing why the confusion persists.
- str.rstrip() to remove trailing whitespace is a common thing to want to do
- removing a suffix is also a common thing to want to do
- removing a set of trailing chars from a string is less common (except the special case of stripping whitespace)
- the chars parameter to str.rstrip() is not a set, it is an ordered sequence
- s.rstrip('.jpg') (for example) will remove any '.jpg' suffix from s, so it sort-of works
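The last two points can be checked directly: although chars is passed as an ordered string, every ordering of its characters strips the same thing. The helper name results_for_all_orderings is hypothetical, used here only to illustrate the point.

```python
from itertools import permutations


def results_for_all_orderings(s, chars):
    """Collect the distinct results of s.rstrip(p) over every ordering p of chars."""
    return {s.rstrip(''.join(p)) for p in permutations(chars)}


# All 24 orderings of '.jpg' behave identically, so the set has one element.
assert results_for_all_orderings('selfie.jpg', '.jpg') == {'selfie'}
assert results_for_all_orderings('dog.jpg', '.jpg') == {'do'}
```

So 'selfie.jpg'.rstrip('gpj.') strips the extension just as well as rstrip('.jpg') does, which hints that the apparent suffix matching is a coincidence of the test data.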