Rewriting String.Left()

2008-07-13Comments

Over at Coding Horror, Jeff Atwood codes up String.Left() in C#.

For example, in C#, there is no String.Left() function. Fair enough; we can roll up our sleeves and write our own function lickety-split:

public static string Left(string s, int len)
{
    if (len == 0 || s.Length == 0)
        return "";
    else if (s.Length <= len)
        return s;
    else
        return s.Substring(0, len);
}

He’s using this function to introduce C# extension-methods, but I’m more interested in making some sweeping generalisations about Java code.

In the article’s comments section PRMan points out this function “blows up” if the input string is null, and offers a simple fix:

public static string Left(string s, int len)
{
    if (s == null)
        return s;
    else if (len == 0 || s.Length == 0)
        return "";
    else if (s.Length <= len)
        return s;
    else
        return s.Substring(0, len);
}

Now, I’ve never written any C#. In fact I don’t even have a C# compiler to hand. But this code looks like everything which ever bugged me about Java. What’s the point of writing a string function which accepts a null input? All we’ve done is force clients of String.Left() to handle null inputs too. Just (implicitly) raise a null pointer exception (which is not the same as blowing up) and be done with it.

It’s unfair to suggest this bogus defensive coding has anything to do with Java — you can do the same thing in any language — but in my personal experience lots of Java code gets written this way, to the extent that I used to regard these nervy null checks as being idiomatic. So I was relieved when a Michael Feathers article put me straight:

Spurious null checks are a symptom of bad code.

Feathers’ examples are coded in Java, so maybe I wasn’t being unfair after all. If you’re really concerned about String.Left() handling whatever clients might throw at it, then it’s more important to deal with negative values of len.

So we’re back to Atwood’s original version which reads the way a programmer might think about this problem:

The leftmost n characters is just s.Substring(0, len), but, careful!, what if len is too big, and what about if s is empty, or if len is zero?

Hence we end up with a three way conditional with three separate and rather different looking return values, in which the most interesting case appears last. I think this code could be more readable. First (unless I’m missing something about C#) we can get rid of the special case of (len == 0 || s.Length == 0) since the subsequent clauses do the right thing. (By the way, I’m switching to Java now since I have no C# compiler to hand.)

    public static String left(String s, int len)
    {
        if (s.length() <= len) {
            return s;
        } else {
            return s.substring(0, len);
        }
    }

I personally would transform this to:

    public static String left(String s, int len){
        return s.substring(0, Math.min(len, s.length()));
    }

It’s not that I’m particularly against multiple returns, and I don’t claim this tweak is a big improvement: I just prefer a more declarative style, which clearly says we’re returning a substring of s, starting at index 0, of length len or s.length() if smaller.

In fact, I don’t think many Java programmers would agree I’ve improved things. Java seems to favour explicit control flow over even this trivial functional composition. Which is why I don’t miss programming in Java, and why I’m in no hurry to try C#.

I also wonder if Jeff Atwood’s original version of String.Left() is, in theory, more efficient. Will a return value of "" be pooled? Does it make sense to return the original string whenever possible, to avoid creating unnecessary objects? I don’t know enough about C# or Java say if there’s any truth in these suggestions. I do know that such savings are extremely unlikely to make any real difference for real strings in real programs, and that sacrificing readability for theoretical efficiency is a mistake.

On the subject of readability, String.Left() is really just a special case of sequence slicing — a Python feature which I’d like to monkey-patch into other languages.