Octal Literals

2006-08-12

I recently discovered that you can write binary literals directly in Ruby, which I think is a good idea. Computers work in binary, so it’s natural for a language to support binary literals. I also spotted that Ruby extends the usual C convention for octal literals. Here, I think, Ruby makes the mistake of building on a broken design.
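The binary literals, at least, are straightforward: a 0b prefix, as in this irb session:

irb(main):001:0> puts 0b11111111
255
=> nil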

Octal Integers in Ruby

Here’s an example:

irb(main):001:0> puts 0O377, 0377, 377
255
255
377
=> nil

You did notice the Latin Capital Letter O, didn’t you? The one next to the digit 0. You didn’t! Then let’s try a variation:

irb(main):001:0> puts 0o377, 0377, 377
255
255
377
=> nil

Perhaps the Latin Small Letter O was more obvious?

The Problem with Octal Numbers

The optional O (that’s the letter O) to explicitly indicate the base does make some sort of sense — it’s consistent with the X for hexadecimal and indeed the B which Ruby adds for binary. But really it’s just adding confusion to an already confusing design.
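Seen side by side in irb, the prefixes are at least uniform:

irb(main):001:0> puts 0xff, 0o377, 0b11111111
255
255
255
=> nil

The trouble is the leading-zero convention they sit alongside.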

Consider the following C array of numbers:

static int const countdown[] = {
    100,
    099,
    098,
    097,
    ....
    000
};

Here, a novice programmer has padded the numbers in the countdown with leading zeros to make them line up nicely. Fortunately the compiler catches the problem in this case:

invalid digit "9" in octal constant

We might not have been so lucky, though. Here’s some dangerously broken Python:

roman_numerals = {
    "C" : 100,
    "L" : 050,
    "X" : 010,
    "V" : 005,
    "I" : 001,
}

This runs through the interpreter without raising a SyntaxError. We do have a semantic error, though. "L" and "X" map to octal literals with decimal values 40 and 8 respectively. 0ops!
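Ruby, having adopted the same convention, reads such literals the same way:

irb(main):001:0> puts 050, 010
40
8
=> nil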

When are Octal Literals Needed?

I’ve never really needed them. 8 is a power of two, but so is 16: if it’s a binary value we need, a binary literal would be better; and, in the absence of language support for binary literals, a hexadecimal literal is more useful than an octal one, since two hex digits make up a byte.
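A typical bit mask, for example, reads naturally in binary or hexadecimal, but not in octal:

irb(main):001:0> puts 0b10100101, 0xa5, 0245
165
165
165
=> nil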

Occasionally an octal escape is a useful way to insert a non-printable character into a string literal. Here’s an example:

Octal value in a String Literal
std::string s = "ABC\177DEF";

Here, the escaped octal value 177 is embedded into the string. Octal 177 equals hexadecimal 7F, but we run into trouble if we try:

Very large Hex value in a String Literal
std::string s = "ABC\x7FDEF";

Here, the "DEF" characters are valid hexadecimal digits and therefore become part of the escape sequence; so we’ve tried to put the hex number 7FDEF into a byte. If we’re lucky, our compiler will warn us:

warning: hex escape sequence out of range

If we’re unlucky, or if we don’t act on the warning, the result is implementation-defined. In any case, it’s certainly not what we wanted. Embedded octal escape sequences suffer a similar problem when followed by one of the digits "0" to "7", although an octal escape consumes at most three digits, so at least the damage is bounded.

The workaround is simple:

Hex 7F in a String Literal
std::string s = "ABC\x7F" "DEF";

or even:

Hex 7F in a String Literal
std::string s = "ABC" "\x7F" "DEF";

In other words, even this use of octal values is of limited practical benefit.

Optional Radices for Integral Literals

Octal literals — as implemented in the C family of languages — are problematic and not especially useful. However, it is occasionally useful to be able to write numbers using a different radix (and probably more useful than we realise since we’ve never been able to try it). I’ve already said why I think binary numbers are desirable. Hexadecimal numbers, which pack so neatly into bytes, are also of special interest.

But why restrict ourselves to radices 10, 16, 8 and 2? A bit of Googling found this suggestion from Andrew Koenig on a Python mailing list archive:

I am personally partial to allowing an optional radix (in decimal) followed by the letter r at the beginning of a literal, so 19, 8r23, and 16r13 would all represent the same value.
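Ruby has no such literal syntax, but String#to_i accepts an arbitrary radix, which gives a rough feel for the idea (a sketch of the equivalent conversions, not Koenig’s proposed notation):

irb(main):001:0> puts 19, "23".to_i(8), "13".to_i(16)
19
19
19
=> nil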