Octal Literals
I recently discovered that you can write binary literals directly in Ruby, which I thought a good idea. Programming languages have to think in binary so it’s important that a language should support them naturally. I also spotted that Ruby extends the usual C convention for octal literals. In this case, I think Ruby makes the mistake of building on a broken design.
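For instance, Ruby’s binary literals look like this (the underscore grouping is optional but handy):

```ruby
# 0b introduces a binary literal; underscores may group the digits.
flags = 0b1000_0001
puts flags          # 129: top and bottom bits set
puts 0b1010         # 10
```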
Octal Integers in Ruby
Here’s an example:
irb(main):001:0> puts 0O377, 0377, 377
255
255
377
=> nil
You did notice the Latin Capital Letter O, didn’t you? The one next to the number 0. You didn’t? Then let’s try a variation:
irb(main):001:0> puts 0o377, 0377, 377
255
255
377
=> nil
Perhaps the Latin Small Letter O was more obvious?
The Problem with Octal Numbers
The optional O (that’s the letter O) to explicitly indicate the base does make some sort of sense — it’s consistent with the X for hexadecimal and indeed the B which Ruby adds for binary. But really it’s just adding confusion to an already confusing design.
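For what it’s worth, the three prefixes do line up neatly in Ruby — a quick sketch, each line printing 255:

```ruby
# The same value, 255, written with each of Ruby's base prefixes.
puts 0xff        # hexadecimal, X prefix
puts 0b11111111  # binary, B prefix
puts 0o377       # octal, O prefix
```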
Consider the following C array of numbers:
static int const countdown[] = { 100, 099, 098, 097, .... 000 };
Here, a novice programmer has padded the numbers in the countdown with leading zeros to make them line up nicely. Fortunately the compiler catches the problem in this case:
invalid digit "9" in octal constant
We might not have been so lucky, though. Here’s some dangerously broken Python:
roman_numerals = {
    "C" : 100,
    "L" : 050,
    "X" : 010,
    "V" : 005,
    "I" : 001,
}
This runs through the interpreter without raising a SyntaxError. We do have a semantic error, though: "L" and "X" map to octal literals with decimal values 40 and 8 respectively. 0ops!
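Ruby inherits the same leading-zero convention from C, so the identical trap translates directly — here sketched as a Ruby hash:

```ruby
# The padded entries below are octal literals, not decimal.
roman_numerals = {
  "C" => 100,
  "L" => 050,   # octal 50 == decimal 40
  "X" => 010,   # octal 10 == decimal 8
  "V" => 005,
  "I" => 001,
}
puts roman_numerals["L"]  # 40, not the 50 we meant
```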
When are Octal Literals Needed?
I’ve never really needed them. 8 is a power of two, but so is 16: if it’s a binary number we need, a binary literal would be better; and, in the absence of language support for binary literals, hexadecimal is more useful than octal since two hex digits make up a byte.
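A quick Ruby sketch of why hex packs so neatly: each hex digit covers exactly one nibble (four bits), whereas an octal digit covers three bits and so straddles byte boundaries.

```ruby
byte = 0b1011_0110   # a byte written bit by bit
puts byte.to_s(16)   # "b6" — two hex digits, one per nibble
puts byte == 0xb6    # true
puts byte.to_s(8)    # "266" — the leading octal digit covers only two bits
```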
Occasionally an octal escape is a useful way to insert a non-printable character into a string literal. Here’s an example:
std::string s = "ABC\177DEF";
Here, the escaped octal value 177 is embedded into the string. Octal 177 equals hexadecimal 7F, but we run into trouble if we try:
std::string s = "ABC\x7FDEF";
Here, the "DEF"
characters are valid hexadecimal and therefore
become part of the number we’re embedding; so we’ve tried to put the
hex number 7FDEF
into a byte. If we’re lucky our
compiler will warn us:
warning: hex escape sequence out of range
If we’re unlucky or if we don’t act on the warning, the result is implementation defined. In any case, it’s certainly not what we wanted. Of course, embedded octal escape sequences suffer from the exact same problem if followed by one of the digits "0" - "7".
The workaround is simple:
std::string s = "ABC\x7F" "DEF";
or even:
std::string s = "ABC" "\x7F" "DEF";
In other words, even this use of octal values is of limited practical value.
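Ruby, incidentally, sidesteps this particular trap: its \x escape consumes at most two hex digits, so a trailing "DEF" can’t be swallowed into the number. A quick sketch:

```ruby
s = "ABC\x7FDEF"        # \x stops after two hex digits in Ruby
puts s.bytesize         # 7: three letters, the DEL byte, three letters
puts s == "ABC\177DEF"  # true — octal 177 and hex 7F are the same byte
```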
Optional Radices for Integral Literals
Octal literals — as implemented in the C family of languages — are problematic and not especially useful. However, it is occasionally useful to be able to write numbers using a different radix (and probably more useful than we realise since we’ve never been able to try it). I’ve already said why I think binary numbers are desirable. Hexadecimal numbers, which pack so neatly into bytes, are also of special interest.
But why restrict ourselves to radices 10, 16, 8 and 2? A bit of Googling found this suggestion from Andrew Koenig on a Python mailing list archive:
I am personally partial to allowing an optional radix (in decimal) followed by the letter r at the beginning of a literal, so 19, 8r23, and 16r13 would all represent the same value.
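Ruby can already express Koenig’s examples at runtime, if not as literals — a sketch using String#to_i and Kernel#Integer with an explicit radix:

```ruby
# 19, 8r23 and 16r13 in Koenig's notation all denote the same value.
puts "23".to_i(8)      # 19
puts "13".to_i(16)     # 19
puts Integer("23", 8)  # 19, but raises ArgumentError on invalid digits
```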