Binary Literals

2006-08-06

When I examined Ruby’s syntax for integral literals I found the usual suspects — decimal, octal, and hexadecimal numbers. There were also a couple of pleasant surprises.

Binary Literals

Ruby supports binary literals. It’s always been a mystery to me why so many other languages don’t support these: you need them whenever you pack and unpack binary data. Many of us old C/C++ hackers probably use hexadecimal numbers for this purpose, to the extent that we probably know the binary values of the sixteen hex digits by heart, but they still don’t read well. For example, here, in different languages, are bitmasks to get at bits 2 to 5 (inclusive) of a number. I’m counting the least significant bit as bit 0 here.

mask = 0b111100    # Ruby, binary literal
mask = 0x3C        # Python, hexadecimal numeric literal
unsigned long const mask
     = 0x3C;       // C++, hexadecimal numeric literal

It’s clear that the Ruby literal is easiest to interpret as a bit mask.
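To make the point concrete, here is a minimal sketch of the mask in use. The sample value and the variable names are made up for illustration; only the mask itself comes from the text above.

```ruby
# Illustrative only: `value` and `field` are names invented for this sketch.
mask  = 0b111100             # selects bits 2 to 5, bit 0 being the least significant
value = 0b10110110           # an arbitrary 8-bit sample

field = (value & mask) >> 2  # keep bits 2..5, then shift them down to bit 0

puts field.to_s(2)           # prints "1101"
```

Reading the sample from the right, bits 2 to 5 are 1, 0, 1, 1, which is exactly the field printed (most significant bit first).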

In C++, and other C-family languages, we could equally try to express our intent of extracting bits 2, 3, 4, 5 more directly:

unsigned long const mask
    = ~(~0u << 6 - 2) << 2;

With a bit of squinting, we read this as: “set the bits in the half-open range from bit 2 up to, but not including, bit 6”.

Note in passing that the subtraction operator, -, binds more tightly than the left shift operator, <<, despite what you might expect, so we are in fact left shifting ~0u by 4.
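The same shift trick carries over to Ruby, which also gives binary - higher precedence than <<. The sketch below assumes a helper I've called bit_range_mask; the name is hypothetical and not part of Ruby's core library. Ruby integers have no fixed width, so ~0 is simply -1 and there is no 0u suffix to worry about.

```ruby
# Hypothetical helper (not a core Ruby method): returns a mask with the bits
# in the half-open range lo...hi set, so hi itself is excluded.
def bit_range_mask(lo, hi)
  ((1 << (hi - lo)) - 1) << lo
end

# The C++ expression, almost verbatim. Because - binds more tightly than <<,
# ~0 is shifted left by 4 here, just as in C++.
shifted = ~(~0 << 6 - 2) << 2

puts(bit_range_mask(2, 6) == 0b111100)  # prints "true"
puts(shifted == 0b111100)               # prints "true"
```

Spelling out the parentheses in the helper avoids relying on the reader remembering the precedence rule.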

In C++ we might even consider working around the lack of binary literals using a bitset.

unsigned long const mask
    = std::bitset<6>(std::string("111100")).to_ulong();

This is, however, non-idiomatic and inefficient, and really just exposes the language’s lack of binary literals.
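For comparison, Ruby has a run-time counterpart to the bitset trick, which may be handy when the binary digits arrive as a string rather than sitting in the source code. This is only a small sketch of String#to_i and Integer#to_s with an explicit base of 2.

```ruby
# Parse a string of binary digits at run time; the base is passed explicitly.
mask = "111100".to_i(2)

puts(mask == 0b111100)  # prints "true"
puts mask.to_s(2)       # prints "111100"
```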

Underscores as Separators

Binary literals soon get tricky to read. Runs of 1’s and 0’s can be hard on the eye. How readable, for example, would our Ruby literal be if we wanted to mask out bits 9, 10, 11?

mask = 0b111000000000

Fortunately, Ruby allows us to place underscores in numeric literals, which gives us an equivalent number:

mask = 0b1110_0000_0000

Here, I’ve inserted the underscores every 4 bits, that is, at the boundaries of nibbles. Nice one, Ruby!
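A quick check, sketched with variable names of my own choosing, confirms that the underscores are purely cosmetic and that they work in other bases too.

```ruby
plain   = 0b111000000000
nibbles = 0b1110_0000_0000   # underscores at nibble boundaries
hex     = 0xE00              # the same value in hexadecimal

puts(plain == nibbles)       # prints "true"
puts(plain == hex)           # prints "true"
puts(1_000_000 == 1000000)   # prints "true": decimal literals accept underscores too
```

Grouping by nibbles has the side benefit that each group maps onto one hex digit: 1110 is E, and the two groups of 0000 are the two trailing zeros of 0xE00.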