Converting integer literals in C++ and Python

2009-08-06

An integral literal in a C program can be decimal, hexadecimal or octal.

int percent = 110;
unsigned flags = 0x80;
unsigned agent = 007;

This snippet would be equivalent to (e.g.):

int percent = 0156;
unsigned flags = 128;
unsigned agent = 0x7;

So programmers can choose the best of these options when including numbers in their code.
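The octal arithmetic is the easiest of these to get wrong, so here is a quick check of the equivalences. It is sketched in Python rather than C purely because int() accepts an explicit base; the variable names simply mirror the snippet above.

```python
# 0156 in C is octal: 1*64 + 5*8 + 6
percent = int('156', 8)
# 0x80 is hexadecimal: 8*16 + 0
flags = int('80', 16)
# 007 is octal, and has the same value as 0x7
agent = int('7', 8)

print(percent, flags, agent)   # prints: 110 128 7
```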

Python adopted this same C syntax, but has recently gone on to extend and modify it. Some Python 2.6 numbers:

Python 2.6
>>> 0x80, 110, 007, 0O7, 0o7, 0b10000000
(128, 110, 7, 7, 7, 128)

I’m pleased to see support for binary literals, which are useful for (e.g.) bitmasks. I’ve never really seen the point of octals; nonetheless, they’ve been enhanced for Python 3. Python 2.6 backports the new improved octal literal syntax whilst retaining support for classic C-style octals. Python 3 drops C-style octals.

Python 3.1
>>> 007
  File "<stdin>", line 1
SyntaxError: invalid token
>>> 0O7
7
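To make the bitmask point concrete, here is a minimal sketch using binary literals. The flag names and the has_flag helper are hypothetical, invented for this example.

```python
# Hypothetical permission flags, one bit each.
FLAG_READ = 0b001
FLAG_WRITE = 0b010
FLAG_EXEC = 0b100

permissions = FLAG_READ | FLAG_WRITE


def has_flag(value, flag):
    """Return True if every bit set in flag is also set in value."""
    return (value & flag) == flag


print(has_flag(permissions, FLAG_WRITE))   # True
print(has_flag(permissions, FLAG_EXEC))    # False
print(bin(permissions))                    # 0b11
```

Written in binary, each flag shows at a glance which bit it occupies; the same values in decimal (1, 2, 4) or hexadecimal would work just as well.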

Now consider the compiler/interpreter writer’s problem. Clearly it must be possible to take a string representing an integer literal and work out what number it represents. At first glance, the int() builtin isn’t quite smart enough to do the job without us supplying an explicit base for the conversion:

>>> int('0xff')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '0xff'
>>> int('0xff', 16)
255

We might consider reading any prefix from the literal and dispatching the string to an appropriate handler. Something like this:

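def integer_literal_value(s):
    # Dispatch on the prefix; int() accepts the prefix when given the matching base.
    if s.startswith(('0x', '0X')):
        return int(s, 16)
    if s.startswith(('0b', '0B')):
        return int(s, 2)
    if s.startswith(('0o', '0O')):
        return int(s, 8)
    # No recognised prefix: treat the string as decimal.
    return int(s)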

Yuck! Surely there’s an easier way to do something this fundamental? Well, there’s always eval(), which turns the interpreter on itself.

>>> def integer_literal_value(s): return eval(s)
>>> v = integer_literal_value
>>> v('0x80'), v('0o7'), v('0b1010101'), v('42')
(128, 7, 85, 42)
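One caveat with eval() is that it will happily evaluate any expression, not just an integer literal, which matters if the string comes from somewhere you don't trust. A more cautious sketch, assuming the standard library's ast.literal_eval (which only accepts literals) is acceptable here, might look like this. The type check is my own addition, not something the library does for you.

```python
import ast


def integer_literal_value(s):
    """Evaluate s as a Python literal and insist that the result is an int."""
    value = ast.literal_eval(s)
    # bool is a subclass of int, so True and False are excluded explicitly.
    if not isinstance(value, int) or isinstance(value, bool):
        raise ValueError("not an integer literal: %r" % (s,))
    return value


v = integer_literal_value
print(v('0x80'), v('0o7'), v('0b1010101'), v('42'))   # 128 7 85 42
```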

We should have looked more carefully at the int() documentation:

int([x[, radix]]) … The radix parameter gives the base for the conversion (which is 10 by default) and may be any integer in the range [2, 36], or zero. If radix is zero, the proper radix is determined based on the contents of string; the interpretation is the same as for integer literals.


>>> from functools import partial
>>> integer_literal_value = partial(int, base=0)
>>> v = integer_literal_value
>>> v('0x80'), v('0o7'), v('0b1010101'), v('42')
(128, 7, 85, 42)

(Notice, by the way, that radix is used in the online documentation but the actual argument name is base. I’ll confess that before I wrote this note I hadn’t spotted this use of zero as a special value for string→integer conversions, even though it’s been available since Python 2.1.)
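Because base 0 follows the rules for integer literals, it inherits their quirks too. The sketch below, run on Python 3, pokes at a few edge cases: the inputs are my own choices for illustration.

```python
def integer_literal_value(s):
    # base=0 asks int() to infer the base from the literal's prefix.
    return int(s, 0)


# Prefixes may be upper or lower case; a sign and surrounding
# whitespace are also accepted.
for text in ('0X80', '0O7', '0B1010101', '-0x10', ' 42 '):
    print(repr(text), '->', integer_literal_value(text))

# On Python 3 a C-style octal is not a valid literal, so base 0 rejects it.
try:
    integer_literal_value('007')
except ValueError as error:
    print('rejected:', error)
```

So base 0 mirrors whatever the running interpreter accepts as a literal, which on Python 3 means '007' raises ValueError instead of quietly yielding 7.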

C++ also offers a way to convert integer literals into the numbers they represent, but it’s not very well known. As is usual for format conversions, we use streams (stringstreams typically, but here I show an example using standard input and output). The trick is to clear the input stream’s basefield flags, so that it no longer assumes decimal and instead works out the base from each number’s prefix.

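#include <iostream>

int main()
{
    int x;
    // With no basefield flag set, operator>> infers the base from the
    // prefix: 0x for hexadecimal, a leading 0 for octal, otherwise decimal.
    std::cin.unsetf(std::ios::basefield);
    while (std::cin >> x)
        std::cout << x << '\n';
    // Succeed only if we stopped because the input ran out.
    return std::cin.eof() ? 0 : 1;
}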

It works by magic.

$ g++ integer_literal_value.cpp -o integer_literal_value
$ echo 007 0x80 110 | ./integer_literal_value
7
128
110
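The magic is less mysterious than it looks. With numeric base formatting disabled, the stream picks the base from the prefix in the same way C's scanf does for %i: 0x means hexadecimal, a leading zero means octal, anything else is decimal. Here is a rough Python sketch of that rule, for comparison with int(s, 0). The helper name is hypothetical, and this imitates the behaviour, not the library's actual parsing code.

```python
def c_integer_literal_value(text):
    """Hypothetical helper: apply C's prefix rules to an optionally signed string."""
    sign = 1
    digits = text
    if digits[:1] in ('+', '-'):
        sign = -1 if digits[0] == '-' else 1
        digits = digits[1:]
    if not digits[:1].isdigit():
        raise ValueError('not an integer literal: %r' % (text,))
    if digits[:2].lower() == '0x':
        base = 16    # hexadecimal: 0x or 0X prefix
    elif digits.startswith('0') and len(digits) > 1:
        base = 8     # octal: leading zero
    else:
        base = 10    # decimal
    return sign * int(digits, base)


for token in '007 0x80 110'.split():
    print(c_integer_literal_value(token))   # prints 7, then 128, then 110
```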