Converting integer literals in C++ and Python
An integral literal in a C program can be decimal, hexadecimal or octal.
    int percent = 110;
    unsigned flags = 0x80;
    unsigned agent = 007;
This snippet would be equivalent to (e.g.):
    int percent = 0156;
    unsigned flags = 128;
    unsigned agent = 0x7;
So programmers can choose the best of these options when including numbers in their code.
Python adopted this same C syntax, but has recently gone on to extend and modify it. Some Python 2.6 numbers:
    Python 2.6
    >>> 0x80, 110, 007, 0O7, 0o7, 0b10000000
    (128, 110, 7, 7, 7, 128)
I’m pleased to see support for binary literals, which are useful for (e.g.) bitmasks. I’ve never really seen the point of octals; nonetheless, they’ve been enhanced for Python 3. Python 2.6 backports the new improved octal literal syntax whilst retaining support for classic C-style octals. Python 3 drops C-style octals.
    Python 3.1
    >>> 007
      File "<stdin>", line 1
        007
          ^
    SyntaxError: invalid token
    >>> 0O7
    7
Now consider the compiler/interpreter writer’s problem. Clearly it must be possible to take a string representing an integer literal and work out what number it represents. At first glance, the int() builtin isn’t quite smart enough to do the job unless we supply an explicit base for the conversion:
    >>> int('0xff')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: invalid literal for int() with base 10: '0xff'
    >>> int('0xff', 16)
    255
We might consider reading any prefix from the literal and dispatching the string to an appropriate handler. Something like this:
    def integer_literal_value(s):
        if s.startswith('0x'):
            return int(s, 16)
        if s.startswith('0b'):
            return int(s, 2)
        ...
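Spelled out in full, that approach might look something like the sketch below. It assumes Python 3 and only the 0x, 0o and 0b prefixes (in either case). The prefix table is purely illustrative, and signed literals such as '-0x80' are not handled.

```python
def integer_literal_value(s):
    """Hand-rolled sketch: choose a base from the literal's prefix."""
    # Illustrative prefix table (an assumption, not a standard-library API).
    prefixes = {'0x': 16, '0o': 8, '0b': 2}
    # Anything without a recognised prefix is treated as decimal.
    base = prefixes.get(s[:2].lower(), 10)
    # int() tolerates the matching prefix when the base is given explicitly.
    return int(s, base)


print(integer_literal_value('0x80'), integer_literal_value('0O7'),
      integer_literal_value('0b1010101'), integer_literal_value('42'))
```

Even this compact version has to duplicate knowledge of the literal syntax that the interpreter already has.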
Yuck! Surely there’s an easier way to do something this fundamental? Well, there’s always eval(), which turns the interpreter on itself.
    >>> def integer_literal_value(s): return eval(s)
    ...
    >>> v = integer_literal_value
    >>> v('0x80'), v('0o7'), v('0b1010101'), v('42')
    (128, 7, 85, 42)
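One caveat is worth seeing in action: eval() evaluates any Python expression, not just integer literals, so this version accepts far more than it should. A small sketch, with inputs made up for illustration:

```python
def integer_literal_value(s):
    # eval() runs arbitrary expressions, so nothing restricts s to a literal.
    return eval(s)


# An integer literal, as intended.
print(integer_literal_value('0x80'))
# An arithmetic expression: accepted, although it is not a literal.
print(integer_literal_value('6 * 7'))
# A string literal: accepted too, and the result is not even an integer.
print(repr(integer_literal_value("'abc'")))
```

That rules eval() out for any input that isn't fully trusted.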
We should have looked more carefully at the int() documentation:
    int([x[, radix]])
    … The radix parameter gives the base for the conversion (which is 10 by default) and may be any integer in the range [2, 36], or zero. If radix is zero, the proper radix is determined based on the contents of string; the interpretation is the same as for integer literals.
Perfect!
    >>> from functools import partial
    >>> integer_literal_value = partial(int, base=0)
    >>> v = integer_literal_value
    >>> v('0x80'), v('0o7'), v('0b1010101'), v('42')
    (128, 7, 85, 42)
(Notice, by the way, that radix is used in the online documentation but the actual argument name is base. I’ll confess that before I wrote this note I hadn’t spotted this use of zero as a special value for string-to-integer conversions, even though it’s been available since Python 2.1.)
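A few more cases show how closely base 0 tracks the literal syntax. This is a sketch assuming Python 3, where C-style octals are no longer valid literals; is_integer_literal is a hypothetical helper, not part of the standard library.

```python
from functools import partial

integer_literal_value = partial(int, base=0)


def is_integer_literal(s):
    """Hypothetical helper: True if int(s, 0) accepts the string."""
    try:
        integer_literal_value(s)
    except ValueError:
        return False
    return True


# A leading sign and surrounding whitespace are accepted.
print(integer_literal_value('-0x80'))     # -128
print(integer_literal_value(' 0b101 '))   # 5

# Strings that are not valid Python 3 integer literals are rejected.
for text in ('007', '0x', '1.5'):
    print(repr(text), is_integer_literal(text))
```

Under Python 3 all three strings in the loop are rejected, matching the SyntaxError the interpreter gives for a literal 007.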
C++ also offers a way to convert integer literals into the numbers they represent, but it’s not very well known. As is usual for format conversions, we use streams. Typically these would be stringstreams, but here I show an example using standard input and output. The trick is to clear the input stream’s basefield format flags, so that no fixed base is imposed on the conversion.
    #include <iostream>

    int main()
    {
        int x;
        std::cin.unsetf(std::ios::basefield);
        while (std::cin >> x)
        {
            std::cout << x << '\n';
        }
        return std::cin.eof() ? 0 : 1;
    }
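It works by magic. (Less magically: with basefield cleared, the stream extracts integers the way scanf’s %i conversion does, reading a leading 0x as hexadecimal and a leading 0 as octal.)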
    $ g++ integer_literal_value.cpp -o integer_literal_value
    $ echo 007 0x80 110 | ./integer_literal_value
    7
    128
    110