The program which follows is a short but non-trivial Python script. It makes use of a couple of text codecs from the Python standard library to generate a C++ function. This C++ function converts a single character from ISO 8859-9 encoding into UTF-8 encoded Unicode.
def warnGenerated():
'''Return a standard "generated code" warning.'''
import sys, time
return (
'// GENERATED CODE. DO NOT EDIT!\n'
'// generated by %s, %s' %
(' '.join(sys.argv),
time.asctime(time.localtime()))
)
def functionHeader(codec):
'''Return the decode function header.'''
return '''\
/**
* @brief Convert from %(codec)s into UTF-8 encoded Unicode
* @param %(codec)s An %(codec)s encoded character
* @param it Reference to an output iterator
*
* @note If the input character is invalid, the Unicode
* replacement character U+FFFD will be returned.
*/
template <typename output_iterator>
void
%(codec)s_to_utf8(
unsigned char %(codec)s,
output_iterator & it)''' % { 'codec' : codec }
def convertCh(ch, codec):
'''Return the 'case' statement converting
the input character using the supplied codec'''
from unicodedata import name
ucs = chr(ch).decode(codec, 'replace')
utf = ucs.encode('utf-8')
ucname = name(ucs, 'Control code')
action = '; '.join(['*it++ = 0x%02x' % ord(c)
for c in utf])
return '''case 0x%02x: // %s
%s;
break;''' % (ch, ucname, action)
def codeBlock(prefix, body, indent = ' ' * 4):
'''Return an indented code block.
This code block will be formatted:
prefix
{
body
}'''
import re
indent_re = re.compile('^', re.MULTILINE)
return '''%s
{
%s
}''' % (prefix, indent_re.sub(indent, body))
codec = 'iso8859_9'
print warnGenerated()
print codeBlock(
functionHeader(codec),
codeBlock(
'switch(%s)' % codec,
# iso8859-* encodings are 8-bit
'\n'.join([convertCh(ch, codec)
for ch in range(0x100)]),
indent = '' # don't indent case: labels
)
)
By now, it should go without saying that this script is a metaprogram. Before discussing why I think it's a good use of metaprogramming, some notes:
warnGenerated()
is used to place a standard warning in
front of the generated C++ function. If users of this C++ function edit
it by hand, their changes will be overwritten next time the script is
run: hence the warning.
I like this script since it makes use of the standard Python library to create
code we can use in a C++ program. The hard work goes on in the calls to
encode()
and decode()
and we don't even have to look at the
implementations of these functions, let alone maintain them. Their speed does
not affect the speed of our C++ function and I am willing to trust their
correctness, meaning I don't have to locate or purchase the ISO 8859 standards.
The second big win is that all the boilerplate code is generated without effort. If, at some point in the future, we need a fuller range of ISO 8859 text converters, then we tweak the script so the final section reads, for example:
codecs = ['iso8859_%d' % n for n in range(1, 10)]
print warnGenerated()
for codec in codecs:
print codeBlock(
functionHeader(codec)
....
)
and let it run. And should we decide on a different strategy for handling invalid input data, again, the metaprogram is our friend.
Copyright © 2005 Thomas Guest |