Recall that our second pass takes each file found during the first pass and relocates it, updating any internal references to file paths.
The main loop for the second pass is:
# Create the new root directory for relocated files
new_root = '../relocate'
os.makedirs(new_root)
# Now actually perform the relocation
for srcdst in files_map.items():
relocate(file, new_root, files_map)
The function os.makedirs
recursively creates a directory path. It throws an
exception if the directory path already exists. We will not catch this
exception since we want to ensure the files are being relocated to a
clean directory.
import shutil
def relocate(srcdst, dst_root, files_map):
"""Relocate a file, correcting internal file references."""
dst_file = os.path.join(dst_root, srcdst[1])
dst_dir = os.path.dirname(dst_file)
if not os.path.isdir(dst_dir):
os.makedirs(dst_dir)
if isSourceFile(dst_file):
relocateSource(srcdst, dst_file, files_map)
else:
shutil.copyfile(srcdst[0], dst_file)
There aren't many new features to comment on here. The first parameter to the
function, srcdst
, is a (key, value)
item from the
files_map
dictionary, so srcdst[0]
is the path to the original file,
and srcdst[1]
is the path to the relocated file, relative to the new root
directory.
We create the destination directory unless it already exists. Then, if our
file is a source file, we call relocateSource()
; otherwise, we
simply copy the file across.
I admit this isn't the most object-oriented of functions. The meaning of the literal 0 and 1 isn't transparent, and switching on type often indicates unfamiliarity with polymorphism. It isn't Python that's to blame here, nor a lack of familiarity with polymorphism: rather a reluctance on my part to add such sophistication to a simple script.
import re
def isSourceFile(file):
"""Return True if the input file is a C/C++ source file."""
src_re = re.compile(r'\.(c|h)(pp)?$',
re.IGNORECASE)
return src_re.search(file) is not None
We identify source files using a regular expression pattern: r'\.(c|h)(pp)?$'
.
Here, the r
stands for raw string, which means that backslashes are not
handled in any special way by Python – the string literal is passed directly
on to the regular expression module.
Regular expression patterns in Python are as powerful, concise and downright confusing to the uninitiated as they are elswhere. I would say that subsequent use of regular expression matches is a little more friendly.
In this case, the regex reads:
match a '.' followed by either a 'c' or an 'h' followed by one or no 'pp's, followed by the end of the string
The re.IGNORECASE
flag tells the regex compiler to ignore case.
So, we expect:
assert(isSourceFile('a.cpp'))
assert(isSourceFile('a.C'))
assert(not isSourceFile('a.cc'))
assert(not isSourceFile('a.cppp'))
assert(isSourceFile('a.cc.h'))
This time, the assertions hold.
def relocateSource(srcdst, dstfile, files_map):
"""Relocate a source file, correcting included file paths to included."""
fin = file(srcdst[0], 'r')
fout = file(dstfile, 'w')
for line in fin
fout.write(
processSourceLine(line,
srcdst,
files_map)
)
fin.close()
fout.close()
The function relocateSource()
simply reads the input file line by line. Each
line is converted and written to the output file.
def processSourceLine(line, srcdst, files_map):
"""Process a line from a source file, correcting included file paths."""
include_re = re.compile(
r'^\s*#\s*include\s*"'
r'(?P<inc_file>\S+)'
'"')
match = include_re.match(line)
if match:
mapped_inc_file = mapIncludeFile(
match.group('inc_file'),
srcdst,
files_map
)
line = line[:match.start('inc_file')] + \
mapped_inc_file + \
line[match.end('inc_file'):]
return line
The function processSourceLine
has a rather more complicated regex at its
core. Essentially we want to spot lines similar to: #include "UserIF/Wgts/Menu.hpp"
and extract the double-quoted file path. The complication is that there may be any
amount of whitespace at several points on the line – hence the appearances of
\s*
, which reads zero or more whitespace characters.
The three raw strings which comprise the regex will be concatenated before the regex is compiled in the same way that adjacent string literals in C/C++ get joined together in an early phase of compilation. I have split the string in this way to emphasise its meaning.
The bizarre (#P<inc_file>\S+)
syntax creates a named group:
essentially, it allows us to identify the sub-group of a match object using
the group name inc_file
.
So, the function looks for lines of the form: #include "inc_file"
then calls mapIncludeFile(inc_file...)
to find what should now be included,
and returns: #include "mapped_inc_file"
.
Incidentally, I am assuming here that the angle brackets are reserved for
inclusion of standard library files – or at least not the files we are
moving. That is, we don't try and alter lines such as: #include <vector>
.
import sys
def mapIncludeFile(inc, srcdst, files_map):
"""Determine the remapped include file path."""
# First, obtain a path to the include file relative to
# the original source root
if os.path.dirname(inc):
pass # Assumption 1) - "inc" is our relative path
else:
# Assumption 2) The file must be located in the same
# directory as the source file which includes it.
inc = os.path.join(
os.path.dirname(srcdst[0]), inc)
# Look up the new home for the file
try:
mapped_inc = files_map[inc]
if (os.path.dirname(mapped_inc) ==
os.path.dirname(srcdst[1])):
mapped_inc = os.path.basename(mapped_inc)
except KeyError:
print 'Failed to locate [%s] (included by [%s]) ' \
'relative to source root.' % (
include, srcdst[0])
sys.exit(1)
return mapped_inc
The function mapIncludeFile()
is actually quite simple, though only because
of an assumption I have made about the way include paths are used in this
source tree. The assumption is:
All #include directives give a path name relative to the root of the source
tree, except when the included file is present in the same directory as the
source file – in which case the file can be included directly by its
basename. Furthermore, there are no source files at the top-level of the
source tree (there are only directories at this level). |
For source trees with more complex include paths, and correspondingly
more subtle #include
directives, this function will need fairly heavy-weight
adaptation. (Alternatively, run another script to simplify your include
paths first.)
If this assumption holds, we can easily determine the original path to the
included files, then use our files_map
dictionary to look up the new path. If
the assumption doesn't hold, then the dictionary look up will fail, raising a
KeyError
exception. The exception is caught, a diagnostic printed, then
the script exits with status 1.
We could test whether mapped_inc
is a key in our dictionary before
attempting to use it; and if it were absent, we could simply print an error
and continue. However, we choose to view such an absence as exceptional since
it undermines the assumptions made by the script. We do not wish to risk
moving thousands of files without being sure of what we're doing.
Copyright © 2004 Thomas Guest |