We want to map existing files to their new locations. In Python, the built-in mapping type is called a dictionary. The output of pass one will be a dictionary, which we initialise to be empty.
files_map = {}
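For readers new to dictionaries, here is a minimal sketch of the three operations the script relies on: creating an empty dictionary, setting an item, and merging in another dictionary with update. The file paths are invented purely for illustration.

```python
# Illustrative paths only; they are not taken from a real source tree.
files_map = {}

# Setting an item maps a key (old path) to a value (new path).
files_map['png/pngRead.h'] = 'graphics/thirdparty/png/pngRead.h'

# update() merges the items of another dictionary into this one.
files_map.update({'UserIF/main.c': 'ui/main.c'})

# Look up the new location of an existing file.
new_path = files_map['png/pngRead.h']
```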
There are two standard modules which support file and directory operations:
os: Miscellaneous operating system interfaces
os.path: Common pathname manipulations
Both of these will be of use to our script. In fact, both provide a mechanism for traversing a directory tree:
os.path.walk, which calls back a supplied function at each subdirectory, passing that function the subdirectory name and a list of the files it contains.
os.walk, which generates a 3-tuple (dirpath, dirnames, filenames) for each subdirectory in the tree.
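To make the shape of those 3-tuples concrete, the following sketch builds a small throwaway directory tree and walks it. The directory and file names are hypothetical, and the use of tempfile, shutil and os.path.relpath is my own choice for the demonstration rather than anything the script itself needs.

```python
import os
import shutil
import tempfile

# Build a tiny throwaway tree (hypothetical names) so the walk has
# something predictable to visit.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'png'))
open(os.path.join(root, 'png', 'pngRead.h'), 'w').close()
open(os.path.join(root, 'README'), 'w').close()

# os.walk yields one (dirpath, dirnames, filenames) tuple per directory.
visited = []
for (dirpath, dirnames, filenames) in os.walk(root):
    # Record each directory relative to the root, with its file names.
    visited.append((os.path.relpath(dirpath, root), sorted(filenames)))

shutil.rmtree(root)  # tidy up the throwaway tree
```

Each entry in visited pairs a directory with the files found directly inside it, which is exactly the information the remapping loop needs.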
The second option, os.walk, only came into being in Python 2.3 (2.3 strengthens the language's support for generators). I prefer it since it makes the script more direct.
import os
# Initialise a dictionary to map current file path
# to new file path.
files_map = {}
# Fill the dictionary by remapping all files beneath
# the current working directory.
for (subdir, dirs, files) in os.walk('.'):
    print "Mapping files in subdir [%s]" % subdir
    files_map.update(
        mapFiles(subdir, files)
        )
Note the general absence of visible symbols to delimit blocks and expressions – a colon marks the end of the for condition, and that's about it. Expressions are terminated by a newline, unless the newline is escaped with a backslash or the expression requires a closing bracket to complete it. Thus the print statement terminates at the newline, but the dictionary update statement spreads over three lines. Note also that statements can be grouped into a block by placing them at the same indentation level: the body of the for loop is a block of two statements.
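The two continuation rules and the indentation rule can be seen side by side in the short sketch below. The variable names are invented for the illustration and have nothing to do with the script.

```python
# A newline inside open brackets does not end the expression.
total = sum([
    1,
    2,
    3,
])

# A trailing backslash escapes the newline explicitly.
label = 'total is ' + \
    str(total)

# Statements at the same indentation level form a block: both
# indented lines below belong to the body of the for loop.
squares = []
for n in (1, 2, 3):
    squares.append(n * n)
    total = total + n
```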
To a C/C++ programmer these syntactical rules may seem unusual, dangerous even – attaching meaning to whitespace!? – but I would argue that they actually encourage clean and well laid out scripts.
Incidentally, the default behaviour of the print statement is to add a newline after printing. Appending a trailing comma would print a space instead of this newline.
If we attempt to run the script as it stands, we'll see an exception thrown:
NameError: name 'mapFiles' is not defined
which is as we'd expect. We need to define the function:
def mapFiles(dirname, files):
    """Return a dictionary mapping files to their new locations."""
    new_dir = mapDirectory(dirname)
    print "mapDirectory [%s] -> [%s]" % (dirname, new_dir)
    fm = {}
    for f in files:
        fm[os.path.join(dirname, f)] = \
            os.path.join(new_dir, f)
    return fm
The Python interpreter needs to know about this function before it can use it, so we'll place it before the path traversal loop.
The first statement of the function body is the function's (optional) documentation string, or docstring. The Python documentation explains why it's worth getting into the habit of using docstrings and the conventions for their use.
The function fills a dictionary mapping files to their new location. It uses os.path.join from the os.path module to construct a file path. The backslash is there to escape a newline, allowing the dictionary item-setter to continue onto a second line.
If we were to start an interactive Python shell and load the mapFiles function, we could then query it and its attributes:
>>> mapFiles
<function mapFiles at 0x01106630>
>>> dir(mapFiles)
['__call__', '__class__', '__delattr__', '__dict__', '__doc__', '__get__',
'__getattribute__', '__hash__', '__init__', '__module__', '__name__',
'__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__',
'__str__', 'func_closure', 'func_code', 'func_defaults', 'func_dict',
'func_doc', 'func_globals', 'func_name']
>>> mapFiles.func_doc
'Return a dictionary mapping files to their new locations.'
I like to switch between editor and interpreter when developing scripts (on Windows, the PythonWin IDE makes this easy to do; alternatively, Python's -i option serves the same purpose), since it helps me understand both how my script works and how Python works. Here we can see the name mapFiles refers to a function object which has a list of attributes.
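The func_ attributes in the session above belong to the Python 2 series used in this article. As a sketch for readers on a later interpreter, where I would not expect func_doc to be available, the double-underscore attributes __name__ and __doc__ (both present in the dir listing above) give the same information. The function body here is a stand-in for illustration only.

```python
def mapFiles(dirname, files):
    """Return a dictionary mapping files to their new locations."""
    # Stand-in body for illustration; the real function is defined above.
    return {}

name = mapFiles.__name__   # the name the function was defined with
doc = mapFiles.__doc__     # the docstring, or None if there is none
attrs = dir(mapFiles)      # list of attribute names, as in the session above
```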
The Python interpreter also allows functions to be exercised immediately:
>>> mapFiles('png', ('pngRead.h', 'pngWrite.h'))
The final component of the first pass is the function mapDirectory, which maps an existing directory to its new location.
def mapDirectory(dname):
    """Return the new location of the input directory."""
    # The following dictionary maps existing
    # directories to their new locations.
    dirmap = {
        'png' : 'graphics/thirdparty/png',
        'jpeg' : 'graphics/thirdparty/jpeg',
        'bitmap' : 'graphics/common/bitmap',
        'UserIF' : 'ui',
        'UserIF/Wgts' : 'ui/widgets',
        'os' : 'platform/os',
        'os/hpux' : 'platform/os/hpux10'
        }
    # Successively reduce the directory path until it
    # matches one of the keys in the dictionary.
    mapped_dir = p = dname
    while p and p not in dirmap:
        p = os.path.dirname(p)
    if p:
        mapped_dir = os.path.join(dirmap[p],
                                  dname[len(p) + 1:])
    return mapped_dir
The directory rearrangement described earlier in this article has been represented as a dictionary. The input directory is reduced until we match a key in this dictionary. As soon as we find such a match, we construct our return value from the value at this key and the un-matched tail of the input directory; or, if no such match is found, the input value is returned unmodified.
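The reduction step can be watched in isolation. This sketch, which assumes a forward-slash path like those used as dictionary keys, records each candidate that the while loop would look up if no key ever matched.

```python
import os

# Repeatedly strip the last path component, as the while loop does,
# recording each candidate that would be looked up in the dictionary.
p = 'os/hpux/include'
candidates = []
while p:
    candidates.append(p)
    p = os.path.dirname(p)
```

The loop ends when os.path.dirname returns an empty string, which is why the function tests p before using it as a key.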
The expression dname[len(p) + 1:] is a slice operation applied to a string. Bearing in mind that p is the first len(p) characters in dname, this expression returns what's left of dname, omitting the slash which separates the head from the tail of this path.
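A short sketch of the slice at work, using values chosen by hand to stand in for what the while loop would produce:

```python
dname = 'os/hpux/include'
p = 'os/hpux'                  # the matched head of dname

head = dname[:len(p)]          # first len(p) characters: equal to p
tail = dname[len(p) + 1:]      # everything after the separating slash

# When the whole of dname matches, the slice is simply empty.
empty_tail = p[len(p) + 1:]
```

Slicing past the end of a string is not an error in Python; it just yields an empty string.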
For example, when mapping the directory os/hpux/include we would expect to exit the while loop when p == 'os/hpux', and return the result of joining the path platform/os/hpux10 to include.
mapDirectory('os/hpux/include')
–> os.path.join('platform/os/hpux10', 'include')
–> 'platform/os/hpux10/include'
Let's test these expectations by adding the following lines to our script:
assert(mapDirectory('os/hpux/include')
       == 'platform/os/hpux10/include')
assert(mapDirectory('os/win32')
       == 'platform/os/win32')
assert(mapDirectory('unittests')
       == 'unittests')
These tests pass on Unix platforms, but if you run them on Windows the first two tests raise AssertionError exceptions (although the final one passes). For now, I'll leave you to work out why – but promise a more platform-independent solution in the final version of the script.
Copyright © 2004 Thomas Guest