
First Pass

Iterating Over Files

We want to map existing files to their new locations. In Python, the built-in mapping type is called a dictionary. The output of pass one will be a dictionary, which we initialise to be empty.


files_map = {}

There are two standard modules which support file and directory operations: os and os.path.

Both of these will be of use to our script. In fact, both provide a mechanism for traversing a directory tree: os.path.walk and os.walk.

The second option, os.walk, only came into being in Python 2.3 (2.3 strengthens the language's support for generators). I prefer it since it makes the script more direct.


import os

# Initialise a dictionary to map current file path
# to new file path.
files_map = {}

# Fill the dictionary by remapping all files beneath
# the current working directory.
for (subdir, dirs, files) in os.walk('.'):
    print "Mapping files in subdir [%s]" % subdir 
    files_map.update(
        mapFiles(subdir, files)
        )

Note the general absence of visible symbols to delimit blocks and statements: a colon marks the end of the for clause, and that's about it. A statement is terminated by a newline, unless the newline is escaped with a backslash or the statement requires a closing bracket to complete it. Thus the print statement terminates at the newline, but the dictionary update statement spreads over three lines. Note also that statements are grouped into a block by placing them at the same indentation level: the body of the for loop is a block of two statements.
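These layout rules can be tried in isolation. The following is only a minimal sketch and not part of the script; the names total, squares and doubled are made up for the illustration.

```python
# A statement normally ends at the newline.
total = 1 + 2

# An open bracket keeps the statement going until it is closed,
# so no backslash is needed here.
squares = [
    1 * 1,
    2 * 2,
    3 * 3,
    ]

# Without a bracket, a trailing backslash escapes the newline.
total = total + \
    sum(squares)

# Indentation groups statements: both indented lines form the loop body.
doubled = []
for n in squares:
    d = n * 2
    doubled.append(d)
```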

To a C/C++ programmer these syntactical rules may seem unusual, dangerous even – attaching meaning to whitespace!? – but I would argue that they actually encourage clean and well laid out scripts.

Incidentally, the default behaviour of the print statement is to add a newline after printing. Appending a trailing comma suppresses this newline; a space is written instead before the next item printed.
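Returning to the traversal loop, it may help to see what os.walk hands back on each iteration. The sketch below builds a throwaway directory tree and walks it; the tree layout and the file names are invented for the illustration, and the name found is not used anywhere in the script.

```python
import os
import shutil
import tempfile

# Build a small throwaway tree:
#   <root>/png/pngRead.h
#   <root>/os/hpux/io.h
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, 'png'))
os.makedirs(os.path.join(root, 'os', 'hpux'))
for path in [('png', 'pngRead.h'), ('os', 'hpux', 'io.h')]:
    open(os.path.join(root, *path), 'w').close()

# os.walk yields one (directory, subdirectory names, file names)
# triple for every directory at or beneath the starting point.
found = []
for (subdir, dirs, files) in os.walk(root):
    for f in files:
        # Record the name of the containing directory and the file.
        found.append((os.path.basename(subdir), f))
found.sort()

# Tidy up the throwaway tree.
shutil.rmtree(root)
```

After the loop, found holds one entry per file in the tree, however deeply nested.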

Mapping Files

If we attempt to run the script as it stands, we'll see an exception thrown:


NameError: name 'mapFiles' is not defined

which is as we'd expect. We need to define the function:


def mapFiles(dirname, files):
    """Return a dictionary mapping files to their new locations."""
    new_dir = mapDirectory(dirname)
    print "mapDirectory [%s] -> [%s]" % (dirname, new_dir)
    fm = {}
    for f in files:
        fm[os.path.join(dirname, f)] = \
           os.path.join(new_dir, f)
    return fm

The Python interpreter needs to know about this function before it can use it, so we'll place it before the path traversal loop.

The first statement of the function body is the function's (optional) documentation string, or docstring. The Python documentation explains why it's worth getting into the habit of using docstrings, and describes the conventions for their use.

The function fills a dictionary mapping each file to its new location. It uses os.path.join from the os.path module to construct a file path. The backslash is there to escape a newline, allowing the dictionary item-setter to continue onto a second line.
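Since mapDirectory has not been written yet, mapFiles can be tried out against a stand-in. The sketch below assumes a hypothetical mapDirectory which simply relocates every directory beneath a directory called new; the file name jpegRead.h is likewise invented. The print statement has been left out to keep the sketch short.

```python
import os

def mapDirectory(dirname):
    """Hypothetical stand-in: relocate every directory beneath 'new'."""
    return os.path.join('new', dirname)

def mapFiles(dirname, files):
    """Return a dictionary mapping files to their new locations."""
    new_dir = mapDirectory(dirname)
    fm = {}
    for f in files:
        fm[os.path.join(dirname, f)] = \
            os.path.join(new_dir, f)
    return fm

# Each call yields a small dictionary; update() merges it into the
# overall map, just as the traversal loop does.
files_map = {}
files_map.update(mapFiles('png', ['pngRead.h', 'pngWrite.h']))
files_map.update(mapFiles('jpeg', ['jpegRead.h']))
```

Each key is an existing file path and each value is the path it would move to under the stand-in mapping.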

Examining Objects with the Python Interpreter

If we were to start an interactive Python shell and load the mapFiles function, we could then query it and its attributes:


>>> mapFiles
<function mapFiles at 0x01106630>
>>> dir(mapFiles)
['__call__', '__class__', '__delattr__', '__dict__', '__doc__', '__get__',
'__getattribute__', '__hash__', '__init__', '__module__', '__name__',
'__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__',
'__str__', 'func_closure', 'func_code', 'func_defaults', 'func_dict',
'func_doc', 'func_globals', 'func_name']
>>> mapFiles.func_doc
'Return a dictionary mapping files to their new locations.'

I like to switch between editor and interpreter when developing scripts, since it helps me understand both how my script works and how Python works. (On Windows the PythonWin IDE makes this easy to do; alternatively, Python's -i option enters interactive mode after running a script.) Here we can see that the name mapFiles refers to a function object which has a list of attributes.
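The func_doc and func_name attributes belong to the Python 2 series and were dropped in Python 3. The __doc__ and __name__ attributes, which also appear in the dir listing above, hold the same values. The following sketch uses those names; the function body is a stub, since only the function object itself is of interest here.

```python
def mapFiles(dirname, files):
    """Return a dictionary mapping files to their new locations."""
    return {}  # stub body, for illustration only

# A function is an object like any other; its attributes can be read.
name = mapFiles.__name__
doc = mapFiles.__doc__

# dir() lists the attribute names of any object.
attrs = dir(mapFiles)
```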

The Python interpreter also allows functions to be exercised immediately:


>>> mapFiles('png', ('pngRead.h', 'pngWrite.h'))

Mapping Directories

The final component of the first pass is the function mapDirectory, which maps an existing directory to its new location.


def mapDirectory(dname):
    """Return the new location of the input directory."""

    # The following dictionary maps existing
    # directories to their new locations.
    dirmap = {
        'png'     : 'graphics/thirdparty/png',
        'jpeg'    : 'graphics/thirdparty/jpeg',
        'bitmap'  : 'graphics/common/bitmap',
        'UserIF'  : 'ui',
        'UserIF/Wgts' : 'ui/widgets',
        'os'      : 'platform/os',
        'os/hpux' : 'platform/os/hpux10'
    }
    # Successively reduce the directory path until it
    # matches one of the keys in the dictionary.
    mapped_dir = p = dname
    while p and p not in dirmap:
        p = os.path.dirname(p)

    if p:
        mapped_dir = os.path.join(dirmap[p],
                                  dname[len(p) + 1:])

    return mapped_dir

The directory rearrangement described earlier in this article has been represented as a dictionary. The input directory is reduced until we match a key in this dictionary. As soon as we find such a match, we construct our return value from the value at this key and the un-matched tail of the input directory; or, if no such match is found, the input value is returned unmodified.

The expression dname[len(p) + 1:] is a slice operation applied to a string. Bearing in mind that p is the first len(p) characters in dname, this expression returns what's left of dname, omitting the slash which separates the head from the tail of this path.
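The reduction and the slice can both be watched in isolation. This is only a sketch: the names steps, head and tail are introduced for the illustration and do not appear in the script.

```python
import os

dname = 'os/hpux/include'

# Repeatedly strip the last path component, recording each step,
# until nothing is left.
steps = []
p = dname
while p:
    steps.append(p)
    p = os.path.dirname(p)
# steps is now ['os/hpux/include', 'os/hpux', 'os']

# Suppose the search stopped at the key 'os/hpux'.
p = 'os/hpux'
head = dname[:len(p)]        # 'os/hpux', the matched part
tail = dname[len(p) + 1:]    # 'include', skipping the separating slash
```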

For example, when mapping the directory os/hpux/include we would expect to exit the while loop when p == 'os/hpux', and return the result of joining the path platform/os/hpux10 to include.


    mapDirectory('os/hpux/include')
        -> os.path.join('platform/os/hpux10', 'include')
        -> 'platform/os/hpux10/include'

Testing

Let's test these expectations by adding the following lines to our script:


assert(mapDirectory('os/hpux/include')
       == 'platform/os/hpux10/include')
       
assert(mapDirectory('os/win32')
       == 'platform/os/win32')

assert(mapDirectory('unittests')
       == 'unittests')

These tests pass on Unix platforms, but if you run them on Windows the first two raise AssertionError exceptions (although the final one passes). For now, I'll leave you to work out why, but I promise a more platform-independent solution in the final version of the script.
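One further case may be worth a test. os.walk('.') reports subdirectories with a leading './' (for example './png'), and the reduction loop in mapDirectory never reaches a dictionary key from such a path, so the path comes back unchanged. The sketch below uses a cut-down copy of mapDirectory with only two entries. Normalising the path with os.path.normpath before the lookup is one possible remedy, offered here as an assumption rather than as something the script above does.

```python
import os

def mapDirectory(dname):
    """Cut-down copy of mapDirectory, for illustration only."""
    dirmap = {
        'png': 'graphics/thirdparty/png',
        'os': 'platform/os',
    }
    mapped_dir = p = dname
    while p and p not in dirmap:
        p = os.path.dirname(p)
    if p:
        mapped_dir = os.path.join(dirmap[p], dname[len(p) + 1:])
    return mapped_dir

# With the './' prefix left in place, the reduction passes through
# './png' and '.' without matching a key, so the input is returned
# unmodified.
unmapped = mapDirectory('./png/include')

# os.path.normpath removes the leading './' before the lookup,
# so the 'png' key is found.
mapped = mapDirectory(os.path.normpath('./png/include'))
```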

Copyright © 2004 Thomas Guest
