File shifting using lftp and rsync

2008-01-06

On a daily basis I work on at least three different platforms, hosted locally, virtually, remotely. Shifting files from place to place is a problem I need to resolve every day, and I have more than one solution.

I’m not a fan of file browsers, graphical ftp clients and similar. They clutter the desktop, vary from platform to platform, take ages to drive — especially with a touch pad — and prompt for input at all the wrong times. It’s hard to undo an operation when your pointer slips. By contrast, using simple commands in a shell window puts the power back at your fingertips, whatever platform you’re on. Recovering from mistakes is as easy as recalling your command history.

Local File Systems

For simple operations on a local file system, I tend to use cp or Emacs dired mode simply because my immediate context is usually Emacs or a shell window, and often both. For bulky and recursive directory operations, a good starting point is:

$ tar cf - SRC | tar xf - -C DST

You can vary this command line to reorganise file systems, though sometimes sprinkling a few soft links around may be worth considering.
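Here is a minimal sketch of that tar pipeline run against a throwaway tree. The directory and file names are made up for illustration; the point is that the symlink and the execute bit arrive intact on the far side.

```shell
#!/bin/sh
# Build a small scratch tree (all names here are hypothetical).
work=$(mktemp -d)
mkdir -p "$work/SRC/sub" "$work/DST"
echo "hello" > "$work/SRC/sub/file.txt"
printf '#!/bin/sh\necho hi\n' > "$work/SRC/run.sh"
chmod 755 "$work/SRC/run.sh"
ln -s sub/file.txt "$work/SRC/link"

# -C changes directory first, so the archive stores the relative path SRC/...
# "f -" names stdout/stdin explicitly rather than relying on tar's default
# archive device; "p" asks the extracting tar to keep file modes.
tar cf - -C "$work" SRC | tar xpf - -C "$work/DST"

# The copy lands in DST/SRC.
ls -lR "$work/DST"
```

Note that the copy ends up at DST/SRC, not directly in DST; archive the contents from inside SRC (for example with `-C SRC .`) if you want them merged into DST itself.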

Remote File Systems

Things can get tricky for remote file systems. Preserving permissions and ownership causes problems, as does the security layer. NFS and Samba may seem like the right solutions for a private network but I’ve grown to regard them as troublesome; they work best on stable networks with well-known machines at well-known addresses, and, as usual, I prefer a dynamic model to a static one.

Again, command line tools can do the job. To save the overhead of re-entering your username/password credentials, you’ll want to store SSH keys on the machines you frequent. The most basic remote copy command is scp. Use it much like cp, but specify a destination machine on the command line.
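The usual sequence looks something like the sketch below. The account, host and key file are placeholders rather than anything from my own setup, and the script only prints the commands by default, so it is safe to run as it stands.

```shell
#!/bin/sh
# Placeholders: substitute your own account, host and key file.
REMOTE="user@remote.example.org"
KEY="$HOME/.ssh/id_ed25519"

# By default each command is only printed. Run with DRY_RUN= (set but empty)
# in the environment to execute the commands for real.
DRY_RUN=${DRY_RUN-echo}

# 1. Create a key pair, if you do not already have one.
$DRY_RUN ssh-keygen -t ed25519 -f "$KEY"

# 2. Install the public key on the remote machine.
#    This asks for the remote password one last time.
$DRY_RUN ssh-copy-id -i "$KEY.pub" "$REMOTE"

# 3. Copy a directory tree: -r recurses, -p preserves times and modes.
$DRY_RUN scp -rp SRC "$REMOTE:DST"
```

After steps 1 and 2, step 3 and any later scp or ssh to the same account should no longer prompt for the account password (a passphrase on the key itself, if you set one, is a separate matter).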

Extended Tar

For more complex filesystems, we can extend our tar command using ssh on the far side of the pipeline. The following command tars up the local SRC directory then extracts the archive on the REMOTE machine in directory DST.

$ tar cf - SRC | ssh REMOTE tar xf - -C DST

If this isn’t possible, I sometimes use netcat to listen at a port on the remote machine:

Listen to port 2345
nc -l -p 2345 | tar xf - -C DST

Then, on the source machine, kick off the tar process (the /dev/tcp redirection is a bash feature, so run this from bash):

tar cf - SRC > /dev/tcp/DOTTED.IP.OF.MIRROR/2345

Lftp

Suppose you want or need to transfer files using the venerable FTP protocol. If you haven’t already discovered lftp, then it’s time to investigate. When you connect to a remote machine using lftp it’s rather like having a shell session open on that machine: you can navigate using tab completion and the usual shell tools relating to file and directory operations are there, as well as extra goodies like mirror and a decent help system.
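To give a flavour of mirror, here is a hedged sketch that writes a small lftp command script and prints it. The host, user and directory names are placeholders, and nothing connects anywhere until you pass the script to lftp yourself.

```shell
#!/bin/sh
# Write an lftp command script. USER, ftp.example.org and the paths
# are placeholders, not a real account.
script=$(mktemp)
cat > "$script" <<'EOF'
open -u USER ftp.example.org
cd /pub/incoming
mirror --reverse --verbose www www
exit
EOF

# Show what would be run.
cat "$script"

# To run it for real (lftp prompts for the password):
#   lftp -f "$script"
```

Plain `mirror` pulls a remote directory down to the local machine; `--reverse` pushes a local directory up instead, which is the direction you want when publishing.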

Rsync

Rsync is another great command-line file-system shifter. It’s designed to keep two directory structures in sync, and to do so efficiently by just transmitting deltas between the two. Typically the source and destination directories reside on separate machines, and rsync is often invoked automatically as a scheduled job. Rsync forms the backbone of many a backup system. I’ve often used it to complement more heavyweight corporate backup systems which would require me to ask an administrator to restore my own files.

I use rsync to post updates to this website, and indeed to mirror this website to other machines I use. My publish script is as simple as:

publish.sh
#! /bin/sh
rsync -avz www wordaligned@wordaligned.org:~

Here, local directory structure www will be mirrored to ~wordaligned/www on remote machine wordaligned.org. I supply the remote username wordaligned explicitly since it differs from my local username tag. The -v verbose option gives me a warm fuzzy feeling that the updates I want to post are indeed being posted, the -z compress option reduces network traffic by compressing file data, and the -a archive shorthand option recurses and preserves permissions and ownerships.

By the way, I’m implicitly using ssh (the rsync default) to access the remote machine. No password is required for user tag to copy files to user wordaligned’s home directory since I’ve configured SSH to allow this.

Rsync comes with many more options, but they’re all well documented. A simple -a is usually all that’s required.
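If you want to see what -a does before pointing rsync at a real server, the sketch below runs it between two scratch directories on one machine. The directory and file names are invented for the example; -n (dry run) and -i (itemize changes) are standard rsync options that list what would be transferred without copying anything.

```shell
#!/bin/sh
# Scratch source tree and an empty mirror directory (hypothetical names).
work=$(mktemp -d)
mkdir -p "$work/www/css" "$work/mirror"
echo "<h1>hello</h1>" > "$work/www/index.html"
echo "body {}" > "$work/www/css/site.css"

pending=""
if command -v rsync >/dev/null 2>&1; then
    # No trailing slash on the source, so rsync creates mirror/www,
    # just as the publish script creates ~/www on the server.
    rsync -a "$work/www" "$work/mirror"

    # Change one file, then ask rsync what it would send.
    echo "<h1>hello again</h1>" > "$work/www/index.html"
    pending=$(rsync -ain "$work/www" "$work/mirror")
    echo "$pending"

    # Now send it for real.
    rsync -a "$work/www" "$work/mirror"
else
    echo "rsync is not installed; nothing to demonstrate" >&2
fi
```

A trailing slash on the source (`www/`) would copy the contents of www straight into the destination instead of creating a www directory there, which is an easy thing to get wrong.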

More Thoughts on File Shifting

At the start of this note I unfairly dismissed GUI-driven file system tools. The truth is that I do often use them. I’m generally unprincipled and promiscuous when it comes to tool selection: whatever works and is to hand will do. Thus, while both lftp and rsync come with a plethora of options (lftp does everything any GUI-driven FTP client can do, and probably more, and rsync similarly defeats graphical file browsers), the irony is that I only use them for basic stuff, and may well resort to something with a GUI when attempting something out of the ordinary. A bit of interactive pointing and clicking often appeals more than paging through a rather dry manual.

What scp and rsync won’t do is find a directory on a remote file system; you can’t use TAB completion at the far end [1]. An interactive lftp session does support basic TAB completion on a remote filesystem, but not more powerful tools like find or locate.

In general you can reduce this problem by adopting a disciplined approach to structuring your workspace on whatever platforms you use. If you find yourself typing a command-line like:

$ scp -r ~/tmp/dev-2008-01-06 cromarty:~/scratch/work-copy2

then I’d suggest something has gone wrong.

One way to combat this disorganisation is to place your home directory under version control. Make sure the version control system you use for this is flexible enough to allow you to rename entries, though. If you do adopt this model, your version control system becomes the home for all your files, and transfers between machines become a matter of check-in, check-out.

I use Subversion in this way, to a degree. There are plenty of files, though, which I don’t version control — in general, large files or files which only make sense on certain platforms. I’ve often found it useful to make these available for access via a webserver, either somewhere on a Wiki, or just served by a lighttpd instance with directory listing enabled.


[1] I had a suspicion when I wrote this that I’d turn out to be wrong! Michael Kedzierski emailed me:

I’m actually using bash completion on Ubuntu and I get remote-side tab completion with scp, it’s great.