Shells, Logs and Pipes

2007-03-05

I needed to make sense of a lengthy log file. The server had been running over the weekend and the log file showed an increasing packet error count. (Don’t worry about what a packet error is; it doesn’t really matter as far as this post goes.) This post explains how the standard Unix tools helped me dissect the log file, and how they can dissect any other log file too.

Heads and Tails

First, I needed the initial and final error counts.

Initial and final error counts
$ head logfile | grep "packet errors"
Info: Fri 17:34:05, packet errors: 0
$ tail logfile | grep "packet errors"
Info: Mon 08:32:11, packet errors: 11323

So what exactly had gone wrong, and when? Paging through the filtered output would take some time, since the number of lines reporting the packet error count ran to six figures.

Paging through the filtered output
$ grep "packet errors" logfile | less

How many times was the error count reported?
$ grep "packet errors" logfile | wc -l
125220

Uniq

Part of the problem was that whatever logged the message simply printed out the error count every couple of seconds, whether or not this error count had changed. Going back to Friday, reconfiguring the logger and re-running the experiment wasn’t an option, but luckily I had too much information rather than too little: all I had to do was process it correctly.

What looked more useful was filtering just the lines at which the error count had changed. Filtering out repeated lines using uniq wouldn’t work, since every line’s timestamp made it unique. Happily, uniq has an option, -f, to skip a specified number of leading fields when comparing lines: here, the three fields making up the log level, day and time.

Finding when the error count changed
$ grep "packet errors" logfile | uniq -f 3

Piping this command into wc -l indicated that the number of interesting lines had been thinned down to around three and a half thousand — the kind of report which could be paged through comfortably enough.

Counting error count changes
$ grep "packet errors" logfile | uniq -f 3 | wc -l
3432

Sort

It’s also possible to home in on other points of interest. For example, we could investigate the maximum period of stability by finding the stretch during which the error count went unchanged for longest. This can be done using the -c option to uniq, which precedes each output line with a count of the number of times that line was repeated in the input; then sorting numerically on this count field; then picking the last line from the pipeline.

Locating the most stable period
$ grep "packet errors" logfile | \
  uniq -f 3 -c | sort -k1 -n | tail -1
    277 Info: Sat 23:00:15, packet errors: 4645

This tells us that on Saturday evening the error count remained stable for longest — in fact, an error count of 4645 was repeated 277 times in succession, a duration of just 9 minutes (given that the logger reported the count every couple of seconds).
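
As a quick sanity check on that arithmetic, assuming one report every two seconds (bc truncates to whole minutes):

Checking the duration
$ echo "277 * 2 / 60" | bc
9

Equally, swapping tail -1 for head -1 picks out the opposite extreme: a count which held for the shortest time.

Locating the least stable period
$ grep "packet errors" logfile | \
  uniq -f 3 -c | sort -k1 -n | head -1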

Et cetera

This is just a flavour of what the Unix shell tools can do when piped together. We’ve combined some simple tools (less, head, tail, uniq, sort, wc, grep) into complex and powerful commands using just a few keystrokes. Imagine trying to design a GUI which would allow you to analyse a log file as shown above. Now try designing a GUI which can analyse any log file ever.

I wouldn’t pretend the Unix shell tools are user-friendly. In fact, some are distinctly arcane (find comes to mind). When I started out, I found the Unix manual difficult to use, and although info has improved things, it’s still hard to find out about a command without knowing its name. And if there’s a decent, hyper-linked online reference, I’ve yet to find it.
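
One partial remedy is apropos, also reachable as man -k, which searches the one-line manual page descriptions by keyword. The matches vary from system to system, but on a typical GNU/Linux box, for example:

Searching the manual by keyword
$ apropos "directory hierarchy"
find (1)             - search for files in a directory hierarchy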

Extending the Toolset

Shell tools aren’t always capable of even the kind of text processing described in this article. For example, if the logfile had more structure to it — or was even written out as XML — then we’d probably find ourselves struggling. Fortunately, a scripting language is very happy to help out in such cases. Perl remains a good choice, if it’s one-liners you like. Python and Ruby are better for longer-lived scripts.
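
As a flavour of the one-liner approach, here’s a rough Perl equivalent of the grep and uniq -f 3 pipeline from earlier — just a sketch, and one which assumes the log format shown above:

Sketching the uniq filter in Perl
$ perl -ne 'if (/packet errors: (\d+)/ and (!defined $n or $1 != $n)) { print; $n = $1 }' logfile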

Equally, there’s a point at which you need to convert text into something more visual — a graph or a histogram, perhaps. Text processing can be used to generate a format suitable for importing into a plotting package.
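
For instance, awk can pull out the final field of each interesting line, leaving a single column of numbers which a plotting package such as gnuplot will read directly (counts.dat is just an illustrative filename):

Extracting a column for plotting
$ grep "packet errors" logfile | uniq -f 3 | \
  awk '{ print $NF }' > counts.dat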

Further Reading

A Unix expert once told me that the difference between a Unix expert and a non-expert wasn’t aptitude or years of accumulated knowledge — rather it came down to a willingness to read carefully through the manual. I’m no expert, but regard this as sound advice.

For more on the Unix philosophy, and how it applies to software development in general, I recommend “The Art of Unix Programming” by Eric S. Raymond.