Find the average of a collection of tuples or dicts using Python

2014-12-03

You’ve been running some tests, each of which returns a 3-tuple of numerical results — (real, user, sys) times, maybe — and you’d like to combine these into a single 3-tuple, the average result.

Easy!

def average(times):
    N = float(len(times))
    return (sum(t[0] for t in times)/N,
            sum(t[1] for t in times)/N,
            sum(t[2] for t in times)/N)

If you want a more generic solution, one which works when the tuples might have any number of elements, you could do this:

def average(xs):
    N = float(len(xs))
    R = len(xs[0])
    return tuple(sum(x[i] for x in xs)/N for i in range(R))

or this:

def average(xs):
    N = float(len(xs))
    return tuple(sum(col)/N for col in zip(*xs))

The second generic variant uses zip to transpose its inputs.
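To make the transposition concrete, here is a small sketch. The sample figures and the name `columns` are made up for illustration; only `average` comes from the code above.

```python
# Three illustrative (real, user, sys) results, one tuple per test run.
times = [(34.4, 26.2, 7.3),
         (28.7, 21.5, 6.4),
         (29.3, 22.0, 6.9)]

# zip(*times) regroups the rows into columns: all the real times,
# then all the user times, then all the sys times.
columns = list(zip(*times))

def average(xs):
    N = float(len(xs))
    return tuple(sum(col)/N for col in zip(*xs))

# One mean per column, in the same order as the tuple fields.
means = average(times)
```

Here `columns[0]` is `(34.4, 28.7, 29.3)`, the three real times, so summing each column and dividing by the number of rows gives the per-field average.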

Now suppose we have keyed collections of results which we want to average:

>>> times = [{'real': 34.4, 'user': 26.2, 'sys': 7.3},
             {'real': 28.7, 'user': 21.5, 'sys': 6.4},
             {'real': 29.3, 'user': 22.0, 'sys': 6.9}]

If, as in the example above, each result has the same set of keys, the average result could be calculated like this:

>>> N = float(len(times))
>>> { k : round(sum(t[k] for t in times)/N, 1) for k in times[0] }
{'real': 30.8, 'sys': 6.9, 'user': 23.2}
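If you would rather have this as a reusable function, something like the following sketch might do. The name `average_dicts` and the key check are my own additions, not part of the snippet above; the check just makes the "same set of keys" assumption explicit.

```python
def average_dicts(ds):
    """Average a non-empty list of dicts that all share the same keys."""
    keys = set(ds[0])
    # Fail loudly rather than raise a KeyError halfway through a sum.
    if any(set(d) != keys for d in ds):
        raise ValueError("all dicts must have the same keys")
    N = float(len(ds))
    return {k: sum(d[k] for d in ds)/N for k in keys}

times = [{'real': 34.4, 'user': 26.2, 'sys': 7.3},
         {'real': 28.7, 'user': 21.5, 'sys': 6.4},
         {'real': 29.3, 'user': 22.0, 'sys': 6.9}]

# Unrounded averages, keyed by 'real', 'user' and 'sys'.
avg = average_dicts(times)
```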

What if the inputs don’t have the same keys? Consider the contents of four fridges.

>>> fridges = [
    { 'egg': 5, 'milk': 1.700, 'sausage': 6 },
    { 'beer': 6, 'milk': 0.568, 'egg': 1 },
    { 'egg': 3, 'sausage': 4, 'milk': 0.125, 'lettuce': 1 },
    { 'carrot': 4 }]

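A Counter can collect the totals. Dividing each total by the number of fridges then gives the average contents, with an item missing from a fridge counting as zero.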

>>> from collections import Counter
>>> total = sum(map(Counter, fridges), Counter())
>>> N = float(len(fridges))
>>> { k: v/N for k, v in total.items() }
{'sausage': 2.5, 'lettuce': 0.25, 'beer': 1.5, 'carrot': 1.0, 
 'egg': 2.25, 'milk': 0.59825}
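The one-liner that builds `total` packs a lot in, so here is a sketch of the same fold written out as a loop. The flag `start_value_needed` is a name invented for this illustration; it records why the second argument to `sum` matters.

```python
from collections import Counter

fridges = [
    {'egg': 5, 'milk': 1.700, 'sausage': 6},
    {'beer': 6, 'milk': 0.568, 'egg': 1},
    {'egg': 3, 'sausage': 4, 'milk': 0.125, 'lettuce': 1},
    {'carrot': 4}]

# The same fold that sum() performs, written out as a loop.
total = Counter()                    # plays the role of sum()'s start value
for fridge in fridges:
    total = total + Counter(fridge)  # a key missing from one side counts as 0

# Without the explicit start value, sum() begins at the integer 0,
# and 0 + Counter(...) is not a supported operation.
try:
    sum(map(Counter, fridges))
    start_value_needed = False
except TypeError:
    start_value_needed = True
```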

Note that although Counters were primarily designed to hold positive integer counts, there’s nothing stopping us from using floating point numbers (the amount of milk in our example) as values.
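One thing to watch if your values can go negative: adding Counters with `+` keeps only positive totals, so a key whose values sum to zero or less silently drops out. A plain `defaultdict` avoids that. The following is a sketch of an alternative rather than a replacement for the Counter version; `average_by_key` and the sample data are names and values I've made up for the example.

```python
from collections import Counter, defaultdict

# Counter addition discards totals that are not positive.
lost = Counter({'balance': -2}) + Counter({'balance': 1})
# lost is now an empty Counter: the -1 total has been dropped.

def average_by_key(ds):
    """Average dicts with differing keys; a missing key counts as 0."""
    total = defaultdict(float)
    for d in ds:
        for k, v in d.items():
            total[k] += v
    N = float(len(ds))
    return {k: v/N for k, v in total.items()}

# Negative totals survive: 'a' averages to -0.5, 'b' to 2.0.
result = average_by_key([{'a': -2}, {'a': 1, 'b': 4}])
```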