## Running Sums in Python

Suppose we want to generate the running sum series `r`

formed by sums of `n`

consecutive elements taken from a series `s`

. For example, to sum consecutive pairs taken from the first 6 integers:

>>> n = 2 >>> s = 0, 1, 2, 3, 4, 5 >>> running_sum(s, 2) [1, 3, 5, 7, 9]

One approach would be to combine the `sum`

built-in function with list slices and comprehensions.

>>> def running_sum(s, n): ... return [sum(s[lo:lo + n]) for lo in range(len(s) - n + 1)] ... >>> running_sum([0, 1, 2, 3, 4, 5], 2) [1, 3, 5, 7, 9]

This is fine if:

`s`

is finite`s`

supports slice access (i.e.`s[lo:hi]`

doesn’t raise a`TypeError`

)`n`

isn’t too big

With just a little extra thought we can address all these issues.

To deal with the first two points we return to the specification. What exactly do we require of `s`

in order to generate `r`

? Well, all that’s really needed is for `s`

to be iterable — which is to say we can advance along it — then our running sum function can arrange to buffer `n`

items from `s`

and yield their sums. For maximum flexibility the result series `r`

should also be iterable, allowing clients to choose how to consume it^{}1.

In Python an object, `o`

, becomes iterable by implementing the iterator protocol: `o.__iter__()`

should return an iterator, `i`

, over the container, which `i.next()`

advances, raising a `StopIteration`

exception when done.

>>> r = [1, 2, 3] >>> i = r.__iter__() >>> i.next() 1 >>> i.next() 2 >>> i.next() 3 >>> i.next() Traceback (most recent call last): File "<stdin>", line 1, in <module> StopIteration

Python programs typically don’t expose this protocol directly since we can build more convenient looping constructs on top of it. By using the yield statement our running sum filter needn’t implement the iterator protocol directly either. Here’s a generator function which uses `itertools.islice`

in place of the original list slices.

import itertools def running_sum(s, n): while True: r, s = itertools.tee(s) yield sum(itertools.islice(r, n)) s.next()

As you can see, objects returned by this function implement the iterator protocol.

# Running sum of pairs from 0, 1, 2, 3, ... >>> rs = running_sum(itertools.count(), 2) >>> i = rs.__iter__() >>> i.next() 1 >>> i.next() 3 >>> for s in rs: print s ... 5 7 9 11 13 .... Traceback (most recent call last): File "<stdin>", line 1, in <module> KeyboardInterrupt

We have to kill the for loop by interrupting it since `rs`

, being infinite (in this particular case), never raises a `StopIteration`

. In fact, this particular version of `running_sum()`

fails badly on finite inputs for reasons we’ll touch on later.

I won’t dwell on this flawed variant of `running_sum()`

, except to note in passing that the built-in sum function doesn’t buffer `n`

items from its input stream — it’s a lazy function which accumulates these items one at a time. There’s no magic here, though: behind the scenes, as the teed iterators `r`

and `s`

diverge, the in-between values must be stored somewhere!

Each slice of items from `s`

overlaps the one before: if we visualise the sliced range sliding along the series, at each stage an element gets pushed in at the top and an element gets popped out from the bottom. Rather than repeatedly summing all the elements of these slices, we can calculate a single sum at the start of the series then adjust it as we progress^{}2.

def running_sum(s, n): lo, hi = itertools.tee(s) rs = sum(itertools.islice(hi, n)) while True: yield rs rs += hi.next() - lo.next()

Before we sign this function off, there’s a bug to fix. What if `n`

is larger than the length of the input series? We’d expect the output series to be empty, but:

>>> list(running_sum([1, 2, 3], 4)) [6]

Oops! The problem here is in passing an `itertools.islice`

series to `sum()`

, which happily swallows the `StopIteration`

exception without knowing if the sliced stream reached its end or if we reached the end of the slice, or indeed both.

>>> i = itertools.islice([1, 2, 3], 4) >>> i.next(), i.next(), i.next() (1, 2, 3) >>> i.next() Traceback (most recent call last): File "<stdin>", line 1, in <module> StopIteration

A fix is to pass `sum`

a list comprehension. If `n`

is too big a `StopIteration`

exception gets raised before `sum`

ever sees this list.

def running_sum(s, n): lo, hi = itertools.tee(s) rs = sum([hi.next() for _ in range(n)]) while True: yield rs rs += hi.next() - lo.next()

As a final tweak, we can make `lo`

and `hi`

iterator.next functions rather than iterators, which saves a few attribute access calls.

def running_sum(s, n): '''Generate the series of running sums of n elements of s. >>> list(running_sum([1, 2, 3, 4], 2)) [3, 5, 7] >>> rs = running_sum(itertools.count(), 3) >>> rs.next(), rs.next(), rs.next() (3, 6, 9) >>> list(running_sum([1, 2], 3)) [] ''' lo, hi = [i.next for i in itertools.tee(s)] rs = sum([hi() for _ in range(n)]) while True: yield rs rs += hi() - lo()

Thanks to doubtingthomas for pointing out an off-by-one error in the original version of this article.

1 In a recent post on his blog, Jake clears up some misconceptions about Haskell. In doing so he analyses the difference between lazy and strict types with clarity and insight. Recommended reading!

This parallel indicates pretty clearly that recursively operating on each element of a stream is an infinite loop.

`Stream`

is acontrol structure. It doesn’t exist to persist data across many parts of a program. It exists to feed data into a function one element at a time. In contrast, a`Vector`

exists to persist an intermediate or finalresultof some 3-dimensional computation. In short,`Stream`

is forcontrolling computation, and`Vector`

is forstoring data. This generalizes to lazy and strict types, respectively.

2 There may be situations where we really want each running sum to be generated directly from `n`

consecutive elements of the source stream: for example, if we are dealing with a series of floating point numbers, then addition is not exact and we must take care to avoid accumulated errors.