Running Sums in Python
Suppose we want to generate the running sum series r
formed by sums of n
consecutive elements taken from a series s
. For example, to sum consecutive pairs taken from the first 6 integers:
>>> n = 2 >>> s = 0, 1, 2, 3, 4, 5 >>> running_sum(s, 2) [1, 3, 5, 7, 9]
One approach would be to combine the sum
builtin function with list slices and comprehensions.
>>> def running_sum(s, n): ... return [sum(s[lo:lo + n]) for lo in range(len(s)  n + 1)] ... >>> running_sum([0, 1, 2, 3, 4, 5], 2) [1, 3, 5, 7, 9]
This is fine if:

s
is finite 
s
supports slice access (i.e.s[lo:hi]
doesn’t raise aTypeError
) 
n
isn’t too big
With just a little extra thought we can address all these issues.
To deal with the first two points we return to the specification. What exactly do we require of s
in order to generate r
? Well, all that’s really needed is for s
to be iterable — which is to say we can advance along it — then our running sum function can arrange to buffer n
items from s
and yield their sums. For maximum flexibility the result series r
should also be iterable, allowing clients to choose how to consume it^{[1]}.
In Python an object, o
, becomes iterable by implementing the iterator protocol: o.__iter__()
should return an iterator, i
, over the container, which i.next()
advances, raising a StopIteration
exception when done.
>>> r = [1, 2, 3] >>> i = r.__iter__() >>> i.next() 1 >>> i.next() 2 >>> i.next() 3 >>> i.next() Traceback (most recent call last): File "<stdin>", line 1, in <module> StopIteration
Python programs typically don’t expose this protocol directly since we can build more convenient looping constructs on top of it. By using the yield statement our running sum filter needn’t implement the iterator protocol directly either. Here’s a generator function which uses itertools.islice
in place of the original list slices.
import itertools def running_sum(s, n): while True: r, s = itertools.tee(s) yield sum(itertools.islice(r, n)) s.next()
As you can see, objects returned by this function implement the iterator protocol.
# Running sum of pairs from 0, 1, 2, 3, ... >>> rs = running_sum(itertools.count(), 2) >>> i = rs.__iter__() >>> i.next() 1 >>> i.next() 3 >>> for s in rs: print s ... 5 7 9 11 13 .... Traceback (most recent call last): File "<stdin>", line 1, in <module> KeyboardInterrupt
We have to kill the for loop by interrupting it since rs
, being infinite (in this particular case), never raises a StopIteration
. In fact, this particular version of running_sum()
fails badly on finite inputs for reasons we’ll touch on later.
I won’t dwell on this flawed variant of running_sum()
, except to note in passing that the builtin sum function doesn’t buffer n
items from its input stream — it’s a lazy function which accumulates these items one at a time. There’s no magic here, though: behind the scenes, as the teed iterators r
and s
diverge, the inbetween values must be stored somewhere!
Each slice of items from s
overlaps the one before: if we visualise the sliced range sliding along the series, at each stage an element gets pushed in at the top and an element gets popped out from the bottom. Rather than repeatedly summing all the elements of these slices, we can calculate a single sum at the start of the series then adjust it as we progress^{[2]}.
def running_sum(s, n): lo, hi = itertools.tee(s) rs = sum(itertools.islice(hi, n)) while True: yield rs rs += hi.next()  lo.next()
Before we sign this function off, there’s a bug to fix. What if n
is larger than the length of the input series? We’d expect the output series to be empty, but:
>>> list(running_sum([1, 2, 3], 4)) [6]
Oops! The problem here is in passing an itertools.islice
series to sum()
, which happily swallows the StopIteration
exception without knowing if the sliced stream reached its end or if we reached the end of the slice, or indeed both.
>>> i = itertools.islice([1, 2, 3], 4) >>> i.next(), i.next(), i.next() (1, 2, 3) >>> i.next() Traceback (most recent call last): File "<stdin>", line 1, in <module> StopIteration
A fix is to pass sum
a list comprehension. If n
is too big a StopIteration
exception gets raised before sum
ever sees this list.
def running_sum(s, n): lo, hi = itertools.tee(s) rs = sum([hi.next() for _ in range(n)]) while True: yield rs rs += hi.next()  lo.next()
As a final tweak, we can make lo
and hi
iterator.next functions rather than iterators, which saves a few attribute access calls.
def running_sum(s, n): '''Generate the series of running sums of n elements of s. >>> list(running_sum([1, 2, 3, 4], 2)) [3, 5, 7] >>> rs = running_sum(itertools.count(), 3) >>> rs.next(), rs.next(), rs.next() (3, 6, 9) >>> list(running_sum([1, 2], 3)) [] ''' lo, hi = [i.next for i in itertools.tee(s)] rs = sum([hi() for _ in range(n)]) while True: yield rs rs += hi()  lo()
Thanks to doubtingthomas for pointing out an offbyone error in the original version of this article.
[1] In a recent post on his blog, Jake clears up some misconceptions about Haskell. In doing so he analyses the difference between lazy and strict types with clarity and insight. Recommended reading!
This parallel indicates pretty clearly that recursively operating on each element of a stream is an infinite loop.
Stream
is a control structure. It doesn’t exist to persist data across many parts of a program. It exists to feed data into a function one element at a time. In contrast, aVector
exists to persist an intermediate or final result of some 3dimensional computation. In short,Stream
is for controlling computation, andVector
is for storing data. This generalizes to lazy and strict types, respectively.
[2] There may be situations where we really want each running sum to be generated directly from n
consecutive elements of the source stream: for example, if we are dealing with a series of floating point numbers, then addition is not exact and we must take care to avoid accumulated errors.