Group When
Phil Nash’s recent tweet intrigued me.
Functional people: I often (in F#) need to process a seq into a smaller list or seq – where items from the input are grouped in some way…
— Phil Nash (@phil_nash) July 15, 2014… the need to group may not be known until after the first item in the group.
I struggle to find a nicely functional way to do this. Ideas?
— Phil Nash (@phil_nash) July 15, 2014He later clarified what he was after — and had now found — linking to a solution posted a couple of years ago by Tomas Petricek. The function groupWhen splits a sequence into groups, starting a new group whenever the predicate returns true.
module Seq =
/// Iterates over elements of the input sequence and groups adjacent elements.
/// A new group is started when the specified predicate holds about the element
/// of the sequence (and at the beginning of the iteration).
///
/// For example:
/// Seq.groupWhen isOdd [3;3;2;4;1;2] = seq [[3]; [3; 2; 4]; [1; 2]]
let groupWhen f (input:seq<_>) = seq {
use en = input.GetEnumerator()
let running = ref true
// Generate a group starting with the current element. Stops generating
// when it founds element such that 'f en.Current' is 'true'
let rec group() =
[ yield en.Current
if en.MoveNext() then
if not (f en.Current) then yield! group()
else running := false ]
if en.MoveNext() then
// While there are still elements, start a new group
while running.Value do
yield group() |> Seq.ofList }
Here’s a nice Haskell version coded up by @sdarlington.
Maybe takewhile and dropwhile could power a Python solution, but my first choice would be itertools.groupby. Groupby chops a sequence into subsequences, where the elements of each subsequence have the same key value. A suitable key function, in this case, must change its return value every time the sequence yields an element for which the predicate holds. It could toggle between a pair of values, for example. Or it could just count the number of times the predicate holds.
class count_p:
''' Return a value which increments every time the predicate holds.
'''
def __init__(self, pred):
self._n = 0
self._pred = pred
def __call__(self, v):
self._n += self._pred(v)
return self._n
def group_when(pred, xs):
return (gp for _, gp in groupby(xs, count_p(pred)))
Here, group_when accepts an iterable and returns an iterable sequence of iterable groups. Clients choose how to consume the results.
>>> def odd(v): return v % 2 >>> xs = group_when(odd, [3, 3, 2, 4, 1, 2]) >>> print([list(g) for g in xs]) [[3], [3, 2, 4], [1, 2]]
Note that count_p does something very like itertools.accumulate. Here’s another version of group_when which takes advantage of this.
def group_when(pred, xs):
xs, ys = tee(xs)
accu = accumulate(map(pred, ys))
return (gp for _, gp in groupby(xs, lambda _: next(accu)))
§
After a short break, here’s a third version of group_when. This is the first time I’ve found a use for takewhile and dropwhile. Beware: as the teed streams xs and ys diverge, the amount of backing storage required will grow … only for the stored values to then be dropped!
from itertools import *
def group_when(p, xs):
def notp(x): return not p(x)
xs = iter(xs)
while True:
x = next(xs)
xs, ys = tee(xs)
yield chain([x], takewhile(notp, xs))
xs = dropwhile(notp, ys)
def odd(x):
return x % 2
[list(g) for g in group_when(odd, [3, 3, 2, 4, 1, 2])] # [[3], [3, 2, 4], [1, 2]]