Group When
Phil Nash’s recent tweet intrigued me.
Functional people: I often (in F#) need to process a seq into a smaller list or seq – where items from the input are grouped in some way…
— Phil Nash (@phil_nash) July 15, 2014
… the need to group may not be known until after the first item in the group.
I struggle to find a nicely functional way to do this. Ideas?
— Phil Nash (@phil_nash) July 15, 2014
He later clarified what he was after — and had now found — linking to a solution posted a couple of years ago by Tomas Petricek. The function groupWhen
splits a sequence into groups, starting a new group whenever the predicate returns true.
module Seq = /// Iterates over elements of the input sequence and groups adjacent elements. /// A new group is started when the specified predicate holds about the element /// of the sequence (and at the beginning of the iteration). /// /// For example: /// Seq.groupWhen isOdd [3;3;2;4;1;2] = seq [[3]; [3; 2; 4]; [1; 2]] let groupWhen f (input:seq<_>) = seq { use en = input.GetEnumerator() let running = ref true // Generate a group starting with the current element. Stops generating // when it founds element such that 'f en.Current' is 'true' let rec group() = [ yield en.Current if en.MoveNext() then if not (f en.Current) then yield! group() else running := false ] if en.MoveNext() then // While there are still elements, start a new group while running.Value do yield group() |> Seq.ofList }
Here’s a nice Haskell version coded up by @sdarlington.
Maybe takewhile and dropwhile could power a Python solution, but my first choice would be itertools.groupby. Groupby
chops a sequence into subsequences, where the elements of each subsequence have the same key value. A suitable key function, in this case, must change its return value every time the sequence yields an element for which the predicate holds. It could toggle between a pair of values, for example. Or it could just count the number of times the predicate holds.
class count_p: ''' Return a value which increments every time the predicate holds. ''' def __init__(self, pred): self._n = 0 self._pred = pred def __call__(self, v): self._n += self._pred(v) return self._n def group_when(pred, xs): return (gp for _, gp in groupby(xs, count_p(pred)))
Here, group_when
accepts an iterable and returns an iterable sequence of iterable groups. Clients choose how to consume the results.
>>> def odd(v): return v % 2 >>> xs = group_when(odd, [3, 3, 2, 4, 1, 2]) >>> print([list(g) for g in xs]) [[3], [3, 2, 4], [1, 2]]
Note that count_p
does something very like itertools.accumulate. Here’s another version of group_when
which takes advantage of this.
def group_when(pred, xs): xs, ys = tee(xs) accu = accumulate(map(pred, ys)) return (gp for _, gp in groupby(xs, lambda _: next(accu)))
§
After a short break, here’s a third version of group_when
. This is the first time I’ve found a use for takewhile
and dropwhile
. Beware: as the teed streams xs
and ys
diverge, the amount of backing storage required will grow … only for the stored values to then be dropped!
from itertools import * def group_when(p, xs): def notp(x): return not p(x) xs = iter(xs) while True: x = next(xs) xs, ys = tee(xs) yield chain([x], takewhile(notp, xs)) xs = dropwhile(notp, ys) def odd(x): return x % 2 [list(g) for g in group_when(odd, [3, 3, 2, 4, 1, 2])] # [[3], [3, 2, 4], [1, 2]]