Group When

2014-07-16, , Comments

Phil Nash’s recent tweet intrigued me.

He later clarified what he was after — and had now found — linking to a solution posted a couple of years ago by Tomas Petricek. The function groupWhen splits a sequence into groups, starting a new group whenever the predicate returns true.

 module Seq =
   /// Iterates over elements of the input sequence and groups adjacent elements.
   /// A new group is started when the specified predicate holds about the element
   /// of the sequence (and at the beginning of the iteration).
   ///
   /// For example: 
   ///    Seq.groupWhen isOdd [3;3;2;4;1;2] = seq [[3]; [3; 2; 4]; [1; 2]]
   let groupWhen f (input:seq<_>) = seq {
     use en = input.GetEnumerator()
     let running = ref true

     // Generate a group starting with the current element. Stops generating
     // when it founds element such that 'f en.Current' is 'true'
     let rec group() = 
       [ yield en.Current
         if en.MoveNext() then
           if not (f en.Current) then yield! group() 
         else running := false ]

     if en.MoveNext() then
       // While there are still elements, start a new group
       while running.Value do
         yield group() |> Seq.ofList }

Here’s a nice Haskell version coded up by @sdarlington.

Maybe takewhile and dropwhile could power a Python solution, but my first choice would be itertools.groupby. Groupby chops a sequence into subsequences, where the elements of each subsequence have the same key value. A suitable key function, in this case, must change its return value every time the sequence yields an element for which the predicate holds. It could toggle between a pair of values, for example. Or it could just count the number of times the predicate holds.

class count_p:
    ''' Return a value which increments every time the predicate holds.
    '''
    def __init__(self, pred):
        self._n = 0
        self._pred = pred

    def __call__(self, v):
        self._n += self._pred(v)
        return self._n

def group_when(pred, xs):
    return (gp for _, gp in groupby(xs, count_p(pred)))

Here, group_when accepts an iterable and returns an iterable sequence of iterable groups. Clients choose how to consume the results.

>>> def odd(v): return v % 2
>>> xs = group_when(odd, [3, 3, 2, 4, 1, 2])
>>> print([list(g) for g in xs])
[[3], [3, 2, 4], [1, 2]]

Note that count_p does something very like itertools.accumulate. Here’s another version of group_when which takes advantage of this.

def group_when(pred, xs):
    xs, ys = tee(xs)
    accu = accumulate(map(pred, ys))
    return (gp for _, gp in groupby(xs, lambda _: next(accu)))

§

After a short break, here’s a third version of group_when. This is the first time I’ve found a use for takewhile and dropwhile. Beware: as the teed streams xs and ys diverge, the amount of backing storage required will grow … only for the stored values to then be dropped!

from itertools import *

def group_when(p, xs):
    def notp(x): return not p(x)
    xs = iter(xs)
    while True:
        x = next(xs)
        xs, ys = tee(xs)
        yield chain([x], takewhile(notp, xs))
        xs = dropwhile(notp, ys)

def odd(x):
    return x % 2

[list(g) for g in group_when(odd, [3, 3, 2, 4, 1, 2])] # [[3], [3, 2, 4], [1, 2]]