Creating a dict of lists in Python

2018-04-29, Comments

Suppose you have a list of objects which you want to convert into a dict mapping from some object key to the (sub-)list of objects with that key. To provide a simple example, let’s start with a list of fruits.

from collections import namedtuple

Fruit = namedtuple('Fruit', 'name colour')

def banana():     return Fruit('banana', 'yellow')
def grape():      return Fruit('grape', 'green')
def pear():       return Fruit('pear', 'green')
def strawberry(): return Fruit('strawberry', 'red')
def cherry():     return Fruit('cherry', 'red')

fruits = [
    banana(), pear(), cherry(), cherry(), pear(),
    grape(), banana(), grape(), cherry(), grape(),
    strawberry(), pear(), grape(), cherry()]

We’d like to arrange a fruitbowl — a dict which groups fruits by colour. This can be done by creating an empty bowl, then iterating through the fruits placing each in the correct list.

fruitbowl = {}

for fruit in fruits:
    fruitbowl.setdefault(fruit.colour, []).append(fruit)

Dict.setdefault is a bit of an oddity in Python, both doing something and returning a value, but it’s a convenient shorthand in this case. Despite this convenience it’s more common to use a defaultdict.

from collections import defaultdict

fruitbowl = defaultdict(list)

for fruit in fruits:
    fruitbowl[fruit.colour].append(fruit)

Here’s a function to display the fruitbowl.

def print_bowl(bowl):
    print('\n'.join(
        '{}: {}'.format(colour,
                        ', '.join(f.name for f in fruits))
        for colour, fruits in bowl.items()))

If we call this function, we see the fruits have indeed been grouped by colour.

>>> print_bowl(fruitbowl)
yellow: banana, banana
green: pear, pear, grape, grape, grape, pear, grape
red: cherry, cherry, cherry, strawberry, cherry

This is all fine and idiomatic Python, but whenever I see an empty dict being created followed by a loop to populate it, I wonder if a comprehension could be used.

Is there a way to declare and initialise the dict in a single expression? Here’s the best I came up with.

from operator import attrgetter
from itertools import groupby

colour = attrgetter('colour')

fruitbowl = {
    col: list(fts)
    for col, fts in groupby(sorted(fruits, key=colour), colour)}

Is this better than the defaultdict solution. Probably not, but it’s a technique worth remembering. Maybe the fruitbowl isn’t needed, and what we actually want is to iterate through the fruits grouped by colour. For example, which colour is most popular?

>>> max(fruitbowl.items(), key=lambda kv: len(kv[1]))[0]
'green'

Using groupby, we don’t need the bowl.

>>> def grouplen(k_gp):
...     return sum(1 for _ in k_gp[1])
>>> max(groupby(sorted(fruits, key=colour), colour), key=grouplen)[0]
>>> 'green'

In this case, we don’t need groupby either. There is more than one way to do it.

>>> from collections import Counter
>>> Counter(map(colour, fruits)).most_common(1)
[('green', 7)]