Creating a dict of lists in Python
Suppose you have a list of objects which you want to convert into a dict mapping from some object key to the (sub-)list of objects with that key. To provide a simple example, let’s start with a list of fruits.
from collections import namedtuple
Fruit = namedtuple('Fruit', 'name colour')
def banana(): return Fruit('banana', 'yellow')
def grape(): return Fruit('grape', 'green')
def pear(): return Fruit('pear', 'green')
def strawberry(): return Fruit('strawberry', 'red')
def cherry(): return Fruit('cherry', 'red')
fruits = [
    banana(), pear(), cherry(), cherry(), pear(),
    grape(), banana(), grape(), cherry(), grape(),
    strawberry(), pear(), grape(), cherry()]
We’d like to arrange a fruitbowl, a dict which groups fruits by colour. This can be done by creating an empty bowl, then iterating through the fruits, placing each in the correct list.
fruitbowl = {}
for fruit in fruits:
    fruitbowl.setdefault(fruit.colour, []).append(fruit)
The dict.setdefault method is a bit of an oddity in Python, both modifying the dict and returning a value, but it’s a convenient shorthand in this case. Despite this convenience, it’s more common to use a defaultdict.
from collections import defaultdict
fruitbowl = defaultdict(list)
for fruit in fruits:
    fruitbowl[fruit.colour].append(fruit)
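To make the difference between the two approaches concrete, here is a small sketch using plain strings rather than Fruit objects. The names bowl and auto_bowl are purely illustrative and not part of the example above. It shows that setdefault inserts a default only when the key is missing and returns whatever is stored, while a defaultdict creates the list on any lookup of a missing key, including a plain read.

```python
from collections import defaultdict

# Hypothetical names, for illustration only.
bowl = {}

# setdefault inserts the default if the key is missing,
# then returns the value stored under that key.
reds = bowl.setdefault('red', [])
reds.append('cherry')

# The key now exists, so this second default list is simply discarded.
same_reds = bowl.setdefault('red', [])
same_reds.append('strawberry')

assert reds is same_reds
assert bowl == {'red': ['cherry', 'strawberry']}

auto_bowl = defaultdict(list)
auto_bowl['red'].append('cherry')

# Merely reading a missing key creates an empty list for it.
auto_bowl['blue']
assert dict(auto_bowl) == {'red': ['cherry'], 'blue': []}

# Use .get() or `in` to look up a key without inserting it.
assert auto_bowl.get('purple') is None
assert 'purple' not in auto_bowl
```

One practical consequence: if empty entries would be a problem later on, a defaultdict can be converted to a plain dict with dict(...) once it has been populated.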
Here’s a function to display the fruitbowl.
def print_bowl(bowl):
    print('\n'.join(
        '{}: {}'.format(colour,
                        ', '.join(f.name for f in fruits))
        for colour, fruits in bowl.items()))
If we call this function, we see the fruits have indeed been grouped by colour.
>>> print_bowl(fruitbowl)
yellow: banana, banana
green: pear, pear, grape, grape, grape, pear, grape
red: cherry, cherry, cherry, strawberry, cherry
This is all fine and idiomatic Python, but whenever I see an empty dict being created followed by a loop to populate it, I wonder if a comprehension could be used.
Is there a way to declare and initialise the dict in a single expression? Here’s the best I came up with.
from operator import attrgetter
from itertools import groupby
colour = attrgetter('colour')
fruitbowl = {
    col: list(fts)
    for col, fts in groupby(sorted(fruits, key=colour), colour)}
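The call to sorted in that comprehension matters, because groupby only groups consecutive items with equal keys. The following sketch, using an illustrative list of colour strings rather than the fruits list, shows what happens with and without sorting.

```python
from itertools import groupby

# Illustrative data, not the fruits list from above.
colours = ['green', 'red', 'green', 'green', 'red']

# Without sorting, each run of equal keys becomes its own group.
unsorted_runs = [(key, len(list(group))) for key, group in groupby(colours)]
assert unsorted_runs == [('green', 1), ('red', 1), ('green', 2), ('red', 1)]

# In a dict comprehension, later runs silently overwrite earlier ones.
lossy = {key: list(group) for key, group in groupby(colours)}
assert lossy == {'green': ['green', 'green'], 'red': ['red']}

# Sorting first brings equal keys together, giving one group per key.
grouped = {key: list(group) for key, group in groupby(sorted(colours))}
assert grouped == {'green': ['green', 'green', 'green'],
                   'red': ['red', 'red']}
```

Because sorted is stable, items within each group keep their original relative order. The sort does cost O(n log n), whereas the loop-based versions make a single pass over the list.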
Is this better than the defaultdict solution? Probably not, but it’s a technique worth remembering. Maybe the fruitbowl isn’t needed, and we actually just need to iterate through the fruits grouped by colour. For example, which colour is most popular?
>>> max(fruitbowl.items(), key=lambda kv: len(kv[1]))[0]
'green'
Using groupby, we don’t need the bowl.
>>> def grouplen(k_gp):
...     return sum(1 for _ in k_gp[1])
...
>>> max(groupby(sorted(fruits, key=colour), colour), key=grouplen)[0]
'green'
In this case, we don’t need groupby either. There is more than one way to do it.
>>> from collections import Counter
>>> Counter(map(colour, fruits)).most_common(1)
[('green', 7)]