## Binary search gets a sort key

Suppose you have an list of distinct elements which has been sorted and rotated. How would you look up an element within that list?

For example, the list:

``````[7, 11, 13, 19, 2, 3, 5]
``````

is sorted (the first 7 primes, in order) and rotated (to put 7 first).

With this list as input, then:

• look up `13` returns `2` since `13` is at index `2`
• look up `2` returns `4`
• look up `4` returns the sentinel value `-1`

The obvious technique is to just search the list:

``````def lookup(values, v):
try:
return values.index(v)
except IndexError:
return -1
``````

This is a linear algorithm which processes the entire list. Is there a way to take advantage of its sorted+rotated-ness?

If the list was sorted then we could apply a binary search for a logarithmic lookup. And in fact, by applying a custom ordering, the list is sorted.

How can we apply a custom ordering in Python?

The way to do this has changed as Python has developed. The table below shows the evolution of standard Python functions which sort and compare.

Version Year Functions Notes
2.0 2000 max, min, list.sort([cmp]), bisect.bisect Note that [cmp] slows the sorting process down considerably
2.3 2003 heapq Also known as the priority queue algorithm
2.4 2004 sorted(iterable[, cmp[, key[, reverse]]]), list.sort([cmp[, key[, reverse]]]), itertools.groupby(iterable[, key) In general, the key and reverse conversion processes are much faster than specifying an equivalent cmp function
2.4 2004 heapq.nlargest, heapq.nsmallest Heapq extended
2.5 2006 max([key]), min([key]), heapq.nlargest(…[, key]), heapq.smallest(…[, key])
2.6 2008 heapq.merge Heapq extended again
3.0 2008 sorted(iterable, key[, reverse]), list.sort(key[, reverse]) No more `cmp`
3.5 2015 heapq.merge(…, key)
3.10 2021 bisect.bisect(…, key) etc That leaves the low-level `heapq` functions

The earliest versions of Python allowed you to sort lists, and the sort was customised using a `cmp` function — though the documentation warned there would be a performance penalty[*]. The builtin `min` and `max` functions could not be cutomised, and nor could the comparison used in the `bisect` module — which is Python’s binary search implementation.

At 2.3 the `heapq` module appeared, but, like `bisect`, there was no way to customise the ordering of heap elements.

2.4 introduced the `key` argument to customise ordering, noting this should be preferred to `cmp`. Unlike `cmp` which compares two elements, `key` takes a single element and returns a sort rank for that element. The `key` argument was added alongside `cmp` in `list.sort` and the new `sorted` builtin function; and `key` was the only way to customise the grouping in `itertools.groupby`.

2.5 added `key` to `min` and `max`, and also to `heapq.nlargest`, `heapq.nsmallest`. Although these `heapq` functions now accept a `key`, the lower level heap functions to heapify, push, pop and replace do not.

2.6 introduced `heapq.merge`, a handy function to merge sorted inputs using a heap, but with no option to specify a sort key.

3.0 got rid of `cmp`, making `key` the only way to customise sorting. To migrate from Python 2 to 3 any `cmp` functions need converting to `key` functions. As with Python 2.5, at 3.0 you could apply a key to `list.sort`, `sorted`, `min`, `max`, `itertools.groupby`, `heapq.nlargest`, `heapq.nsmallest` — but not to `bisect` or other `heapq` functions.

3.5 added `key` to `heapq.merge`, aligning it with `heapq.nlargest` and `heapq.nsmallest`, though it remained impossible to use a sort key with the lower level heap functions.

The next change came in 3.10, when the `key` parameter was added to the `bisect` module. As far as I can tell that means the only part of standard Python which doesn’t let you fully customise ordering is the `heapq` module.

## Bisect with a search key

So, to return to the original puzzle, consider our example sorted+rotated list:

``````>>> values = [7, 11, 13, 19, 2, 3, 5]
``````

The first four elements are greater than or equal to the first element, 7. The last three elements are less than 7.

``````>>> [v < values for v in values]
[False, False, False, False, True, True, True]
``````

Since `False` < `True` the result of the list comprehension above is sorted. Extending this idea, the following comprehension is also sorted.

``````>>> [(v < values, v) for v in values]
[(False, 7), (False, 11), (False, 13), (False, 19), (True, 2), (True, 3), (True, 5)]
``````

Now we have a logarithmic lookup using binary search:

``````from bisect import bisect_left

def lookup(values, v):
key = lambda x: (x < values, x)
i = bisect_left(values, key(v), key=key)
return i if (i < len(values) and values[i] == v) else -1
``````

## Don’t search for v, search for its key

I was surprised to realise the value `v` being looked up has to have the `key` function applied before calling `bisect_left`. That is, to find where `v` should go in `values` to maintain the sort order, we pass `key(v)` to `bisect_left`. This doesn’t match the interface to binary search in other languages. It also means in our example we have to handle empty lists as a special case.

``````def lookup(values, v):
if not values: return -1
key = lambda x: (x < values, x)
i = bisect_left(values, key(v), key=key)
return i if (i < len(values) and values[i] == v) else -1
``````

[*] Before the introduction of the sort `key`, the standard pattern was decorate-sort-undecorate.