## Binary search gets a sort key

Suppose you have an list of distinct elements which has been sorted and rotated. How would you look up an element within that list?

For example, the list:

```
[7, 11, 13, 19, 2, 3, 5]
```

is sorted (the first 7 primes, in order) and rotated (to put 7 first).

With this list as input, then:

- look up
`13`

returns`2`

since`13`

is at index`2`

- look up
`2`

returns`4`

- look up
`4`

returns the sentinel value`-1`

The obvious technique is to just search the list:

```
def lookup(values, v):
try:
return values.index(v)
except IndexError:
return -1
```

This is a linear algorithm which processes the entire list. Is there a way to take advantage of its sorted+rotated-ness?

If the list **was** sorted then we could apply a binary search for a
logarithmic lookup. And in fact, by applying a custom ordering, the
list **is** sorted.

How can we apply a custom ordering in Python?

The way to do this has changed as Python has developed. The table below shows the evolution of standard Python functions which sort and compare.

Version | Year | Functions | Notes |
---|---|---|---|

2.0 | 2000 | max, min, list.sort([cmp]), bisect.bisect | Note that [cmp] slows the sorting process down considerably |

2.3 | 2003 | heapq | Also known as the priority queue algorithm |

2.4 | 2004 | sorted(iterable[, cmp[, key[, reverse]]]), list.sort([cmp[, key[, reverse]]]), itertools.groupby(iterable[, key) | In general, the key and reverse conversion processes are much faster than specifying an equivalent cmp function |

2.4 | 2004 | heapq.nlargest, heapq.nsmallest | Heapq extended |

2.5 | 2006 | max([key]), min([key]), heapq.nlargest(…[, key]), heapq.smallest(…[, key]) | |

2.6 | 2008 | heapq.merge | Heapq extended again |

3.0 | 2008 | sorted(iterable, key[, reverse]), list.sort(key[, reverse]) | No more `cmp` |

3.5 | 2015 | heapq.merge(…, key) | |

3.10 | 2021 | bisect.bisect(…, key) etc | That leaves the low-level `heapq` functions |

The earliest versions of Python allowed you to sort lists, and
the sort was customised using a `cmp`

function — though the documentation
warned there would be a performance penalty[*]. The builtin `min`

and `max`

functions could not be cutomised, and nor could the comparison used in
the `bisect`

module — which is Python’s binary search implementation.

At 2.3 the `heapq`

module appeared, but, like `bisect`

, there was no way
to customise the ordering of heap elements.

2.4 introduced the `key`

argument to customise ordering,
noting this should be preferred to `cmp`

. Unlike `cmp`

which compares two
elements, `key`

takes a single element and returns a sort rank for that element.
The `key`

argument was added alongside `cmp`

in `list.sort`

and the new `sorted`

builtin function; and `key`

was the only way to customise the grouping in
`itertools.groupby`

.

2.5 added `key`

to `min`

and `max`

, and also to `heapq.nlargest`

, `heapq.nsmallest`

.
Although these `heapq`

functions now accept a `key`

, the lower level heap functions
to heapify, push, pop and replace do not.

2.6 introduced `heapq.merge`

, a handy function to merge sorted inputs
using a heap, but with no option to specify a sort key.

3.0 got rid of `cmp`

, making `key`

the only way to customise
sorting. To migrate from Python 2 to 3 any `cmp`

functions need
converting to `key`

functions. As with Python 2.5, at 3.0 you could
apply a key to `list.sort`

, `sorted`

, `min`

, `max`

,
`itertools.groupby`

, `heapq.nlargest`

, `heapq.nsmallest`

— but not to
`bisect`

or other `heapq`

functions.

3.5 added `key`

to `heapq.merge`

, aligning it with `heapq.nlargest`

and `heapq.nsmallest`

, though it remained impossible to use a sort key
with the lower level heap functions.

The next change came in 3.10, when the `key`

parameter was
added to the `bisect`

module. As far as I can tell that means the only part of
standard Python which doesn’t let you fully customise ordering is the `heapq`

module.

## Bisect with a search key

So, to return to the original puzzle, consider our example sorted+rotated list:

```
>>> values = [7, 11, 13, 19, 2, 3, 5]
```

The first four elements are greater than or equal to the first element, 7. The last three elements are less than 7.

```
>>> [v < values[0] for v in values]
[False, False, False, False, True, True, True]
```

Since `False`

< `True`

the result of the list comprehension above is sorted.
Extending this idea, the following comprehension is also sorted.

```
>>> [(v < values[0], v) for v in values]
[(False, 7), (False, 11), (False, 13), (False, 19), (True, 2), (True, 3), (True, 5)]
```

Now we have a logarithmic lookup using binary search:

```
from bisect import bisect_left
def lookup(values, v):
key = lambda x: (x < values[0], x)
i = bisect_left(values, key(v), key=key)
return i if (i < len(values) and values[i] == v) else -1
```

## Don’t search for v, search for its key

I was surprised to realise the value `v`

being looked up has to have the `key`

function applied **before** calling `bisect_left`

. That is, to find where `v`

should go in `values`

to maintain the sort order, we pass `key(v)`

to `bisect_left`

.
This doesn’t match the interface to binary search in other languages. It also means
in our example we have to handle empty lists as a special case.

```
def lookup(values, v):
if not values: return -1
key = lambda x: (x < values[0], x)
i = bisect_left(values, key(v), key=key)
return i if (i < len(values) and values[i] == v) else -1
```

[*] Before the introduction of the sort `key`

, the standard pattern was
decorate-sort-undecorate.