# A Faster Weighted Random Choice

Consider this problem: You have a list of items and a list of their weights. You want to randomly choose an item, with probability proportional to the item's weight. For these examples, the algorithms will take a list of weights and return the index of the randomly selected item.

## Linear Scan

The simplest algorithm works by generating a random number and walking along the list of weights in a linear fashion. This has the advantage of requiring no additional space, but it runs in `O(N)` time, where `N` is the number of weights.

### Linear Scan Demo

This algorithm can be improved slightly by sorting the weights in descending order so that the bigger weights will be reached more quickly, but this is only a constant speedup factor–the algorithm is still `O(N)` after the sorting has been done.

With a little bit of preprocessing, it's possible to speed up the algorithm by storing the running totals of the weights, and using a binary search instead of a linear scan. This adds an additional storage cost of `O(N)`, but it speeds the algorithm up to `O(log(N))` for each random selection. Personally, I'm not a big fan of implementing binary search algorithms, because it's so easy to make off-by-one errors. This is a pretty dang fast algorithm though.

## Hopscotch Selection

However, it's possible to do even better by using the following algorithm that I've come up with. The algorithm works by first sorting the weights in descending order, then hopping along the list to find the selected element. The size of hops is calculated based on the invariant that all weights after the current position are not larger than the current weight. The algorithm tends to quickly hop over runs of weights with similar size, but it sometimes makes overly conservative hops when the weight size changes. Fortunately, when the weight size changes, it always gets smaller during the search, which means that there is a corresponding decrease in the probability that the search will need to progress further. This means that although the worst case scenario is a linear traversal of the whole list, the average number of iterations is much smaller than the average number of iterations for the binary search algorithm (for large lists).

### Amortized Analysis

Performing a good amortized analysis of this algorithm is rather difficult, because its runtime depends on the distribution of the weights. The two most extreme cases I can think of are if the weights are all exactly the same, or if the weights are successive powers of 2.

#### Uniform Weights

If all weights are equal, then the algorithm always terminates within 2 iterations. If it needs to make a hop, the hop will go to exactly where it needs to. Thus, the runtime is `O(1)` in all cases.

#### Power of 2 Weights

If the weights are successive powers of 2 (e.g. [32,16,8,4,2,1]), then if the algorithm generates a very high random number, it might traverse the whole list of weights, one element at a time. This is because every weight is strictly greater than the sum of all smaller weights, so the calculation guesses it will only need to hop one index. However, it's important to remember that this is not a search algorithm, it's a weighted random number generator, which means that each successive item is half as likely to be chosen as the previous one, so across k calls, the average number of iterations will be <2k. In other words, as the time to find an element increases linearly, the probability of selecting that element decreases exponentially, so the amortized runtime in this case works out to be `O(1)`. If the weights are powers of some number larger than 2, then the search time is still the same, but it's even less likely to need to walk to the later elements. If the weights are powers of some number between 1 and 2, then the algorithm will be able to jump ahead multiple elements at a time, since the sum of remaining weights won't necessarily be smaller than the current weight.

These are the two most extreme cases I could think of to analyze, and the algorithm runs in amortized constant time for both. The algorithm quickly jumps across regions with similar-sized weights, and slowdowns caused by variance in weight sizes are offset by the fact that bigger weights near the front of the list (and hence faster) are more likely to be selected when there's more variance in the weight sizes. My hunch is that this algorithm runs in amortized constant time for any weight distribution, but I've been unable to prove or disprove that hunch.

## Tests

Here are some tests that show how the different algorithms perform as the number of weights increases, using different weight distributions. The graphs use a logarithmic x-axis, and you can see that the binary search algorithm is roughly a straight line, corresponding to its `O(log(N))` runtime, and the linear scan variants have an exponential curve, corresponding to `O(N)`. The hopscotch algorithm looks to me as if it's asymptotically approaching a constant value, which would correspond to `O(1)`, though it's hard to say. It certainly does appear to be sublogarithmic though.    ## Final Words

Python's `random.random()` generates numbers in the half-open interval `[0,1)`, and the implementations here all assume that `random()` will never return 1.0 exactly. It's important to be wary of things like Python's `random.uniform(a,b)`, which generates results in the closed interval `[a,b]`, because this can break some of the implementations here.

The implementations here also assume that all weights are strictly greater than zero. If you might have zero-weight items, the algorithm needs to be sure to never return those items, even if `random()` returns some number that exactly lines up with one of the running totals.

The version of the hopscotch algorithm I present here is the version that I initially conceived, and I think it's the easiest to grok. However, if your weights do vary by many orders of magnitude, it is probably best to store the running totals from lowest to highest weight, to minimize floating point errors. It's best to add small numbers together until the running total gets big, and then start adding big numbers to that, rather than adding very small numbers to a big running total. This requires adjusting the algorithm somewhat, but if you're worried about floating point errors, then you're probably more than capable of tweaking the code.

If anyone is able to do a good amortized analysis of this algorithm's runtime for arbitrary weights, I'd love to hear about it. The algorithm is simple to implement and it seems to run much faster than `O(log(N))`. My gut says that it runs in amortized `O(1)` time, but you can't always trust your gut.

## Addendum: The Fastest Weighted Random Choice Algorithm

There's one more weighted random algorithm, originally discovered by A.J. Walker in 1974 (described in this excellent page by Keith Schwarz), that I think is the fastest and most efficient algorithm out there. It's really elegant, and it runs in `O(N)` setup time, and guaranteed `O(1)` run time for every random selection. The algorithm is hard to find on Google, despite being relatively simple to implement and clearly better than all other contenders (alas, including my own algorithm), so here is my Python implementation:

### Alias Method Demo

For now, I'll refrain from a full explanation, but essentially, the algorithm works by slicing up the weights and putting them into `N` buckets. Each bucket gets exactly 1 or 2 of the weight-slices in such a way that the slices in each bucket add up to the same amount, and no slices are left over at the end. The bigger-than-average weights each have small slices get shaved off to fill the gaps in the buckets of the smaller-than-average weights until the big weight is whittled down enough to be smaller-than-average, and the process continues. Why this is guaranteed to works out so perfectly is not immediately obvious, but you can grok it if you spend a while thinking about it and working through some examples.

When you want to do a weighted random selection, you just pick a random bucket (a constant-time operation: `buckets[int(rand*N)]`) and then choose either the first or second slice in the bucket, with odds proportional to the sizes of the two slices, another constant-time operation: ```i if (rand*N)%1 <= slice1/(slice1+slice2) else alias``` (`alias` is the index of the second slice in the bucket). There's some really clever optimizations, like always having bucket `i`'s first slice be from `weights[i]`, and scaling the slice sizes so that the values stored in the bucket always add up to 1. It's all very clever! Again, please check out and share the more detailed explanation.