I am trying to figure an elegant way of implementing the distribution of an amount into a given set of slots in python.
For example:
7 oranges distributed onto 4 plates would return:
[2, 2, 2, 1]
10 oranges across 4 plates would be:
[3, 3, 2, 2]
Conceptually, what you want to do is compute 7 // 4 = 1 and 7 % 4 = 3. This means that all the plates get 1 whole orange. The remainder of 3 tells you that three of the plates get an extra orange.
The divmod builtin is a shortcut for getting both quantities simultaneously:
def distribute(oranges, plates):
base, extra = divmod(oranges, plates)
return [base + (i < extra) for i in range(plates)]
With your example:
>>> distribute(oranges=7, plates=4)
[2, 2, 2, 1]
For completeness, you'd probably want to check that oranges is non-negative and plates is positive. Given those conditions, here are some additional test cases:
>>> distribute(oranges=7, plates=1)
[7]
>>> distribute(oranges=0, plates=4)
[0, 0, 0, 0]
>>> distribute(oranges=20, plates=2)
[10, 10]
>>> distribute(oranges=19, plates=4)
[5, 5, 5, 4]
>>> distribute(oranges=10, plates=4)
[3, 3, 2, 2]
You want to look at Bresenham's algorithm for drawing lines (i.e. distributing X pixels on a Y range as "straightly" as possible; the application of this to the distribution problem is straightforward).
This is an implementation I found here:
def get_line(start, end):
"""Bresenham's Line Algorithm
Produces a list of tuples from start and end
>>> points1 = get_line((0, 0), (3, 4))
>>> points2 = get_line((3, 4), (0, 0))
>>> assert(set(points1) == set(points2))
>>> print points1
[(0, 0), (1, 1), (1, 2), (2, 3), (3, 4)]
>>> print points2
[(3, 4), (2, 3), (1, 2), (1, 1), (0, 0)]
"""
# Setup initial conditions
x1, y1 = start
x2, y2 = end
dx = x2 - x1
dy = y2 - y1
# Determine how steep the line is
is_steep = abs(dy) > abs(dx)
# Rotate line
if is_steep:
x1, y1 = y1, x1
x2, y2 = y2, x2
# Swap start and end points if necessary and store swap state
swapped = False
if x1 > x2:
x1, x2 = x2, x1
y1, y2 = y2, y1
swapped = True
# Recalculate differentials
dx = x2 - x1
dy = y2 - y1
# Calculate error
error = int(dx / 2.0)
ystep = 1 if y1 < y2 else -1
# Iterate over bounding box generating points between start and end
y = y1
points = []
for x in range(x1, x2 + 1):
coord = (y, x) if is_steep else (x, y)
points.append(coord)
error -= abs(dy)
if error < 0:
y += ystep
error += dx
# Reverse the list if the coordinates were swapped
if swapped:
points.reverse()
return points
Mad Physicist answer is perfect. But if you want to distribute the oranges uniformley on the plates (eg. 2 3 2 3 vs 2 2 3 3 in the 7 oranges and 4 plates example), here's an simple idea.
Take an example with 31 oranges and 7 plates for example.
Step 1: You begin like Mad Physicist with an euclidian division: 31 = 4*7 + 3. Put 4 oranges in each plate and keep the remaining 3.
[4, 4, 4, 4, 4, 4, 4]
Step 2: Now, you have more plates than oranges, and that's quite different: you have to distribute plates among oranges. You have 7 plates and 3 oranges left: 7 = 2*3 + 1. You will have 2 plates per orange (you have a plate left, but it doesn't matter). Let's call this 2 the leap. Start at leap/2 will be pretty :
[4, 5, 4, 5, 4, 5, 4]
That was the easy case. What happens with 34 oranges and 7 plates?
Step 1: You still begin like Mad Physicist with an euclidian division: 34 = 4*7 + 6. Put 4 oranges in each plate and keep the remaining 6.
[4, 4, 4, 4, 4, 4, 4]
Step 2: Now, you have 7 plates and 6 oranges left: 7 = 1*6 + 1. You will have one plate per orange. But wait.. I don't have 7 oranges! Don't be afraid, I lend you an apple:
[5, 5, 5, 5, 5, 5, 4+apple]
But if you want some uniformity, you have to place that apple elsewhere! Why not try distribute apples like oranges in the first case? 7 plates, 1 apple : 7 = 1*7 + 0. The leap is 7, start at leap/2, that is 3:
[5, 5, 5, 4+apple, 5, 5, 5]
Step 3. You owe me an apple. Please give me back my apple :
[5, 5, 5, 4, 5, 5, 5]
To summarize : if you have few oranges left, you distribute the peaks, else you distribute the valleys. (Disclaimer: I'm the author of this "algorithm" and I hope it is correct, but please correct me if I'm wrong !)
Enough talk, the code:
def distribute(oranges, plates):
base, extra = divmod(oranges, plates) # extra < plates
if extra == 0:
L = [base for _ in range(plates)]
elif extra <= plates//2:
leap = plates // extra
L = [base + (i%leap == leap//2) for i in range(plates)]
else: # plates/2 < extra < plates
leap = plates // (plates-extra) # plates - extra is the number of apples I lent you
L = [base + (1 - (i%leap == leap//2)) for i in range(plates)]
return L
Some tests:
>>> distribute(oranges=28, plates=7)
[4, 4, 4, 4, 4, 4, 4]
>>> distribute(oranges=29, plates=7)
[4, 4, 4, 5, 4, 4, 4]
>>> distribute(oranges=30, plates=7)
[4, 5, 4, 4, 5, 4, 4]
>>> distribute(oranges=31, plates=7)
[4, 5, 4, 5, 4, 5, 4]
>>> distribute(oranges=32, plates=7)
[5, 4, 5, 4, 5, 4, 5]
>>> distribute(oranges=33, plates=7)
[5, 4, 5, 5, 4, 5, 5]
>>> distribute(oranges=34, plates=7)
[5, 5, 5, 4, 5, 5, 5]
>>> distribute(oranges=35, plates=7)
[5, 5, 5, 5, 5, 5, 5]
ceil solves the problem only in some circumstances.leap doesn't allow proper distribution of the added 1 within the plate count: for instance, when doing 28 oranges in 22 plates, it will only add 1 in four instances, because leap = 22 / (28 % 22) giving leap = 3, which will result in a sum of 29 since 1 is added every three iterations. Using ceil as suggested above will solve the problem only in some cases, but still not in this (leap = 4), resulting in a sum of 26 because 1 is added every 4 iterations.See also more_itertools.distribute, a third-party tool and its source code.
Code
Here we distributes m items into n bins, one-by-one, and count each bin.
import more_itertools as mit
def sum_distrib(m, n):
"""Return an iterable of m items distributed across n spaces."""
return [sum(x) for x in mit.distribute(n, [1]*m)]
Demo
sum_distrib(10, 4)
# [3, 3, 2, 2]
sum_distrib(7, 4)
# [2, 2, 2, 1]
sum_distrib(23, 17)
# [2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Example
This idea is akin to distributing a deck of cards among players. Here is an initial game of Slapjack
import random
import itertools as it
players = 8
suits = list("♠♡♢♣")
ranks = list(range(2, 11)) + list("JQKA")
deck = list(it.product(ranks, suits))
random.shuffle(deck)
hands = [list(hand) for hand in mit.distribute(players, deck)]
hands
# [[('A', '♣'), (9, '♠'), ('K', '♣'), (7, '♢'), ('A', '♠'), (5, '♠'), (2, '♠')],
# [(6, '♣'), ('Q', '♠'), (5, '♢'), (5, '♡'), (3, '♡'), (8, '♡'), (7, '♣')],
# [(7, '♡'), (9, '♢'), (2, '♢'), (9, '♡'), (7, '♠'), ('K', '♠')],
# ...]
where the cards are distributed "as equally as possible between all [8] players":
[len(hand) for hand in hands]
# [7, 7, 7, 7, 6, 6, 6, 6]
While the accepted answer does answer the question, at least based on the given examples, it does not distribute items in an actually evenly distributed set.
For instance, 10 oranges in 4 plates gives this:
[3, 3, 2, 2]
But more evenly distributed results could be following:
[2, 3, 3, 2][3, 2, 2, 3][2, 3, 2, 3][3, 2, 3, 2]With more complex cases, we could get multiple combinations of symmetrical and/or incremental, especially for cases in which "oranges" cannot be evenly divided into "groups" of "plates". Considering the case of 19 oranges in 7 plates: where would you place the 2-oranges plates?
[2, 3, 3, 3, 3, 3, 2][3, 2, 3, 3, 2, 3, 3][3, 2, 3, 3, 3, 2, 3]Depending on the requirements, each one of those combinations may be preferred, but arguing about how those could be computed and chosen is not the point of this post.
Note:
The Bresenham's algorithm based answer is theoretically more appropriate, but it just provides a copied code for a related aspect, requiring further analysis in order to adjust it to what's being asked. It also doesn't work appropriately for cases in which there are less oranges than plates.
The borrowed apple answer is interesting in principle, but has lots of flaws that present for specific cases (and using ceil instead of // only partially fixes it).
After some considerations and research, I stumbled upon a similarly related post: Generating evenly distributed multiples/samples within a range.
Most of its answers are based on the following concept:
def steps(start, end, n):
step = (end - start) / (n - 1)
return [round(start + i * step)) for i in range(n)]
The implementation is closely related to incremental error algorithms: you add assumed values and "accumulate" errors up to a certain point, when you also add a certain amount based on the accumulated errors.
In this case we let the round() function do that for us: it's not incremental, but we assume that evenly distributed items would eventually "fall in place".
Then, we can adjust that code in order to properly distribute our oranges (count) and plates (slots):
def distribute(count, slots):
step = count / slots
old = 0
res = []
for i in range(1, slots + 1):
new = round(i * step)
res.append(new - old)
old = new
return res
The procedure is changed from the original code by doing the following:
step is based on count (start of the previous code is always assumed as 0), but it's divided by the whole number of slots (not subtracted by 1), because we need to fill "slopes", not know about the gaps;slots + 1: the first step is already known, so the next has to be multiplied by 2;This approach is not perfect, but, as said, we're not discussing every possibility: there are many ways of aliasing non-integer values, and other than choosing on a preferred solution, we should also consider efficiency.
And this approach is quite efficient.
The difference with other approaches or error corrections may be quite important for relatively small values, but for large numbers (especially large differences of "oranges/plates"), any difference is just open to debate as possibly pointless.
Note that, in general, the real issue is the modulo between oranges and plates, because evenly distributed values normally increase each "slot" just by one: the problem of proper splitting comes only whenever slots % count is not zero, but that value can never be equal or larger than count, meaning that you only need to add 1 to a certain amount of slots that is always smaller than the overall slot count, because if you added 1 to all slots, you'd get 0 from the % modulo above.
Whenever you have to consider possible large numbers and that consideration is more important that overall efficiency for possibly small amounts, you can consider to slightly improve the above by adding the following at the beginning of the function:
def distribute(count, slots):
if not count % slots:
base = count // slots
return [base] * slots
step = count / slots
...
A final note: I've tested the above with reasonable values (ranges of hundreds for both count and slots), but not extremely large numbers or oranges/plates differences. It is possible that under those circumstances the rounding of floating point values may give inappropriate results.
If that level of accuracy is still required for such extreme values, you should consider adding some error checking, possibly by keeping a local count of the whole sum.