33

I am trying to figure an elegant way of implementing the distribution of an amount into a given set of slots in python.

For example:

7 oranges distributed onto 4 plates would return:

[2, 2, 2, 1]

10 oranges across 4 plates would be:

[3, 3, 2, 2]
0

6 Answers 6

52

Conceptually, what you want to do is compute 7 // 4 = 1 and 7 % 4 = 3. This means that all the plates get 1 whole orange. The remainder of 3 tells you that three of the plates get an extra orange.

The divmod builtin is a shortcut for getting both quantities simultaneously:

def distribute(oranges, plates):
    base, extra = divmod(oranges, plates)
    return [base + (i < extra) for i in range(plates)]

With your example:

>>> distribute(oranges=7, plates=4)
[2, 2, 2, 1]

For completeness, you'd probably want to check that oranges is non-negative and plates is positive. Given those conditions, here are some additional test cases:

>>> distribute(oranges=7, plates=1)
[7]

>>> distribute(oranges=0, plates=4)
[0, 0, 0, 0]

>>> distribute(oranges=20, plates=2)
[10, 10]

>>> distribute(oranges=19, plates=4)
[5, 5, 5, 4]

>>> distribute(oranges=10, plates=4)
[3, 3, 2, 2]
Sign up to request clarification or add additional context in comments.

3 Comments

@Lserni. This method guarantees that the difference between the maximum and minimum plate is at most 1. What additional criteria did you have for evenness?
While I agree that the distribution of the remainder could be improved from "shoved to the left", OP does nothing to lead me to believe that that's something they want.
Actually the OP's second example indicates that your solution - apart from simplicity - is the one yielding the expected results (hadn't noticed it before).
19

You want to look at Bresenham's algorithm for drawing lines (i.e. distributing X pixels on a Y range as "straightly" as possible; the application of this to the distribution problem is straightforward).

This is an implementation I found here:

def get_line(start, end):
    """Bresenham's Line Algorithm
    Produces a list of tuples from start and end

    >>> points1 = get_line((0, 0), (3, 4))
    >>> points2 = get_line((3, 4), (0, 0))
    >>> assert(set(points1) == set(points2))
    >>> print points1
    [(0, 0), (1, 1), (1, 2), (2, 3), (3, 4)]
    >>> print points2
    [(3, 4), (2, 3), (1, 2), (1, 1), (0, 0)]
    """
    # Setup initial conditions
    x1, y1 = start
    x2, y2 = end
    dx = x2 - x1
    dy = y2 - y1

    # Determine how steep the line is
    is_steep = abs(dy) > abs(dx)

    # Rotate line
    if is_steep:
        x1, y1 = y1, x1
        x2, y2 = y2, x2

    # Swap start and end points if necessary and store swap state
    swapped = False
    if x1 > x2:
        x1, x2 = x2, x1
        y1, y2 = y2, y1
        swapped = True

    # Recalculate differentials
    dx = x2 - x1
    dy = y2 - y1

    # Calculate error
    error = int(dx / 2.0)
    ystep = 1 if y1 < y2 else -1

    # Iterate over bounding box generating points between start and end
    y = y1
    points = []
    for x in range(x1, x2 + 1):
        coord = (y, x) if is_steep else (x, y)
        points.append(coord)
        error -= abs(dy)
        if error < 0:
            y += ystep
            error += dx

    # Reverse the list if the coordinates were swapped
    if swapped:
        points.reverse()
    return points

3 Comments

For a more clearly specified problem, this would be the better answer.
Mad Physicist: not sure I agree with you. Elegance is a bit subjective criterion, but for a layman this is not as elegant :)
@gmagno. It's not just about elegance. This solution actually distributes the bins differently than mine. If there was an additional requirement to maintain as uniform a distribution across the peaks as possible, mine wouldn't cut it at all. But yes, this is more cumbersome and I'm guessing OP wanted the beginner version.
7

Mad Physicist answer is perfect. But if you want to distribute the oranges uniformley on the plates (eg. 2 3 2 3 vs 2 2 3 3 in the 7 oranges and 4 plates example), here's an simple idea.

Easy case

Take an example with 31 oranges and 7 plates for example.

Step 1: You begin like Mad Physicist with an euclidian division: 31 = 4*7 + 3. Put 4 oranges in each plate and keep the remaining 3.

[4, 4, 4, 4, 4, 4, 4]

Step 2: Now, you have more plates than oranges, and that's quite different: you have to distribute plates among oranges. You have 7 plates and 3 oranges left: 7 = 2*3 + 1. You will have 2 plates per orange (you have a plate left, but it doesn't matter). Let's call this 2 the leap. Start at leap/2 will be pretty :

[4, 5, 4, 5, 4, 5, 4]

Not so easy case

That was the easy case. What happens with 34 oranges and 7 plates?

Step 1: You still begin like Mad Physicist with an euclidian division: 34 = 4*7 + 6. Put 4 oranges in each plate and keep the remaining 6.

[4, 4, 4, 4, 4, 4, 4]

Step 2: Now, you have 7 plates and 6 oranges left: 7 = 1*6 + 1. You will have one plate per orange. But wait.. I don't have 7 oranges! Don't be afraid, I lend you an apple:

[5, 5, 5, 5, 5, 5, 4+apple]

But if you want some uniformity, you have to place that apple elsewhere! Why not try distribute apples like oranges in the first case? 7 plates, 1 apple : 7 = 1*7 + 0. The leap is 7, start at leap/2, that is 3:

[5, 5, 5, 4+apple, 5, 5, 5]

Step 3. You owe me an apple. Please give me back my apple :

[5, 5, 5, 4, 5, 5, 5]

To summarize : if you have few oranges left, you distribute the peaks, else you distribute the valleys. (Disclaimer: I'm the author of this "algorithm" and I hope it is correct, but please correct me if I'm wrong !)

The code

Enough talk, the code:

def distribute(oranges, plates):
    base, extra = divmod(oranges, plates) # extra < plates
    if extra == 0:
        L = [base for _ in range(plates)]
    elif extra <= plates//2:
        leap = plates // extra
        L = [base + (i%leap == leap//2) for i in range(plates)]
    else: # plates/2 < extra < plates
        leap = plates // (plates-extra) # plates - extra is the number of apples I lent you
        L = [base + (1 - (i%leap == leap//2)) for i in range(plates)]
    return L

Some tests:

>>> distribute(oranges=28, plates=7)
[4, 4, 4, 4, 4, 4, 4]
>>> distribute(oranges=29, plates=7)
[4, 4, 4, 5, 4, 4, 4]
>>> distribute(oranges=30, plates=7)
[4, 5, 4, 4, 5, 4, 4]
>>> distribute(oranges=31, plates=7)
[4, 5, 4, 5, 4, 5, 4]
>>> distribute(oranges=32, plates=7)
[5, 4, 5, 4, 5, 4, 5]
>>> distribute(oranges=33, plates=7)
[5, 4, 5, 5, 4, 5, 5]
>>> distribute(oranges=34, plates=7)
[5, 5, 5, 4, 5, 5, 5]
>>> distribute(oranges=35, plates=7)
[5, 5, 5, 5, 5, 5, 5]

5 Comments

Good thinking and clear code. But the algorithm will give distribute(oranges=11, plates=8) == [1, 2, 1, 2, 1, 2, 1, 2] which contains 12 oranges.
change leap to math.ceil(plates / extra) solves that bug
@DavidWu Unfortunately, ceil solves the problem only in some circumstances.
This sadly doesn't work for lots of combinations. The main issue raises when leap doesn't allow proper distribution of the added 1 within the plate count: for instance, when doing 28 oranges in 22 plates, it will only add 1 in four instances, because leap = 22 / (28 % 22) giving leap = 3, which will result in a sum of 29 since 1 is added every three iterations. Using ceil as suggested above will solve the problem only in some cases, but still not in this (leap = 4), resulting in a sum of 26 because 1 is added every 4 iterations.
3

See also more_itertools.distribute, a third-party tool and its source code.

Code

Here we distributes m items into n bins, one-by-one, and count each bin.

import more_itertools as mit


def sum_distrib(m, n):
    """Return an iterable of m items distributed across n spaces."""
    return [sum(x) for x in mit.distribute(n, [1]*m)]

Demo

sum_distrib(10, 4)
# [3, 3, 2, 2]

sum_distrib(7, 4)
# [2, 2, 2, 1]

sum_distrib(23, 17)
# [2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Example

This idea is akin to distributing a deck of cards among players. Here is an initial game of Slapjack

import random
import itertools as it


players = 8
suits = list("♠♡♢♣")
ranks = list(range(2, 11)) + list("JQKA")
deck = list(it.product(ranks, suits))
random.shuffle(deck)

hands = [list(hand) for hand in mit.distribute(players, deck)]
hands
# [[('A', '♣'), (9, '♠'), ('K', '♣'), (7, '♢'), ('A', '♠'), (5, '♠'), (2, '♠')],
#  [(6, '♣'), ('Q', '♠'), (5, '♢'), (5, '♡'), (3, '♡'), (8, '♡'), (7, '♣')],
#  [(7, '♡'), (9, '♢'), (2, '♢'), (9, '♡'), (7, '♠'), ('K', '♠')],
#   ...]

where the cards are distributed "as equally as possible between all [8] players":

[len(hand) for hand in hands]
# [7, 7, 7, 7, 6, 6, 6, 6]

Comments

1

Not sure how this works. But it returns the exact same results

a = 23
b = 17
s = pd.Series(pd.cut(mylist, b), index=mylist)
s.groupby(s).size().values

Comments

0

While the accepted answer does answer the question, at least based on the given examples, it does not distribute items in an actually evenly distributed set.

For instance, 10 oranges in 4 plates gives this:

[3, 3, 2, 2]

But more evenly distributed results could be following:

  • symmetrical:
    • [2, 3, 3, 2]
    • [3, 2, 2, 3]
  • incremental error:
    • [2, 3, 2, 3]
    • [3, 2, 3, 2]

With more complex cases, we could get multiple combinations of symmetrical and/or incremental, especially for cases in which "oranges" cannot be evenly divided into "groups" of "plates". Considering the case of 19 oranges in 7 plates: where would you place the 2-oranges plates?

  • [2, 3, 3, 3, 3, 3, 2]
  • [3, 2, 3, 3, 2, 3, 3]
  • [3, 2, 3, 3, 3, 2, 3]
  • etc...

Depending on the requirements, each one of those combinations may be preferred, but arguing about how those could be computed and chosen is not the point of this post.

Note:
The Bresenham's algorithm based answer is theoretically more appropriate, but it just provides a copied code for a related aspect, requiring further analysis in order to adjust it to what's being asked. It also doesn't work appropriately for cases in which there are less oranges than plates.
The borrowed apple answer is interesting in principle, but has lots of flaws that present for specific cases (and using ceil instead of // only partially fixes it).

After some considerations and research, I stumbled upon a similarly related post: Generating evenly distributed multiples/samples within a range.

Most of its answers are based on the following concept:

def steps(start, end, n):
    step = (end - start) / (n - 1)
    return [round(start + i * step)) for i in range(n)]

The implementation is closely related to incremental error algorithms: you add assumed values and "accumulate" errors up to a certain point, when you also add a certain amount based on the accumulated errors.

In this case we let the round() function do that for us: it's not incremental, but we assume that evenly distributed items would eventually "fall in place".

Then, we can adjust that code in order to properly distribute our oranges (count) and plates (slots):

def distribute(count, slots):
    step = count / slots
    old = 0
    res = []
    for i in range(1, slots + 1):
        new = round(i * step)
        res.append(new - old)
        old = new
    return res

The procedure is changed from the original code by doing the following:

  • step is based on count (start of the previous code is always assumed as 0), but it's divided by the whole number of slots (not subtracted by 1), because we need to fill "slopes", not know about the gaps;
  • the returned list is initialized with the first "step";
  • that first step is also kept as a "previous" value for further computations;
  • we start the range by index 2 and up to slots + 1: the first step is already known, so the next has to be multiplied by 2;
  • in every iteration, we append the difference between the new computed value and the old one;
  • the new value is now the "new old" one, ready for the next computation;

This approach is not perfect, but, as said, we're not discussing every possibility: there are many ways of aliasing non-integer values, and other than choosing on a preferred solution, we should also consider efficiency.

And this approach is quite efficient.

The difference with other approaches or error corrections may be quite important for relatively small values, but for large numbers (especially large differences of "oranges/plates"), any difference is just open to debate as possibly pointless.

Note that, in general, the real issue is the modulo between oranges and plates, because evenly distributed values normally increase each "slot" just by one: the problem of proper splitting comes only whenever slots % count is not zero, but that value can never be equal or larger than count, meaning that you only need to add 1 to a certain amount of slots that is always smaller than the overall slot count, because if you added 1 to all slots, you'd get 0 from the % modulo above.

Whenever you have to consider possible large numbers and that consideration is more important that overall efficiency for possibly small amounts, you can consider to slightly improve the above by adding the following at the beginning of the function:

def distribute(count, slots):
    if not count % slots:
        base = count // slots
        return [base] * slots

    step = count / slots
    ...

A final note: I've tested the above with reasonable values (ranges of hundreds for both count and slots), but not extremely large numbers or oranges/plates differences. It is possible that under those circumstances the rounding of floating point values may give inappropriate results.
If that level of accuracy is still required for such extreme values, you should consider adding some error checking, possibly by keeping a local count of the whole sum.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.