Fastest Algorithm to Find the Minimum Sum of Absolute Differences through List Rotation

Question

By rotating 2 lists either from left to right, Find the smallest possible sum of the absolute value of the differences between each corresponding item in the two lists given they're the same length.

Rotation Sample:

List [0, 1, 2, 3, 4, 5] rotated to the left = [1, 2, 3, 4, 5, 0]

List [0, 1, 2, 3, 4, 5] rotated to the right= [5, 0, 1, 2, 3, 4]

Sum of Absolute Difference:

List 1 = [1, 2, 3, 4]
List 2 = [5, 6, 7, 8]

Sum of Abs. Diff. = |1-5| + |2-6| + |3-7| + |4-8| = 16

Once again, for any arbitrary length of list and integer values, task is to look for the least possible sum by simply rotating to the left/right of either or both lists.

I had no problem with the rotation and acquiring of the minimum sum of absolute difference. I just want to know the smarter approach since my algorithm checks for every possible combination which is quite slow.

Here is my bruteforce approach:

list1 = [45, 21, 64, 33, 49]
list2 = [90, 12, 77, 52, 28]
choices = []                # Put all possible sums into a list to find the minimum value.
for j in range(len(list1)):  # List1 does a full rotation
    total = 0
    for k in range(len(list1)):
        total += abs(list1[k] - list2[k])
    list1.append(list1.pop(0))
    choices.append(total)
print(min(choices))

What's a smarter approach? I would appreciate a shorter code and time complexity as well.

I managed to make it faster by applying generators. Credits to @kuriboh for the idea! But since I'm still new in generator implementation, just want to know if this is the best way of implementing it to reduce my time complexity especially for my loop. Can we still go faster than this configuration?

list1 = [45, 21, 64, 33, 49]
list2 = [90, 12, 77, 52, 28]
choices = []
l = len(list1)
for j in range(l):
    total = sum([abs(int(list1[k])-int(list2[k])) for k in range(l)])
    list1.append(list1.pop(0))
    choices.append(total)
print(min(choices))

Hint: use the differences bewteen elements within a list. For instance, your list1 above has diffs of [-24, 43, -31, 16, -4]. Look for "fit" patterns with the list2 diffs. Note that the improvement works only if you have very large lists; otherwise, the overhead of pattern matching eats up the gains over your O(N^2) algorithm. — Prune
– Prune, Commented Oct 20, 2020 at 0:39
@Prune I'm not quite sure if I completely understood your comment, why are we getting the differences of adjacent integers in list1? — lambduh
– lambduh, Commented Oct 20, 2020 at 0:53
Did you create this problem yourself, or why can't you tell us where it's from? — superb rain
– superb rain, Commented Oct 20, 2020 at 0:55
@Prune I don't see how to take advantage of that, would like to see it :-) — superb rain
– superb rain, Commented Oct 20, 2020 at 1:20
@muw My list2 is just a permutation of your list. When using your list2, the best rotation is [52, 28, 90, 12, 77], which gives a diff list of [-24, 62, -78, 65, -25]. That diff list makes Prune's idea seem promising. My goal was to permute list2 in a way that creates a more challenging test case. — user3386109
– user3386109, Commented Oct 20, 2020 at 2:05

EvilTak · Accepted Answer · 2020-10-20 02:20:06Z

3

Since Python treats negative indexes as counting from the right end, you could sum the absolute value of list1 minus (list2 shifted by k) where 0 ≤ k < len(list1) as

sum(abs(list1[i] - list2[i - k]) for i in range(len(list1)))

If you want the minimum of all of these values

length = len(list1)
min(sum(abs(list1[i] - list2[i - k]) for i in range(length))
    for k in range(length))

This code is still O(n^2), but there is a lot less pushing and popping going on.

I really can't think of any way to make the algorithm faster than O(n^2).

edited Oct 20, 2020 at 2:20

EvilTak

7,90233 silver badges39 bronze badges

answered Oct 20, 2020 at 0:55

Frank Yellin

11.7k1 gold badge15 silver badges27 bronze badges

Sign up to request clarification or add additional context in comments.

18 Comments

superb rain Over a year ago

Minus is done with - :-P. And I'm pretty sure those index subtractions cost much more time than the pop+append.

EvilTak Over a year ago

@superbrain the index operations definitely do not take more time than the pop and append. The pop(0) call is O(n), as it is removing the first element of the list.

superb rain Over a year ago

@EvilTak They definitely do. Overall, pop is done only n times, while the index operations are done n^2 times. So they're doing the same "amount of work", but the pops do so at a far faster lower level.

EvilTak Over a year ago

@superbrain normally I would disagree with you, but given that this is Python, there is a chance that you are right. Still find it hard to believe that the overhead incurred by n extra subtractions outweighs the cost of the memcpy for pop. I'd definitely recommend that OP profile both solutions and pick which one works best for them. Regardless, this answer does do away with the redundant choices list, appending to which at every iteration would incur quite a bit of overhead.

EvilTak Over a year ago

@user3386109 I was hoping you saw that! The edit summary abuse was totally worth it. You wouldn't believe my euphoria at being able to reply to you via another potential edit after seeing your edit summary...

|

superb rain · Accepted Answer · 2020-10-20 18:22:33Z

3

An optimized mix of your original and Frank's accepted answer:

min(list1.append(list1.pop(0)) or
    sum(abs(x - y) for x, y in zip(list1, list2))
    for _ in list1)

Bit dirty to have the rotation in there like that, but hey, you're asking for "Fastest" :-)

Benchmark with lists of length 1000:

    original     Frank_Yellin   superb_rain  
     127 ms         164 ms         125 ms    
     140 ms         170 ms         117 ms    
     134 ms         166 ms         116 ms    
     124 ms         161 ms         126 ms    
     135 ms         164 ms         126 ms

Benchmark code:

from timeit import repeat
from random import shuffle

def original(list1, list2):
    choices = []                # Put all possible sums into a list to find the minimum value.
    for j in range(len(list1)):  # List1 does a full rotation
        total = 0
        for k in range(len(list1)):
            total += abs(list1[k] - list2[k])
        list1.append(list1.pop(0))
        choices.append(total)
    return min(choices)

def Frank_Yellin(list1, list2):
    length = len(list1)
    return min(sum(abs(list1[i] - list2[i - k]) for i in range(length))
    for k in range(length))

def superb_rain(list1, list2):
    return min(list1.append(list1.pop(0)) or
               sum(abs(x - y) for x, y in zip(list1, list2))
               for _ in list1)

funcs = [
    (10, original),
    (10, Frank_Yellin),
    (10, superb_rain),
    ]

list1 = list(range(1000))
list2 = list1.copy()
shuffle(list2)

for _, f in funcs:
    print(f(list1, list2))

for _, f in funcs:
    print(f.__name__.center(15), end='')
print()

for _ in range(5):
    for number, f in funcs:
        t = min(repeat(lambda: f(list1, list2), number=number)) / number
        print('%8d ms    ' % (t * 1e3), end='')
    print()

edited Oct 20, 2020 at 18:22

answered Oct 20, 2020 at 17:36

superb rain

5,5612 gold badges16 silver badges25 bronze badges

4 Comments

lambduh Over a year ago

Love this. I thought my code was optimized. I kinda missed the fact that I'm typecasting the lists into integer EVERY iteration. What you did is better. Well done!

superb rain Over a year ago

@muw Ha, I didn't even realize you did that. What was the point of that? Btw, I think David's answer is the most promising and only interesting one.

lambduh Over a year ago

I honestly don't understand his process yet, but if you think his idea is better than yours, then I'll pick his answer.

superb rain Over a year ago

@muw I don't understand his solution, either, but the idea is good and the tests convince me and benchmarks show it's very fast (I'll post another answer about that).

David Eisenstat · Accepted Answer · 2020-10-20 20:13:43Z

2

I haven't cracked the full problem, but in the special case where the input values are all 0 or 1 (or any two different values, or any of O(1) different values, but we'll need another idea to get much further than that), we can get an O(n log n)-time algorithm by applying fast convolution.

The idea is to compute all of the sums of absolute differences as List1 * reverse(1 - List2) + (1 - List1) * reverse(List2) where 1 - List means doing that operation point-wise and * denotes circular convolution (computable in time O(n log n) using a pair of FFTs). The definition of circular convolution here is

             n-1
             __
             \
(f * g)(i) = /_  f(j) g((i - j) mod n).
             j=0

Substituting List1 for f and reverse(1 - List2) for g, we get

                                  n-1
                                  __
                                  \
(List1 * reverse(1 - List2))(i) = /_ List1(j) (1 - List2((n-1-(i-j)) mod n))
                                  j=0

                                  n-1
                                  __
                                  \
                                = /_ List1(j) (1 - List2((j-(i+1)) mod n)).
                                  j=0

The product List1(j) (1 - List2((j-(i+1)) mod n)) is 1 if and only if List1(j) = 1 and List2((j-(i+1)) mod n) = 0, and 0 otherwise. Thus the i value of the convolution counts the number of places where List1 has a 1 offset i+1 circularly to the left of where List2 has a 0. The other convolution counts 0s corresponding to 1s. Given our input restrictions, this is the sum of absolute differences.

Code:

import numpy


def convolve_circularly(a1, a2):
    return numpy.round(numpy.abs(numpy.fft.ifft(numpy.fft.fft(a1) * numpy.fft.fft(a2))))


def min_sum_abs_diff(a1, a2):
    a1 = numpy.array(a1)
    a2 = numpy.array(a2)[::-1]
    return numpy.min(convolve_circularly(a1, 1 - a2) + convolve_circularly(1 - a1, a2))


def slow_min_sum_abs_diff(a1, a2):
    return min(
        sum(abs(a1[i] - a2[i - k]) for i in range(len(a1))) for k in range(len(a2))
    )


def main():
    n = 100
    for r in range(100000):
        a1 = numpy.random.randint(2, size=n)
        a2 = numpy.random.randint(2, size=n)
        r = min_sum_abs_diff(a1, a2)
        slow_r = slow_min_sum_abs_diff(a1, a2)
        if r != slow_r:
            print(a1, a2, r, slow_r)
            break


if __name__ == "__main__":
    main()

edited Oct 20, 2020 at 20:13

answered Oct 20, 2020 at 12:13

David Eisenstat

65.8k7 gold badges69 silver badges129 bronze badges

3 Comments

superb rain Over a year ago

Good idea to try a simpler problem. Sadly I don't understand the math. Would the Python implementation be simple? Not sure what to do with your equations. Would be nice to see it, and then it could be tested.

David Eisenstat Over a year ago

@superbrain I can try to hack one together using NumPy in a while.

superb rain Over a year ago

Great, thanks. Looks cool and is indeed very fast (I did some benchmarks in a new answer).

superb rain · Accepted Answer · 2020-10-21 15:44:36Z

Benchmarks with David Eisenstat's solution and two NumPy solutions from me.

500 random ints 0 or 1:

  189.62414 ms  slow_min_sum_abs_diff          # Like Frank's
   49.75403 ms  less_slow_min_sum_abs_diff     # My NumPy
   10.13092 ms  lesser_slow_min_sum_abs_diff   # My Numpy
    0.85030 ms  min_sum_abs_diff               # David's
    0.27434 ms  array_conversion               # for comparison

1000 random ints 0 or 1:

  857.02381 ms  slow_min_sum_abs_diff
  100.26820 ms  less_slow_min_sum_abs_diff
   28.55692 ms  lesser_slow_min_sum_abs_diff
    1.67077 ms  min_sum_abs_diff
    0.49301 ms  array_conversion

1000 random ints from -10⁶ to 10⁶ (without David's, as it's not made for that and would produce wrong results):

  829.18451 ms  slow_min_sum_abs_diff
   89.97418 ms  less_slow_min_sum_abs_diff
   22.69516 ms  lesser_slow_min_sum_abs_diff

I don't understand David's solution, but his own verification is convincing and I did some more myself, comparing each faster solution with the next-slower one (that's actually why I wrote my NumPy solutions, so I could test David's with larger inputs):

passed: less_slow_min_sum_abs_diff (220 tests with n=100)
passed: lesser_slow_min_sum_abs_diff (89 tests with n=300)
passed: min_sum_abs_diff (146 tests with n=1000)
passed: min_sum_abs_diff (5 tests with n=10000)

Benchmarks done on repl.it's Python 3.8.2 64-bit.

The code:

import numpy
from timeit import repeat, default_timer as timer


def array_conversion(a1, a2):
    a1 = numpy.array(a1)
    a2 = numpy.array(a2)


def convolve_circularly(a1, a2):
    return numpy.round(numpy.abs(numpy.fft.ifft(numpy.fft.fft(a1) * numpy.fft.fft(a2))))


def min_sum_abs_diff(a1, a2):
    a1 = numpy.array(a1)
    a2 = numpy.array(a2)[::-1]
    return numpy.min(convolve_circularly(a1, 1 - a2) + convolve_circularly(1 - a1, a2))


def slow_min_sum_abs_diff(a1, a2):
    return min(
        sum(abs(a1[i] - a2[i - k]) for i in range(len(a1))) for k in range(len(a2))
    )


def less_slow_min_sum_abs_diff(a1, a2):
    a1 = numpy.array(a1)
    a2 = numpy.array(a2)
    return min(
        numpy.abs(a1 - numpy.roll(a2, k)).sum()
        for k in range(len(a2))
    )


def lesser_slow_min_sum_abs_diff(a1, a2):
    n = len(a2)
    a1 = numpy.array(a1)
    a2 = numpy.concatenate((a2, a2))
    return min(
        numpy.abs(a1 - a2[k:k+n]).sum()
        for k in range(n)
    )


def random_arrays(n):
    a1 = numpy.random.randint(2, size=n).tolist()
    a2 = numpy.random.randint(2, size=n).tolist()
    return a1, a2


def verify(candidate, reference, n, timelimit=1):
    t0 = timer()
    count = 0
    while timer() - t0 < timelimit:
        a1_orig, a2_orig = random_arrays(n)
        a1, a2 = a1_orig.copy(), a2_orig.copy()
        expect = reference(a1, a2)
        assert a1 == a1_orig and a2 == a2_orig
        result = candidate(a1, a2)
        assert a1 == a1_orig and a2 == a2_orig
        if result != expect:
            print('wrong:')
            print('  expected', expect, 'by', reference.__name__)
            print('  got', result, 'by', candidate.__name__)
            print('  a1:', a1)
            print('  a2:', a2)
            break
        count += 1
    else:
        print('passed:', candidate.__name__, f'({count} tests with {n=})')


def main():
    if 1:
        verify(less_slow_min_sum_abs_diff, slow_min_sum_abs_diff, 100)
        verify(lesser_slow_min_sum_abs_diff, less_slow_min_sum_abs_diff, 300)
        verify(min_sum_abs_diff, lesser_slow_min_sum_abs_diff, 1000)
        verify(min_sum_abs_diff, lesser_slow_min_sum_abs_diff, 10000)
        print()

    funcs = [
        (10, slow_min_sum_abs_diff),
        (100, less_slow_min_sum_abs_diff),
        (100, lesser_slow_min_sum_abs_diff),
        (1000, min_sum_abs_diff),
        (1000, array_conversion),
    ]

    a1, a2 = random_arrays(1000)

    for _ in range(3):
        for number, func in funcs:
            t = min(repeat(lambda: func(a1, a2), number=number)) / number
            print('%11.5f ms ' % (t * 1e3), func.__name__)
        print()


if __name__ == "__main__":
    main()
    print('done')

Collectives™ on Stack Overflow

Fastest Algorithm to Find the Minimum Sum of Absolute Differences through List Rotation

4 Answers 4

18 Comments

4 Comments

3 Comments

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

18 Comments

4 Comments

3 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Related