
Suppose I have two lists list_1 and list_2

list_1 = [1, 5, 10]
list_2 = [3, 4, 15]

I want to get a list of tuples containing elements from both list_1 and list_2 such that the difference between the numbers in a tuple is under some constant c.

E.g., if c is 2, the tuples I would get would be: [(1, 3), (5, 3), (5, 4)]

Of course one can iterate over both lists and check that the difference between the two elements is less than c, but that has O(n^2) complexity, and I would rather reduce that.
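For reference, the naive pairwise check can be sketched as follows (note that, to match the example output, a difference of exactly c counts, i.e. the comparison is <=):

```python
def close_pairs_naive(list_1, list_2, c):
    # Check every pair: O(len(list_1) * len(list_2)) comparisons.
    return [(x, y) for x in list_1 for y in list_2 if abs(x - y) <= c]

list_1 = [1, 5, 10]
list_2 = [3, 4, 15]
print(close_pairs_naive(list_1, list_2, 2))
# [(1, 3), (5, 3), (5, 4)]
```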

  • Since in the worst case the output will have O(n^2) pairs, I don't see how you could avoid an algorithm with that complexity, although that doesn't mean that you can't do better than the naive nested-loop approach. Commented Oct 28, 2021 at 17:14
  • You can slightly improve by sorting one list and using bisect to find the range of relevant elements, but, as John noted, the worst case will remain O(n^2). Commented Oct 28, 2021 at 17:23
  • How large are the ranges? I mean, how large can c be? Commented Oct 28, 2021 at 17:32
  • I think you could do marginally better if both lists were sorted. Your example has both lists sorted; is that a guarantee in your problem or just a coincidence? Commented Oct 28, 2021 at 17:35
  • Could you provide code to generate larger data that's realistic for you so we can benchmark more meaningfully? Commented Oct 29, 2021 at 19:34
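As the first comment notes, the output size alone can be quadratic, so no algorithm can beat O(n^2) in the worst case. A minimal sketch demonstrating that worst case, assuming two lists whose values are all within c of each other:

```python
# Worst case for ANY algorithm: every element of one list is within c of
# every element of the other, so the output itself has n * n pairs.
n = 100
list_1 = [0] * n
list_2 = [1] * n
c = 2

pairs = [(x, y) for x in list_1 for y in list_2 if abs(x - y) <= c]
print(len(pairs))  # 10000, i.e. n * n
```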

2 Answers


Here is an implementation of Marat's idea from the comments:

import bisect

def close_pairs(list1, list2, c):
    # assumes that list2 is sorted
    for x in list1:
        i = bisect.bisect_left(list2, x - c)   # first index with list2[i] >= x - c
        j = bisect.bisect_right(list2, x + c)  # first index with list2[j] > x + c
        yield from ((x, y) for y in list2[i:j])

list_1 = [1, 5, 10]
list_2 = [3, 4, 15]
print(list(close_pairs(list_1, list_2, 2)))
# prints [(1, 3), (5, 3), (5, 4)]

To demonstrate the potential improvement of this strategy over what might be thought of as the "naive" approach, let's timeit.

import timeit

setup_naive = '''
import numpy
list_a = numpy.random.randint(0, 2500, 500).tolist()
list_b = numpy.random.randint(0, 2500, 500).tolist()
c = 2
def close_pairs(list_a, list_b, c):
    yield from ((x,y) for x in list_a for y in list_b if abs(x-y) <= c)
'''

setup_john_coleman = '''
import bisect
import numpy
list_a = numpy.random.randint(0, 2500, 500).tolist()
list_b = numpy.random.randint(0, 2500, 500).tolist()
c = 2
def close_pairs(list_a, list_b, c):
    list_a = sorted(list_a)
    list_b = sorted(list_b)
    for x in list_a:
        i = bisect.bisect_left(list_b,x-c)
        j = bisect.bisect_right(list_b,x+c)
        yield from ((x,y) for y in list_b[i:j])
'''

print(f"john_coleman: {timeit.timeit('list(close_pairs(list_a, list_b, c))', setup=setup_john_coleman, number=1000):.2f}")
print(f"naive: {timeit.timeit('list(close_pairs(list_a, list_b, c))', setup=setup_naive, number=1000):.2f}")

On a handy laptop that gives results like:

john_coleman: 0.50
naive: 18.35

3 Comments

Just a heads up that with some simple test data on my system, this is an order of magnitude faster than the naive approach even after including the required sort. I can add the timeit code to this answer if you like.
@JonSG You could add it here if you want, or have a separate answer. A separate answer could be helpful since it could be extended if someone else were to post an alternative, perhaps one that is able to exploit both lists being sorted.
@JonSG Yes please, it would be very useful to see a comparison of both.

If the lists are already sorted, as your example suggests, you can remove the two sorting calls below, and then this has runtime complexity O(M+N+P), where M and N are the list sizes and P is the number of close pairs. It keeps an index i such that ys[i] is the smallest y-value that is not too small, and then walks over ys[i:...] for as long as the values are not too large, yielding each pair.

def close_pairs(xs, ys, c):
    xs = sorted(xs)
    ys = sorted(ys) + [float('inf')]  # sentinel so the inner walk always stops
    i = 0
    for x in xs:
        while x - ys[i] > c:   # skip y-values too small for this (and every later) x
            i += 1
        j = i
        while ys[j] - x <= c:  # yield y-values within distance c of x
            yield x, ys[j]
            j += 1
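A quick sanity check of the two-pointer walk against the question's example (the function is repeated here so the snippet runs on its own):

```python
def close_pairs(xs, ys, c):
    xs = sorted(xs)
    ys = sorted(ys) + [float('inf')]  # sentinel stops the inner walk
    i = 0
    for x in xs:
        while x - ys[i] > c:
            i += 1
        j = i
        while ys[j] - x <= c:
            yield x, ys[j]
            j += 1

print(list(close_pairs([1, 5, 10], [3, 4, 15], 2)))
# [(1, 3), (5, 3), (5, 4)]
```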

Benchmark results with lists/ranges 1000 times larger than your example:

 904.4 ms  close_pairs_naive
   4.9 ms  close_pairs_John_Coleman
   1.8 ms  close_pairs_Kelly_Bundy

Benchmark code:

from timeit import timeit
import random
import bisect
from collections import deque

def close_pairs_naive(list_a, list_b, c):
    yield from ((x,y) for x in list_a for y in list_b if abs(x-y) <= c)

def close_pairs_John_Coleman(list_a, list_b, c):
    list_a = sorted(list_a)
    list_b = sorted(list_b)
    for x in list_a:
        i = bisect.bisect_left(list_b,x-c)
        j = bisect.bisect_right(list_b,x+c)
        yield from ((x,y) for y in list_b[i:j])

def close_pairs_Kelly_Bundy(xs, ys, c):
    xs = sorted(xs)
    ys = sorted(ys) + [float('inf')]
    i = 0
    for x in xs:
        while x - ys[i] > c:
            i += 1
        j = i
        while ys[j] - x <= c:
            yield x, ys[j]
            j += 1

funcs = [
    close_pairs_naive,
    close_pairs_John_Coleman,
    close_pairs_Kelly_Bundy,
]

xs = random.choices(range(15000), k=3000)
ys = random.choices(range(15000), k=3000)
c = 2
args = xs, ys, c

expect = sorted(funcs[0](*args))
for func in funcs:
    result = sorted(func(*args))
    print(result == expect, func.__name__, len(result))
print()

for _ in range(3):
    for func in funcs:
        t = timeit(lambda: deque(func(*args), 0), number=1)
        print('%6.1f ms ' % (t * 1e3), func.__name__)
    print()

