10

I have a 100 by 100 numpy matrix. The matrix is mostly filled with zeros, but also contains some number of ints. For example:

[0 0 0 0 0 0 0 1]
[0 2 2 0 0 0 0 0]
[0 0 2 0 0 0 0 0]  False
[0 0 0 0 0 0 0 0]
[0 3 3 0 0 0 0 0]

What is the most efficient way to identify whether the matrix contains any number of adjacent ints of distinct types?

The above example would return False. Here is a True example, with the row containing the adjacency indicated:

[0 0 0 0 0 0 0 1]
[0 2 2 1 1 0 0 0]   <----  True
[0 0 2 0 0 0 0 0]  
[0 0 0 0 0 0 0 0]
[0 3 3 0 0 0 0 0]

Diagonals do not count as being adjacent. So this example would also return False:

[0 0 0 1 1 1 1 1]
[0 2 2 0 1 0 0 0]
[0 0 2 0 0 0 0 0]   False
[0 3 0 0 0 0 0 0]
[3 3 3 0 0 0 0 0]

I do not need to identify the position of the adjacency, just whether it exists of not.

Currently, I cannot do better than finding each no-zero element in the matrix and then checking its 4 flanking elements.

Thanks for all the great answers.

3
  • 1
    Without posting a full answer, numpy.diff with an axis parameter is where I would start. Commented Dec 19, 2016 at 2:26
  • Thanks, that should reduce the search area significantly. Commented Dec 19, 2016 at 2:35
  • Could you add the code that does check each element. That could remove some lingering ambiguity about the test. Commented Dec 19, 2016 at 4:37

5 Answers 5

6

If you can use this will be quite easy using ndimage.label and ndimage.labeled_comprehension:

import numpy as np
from scipy import ndimage

def multiple_unique_item(array):
    return len(np.unique(array)) > 1

def adjacent_diff(array):
    labeled_array, num_labels = ndimage.label(array)
    labels = np.arange(1, num_labels+1)
    any_multiple = ndimage.labeled_comprehension(array, labeled_array, labels, 
                                                 multiple_unique_item, bool, 0)
    return any_multiple.any()

label defaults to label neighboring values different from 0 without diagonals. Then the comprehension passes all values associated with a label to the helper function - which checks if more than one unique value is present. Finally it checks if any label had more than one value and returns this.

To test this on your test input arrays:

arr1 = np.array([[0,0,0,0,0,0,0,1],
                 [0,2,2,1,1,0,0,0],  
                 [0,0,2,0,0,0,0,0],
                 [0,0,0,0,0,0,0,0],
                 [0,3,3,0,0,0,0,0]])

arr2 = np.array([[0,0,0,1,1,1,1,1],
                 [0,2,2,0,1,0,0,0],
                 [0,0,2,0,0,0,0,0],  
                 [0,3,0,0,0,0,0,0],
                 [3,3,3,0,0,0,0,0]])

arr3 = np.array([[0,0,0,0,0,0,0,1],
                 [0,2,2,0,0,0,0,0],
                 [0,0,2,0,0,0,0,0],  
                 [0,0,0,0,0,0,0,0],
                 [0,3,3,0,0,0,0,0]])

>>> adjacent_diff(arr1)
True
>>> adjacent_diff(arr2)
False
>>> adjacent_diff(arr3)
False
Sign up to request clarification or add additional context in comments.

Comments

3

Looking at the description of your problem, it's probably not too much computational effort to check each possible nonzero integer value for its position in the array, and see if there are intersections. Now, this is generally overkill, but on your scale it might work: you can get the indices of each integer collection, and compute their distances using scipy.spatial.distance.cdist. I'm sure that some smart solution based on diff or something else is more efficient, but I had fun anyway:

import numpy as np
from scipy.spatial.distance import cdist
from itertools import combinations

M1 = np.array(
[[0,0,0,0,0,0,0,1],
 [0,2,2,1,1,0,0,0],  
 [0,0,2,0,0,0,0,0],
 [0,0,0,0,0,0,0,0],
 [0,3,3,0,0,0,0,0]])

M2 = np.array(
[[0,0,0,1,1,1,1,1],
 [0,2,2,0,1,0,0,0],
 [0,0,2,0,0,0,0,0],  
 [0,3,0,0,0,0,0,0],
 [3,3,3,0,0,0,0,0]])

def overlaps_eh(M):
    uniques = np.delete(np.unique(M),0) # get integers present
    unival_inds = [np.transpose(np.where(M==unival)) for unival in uniques]
    # unival_inds[k] contains the i,j indices of each element with the kth unique value

    for i1,i2 in combinations(range(len(unival_inds)),2):
        if np.any(cdist(unival_inds[i1],unival_inds[i2],'cityblock')==1):
            return True
    # if we're here: no adjacencies
    return False

First we collect the nonzero unique matrix elements into the array uniques. Then for each unique value we find the i,j indices of each element in the input array that has this value. Then we check each pair of unique values (generated using itertools.combinations), and use scipy.spatial.distance.cdist to measure the pair-wise distance of each pair of matrix elements. Using the Manhattan distance, if any pair of elements has distance 1, they are adjacent. So we only have to return True in case any of these distances is 1, otherwise we return False.

Comments

3

Here's an approach making heavy usage of slicing that are just views with focus on performance -

def distinct_ints(a):
    # Mask of zeros, non-zeros as we would use them frequently
    zm = a==0
    nzm = ~zm

    # Look for distint ints across rows
    row_thresh = (nzm[:,1:] & zm[:,:-1]).sum(1)
    row_out = ((nzm[:,1:] & (a[:,1:] != a[:,:-1])).sum(1)>row_thresh).any()

    # Look for distint ints across cols
    col_thresh = (nzm[1:] & zm[:-1]).sum(0)
    col_out = ((nzm[1:] & (a[1:] != a[:-1])).sum(0)>col_thresh).any()

    # Any from rows or cols
    out = row_out | col_out
    return out

Comments

2

Here's a solution using a masked array:

import numpy as np
import numpy.ma as ma
a = np.array([[0,1,0], [0,1,0], [2,2,2]])    # sample data 
x = ma.masked_equal(a, 0)                    # mask zeros
adjacencies = np.count_nonzero(np.diff(x, axis=0).filled(0)) + np.count_nonzero(np.diff(x, axis=1).filled(0))

On the last line, diff is applied to the masked array (zero entries ignored); the nonzero entries in diff mean adjacent different nonzero entries in array a. The variable adjacencies will have the total number of adjacencies (perhaps you only want to know if it's 0 or not). In the above example, it is 1.

Comments

1

It is possible to do it with numpy.diff, however, the fact that the zeros are not supposed to be considered complicates things a bit.

You can set zeros to a value large or small enough not to cause issues:

a[a == 0] = -999

Or use a float array and set them to nan or inf:

a[a == 0] = numpy.nan

Then simply look for first order differences of 1 in each direction:

numpy.any(numpy.abs(numpy.diff(a, axis=0)) == 1) or numpy.any(numpy.abs(numpy.diff(a, axis=1)) == 1)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.