
mrocklin (Member) commented Feb 3, 2015

OK, so we take a two-step approach to support fancy indexing.

Given an index, like

(5, slice(5, 10, 2), [1, 2, 3])

We first apply the normal dask_slice solution to the array, with the list replaced by a full slice:

(5, slice(5, 10, 2), slice(None, None, None))

We then apply the list itself as a second indexing step. I suspect that we could repeat this step once per list to handle multiple lists and achieve MATLAB-style orthogonal indexing.

It mostly works.
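The two-step idea can be sketched on a plain NumPy array. This is a minimal illustration under stated assumptions, not the dask_slice code: the helper name `two_step_getitem` is hypothetical, it supports exactly one list, and it ignores `None` and `Ellipsis`. Step one slices with the list replaced by a full slice; step two applies the list with `np.take` along the axis where the list ends up once integer indices have dropped their dimensions.

```python
import numpy as np


def two_step_getitem(x, index):
    """Hypothetical helper: index `x` with a tuple holding exactly one list.

    Assumes the other entries are ints or slices (no None/Ellipsis).
    """
    list_positions = [i for i, ind in enumerate(index) if isinstance(ind, list)]
    if len(list_positions) != 1:
        raise NotImplementedError("exactly one list is supported in this sketch")
    pos = list_positions[0]

    # Step 1: ordinary slicing, with the list replaced by a full slice
    sliced = tuple(slice(None) if i == pos else ind for i, ind in enumerate(index))
    intermediate = x[sliced]

    # Step 2: integers before the list remove dimensions, so shift the axis
    axis = pos - sum(isinstance(ind, int) for ind in index[:pos])
    return np.take(intermediate, index[pos], axis=axis)


x = np.arange(100).reshape((10, 10))
print(two_step_getitem(x, (slice(0, 3), [1, 3, 8, 3])))  # same values as Out[7]
```

For the 3-D index `(5, slice(5, 10, 2), [1, 2, 3])` this sketch returns the same result as `y[5, 5:10:2][:, [1, 2, 3]]`, i.e. orthogonal semantics. Plain NumPy `y[5, 5:10:2, [1, 2, 3]]` behaves differently: the integer and the list are both treated as advanced indices separated by a slice, so the list dimension is moved to the front of the result.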

Example

In [1]: from blaze import Data, compute, into

In [2]: import dask.array as da

In [3]: import numpy as np

In [4]: x = np.arange(100).reshape((10, 10))

In [5]: d = Data(into(da.Array, x, blockshape=(3, 3)))

In [6]: np.array(d[5:9, 1:9:2])  # could do this before
Out[6]: 
array([[51, 53, 55, 57],
       [61, 63, 65, 67],
       [71, 73, 75, 77],
       [81, 83, 85, 87]])

In [7]: np.array(d[0:3, [1, 3, 8, 3]])  # Now can do this
Out[7]: 
array([[ 1,  3,  8,  3],
       [11, 13, 18, 13],
       [21, 23, 28, 23]])

The resulting dask graph looks like this:

In [8]: y = compute(d[0:3, [1, 3, 8, 3]])

In [14]: cull(y.dask, y.keys())
Out[14]: 
{'x_1': array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
        [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
        [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
        [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
        [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
        [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
        [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
        [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]]),
 ('slice-2', 0, 0): (<function operator.getitem>,
  ('x_1', 0, 0),
  (slice(0, 3, 1), slice(0, 3, 1))),
 ('slice-2', 0, 1): (<function operator.getitem>,
  ('x_1', 0, 1),
  (slice(0, 3, 1), slice(0, 3, 1))),
 ('slice-2', 0, 2): (<function operator.getitem>,
  ('x_1', 0, 2),
  (slice(0, 3, 1), slice(0, 3, 1))),
 ('x_1', 0, 0): (<function operator.getitem>,
  'x_1',
  (slice(0, 3, None), slice(0, 3, None))),
 ('x_1', 0, 1): (<function operator.getitem>,
  'x_1',
  (slice(0, 3, None), slice(3, 6, None))),
 ('x_1', 0, 2): (<function operator.getitem>,
  'x_1',
  (slice(0, 3, None), slice(6, 9, None))),
 ('x_4', 0, 0): (<function operator.getitem>,
  (<function numpy.core.multiarray.concatenate>,
   (list,
    [(<function operator.getitem>,
      ('slice-2', 0, 0),
      (slice(None, None, None), [1])),
     (<function operator.getitem>,
      ('slice-2', 0, 1),
      (slice(None, None, None), [0, 0])),
     (<function operator.getitem>,
      ('slice-2', 0, 2),
      (slice(None, None, None), [2]))]),
   1),
  (slice(None, None, None), (0, 1, 3, 1)))}
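The per-block bookkeeping visible in the `('x_4', 0, 0)` task can be reproduced on its own. The sketch below is an assumption about how such a plan could be computed, not the PR's code, and `partition_index` is a hypothetical name: it sorts the requested columns, groups them by block of width `blocksize` as block-local positions, and records where each requested column lands in the concatenated (sorted) result. For `[1, 3, 8, 3]` with blocks of width 3 it yields the same `[1]`, `[0, 0]`, `[2]` pieces and the `(0, 1, 3, 1)` reordering shown in the graph.

```python
import numpy as np


def partition_index(index, blocksize):
    """Hypothetical helper: plan a 1-d fancy index over equal-width blocks.

    Returns ({block number: block-local positions}, reorder tuple).
    """
    sorted_index = sorted(index)
    per_block = {}
    for i in sorted_index:
        per_block.setdefault(i // blocksize, []).append(i % blocksize)
    # Where each requested element sits in the sorted, concatenated result
    reorder = tuple(int(p) for p in np.searchsorted(sorted_index, index))
    return per_block, reorder


per_block, reorder = partition_index([1, 3, 8, 3], 3)
print(per_block)  # {0: [1], 1: [0, 0], 2: [2]}
print(reorder)    # (0, 1, 3, 1)

# Apply the plan to the first three rows, split into column blocks of width 3
x = np.arange(100).reshape((10, 10))[0:3]
blocks = {b: x[:, 3 * b:3 * (b + 1)] for b in range(4)}
pieces = [blocks[b][:, local] for b, local in sorted(per_block.items())]
result = np.concatenate(pieces, axis=1)[:, list(reorder)]
```

Sorting first means each block is visited once and duplicates (the two `3`s) are fetched from the same block; the final reorder restores the requested order.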

Some known problems

  • d[0] fails
  • Multiple lists fail (though I think this is probably easy to fix in the MATLAB style)
  • Other edge cases may fail
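The MATLAB-style fix for multiple lists amounts to orthogonal (outer) indexing: each list selects along its own axis independently. A minimal NumPy sketch, assuming the index holds only slices and lists (no integers, `None` or `Ellipsis`), applies the lists one axis at a time; `orthogonal_getitem` is a hypothetical name. For lists alone the result agrees with NumPy's `np.ix_`, whereas `x[[5, 3, 0], [1, 3, 8]]` would instead pair the lists elementwise.

```python
import numpy as np


def orthogonal_getitem(x, index):
    """Hypothetical helper: MATLAB-style orthogonal indexing.

    Assumes every entry of `index` is a slice or a list of ints.
    """
    # One ordinary slicing pass, with each list replaced by a full slice
    sliced = tuple(slice(None) if isinstance(ind, list) else ind for ind in index)
    result = x[sliced]
    # Slices keep every dimension, so each list's axis is unchanged
    for axis, ind in enumerate(index):
        if isinstance(ind, list):
            result = np.take(result, ind, axis=axis)
    return result


x = np.arange(100).reshape((10, 10))
rows_and_cols = orthogonal_getitem(x, ([5, 3, 0], [1, 3, 8, 3]))  # shape (3, 4)
```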

cc @nevermindewe @shoyer

mrocklin (Member Author) commented Feb 3, 2015

Fixes #22

mrocklin (Member Author) commented Feb 3, 2015

Also @shoyer, this brings us to the point where dask.array can successfully perform the following:

In [19]: np.array(d[[5, 3, 0]].sum(axis=0))
Out[19]: array([ 80,  83,  86,  89,  92,  95,  98, 101, 104])

I think this is likely sufficient for your common use cases.

A member commented:

Do you really want to restrict array indexing explicitly to lists?

Assuming NumPy is a hard dependency of dask (which I think it is?), I would rather cast anything that is not an integer or a slice to an ndarray and then allow only 1-D arrays of integers. For large index arrays, using lists is going to be a bottleneck.
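The suggestion above (cast anything that is not an integer or a slice to an ndarray, then accept only 1-D integer arrays) can be sketched as a small validation step. `normalize_index_entry` is a hypothetical name, not a dask function; in this sketch boolean arrays are rejected because their dtype is not an integer dtype.

```python
import numbers

import numpy as np


def normalize_index_entry(ind):
    """Hypothetical helper: pass ints and slices through, cast the rest."""
    if isinstance(ind, (numbers.Integral, slice)):
        return ind
    arr = np.asarray(ind)
    # Only 1-d integer arrays are accepted (floats, 2-d, booleans are not)
    if arr.ndim != 1 or not np.issubdtype(arr.dtype, np.integer):
        raise TypeError("expected a 1-d array of integers, got %r" % (ind,))
    return arr


x = np.arange(100).reshape((10, 10))
cols = normalize_index_entry([1, 3, 8, 3])  # list becomes an int ndarray
selected = x[0:3, cols]
```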

mrocklin (Member Author) replied:

We can do both. I was at about my limit for complexity while building this and didn't want to think about other cases. Both of those sound good, though.

shoyer (Member) commented Feb 3, 2015

Handling 1D boolean arrays is also pretty easy -- you can just convert them into integer arrays with np.nonzero.
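For instance, a 1-D boolean mask can be turned into the integer positions of its `True` entries with `np.nonzero`, after which it can go through the same integer-array path. The helper name `bool_to_int_index` is hypothetical.

```python
import numpy as np


def bool_to_int_index(mask):
    """Hypothetical helper: convert a 1-d boolean mask to integer positions."""
    mask = np.asarray(mask)
    if mask.dtype != bool or mask.ndim != 1:
        raise TypeError("expected a 1-d boolean array")
    # np.nonzero returns one array per dimension; a 1-d mask has just one
    return np.nonzero(mask)[0]


x = np.arange(100).reshape((10, 10))
mask = x[:, 0] % 20 == 0  # True for rows 0, 2, 4, 6, 8
rows = bool_to_int_index(mask)
```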

mrocklin (Member Author) commented Feb 4, 2015

I've handled the edge cases (I think). Merging.

This doesn't yet handle multiple lists, nor indexing with arrays.

mrocklin added a commit that referenced this pull request Feb 4, 2015
mrocklin merged commit 7107fc7 into dask:master Feb 4, 2015
mrocklin deleted the more-slicing branch February 4, 2015 00:49
mrocklin pushed a commit to mrocklin/dask that referenced this pull request Mar 28, 2019
phofl pushed a commit to phofl/dask that referenced this pull request Dec 23, 2024