14

I have a numpy array like the following:

Xtrain = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [1, 7, 3]])

I want to shuffle the items within each row independently, so that the permutation differs from row to row (several existing examples only shuffle the column order, which applies the same permutation to every row).

For example, I want an output like the following:

output = np.array([[3, 2, 1],
                   [4, 6, 5],
                   [7, 3, 1]])

How can I shuffle each of the rows randomly in an efficient way? My actual NumPy array has over 100000 rows and 1000 columns.

3
  • 1
    @Kasramvd According to the NumPy documentation, multi-dimensional arrays are only shuffled along the first axis: >>> arr = np.arange(9).reshape((3, 3)) >>> np.random.shuffle(arr) >>> arr array([[3, 4, 5], [6, 7, 8], [0, 1, 2]]) Commented May 27, 2018 at 16:39
  • 1
    Yes, shuffle() doesn't accept an axis argument. Here is a similar question, though: stackoverflow.com/questions/50415972/… Commented May 27, 2018 at 16:41
  • @Kasramvd If I understand it, this question wants to shuffle row order, not the actual values in the rows. Commented May 27, 2018 at 16:47

7 Answers

7

If you only want to shuffle the columns, you can shuffle the transpose of your matrix in place:

In [86]: np.random.shuffle(Xtrain.T)

In [87]: Xtrain
Out[87]: 
array([[2, 3, 1],
       [5, 6, 4],
       [7, 3, 1]])

Note that np.random.shuffle() on a 2D array shuffles the rows, not the items within each row; that is, it changes the order of the rows. Therefore, if you shuffle the rows of the transposed matrix, you are actually shuffling the columns of your original array, and every row receives the same permutation.

If you still want a completely independent shuffle you can create random indexes for each row and then create the final array with a simple indexing:

In [172]: def crazyshuffle(arr):
     ...:     x, y = arr.shape
     ...:     rows = np.indices((x,y))[0]
     ...:     cols = [np.random.permutation(y) for _ in range(x)]
     ...:     return arr[rows, cols]
     ...: 

Demo:

In [173]: crazyshuffle(Xtrain)
Out[173]: 
array([[1, 3, 2],
       [6, 5, 4],
       [7, 3, 1]])

In [174]: crazyshuffle(Xtrain)
Out[174]: 
array([[2, 3, 1],
       [4, 6, 5],
       [1, 3, 7]])
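
As a minimal sketch of the same indexing idea, the full row-index grid from np.indices can be replaced by a column vector that broadcasts against the per-row permutations. The function name shuffle_rows_independently and the optional rng parameter are illustrative assumptions, not part of the answer above.

```python
import numpy as np

def shuffle_rows_independently(arr, rng=None):
    """Return a copy of 2-D `arr` with each row permuted independently."""
    rng = np.random.default_rng() if rng is None else rng
    n_rows, n_cols = arr.shape
    # Column vector of row indices; it broadcasts against `cols` below.
    rows = np.arange(n_rows)[:, None]
    # One independent permutation of the column indices per row.
    cols = np.stack([rng.permutation(n_cols) for _ in range(n_rows)])
    # Fancy indexing picks arr[i, cols[i, j]] for every (i, j).
    return arr[rows, cols]

Xtrain = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [1, 7, 3]])
shuffled = shuffle_rows_independently(Xtrain, np.random.default_rng(0))
```

Each row of the result holds the same values as the matching row of Xtrain, only in a different order; Xtrain itself is not modified.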

3

From: https://github.com/numpy/numpy/issues/5173

def disarrange(a, axis=-1):
    """
    Shuffle `a` in-place along the given axis.

    Apply numpy.random.shuffle to the given axis of `a`.
    Each one-dimensional slice is shuffled independently.
    """
    b = a.swapaxes(axis, -1)
    # Shuffle `b` in-place along the last axis.  `b` is a view of `a`,
    # so `a` is shuffled in place, too.
    shp = b.shape[:-1]
    for ndx in np.ndindex(shp):
        np.random.shuffle(b[ndx])
    return
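
A short usage sketch may help here, since the function works in place and returns None. The function is repeated in condensed form so the block runs on its own; the variable names original and result are assumptions made for illustration.

```python
import numpy as np

def disarrange(a, axis=-1):
    # Condensed copy of the function above: shuffle every 1-D slice in place.
    b = a.swapaxes(axis, -1)           # view of `a` with the target axis last
    for ndx in np.ndindex(b.shape[:-1]):
        np.random.shuffle(b[ndx])      # b[ndx] is a 1-D view into `a`

Xtrain = np.array([[1, 2, 3],
                   [4, 5, 6],
                   [1, 7, 3]])
original = Xtrain.copy()

result = disarrange(Xtrain, axis=1)    # Xtrain is modified; result is None
```

Because the loop runs once per row in Python, this is likely to be slow for an array with 100000 rows, but it needs no extra index arrays.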


3

This solution is not efficient by any means, but I had fun thinking about it, so I wrote it down. Basically, you ravel the array and create an array of row labels and an array of indices. You shuffle the index array and use it to index both the original data and the row labels. Then you apply a stable argsort to the row labels to gather the data back into rows. Apply that index, reshape, and voilà, the data is shuffled independently within each row:

import numpy as np

r, c = 3, 4  # x.shape

x = np.arange(12) + 1  # Already raveled 
inds = np.arange(x.size)
rows = np.repeat(np.arange(r).reshape(-1, 1), c, axis=1).ravel()

np.random.shuffle(inds)
x = x[inds]
rows = rows[inds]

inds = np.argsort(rows, kind='mergesort')
x = x[inds].reshape(r, c)

Here is an IDEOne Link
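
The steps above can be wrapped in a function and checked on a small array. This is only a sketch: the name shuffle_rows_via_stable_sort and the rng parameter are hypothetical, and kind='stable' is used in place of 'mergesort' as an equivalent request for a stable sort.

```python
import numpy as np

def shuffle_rows_via_stable_sort(a, rng=None):
    """Shuffle each row of 2-D `a` using one global shuffle plus a stable sort."""
    rng = np.random.default_rng() if rng is None else rng
    r, c = a.shape
    flat = a.ravel()
    row_labels = np.repeat(np.arange(r), c)    # row of origin for each element
    perm = rng.permutation(flat.size)          # one shuffle of everything
    flat, row_labels = flat[perm], row_labels[perm]
    # A stable sort by row label regroups the rows while keeping the
    # shuffled relative order of the elements inside each row.
    order = np.argsort(row_labels, kind='stable')
    return flat[order].reshape(r, c)

x = np.arange(12).reshape(3, 4) + 1
x_shuffled = shuffle_rows_via_stable_sort(x, np.random.default_rng(0))
```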


2

We can create a random 2-dimensional matrix, sort it by each row, and then use the index matrix given by argsort to reorder the target matrix.

target = np.random.randint(10, size=(5, 5))
# [[7 4 0 2 5]
# [5 6 4 8 7]
# [6 4 7 9 5]
# [8 6 6 2 8]
# [8 1 6 7 3]]

shuffle_helper = np.argsort(np.random.rand(5,5), axis=1)
# [[0 4 3 2 1]
# [4 2 1 3 0]
# [1 2 3 4 0]
# [1 2 4 3 0]
# [1 2 3 0 4]]

target[np.arange(shuffle_helper.shape[0])[:, None], shuffle_helper]
# array([[7, 5, 2, 0, 4],
#       [7, 4, 6, 8, 5],
#       [4, 7, 9, 5, 6],
#       [6, 6, 8, 2, 8],
#       [1, 6, 7, 8, 3]])

Explanation

  • We use np.random.rand and argsort to mimic the effect of shuffling.
  • np.random.rand supplies the randomness.
  • Then, we use argsort with axis=1 to rank each row. This produces the index matrix used for reordering.
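
The explanation above can be sketched as a reusable function. Here np.take_along_axis stands in for the explicit row-index broadcasting; the function name shuffle_rows_argsort and the rng parameter are assumptions for illustration.

```python
import numpy as np

def shuffle_rows_argsort(target, rng=None):
    """Shuffle each row of 2-D `target` by argsorting random keys per row."""
    rng = np.random.default_rng() if rng is None else rng
    # Random keys with the same shape as the target.
    keys = rng.random(target.shape)
    # Ranking the keys along axis 1 yields one permutation of the
    # column indices per row.
    shuffle_helper = np.argsort(keys, axis=1)
    # Equivalent to target[np.arange(n_rows)[:, None], shuffle_helper].
    return np.take_along_axis(target, shuffle_helper, axis=1)

target = np.arange(20).reshape(4, 5)
result = shuffle_rows_argsort(target, np.random.default_rng(0))
```

The cost is dominated by the sort, roughly O(n log n) per row, in exchange for avoiding a Python-level loop over rows.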

4 Comments

How would that be better than sorting the rows of the original directly?
@MadPhysicist Sorting the original directly would lead to one same result and leave no randomness.
I just got it. You're using the fact that argsort has an axis argument to compensate for shuffle's lack. Clever
@MadPhysicist exactly! Thanks for taking time to understand this thought.
1

Let's say you have an array a with shape 100000 x 1000.

b = np.random.choice(100000 * 1000, (100000, 1000), replace=False)
ind = np.argsort(b, axis=1)
a_shuffled = a[np.arange(100000)[:,np.newaxis], ind]

I don't know whether this is faster than a loop, because it needs sorting, but this solution may help you come up with something better, for example using np.argpartition instead of np.argsort.
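
One possible refinement, offered only as a sketch: for an array of this size the random key matrix and the index matrix are as large as the data, so the rows can be processed in blocks to bound the extra memory. The function name shuffle_rows_in_chunks and the chunk_rows parameter are hypothetical, and random floats are used as sort keys instead of np.random.choice.

```python
import numpy as np

def shuffle_rows_in_chunks(a, chunk_rows=1000, rng=None):
    """Shuffle each row of 2-D `a`, working on `chunk_rows` rows at a time."""
    rng = np.random.default_rng() if rng is None else rng
    out = np.empty_like(a)
    for start in range(0, a.shape[0], chunk_rows):
        block = a[start:start + chunk_rows]
        # Random keys and their per-row ranking, for this block only.
        keys = rng.random(block.shape)
        idx = np.argsort(keys, axis=1)
        out[start:start + chunk_rows] = np.take_along_axis(block, idx, axis=1)
    return out

a = np.arange(50).reshape(10, 5)
a_shuffled = shuffle_rows_in_chunks(a, chunk_rows=4, rng=np.random.default_rng(0))
```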


0

You may use Pandas:

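import numpy as np
import pandas as pd
df = pd.DataFrame(Xtrain)
# apply returns a new DataFrame; df itself is left unchanged
shuffled = df.apply(lambda x: np.random.permutation(x), axis=1, raw=True)
shuffled.values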

Change the keyword to axis=0 if you want to shuffle the values within each column instead.


0

Assuming the array is 2-dimensional, Generator.permuted (NumPy 1.20 or later) shuffles each row independently:

np.random.default_rng().permuted(Xtrain, axis=1)

np.random.default_rng().permuted(np.arange(9).reshape((3,3)), axis=1)
Out[6]: 
array([[0, 2, 1],
       [3, 4, 5],
       [8, 6, 7]])
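
As a small sketch of two options that may be useful in practice: a seeded generator makes the shuffle reproducible, and the out argument of permuted shuffles in place instead of returning a copy. The seed value and the variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(12345)     # seeded for reproducibility
x = np.arange(9).reshape((3, 3))
original = x.copy()

# Returns a new array; `x` is left unchanged.
y = rng.permuted(x, axis=1)

# Shuffles `x` itself, avoiding a second full-size array.
rng.permuted(x, axis=1, out=x)
```

For a 100000 x 1000 array, the in-place form may be preferable when memory is tight.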
