
BUG: Possible inefficiency in numpy.transpose() when using .copy() on large matrices #28711

@ricardosp4

Describe the issue:

Hi NumPy team,

While benchmarking matrix transposition performance, I noticed unexpected performance spikes when using numpy.transpose() followed by .copy() on large square matrices, particularly at sizes such as 8000×8000, 12000×12000, 16000×16000, and 18000×18000. The aim of the benchmark was to compare NumPy’s performance with alternatives such as Blosc2 under various chunking strategies.

I've attached a plot showing transposition times using only NumPy. There are noticeable performance spikes at certain matrix sizes, which might point to an inefficiency in how .copy() is handled after a transpose.

[Figure: transposition time (s) vs. matrix size (MB) for NumPy, showing spikes at certain sizes]

We have been able to reproduce this behavior consistently on a benchmark machine with an Intel Core i9-13900K CPU. The timing spikes persist across multiple runs, so this appears to be a reproducible performance characteristic of NumPy’s transpose followed by .copy() on large square arrays rather than a one-off anomaly.
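As a quicker check, the copy step can also be isolated at a single size. Since np.transpose() only returns a view, the .copy() is the step that actually moves data; the sketch below times just that copy (our pick of 16000×16000 from the spike region versus the neighbouring 15000×15000 is illustrative only and may look different on other machines):

import numpy as np
import time

# One size from the reported spike region and a neighbouring size for comparison
for n in (16000, 15000):
    arr = np.linspace(0, 1, n * n).reshape(n, n)
    view = np.transpose(arr)        # a view: no data is moved here
    start = time.perf_counter()
    out = view.copy()               # only the materialising copy is timed
    elapsed = time.perf_counter() - start
    print(f"n={n}: copy of transposed view took {elapsed:.3f} s")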

Thanks for your time and all the amazing work on NumPy!

Best regards,
The Blosc team

Reproduce the code example:

import numpy as np
import time
import plotly.express as px
import pandas as pd


# Square matrices from 100×100 up to 18000×18000
shapes = [(100, 100), (500, 500), (1000, 1000), (2000, 2000), (3000, 3000),
          (4000, 4000), (5000, 5000), (6000, 6000), (7000, 7000), (8000, 8000),
          (9000, 9000), (10000, 10000), (11000, 11000), (12000, 12000), (13000, 13000),
          (14000, 14000), (15000, 15000), (16000, 16000), (17000, 17000), (18000, 18000)]
sizes_mb = [(np.prod(size) * 8) / 2**20 for size in shapes]  # float64: 8 bytes per element
results = {"numpy": []}

for size in shapes:
    arr = np.linspace(0, 1, np.prod(size)).reshape(size)

    start_time = time.perf_counter()

    # np.transpose() returns a view; .copy() materialises it as a new C-contiguous array
    np_T = np.transpose(arr).copy()

    end_time = time.perf_counter()
    time_b = end_time - start_time

    print(f"shape={size}, Performance = {time_b:.6f} s")

    results["numpy"].append(time_b)

df = pd.DataFrame({
    "Matrix Size (MB)": sizes_mb,
    "NumPy Time (s)": results["numpy"],
})

fig = px.line(df,
              x="Matrix Size (MB)",
              y=["NumPy Time (s)"],
              title="Performance of Matrix Transposition NumPy",
              labels={"value": "Time (s)", "variable": "Method"},
              markers=True)

fig.show()
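For more stable per-size numbers, the same copy can be repeated with timeit and the best run reported. This is a sketch along the lines of the loop above, with an arbitrary size and repeat count:

import timeit
import numpy as np

n = 16000                                    # one of the sizes where spikes appear
arr = np.linspace(0, 1, n * n).reshape(n, n)
view = np.transpose(arr)                     # transposing is a view; copying is the expensive step

# Repeat the materialising copy a few times and report the best run
times = timeit.repeat(lambda: view.copy(), number=1, repeat=5)
print(f"n={n}: best of 5 copies: {min(times):.3f} s")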

Error message:

Python and NumPy Versions:

NumPy 2.2.3

Python 3.12.8 | packaged by conda-forge | (main, Dec 5 2024, 14:24:40) [GCC 13.3.0]

Runtime Environment:

No response

Context for the issue:

The goal of the benchmarks was to compare NumPy's performance against Blosc2 under different chunking strategies. For additional context, these benchmarks and observations are part of a broader performance analysis documented on the Blosc blog (Transposing Compressed Matrices), where we explore storage-aware transposition techniques.
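For reference, the kind of storage-aware (blocked) transposition explored in that post can be sketched in a few lines of NumPy. This is a toy illustration only, not how NumPy or Blosc2 implements it, and the block size of 256 is an arbitrary, untuned choice:

import numpy as np

def blocked_transpose(arr, block=256):
    """Transpose-copy arr tile by tile so reads and writes stay cache-friendly.
    Toy sketch: block=256 is an arbitrary, untuned choice."""
    n_rows, n_cols = arr.shape
    out = np.empty((n_cols, n_rows), dtype=arr.dtype)
    for i in range(0, n_rows, block):
        for j in range(0, n_cols, block):
            # Slices past the array edge are clipped, so edge tiles need no special case
            out[j:j + block, i:i + block] = arr[i:i + block, j:j + block].T
    return out

Calling blocked_transpose(arr) produces the same values and layout as np.transpose(arr).copy().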
