
BUG: Possible inefficiency in numpy.transpose() when using .copy() on large matrices #28711

@ricardosp4

Describe the issue:

Hi NumPy team,

While benchmarking matrix transposition performance, I noticed unexpected performance spikes when using numpy.transpose() followed by .copy() on large square matrices, particularly at sizes such as 8000×8000, 12000×12000, 16000×16000, and 18000×18000. The aim of the benchmark was to compare NumPy’s performance with alternatives such as Blosc2 under various chunking strategies.

I've attached a plot showing transposition times using only NumPy. There are noticeable performance spikes at certain matrix sizes, which might point to an inefficiency in how .copy() is handled after a transpose.

[Figure: transposition time (s) vs. matrix size (MB) for NumPy, showing spikes at certain sizes]

We have been able to reproduce this behavior consistently on a benchmark machine with an Intel Core i9-13900K CPU. The timing spikes persist across multiple runs, so this appears to be a reproducible performance characteristic of NumPy’s transpose followed by .copy() on large square arrays rather than a one-off anomaly.
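As a quicker check, the copy step can also be isolated at a single size. Since np.transpose() only returns a view, the .copy() is the step that actually moves data; the sketch below times just that copy (our pick of 16000×16000 from the spike region versus the neighbouring 15000×15000 is illustrative only and may look different on other machines):

import numpy as np
import time

# One size from the reported spike region and a neighbouring size for comparison
for n in (16000, 15000):
    arr = np.linspace(0, 1, n * n).reshape(n, n)
    view = np.transpose(arr)        # a view: no data is moved here
    start = time.perf_counter()
    out = view.copy()               # only the materialising copy is timed
    elapsed = time.perf_counter() - start
    print(f"n={n}: copy of transposed view took {elapsed:.3f} s")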

Thanks for your time and all the amazing work on NumPy!

Best regards,
The Blosc team

Reproduce the code example:

import numpy as np
import time
import plotly.express as px
import pandas as pd


# Square matrices from 100×100 up to 18000×18000
shapes = [(100, 100), (500, 500), (1000, 1000), (2000, 2000), (3000, 3000),
          (4000, 4000), (5000, 5000), (6000, 6000), (7000, 7000), (8000, 8000),
          (9000, 9000), (10000, 10000), (11000, 11000), (12000, 12000), (13000, 13000),
          (14000, 14000), (15000, 15000), (16000, 16000), (17000, 17000), (18000, 18000)]
sizes_mb = [(np.prod(size) * 8) / 2**20 for size in shapes]  # float64: 8 bytes per element
results = {"numpy": []}

for size in shapes:
    arr = np.linspace(0, 1, np.prod(size)).reshape(size)

    start_time = time.perf_counter()

    # np.transpose() returns a view; .copy() materialises it as a new C-contiguous array
    np_T = np.transpose(arr).copy()

    end_time = time.perf_counter()
    time_b = end_time - start_time

    print(f"shape={size}, Performance = {time_b:.6f} s")

    results["numpy"].append(time_b)

df = pd.DataFrame({
    "Matrix Size (MB)": sizes_mb,
    "NumPy Time (s)": results["numpy"],
})

fig = px.line(df,
              x="Matrix Size (MB)",
              y=["NumPy Time (s)"],
              title="Performance of Matrix Transposition NumPy",
              labels={"value": "Time (s)", "variable": "Method"},
              markers=True)

fig.show()
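For more stable per-size numbers, the same copy can be repeated with timeit and the best run reported. This is a sketch along the lines of the loop above, with an arbitrary size and repeat count:

import timeit
import numpy as np

n = 16000                                    # one of the sizes where spikes appear
arr = np.linspace(0, 1, n * n).reshape(n, n)
view = np.transpose(arr)                     # transposing is a view; copying is the expensive step

# Repeat the materialising copy a few times and report the best run
times = timeit.repeat(lambda: view.copy(), number=1, repeat=5)
print(f"n={n}: best of 5 copies: {min(times):.3f} s")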

Error message:

Python and NumPy Versions:

NumPy 2.2.3

Python 3.12.8 | packaged by conda-forge | (main, Dec 5 2024, 14:24:40) [GCC 13.3.0]

Runtime Environment:

No response

Context for the issue:

The goal of the benchmarks was to compare NumPy's performance against Blosc2 under different chunking strategies. For additional context, these benchmarks and observations are part of a broader performance analysis documented on the Blosc blog (Transposing Compressed Matrices), where we explore storage-aware transposition techniques.
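For reference, the kind of storage-aware (blocked) transposition explored in that post can be sketched in a few lines of NumPy. This is a toy illustration only, not how NumPy or Blosc2 implements it, and the block size of 256 is an arbitrary, untuned choice:

import numpy as np

def blocked_transpose(arr, block=256):
    """Transpose-copy arr tile by tile so reads and writes stay cache-friendly.
    Toy sketch: block=256 is an arbitrary, untuned choice."""
    n_rows, n_cols = arr.shape
    out = np.empty((n_cols, n_rows), dtype=arr.dtype)
    for i in range(0, n_rows, block):
        for j in range(0, n_cols, block):
            # Slices past the array edge are clipped, so edge tiles need no special case
            out[j:j + block, i:i + block] = arr[i:i + block, j:j + block].T
    return out

Calling blocked_transpose(arr) produces the same values and layout as np.transpose(arr).copy().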
