
Can't solve torch.lstsq() with specific values #44378

@xiaosu-zhu


🐛 Bug

For certain inputs, torch.lstsq returns a wrong result on CUDA and raises an error on CPU, while scipy.linalg.lstsq solves the same problem without issue.

To Reproduce

Steps to reproduce the behavior:

  1. Load the dumped .npy files (a.npy, b.npy) and pass them to torch.lstsq.
  2. Solve the same problem with scipy.linalg.lstsq for comparison.
  3. With these arrays, torch.lstsq raises an error on CPU or returns a wrong result on CUDA, while scipy.linalg.lstsq does neither.

Code samples and error messages:

Download files:

import numpy as np
import scipy.linalg
import torch

# sparse 0-1 matrix with shape [100000, 1024]
# each row contains exactly 4 ones and other zeros
a = np.load("a.npy").astype(np.float32)
# int matrix with shape [100000, 128]
b = np.load("b.npy").astype(np.float32)

x, _, _, _ = scipy.linalg.lstsq(a, b)

# 67480.16036171981
print(((a @ x - b) ** 2).sum(-1).mean())

b = torch.from_numpy(b).cuda()
a = torch.from_numpy(a).cuda()
x, _ = torch.lstsq(b, a)

# 7.4439e+15
print(((a @ x[:a.shape[-1]] - b) ** 2).sum(-1).mean())

# raise RuntimeError: Lapack Error in gels : 
# The 259-th diagonal element of the triangular factor of A is zero 
# at /opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/TH/generic/THTensorLapack.cpp:177
x, _ = torch.lstsq(b.cpu(), a.cpu())

With torch.FloatTensor, the call raises RuntimeError: Lapack Error in gels : The 259-th diagonal element of the triangular factor of A is zero at /opt/conda/conda-bld/pytorch_1595629395347/work/aten/src/TH/generic/THTensorLapack.cpp:177.
With torch.cuda.FloatTensor, it returns an irregular result instead.
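
A possible reading of the error: LAPACK documents a zero diagonal element in the triangular factor as meaning that A does not have full rank, which gels requires. scipy.linalg.lstsq uses the SVD-based gelsd driver by default, which handles rank-deficient input. The sketch below is not the real a.npy. It builds a small stand-in under the assumption (hypothetical, not confirmed by the data) that the 4 ones in each row fall one per column group, and checks the rank with NumPy only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a.npy: 4 groups of 8 columns, and every row has
# exactly one 1 inside each group (so 4 ones per row). The grouping is an
# assumption made for illustration only.
n_rows, n_groups, group_size = 200, 4, 8
n_cols = n_groups * group_size
a = np.zeros((n_rows, n_cols), dtype=np.float64)
for g in range(n_groups):
    cols = rng.integers(0, group_size, size=n_rows) + g * group_size
    a[np.arange(n_rows), cols] = 1.0

# The columns of every group sum to the all-ones vector, so the groups are
# linearly dependent and the rank is at most n_cols - (n_groups - 1).
rank = np.linalg.matrix_rank(a)
print("rank:", rank, "columns:", n_cols)

# An SVD-based solver still returns a (minimum-norm) solution and reports
# the rank it detected.
b = rng.standard_normal((n_rows, 3))
x, _, lstsq_rank, _ = np.linalg.lstsq(a, b, rcond=None)
print("rank reported by lstsq:", lstsq_rank)
```

If the real matrix shows a rank below 1024 in the same check, a full-rank-only solver is expected to fail or return garbage on it.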

Expected behavior

torch.lstsq should produce the same result as scipy.linalg.lstsq.
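
One caveat when checking "the same result": if A is rank deficient, the minimizer x is not unique, so two correct solvers may return different x while reaching the same residual. A minimal sketch with made-up values (the matrix and the helper name mean_sq_residual are illustrative, not from the report) that compares the residual metric used above rather than x itself:

```python
import numpy as np

def mean_sq_residual(a, x, b):
    # Same metric as in the report: squared residual summed per row, then averaged.
    return ((a @ x - b) ** 2).sum(-1).mean()

# Tiny rank-deficient system: column 2 equals column 0 + column 1.
a = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [1.0, 0.0, 1.0]])
b = np.array([[1.0], [2.0], [2.0], [0.0]])

# Reference minimum-norm solution from an SVD-based solver.
x_ref, _, rank, _ = np.linalg.lstsq(a, b, rcond=None)

# Shifting along a null-space vector of `a` gives another valid minimizer.
null_vec = np.array([[1.0], [1.0], [-1.0]])
x_other = x_ref + 0.5 * null_vec

r_ref = mean_sq_residual(a, x_ref, b)
r_other = mean_sq_residual(a, x_other, b)
print("rank:", rank)
print("residuals match:", np.isclose(r_ref, r_other))
print("solutions match:", np.allclose(x_ref, x_other))
```

By this criterion the CUDA result in the report (7.4439e+15 against 67480.16) is clearly not a least-squares solution, independent of which x is returned.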

Environment

PyTorch version: 1.6.0
Is debug build: False
CUDA used to build PyTorch: 10.2

OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-10ubuntu2) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration: 
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
GPU 2: GeForce GTX 1080 Ti
GPU 3: GeForce GTX 1080 Ti
GPU 4: GeForce GTX 1080 Ti
GPU 5: GeForce GTX 1080 Ti
GPU 6: GeForce GTX 1080 Ti
GPU 7: GeForce GTX 1080 Ti

Nvidia driver version: 450.51.05
cuDNN version: Could not collect

Versions of relevant libraries:
[pip3] numpy==1.19.1
[pip3] torch==1.6.0
[pip3] torchvision==0.7.0
[conda] blas                      1.0                         mkl    defaults
[conda] cudatoolkit               10.2.89              hfd86e86_1    defaults
[conda] mkl                       2020.1                      217    defaults
[conda] mkl-service               2.3.0            py38he904b0f_0    defaults
[conda] mkl_fft                   1.1.0            py38h23d657b_0    defaults
[conda] mkl_random                1.1.1            py38h0573a6f_0    defaults
[conda] numpy                     1.19.1           py38hbc911f0_0    defaults
[conda] numpy-base                1.19.1           py38hfa32c7d_0    defaults
[conda] pytorch                   1.6.0           py3.8_cuda10.2.89_cudnn7.6.5_0    pytorch
[conda] torchvision               0.7.0                py38_cu102    pytorch

cc @ezyang @gchanan @zou3519 @bdhirsh @vishwakftw @jianyuh @nikitaved @pearu @mruberry @heitorschueroff @walterddr @rgommers @vincentqb


Labels

high priority, module: linear algebra, module: numpy, triaged
