Conversation

@nikitaved (Collaborator) commented Dec 9, 2020

Fixes #44378 by providing a wider range of drivers, similar to what SciPy does.

The supported CPU drivers are `gels`, `gelsy`, `gelsd`, `gelss`.
The CUDA interface currently implements only `gels`, and only for overdetermined systems.

The current state of this PR:

- [x] CPU interface
- [x] CUDA interface
- [x] CPU tests
- [x] CUDA tests
- [x] Memory-efficient batch-wise iteration with broadcasting, which fixes #49252
- [x] docs
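For context, a minimal sketch of how driver selection looks from Python (the call matches the merged `torch.linalg.lstsq` API; the tensors are placeholders):

```python
import torch

a = torch.randn(5, 3)   # overdetermined system: m >= n
b = torch.randn(5, 2)

# SVD-based, rank-revealing driver; available on CPU.
x = torch.linalg.lstsq(a, b, driver='gelsd').solution

# On CUDA only the QR-based 'gels' driver is implemented,
# and only for overdetermined (m >= n) systems:
# x = torch.linalg.lstsq(a.cuda(), b.cuda(), driver='gels').solution
```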

@nikitaved nikitaved requested a review from glaringlee as a code owner December 9, 2020 16:07
@nikitaved nikitaved added the `module: linear algebra` label Dec 9, 2020
dr-ci bot commented Dec 9, 2020

💊 CI failures summary and remediations

As of commit 6064f79 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-scanned failure(s)

This comment was automatically generated by Dr. CI.

@nikitaved nikitaved force-pushed the nikved/linalg_lstsq branch from 056961b to 927e084 (December 9, 2020 17:31)
@xiaosu-zhu

Is there any chance of migrating `gels*` from LAPACK to CUDA, without MAGMA? I'm not an expert in C++ or CUDA programming; I'm just curious whether there exist equivalent operations in CUDA that implement `gels*`.

@nikitaved (Collaborator, Author)

@xiaosu-zhu, implementing these things from scratch is no trivial matter and might be beyond the scope of this PR. Maybe it is possible to contact the MAGMA people and ask them whether there is a plan to support these functions in the near future...

@xiaosu-zhu

> @xiaosu-zhu, implementing these things from scratch is no trivial matter and might be beyond the scope of this PR. Maybe it is possible to contact the MAGMA people and ask them whether there is a plan to support these functions in the near future...

Yeah, thank you @nikitaved. I anticipated that it would be hard work; maybe leave it for the future 😃

@nikitaved nikitaved changed the title [WIP] Implements torch.linalg.lstsq Implements torch.linalg.lstsq Jan 3, 2021
@nikitaved (Collaborator, Author)

This PR is ready for review.

@nikitaved nikitaved requested review from mruberry and removed request for glaringlee January 3, 2021 15:33
codecov bot commented Jan 3, 2021

Codecov Report

Merging #49093 (6064f79) into master (1772e26) will increase coverage by 0.02%.
The diff coverage is 92.93%.

```
@@            Coverage Diff             @@
##           master   #49093      +/-   ##
==========================================
+ Coverage   77.30%   77.32%   +0.02%
==========================================
  Files        1888     1888
  Lines      183589   183858     +269
==========================================
+ Hits       141925   142176     +251
- Misses      41664    41682      +18
```

@mruberry mruberry requested a review from IvanYashchuk January 4, 2021 16:21
@ezyang ezyang added the `triaged` label Jan 5, 2021
@IvanYashchuk (Collaborator) left a comment

The implementation looks clean to me. I left a few questions and suggestions inline.
The idea of implementing batch_iterator_with_broadcasting is great. It will certainly be useful for other "find x s.t. A x = b" functions.

Since LAPACK 3.7.0 (released in Dec 2016) there is also a GETSLS driver available (MKL docs, Netlib release notes). Something to consider adding in follow-up work.
A note for the future: while there are no specialized drivers in cuBLAS, cuSOLVER, or MAGMA for the m < n case, we can implement it ourselves via the SVD A = U S Vᴴ, i.e. x = V S⁻¹ Uᴴ b.
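For the record, a minimal unbatched sketch of that SVD route (hedged: `lstsq_svd` and the `rcond` cutoff are illustrative, not part of this PR; uses today's `torch.linalg` API):

```python
import torch

def lstsq_svd(a, b, rcond=None):
    # Minimum-norm least-squares solution x = V @ diag(1/s) @ U^H @ b,
    # which also covers the underdetermined m < n case.
    u, s, vh = torch.linalg.svd(a, full_matrices=False)
    if rcond is None:
        rcond = max(a.shape[-2:]) * torch.finfo(s.dtype).eps
    # Zero out reciprocal singular values below the cutoff (rank-deficient case).
    s_inv = torch.where(s > rcond * s.max(), s.reciprocal(), torch.zeros_like(s))
    return vh.mH @ (s_inv.unsqueeze(-1) * (u.mH @ b))
```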

```rst
``'gelsy'`` is the fastest among the rank-revealing algorithms that also handles rank-deficient inputs.
Returns:
    (Tensor, Tensor, Tensor): a namedtuple (x, rank, s) containing:
```
(Collaborator)

Would it be clearer to have a namedtuple (solution, rank, singular_values)?
cc: @mruberry

@nikitaved (Collaborator, Author)

I think the same way, but the naming was taken from SciPy.

(Collaborator)

scipy.linalg functions don't return a namedtuple, so that shouldn't matter here. I'd say go for the clearer names.
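For reference, this is how the clearer names read at the call site (a sketch reusing `a` and `b` from above; these are the field names the API eventually shipped with):

```python
out = torch.linalg.lstsq(a, b, driver='gelsd')
print(out.solution.shape)
print(out.rank)             # populated by the rank-revealing drivers
print(out.singular_values)  # populated by the SVD-based drivers
```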

@mruberry (Collaborator)

@nikitaved @IvanYashchuk ping me when you're both happy with this PR.

@nikitaved (Collaborator, Author)

I am basically done. I am not sure whether it is actually a good idea to use the pseudoinverse for the m < n case (see the sketch after this list), for two reasons:

  1. It is slow.
  2. lstsq is not differentiable, while pseudoinverse is.
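For concreteness, the pseudoinverse route in question amounts to something like this (a sketch; `torch.linalg.pinv` stands in for the SVD-based computation):

```python
import torch

a = torch.randn(3, 5)   # underdetermined system: m < n
b = torch.randn(3, 2)

# Minimum-norm least-squares solution via the (differentiable) pseudoinverse.
x = torch.linalg.pinv(a) @ b
```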

Comment on lines 8914 to 9031

```yaml
- func: _lstsq_helper(Tensor a, Tensor b, float cond, str? driver_name) -> (Tensor, Tensor, Tensor)
  use_c10_dispatcher: full
  variants: function
  dispatch:
    CPU: _lstsq_helper_cpu
    CUDA: _lstsq_helper_cuda
```
@nikitaved (Collaborator, Author)

Maybe it makes sense to remove this from native functions and use the declared/defined dispatch instead? CC @mruberry

(Collaborator)

Yep, that's a great idea.

@IvanYashchuk (Collaborator) left a comment

Everything looks very good! I'm happy with this PR.

@nikitaved nikitaved mentioned this pull request Jan 13, 2021
@IvanYashchuk (Collaborator) left a comment

Everything looks very good to me.
There is a minor suggestion for the tests.

One thing I've noticed that we didn't discuss is broadcasting for the case when either a or b is batched while the other is not. Currently this PR doesn't allow it:

```
RuntimeError: torch.linalg.lstsq: self.dim() must be greater or equal to b.dim() and (self.dim() - b.dim()) <= 1
```

NumPy's lstsq does not support batched input at all, but numpy.linalg.solve, for example, does, and allows the following:

```python
b = torch.randn(5, 1)
a = torch.randn(2, 1, 3, 5, 5)
np.linalg.solve(a, b)
```

I think this should be discussed separately and then the same behaviour applied to all A x = b functions.
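For illustration, the missing broadcasting could be emulated by expanding the batch dimensions by hand before the call (a sketch; `lstsq_broadcast` is a hypothetical helper mimicking np.linalg.solve semantics):

```python
import torch

def lstsq_broadcast(a, b):
    # a: (*batch_a, m, n), b: (*batch_b, m, k) -> broadcast the batch dims.
    batch = torch.broadcast_shapes(a.shape[:-2], b.shape[:-2])
    a = a.expand(*batch, *a.shape[-2:])
    b = b.expand(*batch, *b.shape[-2:])
    # Materialize the expanded (stride-0) batches before solving.
    return torch.linalg.lstsq(a.contiguous(), b.contiguous())
```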

@nikitaved (Collaborator, Author) commented Feb 24, 2021

@IvanYashchuk, we already have linalg.solve, right? How does it handle broadcasting? Your example is fully broadcastable, and since we mention broadcasting in the documentation, it makes sense to support the broadcasting semantics in full. So I agree, we need to fix that.

@nikitaved (Collaborator, Author) commented Feb 26, 2021

@mruberry, @IvanYashchuk and I decided to create an issue to discuss the broadcasting situation for the solve-like methods. Once we agree on the interface, we can adapt it for linalg.lstsq in a follow-up PR.

@mruberry (Collaborator) commented Feb 26, 2021

> I think this PR is unblocked, and indeed, the todos could be done in separate PRs.

Awesome!

> Regarding the value for cond, I think we could keep the current behavior, so that the user can use lstsq without worrying about its value. I could update the doc to indicate that the default value is subject to change, so it makes sense to specify cond explicitly to guarantee non-BC-breaking behavior.

Sounds great. Do you just want to make this tweak by adding a sentence to the description of the cond arg in the doc and resolving the conflict with torch/overrides.py, and we'll merge this?

Edit: actually, the ROCm test failures might be real.

```
15:07:16 ERROR [9.679s]: test_fn_grad_linalg_solve_cuda_complex128 (__main__.TestGradientsCUDA)
```

Let's see if rebasing fixes them, or try to identify whether there is a real issue.
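On the cond default discussed above: pinning the cutoff explicitly looks like this (a sketch; note the merged API spells the argument `rcond`, and the value here is a placeholder):

```python
x = torch.linalg.lstsq(a, b, rcond=1e-15, driver='gelsd').solution
```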

@mruberry (Collaborator) commented Mar 2, 2021

FYI, I expect this will land today; internal tooling needed some help with it.

@facebook-github-bot (Contributor)

@mruberry merged this pull request in 3ac9013.

@mruberry (Collaborator) commented Mar 3, 2021

FYI, I'm going to revert this due to an internal build issue. No action needed; I just need to resolve it internally.

Update: an internal project had a build issue because they're consuming LAPACK functions from multiple dependencies. They are working on resolving the issue now and will validate that their fix allows us to land this change without breaking them. There's no action on our part at this time.

@mruberry mruberry reopened this Mar 3, 2021
@facebook-github-bot (Contributor) left a comment

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@mruberry (Collaborator)

Arg. This hit a logical merge conflict with the update from "supports_tensor_out" -> "supports_out". I updated it, but the lint build is going to fail because we now check for trailing whitespace, and there's a ton of trailing whitespace in this PR's base. @nikitaved, would you please rebase this? The internal build appears to be OK now, so after a rebase we should be able to land.

@nikitaved nikitaved force-pushed the nikved/linalg_lstsq branch 3 times, most recently from b0ca309 to 4684f6d (March 12, 2021 09:39)
@nikitaved nikitaved force-pushed the nikved/linalg_lstsq branch from 4684f6d to 6064f79 (March 12, 2021 09:43)
@facebook-github-bot (Contributor) left a comment

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

aocsa pushed a commit to Quansight/pytorch that referenced this pull request Mar 15, 2021
Summary:
Fixes pytorch#44378 by providing a wider range of drivers similar to what SciPy is doing.

The supported CPU drivers are `gels, gelsy, gelsd, gelss`.
The CUDA interface has only `gels` implemented but only for overdetermined systems.

The current state of this PR:
- [x] CPU interface
- [x] CUDA interface
- [x] CPU tests
- [x] CUDA tests
- [x] Memory-efficient batch-wise iteration with broadcasting which fixes pytorch#49252
- [x] docs

Pull Request resolved: pytorch#49093

Reviewed By: H-Huang

Differential Revision: D26723384

Pulled By: mruberry

fbshipit-source-id: c9866a95f14091955cf42de22f4ac9e2da009713
xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021 (same summary and metadata as the commit above).
xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021 (same summary as above).

Reviewed By: albanD

Differential Revision: D26991788

Pulled By: mruberry

fbshipit-source-id: 8af9ada979240b255402f55210c0af1cba6a0a3c

```cpp
check_if_copy_needed_for_a(a_curr_linear_batch_idx);

auto* a_working_ptr = a_3d.select(0, a_curr_linear_batch_idx)
```
(Collaborator)

Is it guaranteed that a_3d is contiguous? Otherwise, just getting data_ptr might not be safe.

In the comments it says that the input is expected to be "almost contiguous", but I can't find where that is enforced.

Btw, it might be marginally faster to create a TensorAccessor once instead of doing repeated .select() (https://pytorch.org/cppdocs/notes/tensor_basics.html#efficient-access-to-tensor-elements).

@nikitaved (Collaborator, Author) commented May 27, 2021

Have a look at the long comment right above: it mentions that a and b are expected to be "contiguous" (with respect to the batch dimensions) and in column-major order with respect to the last two dimensions, i.e. the output of a LAPACK routine is sufficient. No enforcement is done.
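As a side note, that "batched column-major" assumption can be checked from Python with a one-liner (a sketch; `is_batched_column_major` is illustrative):

```python
def is_batched_column_major(t):
    # True when the last two dims are in Fortran (column-major) order and
    # the batches are laid out contiguously, as LAPACK wrappers produce.
    return t.transpose(-2, -1).is_contiguous()
```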

@nikitaved (Collaborator, Author) commented May 27, 2021

A TensorAccessor will not work for CUDA tensors, will it? We would need to write a separate kernel for that, right?


Labels

cla signed · Merged · module: linear algebra · open source · triaged

Development

Successfully merging this pull request may close these issues:

Can't solve torch.lstsq() with specific values