BUG: disallow setting flag to writeable after fromstring, frombuffer #11739

mattip · 2018-08-14T18:23:17Z

Fixes #9440. The comment in the code was incorrect: array_setstate, used in unpickling, does not call _IsWriteable. _IsWriteable is only used when setting flags.

Note that the new pickling tests use a large buffer to ensure that unpickling goes through PyArray_SetBaseObject in array_setstate, and they check that the flag is still set to writeable.
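For readers skimming the thread, here is a minimal sketch of the behaviour this PR targets, assuming NumPy 1.16 or later. The helper name try_make_writeable is hypothetical and exists only for this illustration; it is not part of NumPy or of the PR.

```python
import numpy as np

# A bytes object is immutable, so an array built on top of it is read-only.
raw = bytes(range(8))
arr = np.frombuffer(raw, dtype=np.uint8)
assert not arr.flags.writeable


def try_make_writeable(a):
    """Hypothetical helper: True if the flag could be set, False if NumPy refused."""
    try:
        a.setflags(write=True)
    except ValueError:
        return False
    return True


# With this change NumPy refuses to flip the flag for a bytes-backed array.
refused = not try_make_writeable(arr)

# The safe (but memory-doubling) alternative is an explicit copy.
writable_copy = arr.copy()
writable_copy[0] = 255
```

The copy owns its memory, so writing to it leaves the original bytes object untouched.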

eric-wieser · 2018-08-15T04:25:33Z

doc/release/1.16.0-notes.rst

Based off readonly buffers? Seems harmless to allow them to be made writeable if the source buffer was writeable.

eric-wieser · 2018-08-15T04:26:51Z

numpy/core/tests/test_multiarray.py

clarify - readonly buffer

numpy/core/tests/test_multiarray.py

eric-wieser · 2018-08-15T04:38:34Z

numpy/core/tests/test_multiarray.py

Looks like we ought to rename this to frombuffer at some point, especially once we only support Python 3 and fromstring(str) fails but fromstring(bytes) succeeds.

mattip · 2018-08-29T11:05:59Z

@eric-wieser ping

eric-wieser · 2018-08-30T06:23:52Z

numpy/core/tests/test_multiarray.py

I'm confused by the purpose of this test - why would a writeable array have a non-writeable base?

Also, on my machine type(vals.base) is always bytes - I don't see any point listing str and unicode here

removed non-writable bases

It seems like the fix here is that we should be pickling bytearray objects, not bytes

eric-wieser · 2018-08-30T06:27:19Z

numpy/core/tests/test_multiarray.py

This needs a comment - there's something special about 1000 here - using anything below 251 (dtype=np.int32 on my machine) does not set .base for me

added comment. The condition is in array_setstate, to decide to use PyArray_SetBaseObject
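A rough sketch of what the pickling test is guarding, assuming pickle protocol 2 so that unpickling goes through array_setstate. The variable names are illustrative. The type of .base is printed rather than asserted, because whether it is set depends on the size condition discussed above and may differ between NumPy versions.

```python
import pickle

import numpy as np

# One array well under the size mentioned above, one well over it.
small = np.arange(10, dtype=np.int32)
large = np.arange(1000, dtype=np.int32)

restored_arrays = []
for original in (small, large):
    # Protocol 2 is chosen here on the assumption that it uses the
    # __reduce__/__setstate__ path rather than out-of-band buffers.
    restored = pickle.loads(pickle.dumps(original, protocol=2))
    restored_arrays.append(restored)
    # Informational only: .base may or may not be set for a given size.
    print(original.size, type(restored.base).__name__)

# Unpickled arrays are expected to stay writeable regardless of size.
all_writeable = all(r.flags.writeable for r in restored_arrays)
```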

mattip · 2018-08-30T08:17:36Z

Squashed to a single commit and rebased

numpy/core/tests/test_multiarray.py

numpy/core/src/multiarray/common.c

mattip · 2018-10-10T05:18:49Z

rebased to fix merge conflict

mattip · 2018-11-14T23:30:54Z

Can this go in for 1.16?

mattip · 2018-11-26T16:36:08Z

@eric-wieser, as you were the only one to look at this, can you give this a final review and approve/reject? The change itself is rather minor and only involves removing code.

eric-wieser · 2018-11-26T16:44:44Z

numpy/core/tests/test_multiarray.py

Can you add def test_writeable_from_readonly here to split the tests up a little?

eric-wieser · 2018-11-26T16:45:31Z

numpy/core/tests/test_multiarray.py

And perhaps add def test_writeable_from_buffer as a test name here

eric-wieser

Generally looks good - but the test could do with being split up a little.

eric-wieser · 2018-11-26T23:39:43Z

doc/release/1.16.0-notes.rst


 .. _`NEP 15` : http://www.numpy.org/neps/nep-0015-merge-multiarray-umath.html
 .. _`NEP 18` : http://www.numpy.org/neps/nep-0018-array-function-protocol.html
+Arrays based off readonly buffers cannot be set ``writeable``


Whitespace, but this can be fixed in the 1.16 release

eric-wieser · 2018-11-26T23:40:35Z

Thanks @mattip

matthew-brett · 2018-12-11T13:11:07Z

This one is a bit of a bummer for us over at Nibabel.

We are reading in large (sometimes very large) datasets from disk. We're doing this in Python. Because we've done the reading, we know that the bytes object we read can be modified safely without causing problems for anyone else. Thus we set the writeable flag, to avoid a big copy:

https://github.com/nipy/nibabel/blob/master/nibabel/volumeutils.py#L543

The change in this PR prevented us from doing that. In the spirit of Python's "let me shoot myself in the foot if I really want to, and I claim to know what I'm doing" - is there any way to keep our memory-saving behavior with current numpy?

See: nipy/nibabel#697

mattip · 2018-12-23T10:35:49Z

We currently do not have a way to force the writeable flag to True from Python. You could do this in C. We seem to have conflicting goals of eliminating dangerous behaviour and allowing advanced users to use dangerous behaviour.

matthew-brett · 2018-12-23T11:09:49Z

I don't think the goals really conflict. I think it boils down to: avoid dangerous behavior by default, and ask the user to be explicit that they want dangerous behavior. That is, do not put needless technical barriers in the way of doing dangerous things, but make sure that it's hard to do dangerous things by accident.

william-silversmith · 2019-01-14T20:04:30Z

This change broke my code as well. I have a use case where I download serialized 3D arrays that represent a big image of brain tissue. The images are split into chunks to make random access possible. If someone wants to do a "non-chunk aligned write", we must download the appropriate chunks, render them into a buffer, and shade the overlapping region. However, the decoder spits out numpy arrays set to "write=False". Since these buffers are going to be reuploaded and discarded, it's perfectly safe to write to them and saves memory. In this case, there is a 1:1 relationship between the underlying buffer and the numpy array. That can't be known ahead of time by numpy, it's something the programmer knows.

As we are using numpy to process large images, occasionally it is necessary to ensure that we have 1x memory usage without copies. Taking advantage of tricks like this has enabled us to render 62 GB images without blowing out memory in the past (we were limited to a small number of copies that could be held in memory on a large machine).

My options going forward are to pin the numpy version or make my code less efficient. It would be desirable if this change were either reverted or another avenue was available for "dangerous" modifications.

charris · 2019-01-14T21:30:28Z

@mattip I think we are going to need a workaround here.

eric-wieser · 2019-01-14T23:04:21Z

However, the decoder spits out numpy arrays set to "write=False". Since these buffers are going to be reuploaded and discarded, it's perfectly safe to write to them and saves memory

If this is true, then your decoder has a bug in it. You should not be outputting readonly arrays if your intent is for them to not be readonly.

william-silversmith · 2019-01-14T23:19:04Z

In my case:

raw_data = b'...' # from network
output = np.frombuffer(raw_data, dtype=dtype).reshape(shape, order='F')

mattip · 2019-01-15T02:58:52Z

You would get better performance by using socket.recv_into(array) with a preallocated ndarray, since you can control the lifetime of the ndarray and possibly reuse it. Here is a simple example from Stack Overflow
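The following is a sketch of the recv_into suggestion, not the linked example. The function name recv_array is hypothetical, and socket.socketpair() stands in for a real network connection. Because recv_into may deliver fewer bytes than requested, the sketch loops until the array is full.

```python
import socket

import numpy as np


def recv_array(sock, shape, dtype):
    """Hypothetical helper: receive exactly one array's worth of bytes into a new ndarray."""
    out = np.empty(shape, dtype=dtype)
    # Flat byte view over the array's own memory, so no copy is made.
    byte_view = memoryview(out.reshape(-1).view(np.uint8))
    received = 0
    while received < len(byte_view):
        n = sock.recv_into(byte_view[received:])
        if n == 0:
            raise ConnectionError("socket closed before the array was complete")
        received += n
    return out


# Stand-in for a network peer: a connected pair of local sockets.
sender, receiver = socket.socketpair()
payload = np.arange(6, dtype=np.int32)
sender.sendall(payload.tobytes())

result = recv_array(receiver, (2, 3), np.int32)
result[0, 0] = 99  # the array owns its memory, so writing is allowed

sender.close()
receiver.close()
```

Since the ndarray is allocated by the caller, it is writeable from the start and can be reused for later reads.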

eric-wieser · 2019-01-15T03:11:24Z

As @mattip alludes to, the important line is raw_data = b'...' # from network. If Python has returned a bytes object, then you've already lost - that's Python telling you "you are not allowed to modify this".

NumPy used to let you ignore Python there, but now it does not. The fix is to somehow obtain a bytearray object instead, which should be just as fast.
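One way to obtain a bytearray, sketched under the assumption that the data source is a file-like object supporting readinto. The helper read_array is hypothetical, and io.BytesIO stands in for a real file or download stream.

```python
import io

import numpy as np


def read_array(fileobj, count, dtype):
    """Hypothetical helper: read `count` items into a mutable buffer and wrap it."""
    dtype = np.dtype(dtype)
    buf = bytearray(count * dtype.itemsize)
    # readinto fills the preallocated buffer in place. A raw stream may
    # return fewer bytes than requested; this sketch treats that as an error.
    n = fileobj.readinto(buf)
    if n != len(buf):
        raise EOFError("stream ended before the array was complete")
    # The buffer is mutable, so the resulting array is writeable without a copy.
    return np.frombuffer(buf, dtype=dtype)


payload = np.arange(6, dtype=np.int16).tobytes()
arr = read_array(io.BytesIO(payload), 6, np.int16)
arr[0] = 42
```

No setflags call is needed here, because frombuffer inherits writeability from the underlying bytearray.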

GetArrayFromImage was refactored so the buffer is first passed to this GetArrayViewFromImage. In this case, it does not need to be writable. Remove the restriction to retain used in GetArrayViewFromImage. See also: numpy/numpy#11739

mattip added 00 - Bug component: numpy._core labels Aug 14, 2018

eric-wieser reviewed Aug 15, 2018

eric-wieser reviewed Aug 15, 2018

eric-wieser reviewed Aug 30, 2018

mattip force-pushed the set-write-flag branch 2 times, most recently from f9cae4f to b6c72b8 Compare August 30, 2018 07:50

eric-wieser reviewed Aug 30, 2018

eric-wieser reviewed Aug 30, 2018

mattip force-pushed the set-write-flag branch from b6c72b8 to b2857c0 Compare September 9, 2018 09:42

mattip force-pushed the set-write-flag branch from b2857c0 to b87d4bd Compare October 10, 2018 05:17

mattip force-pushed the set-write-flag branch from b87d4bd to b17d772 Compare October 16, 2018 05:02

eric-wieser reviewed Nov 26, 2018

BUG: disallow setting flag to writeable if isinstance(a.base, (str, bytes)) (a2202b9)

mattip force-pushed the set-write-flag branch from b17d772 to a2202b9 Compare November 26, 2018 18:01

eric-wieser reviewed Nov 26, 2018

eric-wieser merged commit efffc76 into numpy:master Nov 26, 2018

effigies mentioned this pull request Dec 3, 2018

Numpy pre-release breaks setting some arrays writable nipy/nibabel#697

Closed

effigies mentioned this pull request Dec 10, 2018

FIX: Numpy pre-release accommodations nipy/nibabel#700

Merged

mattip deleted the set-write-flag branch December 23, 2018 10:35

ElDeveloper mentioned this pull request Jan 9, 2019

Test suite uncovers issues with latest numpy version scikit-bio/scikit-bio#1648

Closed

william-silversmith mentioned this pull request Jan 14, 2019

Numpy 1.16 breaks setflags(write=True) seung-lab/cloud-volume#181

Closed

william-silversmith mentioned this pull request Jan 15, 2019

read directly into bytearrays in get_file seung-lab/cloud-volume#183

Open

colinpalmer mentioned this pull request Jan 23, 2019

Errors when running against numpy 1.16 ccpem/mrcfile#13

Closed

effigies mentioned this pull request Apr 1, 2019

Cannot set values in GIFTI data array loaded from file nipy/nibabel#746

Closed

pauldmccarthy mentioned this pull request Apr 11, 2019

Support read-only numpy arrays mcfletch/pyopengl#22

Open

thewtex mentioned this pull request Jun 17, 2019

BUG: Do not require PyBUF_WRITABLE in GetArrayViewFromImage InsightSoftwareConsortium/ITK#1027

Merged

brpdt mentioned this pull request May 22, 2020

Unnecessary copying of read-only arrays negates benefit of BLAS saxpy #16347

Closed
