Skip to content

Conversation

@mattip
Copy link
Member

@mattip mattip commented Aug 14, 2018

Fixes #9440. The comment in the code was incorrect, array_setstate used in unpickling does not call _IsWriteable. _IsWriteable is only used in setting flags.

Note that new pickling tests use a large buffer to ensure it goes through PyArray_SetBaseObject in array_setstate, but that the flag is still set to writeable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based off readonly buffers? Seems harmless to allow them to be made readonly if the source buffer was writeable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clarify - readonly buffer

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we ought to rename this to frombuffer at some point, especially once we only support python 3 and fromstring(str) fails but fromstr(bytes) succeeds.

@mattip
Copy link
Member Author

mattip commented Aug 29, 2018

@eric-wieser ping

Copy link
Member

@eric-wieser eric-wieser Aug 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by the purpose of this test - why would a writeable array have a non-writeable base?

Also, on my machine type(vals.base) is always bytes - I don't see any point listing str and unicode here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed non-writable bases

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the fix here is that we should be pickling bytearray objects, not bytes

Copy link
Member

@eric-wieser eric-wieser Aug 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs a comment - there's something special about 1000 here - using anything below 251 (dtype=np.int32 on my machine) does not set .base for me

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added comment. The condition is in array_setstate, to decide to use PyArray_SetBaseObject

@mattip mattip force-pushed the set-write-flag branch 2 times, most recently from f9cae4f to b6c72b8 Compare August 30, 2018 07:50
@mattip
Copy link
Member Author

mattip commented Aug 30, 2018

Squashed to a single commit and rebased

@mattip
Copy link
Member Author

mattip commented Oct 10, 2018

rebased to fix merge conflict

@mattip
Copy link
Member Author

mattip commented Nov 14, 2018

Can this go in for 1.16?

@mattip
Copy link
Member Author

mattip commented Nov 26, 2018

@eric-wieser, as you were the only one to look at this, can you give this a final review and approve/reject? The change itself is rather minor and only involves removing code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add def test_writeable_from_readonly here to split the tests up a little?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And perhaps add def test_writeable_from_buffer as a test name here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

@eric-wieser eric-wieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally looks good - but the test could do with being split up a little.


.. _`NEP 15` : http://www.numpy.org/neps/nep-0015-merge-multiarray-umath.html
.. _`NEP 18` : http://www.numpy.org/neps/nep-0018-array-function-protocol.html
Arrays based off readonly buffers cannot be set ``writeable``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whitespace, but this can be fixed in the 1.16 release

@eric-wieser eric-wieser merged commit efffc76 into numpy:master Nov 26, 2018
@eric-wieser
Copy link
Member

Thanks @mattip

@matthew-brett
Copy link
Contributor

This is one is a bit of a bummer for us over at Nibabel.

We are reading in large (sometimes very large) datasets, from disk. We're doing this in Python. Because we've done the reading, we know that the bytes object we read, can be modified safely without causing problems for anyone else. Thus we set the writeable flag, to avoid a big copy:

https://github.com/nipy/nibabel/blob/master/nibabel/volumeutils.py#L543

The change in this PR prevented us from doing that. In the spirit of Python's "let me shoot myself in the foot if I really want to, and I claim to know what I'm doing" - is there any way to keep our memory-saving behavior with current numpy?

See: nipy/nibabel#697

@mattip
Copy link
Member Author

mattip commented Dec 23, 2018

We currently do not have a way to force the writable flag to True from python. You could do this in C. We seem to have conflicting goals of eliminating dangerous behaviour and allow advanced users to use dangerous behaviour

@mattip mattip deleted the set-write-flag branch December 23, 2018 10:35
@matthew-brett
Copy link
Contributor

I don't think the goals really conflict. I think it boils down to avoid dangerous behavior by default and ask the user to be explicit that they want dangerous behavior. That is, do not put needless technical barriers in the way of doing dangerous things, but make sure that the it's hard to do dangerous things by accident.

@william-silversmith
Copy link

This change broke my code as well. I have a use case where I download serialized 3D arrays that represent a big image of brain tissue. The images are split into chunks to make random access possible. If someone wants to do a "non-chunk aligned write", we must download the appropriate chunks, render them into a buffer, and shade the overlapping region. However, the decoder spits out numpy arrays set to "write=False". Since these buffers are going to be reuploaded and discarded, it's perfectly safe to write to them and saves memory. In this case, there is a 1:1 relationship between the underlying buffer and the numpy array. That can't be known ahead of time by numpy, it's something the programmer knows.

As we are using numpy to processes large images, occasionally it is necessary to ensure that we have 1x memory usage without copies. Taking advantage of tricks like this has enabled us to render 62 GB images without blowing out memory in the past (we were limited to a small number of copies that could be held in memory on a large machine).

My options going forward are to pin the numpy version or make my code less efficient. It would be desirable if this change were either reverted or another avenue was available for "dangerous" modifications.

@charris
Copy link
Member

charris commented Jan 14, 2019

@mattip I think we are going to need a workaround here.

@eric-wieser
Copy link
Member

However, the decoder spits out numpy arrays set to "write=False". Since these buffers are going to be reuploaded and discarded, it's perfectly safe to write to them and saves memory

If this is true, then your decoder has a bug in it. You should not be outputting readonly arrays if your intent is for them to not be readonly.

@william-silversmith
Copy link

william-silversmith commented Jan 14, 2019

In my case:

raw_data = b'...' # from network
output = np.frombuffer(raw_data, dtype=dtype).reshape(shape, order='F')

@mattip
Copy link
Member Author

mattip commented Jan 15, 2019

you would get better performance by using socket.recv_into(array) with a preallocated ndarray, since you can control the lifetime of the ndarray and possibly reuse it. Here is a simple example from stack overflow

@eric-wieser
Copy link
Member

As @mattip alludes to, the important line is raw_data = b'...' # from network. If python has returned a bytes object, then you've already lost - that's python telling you "you are not allowed to modify this".

Numpy used to let you ignore python there, but now it does not. The fix is to somehow obtain a bytearray object instead, which should be just as fast.

thewtex added a commit to thewtex/ITK that referenced this pull request Jun 17, 2019
GetArrayFromImage was refactored so the buffer is first passed to this
GetArrayViewFromImage.  In this case, it does not need to be writable.
Remove the restriction to retain used in GetArrayViewFromImage.

See also:

  numpy/numpy#11739
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants