@gpshead gpshead commented Apr 19, 2018

Fix clang ubsan (undefined behavior sanitizer) warnings in dictobject.c by
adjusting how the internal struct _dictkeysobject shared keys structure is
declared.

This remains ABI compatible. We get rid of the union at the end of the
struct, which was used for convenience to avoid typecasting, in favor of a
simple, appropriately minimum-sized int64_t[1], since [1]-length arrays at the
end of a struct are known to clang to be used for variable sized objects.

A variable length array (VLA) would be more proper and would simplify the
dictobject.c code further by not having to subtract the size of the struct
member in the three places it does size calculations, but PEP-007 does not
allow those in CPython's coding standard today.
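
For readers following along, here is a minimal sketch of the two layouts being compared and of the size arithmetic each one implies. The struct and function names are hypothetical stand-ins, not the real `_dictkeysobject` fields. Note that C99 calls a trailing `[]` member a "flexible array member"; the description above refers to it as a VLA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the [1] layout: a one-element trailing array. */
struct keys_one {
    ptrdiff_t size;
    ptrdiff_t usable;
    int64_t indices[1];   /* really variable sized; [1] is only a placeholder */
};

/* Hypothetical stand-in for the [] layout: a C99 flexible array member. */
struct keys_flex {
    ptrdiff_t size;
    ptrdiff_t usable;
    char indices[];       /* its elements are not counted in sizeof */
};

/* With [1], the placeholder element has to be subtracted back out. */
static size_t alloc_size_one(size_t payload_bytes)
{
    return sizeof(struct keys_one)
           - sizeof(((struct keys_one *)0)->indices)
           + payload_bytes;
}

/* With [], the struct size is already just the header. */
static size_t alloc_size_flex(size_t payload_bytes)
{
    return sizeof(struct keys_flex) + payload_bytes;
}
```

On common ABIs both helpers return the same number of bytes and the trailing member sits at the same offset in both structs, which is the sense in which such a change can stay layout compatible.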

https://bugs.python.org/issue33312

If MSVC on appveyor does not like the VLA I'll go back to [1] instead of [].

gpshead commented Apr 19, 2018

See https://bugs.python.org/issue33312 for discussion.

#endif
} dk_indices;
Dynamically sized, SIZEOF_VOID_P is minimum. */
char dk_indices[]; /* char is required to avoid strict aliasing. */
Member

Shouldn't it be unsigned char?

Member Author

The DKIX_EMPTY constant is -1, and all of the types this was replacing are signed (as are the things we cast it to everywhere), so sticking with char made sense.

I'd prefer to say int8_t, but given that the references I've found only mention char and unsigned char in relation to strict aliasing, I'm being conservative and exactly matching that.
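
To make the signedness point concrete, here is a small sketch of how a -1 sentinel reads back from raw byte storage. The helper names are hypothetical, and filling the table with 0xff bytes is an assumption made for illustration. Because the signedness of plain `char` is implementation-defined, the sketch reads through `int8_t` and `unsigned char` explicitly rather than through `char`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DKIX_EMPTY (-1)

/* Mark every slot empty by setting all bytes to 0xff (assumed scheme). */
static void fill_empty(char *indices, size_t nbytes)
{
    memset(indices, 0xff, nbytes);
}

/* Read slot i of a table of 1-byte indices as a signed value. */
static int read_signed(const char *indices, size_t i)
{
    return ((const int8_t *)indices)[i];
}

/* Read the same byte as an unsigned value. */
static int read_unsigned(const char *indices, size_t i)
{
    return ((const unsigned char *)indices)[i];
}

/* Read slot i of a table of 2-byte indices; memcpy sidesteps aliasing. */
static int read_signed16(const char *indices, size_t i)
{
    int16_t v;
    memcpy(&v, indices + i * sizeof v, sizeof v);
    return v;
}
```

Read as signed, an all-ones slot compares equal to DKIX_EMPTY at either width; read as unsigned, the same byte is 255 and the comparison with -1 would fail.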

.dk_indices = { .as_1 = {DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY,
DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY}},
{DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY,
DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY}, /* dk_indices */
Member

What is the size of the dk_indices field?

Member Author

This is a static initializer. It's my understanding that statically initializing a VLA has the compiler allocate space for however many elements you enter.

example: char foo[] = "hello"
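
A short sketch of that sizing rule for ordinary arrays declared without a length, where the initializer determines the element count. The names `foo_size` and `empty_indices_count` are hypothetical. One caveat, stated as an aside rather than as a description of this PR: statically initializing a flexible array member inside a struct is not covered by ISO C, and GCC documents it as an extension, so the sketch sticks to plain arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DKIX_EMPTY (-1)

/* The string literal supplies five characters plus the terminating NUL. */
static const char foo[] = "hello";

/* Eight initializers, so the compiler sizes the array to eight elements. */
static const int8_t empty_indices[] = {
    DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY,
    DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY, DKIX_EMPTY
};

/* Size in bytes of foo, including the NUL terminator. */
static size_t foo_size(void)
{
    return sizeof foo;
}

/* Number of elements the compiler allocated for empty_indices. */
static size_t empty_indices_count(void)
{
    return sizeof empty_indices / sizeof empty_indices[0];
}
```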

#endif
#define DK_ENTRIES(dk) \
((PyDictKeyEntry*)(&(dk)->dk_indices.as_1[DK_SIZE(dk) * DK_IXSIZE(dk)]))
((PyDictKeyEntry*)(&((int8_t*)((dk)->dk_indices))[DK_SIZE(dk) * DK_IXSIZE(dk)]))
Member

Does the following expression look a tiny bit clearer to you?

((PyDictKeyEntry*)((int8_t*)((dk)->dk_indices) + DK_SIZE(dk) * DK_IXSIZE(dk)))

Member Author

Maybe, but I'll still toss another pair of ()s in there for clarity:

((PyDictKeyEntry*)(((int8_t*)((dk)->dk_indices)) + DK_SIZE(dk) * DK_IXSIZE(dk)))

even though I believe those are equivalent (the cast happens before the +?)
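
On the precedence question: in C a cast is a unary operator and binds more tightly than binary `+`, so the forms discussed here compute the same address. A minimal sketch, using a hypothetical `entry_t` in place of `PyDictKeyEntry` and a plain byte count in place of `DK_SIZE(dk) * DK_IXSIZE(dk)`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for PyDictKeyEntry. */
typedef struct {
    long hash;
    void *key;
    void *value;
} entry_t;

/* Address-of-subscript form, as in the macro under review. */
static entry_t *entries_index_form(char *indices, size_t nbytes)
{
    return (entry_t *)(&((int8_t *)indices)[nbytes]);
}

/* Pointer-plus-offset form: the cast applies to indices before the +. */
static entry_t *entries_plus_form(char *indices, size_t nbytes)
{
    return (entry_t *)((int8_t *)indices + nbytes);
}

/* Same as above with the extra pair of parentheses around the cast. */
static entry_t *entries_paren_form(char *indices, size_t nbytes)
{
    return (entry_t *)(((int8_t *)indices) + nbytes);
}
```

The extra parentheses therefore change readability only, not the result.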

@gpshead gpshead merged commit 397f1b2 into python:master Apr 20, 2018
@miss-islington
Contributor

Thanks @gpshead for the PR 🌮🎉.. I'm working now to backport this PR to: 3.6, 3.7.
🐍🍒⛏🤖

@gpshead gpshead deleted the bjp_issue33312 branch April 20, 2018 05:41
@bedevere-bot

GH-6543 is a backport of this pull request to the 3.7 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Apr 20, 2018
…6537)

Fix clang ubsan (undefined behavior sanitizer) warnings in dictobject.c by
adjusting how the internal struct _dictkeysobject shared keys structure is
declared.

This remains ABI compatible.  We get rid of the union at the end of the
struct, which was used for convenience to avoid typecasting, in favor of a
char[] variable length array at the end of the struct. This is known to clang
to be used for variable sized objects and will not cause an undefined behavior
problem.  Similarly, char arrays do not have strict aliasing undefined
behavior when cast.

PEP-007 does not currently list variable length arrays (VLAs) as allowed
in our subset of C99.  If this turns out to be a problem, the fix to this is
to change the char `dk_indices[]` into `dk_indices[1]` and restore the
three size computation subtractions this change removes:
  `- Py_MEMBER_SIZE(PyDictKeysObject, dk_indices)`

If this works as is I'll make a separate PR to update PEP-007.
(cherry picked from commit 397f1b2)

Co-authored-by: Gregory P. Smith <[email protected]>
@bedevere-bot

GH-6544 is a backport of this pull request to the 3.6 branch.

gpshead pushed a commit that referenced this pull request Apr 20, 2018
…H-6543)

Labels

type-bug An unexpected behavior, bug, or error
