gh-98896: Fix parsing issue in resource_tracker to allow shared memory names containing colons #138473
gh-98896: Fix parsing issue in resource_tracker to allow shared memory names containing colons #138473encukou merged 7 commits intopython:mainfrom
Conversation
13ca553 to
d973cf0
Compare
|
Hi Petr,
Thanks for your email (and also the one you sent later) - it is very kind
of you.
I try to address, indeed, the newlines - by encoding and decoding the
shared_memory name. So a different approach than before. See the updated PR.
Regards,
Rani
…On Tue, Nov 4, 2025 at 2:56 PM Petr Viktorin ***@***.***> wrote:
***@***.**** commented on this pull request.
Hello! Sorry for the delay in reviewing.
The lix looks good for colons, but a comment on the issue also mentioned
*newlines* in filenames. Do you want to tackle that, or leave it to
another PR?
------------------------------
In Lib/multiprocessing/resource_tracker.py
<#138473 (comment)>:
> + parts = line.strip().decode('ascii').split(':')
+ if len(parts) < 3:
+ raise ValueError("malformed resource_tracker message: %r" % (parts,))
+ cmd = parts[0]
+ rtype = parts[-1]
+ name = ':'.join(parts[1:-1])
How about this?
⬇️ Suggested change
- parts = line.strip().decode('ascii').split(':')
- if len(parts) < 3:
- raise ValueError("malformed resource_tracker message: %r" % (parts,))
- cmd = parts[0]
- rtype = parts[-1]
- name = ':'.join(parts[1:-1])
+ cmd, *name_parts, rtype = line.strip().decode('ascii').split(':')
+ name = ':'.join(name_parts)
—
Reply to this email directly, view it on GitHub
<#138473 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AH6O62VUX7TDUCM6PR6DYN333CV67AVCNFSM6AAAAACFRVHJJWVHI2DSMVQWIX3LMV43YUDVNRWFEZLROVSXG5CSMV3GSZLXHMZTIMJWGU4DAOBUGY>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
encukou
left a comment
There was a problem hiding this comment.
Thank you!
This approach feels ad-hoc; did you consider a standard encoding like JSON lines or pickle?
(Python's JSON encoder won't output newlines unless you ask for indentation; pickle.load reads one object from a file and stops.)
But I can merge this if you don't want to explore those options.
| cmd, name, rtype = line.strip().decode('ascii').split(':') | ||
| cmd, enc_name, rtype = line.rstrip(b'\n').decode('ascii').split(':', 2) | ||
| if rtype == "shared_memory": | ||
| name = base64.urlsafe_b64decode(enc_name.encode('ascii')).decode('utf-8', 'surrogateescape') |
There was a problem hiding this comment.
Decoding is OK with strings.
| name = base64.urlsafe_b64decode(enc_name.encode('ascii')).decode('utf-8', 'surrogateescape') | |
| name = base64.urlsafe_b64decode(enc_name).decode('utf-8', 'surrogateescape') |
There was a problem hiding this comment.
Thanks for your comment. I like your suggestion to consider using JSON to encode the message.
Also, in the original implementation there was a check that the message is not longer than 512 bytes, since writes shorter than PIPE_BUF (512 bytes on POSIX) are guaranteed to be atomic. But there was no check on the name length.
Now I check that the name is at most 255 bytes long, which is the value of NAME_MAX on Linux (including the leading slash that POSIX requires in shared memory and semaphore names).
I still encode the name using Base64, because json.dumps(..., ensure_ascii=True) would otherwise expand each non-ASCII byte into a 6-character escape like \uDC80. Using Base64 ensures that a 255-byte name becomes at most 340 bytes long, so the total JSON message always remains well below 512 bytes.
As a result, the previous runtime check for the message length is now replaced by an assert.
What do you think?
…the sent length below 512
encukou
left a comment
There was a problem hiding this comment.
Hm, good catch about still needing base64 for atomicity.
Hopefully, humans won't need to inspect the stream.
| line = raw.rstrip(b'\n') | ||
| try: | ||
| obj = json.loads(line.decode('ascii')) |
There was a problem hiding this comment.
Stripping and decoding shouldn't be needed; json.loads can handle bytes and trailing newlines.
| if not isinstance(cmd, str) or not isinstance(rtype, str) or not isinstance(b64, str): | ||
| raise ValueError("malformed resource_tracker fields: %r" % (obj,)) | ||
|
|
||
| enc = b64.encode('ascii') |
There was a problem hiding this comment.
urlsafe_b64decode can handle bytes as well.
| cmd = obj.get("cmd") | ||
| rtype = obj.get("rtype") | ||
| b64 = obj.get("base64_name") |
There was a problem hiding this comment.
You don't need get with constant arguments:
| cmd = obj.get("cmd") | |
| rtype = obj.get("rtype") | |
| b64 = obj.get("base64_name") | |
| cmd = obj["cmd"] | |
| rtype = obj["rtype"] | |
| b64 = obj["base64_name"] |
There was a problem hiding this comment.
Alternately, make some fields optional -- then the PROBE message can be shorter:
| cmd = obj.get("cmd") | |
| rtype = obj.get("rtype") | |
| b64 = obj.get("base64_name") | |
| cmd = obj["cmd"] | |
| rtype = obj["rtype"] | |
| b64 = obj.get("base64_name", "") |
|
🤖 New build scheduled with the buildbot fleet by @encukou for commit ea6416e 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F138473%2Fmerge If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again. |
|
Looks good! Thank you for the fix! |
|
Thanks @rani-pinchuk for the PR, and @encukou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.13. |
|
Thanks @rani-pinchuk for the PR, and @encukou for merging it 🌮🎉.. I'm working now to backport this PR to: 3.14. |
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
| res = assert_python_failure("-c", code, PYTHONWARNINGS='error') | ||
| self.assertIn(b'DeprecationWarning', res.err) | ||
| self.assertIn(b'is multi-threaded, use of forkpty() may lead to deadlocks in the child', res.err) | ||
|
|
There was a problem hiding this comment.
This should have been a double line gap. I'll fix it in the backports.
…itrary shared memory names (pythonGH-138473) (cherry picked from commit c6f3dd6) Co-authored-by: Rani Pinchuk <[email protected]>
|
GH-141922 is a backport of this pull request to the 3.14 branch. |
… shared memory names (GH-138473) (GH-141922) Co-authored-by: Rani Pinchuk <[email protected]>
…itrary shared memory names (pythonGH-138473) (pythonGH-141922) (cherry picked from commit 64d6bde) Co-authored-by: Stan Ulbrych <[email protected]> Co-authored-by: Rani Pinchuk <[email protected]>
|
GH-142014 is a backport of this pull request to the 3.13 branch. |
… shared memory names (GH-138473) (GH-142014) (cherry picked from commit 64d6bde) Co-authored-by: Stan Ulbrych <[email protected]> Co-authored-by: Rani Pinchuk <[email protected]>
|
this wound up causing some disruption in the stable branch backports - it is fine to keep this as is in main. but we may need to rework a protocol compatible solution in 3.14 and 3.13. #142206 |
…shared memory names (pythonGH-138473)
Shared memory names containing colons were not parsed correctly as the code of
resource_trackerassumed that these names contain no colons.@encukou