-
-
Notifications
You must be signed in to change notification settings - Fork 33.7k
gh-138122: Add blocking mode for accurate stack traces in Tachyon #142998
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
bb510c0 to
8c21dc9
Compare
|
@ivonastojanovic can you take a look? |
de4448b to
c270764
Compare
Non-blocking sampling reads process memory while the target continues running, which can produce torn stacks when generators or coroutines rapidly switch between yield points. Blocking mode uses atomic process suspension (task_suspend on macOS, NtSuspendProcess on Windows, PTRACE_SEIZE on Linux) to stop the target during each sample, ensuring consistent snapshots. Use blocking mode with longer intervals (1ms+) to avoid impacting the target too much. The default non-blocking mode remains best for most cases since it has zero overhead. Also fix a frame cache bug: the cache was including the last_profiled_frame itself when extending with cached data, but this frame was executing in the previous sample and its line number may have changed. For example, if function A was sampled at line 6, then execution continued to line 10 and called B→C, the next sample would incorrectly report A at line 6 (from cache) instead of line 10. The fix uses start_idx + 1 to only trust frames ABOVE last_profiled_frame — these caller frames are frozen at their call sites and cannot change until their callees return. Signed-off-by: Pablo Galindo <[email protected]>
|
|
||
| // Extend frame_info with frames from start_idx onwards | ||
| PyObject *slice = PyList_GetSlice(entry->frame_list, start_idx, num_frames); | ||
| // Extend frame_info with frames ABOVE start_idx (not including it). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixes a frame cache bug: the cache was including the last_profiled_frame
itself when extending with cached data, but this frame was executing in
the previous sample and its line number may have changed. For example,
if function A was sampled at line 6, then execution continued to line 10
and called B→C, the next sample would incorrectly report A at line 6
(from cache) instead of line 10. The fix uses start_idx + 1 to only trust
frames ABOVE last_profiled_frame: these caller frames are frozen at their
call sites and cannot change until their callees return.
ivonastojanovic
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Just a couple of nits
| collector.collect_failed_sample() | ||
| errors += 1 | ||
| except Exception as e: | ||
| print(e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| print(e) |
| duration_sec = current_time - start_time | ||
| break | ||
| except (RuntimeError, UnicodeDecodeError, MemoryError, OSError): | ||
| except (RuntimeError, UnicodeDecodeError, MemoryError, OSError) as e: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| except (RuntimeError, UnicodeDecodeError, MemoryError, OSError) as e: | |
| except (RuntimeError, UnicodeDecodeError, MemoryError, OSError): |
| # When consume_generator is on the arithmetic lines (temp1, temp2, etc.), | ||
| # fibonacci_generator should NOT be in the stack at all. | ||
| # Line numbers are important here - see ARITHMETIC_LINES below. | ||
| cls.generator_script = ''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Maybe we could use textwrap.dedent here to keep the string nicely formatted. What do you think?
|
|
||
| .. option:: --blocking | ||
|
|
||
| Stop the target process during each sample. This ensures consistent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if pause might sound better than stop here
| Stop the target process during each sample. This ensures consistent | |
| Pause the target process during each sample. This ensures consistent |
Non-blocking sampling reads process memory while the target continues
running, which can produce torn stacks when generators or coroutines
rapidly switch between yield points. Blocking mode uses atomic process
suspension (task_suspend on macOS, NtSuspendProcess on Windows,
PTRACE_SEIZE on Linux) to stop the target during each sample, ensuring
consistent snapshots.
Use blocking mode with longer intervals (1ms+) to avoid impacting the
target too much. The default non-blocking mode remains best for most
cases since it has zero overhead.
📚 Documentation preview 📚: https://cpython-previews--142998.org.readthedocs.build/