
Conversation

@brandtbucher (Member) commented Nov 6, 2025

Example flamegraph from one of the tests:

[flamegraph screenshot]

📚 Documentation preview 📚: https://cpython-previews--141108.org.readthedocs.build/

… in the sampling profiler

- Introduce a new field in the GC state to store the frame that initiated garbage collection.
- Update RemoteUnwinder to include options for including "<native>" and "<GC>" frames in the stack trace.
- Modify the sampling profiler to accept parameters for controlling the inclusion of native and GC frames.
- Enhance the stack collector to properly format and append these frames during profiling (a rough sketch of the idea follows this list).
- Add tests to verify the correct behavior of the profiler with respect to native and GC frames, including options to exclude them.
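
As a rough illustration only (the option names, tuple layout, and ordering here are assumptions, not the PR's actual API): the collector can splice synthetic pseudo-frames into each sampled stack, which is why they show up later in the stats output as `~:0(<native>)`.

```python
# Illustrative sketch, not the code from this PR: option names, the frame
# tuple layout, and innermost-first ordering are assumptions for the example.

NATIVE_FRAME = ("~", 0, "<native>")  # rendered as ~:0(<native>) in the stats
GC_FRAME = ("~", 0, "<GC>")

def build_sample_stack(python_frames, *, in_native, in_gc,
                       include_native=True, include_gc=True):
    """Return one sample's stack as (filename, lineno, funcname) tuples,
    innermost frame first."""
    stack = []
    if include_gc and in_gc:
        stack.append(GC_FRAME)        # the thread was inside the collector
    if include_native and in_native:
        stack.append(NATIVE_FRAME)    # the thread was executing C code
    stack.extend(python_frames)       # the ordinary Python frames
    return stack
```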
@brandtbucher brandtbucher self-assigned this Nov 6, 2025
@brandtbucher brandtbucher added the type-feature A feature request or enhancement label Nov 6, 2025
@brandtbucher brandtbucher requested a review from 1st1 as a code owner November 6, 2025 03:51
@brandtbucher brandtbucher added sprint interpreter-core (Objects, Python, Grammar, and Parser dirs) labels Nov 6, 2025
@brandtbucher brandtbucher added the stdlib Standard Library Python modules in the Lib/ directory label Nov 6, 2025
@pablogsal (Member)

It seems that there is either some kind of race, or the Windows tests somehow don't trigger the GC:

Profile Stats:
       nsamples   sample%  tottime (ms)    cumul%   cumtime (s)  filename:lineno(function)
     1510/36080      63.0       151.000    1505.8         3.608  tmpiafjtexk:10(slow_fibonacci)
        0/26017       0.0         0.000    1085.9         2.602  ~:0(<native>)
         0/2395       0.0         0.000     100.0         0.240  _sync_coordinator.py:227(main)
         0/2395       0.0         0.000     100.0         0.240  _sync_coordinator.py:244(<module>)
         0/2395       0.0         0.000     100.0         0.240  runpy.py:88(_run_code)
         0/2395       0.0         0.000     100.0         0.240  runpy.py:198(_run_module_as_main)
         1/2374       0.0         0.100      99.1         0.237  _sync_coordinator.py:186(_execute_script)
         0/2370       0.0         0.000      98.9         0.237  tmpiafjtexk:50(<module>)
         0/2287       0.0         0.000      95.5         0.229  tmpiafjtexk:44(main_loop)
        658/658      27.5        65.800      27.5         0.066  tmpiafjtexk:5(slow_fibonacci)
          91/91       3.8         9.100       3.8         0.009  tmpiafjtexk:7(slow_fibonacci)
           0/83       0.0         0.000       3.5         0.008  tmpiafjtexk:43(main_loop)
          39/39       1.6         3.900       1.6         0.004  tmpiafjtexk:16(cpu_intensive_work)
          35/35       1.5         3.500       1.5         0.004  tmpiafjtexk:17(cpu_intensive_work)
          29/29       1.2         2.900       1.2         0.003  tmpiafjtexk:8(slow_fibonacci)

Legend:
  nsamples: Direct/Cumulative samples (direct executing / on call stack)
  sample%: Percentage of total samples this function was directly executing
  tottime: Estimated total time spent directly in this function
  cumul%: Percentage of total samples when this function was on the call stack
  cumtime: Estimated cumulative time (including time in called functions)
  filename:lineno(function): Function location and name

Summary of Interesting Functions:

Functions with Highest Direct/Cumulative Ratio (Hot Spots):
  1.000 direct/cumulative ratio, 3.1% direct samples: tmpiafjtexk:(cpu_intensive_work)
  0.062 direct/cumulative ratio, 95.5% direct samples: tmpiafjtexk:(slow_fibonacci)
  0.000 direct/cumulative ratio, 0.0% direct samples: _sync_coordinator.py:(_execute_script)

Functions with Highest Call Frequency (Indirect Calls):
  34570 indirect calls, 1538.3% total stack presence: tmpiafjtexk:(slow_fibonacci)
  26017 indirect calls, 1085.9% total stack presence: ~:(<native>)
  2395 indirect calls, 100.0% total stack presence: _sync_coordinator.py:(main)

Functions with Highest Call Magnification (Cumulative/Direct):
  2374.0x call magnification, 2373 indirect calls from 1 direct: _sync_coordinator.py:(_execute_script)
  16.1x call magnification, 34570 indirect calls from 2288 direct: tmpiafjtexk:(slow_fibonacci)
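
(As an aside, not part of this PR: the names below are purely illustrative, but one way a test could guarantee the collector actually runs while samples are taken is to churn cyclic garbage and force collections explicitly, rather than waiting on the allocation thresholds.)

```python
# Not from this PR -- just a sketch of a workload that cannot avoid the GC
# while being sampled, to sidestep the "GC never triggered" race above.
import gc

class Node:
    def __init__(self):
        self.ref = self  # reference cycle, only reclaimable by the cyclic GC

def churn(iterations=200_000):
    gc.set_threshold(5)      # make generation-0 collections very frequent
    for _ in range(iterations):
        Node()               # every iteration creates cyclic garbage
        gc.collect(0)        # and forces a collection, removing the timing race
```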

@pablogsal (Member)

Another possibility is that the machines are too slow and we never even get to run under the GC somehow?

@pablogsal (Member)

Maybe slow_fibonacci is too slow? 😆

@pablogsal (Member) commented Nov 8, 2025

I am thinking that <native> is useful, but perhaps it's a bit noisy if you are not hunting for it? Should we default it to False?

Another idea: when there is a C function on the stack, maybe in another PR we can fetch the C function name and use that as the code?

@pablogsal (Member)

I have pushed some new tests and fixes; hopefully this does the trick.

@brandtbucher (Member, Author)

The flakiness of these sorts of tests is... annoying. Quitting for the night.

@pablogsal (Member)

> The flakiness of these sorts of tests is... annoying. Quitting for the night.

I feel you. Unfortunately it's very hard to write correct code here, as it's fundamentally a race condition between the function being profiled and the profiler. Especially on slow machines it's a pain.

I recommend doing one thing and one thing only per test

@pablogsal (Member)

@brandtbucher a suggestion if you keep struggling with CI: just add the GC switch in this PR and figure out native mode later, as that is currently less useful and it's giving us trouble.

@brandtbucher (Member, Author)

I think it's an ASan-specific thing (I can reproduce locally). I'll figure out what's going on later.

@brandtbucher (Member, Author)

I thought I was being clever when I also added support for native frames at the very top of the stack in a recent commit, but that only works on debug builds (where we clear the stack pointer upon resuming a Python frame). 🤦🏼‍♂️

Reverting, this version only finds native frames in the middle of the stack now.
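
Roughly, the "middle of the stack" case can be spotted without a stack pointer, because re-entering Python from C goes through an entry/shim frame owned by the C stack. The sketch below uses invented attribute names and is not the unwinder's real code:

```python
# Sketch only: attribute names are invented.  The idea is that a frame owned
# by the C stack (the shim pushed when Python is re-entered from native code)
# marks a spot where C code sits between two Python frames, so a "<native>"
# pseudo-frame can be emitted there.  The frame at the very top of the stack
# has no such marker, which is why that case needed the (debug-only) cleared
# stack pointer.

def walk_remote_frames(frames, include_native=True):
    stack = []
    for frame in frames:                    # innermost to outermost
        if frame.owned_by_c_stack:          # entry/shim frame
            if include_native:
                stack.append(("~", 0, "<native>"))
            continue                        # shim frames carry no Python code
        stack.append((frame.filename, frame.lineno, frame.funcname))
    return stack
```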

@pablogsal (Member)

> I thought I was being clever when I also added support for native frames at the very top of the stack in a recent commit, but that only works on debug builds (where we clear the stack pointer upon resuming a Python frame). 🤦🏼‍♂️
>
> Reverting, this version only finds native frames in the middle of the stack now.

Haha nice!

I assume this means that you prefer to go with GC + native in this PR then, no?

@brandtbucher (Member, Author)

Yeah, I’m happy with the current state. We can beef up the native features later if they’re worth the performance hit.

@pablogsal (Member) left a comment

LGTM! Amazing work 💪

Left some small comments

@github-project-automation github-project-automation bot moved this from Todo to In Progress in Sprint 2024 Nov 13, 2025
@pablogsal (Member)

Fixed some merge conflicts

@pablogsal pablogsal enabled auto-merge (squash) November 17, 2025 13:32
@pablogsal pablogsal merged commit 336366f into python:main Nov 17, 2025
46 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in Sprint 2024 Nov 17, 2025
@bedevere-bot

⚠️⚠️⚠️ Buildbot failure ⚠️⚠️⚠️

Hi! The buildbot AMD64 FreeBSD14 3.x (tier-3) has failed when building commit 336366f.

What do you need to do:

  1. Don't panic.
  2. Check the buildbot page in the devguide if you don't know what the buildbots are or how they work.
  3. Go to the page of the buildbot that failed (https://buildbot.python.org/#/builders/1232/builds/7240) and take a look at the build logs.
  4. Check if the failure is related to this commit (336366f) or if it is a false positive.
  5. If the failure is related to this commit, please, reflect that on the issue and make a new Pull Request with a fix.

You can take a look at the buildbot page here:

https://buildbot.python.org/#/builders/1232/builds/7240

Failed tests:

  • test_profiling

Failed subtests:

  • test_process_pool_executor_pickle - test.test_profiling.test_sampling_profiler.TestProcessPoolExecutorSupport.test_process_pool_executor_pickle

Summary of the results of the build (if available):

==

Traceback logs:
Traceback (most recent call last):
  File "/home/buildbot/buildarea/3.x.opsec-fbsd14/build/Lib/test/test_profiling/test_sampling_profiler.py", line 3354, in test_process_pool_executor_pickle
    self.assertIn("Results: [2, 4, 6]", stdout)
    ~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError: 'Results: [2, 4, 6]' not found in ''

@pablogsal (Member)

This is not from this PR; I will take a look.

@pablogsal (Member)

Fixing in #141688 (comment)

StanFromIreland pushed a commit to StanFromIreland/cpython that referenced this pull request Dec 6, 2025
…filer (python#141108)

- Introduce a new field in the GC state to store the frame that initiated garbage collection.
- Update RemoteUnwinder to include options for including "<native>" and "<GC>" frames in the stack trace.
- Modify the sampling profiler to accept parameters for controlling the inclusion of native and GC frames.
- Enhance the stack collector to properly format and append these frames during profiling.
- Add tests to verify the correct behavior of the profiler with respect to native and GC frames, including options to exclude them.

Co-authored-by: Pablo Galindo Salgado <[email protected]>
