
Comparing changes

base repository: ggml-org/llama.cpp, base: b7062
head repository: ggml-org/llama.cpp, compare: b7063
  • 1 commit
  • 1 file changed
  • 1 contributor

Commits on Nov 15, 2025

  1. vulkan: change graph_compute to be async and enable get_tensor_async (#17158)
    
    * vulkan: change graph_compute to be async and enable get_tensor_async
    
    This allows some additional CPU/GPU overlap for large prompt-processing (pp)
    workloads. It also seems to help a bit for token generation, perhaps by
    removing a small bubble between graph_compute and get_tensor.
    
    Async set and copy functions seem to be very rarely used, so I didn't enable
    them because I didn't have a good way to test them.
    
    The async commands need to be ordered against each other, so put them all on
    the compute queue. The non-async commands still use the transfer queue.
    
    The fence for graph_compute/get_tensor_async is submitted and waited on in
    ggml_vk_synchronize.
    
    * fix thread safety errors
    
    * teardown context cleanly
    
    * Handle async read to non-pinned dst
    jeffbolznv authored Nov 15, 2025
    Commit 38eaf32