| CUDA Green Context API \| Memory Footprint | | 2 | 53 | December 17, 2025 |
| TensorFlow + RTX 5090 + WSL: CUDA 12 Installed in WSL but Windows Driver Uses CUDA 13 | | 2 | 59 | December 17, 2025 |
| Looking for advice for CUDA performance tracking in CI/CD pipelines | | 2 | 17 | December 17, 2025 |
| Pinned memory throughput significantly lower on Ubuntu than on Windows | | 22 | 183 | December 17, 2025 |
| Displaymodeselector + RTX PRO 6000 blackwell workstation edition | | 4 | 133 | December 16, 2025 |
| Double4 is deprecated, but the preferred double4_32a is unrecognized? | | 6 | 24 | December 16, 2025 |
| How to sync Cuda and Vulkan? | | 2 | 21 | December 16, 2025 |
| Nvcc, syntax error in cuda.h(7451): error: expected a ")" | | 3 | 49 | December 16, 2025 |
| Wmma vs Wgmma On H100 GPU | | 4 | 29 | December 15, 2025 |
| Thrust device allocator vs std allocator | | 3 | 38 | December 15, 2025 |
| Architectural insights needed: Why is the MIG 3g.71gb instance consistently the "Efficiency Sweet Spot" on H200? | | 4 | 72 | December 15, 2025 |
| Weekend project: Very accurate double-precision sincos() implementation for a restricted domain | | 0 | 23 | December 14, 2025 |
| Pixel Shader vs NPP - Which is faster for batch processing NV12 to RGB conversions and display directly to screen? | | 5 | 63 | December 14, 2025 |
| Fedora 43 and NVCC / Cuda13.1 error "exception specification is incompatible" rsqrt / rsqrtf | | 0 | 46 | December 13, 2025 |
| Register usage spike in SASS with division slow/full path | | 13 | 204 | December 12, 2025 |
| RTX 5090 not working with PyTorch and Stable Diffusion (sm_120 unsupported) | | 10 | 8062 | December 12, 2025 |
| Need a Windows 11 Driver for a M10 | | 3 | 25 | December 12, 2025 |
| Question about the cacheConfig value in nsight systems | | 6 | 52 | December 12, 2025 |
| Is the CUDA tile kernel submitted to GPU still using the cuLaunchKernel? | | 2 | 50 | December 12, 2025 |
| Unexpected results on cub::DeviceRadixSort::SortKeys and SortPairs with 128 bit keys | | 5 | 22 | December 12, 2025 |
| How many tensor cores to execute the wmma.mma.sync.aligned.{alayout}.{blayout}.m16n16k16 instruction? | | 23 | 117 | December 12, 2025 |
| Cuda runfile won't extract | | 4 | 128 | December 11, 2025 |
| Compiling magma on Jetson Thor | | 0 | 13 | December 11, 2025 |
| __frsqrt_rn is not accurate 0.5ulp? I found a number | | 4 | 42 | December 10, 2025 |
| FFMA with Uniform register | | 3 | 72 | December 9, 2025 |
| Can't install CUDA and Nsight - Visual Studio or what? (Updated) | | 4 | 132 | December 9, 2025 |
| Is it possible having compressible memory & memory pools over the same array on device? | | 0 | 28 | December 9, 2025 |
| CUDA-Q kernel crashes on Tesla V100 (Driver 570.133 / CUDA 12.8) when running VQE | | 0 | 20 | December 9, 2025 |
| cudaMemcpyBatchAsync cannot aggregate D2D copy operations | | 13 | 111 | December 9, 2025 |
| Training YOLO in the background | | 1 | 44 | December 8, 2025 |