Kecho (@kechogarcia) / X

1,723 posts

Kecho

@kechogarcia

Love rendering, games perf and all things GPU. Opinions in this account are strictly personal.

Orlando, FL

Joined November 2010

Kecho
@kechogarcia
Jan 4, 2025
3 lines of code in C: no expensive allocations, a handful of instructions, easier to understand. int begin = 0, end = 0, queue[max_sz]; void push(int e){queue[(end++) % max_sz]=e;} int pop(){return queue[(begin++)%max_sz];}
116K
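A minimal sketch of the same ring-buffer queue with the pieces the post leaves implicit. The capacity `MAX_SZ`, the `queue_count` helper, and the full/empty assertions are assumptions added for illustration; they are not part of the original three lines. Unsigned counters are used so that wraparound stays well defined, which only keeps the indices consistent when the capacity is a power of two.

```c
#include <assert.h>

#define MAX_SZ 8u  /* hypothetical capacity (power of two); the post leaves max_sz undefined */

static int queue[MAX_SZ];
static unsigned begin = 0, end = 0;  /* monotonically increasing read/write counters */

/* Number of elements currently stored (helper added for this sketch). */
static unsigned queue_count(void) { return end - begin; }

static void push(int e)
{
    assert(queue_count() < MAX_SZ);  /* full: a push would overwrite unread data */
    queue[(end++) % MAX_SZ] = e;
}

static int pop(void)
{
    assert(queue_count() > 0);       /* empty: nothing to read */
    return queue[(begin++) % MAX_SZ];
}
```

The original version trades these checks for brevity: without them, a push into a full queue silently overwrites the oldest element.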
Kecho
@kechogarcia
Jan 28, 2025
Big tech AI slop is driven by web dudes who worship abstraction. Efficient GPU programming requires thinking about memory, and web dudes are allergic to memory.
Jebrim
@AgileJebrim
Jan 28, 2025
Replying to @charles_print
Finally might be an exaggeration. I’m skeptical that all the Python folks in OpenAI will adapt and start becoming low level GPU optimizers, or letting themselves be bossed around by us. It’s hard to change the culture.
28K
Kecho
@kechogarcia
Aug 7, 2020
Don't miss our #Siggraph2020 presentation about the lighting of Need for Speed in Frostbite. We will cover Global Illumination, Materials and Reflections! This was a lot of hard work, and we really hope you guys enjoy it :) s2020.siggraph.org/presentation/?…
Kecho
@kechogarcia
Nov 16, 2023
Demo showcasing GPU resident drawer, GPU occlusion culling and the new STP upscaler. Tech that my team (Weta realtime) has been relentlessly working on. More details coming soon :)
Unity
@unity
Nov 16, 2023
Replying to @unity
Check out Fantasy Kingdom in Unity 6, a stylized environment showcasing the latest capabilities for rendering, lighting, and scaling richer worlds, with significant performance improvements. This demo is a modified and expanded version of "Fantasy Kingdom" by @SyntyStudios (2/5)
57K
Kecho
@kechogarcia
Jan 6, 2025
Merge sort in C, roughly same amount of lines. And relatively trivial code. No expensive recursion. No expensive allocations. Only a simple scratch buffer. Easily parallelizable. Very cache friendly. gist.github.com/kecho/b3abcecd…
25K
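The linked gist is truncated here, so the following is not the author's code. It is a minimal sketch of a bottom-up merge sort that matches the properties the post lists: no recursion, no allocations inside the sort, and a single caller-provided scratch buffer. The function names `merge_runs` and `merge_sort` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Merge sorted runs src[lo, mid) and src[mid, hi) into dst[lo, hi). */
static void merge_runs(const int *src, int *dst, size_t lo, size_t mid, size_t hi)
{
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)
        dst[k++] = (src[j] < src[i]) ? src[j++] : src[i++];  /* ties take the left run: stable */
    while (i < mid) dst[k++] = src[i++];
    while (j < hi)  dst[k++] = src[j++];
}

/* Bottom-up merge sort. `scratch` must hold at least n ints. */
static void merge_sort(int *data, int *scratch, size_t n)
{
    assert(n == 0 || (data != NULL && scratch != NULL));
    int *src = data, *dst = scratch;
    for (size_t width = 1; width < n; width *= 2) {
        /* Every pair of runs in this pass is independent of the others. */
        for (size_t lo = 0; lo < n; lo += 2 * width) {
            size_t mid = lo + width < n ? lo + width : n;
            size_t hi  = lo + 2 * width < n ? lo + 2 * width : n;
            merge_runs(src, dst, lo, mid, hi);
        }
        int *tmp = src; src = dst; dst = tmp;  /* ping-pong between the two buffers */
    }
    if (src != data)
        memcpy(data, src, n * sizeof *data);   /* result ended up in scratch */
}
```

Because the merges within one pass touch disjoint ranges, each pass could be split across threads, and every merge reads and writes memory sequentially.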
Kecho
@kechogarcia
Apr 21, 2021
Today I am releasing Noice. github.com/kecho/Noice A short side project written in #ISPC for Windows & Linux. It is a command-line 2D/3D texture noise generation utility. I hope it's something useful, and feel free to try it. Binaries: github.com/kecho/Noice/re…
Kecho
@kechogarcia
Mar 27, 2023
14K
Kecho
@kechogarcia
Nov 30, 2020
Today concludes my last day at EA, after almost 10 years. It's bittersweet. I will miss everyone who made my experience so pleasant. On to the next adventure: I'll be joining @unity3d starting in January. Now time to relax...
Kecho
@kechogarcia
Jul 3, 2022
Progress on my compute rasterizer (side project): github.com/kecho/grr I am able to do everything in Python + HLSL! Also, 5 ms on a 2070 for the Stanford dragon (2 million vertices) at 4K. More optimizations to come! I have also integrated parts of implot and more imgui functionality.
Kecho
@kechogarcia
Jan 9, 2025
Don't do separable filtering in compute; do it in a pixel shader, which is way faster. To match the pixel shader, read and write in groups of 4 using a Morton pattern. To beat it, skip separable: build the MIP chain first, 4 mips at a time, using LDS. Use a Poisson disk kernel, with the MIP index for large jumps. For large kernels, prefer a Gaussian pyramid.
Acerola
@Acerola_t
Jan 7, 2025
Writing compute shaders for godot compositor effects has been pretty tiresome so I spent the past few days thinking about and writing an interpreter for a wrapper language for glsl compute shaders to allow multiple kernels in one shader file. Since the godot shader compiler
19K
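The Morton pattern mentioned in the reply above can be illustrated with a small index-to-coordinate mapping. This is a CPU-side sketch in C rather than shader code, and the names `compact_even_bits` and `morton_decode` are hypothetical. It shows how a linear index maps to 2D coordinates so that every 4 consecutive indices cover one 2x2 pixel quad.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Keep the bits at even positions of v and pack them into the low 16 bits. */
static uint32_t compact_even_bits(uint32_t v)
{
    v &= 0x55555555u;
    v = (v | (v >> 1)) & 0x33333333u;
    v = (v | (v >> 2)) & 0x0F0F0F0Fu;
    v = (v | (v >> 4)) & 0x00FF00FFu;
    v = (v | (v >> 8)) & 0x0000FFFFu;
    return v;
}

/* Map a linear (Z-order) index to 2D coordinates: x from even bits, y from odd bits. */
static void morton_decode(uint32_t index, uint32_t *x, uint32_t *y)
{
    assert(x != NULL && y != NULL);
    *x = compact_even_bits(index);
    *y = compact_even_bits(index >> 1);
}
```

With this mapping, indices 0 to 3 land on (0,0), (1,0), (0,1), (1,1), and indices 4 to 7 cover the neighboring quad starting at (2,0).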
Kecho
@kechogarcia
Mar 15, 2021
Replying to @BartWronsk
Kecho
@kechogarcia
Dec 20, 2023
Stable radix sorting on the GPU (8 bits per radix). There are not many accessible resources on doing it stably. Here it goes:
1. Count & Scatter
2. Prefix batch table
3. Global Prefix
4. Scatter Output
5. Repeat 1-4 for each radix.
(Details & code in thread)
21K
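The thread with the actual GPU code is not included here, so the following is only one possible reading of the steps above, written as a sequential CPU sketch in C. Each "batch" stands in for a GPU thread group; `BATCH_SIZE`, `MAX_BATCHES`, and the function names are hypothetical. What it demonstrates is the offset arithmetic that keeps the scatter stable: the digit's global start, plus the count of that digit in earlier batches, plus the element's rank within its own batch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RADIX       256u  /* 8 bits per pass, as in the post */
#define BATCH_SIZE  4u    /* hypothetical; tiny so small inputs span several batches */
#define MAX_BATCHES 64u   /* hypothetical cap so the table has a fixed size */

/* One stable pass over the 8-bit digit found at bit offset `shift`. */
static void radix_pass(const uint32_t *src, uint32_t *dst, uint32_t n, uint32_t shift)
{
    static uint32_t table[MAX_BATCHES][RADIX];
    uint32_t global[RADIX];
    uint32_t batches = (n + BATCH_SIZE - 1) / BATCH_SIZE;
    assert(batches <= MAX_BATCHES);

    /* Count: one digit histogram per batch. */
    memset(table, 0, sizeof table);
    for (uint32_t i = 0; i < n; ++i)
        table[i / BATCH_SIZE][(src[i] >> shift) & 0xFFu]++;

    /* Prefix batch table: per digit, exclusive sum across batches. */
    for (uint32_t d = 0; d < RADIX; ++d) {
        uint32_t sum = 0;
        for (uint32_t b = 0; b < batches; ++b) {
            uint32_t c = table[b][d];
            table[b][d] = sum;
            sum += c;
        }
        global[d] = sum;  /* total occurrences of digit d */
    }

    /* Global prefix: exclusive sum over the digit totals. */
    uint32_t sum = 0;
    for (uint32_t d = 0; d < RADIX; ++d) {
        uint32_t c = global[d];
        global[d] = sum;
        sum += c;
    }

    /* Scatter output: visiting elements in input order keeps equal digits in order. */
    for (uint32_t i = 0; i < n; ++i) {
        uint32_t d = (src[i] >> shift) & 0xFFu;
        dst[global[d] + table[i / BATCH_SIZE][d]++] = src[i];
    }
}

/* Four 8-bit passes sort 32-bit keys; the result ends up back in `data`. */
static void radix_sort(uint32_t *data, uint32_t *scratch, uint32_t n)
{
    for (uint32_t shift = 0; shift < 32; shift += 16) {
        radix_pass(data, scratch, n, shift);
        radix_pass(scratch, data, n, shift + 8);
    }
}
```

Stability matters because each later pass relies on the previous pass's ordering of keys that share the current digit.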
Kecho
@kechogarcia
Sep 15, 2021
Wishing all crypto farms would catch on fire. GPU dev here, can't get a 3090 to save my life. I give up.
Kecho
@kechogarcia
Jan 25, 2025
You can use mine, which is multi-pass and handles arbitrary sizes as well as indirect, and uses wave intrinsics. There's also a new one from AMD that is single pass!
Ben Sims
@BenSimsTech
Jan 24, 2025
It was difficult to find a simple, concise parallel prefix sum example online, even the GPU gems code was obscure and had bugs. I wrote a simple one, only handles a 1024 array, but can be expanded to larger arrays with multiple passes. Not the most efficient method, but simple
gpu_algorithms/gpu_algorithms/gpu/prefix_sum.hlsl at main · kecho/gpu_algorithms
From github.com
11K
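The multi-pass idea discussed in this exchange can be sketched on the CPU. This is not the linked HLSL and uses no wave intrinsics; it is a plain C illustration under assumed names and sizes (`BLOCK`, `MAX_BLOCKS`, `prefix_sum` are hypothetical). Each block plays the role of one GPU thread group: scan each block on its own, scan the per-block totals, then add each block's offset back.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK      4u    /* hypothetical block size; a GPU group might cover 1024 elements */
#define MAX_BLOCKS 256u  /* hypothetical cap so the totals array has a fixed size */

/* In-place exclusive prefix sum over data[0, n). */
static void prefix_sum(unsigned *data, size_t n)
{
    unsigned totals[MAX_BLOCKS];
    size_t blocks = (n + BLOCK - 1) / BLOCK;
    assert(blocks <= MAX_BLOCKS);

    /* Pass 1: independent exclusive scan inside each block; record block totals. */
    for (size_t b = 0; b < blocks; ++b) {
        size_t lo = b * BLOCK;
        size_t hi = lo + BLOCK < n ? lo + BLOCK : n;
        unsigned sum = 0;
        for (size_t i = lo; i < hi; ++i) {
            unsigned v = data[i];
            data[i] = sum;
            sum += v;
        }
        totals[b] = sum;
    }

    /* Pass 2: exclusive scan of the block totals gives each block's starting offset. */
    unsigned sum = 0;
    for (size_t b = 0; b < blocks; ++b) {
        unsigned t = totals[b];
        totals[b] = sum;
        sum += t;
    }

    /* Pass 3: add each block's offset to its elements. */
    for (size_t i = 0; i < n; ++i)
        data[i] += totals[i / BLOCK];
}
```

On a GPU, pass 2 would itself be a scan dispatch, applied recursively when the number of blocks exceeds one group's capacity. That is how a fixed-size scan extends to larger arrays.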