Kecho
1,723 posts
Kecho
@kechogarcia
Love rendering, games perf and all things GPU. Opinions in this account are strictly personal.
Orlando, FL
Joined November 2010
723 Following · 2,675 Followers
  • Kecho
    @kechogarcia
    Jan 4, 2025
    3 lines of code in C: no expensive allocations, a handful of instructions, easier to understand. int begin = 0, end = 0, queue[max_sz]; void push(int e){queue[(end++) % max_sz]=e;} int pop(){return queue[(begin++)%max_sz];}
    116K
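A minimal sketch of the same ring-buffer queue with the edge cases made explicit. The full and empty checks, the unsigned counters, and the power-of-two `MAX_SZ` are assumptions added here, not part of the original post; the post's version leaves overflow and underflow to the caller.

```c
#include <assert.h>

/* Assumed capacity; a power of two keeps indexing consistent when the
   unsigned counters eventually wrap around. */
#define MAX_SZ 8u

static unsigned begin = 0, end = 0;  /* ever-increasing read/write counters */
static int queue[MAX_SZ];

/* Number of elements currently stored. */
static unsigned count(void) { return end - begin; }

/* Returns 1 on success, 0 if the queue is full. */
static int push(int e) {
    if (count() == MAX_SZ) return 0;
    queue[end++ % MAX_SZ] = e;
    return 1;
}

/* Returns 1 and writes the oldest element to *out, or 0 if empty. */
static int pop(int *out) {
    assert(out != 0);
    if (count() == 0) return 0;
    *out = queue[begin++ % MAX_SZ];
    return 1;
}
```

Storage is still a single fixed array and each operation is a compare, an index, and an increment, so the "no allocations, handful of instructions" property of the original is kept.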
  • Kecho
    @kechogarcia
    Jan 28, 2025
    Big tech AI slop is driven by web dudes who worship abstraction. Efficient GPU programming requires thinking about memory, and web dudes are allergic to memory.
    Jebrim
    @AgileJebrim
    Jan 28, 2025
    Replying to @charles_print
    Finally might be an exaggeration. I’m skeptical that all the Python folks in OpenAI will adapt and start becoming low level GPU optimizers, or letting themselves be bossed around by us. It’s hard to change the culture.
    28K
  • Kecho
    @kechogarcia
    Aug 7, 2020
    Don't miss our #Siggraph2020 presentation about the lighting of Need for Speed in Frostbite. We will cover Global Illumination, Materials and Reflections! This was a lot of hard work, and we really hope you guys enjoy it :) s2020.siggraph.org/presentation/?…
  • Kecho
    @kechogarcia
    Nov 16, 2023
    Demo showcasing GPU resident drawer, GPU occlusion culling and the new STP upscaler. Tech that my team (Weta realtime) has been relentlessly working on. More details coming soon :)
    Unity
    @unity
    Nov 16, 2023
    Replying to @unity
    Check out Fantasy Kingdom in Unity 6, a stylized environment showcasing the latest capabilities for rendering, lighting, and scaling richer worlds, with significant performance improvements. This demo is a modified and expanded version of "Fantasy Kingdom" by @SyntyStudios (2/5)
    57K
  • Kecho
    @kechogarcia
    Jan 6, 2025
    Merge sort in C, roughly the same number of lines, and relatively trivial code. No expensive recursion. No expensive allocations. Only a simple scratch buffer. Easily parallelizable. Very cache friendly. gist.github.com/kecho/b3abcecd…
    25K
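The linked gist is truncated here, so the following is not its code. It is a hedged sketch of one way to get the listed properties: a bottom-up merge sort with no recursion and a caller-provided scratch buffer. The names `merge_runs` and `merge_sort` are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Merge sorted runs src[lo, mid) and src[mid, hi) into dst[lo, hi). */
static void merge_runs(const int *src, int *dst,
                       size_t lo, size_t mid, size_t hi) {
    size_t i = lo, j = mid, k = lo;
    while (i < mid && j < hi)  /* ties take the left element, so the sort is stable */
        dst[k++] = (src[j] < src[i]) ? src[j++] : src[i++];
    while (i < mid) dst[k++] = src[i++];
    while (j < hi)  dst[k++] = src[j++];
}

/* Bottom-up merge sort. scratch must hold at least n ints. */
void merge_sort(int *data, int *scratch, size_t n) {
    assert(n == 0 || (data && scratch));
    int *src = data, *dst = scratch;
    for (size_t width = 1; width < n; width *= 2) {
        /* Each merge in this pass touches a disjoint range, so the
           iterations of this loop could run in parallel. */
        for (size_t lo = 0; lo < n; lo += 2 * width) {
            size_t mid = lo + width < n ? lo + width : n;
            size_t hi  = lo + 2 * width < n ? lo + 2 * width : n;
            merge_runs(src, dst, lo, mid, hi);
        }
        int *tmp = src; src = dst; dst = tmp;  /* ping-pong the buffers */
    }
    if (src != data) memcpy(data, src, n * sizeof *data);
}
```

Every pass reads and writes both buffers front to back, which is where the cache-friendly access pattern comes from.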
  • Kecho
    @kechogarcia
    Apr 21, 2021
    Today I am releasing Noice. github.com/kecho/Noice A short side project written in #ISPC for Windows & Linux. It is a command line 2D/3D texture noise generation utility. I hope it's something useful, and feel free to try it. Binaries: github.com/kecho/Noice/re…
  • Kecho
    @kechogarcia
    Mar 27, 2023
    Image
    14K
  • Kecho
    @kechogarcia
    Nov 30, 2020
    Today concludes my last day at EA, after almost 10 years. It's bittersweet. I will miss everyone who made my experience so pleasant. On to the next adventure: I'll be joining @unity3d starting in January. Now time to relax...
  • Kecho
    @kechogarcia
    Jul 3, 2022
    Progress on my compute rasterizer (side project): github.com/kecho/grr I am able to do everything in Python + HLSL! Also 5ms on a 2070 for the Stanford dragon (2 million vertices) at 4K. More optimizations to come! I have also integrated parts of implot and more imgui functionality.
  • Kecho
    @kechogarcia
    Jan 9, 2025
    Don't do separable in compute; do it in a pixel shader, way faster. To match the pixel shader, read & write in groups of 4 using a Morton pattern. To beat it, skip separable: build the MIP chain first, 4 mips at a time using LDS. Use a Poisson disk kernel, with the MIP index for large jumps. For large kernels prefer a Gaussian pyramid.
    Acerola
    @Acerola_t
    Jan 7, 2025
    Writing compute shaders for godot compositor effects has been pretty tiresome so I spent the past few days thinking about and writing an interpreter for a wrapper language for glsl compute shaders to allow multiple kernels in one shader file Since the godot shader compiler
    19K
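The post does not spell out the exact thread-to-pixel mapping, so this is only one common reading of "Morton pattern": decode a linear thread index in Z-order so that every 4 consecutive indices cover a 2x2 pixel quad. The function names are illustrative, and the sketch is plain C rather than shader code.

```c
#include <assert.h>
#include <stdint.h>

/* Keep the even-numbered bits of v and pack them into the low 16 bits. */
static uint32_t compact_even_bits(uint32_t v) {
    v &= 0x55555555u;
    v = (v | (v >> 1)) & 0x33333333u;
    v = (v | (v >> 2)) & 0x0F0F0F0Fu;
    v = (v | (v >> 4)) & 0x00FF00FFu;
    v = (v | (v >> 8)) & 0x0000FFFFu;
    return v;
}

/* Z-order decode: even bits of index form x, odd bits form y. */
static void morton_decode(uint32_t index, uint32_t *x, uint32_t *y) {
    assert(x && y);
    *x = compact_even_bits(index);
    *y = compact_even_bits(index >> 1);
}
```

With this mapping, indices 0 to 3 land on (0,0), (1,0), (0,1), (1,1), and indices 4 to 7 form the next quad to the right, so neighbouring threads touch neighbouring texels in both dimensions.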
  • Kecho
    @kechogarcia
    Mar 15, 2021
    Replying to @BartWronsk
    Image
  • Kecho
    @kechogarcia
    Dec 20, 2023
    Stable radix sorting on the GPU (8 bits per radix). There are not many accessible resources on how to make it stable. Here it goes: 1. Count & Scatter 2. Prefix batch table 3. Global Prefix 4. Scatter Output 5. Repeat 1-4 for each radix. (Details & code in thread)
    21K
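The thread with the details is not included here, so the following is a serial CPU model of the general idea, not the author's GPU code. It assumes the input is split into fixed-size batches (standing in for thread groups), counts digits per batch, prefixes the batch table in digit-major order, and scatters with per-batch offsets. `BATCH`, `MAX_BATCHES`, and `radix_sort_u32` are hypothetical names, and steps 2 and 3 are folded into a single loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { RADIX = 256, BATCH = 4, MAX_BATCHES = 64 };  /* illustrative sizes */

/* Stable LSD radix sort, 8 bits per pass. scratch must hold n keys. */
void radix_sort_u32(uint32_t *keys, uint32_t *scratch, size_t n) {
    static size_t table[MAX_BATCHES][RADIX];
    size_t batches = (n + BATCH - 1) / BATCH;
    uint32_t *src = keys, *dst = scratch;
    assert(batches <= MAX_BATCHES);
    for (unsigned shift = 0; shift < 32; shift += 8) {
        /* Count: per-batch histogram of the current digit. */
        memset(table, 0, sizeof table);
        for (size_t i = 0; i < n; ++i)
            table[i / BATCH][(src[i] >> shift) & 0xFF]++;
        /* Prefix: exclusive scan in digit-major, batch-minor order, so for
           equal digits an earlier batch always gets a lower offset. */
        size_t running = 0;
        for (size_t d = 0; d < RADIX; ++d)
            for (size_t b = 0; b < batches; ++b) {
                size_t c = table[b][d];
                table[b][d] = running;
                running += c;
            }
        /* Scatter: input order inside each batch is kept, hence stable. */
        for (size_t i = 0; i < n; ++i) {
            size_t d = (src[i] >> shift) & 0xFF;
            dst[table[i / BATCH][d]++] = src[i];
        }
        uint32_t *tmp = src; src = dst; dst = tmp;
    }
    /* Four passes is an even number of swaps: the result is back in keys. */
}
```

Stability is what makes the repeat step work: each pass must preserve the order produced by the previous, less significant digit.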
  • Kecho
    @kechogarcia
    Sep 15, 2021
    Wishing all crypto farms would catch on fire. GPU dev here, can't get a 3090 to save my life. I give up.
  • Kecho
    @kechogarcia
    Jan 25, 2025
    You can use mine, which is multi-pass, handles arbitrary sizes as well as indirect, and uses wave intrinsics. There's also a new one from AMD that is single pass!
    Ben Sims
    @BenSimsTech
    Jan 24, 2025
    It was difficult to find a simple, concise parallel prefix sum example online; even the GPU Gems code was obscure and had bugs. I wrote a simple one. It only handles a 1024-element array, but can be expanded to larger arrays with multiple passes. Not the most efficient method, but simple
    gpu_algorithms/gpu_algorithms/gpu/prefix_sum.hlsl at main · kecho/gpu_algorithms
    From github.com
    11K
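A hedged CPU sketch of how a multi-pass prefix sum can cover arbitrary sizes: scan each block locally, scan the block totals, then add each block's offset back. This is not the code in kecho/gpu_algorithms and does not model wave intrinsics or indirect dispatch; `BLOCK` and `prefix_sum_exclusive` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { BLOCK = 4 };  /* stand-in for a GPU thread group size such as 1024 */

/* Exclusive prefix sum. block_sums must hold ceil(n / BLOCK) entries,
   and out must not alias in. */
void prefix_sum_exclusive(const uint32_t *in, uint32_t *out, size_t n,
                          uint32_t *block_sums) {
    size_t blocks = (n + BLOCK - 1) / BLOCK;
    assert(n == 0 || in != out);
    /* Pass 1: independent exclusive scan inside each block, plus its total. */
    for (size_t b = 0; b < blocks; ++b) {
        uint32_t running = 0;
        for (size_t i = b * BLOCK; i < n && i < (b + 1) * BLOCK; ++i) {
            out[i] = running;
            running += in[i];
        }
        block_sums[b] = running;
    }
    /* Pass 2: exclusive scan of the block totals. On a GPU this would be
       another prefix-sum dispatch when there are many blocks. */
    uint32_t offset = 0;
    for (size_t b = 0; b < blocks; ++b) {
        uint32_t total = block_sums[b];
        block_sums[b] = offset;
        offset += total;
    }
    /* Pass 3: add each block's starting offset to its local results. */
    for (size_t i = 0; i < n; ++i)
        out[i] += block_sums[i / BLOCK];
}
```

Passes 1 and 3 have no dependencies between blocks, which is the part that maps onto parallel thread groups.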
