Yangqing Jia (@jiayq) / X

Yangqing Jia

667 posts

@jiayq

Founder & CEO @leptonai (acq'ed by NVIDIA). Built caffe, ONNX, PyTorch 1.0. Xoogler, Meta/Alibaba ex-exec. Views are my own and do not represent my employer.

Berkeley, CA

Joined April 2009

Yangqing Jia
@jiayq
Dec 26, 2024
In 2019 I had a chat with the DeepSeek team, in the hope of selling them an AI cloud solution. I was trying to convince them of a few things: - you don't need complicated cloud virtualization, you just need containers and an efficient scheduler. - you will need really fast,
Andrej Karpathy
@karpathy
Dec 26, 2024
DeepSeek (Chinese AI co) making it look easy today with an open weights release of a frontier-grade LLM trained on a joke of a budget (2048 GPUs for 2 months, $6M). For reference, this level of capability is supposed to require clusters of closer to 16K GPUs, the ones being
272K
Yangqing Jia
@jiayq
Jul 27, 2024
People often ask why prices like $2.8/m token for Llama 405B, while being super fast, are still profitable at @LeptonAI. We've even been asked by a leading GPU provider! So, I figured we should share some technical analysis. This information could benefit the community. We've
257K
Yangqing Jia
@jiayq
Jan 26, 2024
All right, fully open-source code, Apache license for anyone and any company to use freely: github.com/leptonai/searc… Our goal: enable creators and enterprises to build AI applications as easily as possible, like this search application. Happy Friday and have fun!
Yangqing Jia
@jiayq
Jan 24, 2024
Building an AI app has never been easier. Over the weekend, we built a demo for conversational search with <500 lines of Python, and it's live at search.lepton.run. Give it a shot! Code to be open sourced as soon as we clean up all the "# temp scaffolds" stuff. (1/x)
GitHub - leptonai/search_with_lepton: Building a quick conversation-based search demo with Lepton...
From github.com
220K
Yangqing Jia
@jiayq
Feb 20, 2024
Probably the first operating cost analysis of owning @GroqInc hardware to run Llama2-70b. First of all, let me say I am a big fan of Groq. Great performance, great potential. The below is just a showcase of how challenging things might be when rivaling the industry leader, but given
312K
Yangqing Jia
@jiayq
Dec 26, 2024
There is a lot of unconscious emphasis on the DeepSeek model being “Chinese” and an implicit connection with the Sino-US relationship or GPU power. In my eyes, the success of DeepSeek has little to do with that. It is simple intelligence and pragmatism at work: given a limit of
68K
Yangqing Jia
@jiayq
Mar 31, 2024
I probably have some credibility as a person who has worked on @TensorFlow and @PyTorch both (also, caffe / @onnxai / distbelief / a few others that never saw the light), so here are my two cents: (1) Speed doesn't really matter today as long as it is not particularly bad.
128K
Yangqing Jia
@jiayq
Jul 28, 2024
Accounting report on llama3 tokenomics. After my initial post, @swyx and @dylan522p had a great follow-up question on the llama3 405b profitability. Read the original post here: x.com/swyx/status/18…. Hats off to @capetorch and @HamelHusain too! I love such posts - this is
146K
Yangqing Jia
@jiayq
May 3, 2024
The new ChatGPT search is killing the "simply call conventional search and summarize with LLM" wrappers. Why? It takes actions. Try searching with "what did the most popular post in hacker news say today?" and you'll see the difference. If you simply translate that to a google
310K
Yangqing Jia
@jiayq
Oct 6, 2023
(1/n) We are drastically overestimating the cost of LLMs, because we sometimes over-focus on single-query speed. Had the privilege to talk about this topic at the #vllm meetup yesterday. An average human reads 350 words per minute, which translates to 5.5 words per second.
118K
Yangqing Jia
@jiayq
Feb 10, 2023
Prompt engineering reminds me of the days when people tried out different regularizers for their convex optimization problems.
61K
Yangqing Jia
@jiayq
Jul 24, 2024
Memory Matters for LLMs. While everyone is rushing to provide the serverless Llama3-405b model, I want to talk about one key choice that matters a lot, especially for dedicated enterprise deployments when traffic is not very high: memory. - The normal deployment of a model the
51K
Yangqing Jia
@jiayq
Feb 25, 2024
I was asked why "GPU is an arbitrage market", so let me break down the definition into a few bullet points. (1) The conventional cloud CPU market is very friendly to on-demand uses: you request it, you use it, and you release it. Of course, you can reserve machines for 1-3 years
65K
Yangqing Jia
@jiayq
Jan 28, 2024
search.lepton.run code is trending on GitHub! We love open source and love to see how much we can help creators commoditize RAG and other conversational search techniques. It’s not that complex - we definitely thought it was harder when we started!
30K