Yangqing Jia
667 posts
Yangqing Jia
@jiayq
Founder & CEO @leptonai (acq'ed by NVIDIA). Built caffe, ONNX, PyTorch 1.0. Xoogler, Meta/Alibaba ex-exec. Views are my own and do not represent my employer.
Berkeley, CA
daggerfs.com
Joined April 2009
364 Following
19K Followers

  • Yangqing Jia
    @jiayq
    Dec 26, 2024
    In 2019 I had a chat with the DeepSeek team, in the hope of selling them an AI cloud solution. I was trying to convince them of a few things: - you don't need complicated cloud virtualization, you just need containers and an efficient scheduler. - you will need really fast,
    Andrej Karpathy
    @karpathy
    Dec 26, 2024
    DeepSeek (Chinese AI co) making it look easy today with an open weights release of a frontier-grade LLM trained on a joke of a budget (2048 GPUs for 2 months, $6M). For reference, this level of capability is supposed to require clusters of closer to 16K GPUs, the ones being
    272K
  • Yangqing Jia
    @jiayq
    Jul 27, 2024
    People often ask why prices like $2.8/m token for Llama 405B, while being super fast, are still profitable at @LeptonAI. We've even been asked by a leading GPU provider! So, I figured we should share some technical analysis. This information could benefit the community. We've
    257K
  • Yangqing Jia
    @jiayq
    Jan 26, 2024
    All right, fully open-source code, Apache license for anyone and any company to use freely: github.com/leptonai/searc… Our goal: enable creators and enterprises to build AI applications as easily as possible, like this search application. Happy Friday and have fun!
    Yangqing Jia
    @jiayq
    Jan 24, 2024
    Building an AI app has never been easier. Over the weekend, we built a demo for conversational search with <500 lines of python, and it's live at search.lepton.run. Give it a shot! Code to be open sourced soon as we clean up all "# temp scaffolds" stuff. (1/x)
    GitHub - leptonai/search_with_lepton: Building a quick conversation-based search demo with Lepton...
    From github.com
    220K
  • Yangqing Jia
    @jiayq
    Jan 24, 2024
    Building an AI app has never been easier. Over the weekend, we built a demo for conversational search with <500 lines of python, and it's live at search.lepton.run. Give it a shot! Code to be open sourced soon as we clean up all "# temp scaffolds" stuff. (1/x)
    564K
  • Yangqing Jia
    @jiayq
    Feb 20, 2024
    Probably the first operation cost analysis of owning @GroqInc hardware to run Llama2-70b. First of all, let me say I am a big fan of Groq. Great performance, great potential. The below is just a showcase of how challenging things might be when rivaling the industry lead, but given
    312K
  • Yangqing Jia
    @jiayq
    Dec 26, 2024
    There is a lot of unconscious emphasis on the DeepSeek model being “Chinese” and implicit connection with the Sino-US relationship or the GPU power. In my eyes, the success of DeepSeek has little to do with that. It is simple intelligence and pragmatism at work: given a limit of
    68K
  • Yangqing Jia
    @jiayq
    Mar 31, 2024
    I probably have some credibility as a person who has worked on @TensorFlow and @PyTorch both (also, caffe / @onnxai / distbelief / a few others that never saw the light), so here are my two cents: (1) Speed doesn't really matter today as long as it is not particularly bad.
    128K
  • Yangqing Jia
    @jiayq
    Jul 28, 2024
    Accounting report on llama3 tokenomics. After my initial post, @swyx and @dylan522p had a great follow-up question on the llama3 405b profitability. Read the original post here: x.com/swyx/status/18…. Hats off to @capetorch and @HamelHusain too! I love such posts - this is
    146K
  • Yangqing Jia
    @jiayq
    May 3, 2024
    The new ChatGPT search is killing the "simply call conventional search and summarize with LLM" wrappers. Why? It takes actions. Try searching with "what did the most popular post in hacker news say today?" and you'll see the difference. If you simply translate that to a google
    310K
  • Yangqing Jia
    @jiayq
    Oct 6, 2023
    (1/n) We are drastically overestimating the cost of LLMs, because we sometimes over-focus on single-query speed. Had the privilege to talk about this topic at the #vllm meetup yesterday. An average human reads 350 words per minute, which translates to about 5.8 words per second.
    118K
  • Yangqing Jia
    @jiayq
    Feb 10, 2023
    Prompt engineering reminds me of the days when people tried out different regularizers for their convex optimization problems.
    61K
  • Yangqing Jia
    @jiayq
    Jul 24, 2024
    Memory Matters for LLM. While everyone is rushing to provide the serverless Llama3-405b model, I want to talk about one key choice that matters a lot, especially for dedicated enterprise deployments when traffic is not very high: memory. - The normal deployment of a model the
    51K
  • Yangqing Jia
    @jiayq
    Feb 25, 2024
    I was asked why "GPU is an arbitrage market", so let me break down the definition into a few bullet points. (1) The conventional cloud CPU market is very friendly to on-demand uses: you request it, you use it, and you release it. Of course, you can reserve machines for 1-3 years
    65K
  • Yangqing Jia
    @jiayq
    Jan 28, 2024
    search.lepton.run code is trending on GitHub! We love open source and love to see how much we can help creators commoditize RAG and other conversational search techniques. It’s not that complex - we definitely thought it was harder when we started!
    30K