Weco AI
74 posts
Weco AI
@WecoAI
The platform for self-improving code
weco.ai
Joined April 2023
6
Following
1,911
Followers
  • Weco AI reposted
    Zhengyao Jiang
    @zhengyaojiang
    20h
    This year @WecoAI will be at the @aiDotEngineer World's Fair. We'll host a hands-on autoresearch workshop on June 29. And I'll give a talk on July 1. Looking forward to chatting with old and new friends there!
    AI Engineer World's Fair 2026: June 29 - July 2, San Francisco
    From ai.engineer
    3.4K
  • Weco AI reposted
    Zhengyao Jiang
    @zhengyaojiang
    Jun 22
    Production autoresearch is usually killed by reward hacking or side effects. But we still see a pattern that survives: the unit being evaluated is functional or near-functional code. Some examples: (1/5)🧵
    5.1K
  • Weco AI
    @WecoAI
    Jun 16
    We're thrilled to welcome Vayum Arora to Weco AI as our growth lead! We couldn't be more excited to bring on Vayum's mix of frontier engineering and business instinct as we continue to grow. Welcome to Weco, @vayum_arora!
    2.6K
  • Weco AI
    @WecoAI
    Jun 14
    Sharing some of our internal benchmark results
    Zhengyao Jiang
    @zhengyaojiang
    Jun 14
    We benchmarked 7 frontier models on 3 categories of autoresearch tasks: ML engineering, harness/prompt engineering, and algorithmic discovery. Fable-5 won overall even under a cost constraint, but on ML engineering, the open model Kimi-K2.7-Code surpassed frontier models.🧵(1/5)
    1.2K
  • Weco AI reposted
    Zhengyao Jiang
    @zhengyaojiang
    Jun 6
    Replying to @sanmking and @KyleVedder
    Haha, thanks for remembering Weco Observe! We’ve been working in the autoresearch space for about three years, though, before it even had this name. It actually started with:
    GitHub - WecoAI/aideml: AIDE: AI-Driven Exploration in the Space of Code. The machine Learning...
    From github.com
    630
  • Weco AI
    @WecoAI
    Jun 3
    Autoresearch can hill-climb a private benchmark. The real question is: can an AI agent do research that the community can trust and build on?
    Zhengyao Jiang
    @zhengyaojiang
    Jun 3
    OpenAI ran a hiring challenge, but the top candidate was one they couldn’t hire: our autonomous research agent, Aiden. In Parameter Golf, Aiden ran for 22 days and outperformed all 1,016 other researchers: 🧵 (1/8)
    1.4K
  • Weco AI
    @WecoAI
    May 21
    Introducing SpecBench: the first benchmark for measuring reward hacking in long-horizon coding agents. Key finding: reward hacking is driven not by test coverage, but by the gap between task difficulty and model capability: 🧵(1/8)
    13K
    Weco AI
    @WecoAI
    May 21
    Replying to @WecoAI
    Some practical suggestions for anyone running Ralph loop, /goal, autoresearch or weco: 1. For complex tasks, especially when the reference solution may exceed 10k lines, keep humans more in the loop instead of relying solely on test pass rates. 2. For complex tasks, choose the
    561
    Weco AI
    @WecoAI
    May 21
    More details: Blog post: weco.ai/blog/specbench, Paper: arxiv.org/abs/2605.21384, GitHub repo: github.com/WecoAI/SpecBen… (8/8)
    608
  • Weco AI reposted
    Zhengyao Jiang
    @zhengyaojiang
    Apr 21
    Comparing Opus 4.7 vs 4.6 on AutoResearch. Opus 4.7 isn't significantly more sample-efficient, but is surprisingly cheaper due to fewer function calls. Details in 🧵(1/4)
    9.6K
  • Weco AI
    @WecoAI
    Apr 2
    Time to try autoresearch if you're tuning hyper-parameters
    Zhengyao Jiang
    @zhengyaojiang
    Apr 2
    Is autoresearch really better than classic hyperparameter tuning? We did experiments comparing Optuna & autoresearch. Autoresearch converges faster, is more cost-efficient, and even generalizes better: 🧵(1/6)
    1.2K
  • Weco AI reposted
    Zhengyao Jiang
    @zhengyaojiang
    Apr 2
    Is autoresearch really better than classic hyperparameter tuning? We did experiments comparing Optuna & autoresearch. Autoresearch converges faster, is more cost-efficient, and even generalizes better: 🧵(1/6)
    136K
  • Weco AI reposted
    Geek Lite
    @QingQ77
    Mar 28
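    awesome-autoresearch: an index repository that gathers real-world AutoResearch use cases and open-source implementations in one place. It helps you quickly see which tasks the AutoResearch loop has already been ported to. From nanoGPT training to Shopify Liquid, CUDA kernels, voice agent prompts, and tabular modeling, the coverage is broader than I expected.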
    GitHub - WecoAI/awesome-autoresearch: Curated list of AutoResearch use cases with optimization...
    From github.com
    31K
  • Weco AI reposted
    Zhengyao Jiang
    @zhengyaojiang
    Mar 28
    AutoResearch is a general-purpose code optimizer, and math formulas can also be expressed as code. The emerging use case of formula discovery is really interesting: give it empirical data and let the agent search for math expressions that fit. Examples 🧵(1/5):
    15K
  • Weco AI reposted
    Zhengyao Jiang
    @zhengyaojiang
    Mar 22
    The replies surfaced a lot of amazing use cases, more than I expected. There must be more outside my radar. Creating a curated list here, PRs welcome for your own use cases, ideally with traces so the community can verify! github.com/WecoAI/awesome…
    Zhengyao Jiang
    @zhengyaojiang
    Mar 21
    Autoresearch has been out for 2 weeks. The community is trying to apply it to everything with a measurable metric. Here are some successful attempts: 🧵 (1/6)
    28K
