We're using Devpost only for project submissions. Please refer to https://lu.ma/judge for all details about the hackathon.

Requirements

What to Build

This hackathon is focused on building the best LLM-as-a-judge projects: creating LLM-as-a-judge evaluations, improving on existing ones, implementing research from this field in code, building UIs, running evaluations and iterating on them, and more.

During this in-person hackathon, let's build LLM judges together and move the field forward a little by:

  • Productionizing the latest LLM-evaluator research

  • Improving on your existing judge

  • Building annotation UIs

  • Designing wireframes for collaborative annotation between humans and AI
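As a starting point for the first two items, one common pattern is a rubric-based judge: prompt a model for a structured verdict, then parse the score. The sketch below is illustrative only; `call_llm`, `JUDGE_PROMPT`, and the returned JSON shape are assumptions, and the stub would be replaced with your actual model client (and traced with Weave, per the submission rules).

```python
import json
import re

# Hypothetical rubric prompt; in a real judge this goes to an LLM API.
JUDGE_PROMPT = """You are a strict judge. Score the answer from 1-5 for
faithfulness to the reference, then explain. Reply as JSON:
{{"score": <int>, "reasoning": "<why>"}}

Question: {question}
Reference: {reference}
Answer: {answer}"""

def call_llm(prompt: str) -> str:
    """Stub standing in for a real model call. Returns a canned
    judgment so the sketch runs without any API access."""
    return '{"score": 4, "reasoning": "Mostly faithful, one omission."}'

def judge(question: str, reference: str, answer: str) -> dict:
    """Fill the rubric prompt and robustly parse the judge's JSON verdict."""
    raw = call_llm(JUDGE_PROMPT.format(
        question=question, reference=reference, answer=answer))
    # Judges often wrap JSON in prose; extract the first {...} block.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    return json.loads(match.group(0)) if match else {"score": None}

result = judge("What is 2+2?", "4", "The answer is 4.")
print(result["score"])  # 4, from the stubbed response
```

The regex extraction step matters in practice: models frequently surround their JSON verdict with extra commentary, and a judge that crashes on that is a judge that silently drops evaluation rows.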

What to Submit

Teams will be required to present their work in a short 3-5 minute presentation. All code must be public on GitHub, and all Weights & Biases Weave projects must be set to visible and submitted together with the project files.

Hackathon Sponsors

Prizes

$5,104 in prizes
First Prize
1 winner

Every member of the winning team will get a 1-year Cursor Pro subscription plus Meta Ray-Ban glasses of their choice.

Second Prize
1 winner

Every hacker on the second-place team will get 1 year of Cursor Pro and a Paddleton blanket.

Third Prize
1 winner

1 year of Cursor Pro for each member of the team

Best Weave Usage bonus prize
1 winner

1 year of Cursor Pro


Judges

Greg Kamradt
Founder, Data Independent

Eugene Yan
Senior Applied Scientist, Amazon

Charles Frye
AI Engineer, Modal Labs

Shreya Shankar
ML Engineer, PhD at UC Berkeley

Shawn Lewis
CTO and Co-founder, W&B

Anish Shah
Growth ML Engineer, W&B

Tim Sweeney
Staff Software Engineer, W&B

Judging Criteria

  • Creativity
    Anything from creative prompting to system design and/or UX for LLM-as-a-judge projects
  • Utility / Usefulness
    How does this project affect the real world?
  • Technical Implementation / Execution
    High level of technical ability; implementation of existing eval research
  • Presentation
    The team concisely delivers their project during the presentation; the GitHub repo is open, Weave dashboards and traces are included, etc.
  • Bonus: Weave usage

Questions? Email the hackathon manager
