We're using Devpost only for project submissions. Please refer to https://lu.ma/judge for all details about the hackathon.
Requirements
What to Build
This hackathon is focused on building the best LLM-as-a-judge projects: creating LLM-as-a-judge evaluations, improving on existing ones, implementing research in this field as code, creating UIs, running evaluations and improving them, and more.
During this in-person hackathon, let's build LLM Judges together and move the field forward a little by:
- Productionizing the latest LLM-evaluator research
- Improving on your existing judge
- Building annotation UIs
- Designing wireframes for collaborative annotation between humans and AI
What to Submit
Teams will be required to present their work in a short 3-5 minute presentation. All code must be public on GitHub, and all Weights & Biases Weave projects must be set to visible and submitted together with the project files.
Prizes
First Prize
Every member of the winning team will get a 1-year Cursor Pro subscription and Meta Ray-Ban glasses of their choice.
Second Prize
Every hacker on the second-place team will get 1 year of Cursor Pro and a Paddleton blanket.
Third Prize
1 year of Cursor Pro for each member of the team
Best Weave Usage bonus prize
1 year of Cursor Pro
Judges
Greg Kamradt
Founder, Data Independent
Eugene Yan
Senior Applied Scientist, Amazon
Charles Frye
AI Engineer, Modal Labs
Shreya Shankar
ML Engineer, PhD at UC Berkeley
Shawn Lewis
CTO and Co-founder, W&B
Anish Shah
Growth ML Engineer, W&B
Tim Sweeney
Staff Software Engineer, W&B
Judging Criteria
- Creativity
  Anything from creative prompting to system design and/or UX for LLM-as-a-judge projects.
- Utility / Usefulness
  How does this project affect the real world?
- Technical Implementation / Execution
  High level of technical ability; implementation of existing eval research.
- Presentation
  Team concisely delivers their project during the presentation; the GitHub repo is open; Weave dashboards and traces are included, etc.
- Bonus: Weave usage
Questions? Email the hackathon manager
