We're using Devpost only for project submissions. Please refer to https://lu.ma/judge for all details about the hackathon.

Requirements

What to Build

This hackathon is focused on building the best LLM-as-a-judge projects: creating LLM-as-a-judge evaluations, improving on existing ones, implementing research from this field in code, building UIs, running evaluations and iterating on them, and more.

During this in-person hackathon, let's build LLM judges together and move the field forward a little by:

  • Productionizing the latest LLM-evaluator research

  • Improving on your existing judge

  • Building annotation UIs

  • Designing wireframes for collaborative annotation between humans and AI
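As a starting point for the first two items, one common pattern is a rubric-based judge: prompt a model for a structured verdict, then parse the score. The sketch below is illustrative only; `call_llm`, `JUDGE_PROMPT`, and the returned JSON shape are assumptions, and the stub would be replaced with your actual model client (and traced with Weave, per the submission rules).

```python
import json
import re

# Hypothetical rubric prompt; in a real judge this goes to an LLM API.
JUDGE_PROMPT = """You are a strict judge. Score the answer from 1-5 for
faithfulness to the reference, then explain. Reply as JSON:
{{"score": <int>, "reasoning": "<why>"}}

Question: {question}
Reference: {reference}
Answer: {answer}"""

def call_llm(prompt: str) -> str:
    """Stub standing in for a real model call. Returns a canned
    judgment so the sketch runs without any API access."""
    return '{"score": 4, "reasoning": "Mostly faithful, one omission."}'

def judge(question: str, reference: str, answer: str) -> dict:
    """Fill the rubric prompt and robustly parse the judge's JSON verdict."""
    raw = call_llm(JUDGE_PROMPT.format(
        question=question, reference=reference, answer=answer))
    # Judges often wrap JSON in prose; extract the first {...} block.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    return json.loads(match.group(0)) if match else {"score": None}

result = judge("What is 2+2?", "4", "The answer is 4.")
print(result["score"])  # 4, from the stubbed response
```

The regex extraction step matters in practice: models frequently surround their JSON verdict with extra commentary, and a judge that crashes on that is a judge that silently drops evaluation rows.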

What to Submit

Teams will be required to present their work in a short 3-5 minute presentation. All code must be public on GitHub, and all Weights & Biases Weave projects must be set to visible and submitted together with the project files.

Hackathon Sponsors

Prizes

$5,104 in prizes
First Prize
1 winner

Every member of the winning team will get a 1-year Cursor Pro subscription plus Meta Ray-Ban glasses of their choice.

Second Prize
1 winner

Every hacker on the second-place team will get 1 year of Cursor Pro and a Paddleton blanket.

Third Prize
1 winner

1 year of Cursor Pro for each member of the team

Best Weave Usage bonus prize
1 winner

1 year of Cursor Pro


Judges

Greg Kamradt
Founder, Data Independent

Eugene Yan
Senior Applied Scientist, Amazon

Charles Frye
AI Engineer, Modal Labs

Shreya Shankar
ML Engineer, PhD at UC Berkeley

Shawn Lewis
CTO and Co-founder, W&B

Anish Shah
Growth ML Engineer, W&B

Tim Sweeney
Staff Software Engineer, W&B

Judging Criteria

  • Creativity
    Anything from creative prompting to system design and/or UX for LLM-as-a-judge projects
  • Utility / Usefulness
    How does this project affect the real world?
  • Technical Implementation / Execution
    High level of technical ability; implementation of existing eval research
  • Presentation
    The team concisely delivers their project during the presentation; the GitHub repo is open, Weave dashboards and traces are included, etc.
  • Bonus: Weave usage

Questions? Email the hackathon manager
