DoNotFail | Devpost

Email chain showing the replies
My favorite email it sent

Inspiration

We've all been there - feeling as though we've been unjustly marked down on a project or exam, wishing we could contest the grade but not knowing how to articulate our arguments. This feeling was particularly common during late-night study sessions when discussing with peers. This shared sentiment amongst students became the catalyst for creating "DoNotFail," an AI agent that champions for academic fairness.

What it does

"DoNotFail" is an intelligent AI agent designed to review and analyze the feedback given on academic assignments. When a student feels they've been unfairly marked, they can turn to "DoNotFail" which will dissect the provided feedback, cross-reference with the student's answer or project, and then provide a structured argument detailing areas of potential oversight or misunderstanding by the TA. This ensures students have a well-informed perspective when approaching TAs or professors to discuss grades.

How we built it

Built on the Claude 2 chat model, we fine-tuned the model with a plethora of academic grading guidelines across various disciplines. Integrating with the popular file format called "PDF" it can read assignments, feedback and rubrics. We spoof being a user on ChatGPT to upload images of feedback from sites like CrowdMark which don't export PDFs properly and automatically extract the comments. All of this context is given to Claude 2, utilizing the 100k context length. The agent is tricked into thinking its acting in a play, pretending to be a student that was unfairly graded (to get around Claude's ethical concerns on arguing for marks) and then generates a plan. It will then infinitely email the TA following its plan.

Challenges we ran into

Using GPT4-V without API access. Ethan getting on to a bus half way through hackathon. Window's and Mac collaboration. Claude having strict guidelines.

Accomplishments that we're proud of

Hacking GPT4-V in the sketchiest way possible
Writing genuinely convincing emails that sound like students begging for grades
Using Claude instead of GPT4 for the agent

What we learned

The complexities of taming agents
Tricking agents into staying on task
Avoiding filters placed on LLMs (ie. "It is unethical to email your professor asking for marks back")