Inspiration

Many people use LLMs to get information about possible causes of a set of symptoms, in the same way that they'd use WebMD. However, LLMs are trained to predict text that humans rate highly; while this generally pushes them toward helpful and honest content, it does not guarantee truthful information. Meanwhile, although WebMD's system is likely backed by a database of vetted facts, its user experience, lacking the natural-language interface that LLMs enable, leaves much to be desired.

What it does

ChatMD takes in a list of symptoms from the user and comes up with possible diagnoses. For each diagnosis, it finds a reliable source on that diagnosis, then checks whether the source says the diagnosis is compatible with the symptoms. Finally, it outputs an explanation of where the diagnosis does and doesn't fit the reported symptoms, as well as other symptoms that could confirm it.
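The diagnose-then-verify flow can be sketched as below. `ask_llm` and `fetch_source` are hypothetical stand-ins for the actual GPT API calls and the scraping step, and the prompts are illustrative, not the ones used in the project:

```python
def ask_llm(prompt: str) -> str:
    """Placeholder for a chat-completion call (e.g. to GPT-4)."""
    raise NotImplementedError

def fetch_source(diagnosis: str) -> str:
    """Placeholder for finding and scraping a reliable page on the diagnosis."""
    raise NotImplementedError

def parse_diagnoses(raw: str) -> list[str]:
    """Turn a numbered list like '1. Flu\n2. Cold' into a Python list."""
    items = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop a leading "1." / "2)" style marker if present.
        head, _, rest = line.partition(" ")
        if head.rstrip(".)").isdigit():
            line = rest.strip()
        items.append(line)
    return items

def chatmd(symptoms: str) -> dict[str, str]:
    """For each candidate diagnosis, explain its fit against a scraped source."""
    raw = ask_llm(f"List possible diagnoses for these symptoms: {symptoms}")
    report = {}
    for diagnosis in parse_diagnoses(raw):
        source_text = fetch_source(diagnosis)
        report[diagnosis] = ask_llm(
            f"Given this source:\n{source_text}\n"
            f"Explain how well '{diagnosis}' fits the symptoms: {symptoms}"
        )
    return report
```

The key design choice is that the explanation prompt is grounded in scraped source text rather than in the model's own prior, which is what lets the output cite where a diagnosis does and doesn't fit.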

How we built it

The back end starts by feeding the user's list of symptoms into GPT-4 to get some possible diagnoses. These diagnoses are then parsed into a Python list using string slicing alongside GPT-3.5-turbo. Next, GPT-4 is used to find five candidate links as resources for each diagnosis. For each disease, the first link that is neither broken nor points to a page too long to parse is scraped for its raw HTML. That HTML is converted into a more accessible format in two passes: libraries first strip out any markup that isn't text, and GPT-3.5-turbo then removes top bars, sidebars, and other unnecessary text. Finally, the cleaned page is passed to GPT-3.5-turbo alongside the suspected disease and the reported symptoms, and the user receives an explanation of the disease and its consistency with those symptoms.
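The first cleanup pass, stripping everything that isn't visible text before the page goes to GPT-3.5-turbo, can be sketched with the standard library's `HTMLParser`; the project may have used a different parsing library for this step:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # how many skip-tags we are currently inside
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-whitespace text that is outside script/style blocks.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

def strip_html(html: str) -> str:
    """Reduce raw HTML to a single string of its visible text."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser._chunks)
```

The second pass (dropping navigation bars and sidebars) is harder to do structurally, which is why handing the remaining text to GPT-3.5-turbo for semantic cleanup makes sense.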

Challenges we ran into

We first tried to use IBM's Watson models for inference, but their APIs proved difficult to access, so we switched to GPT. From there, the main issues were related to prompt engineering: some prompts didn't give the kind of answers we wanted, and even more were inconsistent in their answers. Nevertheless, we managed to get some fairly functional prompts up and running. Finally, we ran into some challenges integrating the front and back ends, but we ended up getting the project working.

Accomplishments that we're proud of

We're proud that we tackled an open problem in large language models - grounding their inference in reliable sources - in a short amount of time, and without requiring specialty medical APIs. We're also proud that we implemented both a front end and a back end in a format that is ready to be pushed to the Web. Finally, we're proud to be improving the state of medical information tools with cutting-edge technology, in a way that combines the strengths of LLMs with those of reliable medical resources.

What we learned

We learned that different GPT models are useful for different tasks, depending mostly on the desired speed, depth of reasoning, and context window. We also learned that web scraping is an effective way to draw on existing information without specialized API calls. Finally, we learned a bit about the challenges involved in building such an app: speed, reliability, and integration with an effective front end.

What's next for ChatMD

The next step is to add more functionality for feedback and dialog between the user and the chatbot. This would involve asking clarifying questions about possible symptoms to narrow the list of candidate diagnoses down to the few most likely ones. There are also some issues with speed and reliability that would likely require a number of small fixes to resolve. Another future avenue is integration with physical health sensors from wearables.
