Inspiration
Many disabled senior patients require specialized care techniques developed over years of connection with their caregiver. When that caregiver is absent, even slight changes in caretaking routines can distress patients, especially those with dementia and other complex disabilities. We want to bridge the gap between familiar caregivers and temporary substitutes to ensure a smooth transition. Patients who cannot tolerate even the smallest changes in their daily routine can develop escalated symptoms when stressed, which may worsen their condition.
What it does
Our AI caregiving assistant is trained on videos of primary caregivers performing activities of daily living (ADLs) tailored to each patient's needs. In the absence of a primary caregiver, the AI can guide substitute caregivers with clear, step-by-step instructions specific to the patient. The AI monitors each action and, if a caregiver deviates from the patient’s preferred method, it provides corrections to ensure proper care.
How we built it
We used the hands-free capture feature of Meta Ray-Ban smart glasses to record egocentric videos of caregivers performing activities of daily living (ADLs) for disabled patients. These hands-free recordings provide the essential training data for AI-assisted caregiving. To process and analyze this data, we fine-tuned a Large Language Model (LLM) and Vision-Language Model (VLM) through Gemini’s multimodal API, which allows the model to interpret and reason across multiple input modalities, including video and audio.
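At the core of this pipeline is a single multimodal call: an ADL recording is uploaded and Gemini is prompted for a structured description of what the caregiver does. The sketch below shows roughly what that call looks like using the google-generativeai Python package; the file name and prompt text are illustrative placeholders, not our production values.

```python
# Minimal sketch: send an egocentric ADL recording to Gemini's multimodal API
# and ask for a step-by-step scene description. File name is a placeholder.
import time
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Upload the Ray-Ban recording; Gemini processes video files asynchronously.
video = genai.upload_file("adl_transfer_bed_to_chair.mp4")
while video.state.name == "PROCESSING":
    time.sleep(5)
    video = genai.get_file(video.name)

model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content([
    video,
    "Describe, step by step, how the caregiver performs this activity of "
    "daily living, including any patient-specific preferences you observe.",
])
print(response.text)
```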
For fine-tuning, we employed Gemini’s Video Analyzer API to extract detailed scene descriptions from the training videos. This lets the AI build a structured, ground-truth understanding of each caregiving scenario, ensuring accurate context recognition. Using one-shot and few-shot learning, the model learns from a limited set of examples to generalize effectively across various caregiving tasks. With the text extracted from the video analysis, we conducted extensive prompt engineering to optimize Gemini’s response quality and contextual accuracy.
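In practice, the extracted scene descriptions become the in-context examples for the few-shot prompt. The sketch below illustrates one way such a prompt can be assembled; the example tasks, steps, and helper names are hypothetical placeholders rather than real patient data or our exact prompt.

```python
# Rough sketch of few-shot prompt assembly: scene descriptions extracted from
# the primary caregiver's videos serve as in-context examples, and the model
# is asked to produce patient-specific instructions for a new task.
import google.generativeai as genai

# Placeholder examples standing in for extracted scene descriptions.
FEW_SHOT_EXAMPLES = [
    ("Assisting with eating",
     "1. Sit at eye level on the patient's right side. 2. Offer small bites "
     "and wait for a nod before the next spoonful. 3. Keep the water cup on "
     "the left, where the patient can reach it."),
    ("Helping the patient stand",
     "1. Lock the wheelchair brakes. 2. Let the patient grip your forearms, "
     "never your hands. 3. Count to three out loud before lifting."),
]

def build_prompt(task: str) -> str:
    parts = ["You are a caregiving assistant for one specific patient.",
             "Follow the patient's established routines exactly.", ""]
    for name, steps in FEW_SHOT_EXAMPLES:
        parts += [f"Task: {name}", f"Preferred steps: {steps}", ""]
    parts += [f"Task: {task}",
              "Preferred steps: (give clear, numbered, patient-specific steps)"]
    return "\n".join(parts)

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")
print(model.generate_content(build_prompt("Helping the patient get dressed")).text)
```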
Our front-end application uses WebRTC to stream real-time video and audio to the backend API. The AI processes these inputs with action recognition, generating step-by-step caregiving instructions based on detected movements and activities. This ensures caregivers receive precise, context-aware guidance in real time, improving patient care while minimizing the need for extensive manual training.
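On the streaming side, the backend accepts a WebRTC offer from the browser and consumes the incoming video track frame by frame so frames can be handed to the action-recognition step. Below is a rough sketch of such a receiver assuming the aiortc and aiohttp libraries; our actual signaling and analysis code is more involved, and the analysis hand-off is omitted.

```python
# Sketch of a WebRTC backend that answers the browser's offer and reads
# incoming video frames for downstream action recognition (aiortc + aiohttp).
import asyncio
from aiohttp import web
from aiortc import RTCPeerConnection, RTCSessionDescription

pcs = set()  # keep peer connections alive

async def consume_video(track):
    # Pull decoded frames; in our system, batches of frames are forwarded
    # to the Gemini-based analysis step (omitted in this sketch).
    while True:
        frame = await track.recv()
        image = frame.to_ndarray(format="bgr24")  # numpy array for analysis
        _ = image  # placeholder for action recognition / instruction generation

async def offer(request):
    params = await request.json()
    pc = RTCPeerConnection()
    pcs.add(pc)

    @pc.on("track")
    def on_track(track):
        if track.kind == "video":
            asyncio.ensure_future(consume_video(track))

    await pc.setRemoteDescription(
        RTCSessionDescription(sdp=params["sdp"], type=params["type"]))
    answer = await pc.createAnswer()
    await pc.setLocalDescription(answer)
    return web.json_response(
        {"sdp": pc.localDescription.sdp, "type": pc.localDescription.type})

app = web.Application()
app.router.add_post("/offer", offer)

if __name__ == "__main__":
    web.run_app(app, port=8080)
```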
Challenges we ran into
One of the main challenges we faced was creating user personas for seniors with psychological and physical disabilities. Understanding the specific challenges these individuals face was crucial for developing an AI assistant that could provide relevant and effective support. Our team lacked expertise in occupational therapy and physical therapy, making it difficult to validate our approach with professional insights. Consulting experts on best caregiving practices was challenging, especially given the limited weekend availability of professionals, which restricted our ability to seek feedback during development.
Selecting the most appropriate ADLs to record for our training dataset was another obstacle, as we had to determine which tasks would be most beneficial for caregivers while ensuring the AI could generalize across different scenarios.
Recording the dataset itself proved to be difficult, as recreating ADL scenarios required realistic acting and proper execution of caregiving techniques. Since our AI assistant is multimodal, building a real-time system that could accurately interpret video and audio inputs while providing clear and concise caregiving instructions was a complex technical challenge. Ensuring that the AI could understand the context of a situation and generate meaningful guidance required extensive testing and refinement.
Accomplishments that we're proud of
One of our biggest accomplishments was gaining a deep understanding of caregivers' challenges by consulting professional physical therapists (PTs) and occupational therapists (OTs) and conducting extensive research on disabilities and dementia. This allowed us to design an AI assistant that genuinely addresses real-world caregiving needs.
Another major achievement was successfully fine-tuning an accurate multimodal AI assistant with a minimal dataset. Despite the limited amount of training data, we leveraged few-shot learning techniques to achieve highly accurate results, demonstrating the efficiency of our approach.
We also managed to strike the right balance between generalization and personalization in caregiving tasks. The AI assistant can understand the unique requirements of an individual patient while remaining adaptable to new, unseen circumstances, much like a professional caregiver who deeply understands their patient.
Lastly, we took on the challenge of learning and implementing state-of-the-art technologies, including advancements in physical AI, which enhances the AI’s ability to interpret and respond to real-world caregiving scenarios with precision and contextual awareness.
Built With
- copilot
- gemini
- github
- gpt
- javascript
- python
- ray-ban
- vscode
- webrtc