An accessibility application that provides real-time sign language-to-text and speech-to-text translation with virtual camera output for video conferencing applications.
- Real-time American Sign Language recognition using TensorFlow and MediaPipe
- Hand gesture detection and pose estimation
- Virtual camera output for use in Zoom, Teams, and other video conferencing apps
- Text-to-speech audio output for recognized signs
- Supports 9 sign language actions: agree, hello, no, ok, problem, question, thank you, understand, yes
- Real-time speech recognition using VOSK
- Multi-language support (10+ languages)
- Virtual camera output with live captions
- Automatic language model download (see the sketch after the language list below)
- English (en)
- Spanish (es)
- French (fr)
- German (de)
- Russian (ru)
- Chinese (zh)
- Japanese (ja)
- Portuguese (pt)
- Italian (it)
- Hindi (hi)
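How the dropdown codes map to concrete VOSK models is up to the application. As a rough sketch (an assumption, not the project's code; note that VOSK's own language identifiers do not always match these codes, e.g. English models are listed as en-us and Chinese as cn), loading a model for a selected language could look like this:

```python
# Rough sketch: loading a VOSK model for a language selected in the GUI.
# Model(lang=...) downloads a matching small model on first use; the override
# table is illustrative, since VOSK's identifiers can differ from the codes
# above - see https://alphacephei.com/vosk/models for the full list.
from vosk import Model

def model_for(lang_code: str) -> Model:
    overrides = {"en": "en-us", "zh": "cn"}  # illustrative code translations
    return Model(lang=overrides.get(lang_code, lang_code))
```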
- macOS (tested with Python 3.12+)
- Webcam for sign language recognition
- Microphone for speech-to-text
- Python 3.12 or higher (with tkinter support)
See requirements.txt for the complete list. Key dependencies include:
- `opencv-python` - Video processing
- `mediapipe` - Hand and pose detection
- `tensorflow` - ML model for sign language recognition
- `pyvirtualcam` - Virtual camera output
- `vosk` - Speech recognition
- `pyttsx3` - Text-to-speech
- `tkinter` - GUI (usually included with Python)
- Clone the repository:

  git clone <repository-url>
  cd audibly

- Install Python dependencies:

  pip install -r requirements.txt

- Download VOSK language models: models are downloaded automatically the first time you use speech-to-text in a given language. Alternatively, you can download them manually from https://alphacephei.com/vosk/models.

- Ensure the model files are present (a verification sketch follows this list):
  - `wlasl_demo.keras` - Sign language recognition model (should be in the project root)
  - `actions.json` - List of recognized sign language actions (should be in the project root)
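Before launching the GUI, a quick sanity check along these lines (a sketch, not part of the repository; it assumes actions.json is a JSON array of labels) confirms that the key dependencies import and the model files are in place:

```python
# Sanity check (sketch): key imports plus the two model files in the project root.
import json
from pathlib import Path

import cv2            # opencv-python
import mediapipe      # hand and pose detection
import tensorflow     # loads wlasl_demo.keras
import pyvirtualcam   # virtual camera output
import vosk           # speech recognition
import pyttsx3        # text-to-speech

for required in ("wlasl_demo.keras", "actions.json"):
    assert Path(required).exists(), f"Missing {required} in the project root"

actions = json.loads(Path("actions.json").read_text())
print(f"{len(actions)} actions available:", actions)
```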
Run the main GUI application:
  python3 main.py

Note: If you're using a Python 3.12 build without tkinter support (a quick check is sketched below), you may need to:
- Use a Python version with tkinter (e.g., `python3`, which may be Python 3.14)
- Or install tkinter for Python 3.12:

  brew install python-tk@3.12
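To confirm that a given interpreter actually has tkinter, run a check like this with each candidate python3:

```python
# Quick tkinter availability check (run with the interpreter you intend to use).
import tkinter
print("tkinter OK, Tk version", tkinter.TkVersion)
```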
- Click the "Sign Language-to-Text" button in the GUI
- Position yourself in front of your webcam
- The application will (see the sketch after this list):
  - Detect your hands and recognize sign language gestures
  - Display recognized words on screen
  - Output text to a virtual camera (usable in video conferencing apps)
  - Speak recognized words using text-to-speech
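For orientation, the loop below is a simplified sketch of how such a pipeline fits together, not the actual asl.py: MediaPipe landmarks are buffered for SEQUENCE_LENGTH frames, fed to the Keras model, and high-confidence predictions are reported. The use of MediaPipe Holistic, the keypoint layout, and the thresholds are assumptions; the real script also handles majority voting, hold times, the virtual camera, and text-to-speech.

```python
# Simplified sketch of the sign language recognition loop (illustrative only).
import json
from collections import deque

import cv2
import numpy as np
import mediapipe as mp
import tensorflow as tf

actions = json.load(open("actions.json"))
model = tf.keras.models.load_model("wlasl_demo.keras")

SEQUENCE_LENGTH = 30   # see the configuration section below
COMMIT_THRESH = 0.5

def extract_keypoints(results):
    # Flatten pose + both hands into one feature vector (illustrative layout).
    def flat(landmarks, count):
        if landmarks is None:
            return np.zeros(count * 3)
        return np.array([[p.x, p.y, p.z] for p in landmarks.landmark]).flatten()
    return np.concatenate([
        flat(results.pose_landmarks, 33),
        flat(results.left_hand_landmarks, 21),
        flat(results.right_hand_landmarks, 21),
    ])

sequence = deque(maxlen=SEQUENCE_LENGTH)
cap = cv2.VideoCapture(0)
with mp.solutions.holistic.Holistic() as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        sequence.append(extract_keypoints(results))
        if len(sequence) == SEQUENCE_LENGTH:
            probs = model.predict(np.expand_dims(sequence, axis=0), verbose=0)[0]
            if probs.max() > COMMIT_THRESH:
                print("Recognized:", actions[int(probs.argmax())])
cap.release()
```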
- Select your preferred language from the dropdown menu
- Click the "Speech-to-Text" button
- Start speaking - your speech will be (see the sketch after this list):
  - Converted to text in real-time
  - Displayed as captions on a virtual camera
  - Available for use in video conferencing applications
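The sketch below shows the shape of such a loop, not the project's speech_to_text.py: VOSK consumes 16 kHz mono audio and returns partial and final results as JSON. Microphone capture via the sounddevice package is an assumption here; the real script renders the text as captions rather than printing it.

```python
# Minimal speech-to-text loop with VOSK (illustrative sketch).
import json
import queue

import sounddevice as sd
from vosk import Model, KaldiRecognizer

SAMPLE_RATE = 16000
audio_q: "queue.Queue[bytes]" = queue.Queue()

def callback(indata, frames, time_info, status):
    audio_q.put(bytes(indata))

model = Model(lang="en-us")              # downloads the small model on first use
recognizer = KaldiRecognizer(model, SAMPLE_RATE)

with sd.RawInputStream(samplerate=SAMPLE_RATE, blocksize=8000,
                       dtype="int16", channels=1, callback=callback):
    print("Listening... press Ctrl+C to stop")
    while True:
        data = audio_q.get()
        if recognizer.AcceptWaveform(data):
            text = json.loads(recognizer.Result()).get("text", "")
            if text:
                print("Final:", text)    # would be rendered as a caption
        else:
            partial = json.loads(recognizer.PartialResult()).get("partial", "")
            if partial:
                print("Partial:", partial)
```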
Both modes output to a virtual camera that can be selected in:
- Zoom
- Microsoft Teams
- Google Meet
- OBS Studio
- Any application that supports virtual cameras
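Under the hood this relies on pyvirtualcam; the following is an illustrative sketch of pushing caption frames to the virtual camera (the project's actual frame rendering will differ):

```python
# Illustrative sketch: drawing a caption with OpenCV and sending it to the
# virtual camera via pyvirtualcam, which conferencing apps can select.
import cv2
import numpy as np
import pyvirtualcam

WIDTH, HEIGHT, FPS = 1280, 720, 20

with pyvirtualcam.Camera(width=WIDTH, height=HEIGHT, fps=FPS) as cam:
    print(f"Virtual camera started: {cam.device}")
    caption = "hello"                     # would come from the recognizer
    while True:
        frame = np.zeros((HEIGHT, WIDTH, 3), dtype=np.uint8)
        cv2.putText(frame, caption, (40, HEIGHT - 60),
                    cv2.FONT_HERSHEY_SIMPLEX, 2.0, (255, 255, 255), 3)
        cam.send(frame)                   # pyvirtualcam expects RGB frames
        cam.sleep_until_next_frame()
```

On macOS, pyvirtualcam typically requires the OBS virtual camera backend to be installed before the virtual camera appears in other applications.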
audibly/
├── main.py # Main GUI application
├── asl.py # Sign language recognition script
├── speech_to_text.py # Speech-to-text script
├── actions.json # Sign language action labels
├── wlasl_demo.keras # Trained sign language model
├── requirements.txt # Python dependencies
├── ml/ # Machine learning training code
│ ├── test.py
│ ├── wlasl_demo.keras
│ └── requirements.txt
└── audibly-site/ # Website source code
└── src/
You can modify settings in asl.py:
- `SEQUENCE_LENGTH` - Number of frames to analyze (default: 30)
- `WINDOW` - Majority-vote window size (default: 10)
- `IDLE_THRESH` - Confidence threshold for idle detection (default: 0.4)
- `COMMIT_THRESH` - Confidence threshold for word commitment (default: 0.5)
- `HOLD_TIME` - Time in seconds before committing a word (default: 0.5)
- `CLEAR_IDLE_SECONDS` - Time before clearing the sentence when hands are down (default: 10.0)
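For reference, these settings might appear near the top of asl.py roughly as follows (defaults taken from the list above; the exact placement is an assumption). Raising COMMIT_THRESH trades responsiveness for fewer false positives.

```python
# Tunable recognition parameters (defaults from the list above).
SEQUENCE_LENGTH = 30        # frames fed to the model per prediction
WINDOW = 10                 # majority-vote window over recent predictions
IDLE_THRESH = 0.4           # below this confidence a frame counts as idle
COMMIT_THRESH = 0.5         # minimum confidence to commit a word
HOLD_TIME = 0.5             # seconds a prediction must persist before committing
CLEAR_IDLE_SECONDS = 10.0   # clear the sentence after hands are down this long
```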
Language models are automatically downloaded on first use. Models are stored locally and reused for subsequent sessions.
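If you prefer to pre-fetch a model (for example before working offline), a small helper along these lines works; the models/ directory, the model name, and the URL pattern below are illustrative and follow the VOSK models page:

```python
# Hypothetical helper for pre-fetching a VOSK model manually.
import urllib.request
import zipfile
from pathlib import Path

MODELS_DIR = Path("models")  # illustrative cache location

def fetch_model(name: str = "vosk-model-small-en-us-0.15") -> Path:
    MODELS_DIR.mkdir(exist_ok=True)
    target = MODELS_DIR / name
    if target.exists():                  # reuse a previously downloaded model
        return target
    archive = MODELS_DIR / f"{name}.zip"
    urllib.request.urlretrieve(f"https://alphacephei.com/vosk/models/{name}.zip", archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(MODELS_DIR)
    archive.unlink()
    return target
```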
The ml/ directory contains training code for custom sign language models. See the Jupyter notebook and test scripts for more details.
The audibly-site/ directory contains the project website source code (React/Vite).