An accessibility application that provides real-time sign language-to-text and speech-to-text translation with virtual camera output for video conferencing applications.
- Real-time American Sign Language recognition using TensorFlow and MediaPipe
- Hand gesture detection and pose estimation
- Virtual camera output for use in Zoom, Teams, and other video conferencing apps
- Text-to-speech audio output for recognized signs
- Supports 9 sign language actions: agree, hello, no, ok, problem, question, thank you, understand, yes
- Real-time speech recognition using VOSK
- Multi-language support (10+ languages)
- Virtual camera output with live captions
- Automatic language model download (see the sketch after the language list below)
- English (en)
- Spanish (es)
- French (fr)
- German (de)
- Russian (ru)
- Chinese (zh)
- Japanese (ja)
- Portuguese (pt)
- Italian (it)
- Hindi (hi)
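How the dropdown codes map to concrete VOSK models is up to the application. As a rough sketch (an assumption, not the project's code; note that VOSK's own language identifiers do not always match these codes, e.g. English models are listed as en-us and Chinese as cn), loading a model for a selected language could look like this:

```python
# Rough sketch: loading a VOSK model for a language selected in the GUI.
# Model(lang=...) downloads a matching small model on first use; the override
# table is illustrative, since VOSK's identifiers can differ from the codes
# above - see https://alphacephei.com/vosk/models for the full list.
from vosk import Model

def model_for(lang_code: str) -> Model:
    overrides = {"en": "en-us", "zh": "cn"}  # illustrative code translations
    return Model(lang=overrides.get(lang_code, lang_code))
```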
- macOS (tested with Python 3.12+)
- Webcam for sign language recognition
- Microphone for speech-to-text
- Python 3.12 or higher (with tkinter support)
See requirements.txt for the complete list. Key dependencies include:
- `opencv-python` - Video processing
- `mediapipe` - Hand and pose detection
- `tensorflow` - ML model for sign language recognition
- `pyvirtualcam` - Virtual camera output
- `vosk` - Speech recognition
- `pyttsx3` - Text-to-speech
- `tkinter` - GUI (usually included with Python)
- Clone the repository:

  git clone <repository-url>
  cd audibly

- Install Python dependencies:

  pip install -r requirements.txt

- Download VOSK language models: models are downloaded automatically the first time you use speech-to-text in a given language. Alternatively, you can download them manually from https://alphacephei.com/vosk/models.

- Ensure the model files are present (a verification sketch follows this list):
  - `wlasl_demo.keras` - Sign language recognition model (should be in the project root)
  - `actions.json` - List of recognized sign language actions (should be in the project root)
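Before launching the GUI, a quick sanity check along these lines (a sketch, not part of the repository; it assumes actions.json is a JSON array of labels) confirms that the key dependencies import and the model files are in place:

```python
# Sanity check (sketch): key imports plus the two model files in the project root.
import json
from pathlib import Path

import cv2            # opencv-python
import mediapipe      # hand and pose detection
import tensorflow     # loads wlasl_demo.keras
import pyvirtualcam   # virtual camera output
import vosk           # speech recognition
import pyttsx3        # text-to-speech

for required in ("wlasl_demo.keras", "actions.json"):
    assert Path(required).exists(), f"Missing {required} in the project root"

actions = json.loads(Path("actions.json").read_text())
print(f"{len(actions)} actions available:", actions)
```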
Run the main GUI application:
  python3 main.py

Note: If you're using a Python 3.12 build without tkinter support (a quick check is sketched below), you may need to:
- Use a Python version with tkinter (e.g., `python3`, which may be Python 3.14)
- Or install tkinter for Python 3.12:

  brew install python-tk@3.12
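To confirm that a given interpreter actually has tkinter, run a check like this with each candidate python3:

```python
# Quick tkinter availability check (run with the interpreter you intend to use).
import tkinter
print("tkinter OK, Tk version", tkinter.TkVersion)
```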
- Click the "Sign Language-to-Text" button in the GUI
- Position yourself in front of your webcam
- The application will (see the sketch after this list):
  - Detect your hands and recognize sign language gestures
  - Display recognized words on screen
  - Output text to a virtual camera (usable in video conferencing apps)
  - Speak recognized words using text-to-speech
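For orientation, the loop below is a simplified sketch of how such a pipeline fits together, not the actual asl.py: MediaPipe landmarks are buffered for SEQUENCE_LENGTH frames, fed to the Keras model, and high-confidence predictions are reported. The use of MediaPipe Holistic, the keypoint layout, and the thresholds are assumptions; the real script also handles majority voting, hold times, the virtual camera, and text-to-speech.

```python
# Simplified sketch of the sign language recognition loop (illustrative only).
import json
from collections import deque

import cv2
import numpy as np
import mediapipe as mp
import tensorflow as tf

actions = json.load(open("actions.json"))
model = tf.keras.models.load_model("wlasl_demo.keras")

SEQUENCE_LENGTH = 30   # see the configuration section below
COMMIT_THRESH = 0.5

def extract_keypoints(results):
    # Flatten pose + both hands into one feature vector (illustrative layout).
    def flat(landmarks, count):
        if landmarks is None:
            return np.zeros(count * 3)
        return np.array([[p.x, p.y, p.z] for p in landmarks.landmark]).flatten()
    return np.concatenate([
        flat(results.pose_landmarks, 33),
        flat(results.left_hand_landmarks, 21),
        flat(results.right_hand_landmarks, 21),
    ])

sequence = deque(maxlen=SEQUENCE_LENGTH)
cap = cv2.VideoCapture(0)
with mp.solutions.holistic.Holistic() as holistic:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = holistic.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        sequence.append(extract_keypoints(results))
        if len(sequence) == SEQUENCE_LENGTH:
            probs = model.predict(np.expand_dims(sequence, axis=0), verbose=0)[0]
            if probs.max() > COMMIT_THRESH:
                print("Recognized:", actions[int(probs.argmax())])
cap.release()
```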
- Select your preferred language from the dropdown menu
- Click the "Speech-to-Text" button
- Start speaking - your speech will be (see the sketch after this list):
  - Converted to text in real-time
  - Displayed as captions on a virtual camera
  - Available for use in video conferencing applications
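The sketch below shows the shape of such a loop, not the project's speech_to_text.py: VOSK consumes 16 kHz mono audio and returns partial and final results as JSON. Microphone capture via the sounddevice package is an assumption here; the real script renders the text as captions rather than printing it.

```python
# Minimal speech-to-text loop with VOSK (illustrative sketch).
import json
import queue

import sounddevice as sd
from vosk import Model, KaldiRecognizer

SAMPLE_RATE = 16000
audio_q: "queue.Queue[bytes]" = queue.Queue()

def callback(indata, frames, time_info, status):
    audio_q.put(bytes(indata))

model = Model(lang="en-us")              # downloads the small model on first use
recognizer = KaldiRecognizer(model, SAMPLE_RATE)

with sd.RawInputStream(samplerate=SAMPLE_RATE, blocksize=8000,
                       dtype="int16", channels=1, callback=callback):
    print("Listening... press Ctrl+C to stop")
    while True:
        data = audio_q.get()
        if recognizer.AcceptWaveform(data):
            text = json.loads(recognizer.Result()).get("text", "")
            if text:
                print("Final:", text)    # would be rendered as a caption
        else:
            partial = json.loads(recognizer.PartialResult()).get("partial", "")
            if partial:
                print("Partial:", partial)
```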
Both modes output to a virtual camera that can be selected in:
- Zoom
- Microsoft Teams
- Google Meet
- OBS Studio
- Any application that supports virtual cameras
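Under the hood this relies on pyvirtualcam; the following is an illustrative sketch of pushing caption frames to the virtual camera (the project's actual frame rendering will differ):

```python
# Illustrative sketch: drawing a caption with OpenCV and sending it to the
# virtual camera via pyvirtualcam, which conferencing apps can select.
import cv2
import numpy as np
import pyvirtualcam

WIDTH, HEIGHT, FPS = 1280, 720, 20

with pyvirtualcam.Camera(width=WIDTH, height=HEIGHT, fps=FPS) as cam:
    print(f"Virtual camera started: {cam.device}")
    caption = "hello"                     # would come from the recognizer
    while True:
        frame = np.zeros((HEIGHT, WIDTH, 3), dtype=np.uint8)
        cv2.putText(frame, caption, (40, HEIGHT - 60),
                    cv2.FONT_HERSHEY_SIMPLEX, 2.0, (255, 255, 255), 3)
        cam.send(frame)                   # pyvirtualcam expects RGB frames
        cam.sleep_until_next_frame()
```

On macOS, pyvirtualcam typically requires the OBS virtual camera backend to be installed before the virtual camera appears in other applications.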
audibly/
├── main.py # Main GUI application
├── asl.py # Sign language recognition script
├── speech_to_text.py # Speech-to-text script
├── actions.json # Sign language action labels
├── wlasl_demo.keras # Trained sign language model
├── requirements.txt # Python dependencies
├── ml/ # Machine learning training code
│ ├── test.py
│ ├── wlasl_demo.keras
│ └── requirements.txt
└── audibly-site/ # Website source code
└── src/
You can modify settings in asl.py:
- `SEQUENCE_LENGTH` - Number of frames to analyze (default: 30)
- `WINDOW` - Majority-vote window size (default: 10)
- `IDLE_THRESH` - Confidence threshold for idle detection (default: 0.4)
- `COMMIT_THRESH` - Confidence threshold for word commitment (default: 0.5)
- `HOLD_TIME` - Time in seconds before committing a word (default: 0.5)
- `CLEAR_IDLE_SECONDS` - Time before clearing the sentence when hands are down (default: 10.0)
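For reference, these settings might appear near the top of asl.py roughly as follows (defaults taken from the list above; the exact placement is an assumption). Raising COMMIT_THRESH trades responsiveness for fewer false positives.

```python
# Tunable recognition parameters (defaults from the list above).
SEQUENCE_LENGTH = 30        # frames fed to the model per prediction
WINDOW = 10                 # majority-vote window over recent predictions
IDLE_THRESH = 0.4           # below this confidence a frame counts as idle
COMMIT_THRESH = 0.5         # minimum confidence to commit a word
HOLD_TIME = 0.5             # seconds a prediction must persist before committing
CLEAR_IDLE_SECONDS = 10.0   # clear the sentence after hands are down this long
```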
Language models are automatically downloaded on first use. Models are stored locally and reused for subsequent sessions.
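If you prefer to pre-fetch a model (for example before working offline), a small helper along these lines works; the models/ directory, the model name, and the URL pattern below are illustrative and follow the VOSK models page:

```python
# Hypothetical helper for pre-fetching a VOSK model manually.
import urllib.request
import zipfile
from pathlib import Path

MODELS_DIR = Path("models")  # illustrative cache location

def fetch_model(name: str = "vosk-model-small-en-us-0.15") -> Path:
    MODELS_DIR.mkdir(exist_ok=True)
    target = MODELS_DIR / name
    if target.exists():                  # reuse a previously downloaded model
        return target
    archive = MODELS_DIR / f"{name}.zip"
    urllib.request.urlretrieve(f"https://alphacephei.com/vosk/models/{name}.zip", archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(MODELS_DIR)
    archive.unlink()
    return target
```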
The ml/ directory contains training code for custom sign language models. See the Jupyter notebook and test scripts for more details.
The audibly-site/ directory contains the project website source code (React/Vite).