An intelligent CLI for analyzing SQL schemas and generating realistic, relationally aware mock data for testing.
DataFly-CLI is a command-line tool that uses LLMs to understand the structure and relationships of your SQL database schema. It automates the tedious work of creating test data by generating realistic, relationally consistent mock data tailored to that schema.
- LLM-Powered Schema Analysis: Automatically scans SQL schemas to create a detailed, human-readable YAML description and a machine-readable dependency graph.
- Relationally Aware Data Generation: Creates mock data that respects foreign keys and one-to-one, one-to-many, and many-to-many relationships.
- Interactive Generation Planning: Takes a simple user request (e.g., "50 users") and generates a complete, step-by-step plan for populating all required dependent tables.
- Schema Refinement: Allows you to correct or enhance the AI's understanding of your schema's business logic using natural language commands.
- Multi-Provider AI Support: Seamlessly switch between major LLM providers like Google Gemini, OpenAI, Anthropic, and Meta.
- Languages: Go, SQL
- CLI Framework: Cobra
- Terminal UI (TUI): Bubble Tea, Lipgloss
- AI/LLM: LangChain
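To make "respects foreign keys" concrete, here is a minimal Go sketch of the idea for a one-to-many relationship. It is not DataFly's implementation: the `User` and `Post` types and the `GeneratePosts` function are hypothetical and exist only for illustration. The point is that every child row takes its foreign key from a parent row that already exists.

```go
package main

import (
	"fmt"
	"math/rand"
)

// User and Post are hypothetical tables in a one-to-many relationship.
type User struct {
	ID   int
	Name string
}

type Post struct {
	ID     int
	UserID int // foreign key to User.ID
	Title  string
}

// GeneratePosts creates n posts whose UserID always refers to one of the
// given users, so no generated row has a dangling foreign key.
// It returns no posts when there are no users to reference.
func GeneratePosts(seed int64, users []User, n int) []Post {
	posts := make([]Post, 0, n)
	if len(users) == 0 {
		return posts
	}
	rng := rand.New(rand.NewSource(seed)) // seeded for repeatable output
	for i := 1; i <= n; i++ {
		owner := users[rng.Intn(len(users))]
		posts = append(posts, Post{ID: i, UserID: owner.ID, Title: fmt.Sprintf("Post %d", i)})
	}
	return posts
}

func main() {
	users := []User{{ID: 1, Name: "Ada"}, {ID: 2, Name: "Linus"}, {ID: 3, Name: "Grace"}}
	for _, p := range GeneratePosts(1, users, 5) {
		fmt.Printf("post %d belongs to user %d\n", p.ID, p.UserID)
	}
}
```

A many-to-many relationship would follow the same pattern, with a join table whose rows draw both foreign keys from already generated parents.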
DataFly operates by first analyzing your raw SQL schema to create two artifacts: a detailed YAML description and a dependency graph. This "understanding" can be inspected with `describe` and modified with `refine`. Based on this understanding, the `generate` command creates a plan, which is then used to populate your database.
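One way a dependency graph can drive population order is a topological sort: parent tables are filled before the tables that reference them. The Go sketch below uses Kahn's algorithm under assumed conventions. The graph format (a map from each table to the tables its foreign keys point at), the table names, and the `InsertOrder` function are hypothetical and are not taken from DataFly's actual artifacts.

```go
package main

import (
	"fmt"
	"sort"
)

// InsertOrder returns the tables ordered so that every table appears after
// all tables it references. deps maps a table to the tables its foreign keys
// point at. A cycle (including a self-referencing table) yields an error.
func InsertOrder(deps map[string][]string) ([]string, error) {
	indegree := map[string]int{}      // unmet parent count per table
	children := map[string][]string{} // parent -> tables referencing it
	for table, parents := range deps {
		if _, ok := indegree[table]; !ok {
			indegree[table] = 0
		}
		for _, p := range parents {
			if _, ok := indegree[p]; !ok {
				indegree[p] = 0
			}
			indegree[table]++
			children[p] = append(children[p], table)
		}
	}

	// Start with the tables that reference nothing.
	var ready []string
	for t, d := range indegree {
		if d == 0 {
			ready = append(ready, t)
		}
	}
	sort.Strings(ready) // sorted for deterministic output

	var order []string
	for len(ready) > 0 {
		t := ready[0]
		ready = ready[1:]
		order = append(order, t)
		var unlocked []string
		for _, c := range children[t] {
			indegree[c]--
			if indegree[c] == 0 {
				unlocked = append(unlocked, c)
			}
		}
		sort.Strings(unlocked)
		ready = append(ready, unlocked...)
	}
	if len(order) != len(indegree) {
		return nil, fmt.Errorf("dependency cycle among %d tables", len(indegree)-len(order))
	}
	return order, nil
}

func main() {
	deps := map[string][]string{
		"users":     {},
		"tags":      {},
		"posts":     {"users"},
		"comments":  {"posts", "users"},
		"post_tags": {"posts", "tags"},
	}
	order, err := InsertOrder(deps)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(order)
}
```

Returning an error on cycles is a deliberate choice in this sketch: circular foreign keys need a different strategy (for example, inserting with a nullable key and updating afterwards), which is out of scope here.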
Manages LLM provider configurations.
- Usage: `datafly config set google --key [YOUR_KEY]`
- Description: Securely saves the API key for the specified provider (`google`, `openai`, `anthropic`, or `meta`) and sets it as the active provider for all AI-related operations.
- Usage: `datafly config status`
- Description: Shows the currently configured LLM provider and confirms that an API key is present.
Manages saved database connections.
- Usage: `datafly connection add staging-db "mysql://..."`
- Description: Saves a new database connection and performs the initial AI-powered scan to analyze its schema, generating the necessary configuration files.
- Usage: `datafly connection list` (alias: `ls`)
- Description: Lists all saved database connections by name and URI.
- Usage: `datafly connection update staging-db`
- Description: Re-scans the schema of an existing connection to detect any changes (like new tables or columns) and updates the AI's understanding.
- Usage: `datafly connection remove staging-db` (alias: `rm`)
- Description: Deletes a saved connection and all of its associated files.
Inspects the AI's understanding of a schema.
- Usage: `datafly describe staging-db`
- Description: Displays a detailed, interactive view of what the AI understands about your schema, including table purposes, column descriptions, and the generated faker tags for each field.
Refines the AI's understanding using natural language.
- Usage: `datafly refine staging-db`
- Description: Opens an interactive prompt where you can provide natural language instructions to correct or enhance the AI's understanding of your schema's business logic. Each refinement cascades into an update of the final data generation code.
Generates mock data based on the schema understanding.
- Usage: `datafly generate staging-db`
- Description: Opens an interactive interface where you can either specify what data to generate using natural language (e.g., "50 users and about 200 posts") or let it infer a plan to create a test version of the entire database. In all cases, the tool first creates a detailed step-by-step generation plan and presents it for your approval before executing.
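To illustrate how a short request might expand into a step-by-step plan, the Go sketch below walks the dependencies of each requested table and emits parent tables first. It is a simplified illustration, not DataFly's planner: the `Step` type, the `BuildPlan` function, the table names, and the default row count for tables the user did not mention are all assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// Step is one hypothetical entry in a generation plan.
type Step struct {
	Table string
	Rows  int
}

// BuildPlan expands a request (table -> row count) into ordered steps.
// Tables that a requested table depends on are added automatically with
// defaultRows rows. deps maps a table to the tables it references.
// The sketch assumes the dependency graph has no cycles.
func BuildPlan(request map[string]int, deps map[string][]string, defaultRows int) []Step {
	var plan []Step
	seen := map[string]bool{}

	var visit func(table string)
	visit = func(table string) {
		if seen[table] {
			return
		}
		seen[table] = true
		// Visit parents first so they land earlier in the plan.
		parents := append([]string(nil), deps[table]...)
		sort.Strings(parents)
		for _, p := range parents {
			visit(p)
		}
		rows, asked := request[table]
		if !asked {
			rows = defaultRows
		}
		plan = append(plan, Step{Table: table, Rows: rows})
	}

	// Iterate requested tables in sorted order for a deterministic plan.
	requested := make([]string, 0, len(request))
	for t := range request {
		requested = append(requested, t)
	}
	sort.Strings(requested)
	for _, t := range requested {
		visit(t)
	}
	return plan
}

func main() {
	deps := map[string][]string{
		"users":      {},
		"categories": {},
		"posts":      {"users", "categories"},
	}
	// "50 users and about 200 posts"; categories is not requested but is required.
	request := map[string]int{"users": 50, "posts": 200}
	for i, s := range BuildPlan(request, deps, 10) {
		fmt.Printf("%d. insert %d rows into %s\n", i+1, s.Rows, s.Table)
	}
}
```

In this example the plan gains a `categories` step that the request never mentioned, which is the kind of dependent table a plan would need to surface for approval before execution.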
Please note that DataFly is currently under active development. Features and commands may change.