AI-Powered Relationally Aware Mock Data Generation for SQL Databases.

Sectonic/DataFly-CLI

DataFly-CLI: AI-Powered Relationally Aware Mock Data Generation

A CLI that analyzes SQL schemas and generates realistic, relationally aware mock data for testing.

Overview

DataFly-CLI is a command-line tool that uses LLMs to understand the structure and relationships of your SQL database schema. It automates the tedious work of creating realistic test data by generating relationally consistent mock data tailored to that schema.

Features

  • LLM-Powered Schema Analysis: Automatically scans SQL schemas to create a detailed, human-readable YAML description and a machine-readable dependency graph.
  • Relationally Aware Data Generation: Creates mock data that respects foreign keys and one-to-one, one-to-many, and many-to-many relationships.
  • Interactive Generation Planning: Takes a simple user request (e.g., "50 users") and generates a complete, step-by-step plan for populating all required dependent tables.
  • Schema Refinement: Allows you to correct or enhance the AI's understanding of your schema's business logic using natural language commands.
  • Multi-Provider AI Support: Switch between LLM providers, including Google Gemini, OpenAI, Anthropic, and Meta.
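
To make "respects foreign keys" concrete, here is a minimal Go sketch of the idea. It is not taken from DataFly's source: the `User` and `Post` tables and the `generateUsers` and `generatePosts` helpers are hypothetical, and the only point is that child rows draw their foreign key from parent rows that already exist.

```go
package main

import (
	"fmt"
	"math/rand"
)

// User and Post are hypothetical tables used only for illustration.
type User struct {
	ID   int
	Name string
}

type Post struct {
	ID     int
	UserID int // foreign key referencing User.ID
	Title  string
}

// generateUsers creates n parent rows with sequential primary keys.
func generateUsers(n int) []User {
	users := make([]User, n)
	for i := range users {
		users[i] = User{ID: i + 1, Name: fmt.Sprintf("user_%d", i+1)}
	}
	return users
}

// generatePosts creates n child rows. Every UserID is drawn from the given
// users, so no post can reference a user that does not exist.
func generatePosts(n int, users []User, seed int64) []Post {
	if len(users) == 0 {
		return nil // without parent rows there is no valid foreign key to use
	}
	rng := rand.New(rand.NewSource(seed))
	posts := make([]Post, n)
	for i := range posts {
		owner := users[rng.Intn(len(users))]
		posts[i] = Post{ID: i + 1, UserID: owner.ID, Title: fmt.Sprintf("post_%d", i+1)}
	}
	return posts
}

func main() {
	users := generateUsers(5)
	posts := generatePosts(20, users, 42)
	fmt.Printf("%d users, %d posts\n", len(users), len(posts))
}
```

Generating parents first and sampling keys from them is one simple way to stay consistent; a real tool would also need to handle uniqueness constraints and join tables for many-to-many relationships.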

Tech Stack

  • Languages: Go, SQL
  • CLI Framework: Cobra
  • Terminal UI (TUI): Bubble Tea, Lipgloss
  • AI/LLM: LangChain

How It Works & Commands

DataFly operates by first analyzing your raw SQL schema to create two artifacts: a detailed YAML description and a dependency graph. This "understanding" can be inspected with describe and modified with refine. Based on this understanding, the generate command creates a plan which is then used to populate your database.
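
One reason a dependency graph matters is insertion order: a table can only be populated after the tables it references. The sketch below shows one common way to derive such an order, Kahn's topological sort. It is an illustration under assumptions, not DataFly's implementation; the `insertionOrder` function and the table names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// insertionOrder returns the tables ordered so that each table appears after
// every table it references. deps maps a table to its foreign-key targets.
func insertionOrder(deps map[string][]string) ([]string, error) {
	indegree := map[string]int{}      // number of unmet dependencies per table
	children := map[string][]string{} // parent table -> tables that reference it
	for table, parents := range deps {
		if _, ok := indegree[table]; !ok {
			indegree[table] = 0
		}
		for _, p := range parents {
			if _, ok := indegree[p]; !ok {
				indegree[p] = 0
			}
			indegree[table]++
			children[p] = append(children[p], table)
		}
	}

	var ready []string
	for table, d := range indegree {
		if d == 0 {
			ready = append(ready, table)
		}
	}

	var order []string
	for len(ready) > 0 {
		sort.Strings(ready) // keep the result deterministic
		table := ready[0]
		ready = ready[1:]
		order = append(order, table)
		for _, c := range children[table] {
			indegree[c]--
			if indegree[c] == 0 {
				ready = append(ready, c)
			}
		}
	}

	// Tables left with unmet dependencies are part of a cycle.
	if len(order) != len(indegree) {
		return nil, errors.New("dependency cycle detected")
	}
	return order, nil
}

func main() {
	order, err := insertionOrder(map[string][]string{
		"users":    nil,
		"posts":    {"users"},
		"comments": {"users", "posts"},
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(order) // prints [users posts comments]
}
```

Circular foreign keys (including a table that references itself) make a strict order impossible, so this sketch reports them as an error rather than guessing.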


datafly config

Manages LLM provider configurations.

config set <provider> --key <api_key>

  • Usage: datafly config set google --key [YOUR_KEY]
  • Description: Securely saves the API key for a specified provider (google, openai, anthropic, meta) and sets it as the active provider for all AI-related operations.

config status

  • Usage: datafly config status
  • Description: Shows the currently configured LLM provider and confirms that an API key is present.

datafly connection

Manages saved database connections.

connection add <name> <database_uri>

  • Usage: datafly connection add staging-db "mysql://..."
  • Description: Saves a new database connection and performs the initial AI-powered scan to analyze its schema, generating the necessary configuration files.
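
This README does not document the exact URI format DataFly accepts. As a general illustration of what a database URI carries, the following Go sketch splits one into its parts with the standard `net/url` package. The `connInfo` type, the `parseDatabaseURI` function, and the example URI are all hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// connInfo holds the parts of a database URI. Hypothetical type for illustration.
type connInfo struct {
	Driver   string
	User     string
	Host     string
	Port     string
	Database string
}

// parseDatabaseURI splits a URI of the form driver://user:pass@host:port/database.
func parseDatabaseURI(raw string) (connInfo, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return connInfo{}, err
	}
	if u.Scheme == "" || u.Host == "" {
		// The raw value is left out of the error because it may contain a password.
		return connInfo{}, errors.New("expected a URI of the form driver://host/database")
	}
	return connInfo{
		Driver:   u.Scheme,
		User:     u.User.Username(), // empty when the URI has no user info
		Host:     u.Hostname(),
		Port:     u.Port(),
		Database: strings.TrimPrefix(u.Path, "/"),
	}, nil
}

func main() {
	// Made-up URI, not a real connection string.
	info, err := parseDatabaseURI("mysql://app:secret@localhost:3306/shop")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", info)
}
```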

connection list

  • Usage: datafly connection list (alias: ls)
  • Description: Lists all saved database connections by name and URI.

connection update <name>

  • Usage: datafly connection update staging-db
  • Description: Re-scans the schema of an existing connection to detect any changes (like new tables or columns) and updates the AI's understanding.
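
Detecting schema changes amounts to comparing two snapshots of the schema. The Go sketch below shows one simple way to do that for tables and columns. It is an assumption about the general technique, not DataFly's code; the `Schema` type and `diffSchemas` function are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Schema maps a table name to its column names. Hypothetical snapshot format.
type Schema map[string][]string

// missing returns the items of a that do not appear in b.
func missing(a, b []string) []string {
	seen := make(map[string]bool, len(b))
	for _, item := range b {
		seen[item] = true
	}
	var out []string
	for _, item := range a {
		if !seen[item] {
			out = append(out, item)
		}
	}
	return out
}

// diffSchemas lists added and removed tables and columns, sorted alphabetically.
func diffSchemas(before, after Schema) []string {
	var changes []string
	for table, cols := range after {
		oldCols, existed := before[table]
		if !existed {
			changes = append(changes, "added table "+table)
			continue
		}
		for _, c := range missing(cols, oldCols) {
			changes = append(changes, "added column "+table+"."+c)
		}
		for _, c := range missing(oldCols, cols) {
			changes = append(changes, "removed column "+table+"."+c)
		}
	}
	for table := range before {
		if _, ok := after[table]; !ok {
			changes = append(changes, "removed table "+table)
		}
	}
	sort.Strings(changes) // map iteration order is random, so sort the result
	return changes
}

func main() {
	before := Schema{"users": {"id", "name"}}
	after := Schema{"users": {"id", "name", "email"}, "orders": {"id", "user_id"}}
	for _, change := range diffSchemas(before, after) {
		fmt.Println(change)
	}
}
```

A fuller comparison would also track column types, constraints, and foreign keys, which this sketch ignores.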

connection remove <name>

  • Usage: datafly connection remove staging-db (alias: rm)
  • Description: Deletes a saved connection and all of its associated files.

datafly describe

Inspects the AI's understanding of a schema.

describe <connection>

  • Usage: datafly describe staging-db
  • Description: Displays a detailed, interactive view of what the AI understands about your schema, including table purposes, column descriptions, and the generated faker tags for each field.
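
The README does not specify what the faker tags look like or how they are consumed. One plausible reading is that each column gets a tag naming the kind of value to generate. The Go sketch below illustrates that idea with a small lookup table; the tag names, the `generators` map, and the `fakeRow` function are hypothetical and do not come from DataFly.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// generators maps a hypothetical faker tag to a function that produces a value.
var generators = map[string]func(rng *rand.Rand) string{
	"first_name": func(rng *rand.Rand) string {
		names := []string{"Ada", "Grace", "Linus"}
		return names[rng.Intn(len(names))]
	},
	"email": func(rng *rand.Rand) string {
		return fmt.Sprintf("user%d@example.com", rng.Intn(10000))
	},
}

// fakeRow produces one value per column, chosen by the column's tag.
// columnTags maps a column name to its tag.
func fakeRow(columnTags map[string]string, seed int64) (map[string]string, error) {
	rng := rand.New(rand.NewSource(seed))

	// Visit columns in sorted order so a given seed always yields the same row.
	columns := make([]string, 0, len(columnTags))
	for c := range columnTags {
		columns = append(columns, c)
	}
	sort.Strings(columns)

	row := make(map[string]string, len(columns))
	for _, c := range columns {
		gen, ok := generators[columnTags[c]]
		if !ok {
			return nil, fmt.Errorf("no generator for tag %q on column %q", columnTags[c], c)
		}
		row[c] = gen(rng)
	}
	return row, nil
}

func main() {
	row, err := fakeRow(map[string]string{"name": "first_name", "contact": "email"}, 7)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(row)
}
```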

datafly refine

Refines the AI's understanding using natural language.

refine <connection>

  • Usage: datafly refine staging-db
  • Description: Opens an interactive prompt where you can give natural language instructions to correct or enhance the AI's understanding of your schema's business logic. Each change also cascades into the final data generation code.

datafly generate

Generates mock data based on the schema understanding.

generate <connection>

  • Usage: datafly generate staging-db
  • Description: Opens an interactive interface where you can either specify what data to generate using natural language (e.g., "50 users and about 200 posts") or let it infer a plan to create a test version of the entire database. In all cases, the tool first creates a detailed step-by-step generation plan and presents it for your approval before executing.
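
To show how a short request can grow into a full plan, the Go sketch below expands requested row counts into ordered steps that also cover every table the requested ones depend on. It uses a depth-first traversal and is only an illustration under assumptions: the `Step` type, the `buildPlan` function, the default row count, and the table names are hypothetical, and parsing the natural language request is out of scope here.

```go
package main

import (
	"fmt"
	"sort"
)

// Step is one entry in a generation plan. Hypothetical type for illustration.
type Step struct {
	Table string
	Rows  int
}

// buildPlan expands requested row counts into ordered steps. Tables that are
// required as foreign-key targets but were not requested get defaultRows.
// deps maps a table to the tables it references.
func buildPlan(request map[string]int, deps map[string][]string, defaultRows int) ([]Step, error) {
	const (
		unvisited = iota
		visiting
		done
	)
	state := map[string]int{}
	var plan []Step

	var visit func(table string) error
	visit = func(table string) error {
		switch state[table] {
		case done:
			return nil
		case visiting:
			return fmt.Errorf("dependency cycle at table %q", table)
		}
		state[table] = visiting
		parents := append([]string(nil), deps[table]...)
		sort.Strings(parents) // deterministic traversal
		for _, p := range parents {
			if err := visit(p); err != nil {
				return err
			}
		}
		state[table] = done
		rows, requested := request[table]
		if !requested {
			rows = defaultRows
		}
		// Parents were appended first, so this table comes after them.
		plan = append(plan, Step{Table: table, Rows: rows})
		return nil
	}

	tables := make([]string, 0, len(request))
	for t := range request {
		tables = append(tables, t)
	}
	sort.Strings(tables)
	for _, t := range tables {
		if err := visit(t); err != nil {
			return nil, err
		}
	}
	return plan, nil
}

func main() {
	deps := map[string][]string{"comments": {"posts", "users"}, "posts": {"users"}}
	plan, err := buildPlan(map[string]int{"comments": 200}, deps, 10)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, s := range plan {
		fmt.Printf("%d. insert %d rows into %s\n", i+1, s.Rows, s.Table)
	}
}
```

Printing the steps before doing any work mirrors the approve-then-execute flow described above: nothing is written until the user has seen the whole plan.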

Work in Progress

Please note that DataFly is currently under active development. Features and commands may change.
