liteparser

liteparser is a highly efficient, complete SQLite SQL parser. It parses any SQLite SQL into a fully visitable AST, with JSON serialization, SQL unparsing (AST to SQL), tolerant/IDE parsing, and AST mutation, all in pure C with zero dependencies.

Easily embeddable: a single static library (libliteparser.a) or shared library, with one header file. No external dependencies, no runtime allocations outside the arena. Drop it into any C, C++, or FFI-capable project.

Features

Complete SQLite grammar: SELECT, INSERT, UPDATE, DELETE, CREATE TABLE/INDEX/VIEW/TRIGGER, ALTER, DROP, PRAGMA, transactions, CTEs, window functions, virtual tables, UPSERT, RETURNING, and every other SQLite statement
Arena-allocated AST: all nodes allocated from a single arena; one call frees everything. No individual free() calls, no memory leaks
JSON serialization: convert any AST to JSON (lp_ast_to_json)
SQL unparsing: convert AST back to SQL text with 100% round-trip fidelity (lp_ast_to_sql)
Tolerant parsing: continues past syntax errors, collecting all errors and valid statements (lp_parse_tolerant), designed for IDE/linter integration
Structured error diagnostics: error code, human-readable message, and precise source range (line, column, byte offset)
AST mutation: allocate, clone, insert, remove, and replace nodes programmatically
Visitor pattern: depth-first traversal with enter/leave callbacks and flow control (skip children, abort)
C++ compatible: all headers wrapped in extern "C"

Parser Origin

The parser grammar is extracted from the original SQLite parser (parse.y) with minimal modifications. It uses the same Lemon LALR(1) parser generator and a hand-written tokenizer derived from SQLite's own lexer. This ensures complete and accurate coverage of the SQLite SQL dialect.

Performance

liteparser is built for speed. The Lemon-generated LALR(1) parser and hand-written tokenizer draw all memory from an arena, which keeps the number of allocations minimal.

Metric	Value
Parse throughput	~424,000 statements/sec
Parse + round-trip	~99,000 statements/sec
Full test suite (22,694 SQL)	< 0.1 seconds
Memory usage	~200 bytes per node (arena-allocated, no per-node allocator overhead)

Benchmarked on Apple Silicon. Parse time scales linearly with input size.

Memory Model

All AST nodes, strings, and internal structures are allocated from an arena_t. The arena allocates memory in large blocks (default 64 KB) and hands out sub-allocations at bump-pointer speed. When you're done with the AST, a single arena_destroy() call frees everything at once: no tree walking, no reference counting, no GC.

arena_t *arena = arena_create(64 * 1024);  // 64 KB initial block
const char *err = NULL;                    // receives the error message on failure

LpNode *ast = lp_parse("SELECT 1", arena, &err);
// ... use the AST ...

arena_destroy(arena);  // frees everything in one call

This model makes liteparser ideal for request-scoped parsing (parse, process, destroy) and ensures zero memory leaks regardless of the AST complexity.
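
To make the bump-pointer idea concrete, here is a minimal, self-contained sketch of an arena that grabs memory in large blocks and releases all of them in one pass. It is not liteparser's arena.c: the demo_ names, the 8-byte size rounding, and the block-chaining strategy are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative only: demo_* names are hypothetical, not liteparser's API. */
typedef struct DemoBlock {
    struct DemoBlock *next;   /* previously filled block, or NULL */
    size_t used;              /* bytes handed out from data[] so far */
    size_t cap;               /* usable bytes in data[] */
    unsigned char data[];     /* flexible array member holding the payload */
} DemoBlock;

typedef struct {
    DemoBlock *head;          /* block currently being filled */
    size_t block_size;        /* default capacity for new blocks */
    size_t block_count;       /* number of blocks allocated so far */
} DemoArena;

static DemoBlock *demo_block_new(size_t cap, DemoBlock *next) {
    DemoBlock *b = malloc(sizeof *b + cap);
    if (!b) return NULL;
    b->next = next;
    b->used = 0;
    b->cap = cap;
    return b;
}

static DemoArena *demo_arena_create(size_t block_size) {
    assert(block_size > 0);
    DemoArena *a = malloc(sizeof *a);
    if (!a) return NULL;
    a->head = demo_block_new(block_size, NULL);
    if (!a->head) { free(a); return NULL; }
    a->block_size = block_size;
    a->block_count = 1;
    return a;
}

/* Bump allocation: advance `used` within the current block, or chain a
   fresh block when the request does not fit. */
static void *demo_arena_alloc(DemoArena *a, size_t n) {
    n = (n + 7u) & ~(size_t)7u;   /* round the size up to a multiple of 8 */
    if (a->head->used + n > a->head->cap) {
        size_t cap = n > a->block_size ? n : a->block_size;
        DemoBlock *b = demo_block_new(cap, a->head);
        if (!b) return NULL;
        a->head = b;
        a->block_count++;
    }
    void *p = a->head->data + a->head->used;
    a->head->used += n;
    return p;
}

/* One call releases every block; individual allocations are never freed. */
static void demo_arena_destroy(DemoArena *a) {
    for (DemoBlock *b = a->head; b != NULL; ) {
        DemoBlock *next = b->next;
        free(b);
        b = next;
    }
    free(a);
}
```

In this sketch the cost of freeing depends on the number of blocks, not the number of nodes, which is why tearing down even a large AST stays cheap.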

Testing

liteparser is tested against SQL extracted from SQLite's own test suite, one of the most comprehensive SQL test corpora available:

Metric	Value
Test files scanned	1,227 SQLite `.test`, `.tcl`, and `.sql` files
SQL statements extracted	22,694 unique statements
Round-trip success	100% (22,186 / 22,186 fully-parsed statements)
Crashes	0 on any input

Every statement that parses successfully also round-trips perfectly: parsing, unparsing, and reparsing produce an identical AST.

Additionally, a hand-written test suite of 485 unit tests covers individual grammar rules, edge cases, error recovery, and the mutation API.

Fuzz Testing

liteparser includes an extensive fuzz testing harness (fuzz/fuzz_parse.c) that exercises all parser modes, JSON serialization, SQL unparsing, AST walking, node cloning, equality checking, and roundtrip verification (parse → unparse → re-parse → equality).

The fuzzer uses 126 seed SQL templates covering every grammar production, with a built-in mutation engine that applies SQL-aware mutations: keyword injection, expression splicing, byte flipping, chunk duplication, and cross-seed recombination.
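
As a rough illustration of three of the mutations named above (byte flipping, chunk duplication, and cross-seed recombination), the sketch below implements each one over plain character buffers. It is not taken from fuzz/fuzz_parse.c: every demo_ helper, the xorshift PRNG, and the buffer conventions are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only: these helpers are hypothetical and are not taken
   from fuzz/fuzz_parse.c. */

/* xorshift32: tiny deterministic PRNG so a run is reproducible from its
   seed. The state must be non-zero. */
static uint32_t demo_rand(uint32_t *state) {
    uint32_t x = *state;
    assert(x != 0);
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Byte flipping: flip one random bit of one random byte, in place. */
static void demo_flip_bit(char *buf, size_t len, uint32_t *rng) {
    if (len == 0) return;
    size_t i = demo_rand(rng) % len;
    unsigned bit = demo_rand(rng) % 8;
    buf[i] = (char)((unsigned char)buf[i] ^ (1u << bit));
}

/* Chunk duplication: repeat buf[start, start+n) immediately after itself.
   Returns the new length, or len unchanged if the result would not fit. */
static size_t demo_dup_chunk(char *buf, size_t len, size_t cap,
                             size_t start, size_t n) {
    if (start > len || n > len - start || len + n > cap) return len;
    /* Shift everything from `start` onward right by n bytes; the first n
       bytes at `start` are left in place, so the chunk now appears twice. */
    memmove(buf + start + n, buf + start, len - start);
    return len + n;
}

/* Cross-seed recombination: first cut_a bytes of seed a followed by seed b
   from cut_b onward. Writes a NUL-terminated string to out and returns its
   length, or 0 if the cuts are out of range or the result does not fit. */
static size_t demo_splice(const char *a, size_t cut_a,
                          const char *b, size_t cut_b,
                          char *out, size_t cap) {
    size_t la = strlen(a), lb = strlen(b);
    if (cut_a > la || cut_b > lb) return 0;
    size_t n = cut_a + (lb - cut_b);
    if (n + 1 > cap) return 0;
    memcpy(out, a, cut_a);
    memcpy(out + cut_a, b + cut_b, lb - cut_b);
    out[n] = '\0';
    return n;
}
```

For instance, splicing the first 9 bytes of "SELECT a FROM t" onto "DELETE FROM u WHERE x" from byte 7 onward yields "SELECT a FROM u WHERE x", a new input that neither seed contains.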

make fuzz              # Standalone fuzzer, 1M iterations
make fuzz-asan         # AddressSanitizer + UBSan (catches memory errors)
make fuzz-libfuzzer    # clang libFuzzer with coverage-guided fuzzing

The standalone fuzzer accepts options:

./fuzz/fuzz_parse --iterations 5000000 --seed 42 --max-len 8192

Metric	Result
Iterations tested	1,000,000+
Crashes	0
ASan/UBSan violations	0
Successful parses	~10% of mutated inputs
Tolerant-mode recoveries	~242k statements from ~1.6M errors

Building

Requires a C compiler (tested with clang and gcc). No other dependencies.

make              # Build the sqlparse CLI tool (release, -O2)
make shared       # Build shared library (.dylib / .so)
make wasm         # Build WebAssembly module (requires emscripten)
make debug        # Debug build (-g -O0)
make test         # Build and run the unit test suite
make test-suite   # Run the full SQLite-extracted test suite
make fuzz         # Run fuzz testing (1M iterations)
make fuzz-asan    # Fuzz with AddressSanitizer + UBSan
make clean        # Remove all build artifacts

To embed in your project, compile the source files in src/ or link against libliteparser.a:

make libliteparser.a
cc -Isrc -o myapp myapp.c -L. -lliteparser

Source Files

The library consists of the following files:

File	Description
`src/liteparser.h`	Public API header (include this)
`src/liteparser_internal.h`	Internal types and builder prototypes
`src/arena.h`	Arena allocator API
`src/parse.h`	Generated parser tokens and declarations
`src/arena.c`	Arena allocator
`src/liteparser.c`	AST builder, JSON serializer, visitor, mutation API
`src/lp_tokenize.c`	Hand-written lexer
`src/lp_unparse.c`	AST-to-SQL unparsing
`src/parse.c`	Lemon-generated LALR parser

Note: parse.c and parse.h are pre-generated from lp_parse.y by the Lemon parser generator and checked into the repo. You don't need the sqlite-master/ directory unless you modify the grammar.

CLI Usage

# Parse SQL to JSON AST
./sqlparse "SELECT * FROM users WHERE age > 18"

# Read from stdin
echo "SELECT 1; SELECT 2" | ./sqlparse

# Tolerant mode (continues past errors)
./sqlparse --tolerant "SELECT 1; GARBAGE; SELECT 2"

# Round-trip: parse and output reconstructed SQL
./sqlparse --unparse "SELECT a, b FROM t WHERE x > 5"

# Compact JSON (no indentation)
./sqlparse --compact "SELECT 1"

# Combine flags
./sqlparse --tolerant --compact "SELECT 1; BAD; SELECT 2"

Quick Start

#include <stdio.h>
#include "liteparser.h"

int main(void) {
    arena_t *arena = arena_create(64 * 1024);
    const char *err = NULL;

    // Parse a single statement
    LpNode *node = lp_parse("SELECT * FROM users WHERE age > 18", arena, &err);
    if (!node) {
        fprintf(stderr, "Error: %s\n", err);
        arena_destroy(arena);  // release the arena on the error path too
        return 1;
    }

    // Serialize to JSON
    char *json = lp_ast_to_json(node, arena, 1);
    printf("%s\n", json);

    // Convert back to SQL
    char *sql = lp_ast_to_sql(node, arena);
    printf("SQL: %s\n", sql);

    arena_destroy(arena);
    return 0;
}

Output:

{
  "kind": "STMT_SELECT",
  "distinct": false,
  "result_columns": [
    {
      "kind": "EXPR_STAR"
    }
  ],
  "from": {
    "kind": "FROM_TABLE",
    "pos": {"line": 1, "col": 15, "offset": 14},
    "name": "users"
  },
  "where": {
    "kind": "EXPR_BINARY_OP",
    "pos": {"line": 1, "col": 27, "offset": 26},
    "op": ">",
    "left": {
      "kind": "EXPR_COLUMN_REF",
      "pos": {"line": 1, "col": 27, "offset": 26},
      "column": "age"
    },
    "right": {
      "kind": "EXPR_LITERAL_INT",
      "pos": {"line": 1, "col": 33, "offset": 32},
      "value": "18"
    }
  }
}
SQL: SELECT * FROM users WHERE age > 18;

WebAssembly (WASM)

liteparser compiles to WebAssembly for use in browsers, Node.js, and any JavaScript runtime. See WASM.md for full documentation, JavaScript API reference, and usage examples.

make wasm          # Produces wasm/dist/liteparser.mjs

API Reference

See API.md for the complete public API reference with descriptions and examples for every function.

AST Structure

Every node is an LpNode tagged union keyed by LpNodeKind:

Statements (LP_STMT_*): SELECT, INSERT, UPDATE, DELETE, CREATE TABLE, etc.
Expressions (LP_EXPR_*): literals, column refs, binary/unary ops, functions, subqueries, CASE, etc.
Clauses: JOIN, ORDER BY, LIMIT, CTE, window definitions, column/table constraints, foreign keys, etc.

Child nodes are LpNode* pointers or LpNodeList (growable array). See liteparser.h for complete type definitions.
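
The sketch below shows, on a deliberately tiny hypothetical node type, how a tagged union can be walked depth-first with enter/leave callbacks that may skip a subtree or abort the walk. The Demo* types and demo_ functions are assumptions for illustration only; the real LpNode, LpNodeKind, and visitor API are defined in liteparser.h and documented in API.md.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: Demo* types are hypothetical stand-ins, far smaller
   than the real LpNode in liteparser.h. */
typedef enum { DEMO_EXPR_LITERAL, DEMO_EXPR_COLUMN, DEMO_EXPR_BINARY } DemoKind;

typedef struct DemoNode {
    DemoKind kind;                    /* tag selecting the active union member */
    union {
        const char *text;             /* literal value or column name */
        struct {
            const char *op;
            struct DemoNode *left, *right;
        } binary;
    } u;
} DemoNode;

typedef enum { DEMO_CONTINUE, DEMO_SKIP_CHILDREN, DEMO_ABORT } DemoAction;

typedef struct {
    DemoAction (*enter)(const DemoNode *n, void *ctx);  /* called before children */
    void (*leave)(const DemoNode *n, void *ctx);        /* called after children */
    void *ctx;                                          /* user data for callbacks */
} DemoVisitor;

/* Depth-first walk. Returns 0 if a callback aborted the walk, 1 otherwise. */
static int demo_walk(const DemoNode *n, const DemoVisitor *v) {
    if (n == NULL) return 1;
    DemoAction act = v->enter ? v->enter(n, v->ctx) : DEMO_CONTINUE;
    if (act == DEMO_ABORT) return 0;
    if (act == DEMO_CONTINUE && n->kind == DEMO_EXPR_BINARY) {
        if (!demo_walk(n->u.binary.left, v)) return 0;
        if (!demo_walk(n->u.binary.right, v)) return 0;
    }
    if (v->leave) v->leave(n, v->ctx);
    return 1;
}

/* Example enter callback: count column references into an int counter. */
static DemoAction demo_count_columns(const DemoNode *n, void *ctx) {
    if (n->kind == DEMO_EXPR_COLUMN) ++*(int *)ctx;
    return DEMO_CONTINUE;
}

/* Example enter callback: stop the whole walk at the first literal. */
static DemoAction demo_abort_on_literal(const DemoNode *n, void *ctx) {
    (void)ctx;
    return n->kind == DEMO_EXPR_LITERAL ? DEMO_ABORT : DEMO_CONTINUE;
}
```

Returning DEMO_SKIP_CHILDREN from an enter callback still triggers the matching leave callback in this sketch, while DEMO_ABORT unwinds immediately without calling any further callbacks.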

Error Diagnostics

In tolerant mode, each LpError includes:

Field	Type	Description
`code`	`LpErrorCode`	`LP_ERR_SYNTAX`, `LP_ERR_ILLEGAL_TOKEN`, `LP_ERR_INCOMPLETE`, `LP_ERR_STACK_OVERFLOW`
`message`	`const char*`	Human-readable error message
`pos`	`LpSrcPos`	Start of error (line, col, offset)
`end_pos`	`LpSrcPos`	End of error range (exclusive)
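
A linter or editor plugin would typically turn these fields into a "line:col: message" diagnostic and highlight the half-open range between pos and end_pos. The sketch below shows one way to do that. The Demo* structs only mirror the field names in the table above; their layout, the integer types, and the demo_ helpers are assumptions, and the 1-based line/column with 0-based offset convention is inferred from the Quick Start output (col 15, offset 14).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative only: Demo* types mirror the documented field names, but
   their exact layout in liteparser.h is an assumption. */
typedef struct { int line; int col; int offset; } DemoSrcPos;

typedef enum {
    DEMO_ERR_SYNTAX, DEMO_ERR_ILLEGAL_TOKEN,
    DEMO_ERR_INCOMPLETE, DEMO_ERR_STACK_OVERFLOW
} DemoErrorCode;

typedef struct {
    DemoErrorCode code;
    const char *message;
    DemoSrcPos pos;       /* start of the error range */
    DemoSrcPos end_pos;   /* end of the error range, exclusive */
} DemoError;

/* Convert a 0-based byte offset into a 1-based line/column position. */
static DemoSrcPos demo_pos_from_offset(const char *src, int offset) {
    assert(offset >= 0);
    DemoSrcPos p = { 1, 1, offset };
    for (int i = 0; i < offset && src[i] != '\0'; i++) {
        if (src[i] == '\n') { p.line++; p.col = 1; }
        else p.col++;
    }
    return p;
}

/* Copy the offending text [pos.offset, end_pos.offset) into out,
   truncating if it does not fit. */
static void demo_error_text(const char *src, const DemoError *e,
                            char *out, size_t cap) {
    assert(cap > 0 && e->end_pos.offset >= e->pos.offset);
    size_t n = (size_t)(e->end_pos.offset - e->pos.offset);
    if (n >= cap) n = cap - 1;
    memcpy(out, src + e->pos.offset, n);
    out[n] = '\0';
}

/* Render "line:col: message", the style many compilers and linters use. */
static int demo_format_error(const DemoError *e, char *out, size_t cap) {
    return snprintf(out, cap, "%d:%d: %s", e->pos.line, e->pos.col, e->message);
}
```

Because end_pos is exclusive, the length of the highlighted span is simply end_pos.offset minus pos.offset, with no off-by-one adjustment.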

JSON Output

See JSON_SCHEMA.md for the complete JSON schema documentation.

License

MIT License. See LICENSE file.

Part of the SQLite AI Ecosystem

This project is part of the SQLite AI ecosystem, a collection of extensions that bring modern AI capabilities to the world’s most widely deployed database. The goal is to make SQLite the default data and inference engine for Edge AI applications.

Other projects in the ecosystem include:

SQLite-AI — On-device inference and embedding generation directly inside SQLite.
SQLite-Memory — Markdown-based AI agent memory with semantic search.
SQLite-Vector — Ultra-efficient vector search for embeddings stored as BLOBs in standard SQLite tables.
SQLite-Sync — Local-first CRDT-based synchronization for seamless, conflict-free data sync and real-time collaboration across devices.
SQLite-Agent — Run autonomous AI agents directly from within SQLite databases.
SQLite-MCP — Connect SQLite databases to MCP servers and invoke their tools.
SQLite-JS — Create custom SQLite functions using JavaScript.
Liteparser — A highly efficient and fully compliant SQLite SQL parser.

Learn more at SQLite AI.
