Description
Flow, like other data processing frameworks built on the DataFrame abstraction (a tabular representation of a dataset), lends itself naturally to SQL.
The goal is to create a SQL parser that would let us translate SQL queries into Flow DataFrame pipelines.
We can start with simpler dialects like SQLite. Most likely we won't be able to support all of the advanced features of PostgreSQL or MySQL, but that's not the goal.
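The translation step can be reasoned about independently of whichever parser is chosen. Below is a minimal sketch, written in Python purely for illustration (Flow itself is PHP), assuming a hypothetical `Query` AST that covers the basic shape of the example query in this issue. It does not use Flow's real DataFrame API: `Query`, `to_pipeline`, `execute` and the `functions` registry (where a custom table function such as `parquet_file` would be registered) are illustrative names, and rows are plain dicts standing in for a DataFrame.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

# Hypothetical types: NOT Flow's API, just stand-ins for a DataFrame pipeline.
Row = dict[str, Any]
Step = tuple[str, Callable[[list[Row]], list[Row]]]


@dataclass
class Query:
    """Hypothetical AST for: SELECT cols FROM fn('arg') [WHERE col = value] [ORDER BY col [DESC]]."""
    columns: list[str]
    source_fn: str
    source_arg: str
    where: Optional[tuple[str, Any]] = None
    order_by: Optional[tuple[str, bool]] = None  # (column, descending)


def to_pipeline(query: Query, functions: dict[str, Callable[[str], list[Row]]]) -> list[Step]:
    """Translate the AST into ordered steps: read -> filter -> sort -> select."""
    source = functions[query.source_fn]  # custom table function, e.g. parquet_file
    steps: list[Step] = [("read", lambda _rows: source(query.source_arg))]

    if query.where is not None:
        where_col, where_value = query.where
        steps.append(("filter", lambda rows: [r for r in rows if r[where_col] == where_value]))

    if query.order_by is not None:
        order_col, descending = query.order_by
        steps.append(("sort", lambda rows: sorted(rows, key=lambda r: r[order_col], reverse=descending)))

    # Projection runs last so WHERE and ORDER BY can still see non-selected columns.
    columns = list(query.columns)
    steps.append(("select", lambda rows: [{c: r[c] for c in columns} for r in rows]))
    return steps


def execute(pipeline: list[Step]) -> list[Row]:
    """Feed the output of each step into the next one."""
    rows: list[Row] = []
    for _name, step in pipeline:
        rows = step(rows)
    return rows


def fake_parquet_file(path: str) -> list[Row]:
    # Hypothetical stand-in for a real parquet reader; ignores `path`.
    return [
        {"id": 1, "name": "a", "anything": "x", "active": True},
        {"id": 2, "name": "b", "anything": "y", "active": False},
        {"id": 3, "name": "c", "anything": "z", "active": True},
    ]


# AST for the example query from this issue.
query = Query(
    columns=["id", "name", "anything"],
    source_fn="parquet_file",
    source_arg="path_to_file.parquet",
    where=("active", True),
    order_by=("id", True),
)
pipeline = to_pipeline(query, {"parquet_file": fake_parquet_file})
result = execute(pipeline)
```

Here `result` holds the active rows (ids 3 and 1, in that order) with only the selected columns. A real implementation would map each step onto the corresponding Flow DataFrame operation instead of a Python lambda.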
Our goal is to parse basic SQL syntax and to be able to add custom functions, for example:
SELECT
id, name, anything
FROM
parquet_file('path_to_file.parquet')
WHERE
active = true
ORDER BY id DESC

The biggest complexity comes from the lack of good SQL parser options in PHP, but there are a few candidates:
- Antlr
We can use Antlr4 with the SQLite grammar (it's compatible with the PHP target).
This way we could create a standalone library, flow-php/sql, that would provide a cleaner abstraction over the code auto-generated by Antlr4.
- SQLFTW
There is one very promising SQL parser written in PHP: SQLFTW.
The main concerns are that it doesn't seem to be actively developed and that it brings a hard dependency on dogma/dogma.
We should reach out to see whether they would be interested in a PR that drops that dependency, since dogma is just another attempt at building a standard PHP library in userland.
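Whichever backend wins, the idea of a standalone flow-php/sql library as a cleaner abstraction can be sketched as a small parser interface that hides how the AST is produced. The following is a hypothetical sketch in Python, for illustration only: `SqlParser`, `RegexSelectParser`, `SqlSyntaxError` and the dict-shaped AST are assumed names, and the backend is a deliberately naive regular-expression parser swapped in for illustration, not Antlr4 or SQLFTW. It accepts only the basic shape of the example query from this issue and rejects everything else.

```python
import re
from typing import Any, Protocol


class SqlSyntaxError(ValueError):
    """Raised when a query falls outside the supported subset."""


class SqlParser(Protocol):
    """Hypothetical seam: any backend (Antlr-generated, SQLFTW, hand-written) returns the same AST."""

    def parse(self, sql: str) -> dict[str, Any]: ...


# Deliberately naive: SELECT cols FROM fn('arg') [WHERE col = literal] [ORDER BY col [ASC|DESC]].
_SELECT = re.compile(
    r"""^\s*SELECT\s+(?P<columns>.+?)
        \s+FROM\s+(?P<fn>\w+)\(\s*'(?P<arg>[^']*)'\s*\)
        (?:\s+WHERE\s+(?P<where_col>\w+)\s*=\s*(?P<where_val>\w+|'[^']*'))?
        (?:\s+ORDER\s+BY\s+(?P<order_col>\w+)(?:\s+(?P<direction>ASC|DESC))?)?
        \s*;?\s*$""",
    re.IGNORECASE | re.VERBOSE | re.DOTALL,
)


def _literal(token: str) -> Any:
    """Convert a WHERE literal: true/false, 'string' or a non-negative integer."""
    lowered = token.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    if token.startswith("'"):
        return token[1:-1]
    if token.isdigit():
        return int(token)
    raise SqlSyntaxError(f"unsupported literal: {token!r}")


class RegexSelectParser:
    """Toy SqlParser implementation for the basic subset only."""

    def parse(self, sql: str) -> dict[str, Any]:
        match = _SELECT.match(sql)
        if match is None:
            raise SqlSyntaxError("query is outside the supported subset")
        where = None
        if match["where_col"] is not None:
            where = (match["where_col"], _literal(match["where_val"]))
        order_by = None
        if match["order_col"] is not None:
            descending = (match["direction"] or "ASC").upper() == "DESC"
            order_by = (match["order_col"], descending)
        return {
            "columns": [c.strip() for c in match["columns"].split(",")],
            "source": (match["fn"], match["arg"]),
            "where": where,
            "order_by": order_by,
        }


# Usage with the example query from this issue.
parser: SqlParser = RegexSelectParser()
ast = parser.parse("""SELECT
    id, name, anything
FROM
    parquet_file('path_to_file.parquet')
WHERE
    active = true
ORDER BY id DESC""")
```

Because callers depend only on `parse()`, an Antlr4-generated parser or SQLFTW could later be wrapped behind the same interface without changing the code that turns the AST into a pipeline.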