SQL Support #1444

@norberttech

Flow, like other data processing frameworks, operates on a DataFrame abstraction (a tabular representation of the dataset), so it can naturally be driven through SQL.

The goal is to create a SQL parser that would let us translate SQL queries into Flow DataFrame pipelines.

We can start from a simpler dialect like SQLite. Most likely we won't be able to support all the advanced features of PostgreSQL or MySQL, but that's not the goal.
Our goal is to parse basic SQL syntax and be able to add custom functions, for example:

```sql
SELECT
  id, name, anything
FROM
  parquet_file('path_to_file.parquet')
WHERE
  active = true
ORDER BY id DESC
```
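
As a rough illustration of the translation target, the query above could map to a pipeline along these lines. This is only a sketch, assuming the flow-php/etl and flow-php/etl-adapter-parquet packages are installed and that the DSL functions used here (`data_frame`, `ref`, `lit`, `to_output`, `from_parquet`) are available as in current Flow releases. The file path is the placeholder from the query, and the clause-to-method mapping is hypothetical, not a decided design.

```php
<?php

declare(strict_types=1);

use function Flow\ETL\Adapter\Parquet\from_parquet;
use function Flow\ETL\DSL\{data_frame, lit, ref, to_output};

// Assumes a Composer project with Flow installed.
require __DIR__ . '/vendor/autoload.php';

data_frame()
    // FROM parquet_file('path_to_file.parquet')
    ->read(from_parquet('path_to_file.parquet'))
    // WHERE active = true (applied before select, because "active" is not projected)
    ->filter(ref('active')->equals(lit(true)))
    // SELECT id, name, anything
    ->select('id', 'name', 'anything')
    // ORDER BY id DESC
    ->sortBy(ref('id')->desc())
    // Print the result; a real translator would let the caller choose the sink.
    ->write(to_output())
    ->run();
```

Note that the pipeline order differs from the textual order of the SQL clauses: the filter has to run before the projection, which is one of the things a translator would need to handle.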

The biggest complexity comes from the lack of good SQL parser options in PHP, but there are a few options:

  1. Antlr

We can use Antlr4 with the SQLite grammar (it's compatible with the PHP target).

This way we could create a standalone library, flow-php/sql, that would be a cleaner abstraction over the code auto-generated by Antlr4.

  2. SQLFTW

There is one very promising SQL parser written in PHP: SQLFTW.

The main concerns are that it doesn't seem to be actively developed and that it brings a hard dependency on dogma/dogma.

We should try to reach out to the maintainers to see if they would be interested in a PR that drops that dependency (dogma is just another attempt to build a standard PHP library in userland).
