
GH-46522: [C++][FlightRPC] Add Arrow Flight SQL ODBC driver#40939

Merged
lidavidm merged 19 commits into apache:main from jbonofre:FLIGHT_SQL_ODBC
May 21, 2025
Conversation

@jbonofre
Member

@jbonofre jbonofre commented Apr 2, 2024

Rationale for this change

An ODBC driver implements the Open Database Connectivity (ODBC) interface, which allows applications to access data in DBMS-like environments using SQL.
Since Arrow Flight SQL already provides JDBC and ADBC drivers, it can similarly provide an ODBC driver.

What changes are included in this PR?

This PR adds an ODBC driver implementation.

Are these changes tested?

This ODBC driver comes from a production system (SGA) that has been tested and run for a while.

Are there any user-facing changes?

No changes to existing behavior, but a new user option to access Arrow Flight SQL via ODBC.

@jbonofre jbonofre requested a review from lidavidm as a code owner April 2, 2024 05:59
@jbonofre jbonofre marked this pull request as draft April 2, 2024 05:59
@github-actions

github-actions bot commented Apr 2, 2024

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}


@jbonofre
Member Author

jbonofre commented Apr 2, 2024

@ianmcook @lidavidm @assignUser as discussed during the community meetings, here's the ODBC driver PR draft.

This PR includes:

  • source files with legal fixes (I'm checking the whereami source dual license/MIT, possibly alternatives)
  • convenient build files (like a Brewfile)

I'm still working on:

  • finalize Arrow update (the original code was using Arrow 9.x)
  • finalize the build and clean up the CMakeLists.txt files

I will also create a GH Issue to track this addition.

Member

@jduo jduo left a comment

Thanks for this @jbonofre !

Some of the code in odbc_impl should be updated to use the same naming conventions as the arrow project (eg ODBCConnection -> odbc_connection).

This covers the flightsql-odbc repo, but there is additional code needed for the ODBC entry points to build a functional driver. That was in warpdrive, but that code isn't Apache-compatible.

Need to add the license for whereami to the Arrow project itself.

Are the separate brewfile and vcpkg.json necessary or can this utilize the ones already at the top-level?

Should the top-level CMakeLists now include the driver build? Or is that on hold as part of the donation process?

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Apr 4, 2024

@assignUser
Member

finalize Arrow update (the original code was using Arrow 9.x)

I fear I can't really help on that front but once that's done I'd be happy to continue working on the CMake!

@jbonofre
Member Author

jbonofre commented Apr 5, 2024

Hi @jduo !

Some of the code in odbc_impl should be updated to use the same naming conventions as the arrow project (eg ODBCConnection -> odbc_connection).

Yes, I'm doing that (that's why the PR is still a draft).

This covers the flightsql-odbc repo, but there is additional code needed for the ODBC entry points to build a functional driver. That was in warpdrive, but that code isn't Apache-compatible.

Yes, locally I fixed/updated it, but I'm looking for an alternative.

Need to add the license for whereami to the Arrow project itself.

I'm checking if we can find an alternative; otherwise I will update the NOTICE file.

Are the separate brewfile and vcpkg.json necessary or can this utilize the ones already at the top-level?

Yes, that's the plan. I'm doing it in two steps: fixing the build, then integrating into the Arrow ecosystem.

Should the top-level CMakeLists now include the driver build? Or is that on hold as part of the donation process?

Agreed, same as for the Brewfile/vcpkg.json: I'm doing that in two steps.

I will push new changes to this PR and remove the draft state as soon as it's ready for review (build OK, etc.).

@jbonofre
Member Author

jbonofre commented Apr 5, 2024

finalize Arrow update (the original code was using Arrow 9.x)

I fear I can't really help on that front but once that's done I'd be happy to continue working on the CMake!

No worries, I'm working on it :) I will keep you posted when it's ready for review (when I remove the draft flag).

@jduo
Member

jduo commented Apr 10, 2024

This covers the flightsql-odbc repo, but there is additional code needed for the ODBC entry points to build a functional driver. That was in warpdrive, but that code isn't Apache-compatible.

Yes, locally I fixed/updated it, but I'm looking for alternative.

There was another group that did some work to build a full driver from flightsql-odbc. I haven't seen it myself, but they've mentioned it here:
#30622 (comment)

@lidavidm
Member

lidavidm commented Jul 2, 2024

Following up here - do you want any help?

@jduo
Member

jduo commented Jul 16, 2024

@jbonofre checking in on this. Were you still continuing work on this? Did you need help?
Thanks

@jbonofre
Member Author

@jduo yes, I'm back on it (sorry, I was busy on Iceberg and ASF stuff). Let me do a new rebase/build/update and I will ping you for review/help.

Thanks! And sorry again for the delay :/

@jbonofre
Member Author

jbonofre commented Sep 3, 2024

@lidavidm @jduo hey guys. I'm very sorry to have been quiet for so long. Between vacation and ASF tasks, I was not able to be active enough on Arrow. I'm now back and resuming my work on Arrow, including ODBC.

@lidavidm
Member

lidavidm commented Sep 3, 2024

No worries! Glad to see you back around :)

@rpourzand

Hey @jbonofre we're excited about the AFS ODBC driver! We want to start using it as soon as we can for a project we're working on, and we were wondering whether you have a sense of when it might be ready to be in someone's hands for development. We would be happy to be early adopters :)

@jduo
Member

jduo commented Sep 4, 2024

Thanks @jbonofre . We're around if you need assistance moving this forward.

@jduo
Member

jduo commented Oct 1, 2024

Hi @jbonofre , just checking in on if we can help to move this forward.

@jbonofre
Member Author

jbonofre commented Oct 2, 2024

Hey @jduo
I started the rebase. I will push it.
Would you have time for a quick sync?
Thanks!

@aiguofer
Contributor

aiguofer commented Oct 2, 2024

I'm curious, does this version of the driver allow passing arbitrary parameters? I tried using the Dremio ODBC driver, but it doesn't seem to be able to set arbitrary parameters. I have the following in a custom PowerBI connector:

        // This record contains all of the connection string properties we
        // will set for this ODBC driver. The 'Driver' field is required for
        // all ODBC connections. Other properties will vary between ODBC drivers,
        // but generally take Server and Database properties. Note that
        // credential related properties will be set separately.
        ConnectionString = [
            Driver = "Arrow Flight SQL ODBC Driver",
            host = server,
            port = 443,
            environmentId = "200532"
        ],

But I get the following error: ODBC: ERROR [HY000] [Apache Arrow][Flight SQL] (100) Flight returned invalid argument error, with message: environmentId is a required connection parameter

For our use-case we need to be able to pass various parameters as extra headers, which is supported by both ADBC and JDBC.

@jduo
Member

jduo commented Oct 2, 2024

I'm curious, does this version of the driver allow passing arbitrary parameters? [...] But I get the following error: ODBC: ERROR [HY000] [Apache Arrow][Flight SQL] (100) Flight returned invalid argument error, with message: environmentId is a required connection parameter

For our use-case we need to be able to pass various parameters as extra headers, which is supported by both ADBC and JDBC.

Last I tried, it supported this. I believe it lower-cases headers though (the C++ grpc client itself requires this) -- is your server looking for environmentId case-insensitively?

@jduo
Member

jduo commented Oct 2, 2024

Hey @jduo I started the rebase. I will push it. Would you have time for a quick sync? Thanks!

Hi @jbonofre , yes we can have a chat. Will send meeting info offline.

@aiguofer
Contributor

aiguofer commented Oct 2, 2024

Last I tried, it supported this. I believe it lower-cases headers though (the C++ grpc client itself requires this) -- is your server looking for environmentId case-insensitively?

Yeah, it's case-insensitive. I also tried setting environmentid in all lower-case and still get the same error.

@aiguofer
Contributor

aiguofer commented Oct 2, 2024

Interesting, I do see:

TEST(PopulateCallOptionsTest, GenericOption) {
  FlightSqlConnection connection(odbcabstraction::V_3);
  connection.SetClosed(false);

  Connection::ConnPropertyMap properties;
  properties["Foo"] = "Bar";
  auto options = connection.PopulateCallOptions(properties);
  auto headers = options.headers;
  ASSERT_EQ(1, headers.size());

  // Header name must be lower-case because gRPC will crash if it is not lower-case.
  ASSERT_EQ("foo", headers[0].first);

  // Header value should preserve case.
  ASSERT_EQ("Bar", headers[0].second);
}

Maybe there's a bug with call options not getting set correctly for the HANDSHAKE request? I remember that being an issue with the JDBC driver: #33946

@CurtHagenlocher
Contributor

Last I tried, it supported this. I believe it lower-cases headers though (the C++ grpc client itself requires this)

This is actually part of the HTTP/2 spec.
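To illustrate the constraint being discussed: RFC 7540 (HTTP/2) section 8.1.2 requires header field *names* to be lowercase, while header *values* keep their case. A minimal sketch of the normalization a client could apply before sending (the function name is illustrative, not from the driver):

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Normalize an ASCII header name to lowercase, as HTTP/2 requires
// (RFC 7540 section 8.1.2). Header values are left untouched.
std::string LowerHeaderName(std::string name) {
  std::transform(name.begin(), name.end(), name.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return name;
}
```

This matches the behavior asserted in the driver's `PopulateCallOptionsTest` above: `"Foo"` becomes `"foo"` while the value `"Bar"` is preserved.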

@lidavidm lidavidm merged commit ed36107 into apache:main May 21, 2025
40 of 41 checks passed
@lidavidm lidavidm removed the awaiting changes Awaiting changes label May 21, 2025
Member

@lidavidm lidavidm left a comment

(Review in progress...)

Also to double check, none of these headers get installed, right? If they do we should block them from getting installed

Longer term...I wonder if the ODBC driver should be in the arrow/ tree, or if we should consider putting it at the top level (next to arrow/, parquet/), or even in its own repo potentially.

Comment on lines +28 to +29
namespace driver {
namespace flight_sql {
Member

Let's fix the namespace: this should be something like arrow::flight::sql::odbc
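For illustration, the suggested layout can be written as a C++17 nested namespace definition (the type inside is a hypothetical placeholder, not from the PR):

```cpp
#include <string>

// C++17 nested namespace definition for the suggested layout.
namespace arrow::flight::sql::odbc {

// Hypothetical placeholder; the real driver classes would live here.
struct DriverInfo {
  std::string name = "Arrow Flight SQL ODBC";
};

}  // namespace arrow::flight::sql::odbc
```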


namespace {
template <typename T>
int64_t convertDate(typename T::value_type value) {
Member

Please update functions to Arrow naming conventions (ConvertDate, not convertDate)

Member

By codebase convention this should be called "api.h".

Member

In general, we don't have a separate "include" subdirectory. Headers go next to source files.

Member

What is the purpose of this application? Should it be converted to tests instead?

odbcabstraction::Diagnostics& diagnostic) {
auto* buffer = static_cast<TIME_STRUCT*>(binding->buffer);

tm time{};
Member

In general, why are we using this instead of C++ stdlib chrono/date?


using odbcabstraction::GetTimeForSecondsSinceEpoch;

TEST(TEST_TIMESTAMP, TIMESTAMP_WITH_MILLI) {
Member

FWIW, the test suite/case name are effectively class names, so you'd normally expect to see TestTimestamp, TimestampWithMilli

namespace odbcabstraction {

/// \brief Supported ODBC versions.
enum OdbcVersion { V_2, V_3, V_4 };
Member

Can we use enum class?
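A sketch of the requested change: with a scoped enum the enumerators no longer leak into the enclosing namespace, and conversions to integers must be explicit.

```cpp
// Scoped version of the OdbcVersion enum from the snippet above.
enum class OdbcVersion { V_2, V_3, V_4 };

// Unlike a plain enum, an explicit cast is needed for the integer value.
constexpr int ToInt(OdbcVersion v) { return static_cast<int>(v); }
```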

constexpr ssize_t MICRO_TO_SECONDS_DIVISOR = 1000000;
constexpr ssize_t NANO_TO_SECONDS_DIVISOR = 1000000000;

typedef struct tagDATE_STRUCT {
Member

Why are these typedef struct? Just use normal struct?
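In C++ the C-style `typedef struct tagX { ... } X;` is redundant; the struct declaration already introduces the type name. A sketch of the equivalent plain declaration (field types assumed from the ODBC `DATE_STRUCT` layout):

```cpp
#include <cstdint>

// Plain C++ struct; no typedef needed to use the name directly.
struct DATE_STRUCT {
  int16_t year;
  uint16_t month;
  uint16_t day;
};
```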

};

struct MetadataSettings {
boost::optional<int32_t> string_column_length_{boost::none};
Member

Use stdlib

@github-actions github-actions bot added the awaiting changes Awaiting changes label May 21, 2025
Member

@lidavidm lidavidm left a comment

(in progress...)

namespace {

#if defined _WIN32 || defined _WIN64
std::string utf8_to_clocale(const char* utf8str, int len) {
Member

Naming conventions: Utf8ToCLocale, utf8_str

std::string utf8_to_clocale(const char* utf8str, int len) {
thread_local boost::locale::generator g;
g.locale_cache_enabled(true);
std::locale loc = g(boost::locale::util::get_system_locale());
Member

Aren't locales thread safe? Can't we initialize this once in a static instead of using a thread local?
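A sketch of the suggestion, assuming the generated locale is only read after construction: a function-local static is initialized exactly once and thread-safely under C++11 "magic statics", instead of once per thread with `thread_local`.

```cpp
#include <locale>

// One-time, thread-safe initialization (guaranteed for function-local
// statics since C++11). Stand-in for the boost::locale generator call;
// assumes the locale object is immutable after creation.
const std::locale& CachedSystemLocale() {
  static const std::locale loc{};
  return loc;
}
```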

namespace flight_sql {
namespace config {

static const char DEFAULT_DSN[] = "Apache Arrow Flight SQL";
Member

Is there a reason why we sometimes use static const char[] and other times use constexpr std::string_view? IMO the latter is preferable unless there is a reason
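A sketch of the preferred form: `constexpr std::string_view` carries the length with the data and supports compile-time comparison, unlike a `static const char[]`.

```cpp
#include <string_view>

// Replaces: static const char DEFAULT_DSN[] = "Apache Arrow Flight SQL";
constexpr std::string_view kDefaultDsn = "Apache Arrow Flight SQL";

// The length is available at compile time.
static_assert(kDefaultDsn.size() == 23);
```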

Comment on lines +100 to +106
Configuration::Configuration() {
// No-op.
}

Configuration::~Configuration() {
// No-op.
}
Member

If these are no-ops, use =default
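The defaulted form of the snippet above; `= default` lets the compiler generate the no-op members and, for this empty sketch, keeps the type trivially destructible (the real Configuration class has members, so that last property may not hold there):

```cpp
#include <type_traits>

class Configuration {
 public:
  Configuration() = default;   // replaces the empty-bodied constructor
  ~Configuration() = default;  // replaces the empty-bodied destructor
};

static_assert(std::is_trivially_destructible<Configuration>::value);
```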

Comment on lines +117 to +139
Set(FlightSqlConnection::DSN, dsn);
Set(FlightSqlConnection::HOST, ReadDsnString(dsn, FlightSqlConnection::HOST));
Set(FlightSqlConnection::PORT, ReadDsnString(dsn, FlightSqlConnection::PORT));
Set(FlightSqlConnection::TOKEN, ReadDsnString(dsn, FlightSqlConnection::TOKEN));
Set(FlightSqlConnection::UID, ReadDsnString(dsn, FlightSqlConnection::UID));
Set(FlightSqlConnection::PWD, ReadDsnString(dsn, FlightSqlConnection::PWD));
Set(FlightSqlConnection::USE_ENCRYPTION,
ReadDsnString(dsn, FlightSqlConnection::USE_ENCRYPTION, DEFAULT_ENABLE_ENCRYPTION));
Set(FlightSqlConnection::TRUSTED_CERTS,
ReadDsnString(dsn, FlightSqlConnection::TRUSTED_CERTS));
Set(FlightSqlConnection::USE_SYSTEM_TRUST_STORE,
ReadDsnString(dsn, FlightSqlConnection::USE_SYSTEM_TRUST_STORE,
DEFAULT_USE_CERT_STORE));
Set(FlightSqlConnection::DISABLE_CERTIFICATE_VERIFICATION,
ReadDsnString(dsn, FlightSqlConnection::DISABLE_CERTIFICATE_VERIFICATION,
DEFAULT_DISABLE_CERT_VERIFICATION));

auto customKeys = ReadAllKeys(dsn);
RemoveAllKnownKeys(customKeys);
for (auto key : customKeys) {
std::string_view key_sv(key);
Set(key, ReadDsnString(dsn, key_sv));
}
Member

Are we effectively re-parsing the DSN from scratch for each key? Can't we parse it up front into keys/values then assign?

Collaborator

ReadDsnString grabs the DSN value directly from the DSN config file using Windows Driver Manager's implementation of SQLGetPrivateProfileString, which gets key/value pairs from the DSN. SQLGetPrivateProfileString is the only Driver Manager ODBC API that reads from ODBC.INI file, so it is used here.
I think the implementation of SQLGetPrivateProfileString is closed-source, so I don't know whether it parses the DSN upfront under the hood.

while (!connect_str.empty()) {
size_t attr_begin = connect_str.rfind(delimiter);

if (attr_begin == std::string::npos)
Member

Don't omit braces

/// \param connPropertyMap the map with the Connection properties.
/// \return An instance of the FlightSqlSslConfig.
std::shared_ptr<FlightSqlSslConfig> LoadFlightSslConfigs(
const odbcabstraction::Connection::ConnPropertyMap& connPropertyMap);
Member

conn_property_map

Comment on lines +208 to +211
int main(int argc, char** argv) {
::testing::InitGoogleTest(&argc, argv);
return RUN_ALL_TESTS();
}
Member

You shouldn't need this


using arrow::RecordBatch;

using std::optional;
Member

Don't using std types, just write std::optional


using std::optional;

class GetTablesReader {
Member

docstring?

Member

@lidavidm lidavidm left a comment

(in progress)

Is there anything explaining how the different sublibraries (odbcabstraction, etc.) fit together here?

constexpr auto SYSTEM_TRUST_STORE_DEFAULT = true;
constexpr auto STORES = {"CA", "MY", "ROOT", "SPC"};

inline std::string GetCerts() {
Member

Why is this inline?

using odbcabstraction::MetadataSettings;
using odbcabstraction::ResultSet;

typedef struct {
Member

Why the C-ism? Just struct ColumnNames.

std::string curr_parse; // the current string

for (char temp : table_type) { // while still in the string
switch (temp) { // switch depending on the character
Member

Many of these comments are just noise and aren't necessary. I would rather we have docstrings, since the purpose of many functions is not necessarily clear. Also, for functions that parse a value, it would be more helpful to document the format being parsed and give examples of valid values.

if (config_file.fail()) {
auto error_msg = "Arrow Flight SQL ODBC driver config file not found on \"" +
config_file_path + "\"";
std::cerr << error_msg << std::endl;
Member

Let's not spam console. If we're going to do this, at least use the logger.

ODBCStatement::ODBCStatement(
ODBCConnection& connection,
std::shared_ptr<driver::odbcabstraction::Statement> spiStatement)
: m_connection(connection),
Member

We generally don't use the m_ prefix in this codebase. It should be connection_ if it's private.

return;
case SQL_ROWSET_SIZE:
SetAttribute(value, m_rowsetSize);
return;
Member

Why do some cases in this switch return, and others break?

Collaborator

The cases that break set the variable successfully_written, which is checked after the switch. The driver then returns a warning if the attribute was not written successfully.
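A simplified sketch of that control flow (the attribute names and the clamp limit are illustrative, not from the driver): fully-applied attributes return directly, while attributes whose value may be adjusted break out of the switch so a shared check can emit the SQLSTATE 01S02 ("option value changed") warning.

```cpp
enum class Attr { kQueryTimeout, kRowsetSize };

// Returns false when the stored value differs from the requested one,
// in which case the caller emits the "01S02" warning diagnostic record
// (option value changed), as the ODBC spec requires.
bool SetAttribute(Attr attr, int requested, int& out, bool& warned) {
  bool successfully_written = true;
  switch (attr) {
    case Attr::kQueryTimeout:
      out = requested;
      return true;  // fully applied: skip the warning check entirely
    case Attr::kRowsetSize:
      if (requested > 1000) {  // illustrative upper bound
        out = 1000;            // clamp the requested value
        successfully_written = false;
      } else {
        out = requested;
      }
      break;  // fall through to the shared warning check
  }
  if (!successfully_written) warned = true;  // AddWarning("...", "01S02", ...)
  return successfully_written;
}
```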

"HY092");
}
if (!successfully_written) {
GetDiagnostics().AddWarning("Optional value changed.", "01S02",
Member

Why is it a warning that an optional value changed?

Collaborator

The ODBC driver is required to log a warning diagnostic record for "option value changed" (SQLSTATE 01S02).


SQLSMALLINT evaluatedCType = cType;

// TODO: Get proper default precision and scale from abstraction.
Member

Can we file issues for these TODOs? Is there any plan to tackle them?


inline void ThrowIfNotOK(const arrow::Status& status) {
if (!status.ok()) {
throw odbcabstraction::DriverException(status.message());
Member

message will throw away other metadata like the status code and status details, perhaps use ToString
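To illustrate the point with a minimal stand-in (this is not arrow::Status itself): a message()-style accessor carries only the text, while a ToString()-style rendering keeps the status code as well, which is what you want when converting the status into an exception.

```cpp
#include <string>

// Minimal stand-in for a status type; arrow::Status::ToString() likewise
// includes the status code, whereas message() is only the bare text.
struct FakeStatus {
  std::string code;
  std::string msg;
  const std::string& message() const { return msg; }          // drops the code
  std::string ToString() const { return code + ": " + msg; }  // keeps the code
};
```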

return builder.Append(value);
}

odbcabstraction::SqlDataType GetDataTypeFromArrowField_V3(
Member

What is the '_V3' supposed to denote?

Collaborator

_V3 refers to ODBC Version 3 API backend function, whereas _V2 refers to ODBC version 2 API. Version 3 and Version 2 have slightly different behaviors.

}

odbcabstraction::SqlDataType GetDataTypeFromArrowField_V3(
const std::shared_ptr<arrow::Field>& field, bool useWideChar);
Member

use_wide_char

odbcabstraction::SqlDataType EnsureRightSqlCharType(
odbcabstraction::SqlDataType data_type, bool useWideChar);

int16_t ConvertSqlDataTypeFromV3ToV2(int16_t data_type_v3);
Member

Why untyped int16_t? Aren't there enums?

Collaborator

The function is not using an enum because, IMO, enums aren't necessary in this case. The ODBC data type returned here serves as an output value only (to the BI tools etc.), not for working with other parts of the driver.
The ODBC data type values are fixed macros defined inside the system ODBC library (e.g., sqlext.h):

https://github.com/microsoft/ODBCTest/blob/0d629c7e4ff7b01398a5ac71d20c43362d0f43bf/ODBC%204.0/inc/sqlext.h#L463-L464

const int32_t decimal_precison = 0);

optional<int32_t> GetBufferLength(odbcabstraction::SqlDataType data_type,
const optional<int32_t>& column_size);
Member

IMO, it's better to just pass optional<int32_t> than pass by const& (optional<int32_t> is essentially a value type already so avoid the extra reference)

Member

These utils feel like a bit of a grab bag and could probably be organized better

namespace flight_sql {

namespace {
bool IsComplexType(arrow::Type::type type_id) {
Member

The codebase usually calls these 'nested' types. Also, there's already is_nested in type_traits.h

case odbcabstraction::SqlDataType_LONGVARBINARY:
return "LONGVARBINARY";
case odbcabstraction::SqlDataType_TYPE_DATE:
case 9:
Member

Why isn't this part of the enum?

}
}

ArrayConvertTask GetConverter(arrow::Type::type original_type_id,
Member

It seems this should take a full DataType and not just the type ID? In the source it is called in a context where it has the full type. That way you don't need GetDefaultDataTypeForTypeId (Also, maybe it should be moved to flight_sql_result_set_column.cc since it is only used there?)

}

int32_t GetDecimalTypeScale(const std::shared_ptr<arrow::DataType>& decimalType) {
auto decimal128Type = std::dynamic_pointer_cast<arrow::Decimal128Type>(decimalType);
Member

Use checked_pointer_cast?

Member

Though frankly you probably just want checked_cast. You don't need a new shared_ptr instance here.

Member

@lidavidm lidavidm left a comment

In general, this codebase needs to be brought in line with Arrow conventions over time


/// \brief Tells if ssl is enabled. By default it will be true.
/// \return Whether ssl is enabled.
bool useEncryption() const;
Member

Should be either UseEncryption or use_encryption (because this is a trivial getter), ditto for below

bool use_wide_char_;
bool is_bound_;

inline Accessor* GetAccessorForBinding() { return cached_accessor_.get(); }
Member

Why are these declared inline?

}

Status Visit(const arrow::Int64Scalar& scalar) override {
writer_.Int64(scalar.value);
Member

What is rapidJSON's behavior for int64s that are not representable as JSON numbers?

Collaborator

I see from tests that rapidJSON can still convert the max and min int64 values to strings, so rapidJSON seems to support all numbers within the range [-9223372036854775808, 9223372036854775807]. For example, the following test works:

TEST(ConvertToJson, Int64) {
  ASSERT_EQ("9223372036854775807",
            ConvertToJson(arrow::Int64Scalar(9223372036854775807LL)));
  // 9223372036854775808ULL is not valid as a signed int64, using workaround
  ASSERT_EQ("-9223372036854775808",
            ConvertToJson(arrow::Int64Scalar(
                static_cast<int64_t>(-9223372036854775807LL - 1))));
}

Status ConvertBinaryToBase64StringAndWrite(
const BinaryScalarT& scalar, rapidjson::Writer<rapidjson::StringBuffer>& writer) {
const auto& view = scalar.view();
size_t encoded_size = base64::encoded_size(view.length());
Member

Arrow has base64 utilities already, can we use them instead of Boost?

Comment on lines +27 to +37
namespace driver {
namespace odbcabstraction {
class Statement;
class ResultSet;
} // namespace odbcabstraction
} // namespace driver

namespace ODBC {
class ODBCConnection;
class ODBCDescriptor;
} // namespace ODBC
Member

You may want to create type_fwd.h headers like the rest of the Arrow codebase

#include <mutex>

/**
* @brief An abstraction over a generic ODBC handle.
Member

N.B. this codebase typically uses \brief, not @brief, for Doxygen.

#include <memory>
#include <string>

#define _SILENCE_CXX17_CODECVT_HEADER_DEPRECATION_WARNING
Member

Can we get rid of codecvt? It's been deprecated/removed

Member

Do we really need multiple layers of logging abstractions? especially as this header still requires spdlog...

#include <functional>
#include <string>

#include <spdlog/fmt/bundled/format.h>
Member

Once we move to C++20 (soon), we need to use stdlib format, not poke into spdlog's internals

namespace driver {
namespace odbcabstraction {

enum ODBCErrorCodes : int32_t {
Member

(1) enum class
(2) Given the very specific codes, is there a reference/source for these values?

}

bool FlightSqlStatement::ExecutePrepared() {
assert(prepared_statement_.get() != nullptr);
Member

use Arrow DCHECK macros?

using arrow::Result;

namespace {
Result<std::shared_ptr<Array>> MakeEmptyArray(std::shared_ptr<DataType> type,
Member

Use MakeArrayOfNull


void ReportSystemFunction(const std::string& function, uint32_t& current_sys_functions,
uint32_t& current_convert_functions);
void ReportNumericFunction(const std::string& function, uint32_t& current_functions);
Member

Can the purpose of these functions be documented? Even looking at the implementation it's not very clear

if (result != system_functions.end()) {
current_sys_functions |= result->second;
} else if (function == "CONVERT") {
// CAST and CONVERT are system functions from FlightSql/Calcite, but are
Member

AFAIK Flight SQL doesn't say anything about the actual SQL dialect?

Collaborator

    // CAST and CONVERT are system functions from FlightSql/Calcite, but are
    // CONVERT functions in ODBC. Assume that if CONVERT is reported as a system
    // function, then CAST and CONVERT are both supported.
    current_convert_functions |= SQL_FN_CVT_CONVERT | SQL_FN_CVT_CAST;

Currently, if the server returns CONVERT function support via GetSqlInfo, SQLGetInfo (the ODBC API) includes SQL_FN_CVT_CONVERT | SQL_FN_CVT_CAST in its return value to indicate that both CONVERT and CAST are supported.

I have reached out to @jduo for more context on this code comment. The assumption in the code comment applies to all databases. I think it is not specific to any SQL dialect.

{"SPACE", SQL_FN_STR_SPACE},
{"SUBSTRING", SQL_FN_STR_SUBSTRING},
{"UCASE", SQL_FN_STR_UCASE},
// Additional functions in ODBC but not Calcite:
Member

What does Calcite have to do with this? (Or is this perhaps a Dremio implementation detail leaking out?)

Collaborator

I think it isn't Dremio-specific. Calcite's SqlJdbcFunctionCall class was referenced for this code. At the top of the file, it says

// The list of functions that can be converted from string to ODBC bitmasks is
// based on Calcite's SqlJdbcFunctionCall class.

And I think this Additional functions ... comment is related to that: there are a few functions supported in ODBC that had not yet been implemented in Calcite at the time the code was written.

class Diagnostics {
public:
struct DiagnosticsRecord {
std::string msg_text_;
Member

This is a struct, so the fields are public - no need to suffix with _

static const std::unique_ptr<DiagnosticsRecord> TRUNCATION_WARNING(
new DiagnosticsRecord{"String or binary data, right-truncated.", "01004",
ODBCErrorCodes_TRUNCATION_WARNING});
warning_records_.push_back(TRUNCATION_WARNING.get());
Member

I think this one doesn't need to be a unique_ptr? It shouldn't ever get moved.

// Block while queue is full
std::unique_lock<std::mutex> unique_lock(mtx_);
if (!WaitUntilCanPushOrClosed(unique_lock)) break;
unique_lock.unlock();
Member

Why unlock here, when Push has to lock again below? (Pass push the guard instead?)

Member

Shouldn't we check if closed_ and break?

if (!WaitUntilCanPushOrClosed(unique_lock)) break;
unique_lock.unlock();

// Only one thread at a time be notified and call supplier
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the basis for this is the use of notify_one, that doesn't actually work AFAIK

@lidavidm
Copy link
Member

I'll continue reviewing tomorrow.

@conbench-apache-arrow

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit ed36107.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them.

@chrisfw

chrisfw commented May 21, 2025

Hi @jbonofre , @rscales, @alinaliBQ,
This is really exciting news! I tried using the Dremio Flight SQL ODBC Driver against a ROAPI dataset, but I ran into an issue with the underlying use of the (relatively) new Utf8View data type by DataFusion (used by ROAPI). Can you tell me if that data type is supported by this driver?

Thank you!
Regards,
Chris

@jduo
Member

jduo commented May 21, 2025

Hi @jbonofre , @rscales, @alinaliBQ, This is really exciting news! I tried using the Dremio Flight SQL ODBC Driver against a ROAPI dataset, but I ran into an issue with the underlying use of the (relatively) new Utf8View data type by DataFusion (used by ROAPI). Can you tell me if that data type is supported by this driver?

Thank you! Regards, Chris

Hi @chrisfw .

This driver currently doesn't support Utf8View (it supports the subset of Arrow types that Dremio uses) but it'd be much easier to support now since it is updated with the Arrow codebase. (Likely just a matter of updating switch statements and adding test cases). It'd be great if you added an issue for any additional types that you'd need.

Note that this isn't a complete ODBC driver. It has most of the ODBC implementation logic and all the Flight SQL communication code. -- @alinaliBQ and @rscales are completing the last bits required to implement the ODBC interface.

@chrisfw

chrisfw commented May 21, 2025

Thanks for your reply @jduo. I will definitely add an issue to request an enhancement adding the currently unsupported data types I have encountered.

Regards,
Chris

Member

@lidavidm lidavidm left a comment

Ok, that should be the last of it.

PACKET_SIZE, // uint32_t - The Packet Size
};

typedef boost::variant<std::string, void*, uint64_t, uint32_t> Attribute;
Member

Use std::variant


/// \brief Statement attributes that can be called at anytime.
////TODO: Document attributes
enum StatementAttributeId {
Member

enum class

TimeoutDuration{static_cast<double>(boost::get<size_t>(value))};
} else {
call_options_.timeout = TimeoutDuration{-1};
// Intentional fall-through.
Member

Comment on lines +125 to +126
// Arbitrarily return a negative value
return -1;
Member

Is this correct? Should we error instead?

Collaborator

The GetInfoTypeForArrowConvertEntry function returns the corresponding ODBC SQLGetInfo field (one of the SQL_CONVERT_* field types) for the given Arrow SqlConvert enum value, and returns -1 if there is no corresponding SQLGetInfo field. The caller of GetInfoTypeForArrowConvertEntry checks that the return value is positive, so if -1 is returned, the caller carries on without erroring out.

In other words, GetInfoTypeForArrowConvertEntry tries to get the server's convert-support info, and if that information is not useful to ODBC (there is no corresponding ODBC SQLGetInfo field), it is ignored rather than treated as an error. This approach makes the driver more robust and puts fewer blockers in the user's way.

}

inline int32_t ScalarToInt32(arrow::UnionScalar* scalar) {
return reinterpret_cast<arrow::Int32Scalar*>(scalar->child_value().get())->value;
Member

checked_cast?

Also, why inline?

namespace flight_sql {
namespace config {

#define TRUE_STR "true"
Member

...is there any actual value to having these constants?

const arrow::flight::FlightCallOptions& call_options,
const std::shared_ptr<FlightInfo>& flight_info, size_t queue_capacity)
: queue_(queue_capacity) {
// FIXME: Endpoint iteration should consider endpoints may be at different hosts
Member

File an issue for this?

Comment on lines +51 to +52
const boost::xpressive::sregex CONNECTION_STR_REGEX(
boost::xpressive::sregex::compile("([^=;]+)=({.+}|[^=;]+|[^;])"));
Member

Can we use stdlib regex?

@jmao-denver
Contributor

Congratulations on the merge!

Got an error when trying the build it on Mac M2.

~/git/arrow/cpp/build main *1 ?1 ❯ cmake .. --preset ninja-debug-maximal
...
Vcpkg integrate step - DONE.
-- Retrieving version from /Users/jianfengmao/git/arrow/cpp/build/_deps/azure_sdk-src/sdk/tables/azure-data-tables/src/private/package_version.hpp
-- VERSION_MAJOR 1
-- VERSION_MINOR 0
-- VERSION_PATCH 0
-- VERSION_PRERELEASE beta.4
-- AZ_LIBRARY_VERSION
CMake Error at /opt/homebrew/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:227 (message):
  Could NOT find ODBC (missing: ODBC_INCLUDE_DIR)
Call Stack (most recent call first):
  /opt/homebrew/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:591 (_FPHSA_FAILURE_MESSAGE)
  /opt/homebrew/share/cmake/Modules/FindODBC.cmake:200 (find_package_handle_standard_args)
  cmake_modules/ThirdpartyToolchain.cmake:5584 (find_package)
  CMakeLists.txt:618 (include)

@alinaliBQ
Collaborator

Hi @lidavidm, thanks for reviewing, we (Rob and I) will begin looking into those comments after the development in the ODBC layer (PR #46099) has finished.

@alinaliBQ
Collaborator

Congratulations on the merge!

Got an error when trying the build it on Mac M2.

~/git/arrow/cpp/build main *1 ?1 ❯ cmake .. --preset ninja-debug-maximal
...
Vcpkg integrate step - DONE.
-- Retrieving version from /Users/jianfengmao/git/arrow/cpp/build/_deps/azure_sdk-src/sdk/tables/azure-data-tables/src/private/package_version.hpp
-- VERSION_MAJOR 1
-- VERSION_MINOR 0
-- VERSION_PATCH 0
-- VERSION_PRERELEASE beta.4
-- AZ_LIBRARY_VERSION
CMake Error at /opt/homebrew/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:227 (message):
  Could NOT find ODBC (missing: ODBC_INCLUDE_DIR)
Call Stack (most recent call first):
  /opt/homebrew/share/cmake/Modules/FindPackageHandleStandardArgs.cmake:591 (_FPHSA_FAILURE_MESSAGE)
  /opt/homebrew/share/cmake/Modules/FindODBC.cmake:200 (find_package_handle_standard_args)
  cmake_modules/ThirdpartyToolchain.cmake:5584 (find_package)
  CMakeLists.txt:618 (include)

Hi Jianfeng, installing an ODBC driver manager (such as unixODBC or iODBC) on your machine might help with the issue.

@hiroyuki-sato
Collaborator

Hello, @jmao-denver

I saw this part in my Gmail. but I can't find the comment in this PR.

-- Looking for backtrace
-- Looking for backtrace - found
-- backtrace facility detected in default set of libraries
-- Found Backtrace: /usr/include  
CMake Error at /usr/share/cmake-3.28/Modules/FindCUDAToolkit.cmake:855 (message):
  Could not find nvcc, please set CUDAToolkit_ROOT.
Call Stack (most recent call first):
  src/arrow/gpu/CMakeLists.txt:44 (find_package)

I'm using the following steps on my macOS.

git clone https://github.com/apache/arrow/
cd arrow/cpp
mkdir build
cd build
cmake .. --preset ninja-debug-maximal \
  -DCMAKE_INSTALL_PREFIX=/tmp/local \
  -DARROW_CUDA=OFF \
  -DARROW_SKYHOOK=OFF \
  -DCMAKE_OSX_SYSROOT=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk \
  -DARROW_EXTRA_ERROR_CONTEXT=OFF
cmake --build .

You need to wait for this PR to be merged: #46622

@jmao-denver
Contributor

Hi @hiroyuki-sato, yeah I deleted it because I realized that I just needed to turn CUDA off. The build on my MBP M2 still failed with what seemed to be compatibility issues caused by the version of the compiler I was using. I then went to try the build on Windows with the latest Visual Studio Community Edition (2022). It built but running the produced arrow_odbc_spi_imp_cli.exe generated a crash. I also briefly tried to build it on ubuntu and it failed there as well.

I was basically doing something like this on macOS and Ubuntu:

cmake .. --preset ninja-debug-minimal -DARROW_FLIGHT_SQL_ODBC=ON
cmake --build .

and some additional options on Windows.

@hiroyuki-sato
Collaborator

Hi, @jmao-denver
In my opinion, the ODBC driver has just been merged and there are open issues like #46622, so it's better to wait a bit.
In any case, this PR is already merged, so if you run into any problems, it would be better to open a new discussion or issue.

@lidavidm
Member

To be clear: what was donated and merged is not a complete ODBC driver. The contributors are still working on the rest of the necessary pieces. (Dremio's original driver was a mix of GPL and Apache licensed code, and of course the GPL licensed code can't be used here.)

@alinaliBQ
Collaborator

@lidavidm Yes, that's correct. What was merged is not a complete driver.
Folks are welcome to look at my team's in-progress ODBC driver at #46099 for Windows (it is not tested on macOS/Linux); the PR #46099 code is a replacement for the GPL-licensed code in the original Dremio ODBC driver. PR #46099 will be continuously updated with further ODBC development commits.
