GH-46522: [C++][FlightRPC] Add Arrow Flight SQL ODBC driver (#40939)
lidavidm merged 19 commits into apache:main
Conversation
Thanks for opening a pull request! If this is not a minor PR, could you open an issue for it on GitHub? https://github.com/apache/arrow/issues/new/choose Opening GitHub issues ahead of time contributes to the openness of the Apache Arrow project. Could you then also rename the pull request title to follow the project's format?
@ianmcook @lidavidm @assignUser as discussed during the community meetings, here's the ODBC driver PR draft. This PR includes:
I'm still working on:
I will also create a GH issue to track this addition.
jduo
left a comment
Thanks for this @jbonofre !
Some of the code in odbc_impl should be updated to use the same naming conventions as the Arrow project (e.g., ODBCConnection -> odbc_connection).
This covers the flightsql-odbc repo, but there is additional code needed for the ODBC entry points to build a functional driver. That was in warpdrive, but that code isn't Apache-compatible.
Need to add the license for whereami to the Arrow project itself.
Are the separate brewfile and vcpkg.json necessary or can this utilize the ones already at the top-level?
Should the top-level CMakeLists now include the driver build? Or is that on hold as part of the donation process?
These @copydoc tags are causing doxygen failures in CI:
https://github.com/apache/arrow/actions/runs/8518121914/job/23329735731?pr=40939#step:6:3040
I fear I can't really help on that front, but once that's done I'd be happy to continue working on the CMake!
Hi @jduo !
Yes, I'm doing that (that's why the PR is still a draft).
Yes, locally I fixed/updated it, but I'm looking for an alternative.
I'm checking whether we can find an alternative, else I will update it.
Yes, that's the plan; I'm doing it in two steps: fixing the build and integrating into the Arrow ecosystem.
Agreed, same as above. I will push new changes to this PR and remove the draft state as soon as it's ready for review (build OK, etc).

No worries, I'm working on it :) I will keep you posted when it's good for review (when I remove the draft flag).
There was another group that did some work to build a full driver from flightsql-odbc. I haven't seen it, but they've mentioned it here:
Following up here - do you want any help?

@jbonofre checking in on this. Were you still continuing work on this? Did you need help?

@jduo yes, I'm back on it (sorry, I was busy on Iceberg and ASF stuff). Let me do a new rebase/build/update and I will ping you for review/help. Thanks! And sorry again for the delay :/

No worries! Glad to see you back around :)

Hey @jbonofre, we're excited about the AFS ODBC driver! We want to start using it as soon as we can for a project we're working on and were wondering whether you had a sense of when it might be ready to be in someone's hands for development. We would be happy to be early adopters :)

Thanks @jbonofre. We're around if you need assistance moving this forward.

Hi @jbonofre, just checking in on whether we can help move this forward.

Hey @jduo
I'm curious, does this version of the driver allow passing arbitrary parameters? I tried using the Dremio ODBC driver but it doesn't seem to be able to set arbitrary parameters. I have the following in a custom PowerBI connector: But I get the following error: For our use case we need to be able to pass various parameters as extra headers, which is supported by both ADBC and JDBC.
Last I tried, it supported this. I believe it lower-cases headers though (the C++ grpc client itself requires this) -- is your server looking for environmentId case-insensitively? |
Yeah, it's case-insensitive. I also tried setting
Interesting, I do see: Maybe there's a bug with call options not getting set correctly for the HANDSHAKE request? I remember that being an issue with the JDBC driver: #33946 |
This is actually part of the HTTP/2 spec. |
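As noted above, HTTP/2 defines header field names as lowercase, so gRPC's C++ client rejects uppercase keys. A minimal sketch (helper name hypothetical) of the kind of normalization a driver could apply to header keys before sending them:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Lowercase a header key before handing it to the HTTP/2 transport,
// since HTTP/2 requires lowercase field names. The helper name is
// hypothetical, not taken from the driver's code.
std::string NormalizeHeaderKey(std::string key) {
  std::transform(key.begin(), key.end(), key.begin(),
                 [](unsigned char c) { return std::tolower(c); });
  return key;
}
```

A server that looks headers up case-insensitively would then still match, e.g. `environmentId` arrives as `environmentid`.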
lidavidm
left a comment
(Review in progress...)
Also to double check, none of these headers get installed, right? If they do we should block them from getting installed
Longer term...I wonder if the ODBC driver should be in the arrow/ tree, or if we should consider putting it at the top level (next to arrow/, parquet/), or even in its own repo potentially.
namespace driver {
namespace flight_sql {

Let's fix the namespace: this should be something like arrow::flight::sql::odbc
namespace {
template <typename T>
int64_t convertDate(typename T::value_type value) {

Please update functions to Arrow naming conventions (ConvertDate, not convertDate)
By codebase convention this should be called "api.h".
In general, we don't have a separate "include" subdirectory. Headers go next to source files.
What is the purpose of this application? Should it be converted to tests instead?
odbcabstraction::Diagnostics& diagnostic) {
  auto* buffer = static_cast<TIME_STRUCT*>(binding->buffer);
  tm time{};

In general, why are we using this instead of C++ stdlib chrono/date?
using odbcabstraction::GetTimeForSecondsSinceEpoch;

TEST(TEST_TIMESTAMP, TIMESTAMP_WITH_MILLI) {

FWIW, the test suite/case names are effectively class names, so you'd normally expect to see TestTimestamp, TimestampWithMilli
namespace odbcabstraction {

/// \brief Supported ODBC versions.
enum OdbcVersion { V_2, V_3, V_4 };

constexpr ssize_t MICRO_TO_SECONDS_DIVISOR = 1000000;
constexpr ssize_t NANO_TO_SECONDS_DIVISOR = 1000000000;

typedef struct tagDATE_STRUCT {

Why are these typedef struct? Just use a normal struct?
};

struct MetadataSettings {
  boost::optional<int32_t> string_column_length_{boost::none};

namespace {

#if defined _WIN32 || defined _WIN64
std::string utf8_to_clocale(const char* utf8str, int len) {

Naming conventions: Utf8ToCLocale, utf8_str
std::string utf8_to_clocale(const char* utf8str, int len) {
  thread_local boost::locale::generator g;
  g.locale_cache_enabled(true);
  std::locale loc = g(boost::locale::util::get_system_locale());

Aren't locales thread-safe? Can't we initialize this once in a static instead of using a thread_local?
namespace flight_sql {
namespace config {

static const char DEFAULT_DSN[] = "Apache Arrow Flight SQL";

Is there a reason why we sometimes use static const char[] and other times use constexpr std::string_view? IMO the latter is preferable unless there is a reason
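As a sketch of the suggestion (constant and helper names hypothetical): a constexpr std::string_view is a typed, length-aware constant at no runtime cost, and composes directly with string comparisons:

```cpp
#include <string_view>

// Hypothetical constant following the reviewer's suggestion: a
// constexpr std::string_view carries its length and compares
// directly, unlike a raw static const char[].
constexpr std::string_view kDefaultDsn = "Apache Arrow Flight SQL";

// Example consumer: compares without strlen/strcmp.
constexpr bool IsDefaultDsn(std::string_view dsn) { return dsn == kDefaultDsn; }
```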
Configuration::Configuration() {
  // No-op.
}

Configuration::~Configuration() {
  // No-op.
}
If these are no-ops, use =default
Set(FlightSqlConnection::DSN, dsn);
Set(FlightSqlConnection::HOST, ReadDsnString(dsn, FlightSqlConnection::HOST));
Set(FlightSqlConnection::PORT, ReadDsnString(dsn, FlightSqlConnection::PORT));
Set(FlightSqlConnection::TOKEN, ReadDsnString(dsn, FlightSqlConnection::TOKEN));
Set(FlightSqlConnection::UID, ReadDsnString(dsn, FlightSqlConnection::UID));
Set(FlightSqlConnection::PWD, ReadDsnString(dsn, FlightSqlConnection::PWD));
Set(FlightSqlConnection::USE_ENCRYPTION,
    ReadDsnString(dsn, FlightSqlConnection::USE_ENCRYPTION, DEFAULT_ENABLE_ENCRYPTION));
Set(FlightSqlConnection::TRUSTED_CERTS,
    ReadDsnString(dsn, FlightSqlConnection::TRUSTED_CERTS));
Set(FlightSqlConnection::USE_SYSTEM_TRUST_STORE,
    ReadDsnString(dsn, FlightSqlConnection::USE_SYSTEM_TRUST_STORE,
                  DEFAULT_USE_CERT_STORE));
Set(FlightSqlConnection::DISABLE_CERTIFICATE_VERIFICATION,
    ReadDsnString(dsn, FlightSqlConnection::DISABLE_CERTIFICATE_VERIFICATION,
                  DEFAULT_DISABLE_CERT_VERIFICATION));

auto customKeys = ReadAllKeys(dsn);
RemoveAllKnownKeys(customKeys);
for (auto key : customKeys) {
  std::string_view key_sv(key);
  Set(key, ReadDsnString(dsn, key_sv));
}

Are we effectively re-parsing the DSN from scratch for each key? Can't we parse it up front into keys/values then assign?
ReadDsnString grabs the DSN value directly from the DSN config file using the Windows Driver Manager's implementation of SQLGetPrivateProfileString, which gets key/value pairs from the DSN. SQLGetPrivateProfileString is the only Driver Manager ODBC API that reads from the ODBC.INI file, so it is used here.
I think the implementation of SQLGetPrivateProfileString is closed-source, so I don't know whether it parses the DSN upfront under the hood.
while (!connect_str.empty()) {
  size_t attr_begin = connect_str.rfind(delimiter);

  if (attr_begin == std::string::npos)
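The "parse it up front" suggestion could look something like the following simplified sketch (function name hypothetical): one pass over a `key=value;` connection string into a map, after which each property is assigned by lookup. It deliberately ignores `{}`-braced values and escaping, which the real driver would need to handle:

```cpp
#include <map>
#include <string>

// One-pass split of "k1=v1;k2=v2;..." into a key/value map.
// Simplified sketch: does not handle {}-braced values or escaping.
std::map<std::string, std::string> ParseConnectionString(const std::string& s) {
  std::map<std::string, std::string> out;
  size_t pos = 0;
  while (pos < s.size()) {
    size_t end = s.find(';', pos);
    if (end == std::string::npos) end = s.size();
    size_t eq = s.find('=', pos);
    if (eq != std::string::npos && eq < end) {
      out[s.substr(pos, eq - pos)] = s.substr(eq + 1, end - eq - 1);
    }
    pos = end + 1;
  }
  return out;
}
```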
/// \param connPropertyMap the map with the Connection properties.
/// \return An instance of the FlightSqlSslConfig.
std::shared_ptr<FlightSqlSslConfig> LoadFlightSslConfigs(
    const odbcabstraction::Connection::ConnPropertyMap& connPropertyMap);

int main(int argc, char** argv) {
  ::testing::InitGoogleTest(&argc, argv);
  return RUN_ALL_TESTS();
}
using arrow::RecordBatch;

using std::optional;

Don't `using` std types, just write std::optional
using std::optional;

class GetTablesReader {
lidavidm
left a comment
(in progress)
Is there anything explaining how the different sublibraries (odbcabstraction, etc.) fit together here?
constexpr auto SYSTEM_TRUST_STORE_DEFAULT = true;
constexpr auto STORES = {"CA", "MY", "ROOT", "SPC"};

inline std::string GetCerts() {
using odbcabstraction::MetadataSettings;
using odbcabstraction::ResultSet;

typedef struct {

Why the C-ism? Just struct ColumnNames.
std::string curr_parse;  // the current string

for (char temp : table_type) {  // while still in the string
  switch (temp) {  // switch depending on the character
Many of these comments are just noise and aren't necessary. I would rather we have docstrings since the purpose of many functions is not necessarily clear. Another thing is that for functions which parse a value, it would be more helpful to document the format being parsed/have examples of valid values
if (config_file.fail()) {
  auto error_msg = "Arrow Flight SQL ODBC driver config file not found on \"" +
                   config_file_path + "\"";
  std::cerr << error_msg << std::endl;
Let's not spam console. If we're going to do this, at least use the logger.
ODBCStatement::ODBCStatement(
    ODBCConnection& connection,
    std::shared_ptr<driver::odbcabstraction::Statement> spiStatement)
    : m_connection(connection),
We generally don't use the m_ prefix in this codebase. It should be connection_ if it's private.
return;
case SQL_ROWSET_SIZE:
  SetAttribute(value, m_rowsetSize);
  return;

Why do some cases in this switch return, and others break?
The cases that break set the value successfully_written, which is checked after the break. The driver then returns a warning if the attribute is not written successfully.
"HY092");
}
if (!successfully_written) {
  GetDiagnostics().AddWarning("Optional value changed.", "01S02",
Why is it a warning that an optional value changed?
The ODBC spec requires the driver to log a warning diagnostic record when an optional value is changed.
SQLSMALLINT evaluatedCType = cType;

// TODO: Get proper default precision and scale from abstraction.
Can we file issues for these TODOs? Is there any plan to tackle them?
inline void ThrowIfNotOK(const arrow::Status& status) {
  if (!status.ok()) {
    throw odbcabstraction::DriverException(status.message());

message will throw away other metadata like the status code and status details, perhaps use ToString
return builder.Append(value);
}

odbcabstraction::SqlDataType GetDataTypeFromArrowField_V3(

What is the '_V3' supposed to denote?
_V3 refers to ODBC Version 3 API backend function, whereas _V2 refers to ODBC version 2 API. Version 3 and Version 2 have slightly different behaviors.
}

odbcabstraction::SqlDataType GetDataTypeFromArrowField_V3(
    const std::shared_ptr<arrow::Field>& field, bool useWideChar);
odbcabstraction::SqlDataType EnsureRightSqlCharType(
    odbcabstraction::SqlDataType data_type, bool useWideChar);

int16_t ConvertSqlDataTypeFromV3ToV2(int16_t data_type_v3);

Why untyped int16_t? Aren't there enums?
The function is not using an enum because IMO enums aren't necessary in this case. The ODBC data type returned here serves only as a return output value (to the BI tools etc.), not for working with other parts of the driver.
The ODBC data type values are fixed macros defined inside the system ODBC library (e.g., sqlext.h):
const int32_t decimal_precison = 0);

optional<int32_t> GetBufferLength(odbcabstraction::SqlDataType data_type,
                                  const optional<int32_t>& column_size);

IMO, it's better to just pass optional<int32_t> than pass by const& (optional<int32_t> is essentially a value type already, so avoid the extra reference)
These utils feel like a bit of a grab bag and could probably be organized better
namespace flight_sql {

namespace {
bool IsComplexType(arrow::Type::type type_id) {

The codebase usually calls these 'nested' types. Also, there's already is_nested in type_traits.h
case odbcabstraction::SqlDataType_LONGVARBINARY:
  return "LONGVARBINARY";
case odbcabstraction::SqlDataType_TYPE_DATE:
case 9:

Why isn't this part of the enum?
}
}

ArrayConvertTask GetConverter(arrow::Type::type original_type_id,

It seems this should take a full DataType and not just the type ID? In the source it is called in a context where it has the full type. That way you don't need GetDefaultDataTypeForTypeId. (Also, maybe it should be moved to flight_sql_result_set_column.cc since it is only used there?)
}

int32_t GetDecimalTypeScale(const std::shared_ptr<arrow::DataType>& decimalType) {
  auto decimal128Type = std::dynamic_pointer_cast<arrow::Decimal128Type>(decimalType);

Though frankly you probably just want checked_cast. You don't need a new shared_ptr instance here.
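For context, Arrow's checked_cast is essentially a static_cast whose correctness is verified with dynamic_cast in debug builds; unlike std::dynamic_pointer_cast, it creates no new shared_ptr. A self-contained sketch of the idea (types here are stand-ins, not Arrow's):

```cpp
#include <cassert>

// Stand-in class hierarchy for illustration.
struct Base { virtual ~Base() = default; };
struct Derived : Base { int scale = 5; };

// Sketch of the checked_cast idea: a static_cast, asserted against
// dynamic_cast in debug builds, with no shared_ptr overhead.
template <typename To, typename From>
To checked_cast(From* from) {
  assert(dynamic_cast<To>(from) == static_cast<To>(from));
  return static_cast<To>(from);
}
```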
lidavidm
left a comment
In general, this codebase needs to be brought in line with Arrow conventions over time
/// \brief Tells if ssl is enabled. By default it will be true.
/// \return Whether ssl is enabled.
bool useEncryption() const;

Should be either UseEncryption or use_encryption (because this is a trivial getter); ditto for below
bool use_wide_char_;
bool is_bound_;

inline Accessor* GetAccessorForBinding() { return cached_accessor_.get(); }

Why are these declared inline?
}

Status Visit(const arrow::Int64Scalar& scalar) override {
  writer_.Int64(scalar.value);

What is rapidJSON's behavior for int64s that are not representable as JSON numbers?
I see from tests that rapidJSON can still convert the max and min int64 to strings, so rapidJSON seems to support all numbers within the range [-9223372036854775808, 9223372036854775807]. For example, the following test works:
arrow/cpp/src/arrow/flight/sql/odbc/flight_sql/json_converter_test.cc
Lines 76 to 82 in 00439d1
Status ConvertBinaryToBase64StringAndWrite(
    const BinaryScalarT& scalar, rapidjson::Writer<rapidjson::StringBuffer>& writer) {
  const auto& view = scalar.view();
  size_t encoded_size = base64::encoded_size(view.length());

Arrow has base64 utilities already, can we use them instead of Boost?
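Regardless of which library does the encoding, the encode-buffer size is fixed by the input length: padded base64 turns every 3 input bytes into 4 output characters, rounding up. A tiny sketch of the size computation (helper name hypothetical):

```cpp
#include <cstddef>

// Padded base64 output length: ceil(n / 3) * 4 characters.
size_t Base64EncodedSize(size_t n) { return ((n + 2) / 3) * 4; }
```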
namespace driver {
namespace odbcabstraction {
class Statement;
class ResultSet;
}  // namespace odbcabstraction
}  // namespace driver

namespace ODBC {
class ODBCConnection;
class ODBCDescriptor;
}  // namespace ODBC

You may want to create type_fwd.h headers like the rest of the Arrow codebase
#include <mutex>

/**
 * @brief An abstraction over a generic ODBC handle.

N.B. this codebase typically uses \brief, not @brief, for Doxygen.
#include <memory>
#include <string>

#define _SILENCE_CXX17_CODECVT_HEADER_DEPRECATION_WARNING
Can we get rid of codecvt? It's been deprecated/removed
Do we really need multiple layers of logging abstractions? especially as this header still requires spdlog...
#include <functional>
#include <string>

#include <spdlog/fmt/bundled/format.h>

Once we move to C++20 (soon), we need to use stdlib format, not poke into spdlog's internals
namespace driver {
namespace odbcabstraction {

enum ODBCErrorCodes : int32_t {
(1) enum class
(2) Given the very specific codes, is there a reference/source for these values?
}

bool FlightSqlStatement::ExecutePrepared() {
  assert(prepared_statement_.get() != nullptr);
using arrow::Result;

namespace {
Result<std::shared_ptr<Array>> MakeEmptyArray(std::shared_ptr<DataType> type,
void ReportSystemFunction(const std::string& function, uint32_t& current_sys_functions,
                          uint32_t& current_convert_functions);
void ReportNumericFunction(const std::string& function, uint32_t& current_functions);

Can the purpose of these functions be documented? Even looking at the implementation it's not very clear
if (result != system_functions.end()) {
  current_sys_functions |= result->second;
} else if (function == "CONVERT") {
  // CAST and CONVERT are system functions from FlightSql/Calcite, but are
AFAIK Flight SQL doesn't say anything about the actual SQL dialect?
// CAST and CONVERT are system functions from FlightSql/Calcite, but are
// CONVERT functions in ODBC. Assume that if CONVERT is reported as a system
// function, then CAST and CONVERT are both supported.
current_convert_functions |= SQL_FN_CVT_CONVERT | SQL_FN_CVT_CAST;
Currently, if the server returns CONVERT function support via GetSqlInfo, SQLGetInfo (the ODBC API) includes SQL_FN_CVT_CONVERT | SQL_FN_CVT_CAST in its return to indicate both CONVERT and CAST are supported.
I have reached out to @jduo for more context on this code comment. The assumption in the code comment applies to all databases; I think it is not specific to any SQL dialect.
{"SPACE", SQL_FN_STR_SPACE},
{"SUBSTRING", SQL_FN_STR_SUBSTRING},
{"UCASE", SQL_FN_STR_UCASE},
// Additional functions in ODBC but not Calcite:

What does Calcite have to do with this? (Or is this perhaps a Dremio implementation detail leaking out?)
I think it isn't Dremio-specific. The Calcite's SqlJdbcFunctionCall class was referenced for this code. At the top of the file, it says
// The list of functions that can be converted from string to ODBC bitmasks is
// based on Calcite's SqlJdbcFunctionCall class.
And I think this "Additional functions ..." comment is related to that: there exist a few system functions supported in ODBC but not yet implemented in Calcite at the time the code was written.
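The reporting scheme under discussion ORs a per-function bitmask flag into a running capability word. A self-contained sketch of the pattern (flag values hypothetical stand-ins for the SQL_FN_CVT_* macros, which live in the system's sqlext.h):

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical flag values standing in for the SQL_FN_CVT_* macros.
constexpr uint32_t kFnCvtConvert = 1u << 0;
constexpr uint32_t kFnCvtCast = 1u << 1;

// OR the reported function's bit into the running capability mask.
// If the server reports CONVERT, assume CAST is also supported,
// mirroring the code comment quoted above.
void ReportConvertFunction(const std::string& function, uint32_t& mask) {
  static const std::unordered_map<std::string, uint32_t> kFlags = {
      {"CONVERT", kFnCvtConvert | kFnCvtCast},
      {"CAST", kFnCvtCast},
  };
  auto it = kFlags.find(function);
  if (it != kFlags.end()) mask |= it->second;
}
```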
class Diagnostics {
 public:
  struct DiagnosticsRecord {
    std::string msg_text_;

This is a struct, so the fields are public - no need to suffix with _
static const std::unique_ptr<DiagnosticsRecord> TRUNCATION_WARNING(
    new DiagnosticsRecord{"String or binary data, right-truncated.", "01004",
                          ODBCErrorCodes_TRUNCATION_WARNING});
warning_records_.push_back(TRUNCATION_WARNING.get());

I think this one doesn't need to be a unique_ptr? It shouldn't ever get moved.
// Block while queue is full
std::unique_lock<std::mutex> unique_lock(mtx_);
if (!WaitUntilCanPushOrClosed(unique_lock)) break;
unique_lock.unlock();

Why unlock here, when Push has to lock again below? (Pass Push the guard instead?)
Shouldn't we check closed_ and break?
if (!WaitUntilCanPushOrClosed(unique_lock)) break;
unique_lock.unlock();

// Only one thread at a time be notified and call supplier

If the basis for this is the use of notify_one, that doesn't actually work AFAIK
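The structure under review is a bounded blocking queue guarded by a mutex and condition variables. A minimal sketch (class and member names hypothetical; no close() handling): using two condition variables makes notify_one wake the right class of waiter, which a single shared condition variable with notify_one cannot guarantee, echoing the caveat above:

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>

// Bounded blocking queue: producers wait while full, consumers wait
// while empty. Separate condition variables let notify_one target
// the right class of waiter.
class BoundedQueue {
 public:
  explicit BoundedQueue(size_t capacity) : capacity_(capacity) {}

  void Push(int value) {
    std::unique_lock<std::mutex> lock(mtx_);
    not_full_.wait(lock, [&] { return queue_.size() < capacity_; });
    queue_.push_back(value);
    not_empty_.notify_one();  // wake one waiting consumer
  }

  int Pop() {
    std::unique_lock<std::mutex> lock(mtx_);
    not_empty_.wait(lock, [&] { return !queue_.empty(); });
    int value = queue_.front();
    queue_.pop_front();
    not_full_.notify_one();  // wake one waiting producer
    return value;
  }

 private:
  size_t capacity_;
  std::deque<int> queue_;
  std::mutex mtx_;
  std::condition_variable not_full_, not_empty_;
};
```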
I'll continue reviewing tomorrow.
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit ed36107. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 4 possible false positives for unstable benchmarks that are known to sometimes produce them.
Hi @jbonofre, @rscales, @alinaliBQ, thank you!

Hi @chrisfw. This driver currently doesn't support Utf8View (it supports the subset of Arrow types that Dremio uses), but it'd be much easier to support now since it is updated with the Arrow codebase. (Likely just a matter of updating switch statements and adding test cases.) It'd be great if you added an issue for any additional types that you'd need. Note that this isn't a complete ODBC driver. It has most of the ODBC implementation logic and all the Flight SQL communication code -- @alinaliBQ and @rscales are completing the last bits required to implement the ODBC interface.
Thanks for your reply @jduo. I will definitely add an issue to request an enhancement adding the currently unsupported data types I have encountered. Regards,
lidavidm
left a comment
Ok, that should be the last of it.
PACKET_SIZE,  // uint32_t - The Packet Size
};

typedef boost::variant<std::string, void*, uint64_t, uint32_t> Attribute;

/// \brief Statement attributes that can be set at any time.
//// TODO: Document attributes
enum StatementAttributeId {
TimeoutDuration{static_cast<double>(boost::get<size_t>(value))};
} else {
  call_options_.timeout = TimeoutDuration{-1};
  // Intentional fall-through.
// Arbitrarily return a negative value
return -1;

Is this correct? Should we error instead?
The GetInfoTypeForArrowConvertEntry function returns the corresponding odbc SQLGetInfo field in SQLGetInfo's SQL_CONVERT_* field types for the given Arrow SqlConvert enum value, and returns -1 if there is no corresponding SQLGetInfo field. The caller of GetInfoTypeForArrowConvertEntry does a check to ensure the return value is positive, so if -1 is returned, the caller carries on without erroring out.
In other words, this function GetInfoTypeForArrowConvertEntry is trying to get the server's convert support info, and if the information is not useful to ODBC (no corresponding ODBC SQLGetInfo field), the information is ignored and ODBC doesn't error out. This approach makes it more robust and has less blockers for the user to get data.
}

inline int32_t ScalarToInt32(arrow::UnionScalar* scalar) {
  return reinterpret_cast<arrow::Int32Scalar*>(scalar->child_value().get())->value;

checked_cast?
Also, why inline?
namespace flight_sql {
namespace config {

#define TRUE_STR "true"

...is there any actual value to having these constants?
const arrow::flight::FlightCallOptions& call_options,
const std::shared_ptr<FlightInfo>& flight_info, size_t queue_capacity)
    : queue_(queue_capacity) {
  // FIXME: Endpoint iteration should consider endpoints may be at different hosts
const boost::xpressive::sregex CONNECTION_STR_REGEX(
    boost::xpressive::sregex::compile("([^=;]+)=({.+}|[^=;]+|[^;])"));
Congratulations on the merge! Got an error when trying to build it on a Mac M2, running the following in ~/git/arrow/cpp/build: cmake .. --preset ninja-debug-maximal
Hi Jianfeng, installing the ODBC driver manager on your machine might help with the issue. |
Hello @jmao-denver, I saw this part in my Gmail, but I can't find the comment in this PR. I'm using the following steps on my macOS. You need to wait for this PR to merge: #46622
Hi @hiroyuki-sato, yeah, I deleted it because I realized that I just needed to turn CUDA off. The build on my MBP M2 still failed with what seemed to be compatibility issues caused by the version of the compiler I was using. I then went to try the build on Windows with the latest Visual Studio Community Edition (2022). It built, but running the produced arrow_odbc_spi_imp_cli.exe generated a crash. I also briefly tried to build it on Ubuntu and it failed there as well. I was basically doing something like this on macOS and Ubuntu, with some additional options on Windows.
Hi @jmao-denver,

To be clear: what was donated and merged is not a complete ODBC driver. The contributors are still working on the rest of the necessary pieces. (Dremio's original driver was a mix of GPL and Apache licensed code, and of course the GPL licensed code can't be used here.)
@lidavidm Yes, that's correct. What was merged is not a complete driver.
Rationale for this change
An ODBC driver implements the Open Database Connectivity (ODBC) interface, which allows applications to access data in DBMS-like environments using SQL.
As Arrow Flight already provides JDBC and ADBC drivers, it can similarly provide an ODBC driver.
What changes are included in this PR?
This PR adds an ODBC driver implementation.
Are these changes tested?
This ODBC driver comes from a production system (SGA) that has been tested and run for a while.
Are there any user-facing changes?
No changes to existing behavior, but a new user option to use Arrow Flight via ODBC.