feat: Create Table abstraction on top of arrow#832
feat: Create Table abstraction on top of arrow#832yevgenypats wants to merge 15 commits intomainfrom
Conversation
Codecov ReportPatch coverage:
Additional details and impacted files@@ Coverage Diff @@
## main #832 +/- ##
==========================================
+ Coverage 47.18% 49.03% +1.85%
==========================================
Files 76 40 -36
Lines 7841 2967 -4874
==========================================
- Hits 3700 1455 -2245
+ Misses 3642 1349 -2293
+ Partials 499 163 -336
☔ View full report in Codecov by Sentry. |
|
I haven't dug into this much yet, but what is the reason for needing to copy the code from |
@hermanschaaf Basically the reason is the |
|
maybe we can do |
hermanschaaf
left a comment
There was a problem hiding this comment.
Looks good, I think this will make our lives a bit easier. Left a couple of comments; I also think, before merging, we should port this to at least one destination and source just to check that it all works and see what it looks like. Maybe we will want to make some tweaks based on that
schemav2/resource.go
Outdated
| // This struct is what we send over the wire to destination. | ||
| // We dont want to reuse the same struct as otherwise we will have to comment on fields which don't get sent over the wire but still accessible | ||
| // code wise | ||
| // type DestinationResource struct { | ||
| // TableName string `json:"table_name"` | ||
| // Data CQTypes `json:"data"` | ||
| // } |
There was a problem hiding this comment.
Let's update / remove this comment so that it's correct for v2
schemav2/resource.go
Outdated
| Table *Table | ||
| // This is sorted result data by column name | ||
| data map[string]any | ||
| bldr array.Builder |
There was a problem hiding this comment.
Why is this an array.Builder, rather than an array.RecordBuilder? 🤔 Also it seems like the builder is not publicly accessible, but I figure we need it (or rather the Record built from it) for sending over the wire?
schemav2/resource.go
Outdated
| return r.Item | ||
| } | ||
|
|
||
| //nolint:revive |
There was a problem hiding this comment.
Not sure if we'll still need the commented out code here; I think we do need it in some form, but let's remove if not.
| return nil | ||
| } | ||
|
|
||
| func (t *Table) ToArrowSchema() *arrow.Schema { |
There was a problem hiding this comment.
We used this in quite a few places (e.g. the filetypes library), maybe we should keep it? We can mark it as deprecated instead.
I guess the idea here is to remove all the Arrow references from v1. If that's the way we want to go it's fine, we can update the couple of destinations that used it.
| tavlesV2 := TablesV1ToV2(tables) | ||
| if err := s.Plugin.DeleteStale(ctx, tavlesV2, req.Source, req.Timestamp.AsTime()); err != nil { |
There was a problem hiding this comment.
| tavlesV2 := TablesV1ToV2(tables) | |
| if err := s.Plugin.DeleteStale(ctx, tavlesV2, req.Source, req.Timestamp.AsTime()); err != nil { | |
| tablesV2 := TablesV1ToV2(tables) | |
| if err := s.Plugin.DeleteStale(ctx, tablesV2, req.Source, req.Timestamp.AsTime()); err != nil { |
schemav2/resource.go
Outdated
|
|
||
| // Resource represents a row in it's associated table, it carries a reference to the original item, and automatically | ||
| // generates an Id based on Table's Columns. Resource data can be accessed by the Get and Set methods | ||
| type Resource struct { |
There was a problem hiding this comment.
Let's update the comment on Resource here to talk about what it's for and how it's supposed to work with Arrow. For example, once all the fields have been set and finalized, I guess the Build method should be called to create an arrow.Record with a single row, which should then be accessed through some other method.
|
@hermanschaaf per our discussion - this PR now might look a bit daunting but actually I simplified things and removed Per your suggestion I'll also open a PR for postgres and maybe another destination to test that it work and also to show how it looks like before merging |
| table := &Table{ | ||
| Name: name, | ||
| Description: description, | ||
| Columns: columns, |
There was a problem hiding this comment.
Couple more metadata items we'll need here I think (e.g. title, pk constraint name, is incremental)
There was a problem hiding this comment.
Added constraint name and incremental - Where title is used? I don't think we had it in v2 .
There was a problem hiding this comment.
Yeah, title is relatively minor and no destinations use it afaik, so maybe no need to include it here. It just depends on whether we're okay with some information being lost in the conversion. Generally speaking it's used by source plugins to generate table docs.
schema/resource.go
Outdated
| // This is sorted result data by column name | ||
| data CQTypes | ||
| data map[string]any | ||
| bldr array.Builder |
There was a problem hiding this comment.
I still think this needs to be array.RecordBuilder?
There was a problem hiding this comment.
I kept it an interface but yeah can do RecordBuilder first and only change to interface if this will be needed.
schema/resource.go
Outdated
| } | ||
| return r.Set(CqIDColumn.Name, b) | ||
| } | ||
| // func (r *Resource) CalculateCQID(deterministicCQID bool) error { |
There was a problem hiding this comment.
Probably need to bring this back or delete it if it was unused
There was a problem hiding this comment.
I brought it back with panic("not implemented") as Resource is here just for some of the table definition to compile but it is not used anywhere as v3 still support only destination.
| @@ -0,0 +1,245 @@ | |||
| package destination | |||
There was a problem hiding this comment.
This file should probably be renamed schemav2v3.go
|
The general idea looks good from a read-through, let's see how it goes in an actual destination. The |
This makes `pb` directory public as Im preparing for [v3](#832) and removing some code duplication that we had with versions around clients. Now old servers and clients will be able to just import an old version instead of copying all the old versions to the new SDK.
17f06b7 to
7a2f096
Compare
This upgrades the SDK to v3 so `schema.Column.Type` now holds `arrow.DataType` and removes all CQTypes. This also removes `source plugins because this will come in a follow-up PR (This will ensure this sdk wont be merged into sources). Instead of #832
This is something we discussed while we were in process of migration but I wasn't able to keep everything in head so wanted to do that in two steps. Now that the 1st step is complete and arrow seems to work at least on destination side we can create a better Go abstraction (basically the same Table abstraction we have just with the
arrow.DataTypeinsteadValueTypeforType) this should simplify any sdk and destination code very much as well make it more testable and more maintinable as the arrow schema abstraction is too low level to work with on the in the "day-2-day".