Skip to content

feat: Create Table abstraction on top of arrow#832

Closed
yevgenypats wants to merge 15 commits intomainfrom
feat/table_abstraction
Closed

feat: Create Table abstraction on top of arrow#832
yevgenypats wants to merge 15 commits intomainfrom
feat/table_abstraction

Conversation

@yevgenypats
Copy link
Contributor

@yevgenypats yevgenypats commented Apr 29, 2023

This is something we discussed while we were in process of migration but I wasn't able to keep everything in head so wanted to do that in two steps. Now that the 1st step is complete and arrow seems to work at least on destination side we can create a better Go abstraction (basically the same Table abstraction we have just with the arrow.DataType instead ValueType for Type) this should simplify any sdk and destination code very much as well make it more testable and more maintinable as the arrow schema abstraction is too low level to work with on the in the "day-2-day".

@github-actions github-actions bot added the feat label Apr 29, 2023
@codecov
Copy link

codecov bot commented Apr 29, 2023

Codecov Report

Patch coverage: 16.66% and project coverage change: +1.85 🎉

Comparison is base (6cbcd8e) 47.18% compared to head (6a53a7c) 49.03%.

❗ Current head 6a53a7c differs from pull request most recent head ebd4661. Consider uploading reports for the commit ebd4661 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #832      +/-   ##
==========================================
+ Coverage   47.18%   49.03%   +1.85%     
==========================================
  Files          76       40      -36     
  Lines        7841     2967    -4874     
==========================================
- Hits         3700     1455    -2245     
+ Misses       3642     1349    -2293     
+ Partials      499      163     -336     
Impacted Files Coverage Δ
clients/destination/v0/destination.go 36.12% <ø> (ø)
internal/backends/local/local.go 59.45% <ø> (ø)
schema/arrow.go 0.00% <0.00%> (-13.40%) ⬇️
schema/column.go 7.59% <0.00%> (-40.56%) ⬇️
schema/meta.go 7.14% <0.00%> (ø)
schema/resource.go 0.00% <0.00%> (-39.29%) ⬇️
schema/table.go 33.78% <0.00%> (-9.19%) ⬇️
schema/testdata.go 0.00% <0.00%> (ø)
serve/destination.go 54.61% <ø> (ø)
internal/memdb/memdb.go 83.22% <84.21%> (-0.44%) ⬇️

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@yevgenypats yevgenypats marked this pull request as ready for review April 29, 2023 14:55
@github-actions github-actions bot added feat and removed feat labels Apr 29, 2023
@github-actions
Copy link

github-actions bot commented Apr 29, 2023

⏱️ Benchmark results

Comparing with 47da73d

  • Glob-2 ns/op: 179.2 ⬆️ 0.11% increase vs. 47da73d
  • BufferedScanner-2 ns/op: 9.326 ⬆️ 0.44% increase vs. 47da73d
  • LogReader-2 ns/op: 30.46 ⬇️ 0.03% decrease vs. 47da73d

@hermanschaaf
Copy link
Member

I haven't dug into this much yet, but what is the reason for needing to copy the code from schema to schemav2? Couldn't we use the Table struct from the schema package, but introduce a function to go from Arrow schema to Table? What changed between the old version and the new version?

@yevgenypats
Copy link
Contributor Author

I haven't dug into this much yet, but what is the reason for needing to copy the code from schema to schemav2? Couldn't we use the Table struct from the schema package, but introduce a function to go from Arrow schema to Table? What changed between the old version and the new version?

@hermanschaaf Basically the reason is the Type and we need to be able to use the old one because when we introduce it on the source side we need to still be able to go to v1 Table and send it over the wire

@yevgenypats
Copy link
Contributor Author

yevgenypats commented May 1, 2023

maybe we can do schema and deprecate the old one but that would be a bit more painful. we will need to bump to v3 and then import v2

Copy link
Member

@hermanschaaf hermanschaaf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I think this will make our lives a bit easier. Left a couple of comments; I also think, before merging, we should port this to at least one destination and source just to check that it all works and see what it looks like. Maybe we will want to make some tweaks based on that

Comment on lines 25 to 31
// This struct is what we send over the wire to destination.
// We dont want to reuse the same struct as otherwise we will have to comment on fields which don't get sent over the wire but still accessible
// code wise
// type DestinationResource struct {
// TableName string `json:"table_name"`
// Data CQTypes `json:"data"`
// }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's update / remove this comment so that it's correct for v2

Table *Table
// This is sorted result data by column name
data map[string]any
bldr array.Builder
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this an array.Builder, rather than an array.RecordBuilder? 🤔 Also it seems like the builder is not publicly accessible, but I figure we need it (or rather the Record built from it) for sending over the wire?

return r.Item
}

//nolint:revive
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we'll still need the commented out code here; I think we do need it in some form, but let's remove if not.

return nil
}

func (t *Table) ToArrowSchema() *arrow.Schema {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We used this in quite a few places (e.g. the filetypes library), maybe we should keep it? We can mark it as deprecated instead.

I guess the idea here is to remove all the Arrow references from v1. If that's the way we want to go it's fine, we can update the couple of destinations that used it.

Comment on lines 196 to 197
tavlesV2 := TablesV1ToV2(tables)
if err := s.Plugin.DeleteStale(ctx, tavlesV2, req.Source, req.Timestamp.AsTime()); err != nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
tavlesV2 := TablesV1ToV2(tables)
if err := s.Plugin.DeleteStale(ctx, tavlesV2, req.Source, req.Timestamp.AsTime()); err != nil {
tablesV2 := TablesV1ToV2(tables)
if err := s.Plugin.DeleteStale(ctx, tablesV2, req.Source, req.Timestamp.AsTime()); err != nil {


// Resource represents a row in it's associated table, it carries a reference to the original item, and automatically
// generates an Id based on Table's Columns. Resource data can be accessed by the Get and Set methods
type Resource struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's update the comment on Resource here to talk about what it's for and how it's supposed to work with Arrow. For example, once all the fields have been set and finalized, I guess the Build method should be called to create an arrow.Record with a single row, which should then be accessed through some other method.

@yevgenypats yevgenypats requested a review from hermanschaaf May 1, 2023 16:37
@yevgenypats
Copy link
Contributor Author

yevgenypats commented May 1, 2023

@hermanschaaf per our discussion - this PR now might look a bit daunting but actually I simplified things and removed source just in the meanwhile - this way we wont accidentally merge it to source as v3 doesn't yet support it. dest though should be much easier and we don't have schema and schemav2 so we make the breaking change in a more goish way.

Per your suggestion I'll also open a PR for postgres and maybe another destination to test that it work and also to show how it looks like before merging

table := &Table{
Name: name,
Description: description,
Columns: columns,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple more metadata items we'll need here I think (e.g. title, pk constraint name, is incremental)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added constraint name and incremental - Where title is used? I don't think we had it in v2 .

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, title is relatively minor and no destinations use it afaik, so maybe no need to include it here. It just depends on whether we're okay with some information being lost in the conversion. Generally speaking it's used by source plugins to generate table docs.

// This is sorted result data by column name
data CQTypes
data map[string]any
bldr array.Builder
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this needs to be array.RecordBuilder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kept it an interface but yeah can do RecordBuilder first and only change to interface if this will be needed.

}
return r.Set(CqIDColumn.Name, b)
}
// func (r *Resource) CalculateCQID(deterministicCQID bool) error {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably need to bring this back or delete it if it was unused

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I brought it back with panic("not implemented") as Resource is here just for some of the table definition to compile but it is not used anywhere as v3 still support only destination.

@@ -0,0 +1,245 @@
package destination
Copy link
Member

@hermanschaaf hermanschaaf May 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file should probably be renamed schemav2v3.go

@hermanschaaf
Copy link
Member

The general idea looks good from a read-through, let's see how it goes in an actual destination. The Resource type also needs to be fleshed out a bit more before it can be used in a source, it looks like.

kodiakhq bot pushed a commit that referenced this pull request May 2, 2023
This makes `pb` directory public as Im preparing for [v3](#832) and removing some code duplication that we had with versions around clients. Now old servers and clients will be able to just import an old version instead of copying all the old versions to the new SDK.
@yevgenypats yevgenypats force-pushed the feat/table_abstraction branch from 17f06b7 to 7a2f096 Compare May 2, 2023 11:21
@yevgenypats yevgenypats closed this May 9, 2023
hermanschaaf pushed a commit that referenced this pull request May 9, 2023
This upgrades the SDK to v3 so `schema.Column.Type` now holds
`arrow.DataType` and removes all CQTypes.

This also removes `source plugins because this will come in a follow-up
PR (This will ensure this sdk wont be merged into sources).

Instead of #832
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants