Splitting a single logical resource across multiple files (was: Group of resources + remote schema) #572

@paulgirard

Dear all,

Resources can be grouped, which makes it possible to split data across multiple files that share the same structure. I need this to split my dataset by the archival source used. First, it is easier to version. Second, it is easier to make corrections (no need to open a HUGE CSV). Third, validation errors point to more precise files. Finally, it makes much more sense for a dataset that was created by transcribing many different archival books.

About resource groups: I think it is more or less accepted that, for a set of resources gathered in the same group, the schema can be set only on the first resource of the group.
Even if this is not set in the specs (there are no specs for groups as far as I know), I've seen code that loads packages that way.

My current issue is that, for a package with a group of resources, if a schema is indicated for every resource of the group (they are numerous in my case: more than 1000), the lib will load the same schema as many times as there are resources.
This is not ideal, above all if the basepath is actually remote.
Thus I see two ways out:

  • I change my package by removing the schema from all my grouped resources but the first one
  • I update datapackage-js to load only the first schema if resources are in a group
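
To make the first way out concrete, here is a minimal sketch of the descriptor rewrite. It assumes each grouped resource carries a `group` property naming its group (a convention, not something defined in the specs), and `ResourceDescriptor` and `keepFirstSchemaPerGroup` are hypothetical names used only for illustration, not part of datapackage-js.

```typescript
// Hypothetical minimal shape of a resource descriptor.
// `group` is the assumed (non-spec) key that ties resources together.
interface ResourceDescriptor {
  name: string;
  path: string;
  group?: string;
  schema?: string | object;
}

// Return a copy of the resource list in which only the first resource
// of each group keeps its `schema`. Ungrouped resources are untouched.
function keepFirstSchemaPerGroup(
  resources: ResourceDescriptor[]
): ResourceDescriptor[] {
  const seenGroups = new Set<string>();
  return resources.map((resource) => {
    if (resource.group === undefined) {
      return resource;
    }
    if (!seenGroups.has(resource.group)) {
      // First resource seen for this group: keep its schema.
      seenGroups.add(resource.group);
      return resource;
    }
    // Later resource of the same group: drop the schema from a copy.
    const copy: ResourceDescriptor = { ...resource };
    delete copy.schema;
    return copy;
  });
}

// Illustrative data only; file and schema names are made up.
const exampleResources: ResourceDescriptor[] = [
  { name: "flows_a", path: "data/flows_a.csv", group: "flows", schema: "flows_schema.json" },
  { name: "flows_b", path: "data/flows_b.csv", group: "flows", schema: "flows_schema.json" },
  { name: "sources", path: "data/sources.csv", schema: "sources_schema.json" },
];
const trimmedResources = keepFirstSchemaPerGroup(exampleResources);
```

This only works if whatever loads the package then falls back to the first schema of the group for the remaining resources, which is the accepted-but-unspecified behaviour described above.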

The affected datapackage : https://github.com/medialab/ricardo_data/blob/master/datapackage.json

I am about to try the first path by removing the schema from all but the first grouped resource.
Changing datapackage-js to add a special behaviour for groups should not be that hard.
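
For the second path, the special behaviour could look roughly like the sketch below: load the schema once per group and reuse it for every other resource of that group. This is not the datapackage-js implementation. `GroupedResource`, `SchemaLoader` and `resolveSchemas` are hypothetical names, the `group` property is the same assumption as above, and the loader is injected so that the sketch does not depend on any real library call.

```typescript
// Hypothetical minimal resource shape; `schema` is a path or URL here.
interface GroupedResource {
  name: string;
  group?: string;
  schema?: string;
}

// Hypothetical loader: fetches and parses one schema reference.
type SchemaLoader = (schemaRef: string) => Promise<object>;

// Resolve one schema per resource, calling `loadSchema` at most once
// per group. The first resource of a group that declares a schema
// decides which schema the whole group uses.
async function resolveSchemas(
  resources: GroupedResource[],
  loadSchema: SchemaLoader
): Promise<Map<string, object | undefined>> {
  const pendingByGroup = new Map<string, Promise<object>>();
  const schemaByResource = new Map<string, object | undefined>();

  for (const resource of resources) {
    if (resource.group === undefined) {
      // Ungrouped resource: load its own schema, if it has one.
      const own =
        resource.schema === undefined
          ? undefined
          : await loadSchema(resource.schema);
      schemaByResource.set(resource.name, own);
      continue;
    }
    let pending = pendingByGroup.get(resource.group);
    if (pending === undefined && resource.schema !== undefined) {
      // First schema seen for this group: start the only load.
      pending = loadSchema(resource.schema);
      pendingByGroup.set(resource.group, pending);
    }
    schemaByResource.set(
      resource.name,
      pending === undefined ? undefined : await pending
    );
  }
  return schemaByResource;
}
```

With more than 1000 resources in a single group, this would issue one schema request instead of one per resource, which matters most when the basepath is remote.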