
Make the copying more resilient #2029

Merged
andrewpollock merged 1 commit into google:master from andrewpollock:copy_safer on Mar 5, 2024

@andrewpollock (Contributor)

Some filenames end up containing an embedded quote character, or are very long, because they are derived from the CPE vendor/product name. This appeared to cause some copy operations to fail.
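The PR text does not show the change itself, so the following is only a hedged illustration of the two failure modes it names. A filename interpolated into a shell command string breaks as soon as it contains a quote, and many filesystems cap a single path component at 255 bytes. This minimal sketch assumes a local file copy (not the GCS tooling the real pipeline uses); `safe_copy`, `shell_command`, and `MAX_NAME_BYTES` are hypothetical names. It copies without invoking a shell, quotes arguments explicitly when a shell string is unavoidable, and rejects over-long names up front.

```python
import os
import shlex
import shutil
import tempfile

# Assumption: 255 bytes is a common per-component limit (e.g. ext4's NAME_MAX);
# the real limit depends on the filesystem in use.
MAX_NAME_BYTES = 255


def safe_copy(src: str, dst_dir: str) -> str:
    """Copy src into dst_dir without going through a shell (hypothetical helper)."""
    name = os.path.basename(src)
    size = len(name.encode("utf-8"))
    if size > MAX_NAME_BYTES:
        # Fail early with a clear message instead of an opaque OS error.
        raise ValueError(f"filename too long ({size} bytes): {name[:40]}...")
    dst = os.path.join(dst_dir, name)
    # shutil.copy2 receives the paths as Python strings, so a quote in the
    # name never has to survive shell parsing.
    shutil.copy2(src, dst)
    return dst


def shell_command(src: str, dst: str) -> str:
    """If a shell command string is unavoidable, quote every argument."""
    return f"cp {shlex.quote(src)} {shlex.quote(dst)}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as dst_dir:
        src = os.path.join(src_dir, "CVE-2024-0001_vendor's_product.json")
        with open(src, "w") as f:
            f.write("{}")
        print(safe_copy(src, dst_dir))
        print(shell_command(src, dst_dir))
```

Passing argument lists (to `shutil`, or to `subprocess.run([...])` without `shell=True`) is generally the more robust choice; `shlex.quote` is the fallback when a command string must be built.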

andrewpollock enabled auto-merge (squash) March 4, 2024 06:24
andrewpollock merged commit c938a66 into google:master Mar 5, 2024
andrewpollock added a commit to andrewpollock/osv.dev that referenced this pull request Mar 5, 2024
Detect Bugs in Datastore that are no longer present in GCS and plumb them
through to the worker for deletion.

Related changes to the NVD CVE generation (google#2023, google#2029) mean
that CVEs which are no longer generated (due to heuristic changes such as
the one made in google#1707) remain in GCS.

This PR addresses cases like this, and the need identified in google#1467,
by adding a deletion phase to the import of new/updated records. The
functionality is flag-protected: it won't go live in Production until the
new `--delete` flag is passed when the importer is executed.
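The deletion phase described above amounts to a set difference between what Datastore holds and what GCS still lists, gated behind the flag. The sketch below is a hypothetical illustration, not the importer's code: it assumes each record is stored as `<prefix>/<ID>.json`, and `find_stale_ids`, `run_deletion_phase`, and the `request_deletion` callback (standing in for whatever hands work to the worker) are invented names.

```python
import posixpath
from typing import Callable, Iterable, List, Set


def find_stale_ids(datastore_ids: Iterable[str], blob_names: Iterable[str],
                   suffix: str = ".json") -> Set[str]:
    """IDs present in Datastore whose record file is no longer listed in GCS.

    Assumes (hypothetically) that each record is stored as '<prefix>/<ID>.json'.
    """
    present = set()
    for name in blob_names:
        base = posixpath.basename(name)
        if base.endswith(suffix):
            present.add(base[: -len(suffix)])
    return set(datastore_ids) - present


def run_deletion_phase(datastore_ids: Iterable[str], blob_names: Iterable[str],
                       delete: bool,
                       request_deletion: Callable[[str], None]) -> List[str]:
    """Always compute stale IDs; only hand them to the worker when `delete` is set."""
    stale = sorted(find_stale_ids(datastore_ids, blob_names))
    if delete:
        for bug_id in stale:
            request_deletion(bug_id)  # hypothetical hook, e.g. enqueue a worker task
    return stale
```

Without the flag, the function still reports what would be deleted, which matches the idea of shipping the code path dark. A production version would likely also want a guard against an empty or failed bucket listing, since that would otherwise mark every record as stale.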

Incidental changes:

- use the GCS bucket `directory_path` to efficiently filter the blobs
  returned when listing bucket contents
- make blob retrieval resilient to the blob generation changing between
  listing and retrieval (this can happen if `combine-to-osv` runs between
  those two points in time)
- fix an inconsistency where schema validation was not performed when
  `ignore_last_import_time` is in effect (addressing a head-scratching
  TODO from @michaelkedar)
- tidy up the existing tests, making them more readable and debuggable
- add a slow manual test against live data in staging to validate
  real-world behavior and run time (this adds ~13 minutes to an import
  run on just the CVE GCS bucket)
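The generation-change bullet can be illustrated with a small retry pattern. This is a hedged sketch, not the PR's implementation: it assumes the storage client pins the generation seen at listing time and reports a not-found error once that generation has been overwritten. The client is abstracted behind a `get(name, generation)` callable so the example runs without GCS; `BlobGone`, `fetch_blob`, and `FakeBucket` are all hypothetical.

```python
from typing import Callable, Dict, Optional, Tuple


class BlobGone(Exception):
    """Hypothetical stand-in for the 'not found' error raised when a specific
    blob generation no longer exists."""


def fetch_blob(name: str, listed_generation: int,
               get: Callable[[str, Optional[int]], Tuple[bytes, int]]) -> Tuple[bytes, int]:
    """Fetch the generation seen at listing time, falling back to the latest
    generation if the object was rewritten in the meantime."""
    try:
        return get(name, listed_generation)
    except BlobGone:
        # Rewritten between listing and retrieval (e.g. by a concurrent
        # writer): re-read the current generation instead of failing.
        # If the object was deleted outright, this raises BlobGone again.
        return get(name, None)


class FakeBucket:
    """In-memory stand-in that keeps only the latest generation per object."""

    def __init__(self) -> None:
        self.objects: Dict[str, Tuple[bytes, int]] = {}

    def put(self, name: str, data: bytes) -> int:
        _, gen = self.objects.get(name, (b"", 0))
        self.objects[name] = (data, gen + 1)
        return gen + 1

    def get(self, name: str, generation: Optional[int] = None) -> Tuple[bytes, int]:
        if name not in self.objects:
            raise BlobGone(name)
        data, gen = self.objects[name]
        if generation is not None and generation != gen:
            raise BlobGone(f"{name}#{generation}")
        return data, gen
```

Falling back to the latest generation is reasonable here because the importer wants the newest record anyway; a caller that needs a consistent snapshot would instead re-list and restart.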
andrewpollock added a commit to andrewpollock/osv.dev that referenced this pull request Mar 7, 2024
andrewpollock added a commit that referenced this pull request May 2, 2024
Signed-off-by: Andrew Pollock <apollock@google.com>
andrewpollock deleted the copy_safer branch May 23, 2024 05:06

2 participants