
Do not emit partially converted records #1707

Merged
andrewpollock merged 5 commits into google:master from andrewpollock:abort_partial_conversions on Oct 10, 2023

@andrewpollock

Rather than risk false positives by emitting records where the Introduced version was mapped to a commit but the Fixed version was not, fail to convert these records. Introduce a new failure-mode outcome so these can be easily identified afterwards.
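A minimal Python sketch of the rule described above, assuming a simplified record model: `ConversionOutcome`, `VersionRange`, and `convert_ranges` are hypothetical names, not the converter's actual types, and the version-to-commit mapping is taken to be a plain dictionary. A range whose Introduced version resolves but whose known Fixed version does not causes the whole record to be rejected with a distinct outcome instead of being emitted.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ConversionOutcome(Enum):
    """Hypothetical outcome codes, for illustration only."""
    SUCCESSFUL = "successful"
    FIX_UNRESOLVED = "fix_unresolved"  # new failure mode: partial mapping
    NO_RANGES = "no_ranges"


@dataclass
class VersionRange:
    introduced: str
    fixed: Optional[str]  # None means no fix is known yet


def convert_ranges(
    ranges: list[VersionRange], version_to_commit: dict[str, str]
) -> tuple[ConversionOutcome, list[dict[str, str]]]:
    """Map version ranges to commit events; events are empty unless SUCCESSFUL."""
    events: list[dict[str, str]] = []
    for rng in ranges:
        introduced = version_to_commit.get(rng.introduced)
        if introduced is None:
            continue  # assumption: an unresolvable range is simply skipped
        fixed = None
        if rng.fixed is not None:
            fixed = version_to_commit.get(rng.fixed)
            if fixed is None:
                # Introduced resolved but Fixed did not: emitting only the
                # introduced commit would mark later, fixed commits as affected.
                return ConversionOutcome.FIX_UNRESOLVED, []
        events.append({"introduced": introduced})
        if fixed is not None:
            events.append({"fixed": fixed})
    if not events:
        return ConversionOutcome.NO_RANGES, []
    return ConversionOutcome.SUCCESSFUL, events
```

Returning the outcome alongside an empty event list lets a caller tally failures per outcome afterwards. A range with no Fixed version at all is still emitted in this sketch, since that only means no fix is known.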

andrewpollock merged commit 5d4a9cb into google:master on Oct 10, 2023
andrewpollock added a commit to andrewpollock/osv.dev that referenced this pull request Mar 5, 2024
Detect Bugs that are no longer present in GCS and plumb them through to
the worker for deletion from Datastore.

Related changes to NVD CVE generation (google#2023, google#2029) mean
that CVEs which are no longer generated (due to heuristic changes such
as the one made in google#1707) remain in GCS.

This PR addresses such cases, and the need identified in google#1467, by
adding a deletion phase to the import of new/updated records. The
functionality is flag-protected: it won't go live in Production until
the new `--delete` flag is included in the importer's invocation.
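The deletion phase can be sketched as a set difference between the Bug IDs held in Datastore for a source and the record IDs currently found in GCS, gated on the `--delete` flag. This is a hypothetical illustration: `find_withdrawn_ids`, `deletion_phase`, and the `enqueue` callback stand in for however the importer actually hands deletions to the worker.

```python
from __future__ import annotations

import argparse
from typing import Callable, Iterable, Optional


def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Importer (sketch)")
    # Deletions stay disabled unless this flag is passed explicitly.
    parser.add_argument("--delete", action="store_true",
                        help="delete Bugs whose records are no longer in GCS")
    return parser.parse_args(argv)


def find_withdrawn_ids(datastore_ids: Iterable[str],
                       gcs_ids: Iterable[str]) -> set[str]:
    """Bug IDs still present in Datastore whose source records are gone from GCS."""
    return set(datastore_ids) - set(gcs_ids)


def deletion_phase(datastore_ids: Iterable[str], gcs_ids: Iterable[str],
                   delete_enabled: bool,
                   enqueue: Callable[[str], None]) -> list[str]:
    """Return the IDs that qualify for deletion; enqueue them only if enabled."""
    candidates = sorted(find_withdrawn_ids(datastore_ids, gcs_ids))
    if delete_enabled:
        for bug_id in candidates:
            enqueue(bug_id)  # hypothetical hand-off to the worker
    return candidates
```

Computing the candidates even when the flag is off makes it possible to report what would be deleted before deletions are enabled in Production.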

Incidental changes:

- use the GCS bucket `directory_path` to efficiently filter the blobs
  returned when listing bucket contents
- make blob retrieval resilient to blob generation changes between blob
  listing and blob retrieval (this can happen if `combine-to-osv` has
  run between these two points in time)
- fix a behavior inconsistency where schema validation was not
  performed when `ignore_last_import_time` is in effect (addressing a
  head-scratching TODO from @michaelkedar)
- tidy up the existing tests, making them more readable and debuggable
- add a slow manual test against live data in staging to validate
  real-world behavior and run time (this adds ~13 minutes to an import
  run on just the CVE GCS bucket)
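The listing-prefix and generation-change items can be illustrated with an in-memory stand-in for a bucket. Everything here is hypothetical: `FakeBucket`, `StaleGenerationError`, and `fetch_listed_blob` are illustrative names, and the real importer talks to GCS through its client library rather than through this fake. The idea shown is that a blob listed at one generation may be rewritten before it is fetched, so a stale-generation failure falls back to reading the current version.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


class StaleGenerationError(Exception):
    """Hypothetical: the requested blob generation is no longer current."""


@dataclass
class FakeBucket:
    """In-memory stand-in for a bucket: name -> (generation, data)."""
    blobs: dict[str, tuple[int, bytes]] = field(default_factory=dict)

    def list_blobs(self, prefix: str = "") -> list[tuple[str, int]]:
        # Filter by prefix (e.g. a source's directory_path) while listing.
        return [(name, gen) for name, (gen, _) in sorted(self.blobs.items())
                if name.startswith(prefix)]

    def download(self, name: str, generation: Optional[int] = None) -> bytes:
        gen, data = self.blobs[name]
        if generation is not None and generation != gen:
            raise StaleGenerationError(name)
        return data

    def overwrite(self, name: str, data: bytes) -> None:
        gen, _ = self.blobs.get(name, (0, b""))
        self.blobs[name] = (gen + 1, data)  # each write bumps the generation


def fetch_listed_blob(bucket: FakeBucket, name: str, listed_generation: int) -> bytes:
    """Fetch a blob seen during listing, tolerating a rewrite since then."""
    try:
        return bucket.download(name, generation=listed_generation)
    except StaleGenerationError:
        # Rewritten after listing (e.g. by combine-to-osv): read the latest.
        return bucket.download(name)
```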
andrewpollock added a commit to andrewpollock/osv.dev that referenced this pull request Mar 7, 2024
andrewpollock added a commit that referenced this pull request May 2, 2024

Signed-off-by: Andrew Pollock <apollock@google.com>
andrewpollock deleted the abort_partial_conversions branch on May 23, 2024 05:10