Do not emit partially converted records #1707
Merged
andrewpollock merged 5 commits on Oct 10, 2023
Rather than risk false positives by emitting records where the Introduced version was mapped to a commit but the Fixed version was not, fail to convert such records. This introduces a new failure-mode outcome so these records can be easily identified afterwards.
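The rejection rule can be illustrated with a minimal Python sketch. It is not the converter's actual code: `ConversionOutcome`, `FIX_UNRESOLVABLE`, `AffectedRange`, and `classify` are hypothetical names, and the sketch assumes that a range with no Fixed version at all is legitimately open-ended and should not be rejected.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


class ConversionOutcome(Enum):
    """Hypothetical outcomes of converting one CVE record."""
    SUCCESSFUL = auto()
    NO_COMMIT_RANGES = auto()
    # New failure mode: Introduced resolved to a commit, Fixed did not.
    FIX_UNRESOLVABLE = auto()


@dataclass
class AffectedRange:
    introduced_commit: Optional[str]  # None if the version could not be mapped
    fixed_commit: Optional[str]       # None if the version could not be mapped
    fixed_version_present: bool       # whether the source record named a Fixed version


def classify(ranges: List[AffectedRange]) -> ConversionOutcome:
    """Decide whether a record's commit ranges are safe to emit."""
    if not ranges:
        return ConversionOutcome.NO_COMMIT_RANGES
    for r in ranges:
        # A stated Fixed version that failed to map would leave an open-ended
        # range, flagging every later commit as affected (false positives).
        # One such range fails the whole record.
        if r.introduced_commit and r.fixed_version_present and not r.fixed_commit:
            return ConversionOutcome.FIX_UNRESOLVABLE
    return ConversionOutcome.SUCCESSFUL
```

Because a single partial range fails the whole record, tallying outcomes over a run (for example with `collections.Counter`) is enough to find these records afterwards.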
oliverchang approved these changes on Oct 9, 2023
andrewpollock added a commit to andrewpollock/osv.dev that referenced this pull request on Mar 5, 2024
Detect Bugs in Datastore that are no longer present in GCS, and plumb them through to the worker for deletion.

Related changes to the NVD CVE generation (google#2023, google#2029) mean that CVEs which are no longer generated (due to heuristic changes like the one made in google#1707) remain in GCS. This PR addresses cases like this, and the need identified in google#1467, by adding a deletion phase to the import of new and updated records.

The functionality is flag-protected: it won't go live in Production until the new `--delete` flag is passed when running the importer.

Incidental changes:
- use the GCS bucket `directory_path` to efficiently filter the blobs returned when listing bucket contents
- make blob retrieval resilient to a blob's generation changing between listing and retrieval (this can happen if `combine-to-osv` runs between these two points in time)
- fix an inconsistency where schema validation was not performed when `ignore_last_import_time` is in effect (addressing a head-scratching TODO from @michaelkedar)
- tidy up the existing tests, making them more readable and debuggable
- add a slow manual test against live data in staging to validate real-world behavior and run time (this adds ~13 minutes to an import run on just the CVE GCS bucket)
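At its core, detecting deletions amounts to a set difference between the Bug IDs held in Datastore and the records GCS still lists. The sketch below is hypothetical (`ids_from_blob_names`, `ids_to_delete`, and the one-`<ID>.json`-object-per-record layout are assumptions) and works on plain object names so it runs without credentials. With the google-cloud-storage client, the names would come from `list_blobs(..., prefix=...)`, which is where a `directory_path` prefix filters the listing server-side.

```python
import posixpath


def ids_from_blob_names(blob_names, directory_path):
    """Map GCS object names under directory_path to record IDs.

    Assumes a hypothetical layout of one '<ID>.json' object per record.
    """
    prefix = directory_path.rstrip("/") + "/" if directory_path else ""
    ids = set()
    for name in blob_names:
        # A prefixed listing already filters server-side; re-checking keeps
        # this function correct when handed an unfiltered list of names.
        if not name.startswith(prefix) or not name.endswith(".json"):
            continue
        ids.add(posixpath.basename(name)[: -len(".json")])
    return ids


def ids_to_delete(datastore_ids, blob_names, directory_path, delete_enabled):
    """Return Datastore Bug IDs whose source record is gone from GCS.

    Returns nothing unless delete_enabled is set, in the spirit of the
    flag-protected `--delete` rollout.
    """
    if not delete_enabled:
        return set()
    return set(datastore_ids) - ids_from_blob_names(blob_names, directory_path)
```

Gating on the flag inside the detection step means a run without `--delete` computes nothing destructive, which matches the cautious rollout described in the commit message.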
andrewpollock added a commit to andrewpollock/osv.dev that referenced this pull request on Mar 7, 2024
andrewpollock added a commit that referenced this pull request on May 2, 2024
Detect Bugs in Datastore that are no longer present in GCS, and plumb them through to the worker for deletion.

Related changes to the NVD CVE generation (#2023, #2029) mean that CVEs which are no longer generated (due to heuristic changes like the one made in #1707) remain in GCS. This PR addresses cases like this, and the need identified in #1467, by adding a deletion phase to the import of new and updated records.

The functionality is flag-protected: it won't go live in Production until the new `--delete` flag is passed when running the importer.

Incidental changes:
- use the GCS bucket `directory_path` to efficiently filter the blobs returned when listing bucket contents
- make blob retrieval resilient to a blob's generation changing between listing and retrieval (this can happen if `combine-to-osv` runs between these two points in time)
- fix an inconsistency where schema validation was not performed when `ignore_last_import_time` is in effect (addressing a head-scratching TODO from @michaelkedar)
- tidy up the existing tests, making them more readable and debuggable
- add a slow manual test against live data in staging to validate real-world behavior and run time (this adds ~13 minutes to an import run on just the CVE GCS bucket)

---------

Signed-off-by: Andrew Pollock <apollock@google.com>
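Tolerating a generation change between listing and retrieval might look like the following: try the generation seen during listing, and if the object has since been rewritten, read the current one instead. This is an assumed approach, not the PR's code. `FakeBucket`, `GenerationMismatch`, and `fetch_listed_blob` are hypothetical, and the in-memory bucket stands in for GCS, whose client supports generation preconditions such as `if_generation_match`.

```python
class GenerationMismatch(Exception):
    """Raised when an object changed after it was listed (hypothetical)."""


class FakeBucket:
    """In-memory stand-in for a GCS bucket: name -> (generation, data)."""

    def __init__(self):
        self.objects = {}

    def put(self, name, data):
        # Each overwrite bumps the generation, as GCS does.
        gen = self.objects.get(name, (0, b""))[0] + 1
        self.objects[name] = (gen, data)
        return gen

    def get(self, name, generation=None):
        current_gen, data = self.objects[name]
        if generation is not None and generation != current_gen:
            raise GenerationMismatch(name)
        return current_gen, data


def fetch_listed_blob(bucket, name, listed_generation):
    """Read a blob seen during listing, tolerating a newer generation.

    If another job (e.g. combine-to-osv) rewrote the object between listing
    and retrieval, fall back to the current generation instead of failing.
    """
    try:
        return bucket.get(name, generation=listed_generation)
    except GenerationMismatch:
        return bucket.get(name)
```

An object deleted between listing and retrieval would still raise here (a `KeyError` in this stand-in); a real importer would need to decide whether to skip such records.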