Add a second importer cronjob to run in delete mode#2133
Conversation
This runs the code introduced in google#2030 that takes many hours to complete and isn't practical to run inline with regular imports. Addresses google#1467
| args: | ||
| # TODO(michaelkedar): ssh secrets | ||
| # TODO(michaelkedar): single source of truth w/ terraform config | ||
| - "--public_log_bucket=osv-test-public-import-logs" |
There was a problem hiding this comment.
I'm not sure, but this might overwrite the args in the base yaml file
There was a problem hiding this comment.
It looks like the reverse?
$ kubectl kustomize environments/oss-vdb-test/
.
.
.
apiVersion: batch/v1
kind: CronJob
metadata:
name: importer-deleter
spec:
concurrencyPolicy: Forbid
jobTemplate:
spec:
template:
spec:
containers:
- args:
- --delete
- --deletion_threshold=10
image: importer
imagePullPolicy: Always
name: importer-deleter
resources:
limits:
cpu: 1
memory: 2G
requests:
cpu: 1
memory: 1G
securityContext:
privileged: true
volumeMounts:
- mountPath: /work
name: ssd
nodeSelector:
cloud.google.com/gke-nodepool: importer-pool
restartPolicy: OnFailure
tolerations:
- key: workloadType
operator: Equal
value: importer-pool
volumes:
- hostPath:
path: /mnt/disks/ssd0
name: ssd
schedule: '* */6 * * *'
I'll consolidate all of the args into the staging-specific file
It doesn't look like arg merging between the base and the environment-specific file works
… into cronjob_gcs_delete
For now, this seems to be the way to keep everything contained to staging?
| cpu: 1 | ||
| memory: "2G" | ||
| nodeSelector: | ||
| cloud.google.com/gke-nodepool: importer-pool |
There was a problem hiding this comment.
I think this importer-pool has only one node, which iirc was done so that the main importer could have some storage persist.
Could @another-rex confirm if this might be a problem?
There was a problem hiding this comment.
Yes, it only has one node with only 2 CPU, which means the deleter and the importer can't be scheduled at the same time.
|
Hmm, I think this is going to require additional tweaking based on an attempt at manually running the job: it looks like it will make the existing importer job unschedulable for the duration due to CPU limitations in the cluster? |
|
Does the importer deleter need the checked out git repos on disk? The main reason the importer is in its own pool is to avoid having to perform a full pull of the git repositories every time it runs. |
It doesn't, yet. Eventually deletion detection for Git repo sources will get reinstated as well, so it might be significant then? Trying to think of the best places to put some signposting to avoid that becoming a problem later... |
This is currently unnecessary (in deletion mode, Git repositories are not processed), and prevents concurrent scheduling with regular importer runs.
It's `--duration_threshold_pct` and express the flags in a more reviewer-friendly way for future changes.
This way it doesn't collide with the regular importer runs, and cause them not to schedule, and it needs more RAM.
Manual testing yesterday: ``` 2024-05-07 16:15:53.026 AEST Begin processing bucket for deletions: cve-osv 2024-05-07 16:17:45.536 AEST Counted 41686 Bugs for cve-osv in Datastore 2024-05-07 16:17:45.591 AEST Listing blobs in gs://osv-test-cve-osv-conversion/osv-output 2024-05-07 16:17:48.846 AEST Parallel-parsing 37280 blobs in cve-osv 2024-05-07 16:59:23.708 AEST Processing 37280 parallel-parsed blobs in cve-osv 2024-05-07 16:59:24.192 AEST Counted 37278 parsed vulnerabilities (from 37280 blobs) for cve-osv 2024-05-07 17:00:00.394 AEST 5002 Bugs in Datastore considered deleted from GCS for cve-osv 2024-05-07 17:00:00.394 AEST Cowardly refusing to delete 5002 missing records from GCS for: cve-osv ```
michaelkedar
left a comment
There was a problem hiding this comment.
LGTM, though some minor nits
| tolerations: | ||
| - key: workloadType | ||
| operator: Equal | ||
| value: default-pool |
There was a problem hiding this comment.
This isn't necessary.
The toleration was to allow jobs to be run on node pools with taints (e.g. the importer pool prevents anything without the importer-pool workloadType from being run on it)
The default poll has no such taints.
| nodeSelector: | ||
| - cloud.google.com/gke-nodepool: default-pool |
There was a problem hiding this comment.
This also isn't explicitly needed - all the other node pools have taints so it will be de facto scheduled on the default-pool
… into cronjob_gcs_delete
Thanks, deloused. |
This runs the code introduced in #2030 that takes many hours to complete and isn't practical to run inline with regular imports.
Addresses #1467