Skip to content

Add a second importer cronjob to run in delete mode#2133

Merged
andrewpollock merged 18 commits into
google:masterfrom
andrewpollock:cronjob_gcs_delete
May 8, 2024
Merged

Add a second importer cronjob to run in delete mode#2133
andrewpollock merged 18 commits into
google:masterfrom
andrewpollock:cronjob_gcs_delete

Conversation

@andrewpollock
Copy link
Copy Markdown
Contributor

This runs the code introduced in #2030 that takes many hours to complete and isn't practical to run inline with regular imports.

Addresses #1467

This runs the code introduced in google#2030 that takes many hours to complete
and isn't practical to run inline with regular imports.

Addresses google#1467
Comment on lines +15 to +18
args:
# TODO(michaelkedar): ssh secrets
# TODO(michaelkedar): single source of truth w/ terraform config
- "--public_log_bucket=osv-test-public-import-logs"
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure, but this might overwrite the args in the base yaml file

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the reverse?

$ kubectl kustomize environments/oss-vdb-test/
.
.
.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: importer-deleter
spec:
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - args:
            - --delete
            - --deletion_threshold=10
            image: importer
            imagePullPolicy: Always
            name: importer-deleter
            resources:
              limits:
                cpu: 1
                memory: 2G
              requests:
                cpu: 1
                memory: 1G
            securityContext:
              privileged: true
            volumeMounts:
            - mountPath: /work
              name: ssd
          nodeSelector:
            cloud.google.com/gke-nodepool: importer-pool
          restartPolicy: OnFailure
          tolerations:
          - key: workloadType
            operator: Equal
            value: importer-pool
          volumes:
          - hostPath:
              path: /mnt/disks/ssd0
            name: ssd
  schedule: '* */6 * * *'

I'll consolidate all of the args into the staging-specific file

Comment thread deployment/clouddeploy/gke-workers/base/kustomization.yaml Outdated
andrewpollock and others added 3 commits May 2, 2024 03:34
It doesn't look like arg merging between the base and the
environment-specific file works
@andrewpollock andrewpollock requested a review from michaelkedar May 2, 2024 03:42
cpu: 1
memory: "2G"
nodeSelector:
cloud.google.com/gke-nodepool: importer-pool
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this importer-pool has only one node, which iirc was done so that the main importer could have some storage persist.

Could @another-rex confirm if this might be a problem?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it only has one node with only 2 CPU, which means the deleter and the importer can't be scheduled at the same time.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed nodeSelector in 6f1885e

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you PTAL as of 5caf9c1?

@andrewpollock
Copy link
Copy Markdown
Contributor Author

Hmm, I think this is going to require additional tweaking based on an attempt at manually running the job: it looks like it will make the existing importer job unschedulable for the duration due to CPU limitations in the cluster?

@another-rex
Copy link
Copy Markdown
Contributor

Does the importer deleter need the checked out git repos on disk? The main reason the importer is in its own pool is to avoid having to perform a full pull of the git repositories every time it runs.

@andrewpollock
Copy link
Copy Markdown
Contributor Author

Does the importer deleter need the checked out git repos on disk? The main reason the importer is in its own pool is to avoid having to perform a full pull of the git repositories every time it runs

It doesn't, yet. Eventually deletion detection for Git repo sources will get reinstated as well, so it might be significant then?

Trying to think of the best places to put some signposting to avoid that becoming a problem later...

This is currently unnecessary (in deletion mode, Git repositories are
not processed), and prevents concurrent scheduling with regular importer
runs.
It's `--duration_threshold_pct` and express the flags in a more
reviewer-friendly way for future changes.
This way it doesn't collide with the regular importer runs, and cause
them not to schedule, and it needs more RAM.
@andrewpollock andrewpollock requested a review from michaelkedar May 7, 2024 23:54
andrewpollock and others added 2 commits May 8, 2024 00:21
Manual testing yesterday:

```
2024-05-07 16:15:53.026 AEST Begin processing bucket for deletions: cve-osv
2024-05-07 16:17:45.536 AEST Counted 41686 Bugs for cve-osv in Datastore
2024-05-07 16:17:45.591 AEST Listing blobs in gs://osv-test-cve-osv-conversion/osv-output
2024-05-07 16:17:48.846 AEST Parallel-parsing 37280 blobs in cve-osv
2024-05-07 16:59:23.708 AEST Processing 37280 parallel-parsed blobs in cve-osv
2024-05-07 16:59:24.192 AEST Counted 37278 parsed vulnerabilities (from 37280 blobs) for cve-osv
2024-05-07 17:00:00.394 AEST 5002 Bugs in Datastore considered deleted from GCS for cve-osv
2024-05-07 17:00:00.394 AEST Cowardly refusing to delete 5002 missing records from GCS for: cve-osv
```
Copy link
Copy Markdown
Member

@michaelkedar michaelkedar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though some minor nits

Comment on lines +12 to +15
tolerations:
- key: workloadType
operator: Equal
value: default-pool
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't necessary.

The toleration was to allow jobs to be run on node pools with taints (e.g. the importer pool prevents anything without the importer-pool workloadType from being run on it)

The default poll has no such taints.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in 56f0ee2

Comment on lines +39 to +40
nodeSelector:
- cloud.google.com/gke-nodepool: default-pool
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also isn't explicitly needed - all the other node pools have taints so it will be de facto scheduled on the default-pool

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed in 71dcb05

@andrewpollock
Copy link
Copy Markdown
Contributor Author

LGTM, though some minor nits

Thanks, deloused.

@andrewpollock andrewpollock merged commit 5aa9653 into google:master May 8, 2024
@andrewpollock andrewpollock deleted the cronjob_gcs_delete branch May 23, 2024 04:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants