Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

[Backport 5.1] Allow embeddings job to exclude failed files from the index#55530

Merged
coury-clark merged 2 commits into5.1from
backport-55180-to-5.1
Aug 2, 2023
Merged

[Backport 5.1] Allow embeddings job to exclude failed files from the index#55530
coury-clark merged 2 commits into5.1from
backport-55180-to-5.1

Conversation

@gl-srgr
Copy link
Contributor

@gl-srgr gl-srgr commented Aug 2, 2023

Related PRs on main:

  • 55180: Implements new feature
  • 55528: Updates migration metadata to use leaves that are present on 5.1 instead of using a leaves only available on main.

Steps for this PR:

  1. cherry-picked 55180 commit
  2. updated metadata.yaml with 5.1's resulting value from running sg migration leaves
  3. ran sg generate to confirm no other changes required

Description:

When a text input is submitted for generating embeddings the response may be null. If we attempt retries and still cannot generate embeddings for this input text then we return an error which calls for failing the entire embed repo job.

Slack thread

https://github.com/sourcegraph/sourcegraph/issues/55469

This PR introduces a configuration ExcludeChunkOnError. When set to true an embed repo job will proceed with the rest of the embed repo job when these generate embeddings errors occur. However, the file that generated the input text which received an error is excluded from the index as to avoid partially indexing the file.

I'll add more details on the first iteration of this solution and the trade offs in a separate comment.

Test plan

new tests for embed.go and embedding clients

gl-srgr and others added 2 commits August 2, 2023 11:18
When a text input is submitted for generating embeddings the response
may be null. If we attempt retries and still cannot generate embeddings
for this input text then we return an error which calls for failing the
entire embed repo job.

[Slack
thread](https://sourcegraph.slack.com/archives/C053L1AQ0BC/p1688676751106069)

[Issue](https://github.com/sourcegraph/sourcegraph/issues/55469)

This PR introduces a configuration `ExcludeChunkOnError`. When set to
true an embed repo job will proceed with the rest of the embed repo job
when these generate embeddings errors occur. However, the file that
generated the input text which received an error is excluded from the
index as to avoid partially indexing the file.

I'll add more details on the first iteration of this solution and the
trade offs in a separate comment.

<!-- All pull requests REQUIRE a test plan:
https://docs.sourcegraph.com/dev/background-information/testing_principles
-->

Embed test cases added

(cherry picked from commit b8e31fd)
@sourcegraph-bot
Copy link
Contributor

Codenotify: Notifying subscribers in CODENOTIFY files for diff 3f0afe4...2eac4c5.

Notify File(s)
@efritz enterprise/cmd/worker/internal/embeddings/repo/handler.go

@sourcegraph-bot
Copy link
Contributor

📖 Storybook live preview

@coury-clark coury-clark merged commit 6531eec into 5.1 Aug 2, 2023
@coury-clark coury-clark deleted the backport-55180-to-5.1 branch August 2, 2023 18:52
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants