This repository was archived by the owner on Sep 30, 2024. It is now read-only.
[Backport 5.1] Allow embeddings job to exclude failed files from the index #55530
Merged
coury-clark merged 2 commits into 5.1 on Aug 2, 2023
When a text input is submitted for generating embeddings, the response may be null. If we attempt retries and still cannot generate embeddings for this input text, we return an error, which fails the entire embed repo job.

[Slack thread](https://sourcegraph.slack.com/archives/C053L1AQ0BC/p1688676751106069) [Issue](https://github.com/sourcegraph/sourcegraph/issues/55469)

This PR introduces a configuration option, `ExcludeChunkOnError`. When set to true, an embed repo job continues with the remaining files when these embedding-generation errors occur. However, the file whose input text produced the error is excluded from the index, so as to avoid partially indexing that file.

I'll add more details on the first iteration of this solution and the trade-offs in a separate comment.

Embed test cases added.

(cherry picked from commit b8e31fd)
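For readers skimming the thread, here is a rough sketch of the behavior described above. It is not the code in this PR: apart from the `ExcludeChunkOnError` name, every type, function, and field (`file`, `options`, `MaxRetries`, `embedFunc`, `embedWithRetry`, `embedFiles`) is hypothetical and only illustrates the idea of skipping a whole file when one of its chunks cannot be embedded after retries.

```go
package main

import (
	"errors"
	"fmt"
)

// embedFunc is a hypothetical stand-in for an embeddings client call.
// A nil vector with a nil error models the "null response" case.
type embedFunc func(chunk string) ([]float32, error)

// file and options are illustrative types, not Sourcegraph's actual ones.
type file struct {
	Name   string
	Chunks []string
}

type options struct {
	ExcludeChunkOnError bool // name taken from the PR description
	MaxRetries          int  // assumed knob for this sketch
}

// embedWithRetry calls embed up to maxRetries+1 times and treats a nil
// vector as a failure.
func embedWithRetry(embed embedFunc, chunk string, maxRetries int) ([]float32, error) {
	var lastErr error
	for attempt := 0; attempt <= maxRetries; attempt++ {
		vec, err := embed(chunk)
		if err == nil && vec != nil {
			return vec, nil
		}
		if err == nil {
			err = errors.New("embeddings response was null")
		}
		lastErr = err
	}
	return nil, lastErr
}

// embedFiles embeds every chunk of every file. If a chunk fails and
// ExcludeChunkOnError is set, the whole file is left out of the index so it
// is never partially indexed; otherwise the first failure aborts the job.
func embedFiles(files []file, embed embedFunc, opts options) (map[string][][]float32, []string, error) {
	index := make(map[string][][]float32)
	var excluded []string
	for _, f := range files {
		vectors := make([][]float32, 0, len(f.Chunks))
		failed := false
		for _, chunk := range f.Chunks {
			vec, err := embedWithRetry(embed, chunk, opts.MaxRetries)
			if err != nil {
				if !opts.ExcludeChunkOnError {
					return nil, nil, fmt.Errorf("embedding %s: %w", f.Name, err)
				}
				failed = true
				break
			}
			vectors = append(vectors, vec)
		}
		if failed {
			excluded = append(excluded, f.Name)
			continue
		}
		index[f.Name] = vectors
	}
	return index, excluded, nil
}

func main() {
	// Fake client: returns a null response for the chunk "bad".
	embed := func(chunk string) ([]float32, error) {
		if chunk == "bad" {
			return nil, nil
		}
		return []float32{float32(len(chunk))}, nil
	}
	files := []file{
		{Name: "a.go", Chunks: []string{"ok", "bad"}},
		{Name: "b.go", Chunks: []string{"fine"}},
	}
	index, excluded, err := embedFiles(files, embed, options{ExcludeChunkOnError: true, MaxRetries: 2})
	fmt.Println(len(index), excluded, err)
}
```

With the flag on, `a.go` is dropped entirely and `b.go` is indexed; with the flag off, the same input makes `embedFiles` return an error, which corresponds to the job-failing behavior described above.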
Codenotify: Notifying subscribers in CODENOTIFY files for diff 3f0afe4...2eac4c5.
coury-clark approved these changes on Aug 2, 2023
Related PRs on main:

Steps for this PR:
- `metadata.yaml` with 5.1's resulting value from running `sg migration leaves`
- `sg generate` to confirm no other changes required
Test plan
new tests for embed.go and embedding clients