Conversation

@phymbert (Collaborator) commented Feb 24, 2024

Context

If multiple slots are computing embeddings, only the first slot is updated.

Changes

Continue updating the remaining slots in `update_slots` in the main loop when processing an embedding task.
The test scenario is moved to the parallel feature.

Closes #5655

@phymbert phymbert requested review from ggerganov and ngxson February 24, 2024 12:05
@phymbert phymbert added bug Something isn't working server/webui labels Feb 24, 2024
@ggerganov (Member) left a comment


Let's go 🚀

@phymbert (Collaborator, Author)
I will take advantage of this PR to add an OAI-compatible concurrent embeddings scenario.

@ngxson (Collaborator) left a comment


LGTM. Thanks!

@phymbert merged commit 9e359a4 into master Feb 24, 2024
@phymbert deleted the hotfix/server-issue-5655-concurrent-embedding-final branch February 24, 2024 18:16
jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Mar 13, 2024 (ggml-org#5699)

* server: ggml-org#5655 - continue to update other slots on embedding concurrent request.

* server: tests: add multi users embeddings as fixed

* server: tests: adding OAI compatible embedding concurrent endpoint

* server: tests: adding OAI compatible embedding with multiple inputs

Labels: bug (Something isn't working), server/webui

Successfully merging this pull request may close these issues: Segmentation fault

4 participants