Closed
Description
It would be very useful to add multi-response support per slot, so that a single request can generate n independent completions. This is useful in several situations. For example, a FIM completion could provide multiple alternative suggestions at a smaller or equal compute cost compared to generating them sequentially.
I think this can be implemented by adding multiple sequence IDs per slot (instead of just one, as we currently have). However, I am not yet sure how much complexity supporting this would introduce.
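
To make the idea concrete, here is a rough, self-contained Python sketch of a slot that owns several sequence IDs. It is a toy model only, not the server's actual code: `Slot`, `toy_sample`, and `generate` are hypothetical names, and `toy_sample` is a stand-in for real model evaluation and sampling. The point it illustrates is that the prompt is stored once per slot, while each sequence ID keeps its own output and its own sampler state.

```python
from dataclasses import dataclass, field
import random


@dataclass
class Slot:
    """Hypothetical slot that owns several sequence IDs instead of one."""
    slot_id: int
    seq_ids: list[int]
    # The prompt is stored once and shared by every sequence in the slot.
    prompt: list[int] = field(default_factory=list)
    # One list of generated tokens per sequence ID.
    outputs: dict[int, list[int]] = field(default_factory=dict)


def toy_sample(context: list[int], rng: random.Random) -> int:
    """Stand-in for model evaluation plus sampling (assumption, not a real model)."""
    return (sum(context) + rng.randrange(100)) % 1000


def generate(slot: Slot, prompt: list[int], n_predict: int, seed: int = 0) -> dict[int, list[int]]:
    """Produce len(slot.seq_ids) independent completions for a single request."""
    slot.prompt = list(prompt)
    slot.outputs = {sid: [] for sid in slot.seq_ids}
    # Each sequence gets its own RNG so the completions are sampled independently.
    rngs = {sid: random.Random(seed + sid) for sid in slot.seq_ids}
    for _ in range(n_predict):
        # Conceptually one batched step: every sequence contributes one new token.
        for sid in slot.seq_ids:
            context = slot.prompt + slot.outputs[sid]
            slot.outputs[sid].append(toy_sample(context, rngs[sid]))
    return slot.outputs
```

Usage would look like `generate(Slot(slot_id=0, seq_ids=[0, 1, 2]), prompt=[1, 2, 3], n_predict=4)`, which returns one token list per sequence ID. In a real implementation, the extra complexity would presumably sit in sharing the prompt's cached state across the sequence IDs and in tracking per-sequence stop conditions, which this sketch does not attempt.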