Description
The Endpoint Picker (EPP) is a component deployed by the Inference Platform Owner that selects which AI workload endpoint should receive a client request. It speaks the ext_proc protocol over gRPC.
When NGINX receives a client request destined for an AI workload, it needs to call out to the EPP to get the proper endpoint to send the request to. However, NGINX cannot speak gRPC/ext_proc, so we need some middleware to bridge the gap. For our initial iteration, this probably requires two pieces: a Go application that can send the request to the EPP, and an NJS module that can initiate a subrequest from NGINX to the Go application. The flow is as follows:
NGINX -> NJS subrequest -> Go -> EPP
The EPP responds with the desired AI endpoint in a header, and NGINX should then forward the client request to that endpoint using `proxy_pass`.
This story is to utilize the Go app and finish this functionality.
If the InferencePool's EndpointPickerFailureMode is set to FailOpen and the EPP isn't available, NGINX should decide which endpoint to send to (by sending to the configured upstream). It's not immediately obvious how this decision can be made in NGINX, but I'm sure it can be done. Maybe we use a map: configure a variable that will be populated with the endpoint, and if the variable is empty, send to the configured upstream.
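A minimal sketch of that map idea, assuming the NJS module writes the chosen endpoint (as address:port) into a `js_var`-declared `$ai_endpoint` variable, and that the pool's configured upstream is named `inference_pool_upstream` (both names are illustrative):

```nginx
js_var $ai_endpoint;  # populated by the NJS module from the EPP response

# If $ai_endpoint is empty (EPP unavailable, FailOpen), fall back to the
# InferencePool's configured upstream; otherwise use the EPP-chosen endpoint.
map $ai_endpoint $inference_backend {
    ""      http://inference_pool_upstream;
    default http://$ai_endpoint;
}
```

The internal location's `proxy_pass` would then reference `$inference_backend` instead of `$ai_endpoint` directly.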
Acceptance Criteria:
- write an NJS module that will send a subrequest to the Go application
- remove the `extractModel` function, as it is no longer needed
- this subrequest should have the client request's body and headers to ensure the model name is used by the EPP to determine the proper AI workload endpoint
- NGINX should `proxy_pass` the client request to the endpoint returned in the header from the EPP query (instead of the upstream for that InferencePool)
- If the EPP is unavailable, and the InferencePool's EndpointPickerFailureMode is set to FailOpen, then NGINX should just `proxy_pass` to the upstream for that InferencePool
- all existing supported filters/conditions in HTTPRoutes still work when routing to Inference workloads
  - see second part of dev notes below, as this may require an additional story if it's too much work
Dev notes:
- since we are likely going to have to call the NJS module using `js_content`, this means that the module takes over the request and we can't pass the endpoint value back to the calling location block in nginx to `proxy_pass`. Instead, we probably have to use `internalRedirect` in the NJS module to forward to an internal nginx location that will then `proxy_pass` to the final endpoint. This will mimic the way we currently use `httpmatches.js` to redirect to an internal location for http matching conditions. The internal location should have all the config in it (like `proxy_set_header`, include policies, etc), while the external location that calls `js_content` will essentially have nothing else configured in it. Something like the following, where the NJS module redirects to the internal location once it gets the endpoint:
```nginx
location /my-ai-path {
    js_content epp.getAndSendToEndpoint;
}

location /_ngf-internal-something-unique-for-ai-path {
    internal;
    proxy_set_header Host ...
    proxy_set_header Connection ...
    proxy_http_version 1.1;
    include /some/policy/...
    # etc, etc...
    proxy_pass $ai_endpoint;
}
```
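For illustration, a rough sketch of what the NJS module could look like, assuming an internal location (here `/_ngf-internal-epp`) that `proxy_pass`es the subrequest to the Go application, a `js_var`-declared `$ai_endpoint` variable, and an illustrative `X-AI-Endpoint` response header from the Go app (all names are placeholders, not the final design):

```js
// epp.js - sketch only; location paths, variable, and header names are illustrative.
const EPP_SUBREQUEST_PATH = '/_ngf-internal-epp';
const INTERNAL_AI_PATH = '/_ngf-internal-something-unique-for-ai-path';
const ENDPOINT_HEADER = 'X-AI-Endpoint';

async function getAndSendToEndpoint(r) {
    let endpoint = '';
    try {
        // Forward the client request body to the Go app; request headers are
        // shared with the subrequest, so the EPP can see the model name.
        const reply = await r.subrequest(EPP_SUBREQUEST_PATH, {
            method: r.method,
            body: r.requestText || '',
        });
        if (reply.status === 200) {
            endpoint = reply.headersOut[ENDPOINT_HEADER] || '';
        }
    } catch (e) {
        // EPP/Go app unreachable; leave $ai_endpoint empty so a FailOpen map
        // (see the sketch above) can fall back to the configured upstream.
    }

    r.variables.ai_endpoint = endpoint;
    r.internalRedirect(INTERNAL_AI_PATH);
}

export default { getAndSendToEndpoint };
```

The internal location's `proxy_pass` then consumes `$ai_endpoint` (or the mapped fallback variable) after the redirect.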
Something else to consider (and this may require an additional ticket) is the case where a user specifies HTTP matching conditions for an InferencePool route. Currently, HTTP matching conditions already result in an NJS internal redirect, so now we are either going to need two internal redirects (nested), or somehow our epp NJS module calls the httpmatches module to perform that lookup in addition to what it's already doing.
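As a rough illustration of the nested-redirect option only (function and location names below are guesses, not the current implementation), the chain could look something like this:

```nginx
location /my-ai-path {
    # existing HTTP matching logic picks the winning match, then redirects
    js_content httpmatches.redirect;
}

location /_ngf-internal-match-for-ai-path {
    internal;
    # EPP lookup: sets $ai_endpoint, then redirects again
    js_content epp.getAndSendToEndpoint;
}

location /_ngf-internal-something-unique-for-ai-path {
    internal;
    proxy_pass $ai_endpoint;
}
```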
Design doc: https://github.com/nginx/nginx-gateway-fabric/blob/main/docs/proposals/gateway-inference-extension.md