Inference Extension: Query the Endpoint Picker #3838

@sjberman

Description

The Endpoint Picker (EPP) is a component deployed by the Inference Platform Owner that chooses which AI workload endpoint should receive a client request. It communicates using the ext_proc protocol over gRPC.

When NGINX receives a client request destined for an AI workload, it needs to call out to the EPP to determine the proper endpoint to forward the request to. However, NGINX cannot speak gRPC/ext_proc, so we need some middleware to bridge the gap. For our initial iteration, this likely requires two pieces: a Go application that can send the request to the EPP, and an NJS module that can initiate a subrequest from NGINX to the Go application. The flow is as follows:

NGINX -> NJS subrequest -> Go -> EPP

The EPP responds with the desired AI endpoint in a header, and NGINX then forwards the client request to that endpoint using proxy_pass.

This story is to utilize the Go app and finish this functionality.
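
A minimal sketch of what that NJS handler could look like, assuming a hypothetical subrequest location /_ngf-internal-epp-query that proxies to the Go application, a hypothetical internal location /_ngf-internal-ai-route for the final proxy_pass, a hypothetical x-gateway-destination-endpoint response header set by the Go application, and an $ai_endpoint variable declared with js_var (none of these names are final):

// Sketch only: the location names, the header name, and $ai_endpoint are assumptions.
async function getAndSendToEndpoint(r) {
    let endpoint = '';

    try {
        // The subrequest shares the client request's input headers; pass the
        // body explicitly so the EPP can read the model name from it.
        const reply = await r.subrequest('/_ngf-internal-epp-query', {
            method: r.method,
            body: r.requestText,
        });

        if (reply.status === 200) {
            // Hypothetical header set by the Go application containing the
            // endpoint chosen by the EPP.
            endpoint = reply.headersOut['x-gateway-destination-endpoint'] || '';
        }
    } catch (e) {
        // Go application/EPP unreachable; leave the endpoint empty.
    }

    // Requires `js_var $ai_endpoint;` in the config. If it ends up empty, the
    // internal location can fall back to the InferencePool's upstream.
    r.variables.ai_endpoint = endpoint;
    r.internalRedirect('/_ngf-internal-ai-route');
}

export default { getAndSendToEndpoint };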

If the InferencePool's EndpointPickerFailureMode is set to FailOpen and the EPP isn't available, NGINX should decide which endpoint to send to (by sending to the configured upstream). It's not immediately obvious how this decision can be made in NGINX, but I'm sure it can be done. Maybe we use a map: configure a variable that will be populated with the endpoint, and if the variable is empty, send to the configured upstream.
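
One way the NJS side could surface that decision (a sketch only, assuming a hypothetical $epp_failure_mode variable rendered per InferencePool and the same hypothetical names as the sketch above; the map-based fallback itself would still live in the nginx config):

// Sketch only: $epp_failure_mode and the location name are assumptions.
function handleEppFailure(r) {
    if (r.variables.epp_failure_mode === 'FailOpen') {
        // Leave $ai_endpoint empty so the internal location (e.g. via a map on
        // $ai_endpoint) falls back to the InferencePool's configured upstream.
        r.variables.ai_endpoint = '';
        r.internalRedirect('/_ngf-internal-ai-route');
    } else {
        // FailClose: don't guess an endpoint; return an error instead.
        r.return(503);
    }
}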

Acceptance Criteria:

  • write an NJS module that will send a subrequest to the Go application
    • remove the extractModel function, as it is no longer needed
  • this subrequest should include the client request's body and headers so that the EPP can use the model name to determine the proper AI workload endpoint
  • NGINX should proxy_pass the client request to the endpoint returned in the header from the EPP query (instead of the upstream for that InferencePool)
  • If the EPP is unavailable, and the InferencePool's EndpointPickerFailureMode is set to FailOpen, then NGINX should just proxy_pass to the upstream for that InferencePool
  • all existing supported filters/conditions in HTTPRoutes still work when routing to Inference workloads
    • see second part of dev notes below, as this may require an additional story if it's too much work

Dev notes:

  • since we are likely going to have to call the NJS module using js_content, the module takes over the request, and we can't pass the endpoint value back to the calling location block in NGINX to proxy_pass. Instead, we probably have to use internalRedirect in the NJS module to forward to an internal nginx location that will then proxy_pass to the final endpoint. This mimics the way we currently use httpmatches.js to redirect to an internal location for HTTP matching conditions. The internal location should have all the config in it (proxy_set_header, included policies, etc.), while the external location that calls js_content will essentially have nothing else configured in it. Something like the following, where the NJS module redirects to the internal location once it gets the endpoint:
location /my-ai-path {
    js_content epp.getAndSendToEndpoint;
}

location /_ngf-internal-something-unique-for-ai-path {
    internal;

    proxy_set_header Host...
    proxy_set_header Connection...
    proxy_http_version 1.1;

    include /some/policy/...

    etc, etc...

    proxy_pass $ai_endpoint;
}

Something else to consider (and this may require an additional ticket) is the case where a user specifies HTTP matching conditions for an InferencePool route. Currently, HTTP matching conditions already result in an NJS internal redirect, so we would either need two internal redirects (nested), or our epp NJS module would have to call the httpmatches module to perform that lookup in addition to what it's already doing.
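
If we go with the second option, the epp module could run the match lookup itself before querying the EPP. A very rough sketch, where the import path and the findWinningRoute helper are hypothetical stand-ins (the real httpmatches.js exports may differ):

// Sketch only: the import path and findWinningRoute are hypothetical; check
// what httpmatches.js actually exports before wiring this up.
import httpmatches from './httpmatches.js';

async function getAndSendToEndpoint(r) {
    // Evaluate the route's HTTP matching conditions first, the way httpmatches
    // does today for non-inference routes.
    const match = httpmatches.findWinningRoute(r); // hypothetical helper
    if (!match) {
        r.return(404);
        return;
    }

    // ...then query the EPP and internally redirect to the matched route's
    // internal location, as in the earlier sketch.
}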

Design doc: https://github.com/nginx/nginx-gateway-fabric/blob/main/docs/proposals/gateway-inference-extension.md

Metadata

Labels

  • area/inference-extension: Related to the Gateway API Inference Extension
  • enhancement: New feature or request
  • refined: Requirements are refined and the issue is ready to be implemented.
  • size/large: Estimated to be completed within two weeks


Status

✅ Done
