Inference Extension: Query the Endpoint Picker #3838

@sjberman

Description

The Endpoint Picker (EPP) is a component deployed by the Inference Platform Owner that chooses which AI workload endpoint should receive a client request. It communicates using the ext_proc protocol over gRPC.

When NGINX receives a client request destined for an AI workload, it needs to call out to the EPP to determine the proper endpoint to forward the request to. However, NGINX cannot speak gRPC/ext_proc, so we need some middleware to bridge the gap. For our initial iteration, this likely requires two pieces: a Go application that can send the request to the EPP, and an NJS module that can initiate a subrequest from NGINX to the Go application. The flow is as follows:

NGINX -> NJS subrequest -> Go -> EPP

The EPP responds with the desired AI endpoint in a header, and NGINX then forwards the client request to that endpoint using proxy_pass.

This story is to utilize the Go app and finish this functionality.
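
A minimal sketch of what that NJS handler could look like, assuming a hypothetical subrequest location /_ngf-internal-epp-query that proxies to the Go application, a hypothetical internal location /_ngf-internal-ai-route for the final proxy_pass, a hypothetical x-gateway-destination-endpoint response header set by the Go application, and an $ai_endpoint variable declared with js_var (none of these names are final):

// Sketch only: the location names, the header name, and $ai_endpoint are assumptions.
async function getAndSendToEndpoint(r) {
    let endpoint = '';

    try {
        // The subrequest shares the client request's input headers; pass the
        // body explicitly so the EPP can read the model name from it.
        const reply = await r.subrequest('/_ngf-internal-epp-query', {
            method: r.method,
            body: r.requestText,
        });

        if (reply.status === 200) {
            // Hypothetical header set by the Go application containing the
            // endpoint chosen by the EPP.
            endpoint = reply.headersOut['x-gateway-destination-endpoint'] || '';
        }
    } catch (e) {
        // Go application/EPP unreachable; leave the endpoint empty.
    }

    // Requires `js_var $ai_endpoint;` in the config. If it ends up empty, the
    // internal location can fall back to the InferencePool's upstream.
    r.variables.ai_endpoint = endpoint;
    r.internalRedirect('/_ngf-internal-ai-route');
}

export default { getAndSendToEndpoint };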

If the InferencePool's EndpointPickerFailureMode is set to FailOpen and the EPP isn't available, NGINX should decide which endpoint to send to (by sending to the configured upstream). It's not immediately obvious how this decision can be made in NGINX, but I'm sure it can be done. Maybe we use a map: configure a variable that will be populated with the endpoint, and if the variable is empty, send to the configured upstream.
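
One way the NJS side could surface that decision (a sketch only, assuming a hypothetical $epp_failure_mode variable rendered per InferencePool and the same hypothetical names as the sketch above; the map-based fallback itself would still live in the nginx config):

// Sketch only: $epp_failure_mode and the location name are assumptions.
function handleEppFailure(r) {
    if (r.variables.epp_failure_mode === 'FailOpen') {
        // Leave $ai_endpoint empty so the internal location (e.g. via a map on
        // $ai_endpoint) falls back to the InferencePool's configured upstream.
        r.variables.ai_endpoint = '';
        r.internalRedirect('/_ngf-internal-ai-route');
    } else {
        // FailClose: don't guess an endpoint; return an error instead.
        r.return(503);
    }
}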

Acceptance Criteria:

  • write an NJS module that will send a subrequest to the Go application
    • remove the extractModel function, as it is no longer needed
  • this subrequest should include the client request's body and headers so that the EPP can use the model name to determine the proper AI workload endpoint
  • NGINX should proxy_pass the client request to the endpoint returned in the header from the EPP query (instead of the upstream for that InferencePool)
  • If the EPP is unavailable, and the InferencePool's EndpointPickerFailureMode is set to FailOpen, then NGINX should just proxy_pass to the upstream for that InferencePool
  • all existing supported filters/conditions in HTTPRoutes still work when routing to Inference workloads
    • see second part of dev notes below, as this may require an additional story if it's too much work

Dev notes:

  • since we are likely going to have to call the NJS module using js_content, the module takes over the request, and we can't pass the endpoint value back to the calling location block in NGINX to proxy_pass. Instead, we probably have to use internalRedirect in the NJS module to forward to an internal nginx location that will then proxy_pass to the final endpoint. This mimics the way we currently use httpmatches.js to redirect to an internal location for HTTP matching conditions. The internal location should have all the config in it (proxy_set_header, included policies, etc.), while the external location that calls js_content will essentially have nothing else configured in it. Something like the following, where the NJS module redirects to the internal location once it gets the endpoint:
location /my-ai-path {
    js_content epp.getAndSendToEndpoint;
}

location /_ngf-internal-something-unique-for-ai-path {
    internal;

    proxy_set_header Host...
    proxy_set_header Connection...
    proxy_http_version 1.1;

    include /some/policy/...

    etc, etc...

    proxy_pass $ai_endpoint;
}

Something else to consider (and this may require an additional ticket) is the case where a user specifies HTTP matching conditions for an InferencePool route. Currently, HTTP matching conditions already result in an NJS internal redirect, so we would either need two internal redirects (nested), or our epp NJS module would have to call the httpmatches module to perform that lookup in addition to what it's already doing.
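
If we go with the second option, the epp module could run the match lookup itself before querying the EPP. A very rough sketch, where the import path and the findWinningRoute helper are hypothetical stand-ins (the real httpmatches.js exports may differ):

// Sketch only: the import path and findWinningRoute are hypothetical; check
// what httpmatches.js actually exports before wiring this up.
import httpmatches from './httpmatches.js';

async function getAndSendToEndpoint(r) {
    // Evaluate the route's HTTP matching conditions first, the way httpmatches
    // does today for non-inference routes.
    const match = httpmatches.findWinningRoute(r); // hypothetical helper
    if (!match) {
        r.return(404);
        return;
    }

    // ...then query the EPP and internally redirect to the matched route's
    // internal location, as in the earlier sketch.
}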

Design doc: https://github.com/nginx/nginx-gateway-fabric/blob/main/docs/proposals/gateway-inference-extension.md

Metadata

Labels

  • area/inference-extension: Related to the Gateway API Inference Extension
  • enhancement: New feature or request
  • refined: Requirements are refined and the issue is ready to be implemented.
  • size/large: Estimated to be completed within two weeks


Status

✅ Done
