Last updated: 11-10-2025
The Nutanix Enterprise AI (NAI) Inference API reference describes the RESTful and streaming APIs you can use to interact with the Inference Endpoints deployed on the NAI platform. The REST API is currently in version 1 (v1).
The best practice for ensuring consistent behaviour and output from a model is to use a pinned model version. Model behaviours tend to change between versions.
All API users should have valid API Key credentials to send API calls to the Inference Endpoint. The Inference APIs are compliant with OpenAI API specifications, where applicable. Nutanix has added new tags to support models from other model catalogue providers. You can invoke these APIs using the OpenAI API clients.
For more information on OpenAI Spec, see https://github.com/openai/openai-openapi/
The Nutanix Enterprise AI (NAI) Management API reference describes the RESTful APIs available to manage entities on the NAI platform. These APIs are currently in the experimental stage.
The API documentation is publicly accessible to all valid users without requiring special permissions, and is intended for viewing purposes only. This document covers the Inference and Management APIs available in the Nutanix Enterprise AI 2.5 release.
create a new API key
new API key create request object
| endpoints | Array of strings |
| name required | string |
| unifiedEndpoints | Array of strings Used only when gateway mode is enabled |
curl --request POST \ --url https://www.nutanix.dev//api/enterpriseai/v1/apikeys \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \ --data '{ "endpoints": [ "string" ], "name": "string", "unifiedEndpoints": [ "string" ] }'
{- "data": {
- "id": "string",
- "key": "string"
}, - "msg": "string"
}delete API key with ID
| apikey_id required | string API key ID |
curl --request DELETE \ --url https://www.nutanix.dev//api/enterpriseai/v1/apikeys/{apikey_id} \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "msg": "string"
}list apikeys
| limit | integer limit |
| offset | integer offset |
| owner_id | string owner ID for which the API keys have to be returned |
curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/apikeys \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "data": {
- "apikeys": [
- {
- "createdAt": "string",
- "createdBy": {
- "id": "string",
- "username": "string"
}, - "endpoints": [
- {
- "id": "string",
- "name": "string"
}
], - "id": "string",
- "key": "string",
- "name": "string",
- "status": "string",
- "unifiedEndpoints": [
- {
- "id": "string",
- "name": "string"
}
], - "updatedAt": "string"
}
], - "totalCount": 0
}, - "msg": "string"
}search API keys
list options object
required | Array of objects (dto.FilterOptions) |
| limit required | integer |
| offset required | integer >= 0 |
required | Array of objects (dto.SortOptions) |
curl --request POST \ --url https://www.nutanix.dev//api/enterpriseai/v1/apikeys/search \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \ --data '{ "filters": [ { "field": "string", "operation": "startsWith", "values": [ "string" ] } ], "limit": 0, "offset": 0, "sort": [ { "field": "string", "order": "ASCENDING" } ] }'
{- "data": {
- "apikeys": [
- {
- "createdAt": "string",
- "createdBy": {
- "id": "string",
- "username": "string"
}, - "endpoints": [
- {
- "id": "string",
- "name": "string"
}
], - "id": "string",
- "key": "string",
- "name": "string",
- "status": "string",
- "unifiedEndpoints": [
- {
- "id": "string",
- "name": "string"
}
], - "updatedAt": "string"
}
], - "totalCount": 0
}, - "msg": "string"
}update API key with status and endpoint ID list
| apikey_id required | string API key ID |
API key update request object
| endpoints | Array of strings |
| status | string |
| unifiedEndpoints | Array of strings Used only when gateway mode is enabled |
curl --request PATCH \ --url https://www.nutanix.dev//api/enterpriseai/v1/apikeys/{apikey_id} \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \ --data '{ "endpoints": [ "string" ], "status": "string", "unifiedEndpoints": [ "string" ] }'
{- "data": {
- "id": "string",
- "key": "string"
}, - "msg": "string"
}search audit logs
list options object
required | Array of objects (dto.FilterOptions) |
| limit required | integer |
| offset required | integer >= 0 |
required | Array of objects (dto.SortOptions) |
curl --request POST \ --url https://www.nutanix.dev//api/enterpriseai/v1/auditlogs \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \ --data '{ "filters": [ { "field": "string", "operation": "startsWith", "values": [ "string" ] } ], "limit": 0, "offset": 0, "sort": [ { "field": "string", "order": "ASCENDING" } ] }'
{- "data": {
- "auditlogs": [
- {
- "createdAt": "string",
- "description": "string",
- "entityAffected": "string",
- "entityType": "User",
- "eventType": "Create",
- "id": "string",
- "userName": "string"
}
], - "totalCount": 0
}, - "msg": "string"
}get a catalog by ID
| catalog_id required | string catalog ID |
curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/catalogs/{catalog_id} \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "data": {
- "adminEnabled": true,
- "contextLength": 0,
- "createdAt": "string",
- "createdBy": "string",
- "deprecated": true,
- "description": "string",
- "developer": "string",
- "id": "string",
- "license": "string",
- "modelName": "string",
- "modelRevision": "string",
- "modelSizeInGB": 0,
- "modelType": "Text Generation",
- "modelUrl": "string",
- "quantization": "float8",
- "runtimes": [
- {
- "modelCapabilities": [
- "string"
], - "name": "vllm",
- "supportedPlatforms": [
- "nvidia-gpu-passthrough"
]
}
], - "sourceHub": "HuggingFace",
- "tokenRequired": true,
- "updatedAt": "string",
- "validated": true
}, - "msg": "string"
}create a new catalog
new create catalog request object
| additionalProperties | object (dto.CatalogAdditionalProperties) |
| adminEnabled required | boolean |
| contextLength required | integer |
| description required | string |
| developer required | string |
| license | string |
| modelName required | string ModelName represents the HuggingFace RepoID |
| modelRevision required | string |
| modelSizeInGB required | integer |
| modelType required | string (enum.ModelType) Enum: "Text Generation" "Embedding" "Reranker" "Safety" "Vision" "Image Generation" "Image Classification" "Object Detection" "Custom" |
| modelUrl required | string |
| quantization required | string (enum.Quantization) Enum: "float8" "float16" "bfloat16" |
required | Array of objects (dto.CreateRuntimeRequest) |
| sourceHub required | string (enum.CatalogSourceHub) Enum: "HuggingFace" "NvidiaNIM" |
| tokenRequired required | boolean |
| validated required | boolean |
curl --request POST \ --url https://www.nutanix.dev//api/enterpriseai/v1/catalogs \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \ --data '{ "additionalProperties": {}, "adminEnabled": true, "contextLength": 0, "description": "string", "developer": "string", "license": "string", "modelName": "string", "modelRevision": "string", "modelSizeInGB": 0, "modelType": "Text Generation", "modelUrl": "string", "quantization": "float8", "runtimes": [ { "additionalProperties": { "additionalArgs": { "hfTransformersArgs": { "maxNumTokens": 0 }, "tgiArgs": { "customArgs": { "property1": "string", "property2": "string" }, "maxNumTokens": 0 }, "vllmArgs": { "customArgs": { "property1": "string", "property2": "string" }, "maxNumTokens": 0 } }, "additionalEnvs": [ { "name": "string", "value": "string" } ], "fetchNimFromHf": true, "nimGpuConfig": { "property1": [ { "contextLength": -1, "gpuCount": 0 } ], "property2": [ { "contextLength": -1, "gpuCount": 0 } ] }, "tgiGpuConfig": { "property1": [ { "contextLength": -1, "gpuCount": 0 } ], "property2": [ { "contextLength": -1, "gpuCount": 0 } ] } }, "image": "string", "modelCapabilities": [ "string" ], "name": "vllm-nvidia-gpu-passthrough", "resources": { "acceleratorMemory": { "kvCachePerToken": null, "modelWeights": null, "textActivationsPerToken": null, "visionActivationsPerImagePerSeq": null }, "cpu": 0, "ram": null } } ], "sourceHub": "HuggingFace", "tokenRequired": true, "validated": true }'
{- "data": {
- "id": "string"
}, - "msg": "string"
}delete catalog by ID
| catalog_id required | string catalog ID |
curl --request DELETE \ --url https://www.nutanix.dev//api/enterpriseai/v1/catalogs/{catalog_id} \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "msg": "string"
}get catalogs by limit and offset
| model_name | string model name |
| deprecated | boolean filter catalog items on deprecated field |
| source_hub | string Enum: "HuggingFace" "NvidiaNIM" filter catalog items on source hub |
| limit | integer limit |
| offset | integer offset |
curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/catalogs \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "data": {
- "catalogs": [
- {
- "adminEnabled": true,
- "contextLength": 0,
- "createdAt": "string",
- "createdBy": "string",
- "deprecated": true,
- "description": "string",
- "developer": "string",
- "id": "string",
- "license": "string",
- "modelName": "string",
- "modelRevision": "string",
- "modelSizeInGB": 0,
- "modelType": "Text Generation",
- "modelUrl": "string",
- "quantization": "float8",
- "runtimes": [
- {
- "modelCapabilities": [
- "string"
], - "name": "vllm",
- "supportedPlatforms": [
- "nvidia-gpu-passthrough"
]
}
], - "sourceHub": "HuggingFace",
- "tokenRequired": true,
- "updatedAt": "string",
- "validated": true
}
], - "totalCount": 0
}, - "msg": "string"
}get requirements of a catalog entry
new catalog requirements object
| acceleratorMemory | number >= 0 |
| acceleratorProduct | string |
| engine | string (enum.BaseEngine) Enum: "vllm" "tgi" "nim" "hf-transformers" "custom-model-server" |
| modelName required | string |
| platform required | string (enum.Platform) Enum: "nvidia-gpu-passthrough" "nvidia-vgpu" "nvidia-mig" "amd-gpu-passthrough" "intel-amx-cpu" "cpu" |
curl --request POST \ --url https://www.nutanix.dev//api/enterpriseai/v1/catalogs/requirements \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \ --data '{ "acceleratorMemory": null, "acceleratorProduct": "string", "engine": "vllm", "modelName": "string", "platform": "nvidia-gpu-passthrough" }'
{- "data": {
- "modelName": "string",
- "modelRevision": "string",
- "modelSizeInGB": 0,
- "modelType": "Text Generation",
- "resourceTable": {
- "property1": [
- {
- "acceleratorCount": 0,
- "contextLength": 0,
- "cpu": 0,
- "ram": 0
}
], - "property2": [
- {
- "acceleratorCount": 0,
- "contextLength": 0,
- "cpu": 0,
- "ram": 0
}
]
}, - "sourceHub": "HuggingFace"
}, - "msg": "string"
}search catalogs
list options object
required | Array of objects (dto.FilterOptions) |
| limit required | integer |
| offset required | integer >= 0 |
required | Array of objects (dto.SortOptions) |
curl --request POST \ --url https://www.nutanix.dev//api/enterpriseai/v1/catalogs/search \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \ --data '{ "filters": [ { "field": "string", "operation": "startsWith", "values": [ "string" ] } ], "limit": 0, "offset": 0, "sort": [ { "field": "string", "order": "ASCENDING" } ] }'
{- "data": {
- "catalogs": [
- {
- "adminEnabled": true,
- "contextLength": 0,
- "createdAt": "string",
- "createdBy": "string",
- "deprecated": true,
- "description": "string",
- "developer": "string",
- "id": "string",
- "license": "string",
- "modelName": "string",
- "modelRevision": "string",
- "modelSizeInGB": 0,
- "modelType": "Text Generation",
- "modelUrl": "string",
- "quantization": "float8",
- "runtimes": [
- {
- "modelCapabilities": [
- "string"
], - "name": "vllm",
- "supportedPlatforms": [
- "nvidia-gpu-passthrough"
]
}
], - "sourceHub": "HuggingFace",
- "tokenRequired": true,
- "updatedAt": "string",
- "validated": true
}
], - "totalCount": 0
}, - "msg": "string"
}Update one or more catalog entries.
List of catalog entries to update
| adminEnabled | boolean |
| catalogId required | string |
curl --request PATCH \ --url https://www.nutanix.dev//api/enterpriseai/v1/catalogs \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "data": null,
- "msg": "string",
- "status": "Success"
}curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/cluster/health \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "data": {
- "inconsistentResources": [
- {
- "id": "string",
- "name": "string",
- "resourceType": "HfToken"
}
]
}, - "msg": "string"
}retrieves information about the kubernetes cluster
curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/cluster/info \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \
{- "data": {
- "acceleratorCount": "string",
- "acceleratorMemory": "string",
- "cpu": "string",
- "disk": "string",
- "memory": "string",
- "name": "string",
- "nodes": [
- {
- "accelerators": [
- {
- "count": "string",
- "machine": "string",
- "memory": "string",
- "platform": "nvidia-gpu-passthrough",
- "product": "string",
- "type": "GPU Passthrough"
}
], - "cpu": {
- "count": "string",
- "memory": "string",
- "platform": "nvidia-gpu-passthrough"
}, - "disk": "string",
- "ip": "string",
- "name": "string",
- "status": "string",
- "version": "string"
}
], - "status": "string",
- "version": "string"
}, - "msg": "string"
}retrieves information about the Cluster Configurations
curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/cluster/config \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \
{- "data": {
- "eula": {
- "accepted": true,
- "company": "string",
- "name": "string",
- "updatedAt": "string"
}, - "hfUrlImport": {
- "enabled": true
}, - "manualUpload": {
- "enabled": true
}, - "proxySettings": {
- "name": "string",
- "password": "string",
- "port": 1,
- "protocols": [
- "string"
], - "proxyAddress": "string",
- "username": "string"
}, - "pulse": {
- "accepted": true,
- "updatedAt": "string"
}, - "rsyslogServer": {
- "logConfigs": [
- {
- "logSeverity": 7,
- "logType": "audit"
}
], - "name": "string",
- "port": 1,
- "protocol": "string",
- "serverAddress": "string"
}
}, - "msg": "string"
}restore the cluster from database
curl --request POST \ --url https://www.nutanix.dev//api/enterpriseai/v1/cluster/restore \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "data": null,
- "msg": "string"
}update cluster configuration status
new cluster config update request object
object (dto.EULAAcceptRequest) | |
object (dto.HfURLImportEnableUpdateRequest) | |
object (dto.ManualUploadEnableUpdateRequest) | |
object (dto.ProxySettings) | |
object (dto.PulseUpdateRequest) | |
object (dto.RsyslogServerInfo) |
curl --request PATCH \ --url https://www.nutanix.dev//api/enterpriseai/v1/cluster/config \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \ --data '{ "eula": { "accepted": true, "company": "string", "name": "string" }, "hfUrlImport": { "enabled": true }, "manualUpload": { "enabled": true }, "proxySettings": { "name": "string", "password": "string", "port": 1, "protocols": [ "string" ], "proxyAddress": "string", "username": "string" }, "pulse": { "accepted": true }, "rsyslogServer": { "logConfigs": [ { "logSeverity": 0, "logType": "audit" } ], "name": "string", "port": 1, "protocol": "string", "serverAddress": "string" } }'
{- "data": null,
- "msg": "string"
}create a new credential
new create credential request object
required | object |
| name required | string |
| type required | string (enum.CredentialType) Enum: "hf" "ngc" "s3" "provider" |
curl --request POST \ --url https://www.nutanix.dev//api/enterpriseai/v1/credentials \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \ --data '{ "data": { "property1": "string", "property2": "string" }, "name": "string", "type": "hf" }'
{- "data": {
- "id": "string"
}, - "msg": "string"
}delete an existing credential
| credential_id required | string credential ID |
| force | boolean force delete |
curl --request DELETE \ --url https://www.nutanix.dev//api/enterpriseai/v1/credentials/{credential_id} \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "msg": "string"
}get an existing credential by ID
| credential_id required | string credential ID |
curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/credentials/{credential_id} \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "data": {
- "createdAt": "string",
- "data": {
- "property1": "string",
- "property2": "string"
}, - "id": "string",
- "name": "string",
- "owner": "string",
- "type": "hf",
- "updatedAt": "string"
}, - "msg": "string"
}list credentials
| type | Array of strings type(s) of the credential |
| limit | integer limit |
| offset | integer offset |
curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/credentials \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "data": {
- "secrets": [
- {
- "createdAt": "string",
- "data": {
- "property1": "string",
- "property2": "string"
}, - "id": "string",
- "name": "string",
- "owner": "string",
- "type": "hf",
- "updatedAt": "string"
}
], - "totalCount": 0
}, - "msg": "string"
}update an existing credential
| credential_id required | string credential ID |
updated credential object
object | |
| name | string |
curl --request PATCH \ --url https://www.nutanix.dev//api/enterpriseai/v1/credentials/{credential_id} \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \ --data '{ "data": { "property1": "string", "property2": "string" }, "name": "string" }'
{- "msg": "string"
}The chat completions API generates a model response from a list of messages comprising a conversation. This request creates a model response for the given chat conversation.
| Authorization required | string Default: Bearer <Add access token here> API key for authentication |
Parameters required to create the chat completion request.
object ChatTemplateKwargs provides a way to add non-standard parameters to the request body. Additional kwargs to pass to the template renderer. Will be accessible by the chat template. Such as think mode for qwen3. "chat_template_kwargs": {"enable_thinking": false} https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes | |
| frequency_penalty | number |
| function_call | any Deprecated: use ToolChoice instead. |
Array of objects (openai.FunctionDefinition) Deprecated: use Tools instead. | |
object LogitBias is must be a token id string (specified by their token ID in the tokenizer), not a word string.
incorrect: | |
| logprobs | boolean LogProbs indicates whether to return log probabilities of the output tokens or not. If true, returns the log probabilities of each output token returned in the content of message. This option is currently not available on the gpt-4-vision-preview model. |
| max_completion_tokens | integer MaxCompletionTokens An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens https://platform.openai.com/docs/guides/reasoning |
| max_tokens | integer MaxTokens The maximum number of tokens that can be generated in the chat completion. This value can be used to control costs for text generated via API. This value is now deprecated in favor of max_completion_tokens, and is not compatible with o1 series models. refs: https://platform.openai.com/docs/api-reference/chat/create#chat-create-max_tokens |
Array of objects (openai.ChatCompletionMessage) | |
object Metadata to store with the completion. | |
| model | string |
| n | integer |
| parallel_tool_calls | any Disable the default behavior of parallel tool calls by setting it: false. |
object Configuration for a predicted output. | |
| presence_penalty | number |
| reasoning_effort | string Controls effort on reasoning for reasoning models. It can be set to "low", "medium", or "high". |
object (openai.ChatCompletionResponseFormat) | |
| seed | integer |
| service_tier | string Enum: "auto" "default" "flex" "priority" Specifies the latency tier to use for processing the request. |
| stop | Array of strings |
| store | boolean Store can be set to true to store the output of this completion request for use in distillations and evals. https://platform.openai.com/docs/api-reference/chat/create#chat-create-store |
| stream | boolean |
object Options for streaming response. Only set this when you set stream: true. | |
| temperature | number |
| tool_choice | any This can be either a string or an ToolChoice object. |
Array of objects (openai.Tool) | |
| top_logprobs | integer TopLogProbs is an integer between 0 and 5 specifying the number of most likely tokens to return at each token position, each with an associated log probability. logprobs must be set to true if this parameter is used. |
| top_p | number |
| user | string |
curl --request POST \ --url https://www.nutanix.dev//enterpriseai/v1/chat/completions \ --header 'Accept: application/json' \ --header 'Authorization: Bearer <Add access token here>' \ --header 'Content-Type: application/json' \ --data '{ "chat_template_kwargs": { "property1": {}, "property2": {} }, "frequency_penalty": null, "function_call": null, "functions": [ { "description": "string", "name": "string", "parameters": null, "strict": true } ], "logit_bias": { "property1": 0, "property2": 0 }, "logprobs": true, "max_completion_tokens": 0, "max_tokens": 0, "messages": [ { "content": "string", "function_call": { "arguments": "string", "name": "string" }, "multiContent": [ { "image_url": { "detail": "high", "url": "string" }, "text": "string", "type": "text" } ], "name": "string", "reasoning_content": "string", "refusal": "string", "role": "string", "tool_call_id": "string", "tool_calls": [ { "function": { "arguments": "string", "name": "string" }, "id": "string", "index": 0, "type": "function" } ] } ], "metadata": { "property1": "string", "property2": "string" }, "model": "string", "n": 0, "parallel_tool_calls": null, "prediction": null, "presence_penalty": null, "reasoning_effort": "string", "response_format": { "json_schema": { "description": "string", "name": "string", "schema": null, "strict": true }, "type": "json_object" }, "seed": 0, "service_tier": null, "stop": [ "string" ], "store": true, "stream": true, "stream_options": null, "temperature": null, "tool_choice": null, "tools": [ { "function": { "description": "string", "name": "string", "parameters": null, "strict": true }, "type": "function" } ], "top_logprobs": 0, "top_p": null, "user": "string" }'
{- "choices": [
- {
- "content_filter_results": {
- "hate": {
- "filtered": true,
- "severity": "string"
}, - "jailbreak": {
- "detected": true,
- "filtered": true
}, - "profanity": {
- "detected": true,
- "filtered": true
}, - "self_harm": {
- "filtered": true,
- "severity": "string"
}, - "sexual": {
- "filtered": true,
- "severity": "string"
}, - "violence": {
- "filtered": true,
- "severity": "string"
}
}, - "finish_reason": "string",
- "index": 0,
- "logprobs": {
- "content": [
- {
- "bytes": [
- 0
], - "logprob": 0,
- "token": "string",
- "top_logprobs": [
- {
- "bytes": [
- 0
], - "logprob": 0,
- "token": "string"
}
]
}
]
}, - "message": {
- "content": "string",
- "function_call": {
- "arguments": "string",
- "name": "string"
}, - "multiContent": [
- {
- "image_url": {
- "detail": "high",
- "url": "string"
}, - "text": "string",
- "type": "text"
}
], - "name": "string",
- "reasoning_content": "string",
- "refusal": "string",
- "role": "string",
- "tool_call_id": "string",
- "tool_calls": [
- {
- "function": {
- "arguments": "string",
- "name": "string"
}, - "id": "string",
- "index": 0,
- "type": "function"
}
]
}
}
], - "created": 0,
- "id": "string",
- "model": "string",
- "object": "string",
- "prompt_filter_results": [
- {
- "content_filter_results": {
- "hate": {
- "filtered": true,
- "severity": "string"
}, - "jailbreak": {
- "detected": true,
- "filtered": true
}, - "profanity": {
- "detected": true,
- "filtered": true
}, - "self_harm": {
- "filtered": true,
- "severity": "string"
}, - "sexual": {
- "filtered": true,
- "severity": "string"
}, - "violence": {
- "filtered": true,
- "severity": "string"
}
}, - "index": 0
}
], - "service_tier": "auto",
- "system_fingerprint": "string",
- "usage": {
- "completion_tokens": 0,
- "completion_tokens_details": {
- "accepted_prediction_tokens": 0,
- "audio_tokens": 0,
- "reasoning_tokens": 0,
- "rejected_prediction_tokens": 0
}, - "prompt_tokens": 0,
- "prompt_tokens_details": {
- "audio_tokens": 0,
- "cached_tokens": 0
}, - "total_tokens": 0
}
}The embedding API generates a vector representation of a given input that can be easily consumed by machine learning models and algorithms. This request creates an embedding vector representing the input text.
| Authorization required | string Default: Bearer <Add access token here> API key for authentication |
Parameters required to generate a request.
| dimensions | integer Dimensions The number of dimensions the resulting output embeddings should have. Only supported in text-embedding-3 and later models. |
| encoding_format | string (openai.EmbeddingEncodingFormat) Enum: "float" "base64" |
| input | any |
| input_type | string |
| model | string (openai.EmbeddingModel) Enum: "text-similarity-ada-001" "text-similarity-babbage-001" "text-similarity-curie-001" "text-similarity-davinci-001" "text-search-ada-doc-001" "text-search-ada-query-001" "text-search-babbage-doc-001" "text-search-babbage-query-001" "text-search-curie-doc-001" "text-search-curie-query-001" "text-search-davinci-doc-001" "text-search-davinci-query-001" "code-search-ada-code-001" "code-search-ada-text-001" "code-search-babbage-code-001" "code-search-babbage-text-001" "text-embedding-ada-002" "text-embedding-3-small" "text-embedding-3-large" |
| truncate | string |
| user | string |
curl --request POST \ --url https://www.nutanix.dev//enterpriseai/v1/embeddings \ --header 'Accept: application/json' \ --header 'Authorization: Bearer <Add access token here>' \ --header 'Content-Type: application/json' \ --data '{ "dimensions": 0, "encoding_format": "float", "input": null, "input_type": "string", "model": "text-similarity-ada-001", "truncate": "string", "user": "string" }'
{- "data": [
- {
- "embedding": [
- 0
], - "index": 0,
- "object": "string"
}
], - "model": "text-similarity-ada-001",
- "object": "string",
- "usage": {
- "completion_tokens": 0,
- "completion_tokens_details": {
- "accepted_prediction_tokens": 0,
- "audio_tokens": 0,
- "reasoning_tokens": 0,
- "rejected_prediction_tokens": 0
}, - "prompt_tokens": 0,
- "prompt_tokens_details": {
- "audio_tokens": 0,
- "cached_tokens": 0
}, - "total_tokens": 0
}
}The image generation API generates images from a text prompt.
| Authorization required | string Default: Bearer <Add access token here> API key for authentication |
Parameters required to create the image generation request
| background | string |
| cfg_scale | number |
| inference_steps | integer |
| model | string |
| moderation | string |
| n | integer |
| output_compression | integer |
| output_format | string |
| prompt | string |
| quality | string |
| response_format | string |
| seed | integer |
| size | string |
| style | string |
| user | string |
curl --request POST \ --url https://www.nutanix.dev//enterpriseai/v1/images/generations \ --header 'Accept: application/json' \ --header 'Authorization: Bearer <Add access token here>' \ --header 'Content-Type: application/json' \ --data '{ "background": "string", "cfg_scale": null, "inference_steps": 0, "model": "string", "moderation": "string", "n": 0, "output_compression": 0, "output_format": "string", "prompt": "string", "quality": "string", "response_format": "string", "seed": 0, "size": "string", "style": "string", "user": "string" }'
{- "artifacts": [
- {
- "base64": "string",
- "finishReason": "string",
- "seed": 0
}
], - "created": 0,
- "data": [
- {
- "b64_json": "string",
- "revised_prompt": "string",
- "url": "string"
}
], - "usage": {
- "input_tokens": 0,
- "input_tokens_details": {
- "image_tokens": 0,
- "text_tokens": 0
}, - "output_tokens": 0,
- "total_tokens": 0
}
}Rerank API turns user input into text chunks. Every chunk will include the query and a portion of the document text. Chunk size depends on the model.
| Authorization required | string Default: Bearer <Add access token here> API key for authentication |
Parameters required to create the rerank request
required | Array of objects (dto.Document) |
| model required | string |
| query required | string |
| return_documents | boolean |
| top_n | integer >= 0 |
| truncate | boolean |
curl --request POST \ --url https://www.nutanix.dev//enterpriseai/v1/rerank \ --header 'Accept: application/json' \ --header 'Authorization: Bearer <Add access token here>' \ --header 'Content-Type: application/json' \ --data '{ "documents": [ { "text": "string" } ], "model": "string", "query": "string", "return_documents": true, "top_n": 0, "truncate": true }'
{- "meta": {
- "tokens": {
- "input_tokens": 0,
- "output_tokens": 0
}
}, - "results": [
- {
- "document": "string",
- "index": 0,
- "relevance_score": 0
}
]
}Lists the currently available models and provides basic information, such as the owner and availability
| Authorization required | string Default: Bearer <Add access token here> API key for authentication |
curl --request GET \ --url https://www.nutanix.dev//enterpriseai/v1/models \ --header 'Accept: application/json' \ --header 'Authorization: Bearer <Add access token here>' \ --header 'Content-Type: application/json' \
{- "data": [
- {
- "created": 0,
- "id": "string",
- "object": "string",
- "owned_by": "string"
}
], - "object": "string"
}create a new endpoint
new create endpoint request object
| acceleratorCount required | integer |
| acceleratorProduct | string |
object (dto.InferenceEngineArgs) | |
| apiKeys | Array of strings |
| cpu required | integer |
| description | string |
| engine required | string (enum.BaseEngine) Enum: "vllm" "tgi" "nim" "hf-transformers" "custom-model-server" |
| maxInstances | integer >= 1 |
| memoryInGi required | integer |
| minInstances | integer >= 1 |
| modelId required | string |
| name required | string <= 20 characters |
| platform required | string (enum.Platform) Enum: "nvidia-gpu-passthrough" "nvidia-vgpu" "nvidia-mig" "amd-gpu-passthrough" "intel-amx-cpu" "cpu" |
| quantization | string (enum.Quantization) Enum: "float8" "float16" "bfloat16" |
curl --request POST \ --url https://www.nutanix.dev//api/enterpriseai/v1/endpoints \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \ --data '{ "acceleratorCount": 0, "acceleratorProduct": "string", "advancedConfig": { "hfTransformersArgs": { "maxNumTokens": 0 }, "tgiArgs": { "customArgs": { "property1": "string", "property2": "string" }, "maxNumTokens": 0 }, "vllmArgs": { "customArgs": { "property1": "string", "property2": "string" }, "maxNumTokens": 0 } }, "apiKeys": [ "string" ], "cpu": 0, "description": "string", "engine": "vllm", "maxInstances": 1, "memoryInGi": 0, "minInstances": 1, "modelId": "string", "name": "string", "platform": "nvidia-gpu-passthrough", "quantization": "float8" }'
{- "data": {
- "id": "string"
}, - "msg": "string"
}delete endpoint by ID
| endpoint_id required | string endpoint ID |
| force | boolean force delete |
curl --request DELETE \ --url https://www.nutanix.dev//api/enterpriseai/v1/endpoints/{endpoint_id} \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "msg": "string"
}get an endpoint by ID
| endpoint_id required | string endpoint ID |
| expand | Array of strings query param to denote what all extra fields to fetch |
curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/endpoints/{endpoint_id} \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "data": {
- "acceleratorCount": 0,
- "acceleratorProduct": "string",
- "actualInstances": 0,
- "adminEnabled": true,
- "advancedConfig": {
- "maxNumTokens": 0
}, - "catalog": {
- "name": "string"
}, - "cpu": 0,
- "createdAt": "string",
- "createdBy": {
- "id": "string",
- "username": "string"
}, - "description": "string",
- "endPointUrl": [
- "string"
], - "engine": "vllm",
- "expandValues": {
- "property1": "string",
- "property2": "string"
}, - "id": "string",
- "maxInstances": 0,
- "memory": 0,
- "minInstances": 0,
- "modelCapabilities": [
- "string"
], - "modelId": "string",
- "modelName": "string",
- "name": "string",
- "platform": "nvidia-gpu-passthrough",
- "updatedAt": "string",
- "validated": true
}, - "msg": "string"
}list API keys by endpoint ID
| endpoint_id required | string endpoint ID |
curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/endpoints/apikeys/{endpoint_id} \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "data": {
- "apikeys": [
- {
- "createdAt": "string",
- "createdBy": {
- "id": "string",
- "username": "string"
}, - "endpoints": [
- {
- "id": "string",
- "name": "string"
}
], - "id": "string",
- "key": "string",
- "name": "string",
- "status": "string",
- "unifiedEndpoints": [
- {
- "id": "string",
- "name": "string"
}
], - "updatedAt": "string"
}
], - "totalCount": 0
}, - "msg": "string"
}list endpoints
| expand | Array of strings query param to denote what all extra fields to fetch |
| limit | integer limit |
| offset | integer offset |
| owner_id | string owner ID for which the endpoints have to be returned |
curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/endpoints \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "data": {
- "endpoints": [
- {
- "acceleratorCount": 0,
- "acceleratorProduct": "string",
- "actualInstances": 0,
- "adminEnabled": true,
- "advancedConfig": {
- "maxNumTokens": 0
}, - "catalog": {
- "name": "string"
}, - "cpu": 0,
- "createdAt": "string",
- "createdBy": {
- "id": "string",
- "username": "string"
}, - "description": "string",
- "endPointUrl": [
- "string"
], - "engine": "vllm",
- "expandValues": {
- "property1": "string",
- "property2": "string"
}, - "id": "string",
- "maxInstances": 0,
- "memory": 0,
- "minInstances": 0,
- "modelCapabilities": [
- "string"
], - "modelId": "string",
- "modelName": "string",
- "name": "string",
- "platform": "nvidia-gpu-passthrough",
- "updatedAt": "string",
- "validated": true
}
], - "totalCount": 0
}, - "msg": "string"
}get instances for a given endpoint ID
| endpoint_id required | string endpoint ID |
curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/endpoints/{endpoint_id}/instances \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "data": {
- "instances": [
- {
- "name": "string"
}
]
}, - "msg": "string"
}search endpoints
| expand | Array of strings query param to denote what all extra fields to fetch |
list options object
required | Array of objects (dto.FilterOptions) |
| limit required | integer |
| offset required | integer >= 0 |
required | Array of objects (dto.SortOptions) |
curl --request POST \ --url https://www.nutanix.dev//api/enterpriseai/v1/endpoints/search \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \ --data '{ "filters": [ { "field": "string", "operation": "startsWith", "values": [ "string" ] } ], "limit": 0, "offset": 0, "sort": [ { "field": "string", "order": "ASCENDING" } ] }'
{- "data": {
- "endpoints": [
- {
- "acceleratorCount": 0,
- "acceleratorProduct": "string",
- "actualInstances": 0,
- "adminEnabled": true,
- "advancedConfig": {
- "maxNumTokens": 0
}, - "catalog": {
- "name": "string"
}, - "cpu": 0,
- "createdAt": "string",
- "createdBy": {
- "id": "string",
- "username": "string"
}, - "description": "string",
- "endPointUrl": [
- "string"
], - "engine": "vllm",
- "expandValues": {
- "property1": "string",
- "property2": "string"
}, - "id": "string",
- "maxInstances": 0,
- "memory": 0,
- "minInstances": 0,
- "modelCapabilities": [
- "string"
], - "modelId": "string",
- "modelName": "string",
- "name": "string",
- "platform": "nvidia-gpu-passthrough",
- "updatedAt": "string",
- "validated": true
}
], - "totalCount": 0
}, - "msg": "string"
}update an existing endpoint by ID
| endpoint_id required | string endpoint ID |
update existing endpoint object
| description | string |
| maxInstances | integer >= 1 |
| minInstances | integer >= 1 |
curl --request PATCH \ --url https://www.nutanix.dev//api/enterpriseai/v1/endpoints/{endpoint_id} \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \ --data '{ "description": "string", "maxInstances": 1, "minInstances": 1 }'
{- "msg": "string"
}update state of an existing endpoint
| endpoint_id required | string endpoint ID |
update state of existing endpoint object
| state required | string (enum.EndpointStateAction) Enum: "hibernate" "activate" |
curl --request PATCH \ --url https://www.nutanix.dev//api/enterpriseai/v1/endpoints/state/{endpoint_id} \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \ --data '{ "state": "hibernate" }'
{- "msg": "string"
}validate endpoint object
| endpoint_name required | string endpoint name |
curl --request POST \ --url https://www.nutanix.dev//api/enterpriseai/v1/endpoints/validate \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "msg": "string"
}add new license
add license object
Array of objects (dto.License) |
curl --request PUT \ --url https://www.nutanix.dev//api/enterpriseai/v1/licenses \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \ --data '{ "licenses": [ { "licenseClusterUuid": "string", "licenseKey": "string", "meterType": "VCPU" } ] }'
{- "msg": "string"
}list licenses
| expand | Array of strings query param to denote what all extra fields to fetch |
curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/licenses \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "data": {
- "licenses": [
- {
- "dateOfExpiry": "string",
- "daysToExpire": 0,
- "daysToGracePeriodExpiry": 0,
- "id": "string",
- "isTrial": true,
- "isViolated": true,
- "lastUpdated": "string",
- "licenseClusterUuid": "string",
- "licenseKey": "string",
- "meter": {
- "consumption": 0,
- "quantity": 0,
- "type": "VCPU"
}, - "product": "string",
- "tier": "PRO"
}
]
}, - "msg": "string"
}validate license
validate license object
Array of objects (dto.License) |
curl --request POST \ --url https://www.nutanix.dev//api/enterpriseai/v1/licenses/validate \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \ --data '{ "licenses": [ { "licenseClusterUuid": "string", "licenseKey": "string", "meterType": "VCPU" } ] }'
{- "msg": "string"
}download logs file from pod instances
| entityType required | string Entity type. Examples: endpoint, model |
| entityId required | string Entity ID |
| instance | string Instance name. It is required for the endpoint, and it's optional for the model. |
curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/logs/download \ --header 'Accept: application/octet-stream' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
get logs from pod instances
| entityType required | string Entity type (endpoint only). Note: In response filenames, this is in lowercase. |
| entityId required | string Entity ID |
| instance required | string Instance name |
| tailLines | integer Number of lines to tail from the end of the logs |
curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/logs \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "data": {
- "entityId": "string",
- "entityType": "endpoint",
- "logs": [
- {
- "message": "string"
}
], - "totalCount": 0
}, - "msg": "string"
}curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/metrics/health \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \
{- "data": {
- "misconfiguredServiceMonitors": [
- "string"
], - "missingServiceMonitors": [
- "string"
], - "reasons": [
- "string"
], - "status": "healthy"
}, - "msg": "string"
}sends metric request to metrics endpoint
new metric query object
object (dto.ExtraMetricParams) | |
| metricName required | string (constants.MetricName) Enum: "requestLatencyAverage" "requestLatencyPercentile99" "requestLatencyPercentile95" "requestLatencyPercentile50" "requestCount" "llmMetricsApiUsageCount" "cpuUsage" "memoryUsage" "diskUsage" "gpuUtil" "gpuCount" "usedGpuCount" "gpuMemoryUsage" "serviceHealth" "endpointStatusCount" "infrastructureSummary" "infrastructureSummaryNode" "timeToFirstTokenAverage" "timeToFirstTokenPercentile99" "timeToFirstTokenPercentile95" "timeToFirstTokenPercentile50" "timePerOutputTokenAverage" "timePerOutputTokenPercentile99" "timePerOutputTokenPercentile95" "timePerOutputTokenPercentile50" "requestThroughput" "inputTokenCount" "outputTokenCount" "outputTokenThroughput" "numRequestsRunning" "numRequestsWaiting" "modelApiUsageCount" "endpointApiUsageCount" "v4ApiUsageCount" "accessControlApiUsageCount" "logsApiUsageCount" "apikeyLastUsedAt" |
| outputType required | string (constants.MetricOutputType) Enum: "timeseries" "table" |
curl --request POST \ --url https://www.nutanix.dev//api/enterpriseai/v1/metrics/query \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \ --data '{ "extraMetricParams": { "downsamplingInterval": 0, "end": 0, "filters": { "property1": [ "string" ], "property2": [ "string" ] }, "groupBy": [ "endpointName" ], "limit": 0, "sortOrder": "ASC", "start": 0 }, "metricName": "requestLatencyAverage", "outputType": "timeseries" }'
{- "data": {
- "outputType": "timeseries",
- "result": null
}, - "msg": "string"
}create a new model
new create model request object
object (dto.ModelProvider) | |
| name required | string <= 32 characters |
| sourceFormat required | string (enum.SourceModelFormat) Enum: "hf" "nim" |
object (dto.StorageProvider) |
curl --request POST \ --url https://www.nutanix.dev//api/enterpriseai/v1/models \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \ --data '{ "modelProvider": { "catalogId": "string", "customModelDetails": { "developer": "string", "modelCapabilities": [ "string" ], "modelSizeInGB": 0 } }, "name": "string", "sourceFormat": "hf", "storageProvider": { "huggingFaceHub": { "repoId": "string", "repoVersion": "string" }, "nfs": { "modelPath": "string", "serverIp": "string", "sharePath": "string" }, "s3": { "accessKey": "string", "bucket": "string", "endpoint": "string", "modelPath": "string", "secretKey": "string" } } }'
{- "data": {
- "id": "string"
}, - "msg": "string"
}delete a model by ID
| model_id required | string model ID |
| force | boolean force delete |
curl --request DELETE \ --url https://www.nutanix.dev//api/enterpriseai/v1/models/{model_id} \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "msg": "string"
}curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/models/capabilities \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "data": {
- "capabilities": {
- "property1": [
- "string"
], - "property2": [
- "string"
]
}
}, - "msg": "string"
}list models
| name | string model name |
| owner_id | string owner ID for which the models have to be returned |
| limit | integer limit |
| offset | integer offset |
| expand | Array of strings query param to denote what all extra fields to fetch |
curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/models/list \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "data": {
- "models": [
- {
- "createdAt": "string",
- "createdBy": {
- "id": "string",
- "username": "string"
}, - "expandValues": {
- "property1": "string",
- "property2": "string"
}, - "id": "string",
- "modelDetails": {
- "adminEnabled": true,
- "catalogId": "string",
- "catalogSourceHub": "HuggingFace",
- "deprecated": true,
- "developer": "string",
- "modelCapabilities": [
- "string"
], - "modelSizeInGB": 0,
- "modelUrl": "string",
- "repoName": "string",
- "repoVersion": "string",
- "validated": true
}, - "name": "string",
- "outputFormat": "hf",
- "storageProvider": {
- "huggingFaceHub": {
- "repoId": "string",
- "repoVersion": "string"
}, - "nfs": {
- "modelPath": "string",
- "serverIp": "string",
- "sharePath": "string"
}, - "s3": {
- "accessKey": "string",
- "bucket": "string",
- "endpoint": "string",
- "modelPath": "string",
- "secretKey": "string"
}
}, - "updatedAt": "string"
}
], - "totalCount": 0
}, - "msg": "string"
}get a model by ID
| model_id required | string model ID |
| expand | Array of strings query param to denote what all extra fields to fetch |
curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/models/{model_id} \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "data": {
- "createdAt": "string",
- "createdBy": {
- "id": "string",
- "username": "string"
}, - "expandValues": {
- "property1": "string",
- "property2": "string"
}, - "id": "string",
- "modelDetails": {
- "adminEnabled": true,
- "catalogId": "string",
- "catalogSourceHub": "HuggingFace",
- "deprecated": true,
- "developer": "string",
- "modelCapabilities": [
- "string"
], - "modelSizeInGB": 0,
- "modelUrl": "string",
- "repoName": "string",
- "repoVersion": "string",
- "validated": true
}, - "name": "string",
- "outputFormat": "hf",
- "storageProvider": {
- "huggingFaceHub": {
- "repoId": "string",
- "repoVersion": "string"
}, - "nfs": {
- "modelPath": "string",
- "serverIp": "string",
- "sharePath": "string"
}, - "s3": {
- "accessKey": "string",
- "bucket": "string",
- "endpoint": "string",
- "modelPath": "string",
- "secretKey": "string"
}
}, - "updatedAt": "string"
}, - "msg": "string"
}search models
| expand | Array of strings query param to denote what all extra fields to fetch |
list options object
required | Array of objects (dto.FilterOptions) |
| limit required | integer |
| offset required | integer >= 0 |
required | Array of objects (dto.SortOptions) |
curl --request POST \ --url https://www.nutanix.dev//api/enterpriseai/v1/models/search \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \ --data '{ "filters": [ { "field": "string", "operation": "startsWith", "values": [ "string" ] } ], "limit": 0, "offset": 0, "sort": [ { "field": "string", "order": "ASCENDING" } ] }'
{- "data": {
- "models": [
- {
- "createdAt": "string",
- "createdBy": {
- "id": "string",
- "username": "string"
}, - "expandValues": {
- "property1": "string",
- "property2": "string"
}, - "id": "string",
- "modelDetails": {
- "adminEnabled": true,
- "catalogId": "string",
- "catalogSourceHub": "HuggingFace",
- "deprecated": true,
- "developer": "string",
- "modelCapabilities": [
- "string"
], - "modelSizeInGB": 0,
- "modelUrl": "string",
- "repoName": "string",
- "repoVersion": "string",
- "validated": true
}, - "name": "string",
- "outputFormat": "hf",
- "storageProvider": {
- "huggingFaceHub": {
- "repoId": "string",
- "repoVersion": "string"
}, - "nfs": {
- "modelPath": "string",
- "serverIp": "string",
- "sharePath": "string"
}, - "s3": {
- "accessKey": "string",
- "bucket": "string",
- "endpoint": "string",
- "modelPath": "string",
- "secretKey": "string"
}
}, - "updatedAt": "string"
}
], - "totalCount": 0
}, - "msg": "string"
}create a new provider
provider
| credential required | string |
| hostname required | string |
| name required | string <= 20 characters |
| port required | integer <= 65535 |
| prefix | string |
| schema required | string (enum.Schema) Enum: "openai" "gcp" "anthropic" |
| type required | string (enum.LLMProviderType) Enum: "nai" "openai" "anthropic" |
curl --request POST \ --url https://www.nutanix.dev//api/enterpriseai/v1/providers \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \ --data '{ "credential": "string", "hostname": "string", "name": "string", "port": 65535, "prefix": "string", "schema": "openai", "type": "nai" }'
{- "data": "string",
- "msg": "string"
}delete a provider by ID
| provider_id required | string provider ID |
curl --request DELETE \ --url https://www.nutanix.dev//api/enterpriseai/v1/providers/{provider_id} \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "msg": "string"
}curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/providers \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "data": {
- "providers": [
- {
- "createdAt": "string",
- "credential": "string",
- "hostname": "string",
- "id": "string",
- "name": "string",
- "port": 0,
- "prefix": "string",
- "schema": "openai",
- "type": "nai",
- "updatedAt": "string"
}
], - "totalCount": 0
}, - "msg": "string"
}get a provider by ID
| provider_id required | string provider ID |
curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/providers/{provider_id} \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "data": {
- "createdAt": "string",
- "credential": "string",
- "hostname": "string",
- "id": "string",
- "name": "string",
- "port": 0,
- "prefix": "string",
- "schema": "openai",
- "type": "nai",
- "updatedAt": "string"
}, - "msg": "string"
}update a provider by ID
| provider_id required | string provider ID |
provider
| credential | string |
curl --request PATCH \ --url https://www.nutanix.dev//api/enterpriseai/v1/providers/{provider_id} \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \ --data '{ "credential": "string" }'
{- "msg": "string"
}create a new unified endpoint
new create unified endpoint request object
| apiKeys | Array of strings |
| mode required | string (enum.UnifiedEndpointMode) Enum: "loadbalancer" "fallback" |
| name required | string <= 20 characters |
| rateLimitCelExpression | string |
object (dto.RateLimitConfig) | |
required | Array of objects (dto.TargetEndpointConfig) non-empty |
curl --request POST \ --url https://www.nutanix.dev//api/enterpriseai/v1/unified-endpoints \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \ --data '{ "apiKeys": [ "string" ], "mode": "loadbalancer", "name": "string", "rateLimitCelExpression": "string", "rateLimitConfig": { "cost": null, "durationUnit": null, "value": 0 }, "targetEndpointConfigs": [ { "endpointType": "internal", "internalEndpointId": "string", "priority": 0, "providerId": "string", "providerModelName": "string", "weight": 0 } ] }'
{- "data": {
- "id": "string"
}, - "msg": "string"
}delete a unified endpoint
| unified_endpoint_id required | string unified endpoint ID |
curl --request DELETE \ --url https://www.nutanix.dev//api/enterpriseai/v1/unified-endpoints/{unified_endpoint_id} \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "msg": "string"
}get a unified endpoint by ID
| unified_endpoint_id required | string unified endpoint ID |
curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/unified-endpoints/{unified_endpoint_id} \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "data": {
- "apiKeys": [
- "string"
], - "createdAt": "string",
- "createdBy": {
- "id": "string",
- "username": "string"
}, - "id": "string",
- "mode": "loadbalancer",
- "name": "string",
- "targetEndpoints": [
- {
- "endpointId": "string",
- "endpointType": "internal",
- "priority": 0,
- "providerId": "string",
- "providerModelName": "string",
- "weight": 0
}
], - "updatedAt": "string"
}, - "msg": "string"
}list all API key rate limit configurations for a unified endpoint
| unified_endpoint_id required | string unified endpoint ID |
curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/unified-endpoints/{unified_endpoint_id}/ratelimit \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "data": [
- {
- "apiKeyId": "string",
- "apiKeyName": "string",
- "rateLimitConfig": {
- "cost": "input_tokens",
- "durationUnit": "Second",
- "value": 0
}
}
], - "msg": "string"
}curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/unified-endpoints \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "data": {
- "totalCount": 0,
- "unifiedEndpoints": [
- {
- "apiKeys": [
- "string"
], - "createdAt": "string",
- "createdBy": {
- "id": "string",
- "username": "string"
}, - "id": "string",
- "mode": "loadbalancer",
- "name": "string",
- "targetEndpoints": [
- {
- "endpointId": "string",
- "endpointType": "internal",
- "priority": 0,
- "providerId": "string",
- "providerModelName": "string",
- "weight": 0
}
], - "updatedAt": "string"
}
]
}, - "msg": "string"
}update a unified endpoint
| unified_endpoint_id required | string unified endpoint ID |
update unified endpoint request object
| mode | string (enum.UnifiedEndpointMode) Enum: "loadbalancer" "fallback" |
| rateLimitCelExpression | string |
object (dto.RateLimitConfig) | |
Array of objects (dto.TargetEndpointConfig) non-empty |
curl --request PUT \ --url https://www.nutanix.dev//api/enterpriseai/v1/unified-endpoints/{unified_endpoint_id} \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \ --data '{ "mode": "loadbalancer", "rateLimitCelExpression": "string", "rateLimitConfig": { "cost": null, "durationUnit": null, "value": 0 }, "targetEndpointConfigs": [ { "endpointType": "internal", "internalEndpointId": "string", "priority": 0, "providerId": "string", "providerModelName": "string", "weight": 0 } ] }'
{- "msg": "string"
}update rate limit configuration for API keys on a unified endpoint
| unified_endpoint_id required | string unified endpoint ID |
API key specific rate limit configurations
| apiKeyId required | string |
object (dto.RateLimitConfig) |
curl --request PUT \ --url https://www.nutanix.dev//api/enterpriseai/v1/unified-endpoints/{unified_endpoint_id}/ratelimit \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \ --header 'Content-Type: application/json' \
{- "msg": "string"
}retrieves user details of the logged in user
curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/users/me \ --header 'Accept: application/json' \ --header 'Authorization: Basic <basic_auth_token>' \
{- "data": {
- "meta": {
- "accessControlViolated": {
- "endpointViolated": true,
- "modelViolated": true
}, - "endpointCreated": true,
- "modelsImported": true
}, - "user": {
- "id": "string",
- "role": "MLUser",
- "username": "string"
}
}, - "msg": "string"
}curl --request GET \ --url https://www.nutanix.dev//api/enterpriseai/v1/version \ --header 'Accept: application/json' \ --header 'Content-Type: application/json' \
{- "data": {
- "version": "string"
}, - "msg": "string"
}