# Helm Values Reference

**Audience:** Deployment engineers installing the model-engine chart into a customer environment.
**Purpose:** Reference for every configurable value, organized by deployment concern.
**Full chart source:** `charts/model-engine/values_sample.yaml`
## High-Risk Values

> **Read this before starting your installation**
> The following values have non-obvious defaults or silent failure modes. Getting them wrong causes hard-to-diagnose issues.

| Value | Default | Risk | Impact if wrong |
|---|---|---|---|
| `db.runDbMigrationScript` | `false` | HIGH | Schema not initialized on first install — model creation fails with cryptic DB errors |
| `config.values.infra.prometheus_server_address` | unset | HIGH | KEDA scale-to-zero silently broken for sync endpoints with `min_workers=0` |
| `config.values.launch.vllm_repository` | `vllm` (resolves to Scale's ECR) | HIGH | Endpoint creation succeeds but pods stay `INITIALIZING` — image pull fails silently |
| `celeryBrokerType` | `sqs` | HIGH | Wrong broker for non-AWS clouds — all async endpoints broken |
| `config.values.infra.cloud_provider` | `aws` | HIGH | Wrong storage/auth/registry clients loaded for non-AWS environments |

> **TODO**
> Change the `db.runDbMigrationScript` default to `true` in `values.yaml`.
## 1. Minimum Viable Config

These are the values you must set for the service to start. All other values have safe defaults. Getting any of these wrong will prevent the control plane from coming up or prevent any endpoint from being created successfully.

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `tag` | string | — | Yes | LLM Engine Docker image tag to deploy |
| `image.gatewayRepository` | string | — | Yes | Docker repository for the gateway image |
| `image.builderRepository` | string | — | Yes | Docker repository for the endpoint builder image |
| `image.cacherRepository` | string | — | Yes | Docker repository for the cacher image |
| `image.forwarderRepository` | string | — | Yes | Docker repository for the forwarder image |
| `secrets.kubernetesDatabaseSecretName` | string | `llm-engine-postgres-credentials` | Yes (one of two) | Kubernetes Secret name containing `DATABASE_URL`. Mutually exclusive with `secrets.cloudDatabaseSecretName` |
| `secrets.cloudDatabaseSecretName` | string | — | Yes (one of two) | Cloud-provider secret name (e.g., AWS Secrets Manager) containing database credentials |
| `serviceAccount.annotations` | map | — | Yes | Annotations to apply to the control-plane service account. On EKS, set `eks.amazonaws.com/role-arn` |
| `config.values.infra.cloud_provider` | string | `aws` | Yes | Cloud provider: `aws`, `azure`, or `onprem` |
| `config.values.infra.k8s_cluster_name` | string | `main_cluster` | Yes | Kubernetes cluster name used for resource tagging and lookups |
| `config.values.infra.dns_host_domain` | string | `llm-engine.domain.com` | Yes | Base domain for endpoint hostnames |
| `config.values.infra.default_region` | string | `us-east-1` | Yes | Default cloud region for all resource operations |
| `config.values.infra.ml_account_id` | string | `"000000000000"` | Yes | Cloud account/subscription ID |
| `config.values.infra.docker_repo_prefix` | string | `000000000000.dkr.ecr.us-east-1.amazonaws.com` | Yes | Prefix prepended to all inference image repositories |
| `config.values.infra.redis_host` | string | — | Yes (if not using secret) | Hostname of the Redis cluster used by the inference control plane |
| `config.values.infra.s3_bucket` | string | `llm-engine` | Yes | S3 bucket (or equivalent) for storing fine-tuning artifacts and other assets |
| `config.values.launch.endpoint_namespace` | string | `llm-engine` | Yes | Kubernetes namespace where inference endpoint pods are created |
| `config.values.launch.cache_redis_aws_url` | string | — | Yes (one of three) | Full Redis URL used by the cacher. Exactly one of `cache_redis_aws_url`, `cache_redis_azure_host`, or `cache_redis_aws_secret_name` must be set |
### Minimal Working YAML

```yaml
tag: "abc123def456"
image:
  gatewayRepository: public.ecr.aws/b2z8n5q1/model-engine
  builderRepository: public.ecr.aws/b2z8n5q1/model-engine
  cacherRepository: public.ecr.aws/b2z8n5q1/model-engine
  forwarderRepository: public.ecr.aws/b2z8n5q1/model-engine
  pullPolicy: Always
secrets:
  kubernetesDatabaseSecretName: llm-engine-postgres-credentials
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/k8s-main-llm-engine
db:
  runDbMigrationScript: true  # required on first install
  runDbInitScript: false
config:
  values:
    infra:
      cloud_provider: aws
      k8s_cluster_name: my-cluster
      dns_host_domain: llm-engine.example.com
      default_region: us-east-1
      ml_account_id: "111122223333"
      docker_repo_prefix: "111122223333.dkr.ecr.us-east-1.amazonaws.com"
      redis_host: my-redis.use1.cache.amazonaws.com
      s3_bucket: my-llm-engine-bucket
    launch:
      endpoint_namespace: llm-engine
      cache_redis_aws_url: redis://my-redis.use1.cache.amazonaws.com:6379/15
      s3_file_llm_fine_tuning_job_repository: "s3://my-llm-engine-bucket/llm-ft-job-repository"
      hf_user_fine_tuned_weights_prefix: "s3://my-llm-engine-bucket/fine_tuned_weights"
      vllm_repository: "111122223333.dkr.ecr.us-east-1.amazonaws.com/vllm"
      tensorrt_llm_repository: "111122223333.dkr.ecr.us-east-1.amazonaws.com/tensorrt-llm"
      batch_inference_vllm_repository: "111122223333.dkr.ecr.us-east-1.amazonaws.com/llm-engine/batch-infer-vllm"
```
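The Kubernetes database Secret referenced by `secrets.kubernetesDatabaseSecretName` must exist in the release namespace before install. A minimal sketch of such a Secret follows; the `database_url` key name and connection string are illustrative assumptions, so verify the expected key against the chart templates:

```yaml
# Hypothetical Secret manifest for secrets.kubernetesDatabaseSecretName.
# The key name "database_url" is an assumption -- check the chart templates for the exact key.
apiVersion: v1
kind: Secret
metadata:
  name: llm-engine-postgres-credentials
  namespace: llm-engine
type: Opaque
stringData:
  database_url: "postgresql://llm_engine:<password>@my-postgres.example.com:5432/llm_engine"
```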
## 2. Cloud-Specific Config

### AWS (Reference Configuration)

AWS is the default. The values below represent the full set of AWS-specific fields.

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `config.values.infra.cloud_provider` | string | `aws` | Yes | Set to `aws` |
| `config.values.infra.default_region` | string | `us-east-1` | Yes | AWS region for ECR, SQS, and other resources |
| `config.values.infra.ml_account_id` | string | — | Yes | AWS account ID (12 digits, quoted as a string) |
| `config.values.infra.docker_repo_prefix` | string | — | Yes | ECR registry prefix: `<account>.dkr.ecr.<region>.amazonaws.com` |
| `config.values.infra.redis_host` | string | — | Yes (or use secret) | ElastiCache hostname |
| `config.values.infra.redis_aws_secret_name` | string | — | No | AWS Secrets Manager secret name containing Redis connection info. Fields: `scheme`, `host`, `port`, `auth_token` (optional), `query_params` (optional) |
| `config.values.infra.s3_bucket` | string | `llm-engine` | Yes | S3 bucket name for artifacts |
| `config.values.launch.cache_redis_aws_url` | string | — | Yes (one of three) | Full Redis URL: `redis://<host>:<port>/<db>` |
| `config.values.launch.cache_redis_aws_secret_name` | string | — | Yes (one of three) | AWS Secrets Manager secret with field `cache-url` containing the full Redis URL |
| `config.values.launch.sqs_profile` | string | `default` | No | AWS profile for SQS operations |
| `config.values.launch.sqs_queue_policy_template` | string | — | Yes (for async) | IAM policy template for per-endpoint SQS queues. Must grant `sqs:*` to the LLM Engine role |
| `config.values.launch.sqs_queue_tag_template` | string | — | No | JSON template for SQS queue tags |
| `celeryBrokerType` | string | `sqs` | Yes | Use `sqs` for AWS async endpoints |
| `serviceAccount.annotations."eks.amazonaws.com/role-arn"` | string | — | Yes | IRSA role ARN for the control-plane service account |
```yaml
# AWS reference config diff
config:
  values:
    infra:
      cloud_provider: aws
      default_region: us-east-1
      ml_account_id: "111122223333"
      docker_repo_prefix: "111122223333.dkr.ecr.us-east-1.amazonaws.com"
      redis_host: my-redis.use1.cache.amazonaws.com
      s3_bucket: my-llm-engine-bucket
    launch:
      cache_redis_aws_url: redis://my-redis.use1.cache.amazonaws.com:6379/15
      sqs_profile: default
      sqs_queue_policy_template: >
        {
          "Version": "2012-10-17",
          "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::111122223333:role/k8s-main-llm-engine"},
            "Action": "sqs:*",
            "Resource": "arn:aws:sqs:us-east-1:111122223333:${queue_name}"
          }]
        }
celeryBrokerType: sqs
serviceAccount:
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/k8s-main-llm-engine
```
### Azure (Diff from AWS)

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `config.values.infra.cloud_provider` | string | — | Yes | Set to `azure` |
| `config.values.infra.default_region` | string | — | Yes | Azure region (e.g., `eastus`) |
| `config.values.launch.cache_redis_azure_host` | string | — | Yes | Azure Cache for Redis hostname: `<name>.redis.cache.windows.net:6380` |
| `keyvaultName` | string | `llm-engine-keyvault` | Yes | Azure Key Vault name for secret retrieval |
| `celeryBrokerType` | string | — | Yes | Set to `elasticache` for the Azure Service Bus-backed broker |

> **Azure Service Bus: `broker_pool_limit`**
> When using Azure Service Bus as the Celery broker, do not set `broker_pool_limit=0`. This was previously thought to help with connection management but actually causes idle AMQP connections to drop, resulting in 503 errors on async endpoints. The fix (removing `broker_pool_limit=0`) is tracked in commit `9deb59f1`. Leave this at the library default.
```yaml
# Azure diff
config:
  values:
    infra:
      cloud_provider: azure
      default_region: eastus
      ml_account_id: "your-subscription-id"
      docker_repo_prefix: "myregistry.azurecr.io"
    launch:
      cache_redis_azure_host: my-llm-engine-cache.redis.cache.windows.net:6380
      # Do NOT set cache_redis_aws_url for Azure
keyvaultName: my-llm-engine-keyvault
celeryBrokerType: elasticache  # Azure Service Bus-backed
```
### GCP / On-Premises (Diff from AWS)

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `config.values.infra.cloud_provider` | string | — | Yes | Set to `onprem` |
| `config.values.launch.cache_redis_onprem_url` | string | — | Yes (one of three) | Explicit Redis URL for on-prem: `redis://redis:6379/0`. Highest priority — takes precedence over all other Redis URL fields |
| `celeryBrokerType` | string | — | Yes | Set to `elasticache` to use Redis as the Celery broker instead of SQS |
| `celery_broker_type_redis` | bool | `null` | No | Alternative override flag to force the Redis broker regardless of `celeryBrokerType` |
```yaml
# On-prem / GCP diff
config:
  values:
    infra:
      cloud_provider: onprem
      default_region: us-central1
      ml_account_id: "my-gcp-project"
      docker_repo_prefix: "gcr.io/my-gcp-project"
    launch:
      cache_redis_onprem_url: redis://redis.llm-engine.svc.cluster.local:6379/0
celeryBrokerType: elasticache
celery_broker_type_redis: true
```

> **Cloud matrix**
> For a full per-cloud capability and limitation matrix, see `cloud-matrix.md`.
## 3. GPU / Hardware Config

### Balloon Pods

Balloon pods are low-priority placeholder deployments that keep GPU nodes warm. When real inference pods need to be scheduled, they preempt the balloon pods, eliminating cold-start node provisioning time.

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `balloons[].acceleratorName` | string | — | Yes | GPU type identifier. Must match node labels. Supported: `nvidia-ampere-a10`, `nvidia-ampere-a100`, `nvidia-tesla-t4`, `nvidia-hopper-h100`, `cpu` |
| `balloons[].replicaCount` | integer | `0` | Yes | Number of balloon pods to maintain for this GPU type. Set to `0` to disable warming for that type |
| `balloons[].gpuCount` | integer | `1` | No | Number of GPUs each balloon pod requests. Relevant for multi-GPU balloon pods (e.g., `gpuCount: 4` for H100 nodes) |
| `balloonConfig.reserveHighPriority` | bool | `true` | No | If `true`, only high-priority pods can preempt balloon pods. If `false`, any pod can preempt balloons, which may cause unintended disruption |
| `balloonNodeSelector` | map | `{node-lifecycle: normal}` | No | Node selector applied to all balloon pod deployments. Restricts balloons to on-demand (non-spot) nodes by default |
```yaml
balloonConfig:
  reserveHighPriority: true
balloonNodeSelector:
  node-lifecycle: normal
balloons:
  - acceleratorName: nvidia-ampere-a10
    replicaCount: 2
  - acceleratorName: nvidia-ampere-a100
    replicaCount: 1
  - acceleratorName: nvidia-hopper-h100
    replicaCount: 1
    gpuCount: 4
  - acceleratorName: nvidia-tesla-t4
    replicaCount: 0
  - acceleratorName: cpu
    replicaCount: 0
```
### Image Cache

Image caching pre-pulls large inference images onto GPU nodes so that endpoint scale-up does not spend time pulling multi-GB images. Each device entry specifies a node selector (and optional tolerations) to target a specific GPU node pool.

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `imageCache.devices[].name` | string | — | Yes | Logical name for this device pool (e.g., `a10`, `h100`) |
| `imageCache.devices[].nodeSelector` | map | — | Yes | Label selector targeting nodes in this device pool |
| `imageCache.devices[].tolerations` | list | `[]` | No | Tolerations for the GPU taint. Required for GPU node pools with the `nvidia.com/gpu:NoSchedule` taint |
```yaml
imageCache:
  devices:
    - name: cpu
      nodeSelector:
        cpu-only: "true"
    - name: a10
      nodeSelector:
        k8s.amazonaws.com/accelerator: nvidia-ampere-a10
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Exists"
          effect: "NoSchedule"
    - name: a100
      nodeSelector:
        k8s.amazonaws.com/accelerator: nvidia-ampere-a100
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Exists"
          effect: "NoSchedule"
    - name: t4
      nodeSelector:
        k8s.amazonaws.com/accelerator: nvidia-tesla-t4
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Exists"
          effect: "NoSchedule"
    - name: h100
      nodeSelector:
        k8s.amazonaws.com/accelerator: nvidia-hopper-h100
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Exists"
          effect: "NoSchedule"
    - name: h100-1g20gb
      nodeSelector:
        k8s.amazonaws.com/accelerator: nvidia-hopper-h100-1g20gb
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Exists"
          effect: "NoSchedule"
    - name: h100-3g40gb
      nodeSelector:
        k8s.amazonaws.com/accelerator: nvidia-hopper-h100-3g40gb
      tolerations:
        - key: "nvidia.com/gpu"
          operator: "Exists"
          effect: "NoSchedule"
```
### Recommended Hardware Tables

These tables are used by LLM Engine to suggest appropriate hardware configurations when a user creates an endpoint without specifying hardware. The engine first checks `byModelName` for an exact match on the model name slug, then falls back to `byGpuMemoryGb` (based on the model's weight size in GPU memory).

#### byGpuMemoryGb

Tiers are evaluated in ascending `gpu_memory_le` order. The first tier where the model's estimated GPU memory requirement is less than or equal to `gpu_memory_le` is selected.

| gpu_memory_le (GB) | CPUs | GPUs | Memory | Storage | GPU Type | nodes_per_worker |
|---|---|---|---|---|---|---|
| 24 | 10 | 1 | 24Gi | 80Gi | nvidia-ampere-a10 | 1 |
| 48 | 20 | 2 | 48Gi | 80Gi | nvidia-ampere-a10 | 1 |
| 96 | 40 | 4 | 96Gi | 96Gi | nvidia-ampere-a10 | 1 |
| 180 | 20 | 2 | 160Gi | 160Gi | nvidia-hopper-h100 | 1 |
| 320 | 40 | 4 | 320Gi | 320Gi | nvidia-hopper-h100 | 1 |
| 640 | 80 | 8 | 800Gi | 640Gi | nvidia-hopper-h100 | 1 |
| 640 | 80 | 8 | 800Gi | 640Gi | nvidia-hopper-h100 | 2 |

#### byModelName

Exact overrides by model name slug. Takes precedence over `byGpuMemoryGb`.

| Model Name | CPUs | GPUs | Memory | Storage | GPU Type | nodes_per_worker |
|---|---|---|---|---|---|---|
| llama-3-8b-instruct-262k | 20 | 2 | 40Gi | 40Gi | nvidia-hopper-h100 | 1 |
| deepseek-coder-v2 | 160 | 8 | 800Gi | 640Gi | nvidia-hopper-h100 | 1 |
| deepseek-coder-v2-instruct | 160 | 8 | 800Gi | 640Gi | nvidia-hopper-h100 | 1 |
```yaml
recommendedHardware:
  byGpuMemoryGb:
    - gpu_memory_le: 24
      cpus: 10
      gpus: 1
      memory: 24Gi
      storage: 80Gi
      gpu_type: nvidia-ampere-a10
      nodes_per_worker: 1
    - gpu_memory_le: 48
      cpus: 20
      gpus: 2
      memory: 48Gi
      storage: 80Gi
      gpu_type: nvidia-ampere-a10
      nodes_per_worker: 1
    # ... additional tiers
  byModelName:
    - name: deepseek-coder-v2-instruct
      cpus: 160
      gpus: 8
      memory: 800Gi
      storage: 640Gi
      gpu_type: nvidia-hopper-h100
      nodes_per_worker: 1
```
### Control-Plane Node Selector, Tolerations, and Affinity

These apply to the LLM Engine control-plane deployments (gateway, cacher, builder) — not to inference endpoint pods.

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `nodeSelector` | map | `{node-lifecycle: normal}` | No | Node selector for control-plane pods. The default pins to on-demand nodes |
| `tolerations` | list | `[]` | No | Tolerations for control-plane pods |
| `affinity` | map | `{}` | No | Affinity rules for control-plane pods |
```yaml
nodeSelector:
  node-lifecycle: normal
  kubernetes.io/arch: amd64
tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "llm-engine"
    effect: "NoSchedule"
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: llm-engine-gateway
          topologyKey: kubernetes.io/hostname
```
## 4. Autoscaling

### Gateway Horizontal Pod Autoscaler

The HPA governs the number of gateway replicas based on concurrent request load. This applies only to the control-plane gateway deployment, not to inference endpoint pods.

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `autoscaling.horizontal.enabled` | bool | `true` | No | Enable the HPA for the gateway deployment |
| `autoscaling.horizontal.minReplicas` | integer | `2` | No | Minimum number of gateway replicas |
| `autoscaling.horizontal.maxReplicas` | integer | `10` | No | Maximum number of gateway replicas |
| `autoscaling.horizontal.targetConcurrency` | integer | `50` | No | Target average concurrent requests per replica before scaling out |
| `autoscaling.vertical.enabled` | bool | `false` | No | Enable the Vertical Pod Autoscaler (VPA) for control-plane deployments. Requires the VPA operator to be installed in the cluster |
| `autoscaling.prewarming.enabled` | bool | `false` | No | Enable endpoint pre-warming (reserved for future use) |
### Celery Autoscaler

For async (queue-backed) endpoints, a separate Celery autoscaler process monitors queue depth and scales inference pods accordingly.

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `celery_autoscaler.enabled` | bool | `true` | No | Enable the Celery autoscaler for async endpoint scaling |
| `celery_autoscaler.num_shards` | integer | `3` | No | Number of autoscaler shard instances. More shards reduce per-shard queue-watching load at high endpoint counts |
### KEDA (Scale-to-Zero for Sync Endpoints)

KEDA enables sync/streaming endpoints to scale from zero replicas to one when the first request arrives. This is distinct from the HPA (which cannot scale below `minReplicas`).

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `keda.cooldownPeriod` | integer | `300` | No | Seconds KEDA waits after the last request before scaling a sync endpoint down to zero |
| `config.values.infra.prometheus_server_address` | string | unset | Yes (for KEDA) | Address of the Prometheus server that KEDA queries for endpoint request metrics |

> **KEDA requires Prometheus**
> `config.values.infra.prometheus_server_address` must be set for KEDA scale-to-zero to function. If it is unset, sync endpoints with `min_workers=0` will silently fail to scale up from zero: the endpoint will appear healthy but all requests will hang until it is manually scaled.

> **KEDA vs HPA: mutual exclusivity**
> KEDA and the HPA are mutually exclusive per endpoint. When an endpoint has `min_workers=0`, KEDA manages scaling from 0 to 1; it does not replace the HPA for scaling beyond 1 replica. Once at 1+ replicas, the HPA (if configured) takes over scaling above 1. Do not configure both KEDA and the HPA to manage the same endpoint's replica range.
```yaml
autoscaling:
  horizontal:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetConcurrency: 50
  vertical:
    enabled: false
celery_autoscaler:
  enabled: true
  num_shards: 3
keda:
  cooldownPeriod: 300
config:
  values:
    infra:
      prometheus_server_address: "http://prometheus-server.istio-system.svc.cluster.local:80"
```
## 5. Networking

### Istio VirtualService and DestinationRule

LLM Engine uses Istio for traffic routing when `config.values.launch.istio_enabled` is `true`. The VirtualService routes external traffic to the gateway service; the DestinationRule configures connection pooling and outlier detection.

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `virtualservice.enabled` | bool | `true` | No | Create an Istio VirtualService for the gateway |
| `virtualservice.hostDomains` | list | `[llm-engine.domain.com]` | Yes (if enabled) | List of hostnames this VirtualService responds to. Must match your Istio gateway configuration |
| `virtualservice.gateways` | list | `[default/internal-gateway]` | Yes (if enabled) | Istio Gateway resources to attach this VirtualService to. Format: `<namespace>/<gateway-name>` |
| `virtualservice.annotations` | map | `{}` | No | Additional annotations for the VirtualService resource |
| `destinationrule.enabled` | bool | `true` | No | Create an Istio DestinationRule for the gateway service |
| `destinationrule.annotations` | map | `{}` | No | Additional annotations for the DestinationRule resource |
| `hostDomain.prefix` | string | `http://` | No | URL scheme prefix used when constructing endpoint host URLs. Set to `https://` for TLS-terminated clusters |
| `service.type` | string | `ClusterIP` | No | Kubernetes Service type for the gateway. Use `ClusterIP` with Istio; change to `LoadBalancer` only if managing ingress outside Istio |
| `service.port` | integer | `80` | No | Port exposed by the gateway Kubernetes Service |
| `config.values.launch.istio_enabled` | bool | `true` | No | Whether the Istio service mesh is active. When `false`, VirtualService/DestinationRule resources are not used and direct service routing applies |
```yaml
virtualservice:
  enabled: true
  hostDomains:
    - llm-engine.example.com
  gateways:
    - default/internal-gateway
destinationrule:
  enabled: true
hostDomain:
  prefix: https://
service:
  type: ClusterIP
  port: 80
config:
  values:
    launch:
      istio_enabled: true
```
### Redis TLS and Authentication

These values control TLS and authentication for the Redis connection used by KEDA and endpoint metrics.

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `redis.enableTLS` | bool | `false` | No | Enable TLS for the Redis connection. Required for Azure Cache for Redis (port 6380) and any Redis with TLS enforced |
| `redis.enableAuth` | bool | `false` | No | Enable password/token authentication for Redis. Required when the Redis cluster has AUTH configured |
| `redis.auth` | string | `null` | No | Redis AUTH password or token. Only used when `enableAuth: true`. Store this in a Kubernetes Secret rather than directly in values |
| `redis.kedaSecretName` | string | `""` | No | Name of a Kubernetes Secret containing Redis credentials for KEDA's ScaledObject. KEDA reads this directly; leave empty to use unauthenticated Redis |
| `redis.unsafeSsl` | bool | `false` | No | Skip TLS certificate verification. Use only in development environments with self-signed certificates |
```yaml
redis:
  enableTLS: true
  enableAuth: true
  auth: ""  # set via --set redis.auth=$REDIS_TOKEN or from a secret
  kedaSecretName: "keda-redis-secret"
  unsafeSsl: false
```
## 6. Observability

### Datadog Integration

Datadog APM is controlled by two trace toggles that must be set consistently, plus `datadog.enabled` for the agent socket.

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `datadog.enabled` | bool | `false` | No | Mount the Datadog agent socket and inject Datadog environment variables into control-plane pods. Requires the Datadog agent DaemonSet to be running on the cluster |
| `dd_trace_enabled` | bool | `true` | No | Top-level Helm toggle that controls whether the `DD_TRACE_ENABLED` environment variable is set to `true` in control-plane containers |
| `config.values.launch.dd_trace_enabled` | bool | `false` | No | Service-config-level toggle that controls whether the application code initializes the `ddtrace` library at startup. Must match `dd_trace_enabled` to avoid partial tracing |

> **Two toggles, one feature**
> `dd_trace_enabled` (top-level) and `config.values.launch.dd_trace_enabled` are independent toggles that together control Datadog APM. Setting only one of them produces a broken state: traces may be emitted but not received, or the agent socket may be mounted but no spans generated. Always set both to the same value.
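As a sketch, a consistent Datadog configuration sets both trace toggles to the same value (shown enabled here; `datadog.enabled` additionally assumes the Datadog agent DaemonSet is running on the cluster):

```yaml
datadog:
  enabled: true           # mounts the agent socket and injects DD env vars
dd_trace_enabled: true    # sets DD_TRACE_ENABLED=true in control-plane containers
config:
  values:
    launch:
      dd_trace_enabled: true  # application-level ddtrace initialization; must match the toggle above
```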
### Logging

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `config.values.launch.sensitive_log_mode` | bool | `false` | No | When `true`, suppresses logging of request/response payloads and other PII-containing fields. Enable in customer environments that process sensitive data |
| `debug_mode` | bool/null | `null` | No | Enables verbose debug logging across infrastructure components (gateway, cacher, builder). Produces high log volume — use only for troubleshooting |
```yaml
config:
  values:
    launch:
      sensitive_log_mode: true
debug_mode: null  # set to true only during active debugging
```
## 7. Security / Compliance

### Pod Security Context

The pod security context applies to all containers within a pod and controls user/group identity and filesystem permissions. Uncomment and set these values when using a hardened base image (e.g., Chainguard).

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `podSecurityContext.runAsUser` | integer | unset | No | UID to run all containers as. Chainguard images use `65532` (nonroot) |
| `podSecurityContext.runAsGroup` | integer | unset | No | GID to run all containers as |
| `podSecurityContext.runAsNonRoot` | bool | unset | No | Enforce that no container runs as UID 0. Set to `true` for all production deployments |
| `podSecurityContext.fsGroup` | integer | unset | No | GID for volume mounts. Set to match `runAsGroup` so mounted Secrets and ConfigMaps are readable |
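A hardened-image sketch, assuming a Chainguard-style nonroot image (UID/GID `65532`); adjust the IDs to match your base image:

```yaml
podSecurityContext:
  runAsUser: 65532      # nonroot user in Chainguard images
  runAsGroup: 65532
  runAsNonRoot: true    # refuse to run any container as UID 0
  fsGroup: 65532        # keeps mounted Secrets/ConfigMaps readable
```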
### Container Security Context

The container security context applies to each individual container and controls Linux capabilities and filesystem access.

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `containerSecurityContext.allowPrivilegeEscalation` | bool | unset | No | Prevent the process from gaining additional privileges via setuid/setgid. Set to `false` in all production deployments |
| `containerSecurityContext.readOnlyRootFilesystem` | bool | unset | No | Mount the container root filesystem as read-only. Set to `false` if the application writes to `/tmp` or other paths on the root filesystem |
| `containerSecurityContext.capabilities.drop` | list | unset | No | Linux capabilities to drop. Set to `["ALL"]` to remove all capabilities and then add back only what is needed |
```yaml
containerSecurityContext:
  allowPrivilegeEscalation: false
  readOnlyRootFilesystem: false
  capabilities:
    drop:
      - ALL
```
### Inference Pod Security (serviceTemplate)

These values apply to the inference endpoint pods created by the builder — not to the control-plane pods. They are injected into each endpoint's pod spec via the service template.

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `serviceTemplate.securityContext.capabilities.drop` | list | `["all"]` | No | Linux capabilities to drop from inference containers. The default drops all capabilities |
| `serviceTemplate.mountInfraConfig` | bool | `true` | No | Mount the infra ConfigMap into inference pods. Required for the endpoint to read cloud configuration |
| `serviceTemplate.createServiceAccount` | bool | `true` | No | Create a dedicated Kubernetes ServiceAccount for inference pods in the endpoint namespace |
| `serviceTemplate.serviceAccountName` | string | `model-engine` | No | Name of the ServiceAccount created for inference pods |
| `serviceTemplate.serviceAccountAnnotations` | map | — | No | Annotations for the inference pod ServiceAccount. On EKS, set `eks.amazonaws.com/role-arn` to the inference IAM role |
```yaml
serviceTemplate:
  securityContext:
    capabilities:
      drop:
        - all
  mountInfraConfig: true
  createServiceAccount: true
  serviceAccountName: model-engine
  serviceAccountAnnotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/llm-engine
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "-2"
```
### FIPS / Federal Compliance

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `celery_enable_sha256` | bool/null | `null` | No | When `true`, forces Celery to use SHA-256 message signing instead of the default SHA-1. Required in FIPS-mode environments and any environment with federal compliance mandates (FedRAMP, IL4/IL5) |

> **Coordinated rollout required for `celery_enable_sha256`**
> Changing `celery_enable_sha256` requires a coordinated rollout. In-flight Celery tasks signed with SHA-1 cannot be verified by workers expecting SHA-256, and vice versa. During the transition window, drain all queues before deploying new workers. Rolling updates without draining will cause task signature verification failures and silently dropped async requests.
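A sketch for a FIPS-mode environment; per the warning above, apply this only after draining all Celery queues:

```yaml
# Enable SHA-256 Celery message signing (FIPS / FedRAMP / IL4-IL5 environments).
celery_enable_sha256: true
```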
## 8. Replica and Resource Tuning

### Replica Counts

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `replicaCount.gateway` | integer | `2` | No | Number of gateway replicas. Minimum 2 for production HA. Overridden by the HPA when `autoscaling.horizontal.enabled: true` |
| `replicaCount.cacher` | integer | `1` | No | Number of cacher replicas. The cacher maintains a local cache of Kubernetes state (endpoint pods, services). A single replica is usually sufficient |
| `replicaCount.builder` | integer | `1` | No | Number of builder replicas. The builder handles endpoint creation and image build jobs. A single replica is usually sufficient |

### Resources

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `resources.requests.cpu` | string/integer | `2` | No | CPU request for control-plane pods (gateway, cacher, builder) |
| `resources.requests.memory` | string | unset | No | Memory request for control-plane pods |
| `resources.limits.cpu` | string/integer | unset | No | CPU limit for control-plane pods |
| `resources.limits.memory` | string | unset | No | Memory limit for control-plane pods |
```yaml
replicaCount:
  gateway: 2
  cacher: 1
  builder: 1
resources:
  requests:
    cpu: 2
    memory: 4Gi
  limits:
    cpu: 4
    memory: 8Gi
```
### Pod Disruption Budget

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `podDisruptionBudget.enabled` | bool | `true` | No | Create a PodDisruptionBudget for the gateway deployment to ensure availability during node drains and rolling updates |
| `podDisruptionBudget.minAvailable` | integer/string | `1` | No | Minimum number (or percentage) of gateway pods that must remain available during voluntary disruptions |
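A sketch matching the defaults; `minAvailable` also accepts a percentage string:

```yaml
podDisruptionBudget:
  enabled: true
  minAvailable: 1   # or e.g. "50%"
```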
### Database Initialization

| Value | Type | Default (`values.yaml`) | Default (`values_sample.yaml`) | Required | Description |
|---|---|---|---|---|---|
| `db.runDbMigrationScript` | bool | `true` | `false` | Yes on first install | Run Alembic schema migrations as a pre-install/pre-upgrade Job. Must be `true` on first install or the database schema will not be initialized |
| `db.runDbInitScript` | bool | `false` | `false` | No | Run the database initialization script (seed data). Only needed on fresh installs that require initial seed data |

> **First install: set `runDbMigrationScript: true`**
> `values_sample.yaml` ships with `db.runDbMigrationScript: false`. On a brand-new install, the database schema does not exist yet. Without migrations, model creation will fail with cryptic PostgreSQL errors about missing tables. Always override this to `true` on first install. After the initial migration, subsequent upgrades will apply incremental migrations automatically when set to `true`.
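As a sketch, a first-install override file (passed with an extra `-f` flag, or expressed as equivalent `--set` flags) looks like this:

```yaml
# first-install-overrides.yaml -- apply for the initial install
db:
  runDbMigrationScript: true   # initialize the schema via the Alembic migration Job
  runDbInitScript: false       # enable only if seed data is required
```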
### Database Engine Tuning

These values tune the SQLAlchemy connection pool. The defaults are appropriate for most deployments. Increase `pool_size` and `max_overflow` only when you observe connection exhaustion errors under high gateway concurrency.

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `config.values.infra.db_engine_pool_size` | integer | `10` | No | Number of persistent connections in the SQLAlchemy connection pool per process |
| `config.values.infra.db_engine_max_overflow` | integer | `10` | No | Maximum number of connections allowed above `pool_size`. Total max connections = `pool_size` + `max_overflow` |
| `config.values.infra.db_engine_echo` | bool | `false` | No | Log all SQL statements. Produces extremely high log volume — use only for debugging SQL query issues |
| `config.values.infra.db_engine_echo_pool` | bool | `false` | No | Log all connection pool events (checkout, checkin, overflow). Use only for debugging connection pool exhaustion |
| `config.values.infra.db_engine_disconnect_strategy` | string | `pessimistic` | No | Strategy for detecting stale/broken connections. `pessimistic` tests the connection before each use (safe but adds a small latency). `optimistic` assumes connections are valid until proven otherwise |
```yaml
config:
  values:
    infra:
      db_engine_pool_size: 10
      db_engine_max_overflow: 10
      db_engine_echo: false
      db_engine_echo_pool: false
      db_engine_disconnect_strategy: "pessimistic"
```
### LLM Inference Image Repositories

These values specify the Docker repository paths for each supported inference backend. They are combined with `config.values.infra.docker_repo_prefix` at endpoint creation time to form the full image URI.

> **`vllm_repository`: always override in customer environments**
> The default value `vllm` is a short relative path that resolves to Scale's internal ECR registry when combined with Scale's `docker_repo_prefix`. In customer environments with a different registry prefix, endpoint pods will attempt to pull from a non-existent or inaccessible image path. The pods will appear to be `INITIALIZING` with no clear error. Always set `vllm_repository` to the full repository path or a prefix-relative path that exists in your registry.

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `config.values.launch.vllm_repository` | string | `vllm` | Yes | Repository path for vLLM inference images. This is the most commonly used inference backend |
| `config.values.launch.tensorrt_llm_repository` | string | `tensorrt-llm` | No | Repository path for TensorRT-LLM inference images |
| `config.values.launch.batch_inference_vllm_repository` | string | `llm-engine/batch-infer-vllm` | No | Repository path for batch inference images (used by batch completion endpoints) |
| `config.values.launch.tgi_repository` | string | `text-generation-inference` | No | Repository path for HuggingFace Text Generation Inference images |
| `config.values.launch.lightllm_repository` | string | `lightllm` | No | Repository path for LightLLM inference images |
| `config.values.launch.sglang_repository` | string | `null` | No | Repository path for SGLang inference images. Optional; leave unset if SGLang is not used |
| `config.values.launch.user_inference_base_repository` | string | `launch/inference` | No | Base repository for custom user-defined inference images |
| `config.values.launch.user_inference_pytorch_repository` | string | `launch/inference/pytorch` | No | Repository for custom PyTorch inference images |
| `config.values.launch.user_inference_tensorflow_repository` | string | `launch/inference/tf` | No | Repository for custom TensorFlow inference images |
| `config.values.launch.docker_image_layer_cache_repository` | string | `launch-docker-build-cache` | No | Repository used as a layer cache during Docker image builds for custom endpoints |
```yaml
config:
  values:
    launch:
      # Always override these in customer environments
      vllm_repository: "111122223333.dkr.ecr.us-east-1.amazonaws.com/vllm"
      tensorrt_llm_repository: "111122223333.dkr.ecr.us-east-1.amazonaws.com/tensorrt-llm"
      batch_inference_vllm_repository: "111122223333.dkr.ecr.us-east-1.amazonaws.com/llm-engine/batch-infer-vllm"
      tgi_repository: "111122223333.dkr.ecr.us-east-1.amazonaws.com/text-generation-inference"
      lightllm_repository: "111122223333.dkr.ecr.us-east-1.amazonaws.com/lightllm"
      user_inference_base_repository: "111122223333.dkr.ecr.us-east-1.amazonaws.com/launch/inference"
      user_inference_pytorch_repository: "111122223333.dkr.ecr.us-east-1.amazonaws.com/launch/inference/pytorch"
      user_inference_tensorflow_repository: "111122223333.dkr.ecr.us-east-1.amazonaws.com/launch/inference/tf"
      docker_image_layer_cache_repository: "111122223333.dkr.ecr.us-east-1.amazonaws.com/launch-docker-build-cache"
```
### Fine-Tuning Storage

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `config.values.launch.s3_file_llm_fine_tuning_job_repository` | string | `s3://llm-engine/llm-ft-job-repository` | Yes | S3 URI (or equivalent) where fine-tuning job artifacts (checkpoints, adapters) are stored |
| `config.values.launch.hf_user_fine_tuned_weights_prefix` | string | `s3://llm-engine/fine_tuned_weights` | Yes | S3 URI prefix for storing user-uploaded fine-tuned model weights |
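A sketch pointing both values at the bucket used in the minimal config above (the bucket name is illustrative):

```yaml
config:
  values:
    launch:
      s3_file_llm_fine_tuning_job_repository: "s3://my-llm-engine-bucket/llm-ft-job-repository"
      hf_user_fine_tuned_weights_prefix: "s3://my-llm-engine-bucket/fine_tuned_weights"
```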
### Image Pull Policy

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `image.pullPolicy` | string | `Always` | No | Kubernetes image pull policy for all control-plane images. `Always` re-pulls the image on every pod start. Set to `IfNotPresent` to avoid redundant pulls when using immutable tags |
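For example, when deploying immutable tags (a sketch):

```yaml
image:
  pullPolicy: IfNotPresent   # avoids redundant pulls; safe only with immutable tags
```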
### AWS ConfigMap

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `aws.configMap.name` | string | `default-config` | No | Name of the Kubernetes ConfigMap containing the AWS CLI configuration |
| `aws.configMap.create` | bool | `true` | No | Whether to create the AWS ConfigMap as part of the Helm release |
| `aws.profileName` | string | `default` | No | AWS profile name to use from the ConfigMap |
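A sketch matching the defaults:

```yaml
aws:
  configMap:
    name: default-config
    create: true
  profileName: default
```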
### Image Builder Service Account

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `imageBuilderServiceAccount.create` | bool | `true` | No | Create a dedicated ServiceAccount for the image builder. This account needs ECR push/pull permissions |
| `imageBuilderServiceAccount.annotations` | map | — | No | Annotations for the image builder ServiceAccount. On EKS, set `eks.amazonaws.com/role-arn` to a role with ECR permissions |
```yaml
imageBuilderServiceAccount:
  create: true
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::111122223333:role/k8s-main-llm-engine-image-builder
```
### Miscellaneous

| Value | Type | Default | Required | Description |
|---|---|---|---|---|
| `spellbook.enabled` | bool | `false` | No | Enable Spellbook integration. Reserved for Scale internal use |
| `context` | string | `production` | No | Deployment context tag. Used for labeling and log correlation. Set to a meaningful environment name (e.g., `staging`, `production`, `customer-prod`) |
| `celery_broker_type_redis` | bool/null | `null` | No | When `true`, forces the Celery broker to use Redis regardless of the `celeryBrokerType` value. Useful for on-prem and GCP deployments where SQS is unavailable |
| `keyvaultName` | string | `llm-engine-keyvault` | No | Azure Key Vault name. Only used when `cloud_provider: azure` |
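A sketch for a typical customer environment (the `context` value is illustrative):

```yaml
spellbook:
  enabled: false        # Scale internal; leave disabled in customer environments
context: customer-prod  # environment label used for labeling and log correlation
```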