🐍 Python Client Data Type Reference¶
CompletionOutput ¶
Bases: BaseModel
Represents the output of a completion request to a model.
CompletionStreamOutput ¶
CompletionSyncResponse ¶
Bases: BaseModel
Response object for a synchronous prompt completion.
request_id
instance-attribute
¶
The unique ID of the corresponding Completion request. This request_id is generated on the server, and all logs associated with the request are grouped by the request_id, which allows for easier troubleshooting of errors as follows:
- When running the Scale-hosted LLM Engine, please provide the request_id in any bug reports.
- When running the self-hosted LLM Engine, the request_id serves as a trace ID in your observability provider.
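For self-hosted deployments, a minimal sketch of keying client-side error logs by the server-generated request_id, so failures can be correlated with traces in an observability provider (the helper name is hypothetical, not part of the client):

```python
def error_log_line(request_id: str, message: str) -> str:
    # Key the log line by the server-generated request_id so it can be
    # correlated with traces, or quoted in a bug report (hypothetical helper).
    return f"request_id={request_id} error={message}"

line = error_log_line("req-123", "timeout")
```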
CompletionStreamResponse ¶
Bases: BaseModel
Response object for a stream prompt completion task.
request_id
instance-attribute
¶
The unique ID of the corresponding Completion request. This request_id is generated on the server, and all logs associated with the request are grouped by the request_id, which allows for easier troubleshooting of errors as follows:
- When running the Scale-hosted LLM Engine, please provide the request_id in any bug reports.
- When running the self-hosted LLM Engine, the request_id serves as a trace ID in your observability provider.
output
class-attribute
instance-attribute
¶
Completion output.
CreateFineTuneResponse ¶
Bases: BaseModel
Response object for creating a FineTune.
id
class-attribute
instance-attribute
¶
The ID of the FineTune.
GetFineTuneResponse ¶
Bases: BaseModel
Response object for retrieving a FineTune.
id
class-attribute
instance-attribute
¶
The ID of the FineTune.
fine_tuned_model
class-attribute
instance-attribute
¶
fine_tuned_model: Optional[str] = Field(
default=None,
description="Name of the resulting fine-tuned model. This can be plugged into the Completion API once the fine-tune is complete",
)
The name of the resulting fine-tuned model. This can be plugged into the Completion API once the fine-tune is complete.
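As a sketch of the intended flow, fine_tuned_model stays None until the fine-tune completes, after which its value can be passed as the model name to the Completion API. A plain dataclass stands in for the pydantic model here (field names match the reference above; the example model name is hypothetical):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GetFineTuneResponse:  # illustration-only stand-in for the pydantic model
    id: str
    fine_tuned_model: Optional[str] = None

running = GetFineTuneResponse(id="ft-abc")  # fine-tune still in progress
done = GetFineTuneResponse(id="ft-abc", fine_tuned_model="llama-2-7b.ft-abc")  # hypothetical name

def completion_model(job: GetFineTuneResponse) -> Optional[str]:
    # Only a completed fine-tune exposes a model name usable by the Completion API.
    return job.fine_tuned_model
```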
ListFineTunesResponse ¶
Bases: BaseModel
Response object for listing FineTunes.
jobs
class-attribute
instance-attribute
¶
jobs: List[GetFineTuneResponse] = Field(
...,
description="List of fine-tuning jobs and their statuses.",
)
A list of FineTunes, represented as GetFineTuneResponse objects.
CancelFineTuneResponse ¶
Bases: BaseModel
Response object for cancelling a FineTune.
success
class-attribute
instance-attribute
¶
Whether the cancellation succeeded.
GetLLMEndpointResponse ¶
Bases: BaseModel
Response object for retrieving a Model.
name
class-attribute
instance-attribute
¶
name: str = Field(
description="The name of the model. Use this for making inference requests to the model."
)
The name of the model. Use this for making inference requests to the model.
source
class-attribute
instance-attribute
¶
The source of the model, e.g. Hugging Face.
inference_framework
class-attribute
instance-attribute
¶
inference_framework: LLMInferenceFramework = Field(
description="The inference framework used by the model."
)
(For self-hosted users) The inference framework used by the model.
id
class-attribute
instance-attribute
¶
id: Optional[str] = Field(
default=None,
description="(For self-hosted users) The autogenerated ID of the model.",
)
(For self-hosted users) The autogenerated ID of the model.
model_name
class-attribute
instance-attribute
¶
model_name: Optional[str] = Field(
default=None,
description="(For self-hosted users) For fine-tuned models, the base model. For base models, this will be the same as `name`.",
)
(For self-hosted users) For fine-tuned models, the base model. For base models, this will be the same as name.
status
class-attribute
instance-attribute
¶
The status of the model (can be one of "READY", "UPDATE_PENDING", "UPDATE_IN_PROGRESS", "UPDATE_FAILED", "DELETE_IN_PROGRESS").
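A small sketch of how a client might treat these statuses when polling; the split into transitional vs. settled states is an assumption for illustration, not part of the API:

```python
# Statuses that indicate an operation is still in flight (assumption:
# READY and UPDATE_FAILED are treated as settled for polling purposes).
TRANSITIONAL = {"UPDATE_PENDING", "UPDATE_IN_PROGRESS", "DELETE_IN_PROGRESS"}

def is_settled(status: str) -> bool:
    # True once the endpoint has stopped transitioning and polling can stop.
    return status not in TRANSITIONAL
```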
inference_framework_tag
class-attribute
instance-attribute
¶
inference_framework_tag: Optional[str] = Field(
default=None,
description="(For self-hosted users) The Docker image tag used to run the model.",
)
(For self-hosted users) The Docker image tag used to run the model.
num_shards
class-attribute
instance-attribute
¶
num_shards: Optional[int] = Field(
default=None,
description="(For self-hosted users) The number of shards.",
)
(For self-hosted users) The number of shards.
quantize
class-attribute
instance-attribute
¶
quantize: Optional[Quantization] = Field(
default=None,
description="(For self-hosted users) The quantization method.",
)
(For self-hosted users) The quantization method.
spec
class-attribute
instance-attribute
¶
spec: Optional[GetModelEndpointResponse] = Field(
default=None,
description="(For self-hosted users) Model endpoint details.",
)
(For self-hosted users) Model endpoint details.
ListLLMEndpointsResponse ¶
Bases: BaseModel
Response object for listing Models.
model_endpoints
class-attribute
instance-attribute
¶
A list of Models, represented as GetLLMEndpointResponse objects.
DeleteLLMEndpointResponse ¶
Bases: BaseModel
Response object for deleting a Model.
deleted
class-attribute
instance-attribute
¶
Whether the deletion succeeded.
ModelDownloadRequest ¶
Bases: BaseModel
Request object for downloading a model.
ModelDownloadResponse ¶
Bases: BaseModel
Response object for downloading a model.
urls
class-attribute
instance-attribute
¶
urls: Dict[str, str] = Field(
...,
description="Dictionary of (file_name, url) pairs to download the model from.",
)
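A sketch of consuming the urls field: pair each server-provided (file_name, url) entry with a local destination path before fetching. The actual download step (e.g. via urllib.request.urlretrieve) is omitted, and the URLs below are hypothetical:

```python
import os

def plan_downloads(urls: dict, dest_dir: str) -> list:
    # Pair each remote URL with a local path under dest_dir, keeping the
    # server-provided file names. Returns (url, local_path) tuples.
    return [(url, os.path.join(dest_dir, name)) for name, url in sorted(urls.items())]

plan = plan_downloads(
    {"model.safetensors": "https://example.com/m",  # hypothetical URLs
     "config.json": "https://example.com/c"},
    "/tmp/model",
)
```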
UploadFileResponse ¶
Bases: BaseModel
Response object for uploading a file.
id
class-attribute
instance-attribute
¶
ID of the uploaded file.
GetFileResponse ¶
GetFileContentResponse ¶
Bases: BaseModel
Response object for retrieving a file's content.
ListFilesResponse ¶
Bases: BaseModel
Response object for listing files.
files
class-attribute
instance-attribute
¶
List of file IDs, names, and sizes.
DeleteFileResponse ¶
Bases: BaseModel
Response object for deleting a file.
deleted
class-attribute
instance-attribute
¶
Whether deletion was successful.
CreateBatchCompletionsRequestContent ¶
Bases: BaseModel
temperature
class-attribute
instance-attribute
¶
Sampling temperature. Setting it to 0 is equivalent to greedy sampling.
stop_sequences
class-attribute
instance-attribute
¶
List of sequences to stop the completion at.
return_token_log_probs
class-attribute
instance-attribute
¶
Whether to return the log probabilities of the tokens.
presence_penalty
class-attribute
instance-attribute
¶
Only supported in vllm and lightllm. Penalizes new tokens based on whether they appear in the text so far; 0.0 means no penalty.
frequency_penalty
class-attribute
instance-attribute
¶
Only supported in vllm and lightllm. Penalizes new tokens based on their existing frequency in the text so far; 0.0 means no penalty.
top_k
class-attribute
instance-attribute
¶
Controls the number of top tokens to consider. -1 means consider all tokens.
top_p
class-attribute
instance-attribute
¶
Controls the cumulative probability of the top tokens to consider. 1.0 means consider all tokens.
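To illustrate how these two knobs interact, here is a minimal sketch of top-k/top-p filtering over a toy token distribution (illustrative only; the server-side sampler is more involved):

```python
def filter_candidates(probs: dict, top_k: int, top_p: float) -> list:
    # Rank tokens by probability. top_k == -1 means no limit; otherwise keep
    # only the k most likely tokens before applying the nucleus (top_p) cut.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k != -1:
        ranked = ranked[:top_k]
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:  # smallest prefix reaching the cumulative mass
            break
    return kept
```

For example, with probabilities {"a": 0.5, "b": 0.3, "c": 0.2}, top_p=0.5 keeps only "a", while top_k=2 with top_p=1.0 keeps "a" and "b".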
CreateBatchCompletionsModelConfig ¶
Bases: BaseModel
checkpoint_path
class-attribute
instance-attribute
¶
Path to the checkpoint to load the model from.
num_shards
class-attribute
instance-attribute
¶
Suggested number of shards across which to distribute the model. When not specified, the number of shards is inferred from the model config; the system may use a different number than the given value.
quantize
class-attribute
instance-attribute
¶
Whether to quantize the model.
CreateBatchCompletionsRequest ¶
Bases: BaseModel
Request object for batch completions.
output_data_path
instance-attribute
¶
Path to the output file. The output file will be a JSON file of type List[CompletionOutput].
content
class-attribute
instance-attribute
¶
Either input_data_path or content needs to be provided.
When input_data_path is provided, the input file should be a JSON file of type CreateBatchCompletionsRequestContent.
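A sketch of the either/or constraint as client-side validation (hypothetical helper; the real check lives in the pydantic model/server):

```python
def validate_input_source(content, input_data_path) -> bool:
    # At least one input source must be provided: inline content, or a path
    # to a JSON file of CreateBatchCompletionsRequestContent (hypothetical helper).
    if content is None and input_data_path is None:
        raise ValueError("Either `content` or `input_data_path` must be provided.")
    return True
```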
model_config
instance-attribute
¶
Model configuration for the batch inference. Hardware configurations are inferred.
data_parallelism
class-attribute
instance-attribute
¶
Number of replicas to run the batch inference with. More replicas take longer to schedule but complete inference faster.
max_runtime_sec
class-attribute
instance-attribute
¶
Maximum runtime of the batch inference in seconds. Defaults to one day.
tool_config
class-attribute
instance-attribute
¶
Configuration for tool use. NOTE: this config is highly experimental, and its signature will change significantly in future iterations.