🐍 Python Client Data Type Reference

CompletionOutput

Bases: BaseModel

Represents the output of a completion request to a model.

text instance-attribute

text: str

The text of the completion.

num_prompt_tokens class-attribute instance-attribute

num_prompt_tokens: Optional[int] = None

Number of tokens in the prompt.

num_completion_tokens instance-attribute

num_completion_tokens: int

Number of tokens in the completion.

CompletionStreamOutput

Bases: BaseModel

text instance-attribute

text: str

The text of the completion.

finished instance-attribute

finished: bool

Whether the completion is finished.

num_prompt_tokens class-attribute instance-attribute

num_prompt_tokens: Optional[int] = None

Number of tokens in the prompt.

num_completion_tokens class-attribute instance-attribute

num_completion_tokens: Optional[int] = None

Number of tokens in the completion.
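When consuming a stream, a client typically appends each chunk's text until a chunk arrives with finished set to True. A minimal sketch, using plain dicts standing in for CompletionStreamOutput (the field names match the model above; the chunk values themselves are invented):

```python
def accumulate_stream(chunks):
    """Concatenate streamed completion text until a chunk reports finished=True."""
    pieces = []
    for chunk in chunks:
        pieces.append(chunk["text"])
        if chunk["finished"]:
            # num_completion_tokens is Optional and may only be set on the final chunk.
            return "".join(pieces), chunk.get("num_completion_tokens")
    return "".join(pieces), None

# Hypothetical stream of three chunks.
chunks = [
    {"text": "Hello", "finished": False, "num_completion_tokens": None},
    {"text": ", world", "finished": False, "num_completion_tokens": None},
    {"text": "!", "finished": True, "num_completion_tokens": 3},
]
text, n_tokens = accumulate_stream(chunks)
print(text)      # Hello, world!
print(n_tokens)  # 3
```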

CompletionSyncResponse

Bases: BaseModel

Response object for a synchronous prompt completion.

request_id instance-attribute

request_id: str

The unique ID of the corresponding Completion request. This request_id is generated on the server, and all logs associated with the request are grouped by it, which makes troubleshooting errors easier:

  • When running the Scale-hosted LLM Engine, please provide the request_id in any bug reports.
  • When running the self-hosted LLM Engine, the request_id serves as a trace ID in your observability provider.

output instance-attribute

output: CompletionOutput

Completion output.
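Reading a sync response amounts to pulling output.text and, when both token counts are present, summing them. A sketch over a dict with the same shape as CompletionSyncResponse (the request ID and values are invented):

```python
def total_tokens(response):
    """Prompt + completion token count, or None when the prompt count is absent."""
    out = response["output"]
    if out["num_prompt_tokens"] is None:  # num_prompt_tokens is Optional
        return None
    return out["num_prompt_tokens"] + out["num_completion_tokens"]

response = {
    "request_id": "req-123",  # invented ID
    "output": {
        "text": "The answer is 42.",
        "num_prompt_tokens": 7,
        "num_completion_tokens": 6,
    },
}
print(response["output"]["text"])  # The answer is 42.
print(total_tokens(response))      # 13
```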

CompletionStreamResponse

Bases: BaseModel

Response object for a stream prompt completion task.

request_id instance-attribute

request_id: str

The unique ID of the corresponding Completion request. This request_id is generated on the server, and all logs associated with the request are grouped by it, which makes troubleshooting errors easier:

  • When running the Scale-hosted LLM Engine, please provide the request_id in any bug reports.
  • When running the self-hosted LLM Engine, the request_id serves as a trace ID in your observability provider.

output class-attribute instance-attribute

output: Optional[CompletionStreamOutput] = None

Completion output.

CreateFineTuneResponse

Bases: BaseModel

Response object for creating a FineTune.

id class-attribute instance-attribute

id: str = Field(
    ..., description="ID of the created fine-tuning job."
)

The ID of the FineTune.

GetFineTuneResponse

Bases: BaseModel

Response object for retrieving a FineTune.

id class-attribute instance-attribute

id: str = Field(..., description="ID of the requested job.")

The ID of the FineTune.

fine_tuned_model class-attribute instance-attribute

fine_tuned_model: Optional[str] = Field(
    default=None,
    description="Name of the resulting fine-tuned model. This can be plugged into the Completion API once the fine-tune is complete",
)

The name of the resulting fine-tuned model. This can be plugged into the Completion API once the fine-tune is complete.

ListFineTunesResponse

Bases: BaseModel

Response object for listing FineTunes.

jobs class-attribute instance-attribute

jobs: List[GetFineTuneResponse] = Field(
    ...,
    description="List of fine-tuning jobs and their statuses.",
)

A list of FineTunes, represented as GetFineTuneResponses.
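Since jobs is a list of GetFineTuneResponse objects and fine_tuned_model is None until a fine-tune completes, collecting the usable model names is a matter of filtering. A sketch over dicts of the same shape (the IDs and model name are invented):

```python
def ready_models(jobs):
    """Names of fine-tuned models that can already be used with the Completion API."""
    return [j["fine_tuned_model"] for j in jobs if j["fine_tuned_model"] is not None]

jobs = [
    {"id": "ft-1", "fine_tuned_model": None},                 # still in progress
    {"id": "ft-2", "fine_tuned_model": "llama-2-7b.suffix"},  # complete
]
print(ready_models(jobs))  # ['llama-2-7b.suffix']
```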

CancelFineTuneResponse

Bases: BaseModel

Response object for cancelling a FineTune.

success class-attribute instance-attribute

success: bool = Field(
    ..., description="Whether cancellation was successful."
)

Whether the cancellation succeeded.

GetLLMEndpointResponse

Bases: BaseModel

Response object for retrieving a Model.

name class-attribute instance-attribute

name: str = Field(
    description="The name of the model. Use this for making inference requests to the model."
)

The name of the model. Use this for making inference requests to the model.

source class-attribute instance-attribute

source: LLMSource = Field(
    description="The source of the model, e.g. Hugging Face."
)

The source of the model, e.g. Hugging Face.

inference_framework class-attribute instance-attribute

inference_framework: LLMInferenceFramework = Field(
    description="The inference framework used by the model."
)

(For self-hosted users) The inference framework used by the model.

id class-attribute instance-attribute

id: Optional[str] = Field(
    default=None,
    description="(For self-hosted users) The autogenerated ID of the model.",
)

(For self-hosted users) The autogenerated ID of the model.

model_name class-attribute instance-attribute

model_name: Optional[str] = Field(
    default=None,
    description="(For self-hosted users) For fine-tuned models, the base model. For base models, this will be the same as `name`.",
)

(For self-hosted users) For fine-tuned models, the base model. For base models, this will be the same as name.

status class-attribute instance-attribute

status: ModelEndpointStatus = Field(
    description="The status of the model."
)

The status of the model (can be one of "READY", "UPDATE_PENDING", "UPDATE_IN_PROGRESS", "UPDATE_FAILED", "DELETE_IN_PROGRESS").
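Because status takes one of the five string values above, a client waiting for an endpoint to come up only needs to distinguish the transitional states from the terminal ones. A sketch of that decision, assuming the string values listed above (the polling policy itself is illustrative, not part of the client):

```python
TRANSITIONAL = {"UPDATE_PENDING", "UPDATE_IN_PROGRESS"}
TERMINAL = {"READY", "UPDATE_FAILED", "DELETE_IN_PROGRESS"}

def should_keep_polling(status: str) -> bool:
    """True while the endpoint is still converging; False once it reaches a terminal state."""
    if status in TERMINAL:
        return False
    return status in TRANSITIONAL

print(should_keep_polling("UPDATE_IN_PROGRESS"))  # True
print(should_keep_polling("READY"))               # False
```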

inference_framework_tag class-attribute instance-attribute

inference_framework_tag: Optional[str] = Field(
    default=None,
    description="(For self-hosted users) The Docker image tag used to run the model.",
)

(For self-hosted users) The Docker image tag used to run the model.

num_shards class-attribute instance-attribute

num_shards: Optional[int] = Field(
    default=None,
    description="(For self-hosted users) The number of shards.",
)

(For self-hosted users) The number of shards.

quantize class-attribute instance-attribute

quantize: Optional[Quantization] = Field(
    default=None,
    description="(For self-hosted users) The quantization method.",
)

(For self-hosted users) The quantization method.

spec class-attribute instance-attribute

spec: Optional[GetModelEndpointResponse] = Field(
    default=None,
    description="(For self-hosted users) Model endpoint details.",
)

(For self-hosted users) Model endpoint details.

ListLLMEndpointsResponse

Bases: BaseModel

Response object for listing Models.

model_endpoints class-attribute instance-attribute

model_endpoints: List[GetLLMEndpointResponse] = Field(
    ..., description="The list of models."
)

A list of Models, represented as GetLLMEndpointResponses.

DeleteLLMEndpointResponse

Bases: BaseModel

Response object for deleting a Model.

deleted class-attribute instance-attribute

deleted: bool = Field(
    ..., description="Whether deletion was successful."
)

Whether the deletion succeeded.

ModelDownloadRequest

Bases: BaseModel

Request object for downloading a model.

model_name class-attribute instance-attribute

model_name: str = Field(
    ..., description="Name of the model to download."
)

download_format class-attribute instance-attribute

download_format: Optional[str] = Field(
    default="hugging_face",
    description="Desired return format for downloaded model weights (default=hugging_face).",
)

ModelDownloadResponse

Bases: BaseModel

Response object for downloading a model.

urls class-attribute instance-attribute

urls: Dict[str, str] = Field(
    ...,
    description="Dictionary of (file_name, url) pairs to download the model from.",
)
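Since urls maps file names to download URLs, a download loop iterates the dict and writes each file under a common directory. A sketch that only plans the downloads, with no network access (the file names and URLs are invented):

```python
import os

def plan_downloads(urls, dest_dir):
    """Map each (file_name, url) pair to a (local_path, url) download task."""
    return [(os.path.join(dest_dir, name), url) for name, url in urls.items()]

urls = {
    "config.json": "https://example.com/config.json",
    "model.safetensors": "https://example.com/model.safetensors",
}
for path, url in plan_downloads(urls, "weights"):
    print(path, "<-", url)
```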

UploadFileResponse

Bases: BaseModel

Response object for uploading a file.

id class-attribute instance-attribute

id: str = Field(..., description="ID of the uploaded file.")

ID of the uploaded file.

GetFileResponse

Bases: BaseModel

Response object for retrieving a file.

id class-attribute instance-attribute

id: str = Field(
    ..., description="ID of the requested file."
)

ID of the requested file.

filename class-attribute instance-attribute

filename: str = Field(..., description="File name.")

File name.

size class-attribute instance-attribute

size: int = Field(
    ..., description="Length of the file, in characters."
)

Length of the file, in characters.

GetFileContentResponse

Bases: BaseModel

Response object for retrieving a file's content.

id class-attribute instance-attribute

id: str = Field(
    ..., description="ID of the requested file."
)

ID of the requested file.

content class-attribute instance-attribute

content: str = Field(..., description="File content.")

File content.

ListFilesResponse

Bases: BaseModel

Response object for listing files.

files class-attribute instance-attribute

files: List[GetFileResponse] = Field(
    ..., description="List of file IDs, names, and sizes."
)

List of file IDs, names, and sizes.

DeleteFileResponse

Bases: BaseModel

Response object for deleting a file.

deleted class-attribute instance-attribute

deleted: bool = Field(
    ..., description="Whether deletion was successful."
)

Whether deletion was successful.

CreateBatchCompletionsRequestContent

Bases: BaseModel

prompts instance-attribute

prompts: List[str]

max_new_tokens instance-attribute

max_new_tokens: int

temperature class-attribute instance-attribute

temperature: float = Field(ge=0.0, le=1.0)

Sampling temperature. Setting it to 0 is equivalent to greedy sampling.

stop_sequences class-attribute instance-attribute

stop_sequences: Optional[List[str]] = None

List of sequences to stop the completion at.

return_token_log_probs class-attribute instance-attribute

return_token_log_probs: Optional[bool] = False

Whether to return the log probabilities of the tokens.

presence_penalty class-attribute instance-attribute

presence_penalty: Optional[float] = Field(
    default=None, ge=0.0, le=2.0
)

Only supported by the vllm and lightllm frameworks. Penalizes new tokens based on whether they appear in the text so far; 0.0 means no penalty.

frequency_penalty class-attribute instance-attribute

frequency_penalty: Optional[float] = Field(
    default=None, ge=0.0, le=2.0
)

Only supported by the vllm and lightllm frameworks. Penalizes new tokens based on their existing frequency in the text so far; 0.0 means no penalty.

top_k class-attribute instance-attribute

top_k: Optional[int] = Field(default=None, ge=-1)

Controls the number of top tokens to consider. -1 means consider all tokens.

top_p class-attribute instance-attribute

top_p: Optional[float] = Field(default=None, gt=0.0, le=1.0)

Controls the cumulative probability of the top tokens to consider. 1.0 means consider all tokens.
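When batch input is supplied via a file rather than inline content, the file is a JSON serialization of this content object (see input_data_path on CreateBatchCompletionsRequest below). A sketch that writes such a file with the standard library (the prompts and path are invented):

```python
import json
import os
import tempfile

content = {
    "prompts": ["What is 2 + 2?", "Name a prime number."],
    "max_new_tokens": 16,
    "temperature": 0.0,  # 0 means greedy sampling
    "return_token_log_probs": False,
}

# Sanity check mirroring the Field constraint above (ge=0.0, le=1.0).
assert 0.0 <= content["temperature"] <= 1.0

path = os.path.join(tempfile.mkdtemp(), "batch_input.json")
with open(path, "w") as f:
    json.dump(content, f)

with open(path) as f:
    print(json.load(f)["prompts"][0])  # What is 2 + 2?
```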

CreateBatchCompletionsModelConfig

Bases: BaseModel

model instance-attribute

model: str

checkpoint_path class-attribute instance-attribute

checkpoint_path: Optional[str] = None

Path to the checkpoint to load the model from.

labels instance-attribute

labels: Dict[str, str]

Labels to attach to the batch inference job.

num_shards class-attribute instance-attribute

num_shards: Optional[int] = 1

Suggested number of shards across which to distribute the model. When not specified, the number of shards is inferred from the model config. The system may use a different number than the given value.

quantize class-attribute instance-attribute

quantize: Optional[Quantization] = None

The quantization method to apply to the model, if any.

seed class-attribute instance-attribute

seed: Optional[int] = None

Random seed for the model.

CreateBatchCompletionsRequest

Bases: BaseModel

Request object for batch completions.

input_data_path instance-attribute

input_data_path: Optional[str]

Path to the input file; either input_data_path or content must be provided.

output_data_path instance-attribute

output_data_path: str

Path to the output file. The output file will be a JSON file of type List[CompletionOutput].

content class-attribute instance-attribute

content: Optional[
    CreateBatchCompletionsRequestContent
] = None

Either input_data_path or content must be provided. When input_data_path is provided, the input file should be a JSON file of type CreateBatchCompletionsRequestContent.

model_config instance-attribute

model_config: CreateBatchCompletionsModelConfig

Model configuration for the batch inference. Hardware configurations are inferred.

data_parallelism class-attribute instance-attribute

data_parallelism: Optional[int] = Field(
    default=1, ge=1, le=64
)

Number of replicas to run the batch inference on. More replicas are slower to schedule but run inference faster.

max_runtime_sec class-attribute instance-attribute

max_runtime_sec: Optional[int] = Field(
    default=24 * 3600, ge=1, le=2 * 24 * 3600
)

Maximum runtime of the batch inference in seconds. Defaults to one day.

tool_config class-attribute instance-attribute

tool_config: Optional[ToolConfig] = None

Configuration for tool use. NOTE: this config is highly experimental, and its signature will change significantly in future iterations.

CreateBatchCompletionsResponse

Bases: BaseModel

job_id instance-attribute

job_id: str

The ID of the batch completions job.