🐍 Python Client Data Type Reference

CompletionOutput

Bases: BaseModel

Represents the output of a completion request to a model.

text instance-attribute

text: str

The text of the completion.

num_prompt_tokens class-attribute instance-attribute

num_prompt_tokens: Optional[int] = None

Number of tokens in the prompt.

num_completion_tokens instance-attribute

num_completion_tokens: int

Number of tokens in the completion.

CompletionStreamOutput

Bases: BaseModel

text instance-attribute

text: str

The text of the completion.

finished instance-attribute

finished: bool

Whether the completion is finished.

num_prompt_tokens class-attribute instance-attribute

num_prompt_tokens: Optional[int] = None

Number of tokens in the prompt.

num_completion_tokens class-attribute instance-attribute

num_completion_tokens: Optional[int] = None

Number of tokens in the completion.
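When consuming a stream, a client typically appends each chunk's text until a chunk arrives with finished set to True. A minimal sketch, using plain dicts standing in for CompletionStreamOutput (the field names match the model above; the chunk values themselves are invented):

```python
def accumulate_stream(chunks):
    """Concatenate streamed completion text until a chunk reports finished=True."""
    pieces = []
    for chunk in chunks:
        pieces.append(chunk["text"])
        if chunk["finished"]:
            # num_completion_tokens is Optional and may only be set on the final chunk.
            return "".join(pieces), chunk.get("num_completion_tokens")
    return "".join(pieces), None

# Hypothetical stream of three chunks.
chunks = [
    {"text": "Hello", "finished": False, "num_completion_tokens": None},
    {"text": ", world", "finished": False, "num_completion_tokens": None},
    {"text": "!", "finished": True, "num_completion_tokens": 3},
]
text, n_tokens = accumulate_stream(chunks)
print(text)      # Hello, world!
print(n_tokens)  # 3
```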

CompletionSyncResponse

Bases: BaseModel

Response object for a synchronous prompt completion.

request_id instance-attribute

request_id: str

The unique ID of the corresponding Completion request. This request_id is generated on the server, and all logs associated with the request are grouped by it, which makes troubleshooting errors easier:

  • When running the Scale-hosted LLM Engine, please provide the request_id in any bug reports.
  • When running the self-hosted LLM Engine, the request_id serves as a trace ID in your observability provider.

output instance-attribute

output: CompletionOutput

Completion output.
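Reading a sync response amounts to pulling output.text and, when both token counts are present, summing them. A sketch over a dict with the same shape as CompletionSyncResponse (the request ID and values are invented):

```python
def total_tokens(response):
    """Prompt + completion token count, or None when the prompt count is absent."""
    out = response["output"]
    if out["num_prompt_tokens"] is None:  # num_prompt_tokens is Optional
        return None
    return out["num_prompt_tokens"] + out["num_completion_tokens"]

response = {
    "request_id": "req-123",  # invented ID
    "output": {
        "text": "The answer is 42.",
        "num_prompt_tokens": 7,
        "num_completion_tokens": 6,
    },
}
print(response["output"]["text"])  # The answer is 42.
print(total_tokens(response))      # 13
```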

CompletionStreamResponse

Bases: BaseModel

Response object for a stream prompt completion task.

request_id instance-attribute

request_id: str

The unique ID of the corresponding Completion request. This request_id is generated on the server, and all logs associated with the request are grouped by it, which makes troubleshooting errors easier:

  • When running the Scale-hosted LLM Engine, please provide the request_id in any bug reports.
  • When running the self-hosted LLM Engine, the request_id serves as a trace ID in your observability provider.

output class-attribute instance-attribute

output: Optional[CompletionStreamOutput] = None

Completion output.

CreateFineTuneResponse

Bases: BaseModel

Response object for creating a FineTune.

id class-attribute instance-attribute

id: str = Field(
    ..., description="ID of the created fine-tuning job."
)

The ID of the FineTune.

GetFineTuneResponse

Bases: BaseModel

Response object for retrieving a FineTune.

id class-attribute instance-attribute

id: str = Field(..., description="ID of the requested job.")

The ID of the FineTune.

fine_tuned_model class-attribute instance-attribute

fine_tuned_model: Optional[str] = Field(
    default=None,
    description="Name of the resulting fine-tuned model. This can be plugged into the Completion API once the fine-tune is complete",
)

The name of the resulting fine-tuned model. This can be plugged into the Completion API once the fine-tune is complete.

ListFineTunesResponse

Bases: BaseModel

Response object for listing FineTunes.

jobs class-attribute instance-attribute

jobs: List[GetFineTuneResponse] = Field(
    ...,
    description="List of fine-tuning jobs and their statuses.",
)

A list of FineTunes, represented as GetFineTuneResponses.
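Since jobs is a list of GetFineTuneResponse objects and fine_tuned_model is None until a fine-tune completes, collecting the usable model names is a matter of filtering. A sketch over dicts of the same shape (the IDs and model name are invented):

```python
def ready_models(jobs):
    """Names of fine-tuned models that can already be used with the Completion API."""
    return [j["fine_tuned_model"] for j in jobs if j["fine_tuned_model"] is not None]

jobs = [
    {"id": "ft-1", "fine_tuned_model": None},                 # still in progress
    {"id": "ft-2", "fine_tuned_model": "llama-2-7b.suffix"},  # complete
]
print(ready_models(jobs))  # ['llama-2-7b.suffix']
```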

CancelFineTuneResponse

Bases: BaseModel

Response object for cancelling a FineTune.

success class-attribute instance-attribute

success: bool = Field(
    ..., description="Whether cancellation was successful."
)

Whether the cancellation succeeded.

GetLLMEndpointResponse

Bases: BaseModel

Response object for retrieving a Model.

name class-attribute instance-attribute

name: str = Field(
    description="The name of the model. Use this for making inference requests to the model."
)

The name of the model. Use this for making inference requests to the model.

source class-attribute instance-attribute

source: LLMSource = Field(
    description="The source of the model, e.g. Hugging Face."
)

The source of the model, e.g. Hugging Face.

inference_framework class-attribute instance-attribute

inference_framework: LLMInferenceFramework = Field(
    description="The inference framework used by the model."
)

(For self-hosted users) The inference framework used by the model.

id class-attribute instance-attribute

id: Optional[str] = Field(
    default=None,
    description="(For self-hosted users) The autogenerated ID of the model.",
)

(For self-hosted users) The autogenerated ID of the model.

model_name class-attribute instance-attribute

model_name: Optional[str] = Field(
    default=None,
    description="(For self-hosted users) For fine-tuned models, the base model. For base models, this will be the same as `name`.",
)

(For self-hosted users) For fine-tuned models, the base model. For base models, this will be the same as name.

status class-attribute instance-attribute

status: ModelEndpointStatus = Field(
    description="The status of the model."
)

The status of the model (can be one of "READY", "UPDATE_PENDING", "UPDATE_IN_PROGRESS", "UPDATE_FAILED", "DELETE_IN_PROGRESS").
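Because status takes one of the five string values above, a client waiting for an endpoint to come up only needs to distinguish the transitional states from the terminal ones. A sketch of that decision, assuming the string values listed above (the polling policy itself is illustrative, not part of the client):

```python
TRANSITIONAL = {"UPDATE_PENDING", "UPDATE_IN_PROGRESS"}
TERMINAL = {"READY", "UPDATE_FAILED", "DELETE_IN_PROGRESS"}

def should_keep_polling(status: str) -> bool:
    """True while the endpoint is still converging; False once it reaches a terminal state."""
    if status in TERMINAL:
        return False
    return status in TRANSITIONAL

print(should_keep_polling("UPDATE_IN_PROGRESS"))  # True
print(should_keep_polling("READY"))               # False
```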

inference_framework_tag class-attribute instance-attribute

inference_framework_tag: Optional[str] = Field(
    default=None,
    description="(For self-hosted users) The Docker image tag used to run the model.",
)

(For self-hosted users) The Docker image tag used to run the model.

num_shards class-attribute instance-attribute

num_shards: Optional[int] = Field(
    default=None,
    description="(For self-hosted users) The number of shards.",
)

(For self-hosted users) The number of shards.

quantize class-attribute instance-attribute

quantize: Optional[Quantization] = Field(
    default=None,
    description="(For self-hosted users) The quantization method.",
)

(For self-hosted users) The quantization method.

spec class-attribute instance-attribute

spec: Optional[GetModelEndpointResponse] = Field(
    default=None,
    description="(For self-hosted users) Model endpoint details.",
)

(For self-hosted users) Model endpoint details.

ListLLMEndpointsResponse

Bases: BaseModel

Response object for listing Models.

model_endpoints class-attribute instance-attribute

model_endpoints: List[GetLLMEndpointResponse] = Field(
    ..., description="The list of models."
)

A list of Models, represented as GetLLMEndpointResponses.

DeleteLLMEndpointResponse

Bases: BaseModel

Response object for deleting a Model.

deleted class-attribute instance-attribute

deleted: bool = Field(
    ..., description="Whether deletion was successful."
)

Whether the deletion succeeded.

ModelDownloadRequest

Bases: BaseModel

Request object for downloading a model.

model_name class-attribute instance-attribute

model_name: str = Field(
    ..., description="Name of the model to download."
)

download_format class-attribute instance-attribute

download_format: Optional[str] = Field(
    default="hugging_face",
    description="Desired return format for downloaded model weights (default=hugging_face).",
)

ModelDownloadResponse

Bases: BaseModel

Response object for downloading a model.

urls class-attribute instance-attribute

urls: Dict[str, str] = Field(
    ...,
    description="Dictionary of (file_name, url) pairs to download the model from.",
)
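Since urls maps file names to download URLs, a download loop iterates the dict and writes each file under a common directory. A sketch that only plans the downloads, with no network access (the file names and URLs are invented):

```python
import os

def plan_downloads(urls, dest_dir):
    """Map each (file_name, url) pair to a (local_path, url) download task."""
    return [(os.path.join(dest_dir, name), url) for name, url in urls.items()]

urls = {
    "config.json": "https://example.com/config.json",
    "model.safetensors": "https://example.com/model.safetensors",
}
for path, url in plan_downloads(urls, "weights"):
    print(path, "<-", url)
```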

UploadFileResponse

Bases: BaseModel

Response object for uploading a file.

id class-attribute instance-attribute

id: str = Field(..., description="ID of the uploaded file.")

ID of the uploaded file.

GetFileResponse

Bases: BaseModel

Response object for retrieving a file.

id class-attribute instance-attribute

id: str = Field(
    ..., description="ID of the requested file."
)

ID of the requested file.

filename class-attribute instance-attribute

filename: str = Field(..., description="File name.")

File name.

size class-attribute instance-attribute

size: int = Field(
    ..., description="Length of the file, in characters."
)

Length of the file, in characters.

GetFileContentResponse

Bases: BaseModel

Response object for retrieving a file's content.

id class-attribute instance-attribute

id: str = Field(
    ..., description="ID of the requested file."
)

ID of the requested file.

content class-attribute instance-attribute

content: str = Field(..., description="File content.")

File content.

ListFilesResponse

Bases: BaseModel

Response object for listing files.

files class-attribute instance-attribute

files: List[GetFileResponse] = Field(
    ..., description="List of file IDs, names, and sizes."
)

List of file IDs, names, and sizes.

DeleteFileResponse

Bases: BaseModel

Response object for deleting a file.

deleted class-attribute instance-attribute

deleted: bool = Field(
    ..., description="Whether deletion was successful."
)

Whether deletion was successful.

CreateBatchCompletionsRequestContent

Bases: BaseModel

prompts instance-attribute

prompts: List[str]

max_new_tokens instance-attribute

max_new_tokens: int

temperature class-attribute instance-attribute

temperature: float = Field(ge=0.0, le=1.0)

Sampling temperature. Setting it to 0 is equivalent to greedy sampling.

stop_sequences class-attribute instance-attribute

stop_sequences: Optional[List[str]] = None

List of sequences to stop the completion at.

return_token_log_probs class-attribute instance-attribute

return_token_log_probs: Optional[bool] = False

Whether to return the log probabilities of the tokens.

presence_penalty class-attribute instance-attribute

presence_penalty: Optional[float] = Field(
    default=None, ge=0.0, le=2.0
)

Only supported by the vllm and lightllm frameworks. Penalizes new tokens based on whether they appear in the text so far; 0.0 means no penalty.

frequency_penalty class-attribute instance-attribute

frequency_penalty: Optional[float] = Field(
    default=None, ge=0.0, le=2.0
)

Only supported by the vllm and lightllm frameworks. Penalizes new tokens based on their existing frequency in the text so far; 0.0 means no penalty.

top_k class-attribute instance-attribute

top_k: Optional[int] = Field(default=None, ge=-1)

Controls the number of top tokens to consider. -1 means consider all tokens.

top_p class-attribute instance-attribute

top_p: Optional[float] = Field(default=None, gt=0.0, le=1.0)

Controls the cumulative probability of the top tokens to consider. 1.0 means consider all tokens.
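When batch input is supplied via a file rather than inline content, the file is a JSON serialization of this content object (see input_data_path on CreateBatchCompletionsRequest below). A sketch that writes such a file with the standard library (the prompts and path are invented):

```python
import json
import os
import tempfile

content = {
    "prompts": ["What is 2 + 2?", "Name a prime number."],
    "max_new_tokens": 16,
    "temperature": 0.0,  # 0 means greedy sampling
    "return_token_log_probs": False,
}

# Sanity check mirroring the Field constraint above (ge=0.0, le=1.0).
assert 0.0 <= content["temperature"] <= 1.0

path = os.path.join(tempfile.mkdtemp(), "batch_input.json")
with open(path, "w") as f:
    json.dump(content, f)

with open(path) as f:
    print(json.load(f)["prompts"][0])  # What is 2 + 2?
```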

CreateBatchCompletionsModelConfig

Bases: BaseModel

model instance-attribute

model: str

checkpoint_path class-attribute instance-attribute

checkpoint_path: Optional[str] = None

Path to the checkpoint to load the model from.

labels instance-attribute

labels: Dict[str, str]

Labels to attach to the batch inference job.

num_shards class-attribute instance-attribute

num_shards: Optional[int] = 1

Suggested number of shards across which to distribute the model. When not specified, the number of shards is inferred from the model config. The system may use a different number than the given value.

quantize class-attribute instance-attribute

quantize: Optional[Quantization] = None

The quantization method to apply to the model, if any.

seed class-attribute instance-attribute

seed: Optional[int] = None

Random seed for the model.

CreateBatchCompletionsRequest

Bases: BaseModel

Request object for batch completions.

input_data_path instance-attribute

input_data_path: Optional[str]

Path to the input file; either input_data_path or content must be provided.

output_data_path instance-attribute

output_data_path: str

Path to the output file. The output file will be a JSON file of type List[CompletionOutput].

content class-attribute instance-attribute

content: Optional[
    CreateBatchCompletionsRequestContent
] = None

Either input_data_path or content must be provided. When input_data_path is provided, the input file should be a JSON file of type CreateBatchCompletionsRequestContent.

model_config instance-attribute

model_config: CreateBatchCompletionsModelConfig

Model configuration for the batch inference. Hardware configurations are inferred.

data_parallelism class-attribute instance-attribute

data_parallelism: Optional[int] = Field(
    default=1, ge=1, le=64
)

Number of replicas to run the batch inference on. More replicas are slower to schedule but run inference faster.

max_runtime_sec class-attribute instance-attribute

max_runtime_sec: Optional[int] = Field(
    default=24 * 3600, ge=1, le=2 * 24 * 3600
)

Maximum runtime of the batch inference in seconds. Defaults to one day.

tool_config class-attribute instance-attribute

tool_config: Optional[ToolConfig] = None

Configuration for tool use. NOTE: this config is highly experimental, and its signature will change significantly in future iterations.

CreateBatchCompletionsResponse

Bases: BaseModel

job_id instance-attribute

job_id: str

The ID of the batch completions job.