Create a thread and run it in one request.
POST /threads/runs
Authorizations
Request Body required
object
The ID of the assistant to use to execute this run.
If no thread is provided, an empty thread will be created.
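As a sketch of the request shape, the snippet below assembles a minimal JSON body for this endpoint. The assistant ID and message text are placeholders, not real values:

```python
import json

# Hypothetical IDs and text for illustration; substitute your own assistant ID.
payload = {
    "assistant_id": "asst_abc123",
    "thread": {
        "messages": [
            {"role": "user", "content": "Explain deductible vs. premium."}
        ]
    },
}

# This body is sent as JSON to POST /threads/runs.
body = json.dumps(payload)
```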
object
A list of messages to start the thread with.
object
The role of the entity that is creating the message. Allowed values include:
user: Indicates the message is sent by an actual user and should be used in most cases to represent user-generated messages.
assistant: Indicates the message is generated by the assistant. Use this value to insert messages from the assistant into the conversation.
The text contents of the message.
An array of content parts with a defined type. Each part can be of type text, or images can be passed with image_url or image_file. Image types are only supported on Vision-compatible models.
References an image File in the content of a message.
object
Always image_file.
object
The File ID of the image in the message content. Set purpose="vision" when uploading the File if you need to later display the file content.
Specifies the detail level of the image if specified by the user. low uses fewer tokens; you can opt in to high resolution using high.
References an image URL in the content of a message.
object
The type of the content part.
object
The external URL of the image. Must be one of the supported image types: jpeg, jpg, png, gif, webp.
Specifies the detail level of the image. low uses fewer tokens; you can opt in to high resolution using high. The default value is auto.
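The content-part shapes above can be illustrated with a hypothetical user message that mixes a text part and an image_url part (the URL is a placeholder):

```python
# A content array mixing a text part and an image_url part.
content = [
    {"type": "text", "text": "What is shown in this image?"},
    {
        "type": "image_url",
        "image_url": {
            # Placeholder URL; must be jpeg, jpg, png, gif, or webp.
            "url": "https://example.com/photo.png",
            "detail": "low",  # fewer tokens; "high" opts in to high resolution
        },
    },
]
message = {"role": "user", "content": content}
```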
The text content that is part of a message.
object
Always text.
Text content to be sent to the model.
A list of files attached to the message, and the tools they should be added to.
object
The ID of the file to attach to the message.
The tools to add this file to.
object
The type of tool being defined: code_interpreter
object
The type of tool being defined: file_search
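Putting the attachment fields together, a hypothetical message attaching one uploaded file to the file_search tool might look like this (the file ID is a placeholder):

```python
# A message with one attachment made available to the file_search tool.
message = {
    "role": "user",
    "content": "Summarize the attached report.",
    "attachments": [
        {
            "file_id": "file-abc123",  # hypothetical ID of a previously uploaded file
            "tools": [{"type": "file_search"}],
        }
    ],
}
```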
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
object
A set of resources that are made available to the assistant's tools in this thread. The resources are specific to the type of tool. For example, the code_interpreter tool requires a list of file IDs, while the file_search tool requires a list of vector store IDs.
object
object
A list of file IDs made available to the code_interpreter tool. There can be a maximum of 20 files associated with the tool.
object
The vector store attached to this thread. There can be a maximum of 1 vector store attached to the thread.
A helper to create a vector store with file_ids and attach it to this thread. There can be a maximum of 1 vector store attached to the thread.
object
A list of file IDs to add to the vector store. There can be a maximum of 10000 files in a vector store.
The default strategy. This strategy currently uses a max_chunk_size_tokens of 800 and chunk_overlap_tokens of 400.
object
Always auto.
object
Always static.
object
The maximum number of tokens in each chunk. The default value is 800. The minimum value is 100 and the maximum value is 4096.
The number of tokens that overlap between chunks. The default value is 400. Note that the overlap must not exceed half of max_chunk_size_tokens.
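The documented bounds on the static strategy can be checked client-side before sending a request. This is an optional local sketch, not part of the API:

```python
def validate_static_chunking(max_chunk_size_tokens: int, chunk_overlap_tokens: int) -> None:
    """Check the documented bounds for a static chunking strategy."""
    if not (100 <= max_chunk_size_tokens <= 4096):
        raise ValueError("max_chunk_size_tokens must be between 100 and 4096")
    if chunk_overlap_tokens > max_chunk_size_tokens // 2:
        raise ValueError("overlap must not exceed half of max_chunk_size_tokens")

strategy = {
    "type": "static",
    "static": {"max_chunk_size_tokens": 800, "chunk_overlap_tokens": 400},
}
validate_static_chunking(800, 400)  # the documented defaults pass
```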
Set of 16 key-value pairs that can be attached to a vector store. This can be useful for storing additional information about the vector store in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
object
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
object
Override the default system message of the assistant. This is useful for modifying the behavior on a per-run basis.
Override the tools the assistant can use for this run. This is useful for modifying the behavior on a per-run basis.
object
The type of tool being defined: code_interpreter
object
The type of tool being defined: file_search
Overrides for the file search tool.
object
The maximum number of results the file search tool should output. The default is 20 for gpt-4* models and 5 for gpt-3.5-turbo. This number should be between 1 and 50 inclusive.
Note that the file search tool may output fewer than max_num_results results. See the file search tool documentation for more information.
The ranking options for the file search. If not specified, the file search tool will use the auto ranker and a score_threshold of 0. See the file search tool documentation for more information.
object
The ranker to use for the file search. If not specified, the auto ranker will be used.
The score threshold for the file search. Must be a floating point number between 0 and 1.
object
The type of tool being defined: function
object
A description of what the function does, used by the model to choose when and how to call the function.
The name of the function to be called. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.
The parameters the function accepts, described as a JSON Schema object. See the guide for examples, and the JSON Schema reference for documentation about the format. Omitting parameters defines a function with an empty parameter list.
object
Whether to enable strict schema adherence when generating the function call. If set to true, the model will follow the exact schema defined in the parameters field. Only a subset of JSON Schema is supported when strict is true. Learn more about Structured Outputs in the function calling guide.
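A hypothetical function tool tying these fields together. The function name and schema are invented for the example; additionalProperties is set to false, as is typically required for strict schemas:

```python
# A function tool definition with strict schema adherence enabled.
function_tool = {
    "type": "function",
    "function": {
        "name": "get_order_status",  # hypothetical function name
        "description": "Look up the status of an order by its ID.",
        "strict": True,  # model must follow the schema exactly
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
            "additionalProperties": False,  # closed object, typically required for strict mode
        },
    },
}
```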
A set of resources that are used by the assistant's tools. The resources are specific to the type of tool. For example, the code_interpreter tool requires a list of file IDs, while the file_search tool requires a list of vector store IDs.
object
object
A list of file IDs made available to the code_interpreter tool. There can be a maximum of 20 files associated with the tool.
object
The ID of the vector store attached to this assistant. There can be a maximum of 1 vector store attached to the assistant.
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
object
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
We generally recommend altering this or temperature but not both.
If true, returns a stream of events that happen during the Run as server-sent events, terminating when the Run enters a terminal state with a data: [DONE] message.
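A minimal sketch of consuming such an event stream, simulated here with a list of lines rather than a live connection:

```python
def iter_sse_data(lines):
    """Yield the payload of each `data:` line, stopping at [DONE]."""
    for line in lines:
        if line.startswith("data: "):
            data = line[len("data: "):]
            if data == "[DONE]":
                return
            yield data

# Simulated server-sent event lines instead of a real HTTP stream.
raw = ['event: thread.run.created', 'data: {"id": "run_1"}', 'data: [DONE]']
events = list(iter_sse_data(raw))
```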
The maximum number of prompt tokens that may be used over the course of the run. The run will make a best effort to use only the number of prompt tokens specified, across multiple turns of the run. If the run exceeds the number of prompt tokens specified, the run will end with status incomplete. See incomplete_details for more info.
The maximum number of completion tokens that may be used over the course of the run. The run will make a best effort to use only the number of completion tokens specified, across multiple turns of the run. If the run exceeds the number of completion tokens specified, the run will end with status incomplete. See incomplete_details for more info.
Controls for how a thread will be truncated prior to the run. Use this to control the initial context window of the run.
object
The truncation strategy to use for the thread. The default is auto. If set to last_messages, the thread will be truncated to the n most recent messages in the thread. When set to auto, messages in the middle of the thread will be dropped to fit the context length of the model, max_prompt_tokens.
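The last_messages behavior can be sketched as a simple list truncation. This is a local illustration, not the server's implementation:

```python
def truncate_last_messages(messages, n):
    """Keep only the n most recent messages, mirroring the last_messages strategy."""
    return messages[-n:] if n < len(messages) else list(messages)

thread = ["m1", "m2", "m3", "m4", "m5"]
context = truncate_last_messages(thread, 2)  # keeps the 2 most recent messages
```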
The number of most recent messages from the thread when constructing the context for the run.
none means the model will not call any tools and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools before responding to the user.
Specifies a tool the model should use. Use to force the model to call a specific tool.
object
The type of the tool. If type is function, the function name must be set.
object
The name of the function to call.
Whether to enable parallel function calling during tool use.
auto is the default value.
object
The type of response format being defined: text
object
The type of response format being defined: json_object
object
The type of response format being defined: json_schema
object
A description of what the response format is for, used by the model to determine how to respond in the format.
The name of the response format. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.
The schema for the response format, described as a JSON Schema object.
object
Whether to enable strict schema adherence when generating the output. If set to true, the model will always follow the exact schema defined in the schema field. Only a subset of JSON Schema is supported when strict is true. To learn more, read the Structured Outputs guide.
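A hypothetical json_schema response format tying these fields together; the name and schema are invented for illustration:

```python
# A json_schema response format with strict adherence enabled.
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "triage_result",  # hypothetical schema name
        "description": "A structured triage decision.",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "category": {"type": "string"},
                "urgent": {"type": "boolean"},
            },
            "required": ["category", "urgent"],
            "additionalProperties": False,  # closed object, typically required for strict mode
        },
    },
}
```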
Responses
200
OK
Represents an execution run on a thread.
object
The identifier, which can be referenced in API endpoints.
The object type, which is always thread.run.
The Unix timestamp (in seconds) for when the run was created.
The ID of the thread that was executed on as a part of this run.
The ID of the assistant used for execution of this run.
The status of the run, which can be either queued, in_progress, requires_action, cancelling, cancelled, failed, completed, incomplete, or expired.
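Several of these statuses are terminal, so a common client pattern is to poll until one is reached. The sketch below simulates status fetches with a local iterator instead of real API calls:

```python
import time

# Statuses after which the run will not change again.
TERMINAL = {"cancelled", "failed", "completed", "incomplete", "expired"}

def wait_for_run(fetch_status, poll_interval=0.0):
    """Poll a status-returning callable until the run reaches a terminal state.

    A real client would also break out on requires_action to submit tool outputs.
    """
    while True:
        status = fetch_status()
        if status in TERMINAL:
            return status
        time.sleep(poll_interval)

# Simulated status sequence instead of real API calls.
statuses = iter(["queued", "in_progress", "completed"])
final = wait_for_run(lambda: next(statuses))
```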
Details on the action required to continue the run. Will be null if no action is required.
object
For now, this is always submit_tool_outputs.
Details on the tool outputs needed for this run to continue.
object
A list of the relevant tool calls.
Tool call objects
object
The ID of the tool call. This ID must be referenced when you submit the tool outputs using the Submit tool outputs to run endpoint.
The type of tool call the output is required for. For now, this is always function.
The function definition.
object
The name of the function.
The arguments that the model expects you to pass to the function.
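When a run requires action, the client executes each requested function locally and submits the results. A sketch under the assumption that tool-call objects carry the id and function.arguments fields described above:

```python
import json

def build_tool_outputs(tool_calls, handlers):
    """Dispatch each required function call to a local handler and collect outputs."""
    outputs = []
    for call in tool_calls:
        fn = call["function"]
        # Arguments arrive as a JSON string; decode before calling the handler.
        result = handlers[fn["name"]](**json.loads(fn["arguments"]))
        outputs.append({"tool_call_id": call["id"], "output": json.dumps(result)})
    return outputs

# Simulated tool calls with a hypothetical "add" function.
calls = [{"id": "call_1", "type": "function",
          "function": {"name": "add", "arguments": '{"a": 2, "b": 3}'}}]
outputs = build_tool_outputs(calls, {"add": lambda a, b: a + b})
```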
The last error associated with this run. Will be null if there are no errors.
object
One of server_error, rate_limit_exceeded, or invalid_prompt.
A human-readable description of the error.
The Unix timestamp (in seconds) for when the run will expire.
The Unix timestamp (in seconds) for when the run was started.
The Unix timestamp (in seconds) for when the run was cancelled.
The Unix timestamp (in seconds) for when the run failed.
The Unix timestamp (in seconds) for when the run was completed.
Details on why the run is incomplete. Will be null if the run is not incomplete.
object
The reason why the run is incomplete. This will point to which specific token limit was reached over the course of the run.
The model that the assistant used for this run.
The instructions that the assistant used for this run.
The list of tools that the assistant used for this run.
object
The type of tool being defined: code_interpreter
object
The type of tool being defined: file_search
Overrides for the file search tool.
object
The maximum number of results the file search tool should output. The default is 20 for gpt-4* models and 5 for gpt-3.5-turbo. This number should be between 1 and 50 inclusive.
Note that the file search tool may output fewer than max_num_results results. See the file search tool documentation for more information.
The ranking options for the file search. If not specified, the file search tool will use the auto ranker and a score_threshold of 0. See the file search tool documentation for more information.
object
The ranker to use for the file search. If not specified, the auto ranker will be used.
The score threshold for the file search. Must be a floating point number between 0 and 1.
object
The type of tool being defined: function
object
A description of what the function does, used by the model to choose when and how to call the function.
The name of the function to be called. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.
The parameters the function accepts, described as a JSON Schema object. See the guide for examples, and the JSON Schema reference for documentation about the format. Omitting parameters defines a function with an empty parameter list.
object
Whether to enable strict schema adherence when generating the function call. If set to true, the model will follow the exact schema defined in the parameters field. Only a subset of JSON Schema is supported when strict is true. Learn more about Structured Outputs in the function calling guide.
Set of 16 key-value pairs that can be attached to an object. This can be useful for storing additional information about the object in a structured format. Keys can be a maximum of 64 characters long and values can be a maximum of 512 characters long.
object
Usage statistics related to the run. This value will be null if the run is not in a terminal state (i.e. in_progress, queued, etc.).
object
Number of completion tokens used over the course of the run.
Number of prompt tokens used over the course of the run.
Total number of tokens used (prompt + completion).
The sampling temperature used for this run. If not set, defaults to 1.
The nucleus sampling value used for this run. If not set, defaults to 1.
The maximum number of prompt tokens specified to have been used over the course of the run.
The maximum number of completion tokens specified to have been used over the course of the run.
Controls for how a thread will be truncated prior to the run. Use this to control the initial context window of the run.
object
The truncation strategy to use for the thread. The default is auto. If set to last_messages, the thread will be truncated to the n most recent messages in the thread. When set to auto, messages in the middle of the thread will be dropped to fit the context length of the model, max_prompt_tokens.
The number of most recent messages from the thread when constructing the context for the run.
none means the model will not call any tools and instead generates a message. auto means the model can pick between generating a message or calling one or more tools. required means the model must call one or more tools before responding to the user.
Specifies a tool the model should use. Use to force the model to call a specific tool.
object
The type of the tool. If type is function, the function name must be set.
object
The name of the function to call.
Whether to enable parallel function calling during tool use.
auto is the default value.
object
The type of response format being defined: text
object
The type of response format being defined: json_object
object
The type of response format being defined: json_schema
object
A description of what the response format is for, used by the model to determine how to respond in the format.
The name of the response format. Must be a-z, A-Z, 0-9, or contain underscores and dashes, with a maximum length of 64.
The schema for the response format, described as a JSON Schema object.
object
Whether to enable strict schema adherence when generating the output. If set to true, the model will always follow the exact schema defined in the schema field. Only a subset of JSON Schema is supported when strict is true. To learn more, read the Structured Outputs guide.