xinference.client.handlers.ChatModelHandle.chat#

ChatModelHandle.chat(messages: List[Dict], tools: List[Dict] | None = None, enable_thinking: bool | None = None, generate_config: PytorchGenerateConfig | None = None) → ChatCompletion | Iterator[ChatCompletionChunk]#

Given a list of messages comprising a conversation, the model will return a response via RESTful APIs.

Paramètres:

messages (List[Dict]) – A list of messages comprising the conversation so far.
tools (Optional[List[Dict]]) – A tool list.
generate_config (Optional["PytorchGenerateConfig"]) – Additional configuration for the chat generation. « PytorchGenerateConfig » -> configuration for pytorch model
enable_thinking (Optional[bool]) – Toggle thinking mode per request for hybrid reasoning LLMs.

Renvoie:

Stream is a parameter in generate_config. When stream is set to True, the function will return Iterator[« ChatCompletionChunk »]. When stream is set to False, the function will return « ChatCompletion ».

Type renvoyé:

Union[« ChatCompletion », Iterator[« ChatCompletionChunk »]]

Lève:

RuntimeError – Report the failure to generate the chat from the server. Detailed information provided in error message.