WIP thinking API support

- Allows specifying whether thinking mode should be enabled or not
- Templates get passed a new option so, e.g., qwen3's template can put
  `/think` or `/no_think` in the system prompt depending on the value of
  the setting (see the template sketch after this list)
- Adds parsing for thinking blocks in both streaming and non-streaming modes
- Updates the CLI to make use of these changes
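
As a rough illustration of the template side (not part of the diff): a runnable
Go sketch of how a Qwen3-style template fragment could branch on the new
option. The `.Thinking` variable name is an assumption for this sketch; the
real name exposed to templates isn't settled in this WIP.

```go
package main

import (
	"os"
	"text/template"
)

// Hypothetical Qwen3-style template fragment: emit /think or /no_think in
// the system prompt depending on whether thinking is enabled. The
// ".Thinking" name is an assumption for this sketch.
const fragment = "{{ if .Thinking }}/think{{ else }}/no_think{{ end }}\n"

func main() {
	tmpl := template.Must(template.New("system").Parse(fragment))
	// Render once with thinking enabled and once with it disabled.
	for _, thinking := range []bool{true, false} {
		_ = tmpl.Execute(os.Stdout, map[string]bool{"Thinking": thinking})
	}
}
```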

TODO:

- [ ] Don't parse thinking blocks when the user doesn't explicitly set
      the option, to maintain backwards compatibility
- [ ] Warning on CLI when using a non-thinking/older version of a model
      (with an old template)
- [ ] Wire up capabilities fully
- [x] Unify parsing for streaming/non-streaming
- [ ] Update templates
- [ ] Update python/js libraries
- [ ] How to handle differences between models with respect to defaults and
      whether or not the thinking ability can even be controlled. If not
      specified by the user, should there be a default, or should the
      template be able to check whether it was explicitly set?

Author: Devon Rifkin
Date:   2025-05-07 16:15:46 -07:00
Commit: 77f4594e80 (parent: a7835c6716)
14 changed files with 513 additions and 12 deletions

@@ -83,6 +83,10 @@ type GenerateRequest struct {
	// Options lists model-specific options. For example, temperature can be
	// set through this field, if the model supports it.
	Options map[string]any `json:"options"`

	// Thinking controls whether thinking/reasoning models will think before
	// responding
	Thinking bool `json:"thinking,omitempty"`
}

// ChatRequest describes a request sent by [Client.Chat].
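
As a quick usage sketch for the field above, assuming the existing Go client
(`api.ClientFromEnvironment`, `Client.Generate`): everything except the new
`Thinking` field is pre-existing API, and the model name is just an example.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ollama/ollama/api"
)

func main() {
	client, err := api.ClientFromEnvironment()
	if err != nil {
		log.Fatal(err)
	}

	req := &api.GenerateRequest{
		Model:    "qwen3", // example model name, assumed for this sketch
		Prompt:   "Why is the sky blue?",
		Thinking: true, // new field from this commit
	}

	// Stream the response; the callback runs once per chunk.
	err = client.Generate(context.Background(), req, func(resp api.GenerateResponse) error {
		fmt.Print(resp.Response)
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}
```
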
@@ -108,6 +112,10 @@ type ChatRequest struct {
	// Options lists model-specific options.
	Options map[string]any `json:"options"`

	// Thinking controls whether thinking/reasoning models will think before
	// responding
	Thinking bool `json:"thinking,omitempty"`
}

type Tools []Tool
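
Both new `Thinking` fields are plain bools with `omitempty`, which ties into
the backwards-compatibility TODO above: an explicit `false` serializes the
same as "not set". A small self-contained sketch (the struct is a trimmed
stand-in, not the real type):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed stand-in for the request types in this diff, only to show the
// effect of `omitempty` on the new bool field.
type chatRequest struct {
	Model    string `json:"model"`
	Thinking bool   `json:"thinking,omitempty"`
}

func main() {
	on, _ := json.Marshal(chatRequest{Model: "qwen3", Thinking: true})
	off, _ := json.Marshal(chatRequest{Model: "qwen3"})
	fmt.Println(string(on))  // {"model":"qwen3","thinking":true}
	fmt.Println(string(off)) // {"model":"qwen3"} (false looks the same as unset)
}
```

Switching to a `*bool`, or tracking whether the key was present when
unmarshalling, would let the server tell the two apart.
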
@@ -130,6 +138,10 @@ type Message struct {
	Content string `json:"content"`
	Images []ImageData `json:"images,omitempty"`
	ToolCalls []ToolCall `json:"tool_calls,omitempty"`

	// ThinkingBlock contains the text that was inside <think> tags in the
	// original model output when ChatRequest.Thinking was enabled.
	ThinkingBlock string `json:"thinkingBlock,omitempty"`
}

func (m *Message) UnmarshalJSON(b []byte) error {
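
On the chat side, a hedged sketch of consuming the new field, assuming the
existing Go client (`api.ClientFromEnvironment`, `Client.Chat`). Only
`Thinking` on the request and `ThinkingBlock` on the message come from this
diff; whether streamed chunks carry the thinking text incrementally is an
assumption here.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ollama/ollama/api"
)

func main() {
	client, err := api.ClientFromEnvironment()
	if err != nil {
		log.Fatal(err)
	}

	req := &api.ChatRequest{
		Model: "qwen3", // example model name, assumed for this sketch
		Messages: []api.Message{
			{Role: "user", Content: "Why is the sky blue?"},
		},
		Thinking: true, // new field from this commit
	}

	var thinking, answer string
	err = client.Chat(context.Background(), req, func(resp api.ChatResponse) error {
		// Accumulate thinking text and answer text separately; per-chunk
		// ThinkingBlock deltas are an assumption in this sketch.
		thinking += resp.Message.ThinkingBlock
		answer += resp.Message.Content
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("thinking:", thinking)
	fmt.Println("answer:", answer)
}
```
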
@@ -275,6 +287,8 @@ type Options struct {
	MirostatTau float32 `json:"mirostat_tau,omitempty"`
	MirostatEta float32 `json:"mirostat_eta,omitempty"`
	Stop []string `json:"stop,omitempty"`
	Thinking bool `json:"thinking,omitempty"`
}

// Runner options which must be set when the model is loaded into memory
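
Since `Options` also gains the field, it looks like thinking could
alternatively be toggled through the per-request options map that the server
decodes into this struct; that's an inference from the diff rather than
something the commit message states.

```go
package main

import "github.com/ollama/ollama/api"

func main() {
	// Speculative alternative: set thinking via the options map instead of
	// the top-level request field.
	req := &api.GenerateRequest{
		Model:   "qwen3",
		Prompt:  "Why is the sky blue?",
		Options: map[string]any{"thinking": true},
	}
	_ = req
}
```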