WIP thinking API support

- Allows specifying whether thinking mode should be enabled or not
- Templates get passed a new option so, e.g., qwen3's template can put
  `/think` or `/no_think` in the system prompt depending on the value of
  the setting (see the template sketch after this list)
- Adds parsing for thinking blocks in both streaming and non-streaming modes
- Updates the CLI to make use of these changes
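
As a rough illustration of the template side (not part of the diff): a runnable
Go sketch of how a Qwen3-style template fragment could branch on the new
option. The `.Thinking` variable name is an assumption for this sketch; the
real name exposed to templates isn't settled in this WIP.

```go
package main

import (
	"os"
	"text/template"
)

// Hypothetical Qwen3-style template fragment: emit /think or /no_think in
// the system prompt depending on whether thinking is enabled. The
// ".Thinking" name is an assumption for this sketch.
const fragment = "{{ if .Thinking }}/think{{ else }}/no_think{{ end }}\n"

func main() {
	tmpl := template.Must(template.New("system").Parse(fragment))
	// Render once with thinking enabled and once with it disabled.
	for _, thinking := range []bool{true, false} {
		_ = tmpl.Execute(os.Stdout, map[string]bool{"Thinking": thinking})
	}
}
```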

TODO:

- [ ] Don't parse thinking blocks when the user doesn't explicitly set
      the option, to maintain backwards compatibility
- [ ] Warning on CLI when using a non-thinking/older version of a model
      (with an old template)
- [ ] Wire up capabilities fully
- [x] Unify parsing for streaming/non-streaming
- [ ] Update templates
- [ ] Update python/js libraries
- [ ] How to handle differences between models with respect to defaults and
      whether or not the thinking ability can even be controlled. If not
      specified by the user, should there be a default, or should the
      template be able to check whether it was explicitly set?

Author: Devon Rifkin
Date:   2025-05-07 16:15:46 -07:00
Commit: 77f4594e80 (parent: a7835c6716)
14 changed files with 513 additions and 12 deletions

@@ -83,6 +83,10 @@ type GenerateRequest struct {
	// Options lists model-specific options. For example, temperature can be
	// set through this field, if the model supports it.
	Options map[string]any `json:"options"`

	// Thinking controls whether thinking/reasoning models will think before
	// responding
	Thinking bool `json:"thinking,omitempty"`
}

// ChatRequest describes a request sent by [Client.Chat].
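
As a quick usage sketch for the field above, assuming the existing Go client
(`api.ClientFromEnvironment`, `Client.Generate`): everything except the new
`Thinking` field is pre-existing API, and the model name is just an example.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ollama/ollama/api"
)

func main() {
	client, err := api.ClientFromEnvironment()
	if err != nil {
		log.Fatal(err)
	}

	req := &api.GenerateRequest{
		Model:    "qwen3", // example model name, assumed for this sketch
		Prompt:   "Why is the sky blue?",
		Thinking: true, // new field from this commit
	}

	// Stream the response; the callback runs once per chunk.
	err = client.Generate(context.Background(), req, func(resp api.GenerateResponse) error {
		fmt.Print(resp.Response)
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}
}
```
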
@@ -108,6 +112,10 @@ type ChatRequest struct {
	// Options lists model-specific options.
	Options map[string]any `json:"options"`

	// Thinking controls whether thinking/reasoning models will think before
	// responding
	Thinking bool `json:"thinking,omitempty"`
}

type Tools []Tool
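
Both new `Thinking` fields are plain bools with `omitempty`, which ties into
the backwards-compatibility TODO above: an explicit `false` serializes the
same as "not set". A small self-contained sketch (the struct is a trimmed
stand-in, not the real type):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed stand-in for the request types in this diff, only to show the
// effect of `omitempty` on the new bool field.
type chatRequest struct {
	Model    string `json:"model"`
	Thinking bool   `json:"thinking,omitempty"`
}

func main() {
	on, _ := json.Marshal(chatRequest{Model: "qwen3", Thinking: true})
	off, _ := json.Marshal(chatRequest{Model: "qwen3"})
	fmt.Println(string(on))  // {"model":"qwen3","thinking":true}
	fmt.Println(string(off)) // {"model":"qwen3"} (false looks the same as unset)
}
```

Switching to a `*bool`, or tracking whether the key was present when
unmarshalling, would let the server tell the two apart.
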
@@ -130,6 +138,10 @@ type Message struct {
	Content string `json:"content"`
	Images []ImageData `json:"images,omitempty"`
	ToolCalls []ToolCall `json:"tool_calls,omitempty"`

	// ThinkingBlock contains the text that was inside <think> tags in the
	// original model output when ChatRequest.Thinking was enabled.
	ThinkingBlock string `json:"thinkingBlock,omitempty"`
}

func (m *Message) UnmarshalJSON(b []byte) error {
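
On the chat side, a hedged sketch of consuming the new field, assuming the
existing Go client (`api.ClientFromEnvironment`, `Client.Chat`). Only
`Thinking` on the request and `ThinkingBlock` on the message come from this
diff; whether streamed chunks carry the thinking text incrementally is an
assumption here.

```go
package main

import (
	"context"
	"fmt"
	"log"

	"github.com/ollama/ollama/api"
)

func main() {
	client, err := api.ClientFromEnvironment()
	if err != nil {
		log.Fatal(err)
	}

	req := &api.ChatRequest{
		Model: "qwen3", // example model name, assumed for this sketch
		Messages: []api.Message{
			{Role: "user", Content: "Why is the sky blue?"},
		},
		Thinking: true, // new field from this commit
	}

	var thinking, answer string
	err = client.Chat(context.Background(), req, func(resp api.ChatResponse) error {
		// Accumulate thinking text and answer text separately; per-chunk
		// ThinkingBlock deltas are an assumption in this sketch.
		thinking += resp.Message.ThinkingBlock
		answer += resp.Message.Content
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}

	fmt.Println("thinking:", thinking)
	fmt.Println("answer:", answer)
}
```
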
@@ -275,6 +287,8 @@ type Options struct {
	MirostatTau float32 `json:"mirostat_tau,omitempty"`
	MirostatEta float32 `json:"mirostat_eta,omitempty"`
	Stop []string `json:"stop,omitempty"`
	Thinking bool `json:"thinking,omitempty"`
}

// Runner options which must be set when the model is loaded into memory
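
Since `Options` also gains the field, it looks like thinking could
alternatively be toggled through the per-request options map that the server
decodes into this struct; that's an inference from the diff rather than
something the commit message states.

```go
package main

import "github.com/ollama/ollama/api"

func main() {
	// Speculative alternative: set thinking via the options map instead of
	// the top-level request field.
	req := &api.GenerateRequest{
		Model:   "qwen3",
		Prompt:  "Why is the sky blue?",
		Options: map[string]any{"thinking": true},
	}
	_ = req
}
```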