- Allow specifying whether thinking mode is on or off
- Pass a new option to templates so that, e.g., qwen3's template can put
`/think` or `/no_think` in the system prompt depending on the value of
the setting
- Add parsing for thinking blocks in both streaming and non-streaming modes
- Update the CLI to make use of these changes
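
As a rough illustration of the thinking-block parsing described above, here is a minimal Python sketch, not the code in this PR. It assumes thinking is delimited by `<think>`/`</think>` tags, and the names `ThinkingParser`, `parse_stream`, and `parse_complete` are hypothetical. The non-streaming path reuses the streaming parser by treating the full response as a single chunk, which is one way to keep the two modes unified. Trimming of whitespace around the tags is omitted.

```python
from dataclasses import dataclass

# Assumed delimiters; the tags a given model actually emits may differ.
OPEN_TAG = "<think>"
CLOSE_TAG = "</think>"


def _partial_tag_len(text: str, tag: str) -> int:
    """Length of the longest proper prefix of `tag` that `text` ends with."""
    for k in range(min(len(tag) - 1, len(text)), 0, -1):
        if text.endswith(tag[:k]):
            return k
    return 0


@dataclass
class ThinkingParser:
    """Hypothetical incremental parser that splits thinking from content."""

    in_thinking: bool = False
    buffer: str = ""

    def feed(self, chunk: str) -> tuple[str, str]:
        """Consume a chunk and return the (thinking, content) safe to emit."""
        self.buffer += chunk
        thinking: list[str] = []
        content: list[str] = []
        while self.buffer:
            tag = CLOSE_TAG if self.in_thinking else OPEN_TAG
            out = thinking if self.in_thinking else content
            idx = self.buffer.find(tag)
            if idx >= 0:
                # Full tag found: emit the text before it and switch state.
                out.append(self.buffer[:idx])
                self.buffer = self.buffer[idx + len(tag):]
                self.in_thinking = not self.in_thinking
                continue
            # No full tag: hold back a suffix that could be the start of one.
            keep = _partial_tag_len(self.buffer, tag)
            cut = len(self.buffer) - keep
            out.append(self.buffer[:cut])
            self.buffer = self.buffer[cut:]
            break
        return "".join(thinking), "".join(content)

    def flush(self) -> tuple[str, str]:
        """Emit whatever is still held back once the stream has ended."""
        rest, self.buffer = self.buffer, ""
        return (rest, "") if self.in_thinking else ("", rest)


def parse_stream(chunks: list[str]) -> tuple[str, str]:
    """Run the parser over a sequence of chunks and join the results."""
    parser = ThinkingParser()
    pieces = [parser.feed(c) for c in chunks] + [parser.flush()]
    return "".join(p[0] for p in pieces), "".join(p[1] for p in pieces)


def parse_complete(text: str) -> tuple[str, str]:
    """Non-streaming mode: the whole response is a single chunk."""
    return parse_stream([text])
```

Holding back a possible partial tag at the end of the buffer is what lets a tag split across two chunks (for example `<thi` followed by `nk>`) be recognized without ever emitting tag fragments as content.
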
TODO:
- [ ] Don't parse thinking blocks when the user doesn't explicitly set
the option, to maintain backwards compatibility
- [ ] Warn in the CLI when using a non-thinking/older version of a model
(with an old template)
- [ ] Wire up capabilities fully
- [x] Unify parsing for streaming/non-streaming
- [ ] Update templates
- [ ] Update python/js libraries
- [ ] Decide how to handle differences between models in their defaults
and in whether the thinking ability can be controlled at all. If the
user doesn't specify the option, should there be a default, or should
the template be able to check whether it was explicitly set?