Commit graph

17 commits

Devon Rifkin
77f4594e80 WIP thinking API support
- Allows specifying whether thinking mode should be on or not
- Templates get passed a new option so, e.g., qwen3's template can put
  `/think` or `/no_think` in the system prompt depending on the value of
  the setting
- Add parsing for thinking blocks in both streaming and non-streaming modes
- Update the CLI to make use of these changes

TODO:

- [ ] Don't parse thinking blocks when the user doesn't explicitly set
      the option, to maintain backwards compatibility
- [ ] Warning on CLI when using a non-thinking/older version of a model
      (with an old template)
- [ ] Wire up capabilities fully
- [x] Unify parsing for streaming/non-streaming
- [ ] Update templates
- [ ] Update python/js libraries
- [ ] How to handle differences in models wrt defaults and whether or
      not the thinking ability can even be controlled. If not specified
      by the user, should there be a default or should the template be
      able to check if it was explicitly set?
2025-05-07 16:15:46 -07:00
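
As a rough illustration of the template-side change, here is a minimal Go sketch. It assumes a hypothetical `.Think` boolean exposed to templates (the real option name and plumbing may differ); a qwen3-style template can use it to inject the `/think` or `/no_think` soft switch into the system prompt:

```go
package main

import (
	"os"
	"text/template"
)

// qwenStyle is a made-up, stripped-down system-prompt template; it is
// not the actual qwen3 template shipped with ollama.
const qwenStyle = `<|im_start|>system
{{ .System }}{{ if .Think }} /think{{ else }} /no_think{{ end }}<|im_end|>
`

func main() {
	t := template.Must(template.New("sys").Parse(qwenStyle))
	_ = t.Execute(os.Stdout, map[string]any{
		"System": "You are a helpful assistant.",
		"Think":  false, // user explicitly disabled thinking mode
	})
	// Output:
	// <|im_start|>system
	// You are a helpful assistant. /no_think<|im_end|>
}
```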
Jesse Gross
900f64e6be prompt: Don't trim whitespace from prompts
Newlines can be an important part of a user's prompt, and trimming
them can alter the results. We previously only trimmed prompts with
images, but refactoring brought this behavior to all prompts, where
it became more noticeable.

The /generate endpoint adds less whitespace and therefore doesn't
need to trim it out - this change brings the same behavior to /chat.

Thanks to @gabe-l-hart for spotting the issue!

Fixes #7795
2024-12-09 11:02:55 -08:00
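
A self-contained Go snippet (illustrative only; the prompt text is made up and this is not the actual diff) shows why the trim was harmful:

```go
package main

import (
	"fmt"
	"strings"
)

func main() {
	// A prompt that deliberately ends in a blank line, e.g. so the model
	// continues the block rather than commenting on it.
	prompt := "Complete the poem:\nRoses are red,\nviolets are blue,\n\n"
	trimmed := strings.TrimSpace(prompt)

	fmt.Printf("original: %q\n", prompt)  // trailing newlines preserved
	fmt.Printf("trimmed:  %q\n", trimmed) // trailing newlines lost
}
```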
Jeffrey Morgan
8b4b243f5f
server: fix warnings in prompt_test.go (#7710) 2024-11-17 13:01:04 -08:00
Jesse Gross
c826e57475 runner.go: Better abstract vision model integration
- Update mllama to take the cross-attention state as embeddings in
  a batch, more similar to how Llava handles it. This improves
  integration with the input cache.
- Pass locations in a prompt for embeddings using tags similar to Llava.
- Abstract the interface to vision models so the main runner accesses
  Clip and Mllama similarly

Co-authored-by: Michael Yang <mxyng@pm.me>
2024-10-30 14:53:43 -07:00
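
A rough sketch of the kind of abstraction described, with invented names (the runner's real types and signatures differ): the point is that the main runner programs against one interface that both Clip and Mllama satisfy.

```go
package main

import "fmt"

// VisionModel is a hypothetical common interface for image encoders.
type VisionModel interface {
	// Embed turns raw image bytes into embeddings that the language
	// model consumes as part of a batch.
	Embed(image []byte) ([]float32, error)
}

type clipModel struct{}

func (clipModel) Embed(image []byte) ([]float32, error) {
	return []float32{ /* ... projector output ... */ }, nil
}

type mllamaModel struct{}

func (mllamaModel) Embed(image []byte) ([]float32, error) {
	// Cross-attention state returned as embeddings, mirroring how the
	// commit describes making mllama look more like Llava to the cache.
	return []float32{ /* ... */ }, nil
}

func run(m VisionModel) {
	emb, _ := m.Embed(nil)
	fmt.Println("got", len(emb), "embedding values")
}

func main() {
	run(clipModel{})
	run(mllamaModel{})
}
```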
Patrick Devine
c7cb0f0602
image processing for llama3.2 (#6963)
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>
2024-10-18 16:12:35 -07:00
Michael Yang
b732beba6a lint 2024-08-01 17:06:06 -07:00
Michael Yang
4a565cbf94 add chat and generate tests with mock runner 2024-07-16 09:39:31 -07:00
Michael Yang
d02bbebb11 tools 2024-07-15 15:26:16 -07:00
Michael Yang
22c5451fc2
fix system prompt (#5662)
* fix system prompt

* execute template when hitting previous roles

* fix tests

---------

Co-authored-by: jmorganca <jmorganca@gmail.com>
2024-07-12 21:04:44 -07:00
Michael Yang
41be28096a add system prompt to first legacy template 2024-07-10 17:03:08 -07:00
Michael Yang
2c3fe1fd97 comments 2024-07-05 13:17:24 -07:00
Michael Yang
269ed6e6a2 update message processing 2024-07-05 13:16:58 -07:00
Michael Yang
58e3fff311 rename templates to template 2024-07-01 10:40:54 -07:00
Patrick Devine
1b272d5bcd
change github.com/jmorganca/ollama to github.com/ollama/ollama (#3347) 2024-03-26 13:04:17 -07:00
Michael Yang
0e19476b56
prepend image tags (#2789)
instead of appending image tags, prepend them - this generally produces better results
2024-02-29 11:30:14 -08:00
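
Sketched in Go with a hypothetical helper (the real prompt assembly differs), the change amounts to joining the `[img-N]` placeholders in front of the user text instead of after it:

```go
package main

import (
	"fmt"
	"strings"
)

// prependImageTags is a hypothetical helper: image placeholders such as
// [img-0] now go before the user text rather than being appended.
func prependImageTags(text string, numImages int) string {
	tags := make([]string, numImages)
	for i := range tags {
		tags[i] = fmt.Sprintf("[img-%d]", i)
	}
	return strings.Join(tags, " ") + " " + text
}

func main() {
	fmt.Println(prependImageTags("What is in this picture?", 1))
	// Output: [img-0] What is in this picture?
}
```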
Bruce MacDonald
88622847c6
fix: chat system prompting overrides (#2542) 2024-02-16 14:42:43 -05:00
Jeffrey Morgan
48a273f80b
Fix issues with templating prompt in chat mode (#2460) 2024-02-12 15:06:57 -08:00