Commit graph

71 commits

Author SHA1 Message Date
Bruce MacDonald
6bd0a983cd model: support for mistral-small in the ollama runner
Mistral is a popular research lab making open source models. This updates
the forward pass of llama architecture models to support both llama models
and mistral models by accounting for additional metadata present in mistral
models, and finding the correct dimensions for the output projection.
2025-04-03 16:57:36 -07:00
Bruce MacDonald
9876c9faa4
chore(all): replace instances of interface with any (#10067)
Both interface{} and any (which is just an alias for interface{} introduced in Go 1.18) represent the empty interface that all types satisfy.
2025-04-02 09:44:27 -07:00
Bruce MacDonald
61a8825216
convert: return name of unsupported architecture (#9862)
When a model's architecture cannot be converted return the name of the unsupported arch in the error message.
2025-03-18 10:38:28 -07:00
Patrick Devine
80c7ce381b
fix: change default context size for gemma3 (#9744) 2025-03-13 13:59:19 -07:00
jmorganca
83f0ec8269 all: address linter errors 2025-03-11 14:49:20 -07:00
Michael Yang
63a394068c use 2d pooling 2025-03-11 14:49:20 -07:00
Patrick Devine
2e54d72fc3 fix gemma3 1b conversion 2025-03-11 14:49:19 -07:00
Michael Yang
6b32a2d549 compat with upstream gguf 2025-03-11 14:49:19 -07:00
Michael Yang
d368c039f0 skip repacking vision tensors 2025-03-11 14:49:19 -07:00
Patrick Devine
9b54267e69 fix configs 2025-03-11 14:49:19 -07:00
Michael Yang
46bb0169c4 update model 2025-03-11 14:49:19 -07:00
Patrick Devine
c62861f4fa fix conversion 2025-03-11 14:49:18 -07:00
Michael Yang
0df1800436 set non-causal attention 2025-03-11 14:49:18 -07:00
Patrick Devine
631fecc6d9 temporary work around for converting spm 2025-03-11 14:49:18 -07:00
Michael Yang
4b037a97dc add gemma vision encoder 2025-03-11 14:49:17 -07:00
Patrick Devine
5f74d1fd47 gemma2 impl 2025-03-11 14:35:08 -07:00
Michael Yang
58245413f4
next ollama runner (#7913)
feat: add new Ollama engine using ggml through cgo

This change introduces a new way to run pretrained models. It introduces 3 high level interfaces and a bunch of smaller helper interfaces to facilitate this.

- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`
- `ml.Tensor` defines the interface for a tensor and tensor operations

This is the first implementation of the new engine. Follow up PRs will implement more features:

- non-greedy sampling (#8410)
- integration with Ollama and KV caching (#8301)
- more model support (#9080) with more coming soon

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2025-02-13 16:31:21 -08:00
Josh
93a8daf285
convert: import support for command-r models from safetensors (#6063)
---------

Co-authored-by: Patrick Devine <patrick@infrahq.com>
2025-01-15 16:31:22 -08:00
Bruce MacDonald
f6f3713001
convert: qwen2 from safetensors (#8408)
Add native support for converting Qwen2 family models (including Qwen2.5)
from safetensors to gguf format so we can run it.
2025-01-14 10:34:37 -08:00
Stefan Weil
abfdc4710f
all: fix typos in documentation, code, and comments (#7021) 2024-12-10 12:58:06 -08:00
Michael Yang
4456012956 fix unmarshaling merges 2024-12-04 09:21:56 -08:00
Patrick Devine
c7cb0f0602
image processing for llama3.2 (#6963)
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>
2024-10-18 16:12:35 -07:00
Patrick Devine
84b84ce2db
catch when model vocab size is set correctly (#6714) 2024-09-09 17:18:54 -07:00
Patrick Devine
608e87bf87
Fix gemma2 2b conversion (#6645) 2024-09-05 17:02:28 -07:00
Michael Yang
9cfd2dd3e3
Merge pull request #6522 from ollama/mxyng/detect-chat
detect chat template from configs that contain lists
2024-08-28 11:04:18 -07:00
Patrick Devine
6c1c1ad6a9
throw an error when encountering unsupport tensor sizes (#6538) 2024-08-27 17:54:04 -07:00
Michael Yang
60e47573a6 more tokenizer tests 2024-08-27 14:51:10 -07:00
Michael Yang
eae3af6807 clean up convert tokenizer 2024-08-27 11:11:43 -07:00
Michael Yang
3eb08377f8 detect chat template from configs that contain lists 2024-08-27 10:49:33 -07:00
Patrick Devine
0c819e167b
convert safetensor adapters into GGUF (#6327) 2024-08-23 11:29:56 -07:00
Michael Yang
77903ab8b4 llama3.1 2024-08-21 11:49:31 -07:00
Michael Yang
3546bbd08c convert gemma2 2024-08-20 17:27:51 -07:00
Michael Yang
5a28b9cf5f bert 2024-08-20 17:27:34 -07:00
Bruce MacDonald
aec77d6a05 support new "longrope" attention factor 2024-08-12 15:13:29 -07:00
Michael Yang
6ffb5cb017 add conversion for microsoft phi 3 mini/medium 4k, 128 2024-08-12 15:13:29 -07:00
Michael Yang
b732beba6a lint 2024-08-01 17:06:06 -07:00
Michael Yang
d8e2664c33 convert: fix parse functions 2024-07-31 15:58:55 -07:00
Michael Yang
eafc607abb convert: only extract large files 2024-07-31 15:58:55 -07:00
Michael Yang
781fc2d576 Update convert/reader_safetensors.go
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-31 15:58:55 -07:00
Michael Yang
df993fa37b comments 2024-07-31 15:58:55 -07:00
Michael Yang
5e9db9fb0b refactor convert 2024-07-31 15:58:33 -07:00
Michael Yang
6b252918fb update convert test to check result data 2024-07-31 10:59:38 -07:00
Jeffrey Morgan
d835368eb8
convert: capture head_dim for mistral (#5818) 2024-07-22 16:16:22 -04:00
Michael Yang
e40145a39d lint 2024-06-04 11:13:30 -07:00
Michael Yang
c895a7d13f some gocritic 2024-06-04 11:13:30 -07:00
Michael Yang
55f6eba049 gofmt 2024-06-04 11:13:30 -07:00
Ikko Eltociear Ashimine
955c317cab
chore: update tokenizer.go (#4571)
PreTokenziers -> PreTokenizers
2024-05-22 00:25:23 -07:00
Michael Yang
171eb040fc simplify safetensors reading 2024-05-21 11:28:22 -07:00
Michael Yang
3591bbe56f add test 2024-05-21 11:28:22 -07:00
Michael Yang
34d5ef29b3 fix conversion for f16 or f32 inputs 2024-05-21 11:28:22 -07:00