Michael Yang
7ba9fa9c7d
fixes for maverick
2025-04-25 16:59:20 -07:00
Michael Yang
8bf11b84c1
chunked attention
2025-04-25 16:59:20 -07:00
Michael Yang
f0c66e6dea
llama4
2025-04-25 16:59:20 -07:00
Michael Yang
dc1e81f027
convert: use -1 for read all
2025-04-25 16:59:01 -07:00
Michael Yang
4892872c18
convert: change to colmajor
2025-04-25 15:27:39 -07:00
Michael Yang
2fec73eef6
fix write gguf padding
2025-04-16 10:24:35 -07:00
Bruce MacDonald
6bd0a983cd
model: support for mistral-small in the ollama runner
...
Mistral is a popular research lab making open source models. This updates
the forward pass of llama architecture models to support both llama models
and mistral models by accounting for additional metadata present in mistral
models, and finding the correct dimensions for the output projection.
2025-04-03 16:57:36 -07:00
Bruce MacDonald
9876c9faa4
chore(all): replace instances of interface with any ( #10067 )
...
Both interface{} and any (which is just an alias for interface{} introduced in Go 1.18) represent the empty interface that all types satisfy.
2025-04-02 09:44:27 -07:00
Bruce MacDonald
61a8825216
convert: return name of unsupported architecture ( #9862 )
...
When a model's architecture cannot be converted return the name of the unsupported arch in the error message.
2025-03-18 10:38:28 -07:00
Patrick Devine
80c7ce381b
fix: change default context size for gemma3 ( #9744 )
2025-03-13 13:59:19 -07:00
jmorganca
83f0ec8269
all: address linter errors
2025-03-11 14:49:20 -07:00
Michael Yang
63a394068c
use 2d pooling
2025-03-11 14:49:20 -07:00
Patrick Devine
2e54d72fc3
fix gemma3 1b conversion
2025-03-11 14:49:19 -07:00
Michael Yang
6b32a2d549
compat with upstream gguf
2025-03-11 14:49:19 -07:00
Michael Yang
d368c039f0
skip repacking vision tensors
2025-03-11 14:49:19 -07:00
Patrick Devine
9b54267e69
fix configs
2025-03-11 14:49:19 -07:00
Michael Yang
46bb0169c4
update model
2025-03-11 14:49:19 -07:00
Patrick Devine
c62861f4fa
fix conversion
2025-03-11 14:49:18 -07:00
Michael Yang
0df1800436
set non-causal attention
2025-03-11 14:49:18 -07:00
Patrick Devine
631fecc6d9
temporary work around for converting spm
2025-03-11 14:49:18 -07:00
Michael Yang
4b037a97dc
add gemma vision encoder
2025-03-11 14:49:17 -07:00
Patrick Devine
5f74d1fd47
gemma2 impl
2025-03-11 14:35:08 -07:00
Michael Yang
58245413f4
next ollama runner ( #7913 )
...
feat: add new Ollama engine using ggml through cgo
This change introduces a new way to run pretrained models. It introduces 3 high level interfaces and a bunch of smaller helper interfaces to facilitate this.
- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, can implement the model's forward propagation in the `Forward` method. This method will be called to generate completions. This interface can be found in `model/model.go`
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a Backend is responsible for loading a pretrained model into hardware (GPU, CPU, etc) and providing an interface for Models to access loaded tensors. This interface can be found in `ml/backend.go`
- `ml.Tensor` defines the interface for a tensor and tensor operations
This is the first implementation of the new engine. Follow up PRs will implement more features:
- non-greedy sampling (#8410 )
- integration with Ollama and KV caching (#8301 )
- more model support (#9080 ) with more coming soon
Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2025-02-13 16:31:21 -08:00
Josh
93a8daf285
convert: import support for command-r models from safetensors ( #6063 )
...
---------
Co-authored-by: Patrick Devine <patrick@infrahq.com>
2025-01-15 16:31:22 -08:00
Bruce MacDonald
f6f3713001
convert: qwen2 from safetensors ( #8408 )
...
Add native support for converting Qwen2 family models (including Qwen2.5)
from safetensors to gguf format so we can run it.
2025-01-14 10:34:37 -08:00
Stefan Weil
abfdc4710f
all: fix typos in documentation, code, and comments ( #7021 )
2024-12-10 12:58:06 -08:00
Michael Yang
4456012956
fix unmarshaling merges
2024-12-04 09:21:56 -08:00
Patrick Devine
c7cb0f0602
image processing for llama3.2 ( #6963 )
...
Co-authored-by: jmorganca <jmorganca@gmail.com>
Co-authored-by: Michael Yang <mxyng@pm.me>
Co-authored-by: Jesse Gross <jesse@ollama.com>
2024-10-18 16:12:35 -07:00
Patrick Devine
84b84ce2db
catch when model vocab size is set correctly ( #6714 )
2024-09-09 17:18:54 -07:00
Patrick Devine
608e87bf87
Fix gemma2 2b conversion ( #6645 )
2024-09-05 17:02:28 -07:00
Michael Yang
9cfd2dd3e3
Merge pull request #6522 from ollama/mxyng/detect-chat
...
detect chat template from configs that contain lists
2024-08-28 11:04:18 -07:00
Patrick Devine
6c1c1ad6a9
throw an error when encountering unsupport tensor sizes ( #6538 )
2024-08-27 17:54:04 -07:00
Michael Yang
60e47573a6
more tokenizer tests
2024-08-27 14:51:10 -07:00
Michael Yang
eae3af6807
clean up convert tokenizer
2024-08-27 11:11:43 -07:00
Michael Yang
3eb08377f8
detect chat template from configs that contain lists
2024-08-27 10:49:33 -07:00
Patrick Devine
0c819e167b
convert safetensor adapters into GGUF ( #6327 )
2024-08-23 11:29:56 -07:00
Michael Yang
77903ab8b4
llama3.1
2024-08-21 11:49:31 -07:00
Michael Yang
3546bbd08c
convert gemma2
2024-08-20 17:27:51 -07:00
Michael Yang
5a28b9cf5f
bert
2024-08-20 17:27:34 -07:00
Bruce MacDonald
aec77d6a05
support new "longrope" attention factor
2024-08-12 15:13:29 -07:00
Michael Yang
6ffb5cb017
add conversion for microsoft phi 3 mini/medium 4k, 128
2024-08-12 15:13:29 -07:00
Michael Yang
b732beba6a
lint
2024-08-01 17:06:06 -07:00
Michael Yang
d8e2664c33
convert: fix parse functions
2024-07-31 15:58:55 -07:00
Michael Yang
eafc607abb
convert: only extract large files
2024-07-31 15:58:55 -07:00
Michael Yang
781fc2d576
Update convert/reader_safetensors.go
...
Co-authored-by: Jeffrey Morgan <jmorganca@gmail.com>
2024-07-31 15:58:55 -07:00
Michael Yang
df993fa37b
comments
2024-07-31 15:58:55 -07:00
Michael Yang
5e9db9fb0b
refactor convert
2024-07-31 15:58:33 -07:00
Michael Yang
6b252918fb
update convert test to check result data
2024-07-31 10:59:38 -07:00
Jeffrey Morgan
d835368eb8
convert: capture head_dim
for mistral ( #5818 )
2024-07-22 16:16:22 -04:00
Michael Yang
e40145a39d
lint
2024-06-04 11:13:30 -07:00