ollama/convert
Michael Yang 58245413f4
next ollama runner (#7913)
feat: add new Ollama engine using ggml through cgo

This change introduces a new way to run pretrained models. It adds three high-level interfaces, along with several smaller helper interfaces, to support this:

- `model.Model` defines the interface for a model architecture. Models such as `llama` and `mllama`, which are provided as examples, implement the model's forward propagation in the `Forward` method, which is called to generate completions. This interface can be found in `model/model.go`.
- `ml.Backend` defines the interface for a backend tensor library, in this case `ggml`. Among other things, a backend is responsible for loading a pretrained model onto hardware (GPU, CPU, etc.) and for giving models access to the loaded tensors. This interface can be found in `ml/backend.go`.
- `ml.Tensor` defines the interface for a tensor and tensor operations.
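
The division of responsibilities among the three interfaces can be illustrated with a toy example. The sketch below is hypothetical. `Tensor`, `Backend`, `Model`, `vec`, `mapBackend`, `toyModel`, and all of their method signatures are simplified stand-ins invented for illustration. They are not the real `ml.Tensor`, `ml.Backend`, or `model.Model` definitions, which live in `ml/backend.go` and `model/model.go`.

```go
package main

import (
	"errors"
	"fmt"
)

// Tensor is a hypothetical, heavily simplified stand-in for ml.Tensor:
// a 1-D float32 vector exposing a couple of operations.
type Tensor interface {
	Data() []float32
	Add(other Tensor) (Tensor, error)
	Scale(s float32) Tensor
}

// Backend is a hypothetical stand-in for ml.Backend: it owns the loaded
// weights and hands them to models by name.
type Backend interface {
	Get(name string) (Tensor, bool)
}

// Model is a hypothetical stand-in for model.Model: an architecture
// implements its forward propagation in Forward.
type Model interface {
	Forward(b Backend, input Tensor) (Tensor, error)
}

// vec is a CPU-only Tensor implementation used only for this sketch.
type vec []float32

func (v vec) Data() []float32 { return v }

// Add returns the element-wise sum, or an error if lengths differ.
func (v vec) Add(other Tensor) (Tensor, error) {
	o := other.Data()
	if len(o) != len(v) {
		return nil, errors.New("shape mismatch")
	}
	out := make(vec, len(v))
	for i := range v {
		out[i] = v[i] + o[i]
	}
	return out, nil
}

// Scale returns a new vector with every element multiplied by s.
func (v vec) Scale(s float32) Tensor {
	out := make(vec, len(v))
	for i := range v {
		out[i] = v[i] * s
	}
	return out
}

// mapBackend keeps "loaded" tensors in an in-memory map; a real backend
// would load them from a model file onto GPU or CPU memory.
type mapBackend map[string]Tensor

func (m mapBackend) Get(name string) (Tensor, bool) {
	t, ok := m[name]
	return t, ok
}

// toyModel computes scale*input + bias, fetching bias from the backend.
type toyModel struct{ scale float32 }

func (m toyModel) Forward(b Backend, input Tensor) (Tensor, error) {
	bias, ok := b.Get("bias")
	if !ok {
		return nil, errors.New("missing tensor: bias")
	}
	return input.Scale(m.scale).Add(bias)
}

func main() {
	var b Backend = mapBackend{"bias": vec{1, 1, 1}}
	var m Model = toyModel{scale: 2}
	out, err := m.Forward(b, vec{1, 2, 3})
	if err != nil {
		panic(err)
	}
	fmt.Println(out.Data()) // [3 5 7]
}
```

In this sketch, `Forward` only asks the backend for tensors by name and never learns where they are stored. That separation is what lets the same model code run against any `Backend` implementation.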

This is the first implementation of the new engine. Follow-up PRs will implement more features:

- non-greedy sampling (#8410)
- integration with Ollama and KV caching (#8301)
- support for more models (#9080), with more coming soon

Co-authored-by: Bruce MacDonald <brucewmacdonald@gmail.com>
2025-02-13 16:31:21 -08:00
| Name | Last commit | Date |
|---|---|---|
| `sentencepiece` | all: fix typos in documentation, code, and comments (#7021) | 2024-12-10 12:58:06 -08:00 |
| `testdata` | convert: import support for command-r models from safetensors (#6063) | 2025-01-15 16:31:22 -08:00 |
| `convert.go` | next ollama runner (#7913) | 2025-02-13 16:31:21 -08:00 |
| `convert_bert.go` | next ollama runner (#7913) | 2025-02-13 16:31:21 -08:00 |
| `convert_commandr.go` | next ollama runner (#7913) | 2025-02-13 16:31:21 -08:00 |
| `convert_gemma.go` | next ollama runner (#7913) | 2025-02-13 16:31:21 -08:00 |
| `convert_gemma2.go` | next ollama runner (#7913) | 2025-02-13 16:31:21 -08:00 |
| `convert_gemma2_adapter.go` | next ollama runner (#7913) | 2025-02-13 16:31:21 -08:00 |
| `convert_llama.go` | next ollama runner (#7913) | 2025-02-13 16:31:21 -08:00 |
| `convert_llama_adapter.go` | next ollama runner (#7913) | 2025-02-13 16:31:21 -08:00 |
| `convert_mixtral.go` | next ollama runner (#7913) | 2025-02-13 16:31:21 -08:00 |
| `convert_phi3.go` | next ollama runner (#7913) | 2025-02-13 16:31:21 -08:00 |
| `convert_qwen2.go` | next ollama runner (#7913) | 2025-02-13 16:31:21 -08:00 |
| `convert_test.go` | next ollama runner (#7913) | 2025-02-13 16:31:21 -08:00 |
| `fs.go` | lint | 2024-08-01 17:06:06 -07:00 |
| `reader.go` | convert safetensor adapters into GGUF (#6327) | 2024-08-23 11:29:56 -07:00 |
| `reader_safetensors.go` | Fix gemma2 2b conversion (#6645) | 2024-09-05 17:02:28 -07:00 |
| `reader_torch.go` | convert gemma2 | 2024-08-20 17:27:51 -07:00 |
| `sentencepiece_model.proto` | all: fix typos in documentation, code, and comments (#7021) | 2024-12-10 12:58:06 -08:00 |
| `tokenizer.go` | convert: qwen2 from safetensors (#8408) | 2025-01-14 10:34:37 -08:00 |
| `tokenizer_spm.go` | convert gemma2 | 2024-08-20 17:27:51 -07:00 |
| `tokenizer_test.go` | fix unmarshaling merges | 2024-12-04 09:21:56 -08:00 |