Mirror of https://github.com/ollama/ollama.git
llm: speed up gguf decoding by a lot (#5246)
Previously, loading GGUF files and decoding their metadata and tensor
information was very slow, for two main reasons:

* Too many allocations when decoding strings
* Hitting disk for every read of every key and value, resulting in an
  excessive number of syscalls and a lot of disk I/O

The show API is now down to 33ms from 800ms+ for llama3 on a MacBook Pro M3.

This commit also allows skipping the collection of large arrays of values
when decoding GGUFs, if desired. When such keys are encountered, their
values are null and are encoded as such in JSON.

Also, this fixes a broken test that was not encoding valid GGUF.
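To illustrate the buffering point above, here is a minimal sketch (not the
actual ollama code; the GGUF layout handling is abbreviated and the
per-string allocation fix is omitted): wrapping the file in a bufio.Reader
turns thousands of tiny per-key reads into a few large ones.

// Minimal sketch, not the ollama implementation: decode GGUF header fields
// and a length-prefixed string through a bufio.Reader so that metadata
// parsing hits an in-memory buffer instead of issuing a syscall per value.
package main

import (
	"bufio"
	"encoding/binary"
	"fmt"
	"io"
	"log"
	"os"
)

// readString decodes a GGUF string: a little-endian uint64 length followed
// by that many bytes. Behind a buffered reader these two reads almost never
// touch the disk. (The real fix also reduced per-string allocations, which
// this sketch does not attempt.)
func readString(r io.Reader) (string, error) {
	var n uint64
	if err := binary.Read(r, binary.LittleEndian, &n); err != nil {
		return "", err
	}
	buf := make([]byte, n)
	if _, err := io.ReadFull(r, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	if len(os.Args) < 2 {
		log.Fatal("usage: ggufpeek <file.gguf>")
	}
	f, err := os.Open(os.Args[1])
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// The decisive change: one large buffered reader in front of the file,
	// instead of reading each key and value straight from the *os.File.
	r := bufio.NewReaderSize(f, 1<<20)

	// GGUF header: magic, version, tensor count, metadata kv count.
	var hdr struct {
		Magic, Version uint32
		Tensors, Pairs uint64
	}
	if err := binary.Read(r, binary.LittleEndian, &hdr); err != nil {
		log.Fatal(err)
	}

	// Each metadata entry begins with a length-prefixed key.
	key, err := readString(r)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("version %d, %d tensors, %d kv pairs, first key %q\n",
		hdr.Version, hdr.Tensors, hdr.Pairs, key)
}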
parent 2aa91a937b
commit cb42e607c5
13 changed files with 263 additions and 69 deletions
@@ -144,7 +144,7 @@ func (s *Scheduler) processPending(ctx context.Context) {
 			}
 
 			// Load model for fitting
-			ggml, err := llm.LoadModel(pending.model.ModelPath)
+			ggml, err := llm.LoadModel(pending.model.ModelPath, 0)
 			if err != nil {
 				pending.errCh <- err
 				break
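The scheduler hunk above shows the one API change visible in this excerpt:
llm.LoadModel gained a second argument (0 here). Read together with the
commit message, this plausibly caps how many array values are collected
during decode, with oversized arrays skipped and surfaced as JSON null. A
minimal sketch of that idea follows; the names (decodeUint32Array,
maxArraySize) and the cap semantics are hypothetical, not the actual
ollama API.

// Sketch of the "skip large arrays" behavior described above. The names
// (decodeUint32Array, maxArraySize) are hypothetical, not the actual
// ollama API.
package gguf

import (
	"bufio"
	"encoding/binary"
)

// decodeUint32Array reads n little-endian uint32 values. If n exceeds the
// cap, the payload is discarded rather than collected and nil is returned;
// encoding/json marshals a nil slice as null, matching the commit message.
// A negative cap collects everything.
func decodeUint32Array(r *bufio.Reader, n uint64, maxArraySize int) ([]uint32, error) {
	if maxArraySize >= 0 && n > uint64(maxArraySize) {
		if _, err := r.Discard(int(n) * 4); err != nil {
			return nil, err
		}
		return nil, nil
	}

	vals := make([]uint32, n)
	if err := binary.Read(r, binary.LittleEndian, vals); err != nil {
		return nil, err
	}
	return vals, nil
}

Under these sketch semantics, the scheduler's cap of 0 would skip every
array, which fits a fitting path that only needs tensor shapes and sizes,
not full metadata arrays.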