ollama/model
Latest commit dbb149e6f7 (Jesse Gross): ollamarunner: Preallocate worst case graph at startup
Currently, the KV cache and graph are lazily allocated as needed.
The cache is fully allocated on first use of the corresponding
layer whereas the graph grows with the size of the context.

This can be an issue if another application allocates more VRAM
after we do our calculations: Ollama will then crash in the middle
of inference. If we instead allocate the maximum needed memory at
runner startup, we will either succeed or fail at that point
rather than at some surprising time in the future.

Currently, this only generates a worst case batch for text, which
means that vision models may get a partial allocation and continue
to lazily allocate the rest.
2025-04-08 10:01:28 -07:00
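
In effect, the commit performs a dry run over the largest batch the runner could ever see, once at startup, so any allocation failure surfaces immediately. The Go sketch below illustrates that idea only; the runner type and its buildWorstCaseBatch, forward, and reserve helpers are hypothetical stand-ins, not the actual ollamarunner API.

```go
// Minimal sketch of "fail at startup, not mid-inference" preallocation.
// All names here are illustrative stand-ins, not the real Ollama code.
package main

import (
	"fmt"
	"log"
)

// runner holds the state for one loaded model.
type runner struct {
	numCtx int // maximum context length the runner was started with
}

// buildWorstCaseBatch produces a batch of numCtx placeholder text tokens.
// Only the shape matters: it forces the largest compute graph and a fully
// sized KV cache to be allocated.
func (r *runner) buildWorstCaseBatch() []int32 {
	return make([]int32, r.numCtx)
}

// forward stands in for building and allocating the compute graph for a
// batch; in the real runner this is where VRAM would actually be claimed.
func (r *runner) forward(batch []int32) error {
	_ = batch
	return nil
}

// reserve runs a dry pass over the worst-case batch at startup so that any
// out-of-memory failure happens here rather than during later inference.
func (r *runner) reserve() error {
	if err := r.forward(r.buildWorstCaseBatch()); err != nil {
		return fmt.Errorf("preallocating worst case graph: %w", err)
	}
	return nil
}

func main() {
	r := &runner{numCtx: 4096}
	if err := r.reserve(); err != nil {
		log.Fatal(err) // fail now, at startup, not at a surprising time later
	}
	fmt.Println("worst-case graph reserved; runner ready")
}
```

As the commit message notes, only a worst-case text batch is modeled; a vision model would also need a worst-case image input to avoid a partial, lazily completed allocation.
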
Name                      Last commit                                                 Date
imageproc                 imageproc mllama refactor (#7537)                           2024-12-14 19:50:15 -08:00
input                     model: Pass input tensor instead of raw data to models      2025-03-20 13:28:13 -07:00
models                    model: support for mistral-small in the ollama runner       2025-04-03 16:57:36 -07:00
testdata                  gemma2 impl                                                 2025-03-11 14:35:08 -07:00
model.go                  ollamarunner: Preallocate worst case graph at startup       2025-04-08 10:01:28 -07:00
model_test.go             fs: move ml.Config to fs package                            2025-04-03 13:12:24 -07:00
process_text.go           model: support for mistral-small in the ollama runner       2025-04-03 16:57:36 -07:00
process_text_spm.go       model: fix issues with spm tokenizer for Gemma 3 (#10081)   2025-04-02 13:22:56 -07:00
process_text_spm_test.go  model: fix issues with spm tokenizer for Gemma 3 (#10081)   2025-04-02 13:22:56 -07:00
process_text_test.go      model: Don't unconditionally add special tokens             2025-03-06 16:54:16 -08:00