ollama

mirror of https://github.com/ollama/ollama.git synced 2025-05-11 18:36:41 +02:00


Jesse Gross	5c5535c064	2025-02-20 14:49:47 -08:00
models: Prune unused outputs earlier in the forward pass

Currently Rows is called as the last step in a model computation to get the values for the output tokens. However, if we move it earlier in the process, we can trim out computations whose results are never used. This is similar to how models are defined in llama.cpp. Changing the model definition in this way improves token generation performance by approximately 8%.
imageproc	imageproc mllama refactor (#7537)	2024-12-14 19:50:15 -08:00
models	models: Prune unused outputs earlier in the forward pass	2025-02-20 14:49:47 -08:00
testdata	next ollama runner (#7913)	2025-02-13 16:31:21 -08:00
model.go	ollamarunner: Pass runner performance parameters to backends	2025-02-20 13:27:57 -08:00
model_test.go	Runner for Ollama engine	2025-02-13 17:09:26 -08:00
process_text.go	vocab: Use int32 for special tokens	2025-02-13 17:09:26 -08:00
process_text_test.go	next ollama runner (#7913)	2025-02-13 16:31:21 -08:00