Sometimes the KV cache requires defragmentation even without triggering the threshold heuristic. In this case, decoding will not be able to find a KV cache slot. This is particularly difficult for the caller to handle if it happens in between ubatches. To avoid this, we should immediately trigger a defrag. In addition, a heavily fragmented cache can require more than max_moves to defragment. Currently, we stop when we hit that limit, which can leave a cache that still lacks adequate space even after defragmentation has been triggered. Instead, we should run multiple batches of defragmentation until everything is complete. Fixes #7949
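The sketch below illustrates the idea described above: instead of a single defrag pass capped at max_moves, keep running bounded passes until a slot is available or the cache is fully compacted. This is a minimal C++ sketch, not the actual patch; kv_cache, find_slot, defrag_prepare, and defrag_apply are hypothetical names standing in for the real llama.cpp internals.

```cpp
// Hypothetical interfaces (assumed for illustration only).
struct kv_cache;                                        // opaque KV cache state

bool find_slot(kv_cache & cache, int n_tokens);         // true if a contiguous slot for n_tokens exists
int  defrag_prepare(kv_cache & cache, int max_moves);   // plans up to max_moves cell moves, returns count planned
void defrag_apply(kv_cache & cache);                    // executes the planned moves

// Ensure there is room for n_tokens, defragmenting as many times as needed
// rather than giving up after a single pass limited by max_moves.
bool ensure_slot(kv_cache & cache, int n_tokens, int max_moves) {
    while (!find_slot(cache, n_tokens)) {
        // A heavily fragmented cache may need more than max_moves moves,
        // so run repeated bounded passes until no moves remain.
        const int planned = defrag_prepare(cache, max_moves);
        if (planned == 0) {
            // Fully compacted but still no room for the request.
            return false;
        }
        defrag_apply(cache);
    }
    return true;
}
```

The key design point is the loop: each pass stays bounded by max_moves, but the caller is never left with a partially defragmented cache that still cannot satisfy the decode request.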
- 0001-cuda.patch
- 0002-pretokenizer.patch
- 0003-embeddings.patch
- 0004-clip-unicode.patch
- 0005-solar-pro.patch
- 0006-conditional-fattn.patch
- 0007-blas.patch
- 0008-add-mllama-support.patch
- 0009-add-unpad-operator.patch
- 0010-fix-deepseek-deseret-regex.patch
- 0011-relative-include-paths.patch
- 0012-Maintain-ordering-for-rules-for-grammar.patch
- 0013-fix-missing-arg-in-static-assert-on-windows.patch
- 0014-llama-Ensure-KV-cache-is-fully-defragmented.patch