ollama/llama/patches
Jesse Gross 08a832b482 llama: Ensure KV cache is fully defragmented.
Sometimes the KV cache requires defragmentation even without
triggering the threshold heuristic. In this case, decoding
will not be able to find a KV cache slot. This is particularly
difficult for the caller to handle if it happens in between
ubatches. To avoid this, we should immediately trigger a defrag.

In addition, a heavily fragmented cache can require more than
max_moves to defragment. Currently, we stop when we hit the limit
but this can leave a cache that still does not have adequate space
even after defragmentation is triggered. Instead, we should do
multiple batches of processing until everything is complete.

Fixes #7949
2024-12-17 14:01:19 -08:00
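
The behaviour described in the commit message can be illustrated with a toy model. This is only a sketch under stated assumptions, not the code in the patch: the cache is modelled as a flat Python list where None marks a free cell, and the helper names (find_slot, defrag_pass, defrag_fully, find_slot_or_defrag) are hypothetical. Only the max_moves limit comes from the commit message. The real cache also tracks per-sequence metadata, which is ignored here.

```python
from typing import List, Optional

Cell = Optional[int]  # None marks a free cell; an int is a stored token id


def find_slot(cache: List[Cell], n: int) -> Optional[int]:
    """Return the start index of the first run of n free cells, or None."""
    run = 0
    for i, cell in enumerate(cache):
        run = run + 1 if cell is None else 0
        if run == n:
            return i - n + 1
    return None


def defrag_pass(cache: List[Cell], max_moves: int) -> int:
    """Move used cells from the tail into holes near the front.

    Stops after max_moves moves, so one pass may leave holes behind.
    Returns the number of moves actually performed.
    """
    moves = 0
    lo, hi = 0, len(cache) - 1
    while moves < max_moves:
        while lo < hi and cache[lo] is not None:
            lo += 1  # skip cells that are already in use
        while lo < hi and cache[hi] is None:
            hi -= 1  # skip free cells at the tail
        if lo >= hi:
            break  # no hole left in front of a used cell
        cache[lo], cache[hi] = cache[hi], None
        moves += 1
    return moves


def defrag_fully(cache: List[Cell], max_moves: int) -> int:
    """Repeat bounded passes until a pass has nothing left to move."""
    passes = 0
    while defrag_pass(cache, max_moves) > 0:
        passes += 1
    return passes


def find_slot_or_defrag(cache: List[Cell], n: int, max_moves: int) -> Optional[int]:
    """Look for a slot; on failure, defragment immediately and retry once."""
    slot = find_slot(cache, n)
    if slot is None:
        defrag_fully(cache, max_moves)
        slot = find_slot(cache, n)
    return slot


if __name__ == "__main__":
    cache: List[Cell] = [1, None, 2, None, 3, None, 4, None]
    print(find_slot_or_defrag(cache, 4, max_moves=1), cache)
```

In this model, a single pass limited to one move still leaves the free cells split into two runs, so a request for four contiguous cells keeps failing. Looping over bounded passes until a pass performs no moves is what guarantees the free space ends up contiguous.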
0001-cuda.patch llama: update vendor code to commit ba1cb19c (#8101) 2024-12-14 14:55:51 -08:00
0002-pretokenizer.patch llama: update vendor code to commit ba1cb19c (#8101) 2024-12-14 14:55:51 -08:00
0003-embeddings.patch llama: update vendor code to commit ba1cb19c (#8101) 2024-12-14 14:55:51 -08:00
0004-clip-unicode.patch llama: update vendor code to commit ba1cb19c (#8101) 2024-12-14 14:55:51 -08:00
0005-solar-pro.patch llama: update vendor code to commit ba1cb19c (#8101) 2024-12-14 14:55:51 -08:00
0006-conditional-fattn.patch llama: update vendor code to commit ba1cb19c (#8101) 2024-12-14 14:55:51 -08:00
0007-blas.patch llama: update vendored code to commit 40c6d79f (#7875) 2024-12-10 19:21:34 -08:00
0008-add-mllama-support.patch llama: update vendor code to commit ba1cb19c (#8101) 2024-12-14 14:55:51 -08:00
0009-add-unpad-operator.patch llama: update vendor code to commit ba1cb19c (#8101) 2024-12-14 14:55:51 -08:00
0010-fix-deepseek-deseret-regex.patch llama: update vendor code to commit ba1cb19c (#8101) 2024-12-14 14:55:51 -08:00
0011-relative-include-paths.patch llama: update vendor code to commit ba1cb19c (#8101) 2024-12-14 14:55:51 -08:00
0012-Maintain-ordering-for-rules-for-grammar.patch llama: enable JSON schema key ordering for generating grammars (#8055) 2024-12-11 17:17:36 -08:00
0013-fix-missing-arg-in-static-assert-on-windows.patch llama: update vendor code to commit ba1cb19c (#8101) 2024-12-14 14:55:51 -08:00
0014-llama-Ensure-KV-cache-is-fully-defragmented.patch llama: Ensure KV cache is fully defragmented. 2024-12-17 14:01:19 -08:00