ollama/llm
Devon Rifkin 7c94471d38 ggml: more accurate estimates for head count array case
Also standardized the approach by always treating `HeadCount()` and
`HeadCountKV()` as arrays by filling them with the same value when
they're a scalar in the original GGUF
2025-04-10 16:28:34 -07:00
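
The normalization described in the commit message above can be sketched roughly as follows. This is an illustrative sketch only, not the code in memory.go: the helper name `headCounts`, its signature, and the use of `uint64` values are assumptions made for this example. It shows the general idea of turning a scalar head count into a per-layer slice so that later code can always index by layer.

```go
package main

import "fmt"

// headCounts is a hypothetical helper (not ollama's actual API).
// It returns one head count per layer:
//   - a scalar value is repeated numLayers times,
//   - an array value is copied as-is after a length check.
func headCounts(v any, numLayers int) ([]uint64, error) {
	switch t := v.(type) {
	case uint64:
		// Scalar in the model file: fill every layer with the same value.
		out := make([]uint64, numLayers)
		for i := range out {
			out[i] = t
		}
		return out, nil
	case []uint64:
		// Already per-layer: verify the length and return a copy.
		if len(t) != numLayers {
			return nil, fmt.Errorf("head count array has %d entries, want %d", len(t), numLayers)
		}
		return append([]uint64(nil), t...), nil
	default:
		return nil, fmt.Errorf("unsupported head count type %T", v)
	}
}

func main() {
	scalar, _ := headCounts(uint64(32), 4)
	perLayer, _ := headCounts([]uint64{32, 32, 8, 8}, 4)
	fmt.Println(scalar)   // [32 32 32 32]
	fmt.Println(perLayer) // [32 32 8 8]
}
```

With both cases reduced to a slice, a memory estimate can sum per-layer sizes in a single loop instead of branching on whether the metadata was a scalar or an array.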
llm_darwin.go Optimize container images for startup (#6547) 2024-09-12 12:10:30 -07:00
llm_linux.go Optimize container images for startup (#6547) 2024-09-12 12:10:30 -07:00
llm_windows.go runner: Set windows above normal priority (#6905) 2024-09-21 16:54:49 -07:00
memory.go ggml: more accurate estimates for head count array case 2025-04-10 16:28:34 -07:00
memory_test.go ggml: Support heterogeneous KV cache layer sizes in memory estimation 2025-03-26 13:16:03 -07:00
server.go llm: set done reason at server level (#9830) 2025-04-03 10:19:24 -07:00
server_test.go llm: do not error on "null" format (#8139) 2024-12-17 09:49:37 -08:00
status.go Improve crash reporting (#7728) 2024-11-19 16:26:57 -08:00