Tighten up memory prediction logging

Prior to this change, the memory prediction was logged multiple times as
the scheduler iterated to find a suitable configuration, which could be
confusing since only the last log before the server starts is actually valid.
The prediction is now logged once, just before starting the server on the
final configuration. The log also reports which library is in use instead of
always saying "offloading to gpu" when running on the CPU.
This commit is contained in:
Daniel Hiltgen 2024-06-17 18:39:48 -07:00
parent c9c8c98bf6
commit 7784ca33ce
2 changed files with 66 additions and 44 deletions

@@ -116,6 +116,8 @@ func NewLlamaServer(gpus gpu.GpuInfoList, model string, ggml *GGML, adapters, pr
}
}
estimate.log()
// Loop through potential servers
finalErr := errors.New("no suitable llama servers found")