Switch back to subprocessing for llama.cpp

This should resolve a number of memory-leak and stability defects by isolating llama.cpp in a separate process that we can shut down when idle and gracefully restart if it runs into problems. It is also a first step toward running multiple copies to support multiple models concurrently.
2025-05-11 18:36:41 +02:00 · 2024-03-14 10:24:13 -07:00
commit 58d95cc9bd
parent 3b6a9154dd
35 changed files with 1416 additions and 1910 deletions
--- a/llm/llm_windows.go
+++ b/llm/llm_windows.go
@@ -0,0 +1,6 @@
+package llm
+
+import "embed"
+
+//go:embed build/windows/*/*/bin/*
+var libEmbed embed.FS