ollama/runner
Jesse Gross 8e8f2c6d67 ollamarunner: Fix memory leak when processing images
The context (and therefore associated input tensors) was not being
properly closed when images were being processed. We were trying to
close them but in reality we were closing over an empty list, preventing
anything from actually being freed.

Fixes #10434
2025-05-01 15:15:24 -07:00
common Runner for Ollama engine 2025-02-13 17:09:26 -08:00
llamarunner llm: set done reason at server level (#9830) 2025-04-03 10:19:24 -07:00
ollamarunner ollamarunner: Fix memory leak when processing images 2025-05-01 15:15:24 -07:00
README.md Runner for Ollama engine 2025-02-13 17:09:26 -08:00
runner.go Runner for Ollama engine 2025-02-13 17:09:26 -08:00

runner

Note: this is a work in progress

A minimal runner for loading a model and running inference via an HTTP web server.

./runner -model <model binary>

Completion

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion

Embeddings

curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embedding