Mirror of https://github.com/ollama/ollama.git (synced 2025-05-11 18:36:41 +02:00)
This enables the runner to report progress back to the Ollama server, both to show status to the user and to prevent the server from killing the runner when it thinks things have stalled. Most of the infrastructure was already there; this change makes it available to the backends.
Directory contents: common, llamarunner, ollamarunner, README.md, runner.go
runner
Note: this is a work in progress
A minimal runner for loading a model and running inference via an HTTP web server.
./runner -model <model binary>
Completion
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion
Embeddings
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embedding
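The two curl calls above can also be made from Go. The following is a minimal sketch, assuming the runner is listening on localhost:8080 as in the examples. The names promptRequest, newPromptRequest and send are hypothetical helpers introduced here, not part of the runner. The response format is not described in this README, so the sketch prints the raw response body instead of decoding it.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"os"
	"time"
)

// promptRequest mirrors the JSON body used in the curl examples: {"prompt": "..."}.
// Hypothetical type for illustration; the runner may accept more fields.
type promptRequest struct {
	Prompt string `json:"prompt"`
}

// newPromptRequest builds a POST request with a JSON body for the given endpoint.
func newPromptRequest(baseURL, endpoint, prompt string) (*http.Request, error) {
	body, err := json.Marshal(promptRequest{Prompt: prompt})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+endpoint, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

// send posts the prompt and returns the raw response body as a string.
func send(client *http.Client, baseURL, endpoint, prompt string) (string, error) {
	req, err := newPromptRequest(baseURL, endpoint, prompt)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	data, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("%s returned %s: %s", endpoint, resp.Status, data)
	}
	return string(data), nil
}

func main() {
	// Address taken from the curl examples; the timeout value is an arbitrary choice.
	const baseURL = "http://localhost:8080"
	client := &http.Client{Timeout: 60 * time.Second}

	calls := []struct{ endpoint, prompt string }{
		{"/completion", "hi"},
		{"/embedding", "turn me into an embedding"},
	}
	for _, c := range calls {
		out, err := send(client, baseURL, c.endpoint, c.prompt)
		if err != nil {
			fmt.Fprintln(os.Stderr, "request failed:", err)
			continue
		}
		fmt.Println(out)
	}
}
```

Start the runner first, then run this with go run. If the runner is not up, each call reports a connection error on stderr and the program moves on to the next endpoint.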