From b901a712c6b0afe88aef7e5318f193d5b889cf34 Mon Sep 17 00:00:00 2001 From: Azis Alvriyanto Date: Sat, 8 Feb 2025 00:55:07 +0700 Subject: [PATCH] docs: improve syntax highlighting in code blocks (#8854) --- README.md | 44 ++++++++++++++-------------- api/examples/README.md | 3 +- app/README.md | 2 +- docs/api.md | 33 ++++++++++----------- docs/development.md | 20 ++++++------- docs/docker.md | 50 +++++++++++++++++--------------- docs/faq.md | 18 +++++++++--- docs/import.md | 4 +-- docs/linux.md | 2 +- docs/modelfile.md | 64 ++++++++++++++++++++++------------------- docs/openai.md | 13 +++++---- docs/troubleshooting.md | 11 ++++--- docs/windows.md | 1 + llama/README.md | 10 +++---- llama/runner/README.md | 6 ++-- macapp/README.md | 4 +-- 16 files changed, 158 insertions(+), 127 deletions(-) diff --git a/README.md b/README.md index 187d63626..959d4c613 100644 --- a/README.md +++ b/README.md @@ -18,7 +18,7 @@ Get up and running with large language models. ### Linux -``` +```shell curl -fsSL https://ollama.com/install.sh | sh ``` @@ -42,7 +42,7 @@ The official [Ollama Docker image](https://hub.docker.com/r/ollama/ollama) `olla To run and chat with [Llama 3.2](https://ollama.com/library/llama3.2): -``` +```shell ollama run llama3.2 ``` @@ -92,13 +92,13 @@ Ollama supports importing GGUF models in the Modelfile: 2. Create the model in Ollama - ``` + ```shell ollama create example -f Modelfile ``` 3. Run the model - ``` + ```shell ollama run example ``` @@ -110,7 +110,7 @@ See the [guide](docs/import.md) on importing models for more information. Models from the Ollama library can be customized with a prompt. For example, to customize the `llama3.2` model: -``` +```shell ollama pull llama3.2 ``` @@ -145,13 +145,13 @@ For more information on working with a Modelfile, see the [Modelfile](docs/model `ollama create` is used to create a model from a Modelfile. 
-``` +```shell ollama create mymodel -f ./Modelfile ``` ### Pull a model -``` +```shell ollama pull llama3.2 ``` @@ -159,13 +159,13 @@ ollama pull llama3.2 ### Remove a model -``` +```shell ollama rm llama3.2 ``` ### Copy a model -``` +```shell ollama cp llama3.2 my-model ``` @@ -184,37 +184,39 @@ I'm a basic program that prints the famous "Hello, world!" message to the consol ``` ollama run llava "What's in this image? /Users/jmorgan/Desktop/smile.png" -The image features a yellow smiley face, which is likely the central focus of the picture. ``` +> **Output**: The image features a yellow smiley face, which is likely the central focus of the picture. + ### Pass the prompt as an argument +```shell +ollama run llama3.2 "Summarize this file: $(cat README.md)" ``` -$ ollama run llama3.2 "Summarize this file: $(cat README.md)" - Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications. -``` + +> **Output**: Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications. 
### Show model information -``` +```shell ollama show llama3.2 ``` ### List models on your computer -``` +```shell ollama list ``` ### List which models are currently loaded -``` +```shell ollama ps ``` ### Stop a model which is currently running -``` +```shell ollama stop llama3.2 ``` @@ -230,13 +232,13 @@ See the [developer guide](https://github.com/ollama/ollama/blob/main/docs/develo Next, start the server: -``` +```shell ./ollama serve ``` Finally, in a separate shell, run a model: -``` +```shell ./ollama run llama3.2 ``` @@ -246,7 +248,7 @@ Ollama has a REST API for running and managing models. ### Generate a response -``` +```shell curl http://localhost:11434/api/generate -d '{ "model": "llama3.2", "prompt":"Why is the sky blue?" @@ -255,7 +257,7 @@ curl http://localhost:11434/api/generate -d '{ ### Chat with a model -``` +```shell curl http://localhost:11434/api/chat -d '{ "model": "llama3.2", "messages": [ diff --git a/api/examples/README.md b/api/examples/README.md index b5a8917fb..e83b53609 100644 --- a/api/examples/README.md +++ b/api/examples/README.md @@ -2,9 +2,10 @@ Run the examples in this directory with: -``` +```shell go run example_name/main.go ``` + ## Chat - Chat with a model - [chat/main.go](chat/main.go) diff --git a/app/README.md b/app/README.md index 883d7ab7f..433ee44e8 100644 --- a/app/README.md +++ b/app/README.md @@ -17,6 +17,6 @@ If you want to build the installer, youll need to install In the top directory of this repo, run the following powershell script to build the ollama CLI, ollama app, and ollama installer. -``` +```powershell powershell -ExecutionPolicy Bypass -File .\scripts\build_windows.ps1 ``` diff --git a/docs/api.md b/docs/api.md index 5d1b7d648..7de810496 100644 --- a/docs/api.md +++ b/docs/api.md @@ -31,7 +31,7 @@ Certain endpoints stream responses as JSON objects. 
Streaming can be disabled by ## Generate a completion -```shell +``` POST /api/generate ``` @@ -485,7 +485,7 @@ A single JSON object is returned: ## Generate a chat completion -```shell +``` POST /api/chat ``` @@ -878,6 +878,7 @@ curl http://localhost:11434/api/chat -d '{ ``` ##### Response + ```json { "model": "llama3.2", @@ -924,7 +925,7 @@ A single JSON object is returned: ## Create a Model -```shell +``` POST /api/create ``` @@ -1020,7 +1021,7 @@ curl http://localhost:11434/api/create -d '{ A stream of JSON objects is returned: -``` +```json {"status":"quantizing F16 model to Q4_K_M"} {"status":"creating new layer sha256:667b0c1932bc6ffc593ed1d03f895bf2dc8dc6df21db3042284a6f4416b06a29"} {"status":"using existing layer sha256:11ce4ee3e170f6adebac9a991c22e22ab3f8530e154ee669954c4bc73061c258"} @@ -1051,7 +1052,7 @@ curl http://localhost:11434/api/create -d '{ A stream of JSON objects is returned: -``` +```json {"status":"parsing GGUF"} {"status":"using existing layer sha256:432f310a77f4650a88d0fd59ecdd7cebed8d684bafea53cbff0473542964f0c3"} {"status":"writing manifest"} @@ -1118,7 +1119,7 @@ Return 200 OK if the blob exists, 404 Not Found if it does not. ## Push a Blob -```shell +``` POST /api/blobs/:digest ``` @@ -1142,7 +1143,7 @@ Return 201 Created if the blob was successfully created, 400 Bad Request if the ## List Local Models -```shell +``` GET /api/tags ``` @@ -1195,7 +1196,7 @@ A single JSON object will be returned. 
## Show Model Information -```shell +``` POST /api/show ``` @@ -1261,7 +1262,7 @@ curl http://localhost:11434/api/show -d '{ ## Copy a Model -```shell +``` POST /api/copy ``` @@ -1284,7 +1285,7 @@ Returns a 200 OK if successful, or a 404 Not Found if the source model doesn't e ## Delete a Model -```shell +``` DELETE /api/delete ``` @@ -1310,7 +1311,7 @@ Returns a 200 OK if successful, 404 Not Found if the model to be deleted doesn't ## Pull a Model -```shell +``` POST /api/pull ``` @@ -1382,7 +1383,7 @@ if `stream` is set to false, then the response is a single JSON object: ## Push a Model -```shell +``` POST /api/push ``` @@ -1447,7 +1448,7 @@ If `stream` is set to `false`, then the response is a single JSON object: ## Generate Embeddings -```shell +``` POST /api/embed ``` @@ -1515,7 +1516,7 @@ curl http://localhost:11434/api/embed -d '{ ``` ## List Running Models -```shell +``` GET /api/ps ``` @@ -1562,7 +1563,7 @@ A single JSON object will be returned. > Note: this endpoint has been superseded by `/api/embed` -```shell +``` POST /api/embeddings ``` @@ -1602,7 +1603,7 @@ curl http://localhost:11434/api/embeddings -d '{ ## Version -```shell +``` GET /api/version ``` diff --git a/docs/development.md b/docs/development.md index 618e98e15..5a6463fcf 100644 --- a/docs/development.md +++ b/docs/development.md @@ -7,7 +7,7 @@ Install prerequisites: Then build and run Ollama from the root directory of the repository: -``` +```shell go run . serve ``` @@ -23,14 +23,14 @@ Install prerequisites: Then, configure and build the project: -``` +```shell cmake -B build cmake --build build ``` Lastly, run Ollama: -``` +```shell go run . serve ``` @@ -57,14 +57,14 @@ Install prerequisites: Then, configure and build the project: -``` +```shell cmake -B build cmake --build build --config Release ``` Lastly, run Ollama: -``` +```shell go run . 
serve ``` @@ -88,26 +88,26 @@ Install prerequisites: Then, configure and build the project: -``` +```shell cmake -B build cmake --build build ``` Lastly, run Ollama: -``` +```shell go run . serve ``` ## Docker -``` +```shell docker build . ``` ### ROCm -``` +```shell docker build --build-arg FLAVOR=rocm . ``` @@ -115,7 +115,7 @@ docker build --build-arg FLAVOR=rocm . To run tests, use `go test`: -``` +```shell go test ./... ``` diff --git a/docs/docker.md b/docs/docker.md index 9dd387e3a..dce090a27 100644 --- a/docs/docker.md +++ b/docs/docker.md @@ -2,7 +2,7 @@ ### CPU only -```bash +```shell docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama ``` @@ -11,42 +11,46 @@ Install the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud- #### Install with Apt 1. Configure the repository -```bash -curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \ - | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg -curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \ - | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \ - | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list -sudo apt-get update -``` + + ```shell + curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey \ + | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg + curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list \ + | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' \ + | sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list + sudo apt-get update + ``` + 2. Install the NVIDIA Container Toolkit packages -```bash -sudo apt-get install -y nvidia-container-toolkit -``` + + ```shell + sudo apt-get install -y nvidia-container-toolkit + ``` #### Install with Yum or Dnf 1. 
Configure the repository -```bash -curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo \ - | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo -``` + ```shell + curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo \ + | sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo + ``` 2. Install the NVIDIA Container Toolkit packages -```bash -sudo yum install -y nvidia-container-toolkit -``` + ```shell + sudo yum install -y nvidia-container-toolkit + ``` #### Configure Docker to use Nvidia driver -``` + +```shell sudo nvidia-ctk runtime configure --runtime=docker sudo systemctl restart docker ``` #### Start the container -```bash +```shell docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama ``` @@ -57,7 +61,7 @@ docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ol To run Ollama using Docker with AMD GPUs, use the `rocm` tag and the following command: -``` +```shell docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm ``` @@ -65,7 +69,7 @@ docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 114 Now you can run a model: -``` +```shell docker exec -it ollama ollama run llama3.2 ``` diff --git a/docs/faq.md b/docs/faq.md index b58798e2a..04e8433de 100644 --- a/docs/faq.md +++ b/docs/faq.md @@ -24,7 +24,7 @@ By default, Ollama uses a context window size of 2048 tokens. To change this when using `ollama run`, use `/set parameter`: -``` +```shell /set parameter num_ctx 4096 ``` @@ -46,10 +46,15 @@ Use the `ollama ps` command to see what models are currently loaded into memory. 
```shell ollama ps -NAME ID SIZE PROCESSOR UNTIL -llama3:70b bcfb190ca3a7 42 GB 100% GPU 4 minutes from now ``` +> **Output**: +> +> ``` +> NAME ID SIZE PROCESSOR UNTIL +> llama3:70b bcfb190ca3a7 42 GB 100% GPU 4 minutes from now +> ``` + The `Processor` column will show which memory the model was loaded in to: * `100% GPU` means the model was loaded entirely into the GPU * `100% CPU` means the model was loaded entirely in system memory @@ -88,7 +93,7 @@ If Ollama is run as a systemd service, environment variables should be set using 4. Reload `systemd` and restart Ollama: - ```bash + ```shell systemctl daemon-reload systemctl restart ollama ``` @@ -221,16 +226,19 @@ properties. If you are using the API you can preload a model by sending the Ollama server an empty request. This works with both the `/api/generate` and `/api/chat` API endpoints. To preload the mistral model using the generate endpoint, use: + ```shell curl http://localhost:11434/api/generate -d '{"model": "mistral"}' ``` To use the chat completions endpoint, use: + ```shell curl http://localhost:11434/api/chat -d '{"model": "mistral"}' ``` To preload a model using the CLI, use the command: + ```shell ollama run llama3.2 "" ``` @@ -250,11 +258,13 @@ If you're using the API, use the `keep_alive` parameter with the `/api/generate` * '0' which will unload the model immediately after generating a response For example, to preload a model and leave it in memory use: + ```shell curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "keep_alive": -1}' ``` To unload the model and free up memory use: + ```shell curl http://localhost:11434/api/generate -d '{"model": "llama3.2", "keep_alive": 0}' ``` diff --git a/docs/import.md b/docs/import.md index 040fa299e..01fea5426 100644 --- a/docs/import.md +++ b/docs/import.md @@ -20,13 +20,13 @@ Make sure that you use the same base model in the `FROM` command as you used to Now run `ollama create` from the directory where the `Modelfile` was created: 
-```bash +```shell ollama create my-model ``` Lastly, test the model: -```bash +```shell ollama run my-model ``` diff --git a/docs/linux.md b/docs/linux.md index a5b6dd915..12581bdd2 100644 --- a/docs/linux.md +++ b/docs/linux.md @@ -119,7 +119,7 @@ sudo systemctl status ollama To customize the installation of Ollama, you can edit the systemd service file or the environment variables by running: -``` +```shell sudo systemctl edit ollama ``` diff --git a/docs/modelfile.md b/docs/modelfile.md index cc2115b3c..a71183f40 100644 --- a/docs/modelfile.md +++ b/docs/modelfile.md @@ -28,7 +28,7 @@ A model file is the blueprint to create and share models with Ollama. The format of the `Modelfile`: -```modelfile +``` # comment INSTRUCTION arguments ``` @@ -49,7 +49,7 @@ INSTRUCTION arguments An example of a `Modelfile` creating a mario blueprint: -```modelfile +``` FROM llama3.2 # sets the temperature to 1 [higher is more creative, lower is more coherent] PARAMETER temperature 1 @@ -69,24 +69,30 @@ To use this: To view the Modelfile of a given model, use the `ollama show --modelfile` command. 
- ```bash - > ollama show --modelfile llama3.2 - # Modelfile generated by "ollama show" - # To build a new Modelfile based on this one, replace the FROM line with: - # FROM llama3.2:latest - FROM /Users/pdevine/.ollama/models/blobs/sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29 - TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|> +```shell +ollama show --modelfile llama3.2 +``` - {{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|> +> **Output**: +> +> ``` +> # Modelfile generated by "ollama show" +> # To build a new Modelfile based on this one, replace the FROM line with: +> # FROM llama3.2:latest +> FROM /Users/pdevine/.ollama/models/blobs/sha256-00e1317cbf74d901080d7100f57580ba8dd8de57203072dc6f668324ba545f29 +> TEMPLATE """{{ if .System }}<|start_header_id|>system<|end_header_id|> +> +> {{ .System }}<|eot_id|>{{ end }}{{ if .Prompt }}<|start_header_id|>user<|end_header_id|> +> +> {{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|> +> +> {{ .Response }}<|eot_id|>""" +> PARAMETER stop "<|start_header_id|>" +> PARAMETER stop "<|end_header_id|>" +> PARAMETER stop "<|eot_id|>" +> PARAMETER stop "<|reserved_special_token" +> ``` - {{ .Prompt }}<|eot_id|>{{ end }}<|start_header_id|>assistant<|end_header_id|> - - {{ .Response }}<|eot_id|>""" - PARAMETER stop "<|start_header_id|>" - PARAMETER stop "<|end_header_id|>" - PARAMETER stop "<|eot_id|>" - PARAMETER stop "<|reserved_special_token" - ``` ## Instructions @@ -94,13 +100,13 @@ To view the Modelfile of a given model, use the `ollama show --modelfile` comman The `FROM` instruction defines the base model to use when creating a model. 
-```modelfile +``` FROM : ``` #### Build from existing model -```modelfile +``` FROM llama3.2 ``` @@ -111,7 +117,7 @@ Additional models can be found at: #### Build from a Safetensors model -```modelfile +``` FROM ``` @@ -125,7 +131,7 @@ Currently supported model architectures: #### Build from a GGUF file -```modelfile +``` FROM ./ollama-model.gguf ``` @@ -136,7 +142,7 @@ The GGUF file location should be specified as an absolute path or relative to th The `PARAMETER` instruction defines a parameter that can be set when the model is run. -```modelfile +``` PARAMETER ``` @@ -183,7 +189,7 @@ TEMPLATE """{{ if .System }}<|im_start|>system The `SYSTEM` instruction specifies the system message to be used in the template, if applicable. -```modelfile +``` SYSTEM """""" ``` @@ -193,7 +199,7 @@ The `ADAPTER` instruction specifies a fine tuned LoRA adapter that should apply #### Safetensor adapter -```modelfile +``` ADAPTER ``` @@ -204,7 +210,7 @@ Currently supported Safetensor adapters: #### GGUF adapter -```modelfile +``` ADAPTER ./ollama-lora.gguf ``` @@ -212,7 +218,7 @@ ADAPTER ./ollama-lora.gguf The `LICENSE` instruction allows you to specify the legal license under which the model used with this Modelfile is shared or distributed. -```modelfile +``` LICENSE """ """ @@ -222,7 +228,7 @@ LICENSE """ The `MESSAGE` instruction allows you to specify a message history for the model to use when responding. Use multiple iterations of the MESSAGE command to build up a conversation which will guide the model to answer in a similar way. -```modelfile +``` MESSAGE ``` @@ -237,7 +243,7 @@ MESSAGE #### Example conversation -```modelfile +``` MESSAGE user Is Toronto in Canada? MESSAGE assistant yes MESSAGE user Is Sacramento in Canada? 
diff --git a/docs/openai.md b/docs/openai.md index b0f9b353c..d0bac4cd3 100644 --- a/docs/openai.md +++ b/docs/openai.md @@ -1,6 +1,7 @@ # OpenAI compatibility -> **Note:** OpenAI compatibility is experimental and is subject to major adjustments including breaking changes. For fully-featured access to the Ollama API, see the Ollama [Python library](https://github.com/ollama/ollama-python), [JavaScript library](https://github.com/ollama/ollama-js) and [REST API](https://github.com/ollama/ollama/blob/main/docs/api.md). +> [!NOTE] +> OpenAI compatibility is experimental and is subject to major adjustments including breaking changes. For fully-featured access to the Ollama API, see the Ollama [Python library](https://github.com/ollama/ollama-python), [JavaScript library](https://github.com/ollama/ollama-js) and [REST API](https://github.com/ollama/ollama/blob/main/docs/api.md). Ollama provides experimental compatibility with parts of the [OpenAI API](https://platform.openai.com/docs/api-reference) to help connect existing applications to Ollama. @@ -59,8 +60,10 @@ embeddings = client.embeddings.create( input=["why is the sky blue?", "why is the grass green?"], ) ``` + #### Structured outputs -```py + +```python from pydantic import BaseModel from openai import OpenAI @@ -144,7 +147,7 @@ const embedding = await openai.embeddings.create({ ### `curl` -``` shell +```shell curl http://localhost:11434/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ @@ -319,7 +322,7 @@ ollama pull llama3.2 For tooling that relies on default OpenAI model names such as `gpt-3.5-turbo`, use `ollama cp` to copy an existing model name to a temporary name: -``` +```shell ollama cp llama3.2 gpt-3.5-turbo ``` @@ -343,7 +346,7 @@ curl http://localhost:11434/v1/chat/completions \ The OpenAI API does not have a way of setting the context size for a model. 
If you need to change the context size, create a `Modelfile` which looks like: -```modelfile +``` FROM PARAMETER num_ctx ``` diff --git a/docs/troubleshooting.md b/docs/troubleshooting.md index 28f4350aa..7ef1618e0 100644 --- a/docs/troubleshooting.md +++ b/docs/troubleshooting.md @@ -17,6 +17,7 @@ When you run Ollama in a **container**, the logs go to stdout/stderr in the cont ```shell docker logs ``` + (Use `docker ps` to find the container name) If manually running `ollama serve` in a terminal, the logs will be on that terminal. @@ -28,6 +29,7 @@ When you run Ollama on **Windows**, there are a few different locations. You can - `explorer %TEMP%` where temporary executable files are stored in one or more `ollama*` directories To enable additional debug logging to help troubleshoot problems, first **Quit the running app from the tray menu** then in a powershell terminal + ```powershell $env:OLLAMA_DEBUG="1" & "ollama app.exe" @@ -49,12 +51,13 @@ Dynamic LLM libraries [rocm_v6 cpu cpu_avx cpu_avx2 cuda_v11 rocm_v5] You can set OLLAMA_LLM_LIBRARY to any of the available LLM libraries to bypass autodetection, so for example, if you have a CUDA card, but want to force the CPU LLM library with AVX2 vector support, use: -``` +```shell OLLAMA_LLM_LIBRARY="cpu_avx2" ollama serve ``` You can see what features your CPU has with the following. -``` + +```shell cat /proc/cpuinfo| grep flags | head -1 ``` @@ -62,8 +65,8 @@ cat /proc/cpuinfo| grep flags | head -1 If you run into problems on Linux and want to install an older version, or you'd like to try out a pre-release before it's officially released, you can tell the install script which version to install. 
-```sh -curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION="0.1.29" sh +```shell +curl -fsSL https://ollama.com/install.sh | OLLAMA_VERSION=0.5.7 sh ``` ## Linux tmp noexec diff --git a/docs/windows.md b/docs/windows.md index 80bebed47..2a0d08d92 100644 --- a/docs/windows.md +++ b/docs/windows.md @@ -47,6 +47,7 @@ If Ollama is already running, Quit the tray application and relaunch it from the ## API Access Here's a quick example showing API access from `powershell` + ```powershell (Invoke-WebRequest -method POST -Body '{"model":"llama3.2", "prompt":"Why is the sky blue?", "stream": false}' -uri http://localhost:11434/api/generate ).Content | ConvertFrom-json ``` diff --git a/llama/README.md b/llama/README.md index f6a397258..bfe66a8b4 100644 --- a/llama/README.md +++ b/llama/README.md @@ -8,7 +8,7 @@ Ollama vendors [llama.cpp](https://github.com/ggerganov/llama.cpp/) and [ggml](h If you update the vendoring code, start by running the following command to establish the tracking llama.cpp repo in the `./vendor/` directory. -``` +```shell make -f Makefile.sync apply-patches ``` @@ -22,7 +22,7 @@ When updating to a newer base commit, the existing patches may not apply cleanly Start by applying the patches. If any of the patches have conflicts, the `git am` will stop at the first failure. -``` +```shell make -f Makefile.sync apply-patches ``` @@ -30,7 +30,7 @@ If there are conflicts, you will see an error message. Resolve the conflicts in Once all patches are applied, commit the changes to the tracking repository. -``` +```shell make -f Makefile.sync format-patches sync ``` @@ -38,13 +38,13 @@ make -f Makefile.sync format-patches sync When working on new fixes or features that impact vendored code, use the following model. First get a clean tracking repo with all current patches applied: -``` +```shell make -f Makefile.sync clean apply-patches ``` Iterate until you're ready to submit PRs. 
Once your code is ready, commit a change in the `./vendor/` directory, then generate the patches for ollama with

-```
+```shell
make -f Makefile.sync format-patches
```

diff --git a/llama/runner/README.md b/llama/runner/README.md
index 75f61682e..80ffda81f 100644
--- a/llama/runner/README.md
+++ b/llama/runner/README.md
@@ -4,18 +4,18 @@

A minimal runner for loading a model and running inference via an HTTP server.

-```
+```shell
./runner -model 
```

### Completion

-```
+```shell
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "hi"}' http://localhost:8080/completion
```

### Embeddings

-```
+```shell
curl -X POST -H "Content-Type: application/json" -d '{"prompt": "turn me into an embedding"}' http://localhost:8080/embedding
```

diff --git a/macapp/README.md b/macapp/README.md
index 8bde06e2e..bdaf05e79 100644
--- a/macapp/README.md
+++ b/macapp/README.md
@@ -6,14 +6,14 @@

This app builds upon Ollama to provide a desktop experience for running models.

First, build the `ollama` binary:

-```
+```shell
cd ..
go build .
```

Then run the desktop app with `npm start`:

-```
+```shell
cd macapp
npm install
npm start