
Ollama

Mac

The best Ollama models for local coding in early 2026 are:

  • Qwen2.5-Coder (7B–32B)
  • DeepSeek-Coder-V2 (16B/236B)
  • Codestral (22B)

Optimized for MLX on Apple Silicon (preview):

ollama run qwen3.5:35b-a3b-coding-nvfp4

Cloud models let you run high-parameter LLMs in the cloud rather than locally. This gives access to large models (e.g., 120B+ parameters) without needing a powerful local GPU.

ollama run kimi-k2.5:cloud
ollama run kimi-k2-thinking:cloud

Use Ollama in Cursor

  • Go to Cursor Settings, add a custom model, and enter its exact name (e.g., qwen3.5:35b-a3b-coding-nvfp4)
  • Expand Override OpenAI Base URL
  • Change https://api.openai.com/v1 to https://ollama.dph.am/v1
    • this needs a proxy that forwards to your localhost:11434
  • Disable the other GPT models
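
The /v1 base URL works because Ollama exposes an OpenAI-compatible API under that path. A minimal sketch of the kind of request a client like Cursor would send, assuming the proxy above is in place. The helper names and the "ollama" placeholder key are illustrative, not part of any tool:

```python
import json
import urllib.request

# Assumption: the proxy from the steps above serves Ollama's
# OpenAI-compatible API under this base URL.
BASE_URL = "https://ollama.dph.am/v1"


def build_chat_request(model: str, user_text: str, api_key: str = "ollama") -> urllib.request.Request:
    """Build (but do not send) a POST to {BASE_URL}/chat/completions."""
    payload = {
        "model": model,  # must match the model name exactly
        "messages": [{"role": "user", "content": user_text}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Placeholder key; pass the real one if the proxy checks it.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request, timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Calling send(build_chat_request("qwen3.5:35b-a3b-coding-nvfp4", "hello")) performs the actual round trip; building the request alone needs no network.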

API key

  • generate an API key (either command works)
    • openssl rand -base64 32
    • python3 -c "import secrets; print(secrets.token_urlsafe(32))"
  • set the variable: launchctl setenv OLLAMA_API_KEY "your_api_key"

OLLAMA_API_KEY=ZnrKFRFpTmaLFoLiGq1u9i4fP-gLNWKdKpurV3vJxMY
launchctl setenv OLLAMA_API_KEY "ZnrKFRFpTmaLFoLiGq1u9i4fP-gLNWKdKpurV3vJxMY"

Test:

curl https://ollama.dph.am/api/generate \
  -d '{ "model": "qwen3.5:35b-a3b-coding-nvfp4", "prompt": "Why is the sky blue?", "stream": false }'

curl https://ollama.dph.am/api/generate \
  -H "Authorization: Bearer $OLLAMA_API_KEY" \
  -d '{ "model": "qwen3.5:35b-a3b-coding-nvfp4", "prompt": "Why is the sky blue?", "stream": false }'
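
The same authenticated test can be scripted. A rough Python equivalent of the curl call with the Bearer header, assuming OLLAMA_API_KEY is set in the environment of the process running it. The function names are made up for this sketch:

```python
import json
import os
import urllib.request

GENERATE_URL = "https://ollama.dph.am/api/generate"


def build_generate_request(prompt: str, model: str = "qwen3.5:35b-a3b-coding-nvfp4") -> urllib.request.Request:
    """Build the POST request; add the Bearer header only if a key is set."""
    headers = {"Content-Type": "application/json"}
    api_key = os.environ.get("OLLAMA_API_KEY")
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the request and return the generated text."""
    with urllib.request.urlopen(build_generate_request(prompt), timeout=120) as resp:
        return json.load(resp)["response"]
```

generate("Why is the sky blue?") raises urllib.error.HTTPError on a non-2xx status, for example if the proxy rejects a missing or wrong key.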

Open WebUI

source ~/.venv-scriptsync/bin/activate

# serve
open-webui serve --port 3010

# install / upgrade
pip install open-webui -U

# change password
cd ~/.venv-scriptsync/lib/python3.11/site-packages/open_webui
sqlite3 data/webui.db
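# generate a bcrypt hash of the new password (e.g., with CyberChef), then run the UPDATE at the sqlite> prompt
# https://gchq.github.io/CyberChef/#recipe=Bcrypt(10)&input=Y2Fubm9u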
UPDATE auth SET password='bcrypt hash from cyberchef' WHERE email='dominick.pham@gmail.com';
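
The UPDATE can also be run from Python with bound parameters, which avoids pasting the hash into a quoted SQL string. A small sketch: the function name is hypothetical, the hash must be generated beforehand, and the demo table is a throwaway stand-in with only the two columns named above, not the real webui.db schema:

```python
import sqlite3


def set_password_hash(conn: sqlite3.Connection, email: str, bcrypt_hash: str) -> int:
    """Store a precomputed bcrypt hash for one user; return rows changed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE auth SET password = ? WHERE email = ?",
            (bcrypt_hash, email),
        )
    return cur.rowcount


# Demo against an in-memory stand-in table.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE auth (email TEXT, password TEXT)")
demo.execute("INSERT INTO auth VALUES ('user@example.com', 'old-hash')")
changed = set_password_hash(demo, "user@example.com", "$2b$10$placeholder-hash")
```

For the real database, open it with sqlite3.connect("data/webui.db") from the open_webui directory. A return value of 0 means no row matched the email.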

General

https://ollama.com/library

My models

  • llama3.1:70b
  • llama3.2:3b
  • qwen2.5:14b
  • qwen2.5-coder:1.5b

# to serve on all network interfaces (Windows cmd syntax)
set OLLAMA_HOST=0.0.0.0
ollama serve

# to run in chat mode
ollama run qwen2.5-coder:1.5b

# default models directory
C:\Users\domin\.ollama\models

# to change, set system environment variable OLLAMA_MODELS to new path
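
The lookup rule described here (use OLLAMA_MODELS if set, otherwise .ollama/models under the home directory) can be mirrored in a script that needs to find the model files. This is a sketch of the rule as stated in these notes, not Ollama's own code, and models_dir is a made-up helper:

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def models_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return OLLAMA_MODELS if set and non-empty, else ~/.ollama/models."""
    env = os.environ if env is None else env
    override = env.get("OLLAMA_MODELS")
    if override:
        return Path(override)
    return Path.home() / ".ollama" / "models"
```

Passing a dict instead of the real environment makes it easy to check both branches without touching system settings.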

Autocomplete

curl http://ollama.dph.am/api/generate -d '{
  "model": "qwen2.5-coder:1.5b",
  "system": "You are an AI specialized in providing autocompletion suggestions. When given a partial sentence or phrase, suggest concise and relevant completions in a list format.",
  "prompt": "Provide autocompletion suggestions for the following partial sentence: \"the quick brown fox\"",
  "stream": false,
  "format": "json",
  "options": {
    "temperature": 0.7,
    "num_predict": 100,
    "top_p": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0
  }
}'
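
Hand-escaping the quotes around the partial sentence gets fragile once the text itself contains quotes. A sketch that builds the same request body in Python and lets json.dumps do the escaping. autocomplete_payload is a hypothetical helper, and num_predict is used here as Ollama's option for capping generated tokens:

```python
import json

SYSTEM_PROMPT = (
    "You are an AI specialized in providing autocompletion suggestions. "
    "When given a partial sentence or phrase, suggest concise and relevant "
    "completions in a list format."
)


def autocomplete_payload(partial: str, model: str = "qwen2.5-coder:1.5b") -> str:
    """Return the JSON request body for an autocompletion call."""
    body = {
        "model": model,
        "system": SYSTEM_PROMPT,
        "prompt": (
            "Provide autocompletion suggestions for the following "
            f'partial sentence: "{partial}"'
        ),
        "stream": False,
        "format": "json",
        "options": {"temperature": 0.7, "num_predict": 100, "top_p": 1.0},
    }
    return json.dumps(body)  # escapes embedded quotes and newlines
```

The returned string can be passed to curl with -d @- on stdin, or sent directly with urllib.request.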

Other calls

https://github.com/ollama/ollama/blob/main/docs/api.md#generate-a-completion

POST /api/chat - chat completion

curl http://ollama.dph.am/api/chat -d '{
  "model": "qwen2.5-coder:1.5b",
  "stream": false,
  "messages": [
    {
      "role": "system",
      "content": "You respond in Vietnamese"
    },
    {
      "role": "user",
      "content": "why is the sky blue?"
    },
    {
      "role": "assistant",
      "content": "due to rayleigh scattering."
    },
    {
      "role": "user",
      "content": "how is that different than mie scattering?"
    }
  ]
}'
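
/api/chat is stateless, so the client has to resend the whole conversation on every call. A minimal sketch of keeping that history up to date, assuming the non-streaming response carries the reply in a "message" object. append_reply is a name invented here:

```python
import json


def append_reply(messages: list[dict], response_body: str) -> list[dict]:
    """Return a new history with the assistant reply from /api/chat appended."""
    reply = json.loads(response_body)["message"]
    return messages + [{"role": reply["role"], "content": reply["content"]}]


history = [{"role": "user", "content": "why is the sky blue?"}]
# Canned response body for illustration, not real model output.
canned = '{"message": {"role": "assistant", "content": "due to rayleigh scattering."}, "done": true}'
history = append_reply(history, canned)
history.append({"role": "user", "content": "how is that different than mie scattering?"})
```

The resulting history list is what goes into "messages" on the next request.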

POST /api/generate - generate a completion

curl http://ollama.dph.am/api/generate -d '{
  "model": "qwen2.5-coder:1.5b",
  "system": "You are an AI specialized in providing autocompletion suggestions. When given a partial sentence or phrase, suggest concise and relevant completions in a list format.",
  "prompt": "Provide autocompletion suggestions for the following partial sentence: \"the quick brown fox\""
}'
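
Without "stream": false, the call above streams its answer as one JSON object per line, each carrying a fragment in "response" and a "done" flag. A sketch of reassembling the text from such a stream. collect_stream is a hypothetical helper and the sample lines are canned:

```python
import json
from typing import Iterable


def collect_stream(lines: Iterable[bytes]) -> str:
    """Join the "response" fragments of a streamed /api/generate reply."""
    parts = []
    for line in lines:
        if not line.strip():
            continue  # skip blank keep-alive lines
        chunk = json.loads(line)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


sample = [
    b'{"response": "The", "done": false}\n',
    b'{"response": " sky", "done": false}\n',
    b'{"response": "", "done": true}\n',
]
text = collect_stream(sample)
```

The object returned by urllib.request.urlopen iterates line by line, so it can be passed to collect_stream directly.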

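With "stream": false, the full text comes back in a single response object instead of a series of chunks. When format is set to json, the output will always be a well-formed JSON object. It's important to also instruct the model to respond in JSON in the prompt; otherwise it may generate large amounts of whitespace.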

curl http://ollama.dph.am/api/generate -d '{
  "model": "qwen2.5-coder:1.5b",
  "system": "talk like a 5 year old",
  "prompt": "Why is the sky blue? Respond using JSON",
  "format": "json",
  "stream": false
}'
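
With "format": "json", the model's JSON arrives as a string inside the "response" field, so it needs a second decode. A small sketch of that step. parse_json_answer is an illustrative name, and the ValueError wrapping is a design choice for this sketch:

```python
import json


def parse_json_answer(response_body: str) -> dict:
    """Decode the API envelope, then the model's JSON inside "response"."""
    envelope = json.loads(response_body)
    try:
        return json.loads(envelope["response"])
    except json.JSONDecodeError as exc:
        # Can happen if generation was cut off before the JSON was closed.
        raise ValueError("model output was not valid JSON") from exc
```

A truncated answer (for example when a low num_predict cuts generation short) surfaces as ValueError rather than a partial dict.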

Full options

curl http://localhost:11434/api/generate -d '{
  "model": "qwen2.5-coder:1.5b",
  "prompt": "Why is the sky blue?",
  "stream": false,
  "options": {
    "num_keep": 5,
    "seed": 42,
    "num_predict": 100,
    "top_k": 20,
    "top_p": 0.9,
    "min_p": 0.0,
    "typical_p": 0.7,
    "repeat_last_n": 33,
    "temperature": 0.8,
    "repeat_penalty": 1.2,
    "presence_penalty": 1.5,
    "frequency_penalty": 1.0,
    "mirostat": 1,
    "mirostat_tau": 0.8,
    "mirostat_eta": 0.6,
    "penalize_newline": true,
    "stop": ["\n", "user:"],
    "numa": false,
    "num_ctx": 1024,
    "num_batch": 2,
    "num_gpu": 1,
    "main_gpu": 0,
    "low_vram": false,
    "vocab_only": false,
    "use_mmap": true,
    "use_mlock": false,
    "num_thread": 8
  }
}'
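
Option names that come from other APIs (for example OpenAI's max_tokens) are easy to slip into "options" by mistake. A sketch of a pre-flight check against the option names used in the example above. The set is only the keys listed there, not an authoritative list of everything Ollama accepts, and unknown_options is a made-up helper:

```python
# Option names taken from the "Full options" example above.
KNOWN_OPTIONS = {
    "num_keep", "seed", "num_predict", "top_k", "top_p", "min_p",
    "typical_p", "repeat_last_n", "temperature", "repeat_penalty",
    "presence_penalty", "frequency_penalty", "mirostat", "mirostat_tau",
    "mirostat_eta", "penalize_newline", "stop", "numa", "num_ctx",
    "num_batch", "num_gpu", "main_gpu", "low_vram", "vocab_only",
    "use_mmap", "use_mlock", "num_thread",
}


def unknown_options(options: dict) -> list[str]:
    """Return, sorted, the option names not present in KNOWN_OPTIONS."""
    return sorted(name for name in options if name not in KNOWN_OPTIONS)


suspect = unknown_options({"temperature": 0.7, "max_tokens": 100, "top_p": 1.0})
```

An empty list means every key matched; anything returned is worth checking against the API docs before sending.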