Upload files to "/"

2026-04-05 05:20:44 +00:00
commit 6e852871a6
5 changed files with 2476 additions and 0 deletions
@@ -0,0 +1,238 @@
+# LLM Router Pipeline for Open WebUI
+
+An intelligent prompt classification and routing pipeline for [Open WebUI](https://github.com/open-webui/open-webui). It classifies each user prompt with a small LLM (qwen2.5:7b) and routes it to a specialized Ollama model, with integrated Brave web search, image generation via Stable Diffusion, and bilingual Finnish/English support.
+
+## Features
+
+- **AI-powered prompt classification** with keyword-based fallback
+- **Model routing** — coding, diagram, reasoning, vision, image generation, and general categories
+- **Brave web search** with full page content fetching (top 3 results scraped)
+- **Heuristic search overrides** — safety net that forces search for time-sensitive or factual questions
+- **Image generation** via AUTOMATIC1111/Forge (Stable Diffusion XL) with LLM-refined prompts
+- **VRAM management** — automatically juggles GPU memory between Ollama and Stable Diffusion
+- **Bilingual** — detects Finnish and forces responses in the correct language
+- **Thinking/reasoning display** — streams model thinking tokens in collapsible blocks
+- **Real-time search status** — shows which URLs are being fetched as search runs
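The heuristic search override can be pictured as a small pattern check that runs regardless of what the classifier decided. The sketch below is only an assumption about how such a safety net might look: the pattern list and the name `needs_search` are hypothetical and are not taken from the pipeline source.

```python
import re

# Hypothetical trigger words (English and Finnish); the real list may differ.
TIME_SENSITIVE = re.compile(
    r"\b(today|tonight|latest|current(ly)?|news|price|weather|score"
    r"|tänään|nyt|uusin|uusimmat|uutiset|hinta|sää)\b"
    r"|\b20[2-9]\d\b",  # explicit recent years such as 2026
    re.IGNORECASE,
)


def needs_search(prompt: str, classifier_wants_search: bool = False) -> bool:
    """Return True if web search should run for this prompt.

    The heuristic can only force search on; it never cancels a search
    that the classifier already requested.
    """
    return classifier_wants_search or bool(TIME_SENSITIVE.search(prompt))
```

Keeping the override one-directional means a classifier mistake can cost an unnecessary search but not a missing one, which matches the "safety net" role described in the feature list.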
+
+## Model Routing
+
+| Category | Model (120B) | Model (20B) | Trigger |
+|---|---|---|---|
+| coding | qwen2.5-coder:14b | qwen2.5-coder:14b | User asks to write/fix/debug code |
+| diagram | qwen2.5-coder:14b | qwen2.5-coder:14b | Mermaid, flowchart, UML requests |
+| reasoning (FI) | gpt-oss:120b | gpt-oss:20b | Analysis, comparison, strategy (Finnish) |
+| reasoning (EN) | gpt-oss:120b | gpt-oss:20b | Analysis, comparison, strategy (English) |
+| image generation | gpt-oss:120b + SDXL | gpt-oss:20b + SDXL | "generate an image", "luo kuva" |
+| vision | llama3.2-vision:11b | llama3.2-vision:11b | User uploads an image |
+| general | gpt-oss:120b | gpt-oss:20b | Everything else |
+
+Two pipeline variants are provided:
+- **`llm_router_v3.py`** — uses gpt-oss:120b (higher quality, more VRAM/RAM)
+- **`llm_router-20b.py`** — uses gpt-oss:20b (lighter, better for constrained hardware)
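As an illustration of the table above, the routing can be expressed as a plain category-to-model mapping, with the 20B variant differing only in the gpt-oss entries. The keyword list and the function names `classify_by_keywords` and `pick_model` are hypothetical; the actual pipeline files may organize this differently.

```python
# Category -> model, mirroring the routing table.
ROUTES_120B = {
    "coding": "qwen2.5-coder:14b",
    "diagram": "qwen2.5-coder:14b",
    "reasoning": "gpt-oss:120b",
    "image_generation": "gpt-oss:120b",  # refines the prompt before SDXL runs
    "vision": "llama3.2-vision:11b",
    "general": "gpt-oss:120b",
}

# The 20B variant swaps only the gpt-oss entries.
ROUTES_20B = {
    cat: model.replace("gpt-oss:120b", "gpt-oss:20b")
    for cat, model in ROUTES_120B.items()
}

# Hypothetical keyword fallback for when the AI classifier is unavailable.
KEYWORDS = [
    ("image_generation", ("generate an image", "luo kuva")),
    ("diagram", ("mermaid", "flowchart", "uml")),
    ("coding", ("write code", "debug", "fix this code", "function")),
    ("reasoning", ("analyze", "compare", "strategy", "vertaile")),
]


def classify_by_keywords(prompt: str) -> str:
    """Return the first category whose keyword appears in the prompt."""
    text = prompt.lower()
    for category, needles in KEYWORDS:
        if any(needle in text for needle in needles):
            return category
    return "general"


def pick_model(category: str, routes: dict = ROUTES_120B) -> str:
    """Look up the model; unknown labels fall back to the general model."""
    return routes.get(category, routes["general"])
```

For example, `pick_model(classify_by_keywords("Luo kuva kissasta"))` resolves to `gpt-oss:120b`, which would then refine the prompt before SDXL generates the image.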
+
+## Prerequisites
+
+- **Ubuntu 22.04 LTS** (tested)
+- **NVIDIA GPU** with 16GB+ VRAM (tested on RTX 2000 Ada)
+- **Open WebUI** running in Docker with pipelines enabled
+- **Ollama** installed natively with models pulled:
+  ```bash
+  ollama pull qwen2.5:7b
+  ollama pull qwen2.5-coder:14b
+  ollama pull gpt-oss:120b    # or gpt-oss:20b for the lighter variant
+  ollama pull llama3.2-vision:11b
+  ```
+- **Brave Search API key** (free tier: https://brave.com/search/api/)
+
+## Setup
+
+### 1. Deploy the Pipeline
+
+Copy your chosen pipeline file to the Open WebUI pipelines directory:
+
+```bash
+cp llm_router_v3.py ~/ai-stack/pipelines/
+# or for the 20B variant:
+cp llm_router-20b.py ~/ai-stack/pipelines/
+```
+
+Restart the pipelines container:
+
+```bash
+docker restart pipelines
+```
+
+### 2. Configure Valves in Open WebUI
+
+Go to **Admin Panel > Pipelines** in Open WebUI and configure:
+
+| Setting | Description | Default |
+|---|---|---|
+| `ollama_url` | Ollama API URL | `http://ollama:11434` |
+| `sd_url` | Stable Diffusion API URL | `http://172.18.0.1:7860` |
+| `brave_api_key` | Brave Search API key | (from env `BRAVE_API_KEY`) |
+| `sd_width` / `sd_height` | Generated image dimensions | 1024 x 1024 |
+| `sd_steps` | Sampling steps | 25 |
+| `sd_cfg_scale` | CFG scale | 7.0 |
+| `brave_max_results` | Number of search results | 6 |
+| `use_ai_classifier` | Use AI vs keyword-only classification | true |
+| `show_routing_info` | Show routing banner in responses | true |
+| `search_context_max_chars` | Max search context size | 12000 |
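For reference, the settings in the table can be summarized as a typed configuration object. Open WebUI pipelines normally declare valves as a pydantic model; the sketch below uses a stdlib dataclass instead so that it runs without extra dependencies, and the `truncate_context` helper is a hypothetical example of how `search_context_max_chars` might be applied.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Valves:
    """Stdlib stand-in for the pipeline's valves; defaults copied from the table."""

    ollama_url: str = "http://ollama:11434"
    sd_url: str = "http://172.18.0.1:7860"
    # Read at instantiation time so a key exported later is still picked up.
    brave_api_key: str = field(default_factory=lambda: os.getenv("BRAVE_API_KEY", ""))
    sd_width: int = 1024
    sd_height: int = 1024
    sd_steps: int = 25
    sd_cfg_scale: float = 7.0
    brave_max_results: int = 6
    use_ai_classifier: bool = True
    show_routing_info: bool = True
    search_context_max_chars: int = 12000


def truncate_context(text: str, valves: Valves) -> str:
    """Assumed use of search_context_max_chars: a hard cap on scraped text."""
    return text[: valves.search_context_max_chars]
```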
+
+### 3. Set Up Stable Diffusion (Image Generation)
+
+> Skip this section if you don't need image generation.
+
+#### Install Forge (AUTOMATIC1111 fork)
+
+```bash
+# Install system dependencies
+sudo apt-get update
+sudo apt-get install -y git wget python3-venv python3-pip \
+    libgl1 libglib2.0-0 libsm6 libxrender1 libxext6 libffi-dev libssl-dev
+
+# Clone Forge
+git clone https://github.com/lllyasviel/stable-diffusion-webui-forge.git ~/stable-diffusion-webui
+cd ~/stable-diffusion-webui
+
+# Download SDXL model (~6.9GB)
+mkdir -p models/Stable-diffusion
+wget -O models/Stable-diffusion/sd_xl_base_1.0.safetensors \
+    "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors"
+```
+
+#### Fix Python 3.10 build issues (Ubuntu 22.04)
+
+On Ubuntu 22.04 (Python 3.10) the first launch can fail while building CLIP. Run it once so that the venv is created, then install CLIP manually and launch again:
+
+```bash
+cd ~/stable-diffusion-webui
+# First launch creates the venv — run it once, let it fail, then fix:
+./webui.sh --api --listen --xformers --no-half-vae || true
+
+# Fix CLIP build issue
+venv/bin/pip install "setuptools<70" wheel
+venv/bin/pip install --no-build-isolation \
+    https://github.com/openai/CLIP/archive/d50d76daa670286dd6cacf3bcd80b5e4823fc8e1.zip
+
+# Launch again
+./webui.sh --api --listen --xformers --no-half-vae
+```
+
+#### Select SDXL model
+
+Once the UI is running, open it in a browser and select `sd_xl_base_1.0` from the checkpoint dropdown, or set it through the API:
+
+```bash
+curl -X POST http://localhost:7860/sdapi/v1/options \
+    -H "Content-Type: application/json" \
+    -d '{"sd_model_checkpoint": "sd_xl_base_1.0.safetensors"}'
+```
+
+#### Create a systemd service
+
+```bash
+chmod +x setup-sd-service.sh
+sudo ./setup-sd-service.sh
+```
+
+Or manually:
+
+```bash
+sudo tee /etc/systemd/system/stable-diffusion.service > /dev/null <<EOF
+[Unit]
+Description=AUTOMATIC1111 Stable Diffusion WebUI
+After=network.target
+
+[Service]
+Type=simple
+User=$USER
+WorkingDirectory=$HOME/stable-diffusion-webui
+ExecStart=$HOME/stable-diffusion-webui/webui.sh --api --listen --xformers --no-half-vae --medvram-sdxl
+Restart=on-failure
+RestartSec=10
+Environment=HOME=$HOME
+
+[Install]
+WantedBy=multi-user.target
+EOF
+
+sudo systemctl daemon-reload
+sudo systemctl enable --now stable-diffusion
+```
+
+#### Verify
+
+```bash
+curl -s http://localhost:7860/sdapi/v1/sd-models | python3 -m json.tool
+```
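Once the model list comes back, image generation itself goes through the `txt2img` endpoint of the same API. The following sketch shows one way to prepare such a request from Python and decode the response, using the valve defaults as parameter defaults. The helper names are hypothetical, and the request is built but not sent.

```python
import base64
import json
from urllib import request


def build_txt2img_request(sd_url: str, prompt: str, width: int = 1024,
                          height: int = 1024, steps: int = 25,
                          cfg_scale: float = 7.0) -> request.Request:
    """Prepare (but do not send) a POST to the txt2img endpoint."""
    payload = {
        "prompt": prompt,
        "width": width,
        "height": height,
        "steps": steps,
        "cfg_scale": cfg_scale,
    }
    return request.Request(
        url=f"{sd_url.rstrip('/')}/sdapi/v1/txt2img",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def first_image_bytes(response_body: bytes) -> bytes:
    """Decode the first base64-encoded image from a txt2img JSON response."""
    images = json.loads(response_body)["images"]
    return base64.b64decode(images[0])
```

To actually generate an image, pass the request to `request.urlopen(req, timeout=300)` and feed the response body to `first_image_bytes`; the generous timeout is an assumption, since SDXL generation time depends on the GPU and launch flags.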
+
+### 4. Network Configuration
+
+The pipeline runs inside the `pipelines` Docker container (the one restarted in step 1), not on the host, and needs to reach:
+
+| Service | URL from container | Notes |
+|---|---|---|
+| Ollama | `http://ollama:11434` | Docker DNS or host networking |
+| Stable Diffusion | `http://172.18.0.1:7860` | Docker bridge gateway IP |
+
+To find your bridge gateway IP:
+
+```bash
+docker network inspect <your_network> --format '{{range .IPAM.Config}}{{.Gateway}}{{end}}'
+```
+
+Verify connectivity from inside a container on the same Docker network:
+
+```bash
+docker exec open-webui curl -s http://172.18.0.1:7860/sdapi/v1/sd-models
+```
+
+## VRAM Management
+
+On a single 16GB GPU, gpt-oss:120b and SDXL cannot be loaded simultaneously. The pipeline handles this automatically:
+
+1. **Before image generation**: unloads all Ollama models from VRAM
+2. **After image generation**: unloads SD checkpoint from VRAM and drops Linux page cache
+3. Ollama reloads the model on the next chat request (~10-15s warm-up)
+
+If Ollama fails to load after image generation with a memory error, clear the page cache:
+
+```bash
+sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
+```
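The swap sequence in this section can be sketched as an ordered list of HTTP calls. This is an assumption about the mechanism, not the pipeline's actual code: it relies on Ollama unloading a model when a generate request arrives with `keep_alive` set to 0, on `GET /api/ps` listing the loaded models, and on the SD WebUI API exposing `/sdapi/v1/unload-checkpoint`. The function names are hypothetical and nothing is sent over the network.

```python
def ollama_unload_calls(ollama_url: str, ps_response: dict) -> list:
    """Build one unload request per model reported by GET /api/ps."""
    base = ollama_url.rstrip("/")
    return [
        ("POST", f"{base}/api/generate", {"model": m["name"], "keep_alive": 0})
        for m in ps_response.get("models", [])
    ]


def sd_unload_call(sd_url: str) -> tuple:
    """Request asking the SD WebUI API to drop its checkpoint from VRAM."""
    return ("POST", f"{sd_url.rstrip('/')}/sdapi/v1/unload-checkpoint", {})


def image_generation_plan(ollama_url: str, sd_url: str, ps_response: dict) -> list:
    """Ordered calls around one image generation.

    Step 3 (Ollama reloading on the next chat request) needs no call here.
    """
    plan = ollama_unload_calls(ollama_url, ps_response)  # step 1: free VRAM
    plan.append(("POST", f"{sd_url.rstrip('/')}/sdapi/v1/txt2img", None))  # generate
    plan.append(sd_unload_call(sd_url))  # step 2: release the SD checkpoint
    return plan
```

Dropping the Linux page cache is left out of the sketch because it requires root on the host, as the manual command above shows.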
+
+## Architecture
+
+```
+User Message
+    │
+    ├─ Image uploaded? ──────────────── → llama3.2-vision:11b
+    │
+    ├─ AI Classifier (qwen2.5:7b)
+    │       │
+    │       ├─ coding ──────────────── → qwen2.5-coder:14b
+    │       ├─ diagram ─────────────── → qwen2.5-coder:14b (Mermaid)
+    │       ├─ reasoning ───────────── → gpt-oss:120b (FI/EN system prompt)
+    │       ├─ image_generation ────── → gpt-oss:120b (refine) → SDXL (generate)
+    │       └─ general ─────────────── → gpt-oss:120b
+    │
+    ├─ Heuristic Search Override
+    │       │
+    │       └─ Brave Search + page fetch (if needed)
+    │
+    └─ Stream response (with thinking tokens)
+```
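The decision order in the diagram can be summarized in a few lines. The sketch below takes the two classifiers as plain callables so that it stays independent of Ollama. Falling back to keywords when the AI classifier raises or returns an unexpected label is an assumption based on the "keyword-based fallback" feature, and `route` is a hypothetical name.

```python
from typing import Callable

VALID = {"coding", "diagram", "reasoning", "image_generation", "general"}


def route(prompt: str, has_image: bool,
          ai_classify: Callable[[str], str],
          keyword_classify: Callable[[str], str]) -> str:
    """Return the category, following the order shown in the diagram."""
    # 1. An uploaded image short-circuits classification.
    if has_image:
        return "vision"
    # 2. Ask the AI classifier; tolerate errors and unexpected labels.
    try:
        label = ai_classify(prompt).strip().lower()
    except Exception:
        label = ""
    if label in VALID:
        return label
    # 3. Fall back to keywords, then to "general".
    label = keyword_classify(prompt)
    return label if label in VALID else "general"
```

Normalizing the classifier output with `strip().lower()` guards against a small model answering with extra whitespace or capitalization rather than the bare label.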
+
+## Files
+
+| File | Description |
+|---|---|
+| `llm_router_v3.py` | Main pipeline (gpt-oss:120b) |
+| `llm_router-20b.py` | Lighter pipeline variant (gpt-oss:20b) |
+| `setup-sd.sh` | Stable Diffusion Forge install script |
+| `setup-sd-service.sh` | systemd service creation script |
+
+## License
+
+MIT