Upload files to "/"

2026-04-05 05:20:44 +00:00
commit 6e852871a6
5 changed files with 2476 additions and 0 deletions
@@ -0,0 +1,58 @@
 # CLAUDE.md
 This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
 ## Project Overview
 This is an **Open WebUI Pipeline** (`llm_router_v3.py`) that acts as an intelligent LLM router. It classifies user prompts and routes them to different Ollama models based on intent, with integrated web search and image generation.
 ## Architecture
 Single-file pipeline (`llm_router_v3.py`) that runs inside Open WebUI's pipelines container. The flow is:
 1. **Task detection** — Open WebUI internal requests (title/tag generation) bypass routing and go to qwen2.5:7b directly
 2. **Vision detection** — checks if the latest user message contains an uploaded image
 3. **AI classification** — qwen2.5:7b classifies prompts into: coding, diagram, reasoning, image_generation, vision, general
 4. **Heuristic safety net** — keyword/pattern-based overrides can force search=true even if AI said no
 5. **Web search** — Brave Search API with full page content fetching for top 3 results
 6. **Image generation** — AUTOMATIC1111/Forge API via Stable Diffusion XL, with LLM-refined prompts
 7. **VRAM management** — automatically unloads Ollama models before SD generation and unloads SD checkpoint after, plus drops page cache to free RAM
 8. **Streaming response** — streams model output including thinking/reasoning tokens in collapsible blocks
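
Steps 3 and 4 can be sketched as a parse-then-override pair. The sketch reads the classifier's JSON and falls back to keywords when that JSON is missing or invalid. It then lets the heuristics force search on, but never off. This is a minimal sketch under stated assumptions, not code from `llm_router_v3.py`. The following are all hypothetical:

- the JSON shape `{"category": ..., "search": ...}`
- the keyword lists
- the crude capitalized-word check for named entities

```python
import json
import re

CATEGORIES = {"coding", "diagram", "reasoning", "image_generation", "vision", "general"}

# Hypothetical fallback keywords; the pipeline's real lists may differ.
FALLBACK_KEYWORDS = {
    "coding": ("code", "function", "debug", "python", "bug"),
    "diagram": ("mermaid", "flowchart", "uml", "diagram"),
    "image_generation": ("generate an image", "draw", "luo kuva"),
    "reasoning": ("compare", "analyze", "strategy", "vertaa"),
}
# Hypothetical freshness words (English and Finnish) that force a search.
FRESHNESS_WORDS = ("latest", "today", "current", "news", "price", "uusin", "tänään")
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def keyword_category(prompt: str) -> str:
    """Keyword fallback used when the AI classifier output is unusable."""
    text = prompt.lower()
    for category, words in FALLBACK_KEYWORDS.items():
        if any(word in text for word in words):
            return category
    return "general"


def parse_classifier(raw: str, prompt: str) -> tuple[str, bool]:
    """Extract (category, search) from classifier text such as '{"category": "coding", "search": false}'."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    try:
        data = json.loads(match.group()) if match else {}
    except json.JSONDecodeError:
        data = {}
    category = data.get("category")
    if category not in CATEGORIES:
        category = keyword_category(prompt)
    # Accept JSON true as well as the strings "true"/"yes" some models emit.
    search = data.get("search") in (True, "true", "yes")
    return category, search


def heuristic_search(prompt: str) -> bool:
    """Safety net: freshness words, explicit years, or a question naming an entity."""
    text = prompt.lower()
    if any(word in text for word in FRESHNESS_WORDS) or YEAR_RE.search(prompt):
        return True
    # Crude named-entity check: a question with a capitalized word after the first.
    is_question = prompt.rstrip().endswith("?")
    return is_question and any(token[:1].isupper() for token in prompt.split()[1:])


def decide(raw_classifier_output: str, prompt: str) -> tuple[str, bool]:
    category, search = parse_classifier(raw_classifier_output, prompt)
    # The heuristic layer can only switch search on, never off.
    return category, search or heuristic_search(prompt)
```

For example, `decide('{"category": "general", "search": false}', "latest Ubuntu release")` still returns `("general", True)`, because the freshness word overrides the classifier.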
 ### Model Routing
 | Category | Model | Notes |
 |---|---|---|
 | coding | qwen2.5-coder:14b | |
 | diagram | qwen2.5-coder:14b | Mermaid output |
 | reasoning (FI/EN) | gpt-oss:120b | Finnish detection via keyword scoring |
 | image_generation | gpt-oss:120b → SDXL | LLM refines prompt, then calls A1111 API |
 | vision | llama3.2-vision:11b | Only when latest user message has image |
 | general | gpt-oss:120b | |
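
The table amounts to a lookup, with the task-bypass and vision rules from the flow above applied first. The following parts of this sketch are assumptions, not taken from `llm_router_v3.py`:

- `pick_model` itself
- the fallback to `general` for unknown categories
- the fallback to `general` for a `vision` label without an image

```python
# Category -> model map copied from the routing table above.
MODEL_BY_CATEGORY = {
    "coding": "qwen2.5-coder:14b",
    "diagram": "qwen2.5-coder:14b",  # asked to answer in Mermaid
    "reasoning": "gpt-oss:120b",  # FI/EN is handled by the system prompt, not the model
    "image_generation": "gpt-oss:120b",  # refines the prompt, then SDXL renders it
    "vision": "llama3.2-vision:11b",
    "general": "gpt-oss:120b",
}
CLASSIFIER_MODEL = "qwen2.5:7b"


def pick_model(category: str, has_image: bool, is_task: bool = False) -> str:
    """Resolve the Ollama model for one request, applying the flow's early exits."""
    if is_task:  # Open WebUI title/tag generation bypasses routing
        return CLASSIFIER_MODEL
    if has_image:  # an image in the latest user message always goes to vision
        return MODEL_BY_CATEGORY["vision"]
    if category == "vision":  # assumption: no image to look at, so answer normally
        category = "general"
    return MODEL_BY_CATEGORY.get(category, MODEL_BY_CATEGORY["general"])
```

For the 20B variant described in the README, only the `gpt-oss:120b` entries would change to `gpt-oss:20b`.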
 ### Key Design Decisions
 - **Finnish/English bilingual** — Finnish detected by scoring FINNISH_INDICATORS (threshold ≥ 2 matches). Reasoning routes to language-specific system prompts.
 - **Search is aggressive** — heuristic layer ensures search triggers for questions with named entities, freshness keywords, time-sensitive topics, even if AI classifier says no.
 - **Year injection** — search queries have wrong years replaced with current year to counter LLM hallucination.
 - **Image generation VRAM dance** — RTX 2000 Ada 16GB can't hold both gpt-oss:120b and SDXL simultaneously. Pipeline unloads Ollama before SD, unloads SD after, and drops Linux page cache.
 - **Chunked image streaming** — base64 images are compressed PNG→JPEG and yielded in 4KB chunks to avoid Open WebUI "chunk too big" errors.
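
The chunking half of that design note can be sketched as a generator. It base64-encodes already-compressed JPEG bytes into a Markdown data URI and yields the result in 4 KB slices. A few things here are assumptions:

- The helper name `image_markdown_chunks` is hypothetical.
- The data-URI Markdown embedding is assumed.
- The PNG-to-JPEG re-encoding (for example with Pillow) is omitted, so the sketch needs only the standard library.

```python
import base64
from typing import Iterator

CHUNK_SIZE = 4096  # the 4 KB chunk size from the design note


def image_markdown_chunks(jpeg_bytes: bytes, alt: str = "generated image",
                          chunk_size: int = CHUNK_SIZE) -> Iterator[str]:
    """Yield a Markdown data-URI image in pieces of at most chunk_size characters."""
    encoded = base64.b64encode(jpeg_bytes).decode("ascii")
    markdown = f"![{alt}](data:image/jpeg;base64,{encoded})"
    for start in range(0, len(markdown), chunk_size):
        yield markdown[start:start + chunk_size]
```

The pipeline's response is a stream of text pieces. Yielding these slices one at a time should let the client reassemble the full image tag, while no single piece exceeds the limit.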
 ## Deployment
 - **Open WebUI**: Docker container on `ai-stack_default` network
 - **Ollama**: Native on host (not Docker), reached via `http://ollama:11434` from containers
 - **AUTOMATIC1111 Forge**: Native on host, systemd service `stable-diffusion`, reached via `http://172.18.0.1:7860` (Docker bridge gateway)
 - **Server**: Ubuntu 22.04 LTS, NVIDIA RTX 2000 Ada 16GB
 Pipeline is deployed by copying `llm_router_v3.py` to `~/ai-stack/pipelines/` on the server and restarting the pipelines container.
 ## Setup Scripts
 - `setup-sd.sh` — installs AUTOMATIC1111 Forge + downloads SDXL model (Ubuntu 22.04 specific)
 - `setup-sd-service.sh` — creates systemd service for Forge (run after setup-sd.sh)
 ## Configuration
 All runtime settings are exposed as **Valves** in Open WebUI's pipeline settings UI:
 `ollama_url`, `sd_url`, `sd_width`, `sd_height`, `sd_steps`, `sd_cfg_scale`, `brave_api_key`, `brave_max_results`, `use_ai_classifier`, `show_routing_info`, `search_context_max_chars`
@@ -0,0 +1,238 @@
 # LLM Router Pipeline for Open WebUI
 An intelligent prompt classification and routing pipeline for [Open WebUI](https://github.com/open-webui/open-webui). Classifies user prompts using AI (qwen2.5:7b) and routes them to specialized Ollama models, with integrated Brave web search, image generation via Stable Diffusion, and full Finnish/English bilingual support.
 ## Features
 - **AI-powered prompt classification** with keyword-based fallback
 - **Model routing** — coding, diagram, reasoning, vision, image generation, and general categories
 - **Brave web search** with full page content fetching (top 3 results scraped)
 - **Heuristic search overrides** — safety net that forces search for time-sensitive or factual questions
 - **Image generation** via AUTOMATIC1111/Forge (Stable Diffusion XL) with LLM-refined prompts
 - **VRAM management** — automatically juggles GPU memory between Ollama and Stable Diffusion
 - **Bilingual** — detects Finnish and forces responses in the correct language
 - **Thinking/reasoning display** — streams model thinking tokens in collapsible blocks
 - **Real-time search status** — shows which URLs are being fetched as search runs
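
In `CLAUDE.md`, Finnish detection is described as scoring matches against a `FINNISH_INDICATORS` list, with a threshold of two matches. A minimal sketch of that idea follows. The indicator words and the two system prompts are hypothetical placeholders, not the pipeline's actual values.

```python
import re

# Hypothetical stand-in for FINNISH_INDICATORS; the pipeline's real list may differ.
# Words that are also common in English (such as "on") are deliberately left out.
FINNISH_INDICATORS = {
    "mikä", "miten", "miksi", "kuka", "missä", "milloin", "onko", "voiko",
    "ja", "ei", "että", "tämä", "kanssa", "kuva", "luo",
}
FINNISH_THRESHOLD = 2  # threshold of at least 2 matches

# Hypothetical reasoning system prompts. The Finnish one means
# "You are an analytical assistant. Always answer in Finnish."
REASONING_PROMPTS = {
    "fi": "Olet analyyttinen avustaja. Vastaa aina suomeksi.",
    "en": "You are an analytical assistant. Always answer in English.",
}


def finnish_score(text: str) -> int:
    """Count word tokens that appear in the indicator list."""
    return sum(1 for word in re.findall(r"\w+", text.lower()) if word in FINNISH_INDICATORS)


def is_finnish(text: str) -> bool:
    return finnish_score(text) >= FINNISH_THRESHOLD


def reasoning_system_prompt(text: str) -> str:
    """Pick the language-specific system prompt for the reasoning route."""
    return REASONING_PROMPTS["fi" if is_finnish(text) else "en"]
```

Requiring two matches rather than one keeps short English prompts from flipping to Finnish on a single stray token.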
 ## Model Routing
 | Category | Model (`llm_router_v3.py`) | Model (`llm_router-20b.py`) | Trigger |
 |---|---|---|---|
 | coding | qwen2.5-coder:14b | qwen2.5-coder:14b | User asks to write/fix/debug code |
 | diagram | qwen2.5-coder:14b | qwen2.5-coder:14b | Mermaid, flowchart, UML requests |
 | reasoning (FI) | gpt-oss:120b | gpt-oss:20b | Analysis, comparison, strategy (Finnish) |
 | reasoning (EN) | gpt-oss:120b | gpt-oss:20b | Analysis, comparison, strategy (English) |
 | image generation | gpt-oss:120b + SDXL | gpt-oss:20b + SDXL | "generate an image", "luo kuva" (Finnish for "create an image") |
 | vision | llama3.2-vision:11b | llama3.2-vision:11b | User uploads an image |
 | general | gpt-oss:120b | gpt-oss:20b | Everything else |
 Two pipeline variants are provided:
 - **`llm_router_v3.py`** — uses gpt-oss:120b (higher quality, more VRAM/RAM)
 - **`llm_router-20b.py`** — uses gpt-oss:20b (lighter, better for constrained hardware)
 ## Prerequisites
 - **Ubuntu 22.04 LTS** (tested)
 - **NVIDIA GPU** with 16GB+ VRAM (tested on RTX 2000 Ada)
 - **Open WebUI** running in Docker with pipelines enabled
 - **Ollama** installed natively with models pulled:
  ```bash
  ollama pull qwen2.5:7b
  ollama pull qwen2.5-coder:14b
  ollama pull gpt-oss:120b    # or gpt-oss:20b for the lighter variant
  ollama pull llama3.2-vision:11b
  ```
 - **Brave Search API key** (free tier: https://brave.com/search/api/)
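
With a key in hand, a Brave web search is a single authenticated GET request. The sketch below builds that request with only the standard library. It also applies the year-injection idea from `CLAUDE.md`. The following are assumptions:

- The helper names are hypothetical.
- Only past years within a short window are rewritten.
- Response parsing follows Brave's `web.results[]` shape with `title`, `url` and `description` fields.

```python
import datetime
import json
import re
import urllib.parse
import urllib.request
from typing import Optional

BRAVE_ENDPOINT = "https://api.search.brave.com/res/v1/web/search"
YEAR_RE = re.compile(r"\b20\d{2}\b")


def inject_current_year(query: str, today: Optional[datetime.date] = None, window: int = 3) -> str:
    """Rewrite recent past years (likely stale or hallucinated) to the current year.

    Assumption: only years within `window` years before today are rewritten, so
    deliberate historical queries such as "2008 financial crisis" survive.
    """
    year = (today or datetime.date.today()).year

    def fix(match):
        found = int(match.group())
        return str(year) if year - window <= found < year else match.group()

    return YEAR_RE.sub(fix, query)


def build_brave_request(query: str, api_key: str, count: int = 6) -> urllib.request.Request:
    """GET request for Brave web search; count mirrors the brave_max_results valve."""
    params = urllib.parse.urlencode({"q": inject_current_year(query), "count": count})
    return urllib.request.Request(
        f"{BRAVE_ENDPOINT}?{params}",
        headers={"Accept": "application/json", "X-Subscription-Token": api_key},
    )


def brave_search(query: str, api_key: str, count: int = 6) -> list:
    """Return title/url/description dicts for the top web results."""
    with urllib.request.urlopen(build_brave_request(query, api_key, count), timeout=15) as resp:
        data = json.load(resp)
    return [
        {key: result.get(key, "") for key in ("title", "url", "description")}
        for result in data.get("web", {}).get("results", [])
    ]
```

The pipeline then fetches full page content for the top three URLs. That step is not shown here.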
 ## Setup
 ### 1. Deploy the Pipeline
 Copy your chosen pipeline file to the Open WebUI pipelines directory:
 ```bash
 cp llm_router_v3.py ~/ai-stack/pipelines/
 # or for the 20B variant:
 cp llm_router-20b.py ~/ai-stack/pipelines/
 ```
 Restart the pipelines container:
 ```bash
 docker restart pipelines
 ```
 ### 2. Configure Valves in Open WebUI
 Go to **Admin Panel > Pipelines** in Open WebUI and configure:
 | Setting | Description | Default |
 |---|---|---|
 | `ollama_url` | Ollama API URL | `http://ollama:11434` |
 | `sd_url` | Stable Diffusion API URL | `http://172.18.0.1:7860` |
 | `brave_api_key` | Brave Search API key | (from env `BRAVE_API_KEY`) |
 | `sd_width` / `sd_height` | Generated image dimensions | 1024 x 1024 |
 | `sd_steps` | Sampling steps | 25 |
 | `sd_cfg_scale` | CFG scale | 7.0 |
 | `brave_max_results` | Number of search results | 6 |
 | `use_ai_classifier` | Use AI vs keyword-only classification | true |
 | `show_routing_info` | Show routing banner in responses | true |
 | `search_context_max_chars` | Max search context size | 12000 |
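
To show roughly how these settings reach the Stable Diffusion call, the valve table can be mirrored as a typed settings object. The sketch uses a standard-library dataclass as a stand-in, whereas Open WebUI pipelines usually declare valves as a pydantic `BaseModel`. `txt2img_payload` is a hypothetical helper. The fields `prompt`, `width`, `height`, `steps` and `cfg_scale` are standard parts of the AUTOMATIC1111 `/sdapi/v1/txt2img` request body.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Valves:
    """Defaults from the table above (dataclass stand-in for the pipeline's valves)."""
    ollama_url: str = "http://ollama:11434"
    sd_url: str = "http://172.18.0.1:7860"
    # Read from the environment when the object is created, as the table notes.
    brave_api_key: str = field(default_factory=lambda: os.getenv("BRAVE_API_KEY", ""))
    sd_width: int = 1024
    sd_height: int = 1024
    sd_steps: int = 25
    sd_cfg_scale: float = 7.0
    brave_max_results: int = 6
    use_ai_classifier: bool = True
    show_routing_info: bool = True
    search_context_max_chars: int = 12000


def txt2img_payload(prompt: str, valves: Valves) -> dict:
    """Body for POST {sd_url}/sdapi/v1/txt2img built from the SD valves."""
    return {
        "prompt": prompt,
        "width": valves.sd_width,
        "height": valves.sd_height,
        "steps": valves.sd_steps,
        "cfg_scale": valves.sd_cfg_scale,
    }
```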
 ### 3. Set Up Stable Diffusion (Image Generation)
 > Skip this section if you don't need image generation.
 #### Install Forge (AUTOMATIC1111 fork)
 ```bash
 # Install system dependencies
 sudo apt-get update
 sudo apt-get install -y git wget python3-venv python3-pip \
    libgl1 libglib2.0-0 libsm6 libxrender1 libxext6 libffi-dev libssl-dev
 # Clone Forge
 git clone https://github.com/lllyasviel/stable-diffusion-webui-forge.git ~/stable-diffusion-webui
 cd ~/stable-diffusion-webui
 # Download SDXL model (~6.9GB)
 mkdir -p models/Stable-diffusion
 wget -O models/Stable-diffusion/sd_xl_base_1.0.safetensors \
    "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors"
 ```
 #### Fix Python 3.10 build issues (Ubuntu 22.04)
 On Ubuntu 22.04 (Python 3.10), the first launch can fail while building the CLIP dependency. Run it once to create the venv, then install CLIP manually and relaunch:
 ```bash
 cd ~/stable-diffusion-webui
 # First launch creates the venv — run it once, let it fail, then fix:
 ./webui.sh --api --listen --xformers --no-half-vae || true
 # Fix CLIP build issue
 venv/bin/pip install "setuptools<70" wheel
 venv/bin/pip install --no-build-isolation \
    https://github.com/openai/CLIP/archive/d50d76daa670286dd6cacf3bcd80b5e4823fc8e1.zip
 # Launch again
 ./webui.sh --api --listen --xformers --no-half-vae
 ```
 #### Select SDXL model
 Once the UI is running, open it in a browser and select `sd_xl_base_1.0` from the checkpoint dropdown. Or via API:
 ```bash
 curl -X POST http://localhost:7860/sdapi/v1/options \
    -H "Content-Type: application/json" \
    -d '{"sd_model_checkpoint": "sd_xl_base_1.0.safetensors"}'
 ```
 #### Create a systemd service
 ```bash
 chmod +x setup-sd-service.sh
 ./setup-sd-service.sh   # run as your normal user; the script calls sudo where needed
 ```
 Or manually:
 ```bash
 sudo tee /etc/systemd/system/stable-diffusion.service > /dev/null <<EOF
 [Unit]
 Description=AUTOMATIC1111 Stable Diffusion WebUI
 After=network.target
 [Service]
 Type=simple
 User=$USER
 WorkingDirectory=$HOME/stable-diffusion-webui
 ExecStart=$HOME/stable-diffusion-webui/webui.sh --api --listen --xformers --no-half-vae --medvram-sdxl
 Restart=on-failure
 RestartSec=10
 Environment=HOME=$HOME
 [Install]
 WantedBy=multi-user.target
 EOF
 sudo systemctl daemon-reload
 sudo systemctl enable --now stable-diffusion
 ```
 #### Verify
 ```bash
 curl -s http://localhost:7860/sdapi/v1/sd-models | python3 -m json.tool
 ```
 ### 4. Network Configuration
 The pipeline runs inside the Open WebUI pipelines container (`pipelines`) and needs to reach:
 | Service | URL from container | Notes |
 |---|---|---|
 | Ollama | `http://ollama:11434` | Docker DNS or host networking |
 | Stable Diffusion | `http://172.18.0.1:7860` | Docker bridge gateway IP |
 To find your bridge gateway IP:
 ```bash
 docker network inspect <your_network> --format '{{range .IPAM.Config}}{{.Gateway}}{{end}}'
 ```
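 Verify connectivity from inside the pipelines container. Python is used because the pipelines image may not include `curl`. The command prints `200` when Forge is reachable:
 ```bash
 docker exec pipelines python3 -c "import urllib.request; print(urllib.request.urlopen('http://172.18.0.1:7860/sdapi/v1/sd-models').status)"
 ```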
 ## VRAM Management
 On a single 16GB GPU, gpt-oss:120b and SDXL cannot be loaded simultaneously. The pipeline handles this automatically:
 1. **Before image generation**: unloads all Ollama models from VRAM
 2. **After image generation**: unloads SD checkpoint from VRAM and drops Linux page cache
 3. Ollama reloads the model on the next chat request (~10-15s warm-up)
 If Ollama fails to load after image generation with a memory error, clear the page cache:
 ```bash
 sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
 ```
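
The unload steps map onto two HTTP APIs:

- Ollama evicts a model when it receives a generate request with `keep_alive` set to `0`.
- AUTOMATIC1111 exposes `POST /sdapi/v1/unload-checkpoint`.

A minimal sketch using only the standard library follows. The helper names are hypothetical, and the Forge endpoint is assumed to match upstream A1111. The page-cache drop needs root on the host, so it is left out.

```python
import json
import urllib.request
from typing import Optional

# Defaults mirror the ollama_url and sd_url valves.
OLLAMA_URL = "http://ollama:11434"
SD_URL = "http://172.18.0.1:7860"


def post_json(url: str, payload: Optional[dict] = None, timeout: float = 60) -> bytes:
    """POST a JSON body and return the raw response bytes."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload or {}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()


def loaded_ollama_models(ps_response: dict) -> list:
    """Extract model names from an Ollama GET /api/ps response."""
    return [m.get("name") or m.get("model") for m in ps_response.get("models", [])]


def unload_payload(model: str) -> dict:
    """A generate request with no prompt and keep_alive=0 evicts the model."""
    return {"model": model, "keep_alive": 0}


def unload_all_ollama(base_url: str = OLLAMA_URL) -> None:
    """Step 1: free VRAM before SDXL runs."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=10) as resp:
        models = loaded_ollama_models(json.load(resp))
    for name in models:
        post_json(f"{base_url}/api/generate", unload_payload(name))


def unload_sd_checkpoint(base_url: str = SD_URL) -> None:
    """Step 2: drop the SDXL checkpoint from VRAM once the image is done.

    Assumes Forge keeps A1111's /sdapi/v1/unload-checkpoint endpoint.
    """
    post_json(f"{base_url}/sdapi/v1/unload-checkpoint")
```

In the image route these would run in order:

1. `unload_all_ollama()`
2. the `txt2img` request
3. `unload_sd_checkpoint()`

Step 3 of the list above needs no code, because Ollama reloads models on demand.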
 ## Architecture
 ```
 User Message
    │
    ├─ Image uploaded? ──────────────── → llama3.2-vision:11b
    │
    ├─ AI Classifier (qwen2.5:7b)
    │       │
    │       ├─ coding ──────────────── → qwen2.5-coder:14b
    │       ├─ diagram ─────────────── → qwen2.5-coder:14b (Mermaid)
    │       ├─ reasoning ───────────── → gpt-oss:120b (FI/EN system prompt)
    │       ├─ image_generation ────── → gpt-oss:120b (refine) → SDXL (generate)
    │       └─ general ─────────────── → gpt-oss:120b
    │
    ├─ Heuristic Search Override
    │       │
    │       └─ Brave Search + page fetch (if needed)
    │
    └─ Stream response (with thinking tokens)
 ```
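
The final streaming stage can be sketched as a generator over Ollama's `/api/chat` NDJSON stream. The sketch assumes the thinking text arrives in `message.thinking`, which is how recent Ollama versions report it for reasoning models such as gpt-oss. It wraps that text in a plain HTML `<details>` element. Open WebUI's exact markup for reasoning blocks may differ, and `render_stream` is a hypothetical name.

```python
import json
from typing import Iterable, Iterator

OPEN_BLOCK = "<details>\n<summary>Thinking</summary>\n\n"
CLOSE_BLOCK = "\n</details>\n\n"


def render_stream(lines: Iterable[str]) -> Iterator[str]:
    """Turn Ollama /api/chat NDJSON lines into text, folding thinking into <details>."""
    in_thinking = False
    for line in lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        message = chunk.get("message", {})
        thinking = message.get("thinking") or ""
        content = message.get("content") or ""
        if thinking:
            if not in_thinking:
                yield OPEN_BLOCK
                in_thinking = True
            yield thinking
        if content:
            if in_thinking:  # first answer token closes the thinking block
                yield CLOSE_BLOCK
                in_thinking = False
            yield content
        if chunk.get("done") and in_thinking:  # stream ended mid-thought
            yield CLOSE_BLOCK
            in_thinking = False
```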
 ## Files
 | File | Description |
 |---|---|
 | `llm_router_v3.py` | Main pipeline (gpt-oss:120b) |
 | `llm_router-20b.py` | Lighter pipeline variant (gpt-oss:20b) |
 | `setup-sd.sh` | Stable Diffusion Forge install script |
 | `setup-sd-service.sh` | systemd service creation script |
 ## License
 MIT
@@ -0,0 +1,40 @@
 #!/bin/bash
 # Create a systemd service for AUTOMATIC1111 so it starts on boot
 # Run this AFTER setup-sd.sh has completed successfully
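 set -e
 # Run as the user who owns the Forge install, not via sudo: $HOME and whoami
 # below must resolve to that user, and the script calls sudo itself.
 if [ "$(id -u)" -eq 0 ]; then
     echo "Run this script as your normal user, not as root or with sudo." >&2
     exit 1
 fi
 SD_DIR="$HOME/stable-diffusion-webui"
 if [ ! -x "$SD_DIR/webui.sh" ]; then
     echo "No executable webui.sh in $SD_DIR; run setup-sd.sh first." >&2
     exit 1
 fi
 SERVICE_FILE="/etc/systemd/system/stable-diffusion.service"
 CURRENT_USER=$(whoami)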
 echo "Creating systemd service for Stable Diffusion WebUI..."
 sudo tee "$SERVICE_FILE" > /dev/null <<EOF
 [Unit]
 Description=AUTOMATIC1111 Stable Diffusion WebUI
 After=network.target
 [Service]
 Type=simple
 User=$CURRENT_USER
 WorkingDirectory=$SD_DIR
 ExecStart=$SD_DIR/webui.sh --api --listen --xformers --no-half-vae
 Restart=on-failure
 RestartSec=10
 Environment=HOME=$HOME
 [Install]
 WantedBy=multi-user.target
 EOF
 sudo systemctl daemon-reload
 sudo systemctl enable stable-diffusion
 sudo systemctl start stable-diffusion
 echo ""
 echo "Service created and started!"
 echo "  Status:  sudo systemctl status stable-diffusion"
 echo "  Logs:    journalctl -u stable-diffusion -f"
 echo "  Stop:    sudo systemctl stop stable-diffusion"
 echo "  Restart: sudo systemctl restart stable-diffusion"