Upload files to "/"
@@ -4,53 +4,59 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
## Project Overview

This is an **Open WebUI Pipeline** (`llm_router_v3.py`) that acts as an intelligent LLM router. It classifies user prompts and routes them to different Ollama models based on intent, with integrated web search and image generation.

This is an **Open WebUI Pipeline** that acts as an intelligent LLM router. It classifies user prompts and routes them to different Ollama models based on intent, with integrated web search and image generation. Two variants exist: `llm_router_v3.py` (gpt-oss:120b) and `llm_router-20b.py` (gpt-oss:20b).

## Architecture
Single-file pipeline (`llm_router_v3.py`) that runs inside Open WebUI's pipelines container. The flow is:

Single-file pipelines that run inside Open WebUI's pipelines container. The flow is:

1. **Task detection** — Open WebUI internal requests (title/tag generation) bypass routing and go to qwen2.5:7b directly
2. **Vision detection** — checks if the latest user message contains an uploaded image
1. **"uncen" prefix detection** — bypasses all classification/search, goes straight to uncensored image generation (Juggernaut XL v9)
2. **Vision detection** — checks if the latest user message (not assistant messages) contains an uploaded image
3. **AI classification** — qwen2.5:7b classifies prompts into: coding, diagram, reasoning, image_generation, vision, general
4. **Heuristic safety net** — keyword/pattern-based overrides can force search=true even if AI said no
5. **Web search** — Brave Search API with full page content fetching for top 3 results
6. **Image generation** — AUTOMATIC1111/Forge API via Stable Diffusion XL, with LLM-refined prompts
7. **VRAM management** — automatically unloads Ollama models before SD generation and unloads SD checkpoint after, plus drops page cache to free RAM
8. **Streaming response** — streams model output including thinking/reasoning tokens in collapsible blocks
5. **Finnish language injection** — prepends Finnish instruction to system prompt when Finnish is detected
6. **Web search** — Brave Search API with real-time status updates and full page content fetching for top 3 results
7. **Image generation** — Forge API via SDXL (default) or Juggernaut XL v9 (uncensored), with LLM-refined prompts
8. **VRAM management** — unloads Ollama before SD, unloads SD checkpoint after, drops page cache
9. **Streaming response** — streams model output including thinking/reasoning tokens in collapsible `<details>` blocks
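
The first two checks in the newer flow (the "uncen" prefix, then an image in the latest user message) could be sketched roughly as follows. This is an illustration only, not the pipeline's actual code: it assumes OpenAI-style chat messages where an uploaded image appears as a content part with `type == "image_url"`, and the helper names `latest_user_message`, `message_text`, and `pick_route` are hypothetical. Only `has_image_content` is a name taken from this file.

```python
from __future__ import annotations

from typing import Any

UNCEN_PREFIX = "uncen"  # assumed literal trigger word, matched case-insensitively


def latest_user_message(messages: list[dict[str, Any]]) -> dict[str, Any] | None:
    """Return the most recent message whose role is 'user', if any."""
    for message in reversed(messages):
        if message.get("role") == "user":
            return message
    return None


def message_text(message: dict[str, Any]) -> str:
    """Flatten a message's content (plain string or list of parts) to text."""
    content = message.get("content") or ""
    if isinstance(content, str):
        return content
    return " ".join(p.get("text", "") for p in content if p.get("type") == "text")


def has_image_content(messages: list[dict[str, Any]]) -> bool:
    """True only if the latest *user* message carries an image part."""
    message = latest_user_message(messages)
    if message is None:
        return False
    content = message.get("content")
    if not isinstance(content, list):
        return False
    return any(part.get("type") == "image_url" for part in content)


def pick_route(messages: list[dict[str, Any]]) -> str:
    """Apply the priority order: uncen prefix, then vision, then the classifier."""
    message = latest_user_message(messages)
    if message is None:
        return "classify"
    if message_text(message).strip().lower().startswith(UNCEN_PREFIX):
        return "uncensored_image"
    if has_image_content(messages):
        return "vision"
    return "classify"
```

Because only the latest user message is inspected, an assistant reply that contains a previously generated image does not pull the next text-only prompt into the vision route. Note that a plain `startswith` check would also match words such as "uncensored"; whether the real pipeline requires a word boundary is not stated here.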
### Model Routing
| Category | Model | Notes |
|---|---|---|
| coding | qwen2.5-coder:14b | |
| coding | qwen2.5-coder:14b | Only when user asks to write/fix code |
| diagram | qwen2.5-coder:14b | Mermaid output |
| reasoning (FI/EN) | gpt-oss:120b | Finnish detection via keyword scoring |
| image_generation | gpt-oss:120b → SDXL | LLM refines prompt, then calls A1111 API |
| reasoning (FI/EN) | gpt-oss:120b / 20b | Finnish detection via keyword scoring (threshold ≥ 2) |
| image_generation | gpt-oss → SDXL Base | LLM refines prompt, then calls A1111 API |
| uncensored image | Juggernaut XL v9 (no LLM) | Triggered by "uncen" prefix, skips classifier, search, and LLM refinement |
| vision | llama3.2-vision:11b | Only when latest user message has image |
| general | gpt-oss:120b | |
| general | gpt-oss:120b / 20b | |
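
As a rough illustration of the routing table and the Finnish keyword scoring it mentions, a sketch might look like this. The model names come from the table; everything else is an assumption: the indicator words are placeholders (the real `FINNISH_INDICATORS` list is not shown in this file), the score counts every matching word occurrence, and `select_model` and its `large` flag are hypothetical names for choosing between the 120b and 20b variants.

```python
import re

# Category -> Ollama model, copied from the table above (120b variant).
MODEL_BY_CATEGORY = {
    "coding": "qwen2.5-coder:14b",
    "diagram": "qwen2.5-coder:14b",
    "reasoning": "gpt-oss:120b",
    "image_generation": "gpt-oss:120b",
    "vision": "llama3.2-vision:11b",
    "general": "gpt-oss:120b",
}

# Placeholder indicator words; the pipeline's real list is not shown here.
FINNISH_INDICATORS = {"mikä", "miten", "onko", "että", "kuinka", "minä", "sinä", "kiitos"}
FINNISH_THRESHOLD = 2  # "threshold >= 2" from the table


def finnish_score(prompt: str) -> int:
    """Count how many words in the prompt are Finnish indicator words."""
    words = re.findall(r"\w+", prompt.lower())
    return sum(1 for word in words if word in FINNISH_INDICATORS)


def is_finnish(prompt: str) -> bool:
    """Treat the prompt as Finnish once the score reaches the threshold."""
    return finnish_score(prompt) >= FINNISH_THRESHOLD


def select_model(category: str, large: bool = True) -> str:
    """Look up the model for a category, falling back to the general model."""
    model = MODEL_BY_CATEGORY.get(category, MODEL_BY_CATEGORY["general"])
    if not large and model == "gpt-oss:120b":
        return "gpt-oss:20b"  # the 20b variant swaps only the gpt-oss model
    return model
```

A threshold of two keeps a single borrowed word (for example "kiitos" in an otherwise English prompt) from flipping the language.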
### Key Design Decisions
- **Finnish/English bilingual** — Finnish detected by scoring FINNISH_INDICATORS (threshold ≥ 2 matches). Reasoning routes to language-specific system prompts.
- **Search is aggressive** — heuristic layer ensures search triggers for questions with named entities, freshness keywords, time-sensitive topics, even if AI classifier says no.
- **"uncen" prefix** — highest priority check, bypasses everything (classification, search, vision detection, LLM refinement) and sends the user's text directly to Juggernaut XL v9 with quality tags appended. LLM is skipped entirely to avoid refusal from censored models.
- **Classifier strictness** — "coding" only triggers when user explicitly asks for code output. Discussing IT/tech topics routes to general/reasoning.
- **Finnish/English bilingual** — Finnish detected by scoring FINNISH_INDICATORS. A Finnish instruction is injected into system prompts for all categories.
- **Search is aggressive** — heuristic layer ensures search triggers for factual questions, even if AI classifier says no.
- **Year injection** — search queries have wrong years replaced with current year to counter LLM hallucination.
- **Image generation VRAM dance** — RTX 2000 Ada 16GB can't hold both gpt-oss:120b and SDXL simultaneously. Pipeline unloads Ollama before SD, unloads SD after, and drops Linux page cache.
- **Chunked image streaming** — base64 images are compressed PNG→JPEG and yielded in 4KB chunks to avoid Open WebUI "chunk too big" errors.
- **VRAM dance** — RTX 2000 Ada 16GB can't hold both gpt-oss:120b and SDXL simultaneously. Pipeline unloads Ollama before SD, unloads SD after, drops page cache.
- **SD model switching** — pipeline calls `/sdapi/v1/options` to swap between SDXL Base and Juggernaut XL v9 at runtime.
- **Chunked image streaming** — base64 images compressed PNG→JPEG and yielded in 4KB chunks to avoid "chunk too big" errors.
- **Vision false positive fix** — `has_image_content` only checks the latest user message, not assistant responses containing previously generated images.
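
Two of the decisions above, year injection and chunked image streaming, lend themselves to a short sketch. Both helpers are hypothetical: `inject_current_year` assumes that any standalone 20xx year in a search query counts as "wrong" unless it is the current year (which would also rewrite deliberately historical queries; how the pipeline decides this is not stated here), and `iter_chunks` shows only the 4KB slicing of a base64 string, leaving out the PNG→JPEG compression step.

```python
import re
from datetime import date
from typing import Iterator, Optional

YEAR_PATTERN = re.compile(r"\b20\d{2}\b")  # standalone four-digit 20xx years
CHUNK_SIZE = 4096  # 4 KB, per the chunked-streaming note above


def inject_current_year(query: str, current_year: Optional[int] = None) -> str:
    """Replace every standalone 20xx year in the query with the current year."""
    year = current_year if current_year is not None else date.today().year
    return YEAR_PATTERN.sub(str(year), query)


def iter_chunks(data: str, size: int = CHUNK_SIZE) -> Iterator[str]:
    """Yield successive slices of at most `size` characters."""
    for start in range(0, len(data), size):
        yield data[start:start + size]
```

Passing `current_year` explicitly keeps the helper deterministic in tests; in the pipeline it would presumably default to the server's clock.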
## Deployment
- **Open WebUI**: Docker container on `ai-stack_default` network
- **Ollama**: Native on host (not Docker), reached via `http://ollama:11434` from containers
- **AUTOMATIC1111 Forge**: Native on host, systemd service `stable-diffusion`, reached via `http://172.18.0.1:7860` (Docker bridge gateway)
- **Open WebUI**: Docker container on `ai-stack_default` bridge network
- **Ollama**: Native on host, reached via `http://ollama:11434` from containers
- **Forge (A1111)**: Native on host, systemd service `stable-diffusion`, reached via `http://172.18.0.1:7860` (Docker bridge gateway)
- **Server**: Ubuntu 22.04 LTS, NVIDIA RTX 2000 Ada 16GB

Pipeline is deployed by copying `llm_router_v3.py` to `~/ai-stack/pipelines/` on the server and restarting the pipelines container.

Pipeline is deployed by copying the `.py` file to `~/ai-stack/pipelines/` on the server and restarting the pipelines container.
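
With the two service addresses listed above, the unload-and-switch sequence around image generation could be planned roughly as below. This is a sketch under assumptions, not the pipeline's code: it relies on Ollama's documented behaviour that a `/api/generate` request with `keep_alive: 0` unloads the model, on the A1111-style `/sdapi/v1/options` endpoint named in this file, and on `/sdapi/v1/unload-checkpoint`, which this file does not mention and which Forge is only assumed to expose. The names `vram_plan` and `post_json` and any checkpoint filename are hypothetical; dropping the page cache needs root on the host and is not shown.

```python
import json
import urllib.request

OLLAMA_URL = "http://ollama:11434"    # from the deployment notes
FORGE_URL = "http://172.18.0.1:7860"  # Docker bridge gateway, from the notes


def vram_plan(ollama_model: str, sd_checkpoint: str) -> list:
    """Return (phase, url, payload) tuples in the order they would be sent."""
    return [
        # keep_alive=0 asks Ollama to unload the model right away.
        ("before", f"{OLLAMA_URL}/api/generate",
         {"model": ollama_model, "keep_alive": 0}),
        # Select the SD checkpoint to generate with.
        ("before", f"{FORGE_URL}/sdapi/v1/options",
         {"sd_model_checkpoint": sd_checkpoint}),
        # Assumed endpoint: free the checkpoint once generation is done.
        ("after", f"{FORGE_URL}/sdapi/v1/unload-checkpoint", {}),
    ]


def post_json(url: str, payload: dict, timeout: float = 60.0) -> bytes:
    """POST a JSON payload and return the raw response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

The image-generation call itself would sit between the "before" and "after" phases. Keeping the plan as plain data makes the ordering easy to check without contacting either service.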
## Setup Scripts
- `setup-sd.sh` — installs AUTOMATIC1111 Forge + downloads SDXL model (Ubuntu 22.04 specific)
- `setup-sd-service.sh` — creates systemd service for Forge (run after setup-sd.sh)
- `setup-sd.sh` — installs Forge, downloads SDXL Base + Juggernaut XL v9, fixes CLIP build issue (Ubuntu 22.04)
- `setup-sd-service.sh` — creates systemd service for Forge (handles sudo user detection correctly)
## Configuration