Upload files to "/"

2026-04-05 05:20:44 +00:00
commit 6e852871a6
5 changed files with 2476 additions and 0 deletions
@@ -0,0 +1,58 @@
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## Project Overview
+
+This is an **Open WebUI Pipeline** (`llm_router_v3.py`) that acts as an intelligent LLM router. It classifies user prompts and routes them to different Ollama models based on intent, with integrated web search and image generation.
+
+## Architecture
+
+The router is a single-file pipeline (`llm_router_v3.py`) that runs inside Open WebUI's pipelines container. Each request passes through these stages in order:
+
+1. **Task detection** — Open WebUI internal requests (title/tag generation) bypass routing and go to qwen2.5:7b directly
+2. **Vision detection** — checks if the latest user message contains an uploaded image
+3. **AI classification** — qwen2.5:7b classifies prompts into: coding, diagram, reasoning, image_generation, vision, general
+4. **Heuristic safety net** — keyword/pattern-based overrides can force search=true even if AI said no
+5. **Web search** — Brave Search API with full page content fetching for top 3 results
+6. **Image generation** — AUTOMATIC1111/Forge API via Stable Diffusion XL, with LLM-refined prompts
+7. **VRAM management** — automatically unloads Ollama models before SD generation and unloads SD checkpoint after, plus drops page cache to free RAM
+8. **Streaming response** — streams model output including thinking/reasoning tokens in collapsible blocks
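
Steps 2 and 4 above can be sketched as follows. This is a minimal illustration, not the code in `llm_router_v3.py`: it assumes messages arrive in the OpenAI-style chat format, where an uploaded image appears as a content part with `"type": "image_url"`. The helper names (`latest_user_message`, `has_image`, `force_search`) and the `FRESHNESS_KEYWORDS` list are hypothetical.

```python
import re
from typing import Any, Dict, List, Optional

# Hypothetical freshness keywords; the pipeline's real patterns are not shown here.
FRESHNESS_KEYWORDS = ("latest", "today", "current", "news", "price")


def latest_user_message(messages: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the most recent message with role 'user', or None if there is none."""
    for message in reversed(messages):
        if message.get("role") == "user":
            return message
    return None


def has_image(message: Optional[Dict[str, Any]]) -> bool:
    """True if the message content is a list that contains an image_url part."""
    if message is None:
        return False
    content = message.get("content")
    if not isinstance(content, list):
        return False  # plain-string content cannot carry an image
    return any(
        isinstance(part, dict) and part.get("type") == "image_url"
        for part in content
    )


def force_search(prompt: str, ai_wants_search: bool) -> bool:
    """Safety net: heuristics may turn search on, but never turn it off."""
    if ai_wants_search:
        return True
    lowered = prompt.lower()
    return any(
        re.search(rf"\b{re.escape(word)}\b", lowered)
        for word in FRESHNESS_KEYWORDS
    )
```

Only the latest user message is inspected, so an image uploaded earlier in the conversation does not keep routing later text-only turns to the vision model.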
+
+### Model Routing
+
+| Category | Model | Notes |
+|---|---|---|
+| coding | qwen2.5-coder:14b | |
+| diagram | qwen2.5-coder:14b | Mermaid output |
+| reasoning (FI/EN) | gpt-oss:120b | Finnish detection via keyword scoring |
+| image_generation | gpt-oss:120b → SDXL | LLM refines prompt, then calls A1111 API |
+| vision | llama3.2-vision:11b | Only when latest user message has image |
+| general | gpt-oss:120b | |
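
The routing table above maps naturally onto a dictionary lookup. The sketch below is illustrative only: the names `ROUTES` and `pick_model` are hypothetical, and falling back to the general model for unknown categories, or for a `vision` label without an image, is an assumption rather than documented behaviour.

```python
from typing import Dict

# Category -> Ollama model, copied from the Model Routing table.
ROUTES: Dict[str, str] = {
    "coding": "qwen2.5-coder:14b",
    "diagram": "qwen2.5-coder:14b",
    "reasoning": "gpt-oss:120b",
    "image_generation": "gpt-oss:120b",  # refines the prompt before SDXL runs
    "vision": "llama3.2-vision:11b",
    "general": "gpt-oss:120b",
}


def pick_model(category: str, latest_message_has_image: bool = False) -> str:
    """Choose a model for a classified prompt."""
    if latest_message_has_image:
        return ROUTES["vision"]
    if category == "vision":
        # Assumption: without an image in the latest message, fall back to general.
        return ROUTES["general"]
    # Assumption: unknown categories also fall back to general.
    return ROUTES.get(category, ROUTES["general"])
```

For example, `pick_model("diagram")` selects `qwen2.5-coder:14b`, while `pick_model("general", latest_message_has_image=True)` selects the vision model.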
+
+### Key Design Decisions
+
+- **Finnish/English bilingual** — Finnish detected by scoring FINNISH_INDICATORS (threshold ≥ 2 matches). Reasoning routes to language-specific system prompts.
+- **Search is aggressive** — heuristic layer ensures search triggers for questions with named entities, freshness keywords, time-sensitive topics, even if AI classifier says no.
+- **Year injection** — search queries have wrong years replaced with current year to counter LLM hallucination.
+- **Image generation VRAM dance** — RTX 2000 Ada 16GB can't hold both gpt-oss:120b and SDXL simultaneously. Pipeline unloads Ollama before SD, unloads SD after, and drops Linux page cache.
+- **Chunked image streaming** — base64 images are compressed PNG→JPEG and yielded in 4KB chunks to avoid Open WebUI "chunk too big" errors.
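
Three of the decisions above are small enough to sketch directly. The code below is an assumption-laden illustration, not the pipeline's implementation: the `FINNISH_INDICATORS` entries are a made-up subset, "wrong year" is interpreted as any four-digit `20xx` year, and the function names are hypothetical. The PNG-to-JPEG recompression step is not shown.

```python
import re
from typing import Iterator

# Hypothetical subset; the real FINNISH_INDICATORS list is not shown in this file.
FINNISH_INDICATORS = ("mikä", "miten", "miksi", "onko", "että", "kuinka", "ja", "voitko")
FINNISH_THRESHOLD = 2  # from the text: at least 2 matches


def is_finnish(prompt: str) -> bool:
    """Score the prompt by counting indicator words; Finnish if score >= threshold."""
    words = set(re.findall(r"\w+", prompt.lower()))
    score = sum(1 for indicator in FINNISH_INDICATORS if indicator in words)
    return score >= FINNISH_THRESHOLD


def inject_current_year(query: str, current_year: int) -> str:
    """Replace every standalone 20xx year in the query with current_year.

    Assumption: any 20xx year counts as a candidate for replacement.
    """
    return re.sub(r"\b20\d{2}\b", str(current_year), query)


def chunk_text(payload: str, chunk_size: int = 4096) -> Iterator[str]:
    """Yield the payload in pieces of at most chunk_size characters (4 KB default)."""
    for start in range(0, len(payload), chunk_size):
        yield payload[start:start + chunk_size]
```

In a running pipeline the year would typically come from `datetime.now().year`; it is a parameter here so the function stays deterministic and easy to test.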
+
+## Deployment
+
+- **Open WebUI**: Docker container on `ai-stack_default` network
+- **Ollama**: Native on host (not Docker), reached via `http://ollama:11434` from containers
+- **AUTOMATIC1111 Forge**: Native on host, systemd service `stable-diffusion`, reached via `http://172.18.0.1:7860` (Docker bridge gateway)
+- **Server**: Ubuntu 22.04 LTS, NVIDIA RTX 2000 Ada 16GB
+
+To deploy the pipeline, copy `llm_router_v3.py` to `~/ai-stack/pipelines/` on the server and restart the pipelines container.
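
Given the endpoints listed above, the VRAM unload calls might look like the sketch below. It assumes Ollama's documented behaviour of unloading a model when `/api/generate` receives `keep_alive: 0`, and the `/sdapi/v1/unload-checkpoint` route from the AUTOMATIC1111 web API (assumed to be available in Forge as well). The function names are hypothetical, and dropping the Linux page cache is not shown.

```python
import json
import urllib.request

OLLAMA_URL = "http://ollama:11434"  # from the Deployment list
SD_URL = "http://172.18.0.1:7860"   # Docker bridge gateway, from the Deployment list


def ollama_unload_request(model: str, base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a request asking Ollama to unload a model (keep_alive = 0)."""
    body = json.dumps({"model": model, "keep_alive": 0}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def sd_unload_request(base_url: str = SD_URL) -> urllib.request.Request:
    """Build a request asking the SD web API to unload its checkpoint."""
    return urllib.request.Request(
        f"{base_url}/sdapi/v1/unload-checkpoint",
        data=b"",
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> int:
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Building the requests separately from sending them keeps the sketch testable without a running Ollama or Forge instance; a caller would run `send(ollama_unload_request("gpt-oss:120b"))` before image generation and `send(sd_unload_request())` afterwards.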
+
+## Setup Scripts
+
+- `setup-sd.sh` — installs AUTOMATIC1111 Forge + downloads SDXL model (Ubuntu 22.04 specific)
+- `setup-sd-service.sh` — creates systemd service for Forge (run after setup-sd.sh)
+
+## Configuration
+
+All runtime settings are exposed as **Valves** in Open WebUI's pipeline settings UI:
+`ollama_url`, `sd_url`, `sd_width`, `sd_height`, `sd_steps`, `sd_cfg_scale`, `brave_api_key`, `brave_max_results`, `use_ai_classifier`, `show_routing_info`, `search_context_max_chars`
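
As a rough picture of what these settings hold, the sketch below declares them as a plain dataclass. Open WebUI pipelines normally declare valves as a pydantic model; a stdlib dataclass is used here only to keep the example dependency-free. The two URLs come from the Deployment section, the field names expand the `sd_width/height/steps/cfg_scale` shorthand, and every other default is a placeholder assumption.

```python
from dataclasses import dataclass


@dataclass
class Valves:
    """Stand-in for the pipeline's runtime settings (defaults are illustrative)."""

    ollama_url: str = "http://ollama:11434"   # from Deployment
    sd_url: str = "http://172.18.0.1:7860"    # from Deployment
    sd_width: int = 1024                      # placeholder
    sd_height: int = 1024                     # placeholder
    sd_steps: int = 30                        # placeholder
    sd_cfg_scale: float = 7.0                 # placeholder
    brave_api_key: str = ""                   # set in the settings UI, never hard-coded
    brave_max_results: int = 5                # placeholder
    use_ai_classifier: bool = True            # placeholder
    show_routing_info: bool = True            # placeholder
    search_context_max_chars: int = 8000      # placeholder
```

Overriding a single setting is then a matter of `Valves(sd_steps=20)`, which mirrors editing one field in the settings UI while leaving the rest at their defaults.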