Upload files to "/"

2026-04-05 07:17:02 +00:00
parent 39070e07d8
commit f641dfa2ba
5 changed files with 254 additions and 57 deletions
+63 -10
@@ -9,6 +9,7 @@ An intelligent prompt classification and routing pipeline for [Open WebUI](https
- **Brave web search** with full page content fetching (top 3 results scraped)
- **Heuristic search overrides** — safety net that forces search for time-sensitive or factual questions
- **Image generation** via AUTOMATIC1111/Forge (Stable Diffusion XL) with LLM-refined prompts
- **Uncensored image generation** — prefix any prompt with `uncen` to bypass all classification/search and generate directly with Juggernaut XL v9
- **VRAM management** — automatically juggles GPU memory between Ollama and Stable Diffusion
- **Bilingual** — detects Finnish and forces responses in the correct language
- **Thinking/reasoning display** — streams model thinking tokens in collapsible blocks
@@ -23,6 +24,7 @@ An intelligent prompt classification and routing pipeline for [Open WebUI](https
| reasoning (FI) | gpt-oss:120b | gpt-oss:20b | Analysis, comparison, strategy (Finnish) |
| reasoning (EN) | gpt-oss:120b | gpt-oss:20b | Analysis, comparison, strategy (English) |
| image generation | gpt-oss:120b + SDXL | gpt-oss:20b + SDXL | "generate an image", "luo kuva" |
| uncensored image | Juggernaut XL v9 | Juggernaut XL v9 | Prompt starts with `uncen` |
| vision | llama3.2-vision:11b | llama3.2-vision:11b | User uploads an image |
| general | gpt-oss:120b | gpt-oss:20b | Everything else |
@@ -99,14 +101,19 @@ cd ~/stable-diffusion-webui
mkdir -p models/Stable-diffusion
wget -O models/Stable-diffusion/sd_xl_base_1.0.safetensors \
"https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors"
# Download Juggernaut XL v9 for uncensored image generation (~6.6GB)
wget -O models/Stable-diffusion/juggernautXL_v9.safetensors \
"https://huggingface.co/RunDiffusion/Juggernaut-XL-v9/resolve/main/Juggernaut-XL_v9_RunDiffusionPhoto_v2.safetensors"
```
#### Fix Python 3.10 build issues (Ubuntu 22.04)
Before the first launch, pre-install CLIP dependencies to avoid build failures:
The first launch will create a Python venv and install dependencies. CLIP will fail to build due to a `pkg_resources` issue on Python 3.10. Fix it:
```bash
cd ~/stable-diffusion-webui
# First launch creates the venv — run it once, let it fail, then fix:
./webui.sh --api --listen --xformers --no-half-vae || true
@@ -119,7 +126,7 @@ venv/bin/pip install --no-build-isolation \
./webui.sh --api --listen --xformers --no-half-vae
```
#### Select SDXL model
#### Select the default SDXL model
Once the UI is running, open it in a browser and select `sd_xl_base_1.0` from the checkpoint dropdown. Or via API:
@@ -129,14 +136,18 @@ curl -X POST http://localhost:7860/sdapi/v1/options \
-d '{"sd_model_checkpoint": "sd_xl_base_1.0.safetensors"}'
```
The pipeline automatically switches between models at runtime — `sd_xl_base_1.0` for normal generation, `juggernautXL_v9` when the `uncen` prefix is used.
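As a rough illustration of the runtime switch described above, the sketch below picks a checkpoint from the prompt and builds the JSON body for `POST /sdapi/v1/options`. The function names, the case-insensitive prefix match, and the assumption that the API accepts the filenames exactly as saved by the `wget` commands are illustrative only; this is not the pipeline's actual code.

```python
import json

# Checkpoint filenames as saved by the wget commands in this README
# (assumed to match the names the A1111/Forge API expects).
DEFAULT_CHECKPOINT = "sd_xl_base_1.0.safetensors"
UNCENSORED_CHECKPOINT = "juggernautXL_v9.safetensors"


def choose_checkpoint(prompt: str) -> str:
    """Return Juggernaut XL v9 if the first word is 'uncen', else SDXL Base."""
    words = prompt.strip().split(maxsplit=1)
    # Assumption: the prefix is matched case-insensitively as a whole word.
    if words and words[0].lower() == "uncen":
        return UNCENSORED_CHECKPOINT
    return DEFAULT_CHECKPOINT


def options_payload(prompt: str) -> str:
    """Build the JSON body for POST /sdapi/v1/options."""
    return json.dumps({"sd_model_checkpoint": choose_checkpoint(prompt)})


if __name__ == "__main__":
    print(options_payload("uncen portrait of a warrior in golden armor"))
    print(options_payload("generate an image of a cat in space"))
```

Matching on the whole first word rather than a raw string prefix keeps a prompt such as "uncensored history of ..." on the default checkpoint.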
#### Create a systemd service
Using the provided script:
```bash
chmod +x setup-sd-service.sh
sudo ./setup-sd-service.sh
```
Or manually:
Or manually (replace `$USER` and `$HOME` with actual values):
```bash
sudo tee /etc/systemd/system/stable-diffusion.service > /dev/null <<EOF
@@ -164,12 +175,16 @@ sudo systemctl enable --now stable-diffusion
#### Verify
```bash
# Check the service is running
sudo systemctl status stable-diffusion
# Check available models (should list both sd_xl_base and juggernautXL)
curl -s http://localhost:7860/sdapi/v1/sd-models | python3 -m json.tool
```
### 4. Network Configuration
The pipeline runs inside Open WebUI's Docker container and needs to reach:
The pipeline runs inside Open WebUI's Docker container and needs to reach services on the host:
| Service | URL from container | Notes |
|---|---|---|
@@ -182,21 +197,57 @@ To find your bridge gateway IP:
docker network inspect <your_network> --format '{{range .IPAM.Config}}{{.Gateway}}{{end}}'
```
Update `SD_URL` in the pipeline file if your gateway IP differs from `172.18.0.1`.
Verify connectivity from inside the container:
```bash
docker exec open-webui curl -s http://172.18.0.1:7860/sdapi/v1/sd-models
docker exec open-webui curl -s http://ollama:11434/api/tags | head -c 100
```
## Image Generation
### Default mode
Any prompt classified as `image_generation` (e.g. "generate an image of a cat in space") uses **SDXL Base 1.0**. The LLM refines the user's request into an optimized Stable Diffusion prompt with quality boosters, then calls the A1111 API.
### Uncensored mode
Prefix any prompt with `uncen` to bypass all classification, web search, and routing — the pipeline goes straight to image generation using **Juggernaut XL v9**:
```
uncen a beautiful sunset over the ocean
uncen portrait of a warrior in golden armor
```
The `uncen` prefix is stripped and the user's text is sent directly to Stable Diffusion with quality tags appended — **no LLM refinement** (to avoid model refusal). The pipeline switches the SD checkpoint via the API automatically.
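The prefix handling described above could look roughly like the following. This is a minimal sketch: `build_uncensored_prompt` is a hypothetical helper, and `QUALITY_TAGS` is a placeholder, since the tags the pipeline actually appends are not listed in this README.

```python
from typing import Optional

PREFIX = "uncen"
# Placeholder tags for illustration; the real pipeline's tags may differ.
QUALITY_TAGS = "high quality, highly detailed, sharp focus"


def build_uncensored_prompt(message: str) -> Optional[str]:
    """Return the SD prompt for an `uncen` message, or None if it does not apply."""
    parts = message.strip().split(maxsplit=1)
    if not parts or parts[0].lower() != PREFIX:
        return None  # not an uncensored request; normal routing applies
    subject = parts[1].strip() if len(parts) > 1 else ""
    if not subject:
        return None  # prefix only, nothing to draw
    # No LLM call here: the user's text goes through verbatim, tags appended.
    return f"{subject}, {QUALITY_TAGS}"


if __name__ == "__main__":
    print(build_uncensored_prompt("uncen a beautiful sunset over the ocean"))
```

Returning `None` for anything that is not a usable `uncen` request lets the caller fall through to the normal classification path.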
### How it works
**Default mode:**
1. LLM (gpt-oss) converts the user request into an optimized SD prompt
2. Ollama models are unloaded from VRAM
3. SD checkpoint is loaded (SDXL Base)
4. Image is generated, compressed PNG→JPEG, and streamed in 4KB chunks
5. SD checkpoint is unloaded from VRAM and page cache is dropped
**Uncensored mode:**
1. `uncen` prefix is stripped, quality tags appended directly (no LLM call)
2. Ollama models are unloaded from VRAM
3. SD checkpoint is switched to Juggernaut XL v9
4. Image is generated, compressed PNG→JPEG, and streamed in 4KB chunks
5. SD checkpoint is unloaded from VRAM and page cache is dropped
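Step 4 in both modes streams the image in 4 KB chunks. The sketch below shows only that chunking step, under the assumption that what gets streamed is the base64-encoded JPEG as text. The PNG→JPEG conversion is omitted, and `iter_chunks` is a hypothetical name.

```python
import base64
from typing import Iterator

CHUNK_SIZE = 4096  # 4 KB, matching the chunk size mentioned above


def iter_chunks(data: str, size: int = CHUNK_SIZE) -> Iterator[str]:
    """Yield consecutive slices of `data`, each at most `size` characters long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(data), size):
        yield data[start:start + size]


if __name__ == "__main__":
    fake_jpeg = bytes(10_000)  # stand-in for real JPEG bytes
    encoded = base64.b64encode(fake_jpeg).decode("ascii")
    chunks = list(iter_chunks(encoded))
    # Reassembling the chunks must give back the original string.
    print(len(chunks), "chunks, lossless:", "".join(chunks) == encoded)
```

Yielding small slices avoids pushing one multi-megabyte message through the response stream at once.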
## VRAM Management
On a single 16GB GPU, gpt-oss:120b and SDXL cannot be loaded simultaneously. The pipeline handles this automatically:
On a single 16GB GPU, large Ollama models and SDXL cannot be loaded simultaneously. The pipeline handles this automatically:
1. **Before image generation**: unloads all Ollama models from VRAM
2. **After image generation**: unloads SD checkpoint from VRAM and drops Linux page cache
1. **Before image generation**: unloads all Ollama models from VRAM via `keep_alive: 0`
2. **After image generation**: unloads SD checkpoint via `/sdapi/v1/unload-checkpoint` and drops Linux page cache
3. Ollama reloads the model on the next chat request (~10-15s warm-up)
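The two unload calls above can be sketched with the standard library alone. This assumes `keep_alive: 0` is sent to Ollama's `/api/generate` endpoint without a prompt, and that the URLs match the defaults used elsewhere in this README. The helper names are hypothetical and the pipeline's own implementation may differ.

```python
import json
import urllib.request

OLLAMA_URL = "http://ollama:11434"  # container hostname used in this README
SD_URL = "http://172.18.0.1:7860"   # default bridge gateway; adjust if yours differs


def ollama_unload_request(model: str) -> urllib.request.Request:
    """Build a request asking Ollama to evict `model` right away (keep_alive: 0)."""
    body = json.dumps({"model": model, "keep_alive": 0}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def sd_unload_request() -> urllib.request.Request:
    """Build a request asking A1111/Forge to unload the current checkpoint."""
    return urllib.request.Request(
        f"{SD_URL}/sdapi/v1/unload-checkpoint", data=b"", method="POST"
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> int:
    """Send a prepared request and return the HTTP status code."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

With both services reachable, `send(ollama_unload_request("gpt-oss:120b"))` would run before generation and `send(sd_unload_request())` after it.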
If Ollama fails to load after image generation with a memory error, clear the page cache:
If Ollama fails to load after image generation with a memory error, manually clear the page cache:
```bash
sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
@@ -206,6 +257,8 @@ sudo sh -c 'sync; echo 3 > /proc/sys/vm/drop_caches'
```
User Message
├─ "uncen" prefix? ─────────────── → Juggernaut XL v9 (direct, no search)
├─ Image uploaded? ──────────────── → llama3.2-vision:11b
@@ -214,7 +267,7 @@ User Message
│ ├─ coding ──────────────── → qwen2.5-coder:14b
│ ├─ diagram ─────────────── → qwen2.5-coder:14b (Mermaid)
│ ├─ reasoning ───────────── → gpt-oss:120b (FI/EN system prompt)
│ ├─ image_generation ────── → gpt-oss:120b (refine) → SDXL (generate)
│ ├─ image_generation ────── → gpt-oss:120b (refine) → SDXL Base
│ └─ general ─────────────── → gpt-oss:120b
├─ Heuristic Search Override
@@ -230,7 +283,7 @@ User Message
|---|---|
| `llm_router_v3.py` | Main pipeline (gpt-oss:120b) |
| `llm_router-20b.py` | Lighter pipeline variant (gpt-oss:20b) |
| `setup-sd.sh` | Stable Diffusion Forge install script |
| `setup-sd.sh` | Stable Diffusion Forge install script (Ubuntu 22.04) |
| `setup-sd-service.sh` | systemd service creation script |
## License