# CLAUDE.md - Media Downloader Project Rules

## PROJECT LOCATION: `/opt/media-downloader/`
**All project files live under `/opt/media-downloader/`.** Always use this as the base path.

> **TL;DR - READ EVERY SESSION:**
> 1. **NEVER restart `media-downloader` service** - it's BLOCKED. ASK USER FIRST.
> 2. **NEVER instantiate AuthManager()** - its constructor can reset the admin password.
> 3. **ALWAYS use parameterized SQL** - never f-strings with user input.
> 4. **Use `scripts/get-api-token.sh`** for API testing, not AuthManager.

---

## CRITICAL RULES - MUST FOLLOW

### RULE 1: NEVER RESTART THE SCHEDULER WITHOUT PERMISSION

**The `media-downloader` scheduler service is BLOCKED via deny list.**

```
BLOCKED COMMANDS (will be rejected):
- systemctl restart media-downloader
- sudo systemctl restart media-downloader
```

**YOU MUST ASK THE USER** before restarting the scheduler. It runs long-running tasks (downloads, scans, face recognition, cache building). Restarting kills those tasks.

**Why this rule exists:** On 2025-12-13, I restarted the scheduler without permission WHILE IT WAS RUNNING A SCAN. This killed the running scan and lost the user's work. The deny list is a technical enforcement because my promises were not sufficient.

### RULE 2: NEVER TOUCH AUTHENTICATION

**NEVER instantiate AuthManager directly.**
**NEVER use `AuthManager()` constructor - it has side effects that can reset the admin password.**
**NEVER run any code that could modify user passwords, accounts, or authentication data.**

**Why this rule exists:** On 2025-12-13, I carelessly instantiated AuthManager to generate a test token, which triggered its constructor's `_create_default_user()` method and RESET THE ADMIN PASSWORD.

### RULE 3: USE PARAMETERIZED SQL QUERIES

**NEVER use f-strings or string concatenation with user input in SQL queries.**
**ALWAYS use parameterized queries with `?` placeholders.**

```python
# WRONG - SQL Injection risk
cursor.execute(f"SELECT * FROM users WHERE name = '{username}'")

# CORRECT - Parameterized query
cursor.execute("SELECT * FROM users WHERE name = ?", (username,))
```
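
The following standalone sketch makes the difference concrete. It uses the stdlib `sqlite3` module against an in-memory database, not the project's PostgreSQL-backed connection, and the `users` table and function names are illustrative. It also shows the safe way to build a variable-length `IN (...)` list. The f-string there only ever contains generated `?` markers, and the values still travel as parameters.

```python
import sqlite3


def find_users_unsafe(conn: sqlite3.Connection, username: str) -> list:
    # WRONG: user input is spliced into the SQL text itself
    return conn.execute(f"SELECT name FROM users WHERE name = '{username}'").fetchall()


def find_users_safe(conn: sqlite3.Connection, username: str) -> list:
    # CORRECT: the driver sends username as data, never as SQL
    return conn.execute("SELECT name FROM users WHERE name = ?", (username,)).fetchall()


def find_users_in(conn: sqlite3.Connection, names: list) -> list:
    # Variable-length IN list: generate one "?" per value, never interpolate the values
    if not names:
        return []
    placeholders = ", ".join("?" for _ in names)
    return conn.execute(
        f"SELECT name FROM users WHERE name IN ({placeholders}) ORDER BY name", names
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
payload = "x' OR '1'='1"
print(find_users_unsafe(conn, payload))  # both rows: the injected OR clause is always true
print(find_users_safe(conn, payload))    # [] because no user is literally named that
```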

### RULE 4: USE DATABASE CONTEXT MANAGERS

**ALWAYS use the context manager pattern for database connections.**
**Use `for_write=True` when performing write operations.**

```python
# CORRECT pattern
with db.get_connection(for_write=True) as conn:
    cursor = conn.cursor()
    cursor.execute(...)
    # commit is automatic via context manager
```
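
The internals of `get_connection()` are not documented here. The following minimal sketch shows what the pattern could look like. It assumes the method commits on a clean exit when `for_write=True` and rolls back when the block raises. `SketchDatabase` is a hypothetical name, and the sketch uses plain stdlib `sqlite3` on a file rather than the pg_adapter-backed database.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


class SketchDatabase:
    """Hypothetical stand-in; not the real unified_database API."""

    def __init__(self, path: str) -> None:
        self.path = path

    @contextmanager
    def get_connection(self, for_write: bool = False) -> Iterator[sqlite3.Connection]:
        conn = sqlite3.connect(self.path)
        try:
            yield conn
            if for_write:
                conn.commit()  # clean exit from a write block: persist the changes
        except Exception:
            conn.rollback()    # error inside the block: undo any partial writes
            raise
        finally:
            conn.close()       # closing without commit discards uncommitted work
```

With `for_write=False`, nothing is committed, so an accidental write inside a read block is discarded when the connection closes.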

### For API Testing Tokens

Use the helper scripts:
```bash
# Get a fresh token and save to /tmp/api_token.txt
/opt/media-downloader/scripts/get-api-token.sh

# Make authenticated API calls (token loaded from /tmp/api_token.txt)
/opt/media-downloader/scripts/api-call.sh "/api/video-queue?limit=2"
/opt/media-downloader/scripts/api-call.sh "/api/health"
```

**claude_test credentials:**
- Username: `claude_test`
- Password: `ClaudeTest2025Secure`
- Database: PostgreSQL `media_downloader`

---

## Services Reference

### Active Systemd Services

| Service | Port | Command | Description |
|---------|------|---------|-------------|
| `media-downloader` | - | **BLOCKED** | **SCHEDULER** - Background task runner. **REQUIRES PERMISSION TO RESTART** |
| `media-downloader-api` | 8000 | `systemctl restart media-downloader-api` | **API** - FastAPI backend. Safe to restart. |
| `media-downloader-frontend` | 3000 | `systemctl restart media-downloader-frontend` | **FRONTEND** - Vite dev server. Usually auto-reloads. |
| `xvfb-media-downloader` | - | `systemctl restart xvfb-media-downloader` | **VIRTUAL DISPLAY** - Xvfb for headless browser. |
| `nginx` | 80/443 | `systemctl restart nginx` | **REVERSE PROXY** - Routes requests to backend/frontend. |
| `redis-server` | 6379 | `systemctl restart redis-server` | **CACHE** - Session storage and caching. |

### Active Docker Containers

| Container | Port | Description |
|-----------|------|-------------|
| `flaresolverr` | 8191 | **CLOUDFLARE BYPASS** - Solves Cloudflare challenges |
| `immich_server` | 2283 | **PHOTO MANAGEMENT** - Immich photo server |
| `immich_machine_learning` | - | **ML** - Immich machine learning backend |
| `immich_redis` | 6379 (internal) | **CACHE** - Redis for Immich |
| `immich_postgres` | 5432 (internal) | **DATABASE** - PostgreSQL for Immich |
| `immich_power_tools` | 8001 | **TOOLS** - Immich power tools |

### Unified Proxy (Docker)

The frontend is served through the `unified-proxy` Docker container, which caches assets.

**After frontend changes, ALWAYS clear cache and reload:**
```bash
# Clear nginx cache and reload
docker exec unified-proxy sh -c "rm -rf /var/cache/* 2>/dev/null; nginx -s reload"
```

**Or restart the container entirely:**
```bash
docker restart unified-proxy
```

### When to Restart Which Service

- **Python backend changes** (`web/backend/*.py`, `modules/*.py`): Restart `media-downloader-api`
- **Scheduler/module changes** (`modules/scheduler.py`, download modules): Restart `media-downloader` **(ASK FIRST!)**
- **Frontend changes** (`web/frontend/src/*`):
  1. Build: `cd /opt/media-downloader/web/frontend && npm run build`
  2. Clear proxy cache: `docker exec unified-proxy sh -c "rm -rf /var/cache/* 2>/dev/null; nginx -s reload"`

### Disabled Services (Run Manually)

| Service | Description |
|---------|-------------|
| `media-cache-builder` | Builds media cache/indexes |
| `media-celebrity-enrichment` | Enriches celebrity data |
| `media-embedding-generator` | Generates content embeddings |
| `media-downloader-db-cleanup` | Database cleanup tasks |

---

## Project Structure

```
/opt/media-downloader/
├── media-downloader.py      # Main scheduler script (~4000 lines)
├── modules/                 # Python modules (40+ files)
│   ├── unified_database.py  # Database layer (5194 lines)
│   ├── scheduler.py         # Task scheduler (1977 lines)
│   ├── settings_manager.py  # Configuration (8401 lines)
│   ├── pg_adapter.py        # SQLite → PostgreSQL transparency layer
│   ├── db_bootstrap.py      # Database backend initialization
│   ├── cloudflare_handler.py # Cloudflare bypass (25804 lines)
│   ├── snapchat_client_module.py # Snapchat via direct HTTP (no Playwright)
│   ├── discovery_system.py  # Content discovery (37895 lines)
│   ├── semantic_search.py   # CLIP embeddings (25713 lines)
│   └── ...
├── web/
│   ├── backend/             # FastAPI backend (port 8000)
│   │   ├── api.py           # Main FastAPI app
│   │   ├── routers/         # API endpoints (21 routers)
│   │   │   ├── auth.py      # Authentication
│   │   │   ├── downloads.py # Download management
│   │   │   ├── media.py     # Media operations
│   │   │   ├── video.py     # Video streaming
│   │   │   ├── face.py      # Face recognition
│   │   │   ├── celebrity.py # Celebrity discovery
│   │   │   ├── appearances.py # TV/Podcast appearances
│   │   │   └── ...
│   │   ├── core/            # Core utilities
│   │   │   ├── dependencies.py
│   │   │   ├── exceptions.py
│   │   │   └── responses.py
│   │   ├── auth_manager.py  # DO NOT INSTANTIATE
│   │   └── models/          # API models
│   └── frontend/            # React/TypeScript frontend
│       └── src/
│           ├── lib/api.ts   # API client (2536 lines)
│           ├── pages/       # React pages (27 pages)
│           └── components/  # UI components
├── wrappers/                # Subprocess wrappers (8 files)
│   ├── base_subprocess_wrapper.py
│   ├── imginn_subprocess_wrapper.py
│   ├── fastdl_subprocess_wrapper.py
│   ├── instagram_client_subprocess_wrapper.py
│   ├── toolzu_subprocess_wrapper.py
│   ├── snapchat_subprocess_wrapper.py
│   ├── snapchat_client_subprocess_wrapper.py
│   └── forum_subprocess_wrapper.py
├── scripts/                 # Helper scripts
│   ├── get-api-token.sh     # Get API token safely
│   └── api-call.sh          # Make API calls
├── database/                # Legacy SQLite databases (PostgreSQL is primary)
│   ├── media_downloader.db  # Legacy — migrated to PostgreSQL
│   └── auth.db              # Legacy — migrated to PostgreSQL
└── cookies/                 # Session cookies
```

---

## Build & Test Commands

```bash
# Frontend build
cd /opt/media-downloader/web/frontend && npm run build

# Python syntax check
python3 -m py_compile /opt/media-downloader/media-downloader.py

# Check all Python files
for f in /opt/media-downloader/modules/*.py; do python3 -m py_compile "$f"; done
for f in /opt/media-downloader/web/backend/routers/*.py; do python3 -m py_compile "$f"; done

# TypeScript check
cd /opt/media-downloader/web/frontend && npx tsc --noEmit

# View logs
journalctl -u media-downloader-api -f
journalctl -u media-downloader -f
```
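
Each shell loop above keeps going after a failure. However, a `for` loop's exit status reflects only the last file it checked, so an earlier syntax error can go unnoticed in scripts. The following sketch collects every failure and returns a nonzero status instead. The script name `check_syntax.py` is hypothetical. The sketch calls the built-in `compile()`, so unlike `py_compile` it writes no `.pyc` files. It checks syntax only, not imports.

```python
import sys
from pathlib import Path
from typing import Iterator

# Same scope as the shell commands above
DEFAULT_TARGETS = [
    "/opt/media-downloader/media-downloader.py",
    "/opt/media-downloader/modules",
    "/opt/media-downloader/web/backend/routers",
]


def iter_python_files(target: Path) -> Iterator[Path]:
    if target.is_dir():
        yield from sorted(target.glob("*.py"))  # non-recursive, like the shell globs
    else:
        yield target


def check_syntax(targets: list) -> list:
    failures = []
    for target in map(Path, targets):
        if not target.exists():
            failures.append(f"{target}: not found")
            continue
        for path in iter_python_files(target):
            try:
                compile(path.read_text(encoding="utf-8"), str(path), "exec")
            except (SyntaxError, ValueError) as exc:  # ValueError covers decode errors
                failures.append(f"{path}: {exc}")
    return failures


def main(argv: list) -> int:
    failures = check_syntax(argv or DEFAULT_TARGETS)
    for failure in failures:
        print(failure, file=sys.stderr)
    return 1 if failures else 0
```

Saved as a script, it could end with `raise SystemExit(main(sys.argv[1:]))`. With no arguments, it checks the same paths as the shell loops.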

---

## Database

- **Backend:** PostgreSQL (database: `media_downloader`, user: `media_downloader`)
- **Connection:** `postgresql://media_downloader:PNsihOXvvuPwWiIvGlsc9Fh2YmMmB@localhost/media_downloader`
- **Environment:** Controlled by `DATABASE_BACKEND=postgresql` and `DATABASE_URL` in `/opt/media-downloader/.env`
- **Legacy SQLite:** `database/media_downloader.db` and `database/auth.db` (no longer written to)

### pg_adapter (SQLite → PostgreSQL Transparency Layer)

The entire codebase was originally written for SQLite. Rather than rewriting every database call, `modules/pg_adapter.py` monkey-patches Python's `sqlite3` module so all existing SQLite code transparently uses PostgreSQL.

**How it works:**
1. `modules/db_bootstrap.py` loads `.env` and checks `DATABASE_BACKEND`
2. If `postgresql`, it replaces `sys.modules['sqlite3']` with `pg_adapter`
3. All `sqlite3.connect()` calls are intercepted and routed to PostgreSQL via `psycopg2`
4. SQL syntax is auto-translated: `?` → `%s` placeholders, `AUTOINCREMENT` → `SERIAL`, etc.
5. Uses `psycopg2.pool.ThreadedConnectionPool` for connection pooling
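
Step 2 relies on how Python's import system works: `import sqlite3` returns whatever object is already registered in `sys.modules['sqlite3']`. The following sketch shows the mechanism. The function name is hypothetical, and this is not pg_adapter's actual code. The swap only affects imports that run after it, so a module that executed `import sqlite3` or `from sqlite3 import connect` earlier keeps the real module. This is presumably why the bootstrap has to happen before the rest of the codebase is imported.

```python
import sys
import types
from typing import Any, Callable


def install_sqlite3_shim(connect_impl: Callable[..., Any]) -> types.ModuleType:
    """Hypothetical sketch of the db_bootstrap swap; not pg_adapter itself."""
    shim = types.ModuleType("sqlite3")
    shim.connect = connect_impl      # a real adapter must also provide Row, Error, etc.
    sys.modules["sqlite3"] = shim    # later `import sqlite3` statements resolve here
    return shim
```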

**Key implications for development:**
- Write SQL using SQLite syntax (`?` placeholders, `INTEGER PRIMARY KEY AUTOINCREMENT`) — pg_adapter translates automatically
- All `sqlite3` imports work normally — they're intercepted by the adapter
- `db_path` parameters in constructors are ignored (all connections go to PostgreSQL)
- `for_write=True` in `get_connection()` is important for PostgreSQL transaction handling
- Direct `psql` access: `psql -U media_downloader -d media_downloader`
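
Step 4's `?` → `%s` rewrite has two subtleties. First, a `?` inside a quoted string literal must be left alone. Second, psycopg2 treats `%` as a format marker whenever parameters are passed, so any literal `%` must be doubled to `%%`, for example in a `LIKE '%x%'` pattern. The following simplified sketch of such a translator assumes single-quoted literals only and ignores comments and double-quoted identifiers. `translate_placeholders` is a hypothetical name, not pg_adapter's API.

```python
def translate_placeholders(sql: str, has_params: bool = True) -> str:
    """Hypothetical sketch: rewrite SQLite '?' placeholders as psycopg2 '%s'."""
    out: list = []
    in_literal = False
    for ch in sql:
        if ch == "'":
            # A doubled '' escape toggles twice, so the state stays correct
            in_literal = not in_literal
            out.append(ch)
        elif ch == "?" and not in_literal:
            out.append("%s")
        elif ch == "%" and has_params:
            # psycopg2 scans the whole string, literals included, for % markers
            out.append("%%")
        else:
            out.append(ch)
    return "".join(out)
```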

**Important:** Do NOT use PostgreSQL-specific SQL syntax in the codebase. Always use SQLite-compatible syntax and let pg_adapter handle translation. This maintains backward compatibility.

**When running direct `psql` commands:** Use native PostgreSQL syntax (`INSERT ... ON CONFLICT DO NOTHING`, `SERIAL`, `RETURNING`, etc.). The pg_adapter translation ONLY applies to Python code. Note that `%s` is psycopg2's Python-side placeholder, not psql syntax; in psql, write literal values directly.

### Settings Storage

Configuration is stored in the PostgreSQL `settings` table, managed by `modules/settings_manager.py`:
- `settings_manager.get_all()` returns nested dict of all settings
- `settings_manager.set(key, value, category)` stores with type information
- Supports dot notation for nested keys
- Settings are cached and synced via the API at `/api/config`
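
The bullets above don't show how dotted keys map onto the nested dict returned by `get_all()`. The following minimal sketch assumes that a flat key such as `instagram.enabled` becomes `{"instagram": {"enabled": ...}}`. The function names and example keys are hypothetical.

```python
from typing import Any


def nest_dotted(flat: dict) -> dict:
    """Hypothetical: expand {"a.b": 1} into {"a": {"b": 1}}."""
    nested: dict = {}
    for key, value in flat.items():
        *parents, leaf = key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})  # create intermediate dicts as needed
        node[leaf] = value
    return nested


def get_dotted(tree: dict, key: str, default: Any = None) -> Any:
    """Hypothetical: look up "a.b" in a nested dict, returning default if absent."""
    node: Any = tree
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node
```

This sketch assumes no key is both a leaf and a parent. For example, defining both `instagram` and `instagram.enabled` would fail here. The real settings_manager may handle that case differently.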

### General Rules
- Use `unified_database.py` for all database operations
- Always use parameterized queries (never f-strings with user input)
- Use `for_write=True` when writing to database

---

## Known Issues & Technical Debt

### Remaining HIGH Priority Issues

| Issue | Location | Description |
|-------|----------|-------------|
| Missing retry logic | `forum_db_adapter.py` | Read queries fail on database lock |
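
Until `forum_db_adapter.py` gains retry logic, a read could be wrapped in a bounded retry with exponential backoff. The following sketch makes two assumptions. `retry_read` is a hypothetical helper. The exception types to retry are passed as a parameter, because the error raised on a lock may differ between stdlib `sqlite3` (`sqlite3.OperationalError: database is locked`) and the pg_adapter path. Only idempotent reads should be retried this way, since blindly retrying a write risks applying it twice.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_read(
    operation: Callable[[], T],
    *,
    retry_on: tuple,
    attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
) -> T:
    """Hypothetical helper: run a read, retrying transient lock errors with backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: surface the original error
            # Exponential backoff (0.1s, 0.2s, 0.4s, ... before jitter), capped at max_delay
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError("unreachable")
```

A call site might look like `retry_read(lambda: fetch_threads(conn), retry_on=(sqlite3.OperationalError,))`, where `fetch_threads` stands in for an existing read method.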

### Remaining MEDIUM Priority Issues

| Issue | Location | Description |
|-------|----------|-------------|
| 46x `as any` assertions | Frontend | Reduces TypeScript safety |
| WebSocket token in URL | `api.ts:2418-2423` | Auth token is embedded in the WebSocket URL, where it can be recorded in proxy and server access logs |

### Fixed Issues (2025-01-04)

| Issue | Fix Applied |
|-------|-------------|
| Duplicate auth dependencies | Consolidated to `core/dependencies.py`, removed from `api.py` |
| Direct sqlite3 usage for main DB | Changed to `app_state.db.get_connection()` in `media.py` |
| Forum wrapper missing signal handlers | Added `setup_signal_handlers()` and `set_database_reference()` |
| Missing admin check on batch_move | Changed to `require_admin` dependency |
| Duplicate SQL filter constants | Extracted `MEDIA_FILTERS` to `core/utils.py` |
| `logger.log()` calls | Changed to `logger.debug()` in `media.py` |

### Code Patterns to Follow

**Database access:**
```python
# Use context manager with for_write flag
with app_state.db.get_connection(for_write=True) as conn:
    cursor = conn.cursor()
    cursor.execute("INSERT INTO ... VALUES (?, ?)", (value_a, value_b))  # one ? per value
```

**Error handling:**
```python
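# Use the handle_exceptions decorator
from fastapi import APIRouter

from core.exceptions import handle_exceptions

router = APIRouter()  # each router module defines its own

@router.get("/endpoint")
@handle_exceptions("OperationName")  # innermost: wraps the handler before the route registers it
async def endpoint():
    ...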
```

**Logging:**
```python
# Use logger methods, not logger.log()
logger.debug("Message", module="ModuleName")
logger.error("Error message", module="ModuleName")
```
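
The stdlib `logging.Logger.debug()` does not accept a `module=` keyword argument, so the `logger` in this pattern must be a project-specific wrapper. The following minimal sketch shows such a wrapper. It assumes `module` is only a tag attached to each log line. `ModuleLogger` and the `component` record field are hypothetical names. The tag cannot be stored under `extra={"module": ...}`, because `module` is already a built-in `LogRecord` attribute and overwriting it raises an error.

```python
import logging


class ModuleLogger:
    """Hypothetical wrapper; the project's real logger may differ."""

    def __init__(self, name: str) -> None:
        self._logger = logging.getLogger(name)

    def debug(self, message: str, module: str) -> None:
        # "module" is a reserved LogRecord attribute, so the tag is stored as "component"
        self._logger.debug(message, extra={"component": module})

    def error(self, message: str, module: str) -> None:
        self._logger.error(message, extra={"component": module})


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("%(levelname)s [%(component)s] %(message)s"))
    base = logging.getLogger("media_downloader")
    base.addHandler(handler)
    base.setLevel(logging.DEBUG)
    logger = ModuleLogger("media_downloader")
    logger.debug("Message", module="ModuleName")
```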

---

## API Endpoints Summary

The backend has **21 routers** providing **150+ endpoints**:

| Router | Prefix | Key Endpoints |
|--------|--------|---------------|
| `auth` | `/api/auth` | Login, logout, 2FA, preferences |
| `downloads` | `/api/downloads` | List, search, analytics, filters |
| `media` | `/api/media` | Gallery, batch ops, thumbnails |
| `recycle` | `/api/recycle` | Recycle bin management |
| `review` | `/api/review` | Review queue for new content |
| `face` | `/api/face` | Face recognition, references |
| `video` | `/api/video` | Video streaming, info |
| `video_queue` | `/api/video-queue` | Download queue management |
| `scheduler` | `/api/scheduler` | Task scheduling, status |
| `platforms` | `/api/platforms` | Platform configs, triggers |
| `config` | `/api/config` | Application settings |
| `celebrity` | `/api/celebrity` | Celebrity discovery |
| `appearances` | `/api/appearances` | TV/podcast appearances |
| `semantic` | `/api/semantic` | Semantic search |
| `discovery` | `/api` | Smart folders, timeline |
| `stats` | `/api` | Dashboard stats, errors |
| `health` | `/api/health` | System health checks |
| `maintenance` | `/api/maintenance` | Cleanup operations |
| `scrapers` | `/api/scrapers` | Scraper configurations |
| `manual_import` | `/api/manual-import` | Manual file imports |
| `files` | `/files` | File serving |

---

## Remember

These rules exist because I repeatedly violated the user's trust. The deny list is a technical enforcement because my promises were not sufficient. Always ask before restarting the scheduler, and never touch authentication code.