# CLAUDE.md - Media Downloader Project Rules

## PROJECT LOCATION: `/opt/media-downloader/`

**All project files live under `/opt/media-downloader/`.** Always use this as the base path.

> **TL;DR - READ EVERY SESSION:**
> 1. **NEVER restart `media-downloader` service** - it's BLOCKED. ASK USER FIRST.
> 2. **NEVER instantiate AuthManager()** - it resets admin password.
> 3. **ALWAYS use parameterized SQL** - never f-strings with user input.
> 4. **Use `scripts/get-api-token.sh`** for API testing, not AuthManager.

---

## CRITICAL RULES - MUST FOLLOW

### RULE 1: NEVER RESTART THE SCHEDULER WITHOUT PERMISSION

**The `media-downloader` scheduler service is BLOCKED via deny list.**

```
BLOCKED COMMANDS (will be rejected):
- systemctl restart media-downloader
- sudo systemctl restart media-downloader
```

**YOU MUST ASK THE USER** before restarting the scheduler. It runs long-running tasks (downloads, scans, face recognition, cache building). Restarting kills those tasks.

**Why this rule exists:** On 2025-12-13, I restarted the scheduler without permission WHILE IT WAS RUNNING A SCAN. This killed the running scan and lost the user's work. The deny list is a technical enforcement because my promises were not sufficient.

### RULE 2: NEVER TOUCH AUTHENTICATION

**NEVER instantiate AuthManager directly.**
**NEVER use `AuthManager()` constructor - it has side effects that can reset the admin password.**
**NEVER run any code that could modify user passwords, accounts, or authentication data.**

**Why this rule exists:** On 2025-12-13, I carelessly instantiated AuthManager to generate a test token, which triggered its constructor's `_create_default_user()` method and RESET THE ADMIN PASSWORD.
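
To see why merely constructing the object is dangerous, consider a toy illustration. `ToyAuthManager` and its in-memory `store` are hypothetical stand-ins, not the project's real `AuthManager`; only the `_create_default_user()` name comes from the incident described above.

```python
# Toy illustration only - NOT the real AuthManager.
class ToyAuthManager:
    def __init__(self, store):
        self.store = store
        # Side effect: runs on every instantiation, even "just to get a token".
        self._create_default_user()

    def _create_default_user(self):
        # Assumed behavior for illustration: overwrites whatever is stored.
        self.store["admin"] = "default-password"


store = {"admin": "users-real-password"}
ToyAuthManager(store)  # no method called, yet the stored password is replaced
```

This is why the helper scripts below are the only supported way to obtain a test token.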
### RULE 3: USE PARAMETERIZED SQL QUERIES

**NEVER use f-strings or string concatenation with user input in SQL queries.**
**ALWAYS use parameterized queries with `?` placeholders.**

```python
# WRONG - SQL Injection risk
cursor.execute(f"SELECT * FROM users WHERE name = '{username}'")

# CORRECT - Parameterized query
cursor.execute("SELECT * FROM users WHERE name = ?", (username,))
```

### RULE 4: USE DATABASE CONTEXT MANAGERS

**ALWAYS use the context manager pattern for database connections.**
**Use `for_write=True` when performing write operations.**

```python
# CORRECT pattern
with db.get_connection(for_write=True) as conn:
    cursor = conn.cursor()
    cursor.execute(...)
    # commit is automatic via context manager
```

### For API Testing Tokens

Use the helper scripts:

```bash
# Get a fresh token and save to /tmp/api_token.txt
/opt/media-downloader/scripts/get-api-token.sh

# Make authenticated API calls (token loaded from /tmp/api_token.txt)
/opt/media-downloader/scripts/api-call.sh "/api/video-queue?limit=2"
/opt/media-downloader/scripts/api-call.sh "/api/health"
```

**claude_test credentials:**
- Username: `claude_test`
- Password: `ClaudeTest2025Secure`
- Database: PostgreSQL `media_downloader`

---

## Services Reference

### Active Systemd Services

| Service | Port | Command | Description |
|---------|------|---------|-------------|
| `media-downloader` | - | **BLOCKED** | **SCHEDULER** - Background task runner. **REQUIRES PERMISSION TO RESTART** |
| `media-downloader-api` | 8000 | `systemctl restart media-downloader-api` | **API** - FastAPI backend. Safe to restart. |
| `media-downloader-frontend` | 3000 | `systemctl restart media-downloader-frontend` | **FRONTEND** - Vite dev server. Usually auto-reloads. |
| `xvfb-media-downloader` | - | `systemctl restart xvfb-media-downloader` | **VIRTUAL DISPLAY** - Xvfb for headless browser. |
| `nginx` | 80/443 | `systemctl restart nginx` | **REVERSE PROXY** - Routes requests to backend/frontend. |
| `redis-server` | 6379 | `systemctl restart redis-server` | **CACHE** - Session storage and caching. |

### Active Docker Containers

| Container | Port | Description |
|-----------|------|-------------|
| `flaresolverr` | 8191 | **CLOUDFLARE BYPASS** - Solves Cloudflare challenges |
| `immich_server` | 2283 | **PHOTO MANAGEMENT** - Immich photo server |
| `immich_machine_learning` | - | **ML** - Immich machine learning backend |
| `immich_redis` | 6379 (internal) | **CACHE** - Redis for Immich |
| `immich_postgres` | 5432 (internal) | **DATABASE** - PostgreSQL for Immich |
| `immich_power_tools` | 8001 | **TOOLS** - Immich power tools |

### Universal Proxy (Docker)

The frontend is served through the `unified-proxy` Docker container, which caches assets.

**After frontend changes, ALWAYS clear cache and reload:**

```bash
# Clear nginx cache and reload
docker exec unified-proxy sh -c "rm -rf /var/cache/* 2>/dev/null; nginx -s reload"
```

**Or restart the container entirely:**

```bash
docker restart unified-proxy
```

### When to Restart Which Service

- **Python backend changes** (`/web/backend/*.py`, `/modules/*.py`): Restart `media-downloader-api`
- **Scheduler/module changes** (`scheduler.py`, download modules): Restart `media-downloader` **(ASK FIRST!)**
- **Frontend changes** (`/web/frontend/src/*`):
  1. Build: `cd /opt/media-downloader/web/frontend && npm run build`
  2. Clear proxy cache: `docker exec unified-proxy sh -c "rm -rf /var/cache/* 2>/dev/null; nginx -s reload"`

### Disabled Services (Run Manually)

| Service | Description |
|---------|-------------|
| `media-cache-builder` | Builds media cache/indexes |
| `media-celebrity-enrichment` | Enriches celebrity data |
| `media-embedding-generator` | Generates content embeddings |
| `media-downloader-db-cleanup` | Database cleanup tasks |

---

## Project Structure

```
/opt/media-downloader/
├── media-downloader.py  # Main scheduler script (~4000 lines)
├── modules/  # Python modules (40+ files)
│   ├── unified_database.py  # Database layer (5194 lines)
│   ├── scheduler.py  # Task scheduler (1977 lines)
│   ├── settings_manager.py  # Configuration (8401 lines)
│   ├── pg_adapter.py  # SQLite → PostgreSQL transparency layer
│   ├── db_bootstrap.py  # Database backend initialization
│   ├── cloudflare_handler.py  # Cloudflare bypass (25804 lines)
│   ├── snapchat_client_module.py  # Snapchat via direct HTTP (no Playwright)
│   ├── discovery_system.py  # Content discovery (37895 lines)
│   ├── semantic_search.py  # CLIP embeddings (25713 lines)
│   └── ...
├── web/
│   ├── backend/  # FastAPI backend (port 8000)
│   │   ├── api.py  # Main FastAPI app
│   │   ├── routers/  # API endpoints (21 routers)
│   │   │   ├── auth.py  # Authentication
│   │   │   ├── downloads.py  # Download management
│   │   │   ├── media.py  # Media operations
│   │   │   ├── video.py  # Video streaming
│   │   │   ├── face.py  # Face recognition
│   │   │   ├── celebrity.py  # Celebrity discovery
│   │   │   ├── appearances.py  # TV/Podcast appearances
│   │   │   └── ...
│   │   ├── core/  # Core utilities
│   │   │   ├── dependencies.py
│   │   │   ├── exceptions.py
│   │   │   └── responses.py
│   │   ├── auth_manager.py  # DO NOT INSTANTIATE
│   │   └── models/  # API models
│   └── frontend/  # React/TypeScript frontend
│       └── src/
│           ├── lib/api.ts  # API client (2536 lines)
│           ├── pages/  # React pages (27 pages)
│           └── components/  # UI components
├── wrappers/  # Subprocess wrappers (8 files)
│   ├── base_subprocess_wrapper.py
│   ├── imginn_subprocess_wrapper.py
│   ├── fastdl_subprocess_wrapper.py
│   ├── instagram_client_subprocess_wrapper.py
│   ├── toolzu_subprocess_wrapper.py
│   ├── snapchat_subprocess_wrapper.py
│   ├── snapchat_client_subprocess_wrapper.py
│   └── forum_subprocess_wrapper.py
├── scripts/  # Helper scripts
│   ├── get-api-token.sh  # Get API token safely
│   └── api-call.sh  # Make API calls
├── database/  # Legacy SQLite databases (PostgreSQL is primary)
│   ├── media_downloader.db  # Legacy - migrated to PostgreSQL
│   └── auth.db  # Legacy - migrated to PostgreSQL
└── cookies/  # Session cookies
```

---

## Build & Test Commands

```bash
# Frontend build
cd /opt/media-downloader/web/frontend && npm run build

# Python syntax check
python3 -m py_compile /opt/media-downloader/media-downloader.py

# Check all Python files
for f in /opt/media-downloader/modules/*.py; do python3 -m py_compile "$f"; done
for f in /opt/media-downloader/web/backend/routers/*.py; do python3 -m py_compile "$f"; done

# TypeScript check
cd /opt/media-downloader/web/frontend && npx tsc --noEmit

# View logs
journalctl -u media-downloader-api -f
journalctl -u media-downloader -f
```

---

## Database

- **Backend:** PostgreSQL (database: `media_downloader`, user: `media_downloader`)
- **Connection:** `postgresql://media_downloader:PNsihOXvvuPwWiIvGlsc9Fh2YmMmB@localhost/media_downloader`
- **Environment:** Controlled by `DATABASE_BACKEND=postgresql` and `DATABASE_URL` in `/opt/media-downloader/.env`
- **Legacy SQLite:** `database/media_downloader.db` and `database/auth.db` (no longer written to)

### pg_adapter (SQLite → PostgreSQL Transparency Layer)

The entire codebase was originally written for SQLite. Rather than rewriting every database call, `modules/pg_adapter.py` monkey-patches Python's `sqlite3` module so all existing SQLite code transparently uses PostgreSQL.

**How it works:**
1. `modules/db_bootstrap.py` loads `.env` and checks `DATABASE_BACKEND`
2. If `postgresql`, it replaces `sys.modules['sqlite3']` with `pg_adapter`
3. All `sqlite3.connect()` calls are intercepted and routed to PostgreSQL via `psycopg2`
4. SQL syntax is auto-translated: `?` → `%s` placeholders, `AUTOINCREMENT` → `SERIAL`, etc.
5. Uses `psycopg2.pool.ThreadedConnectionPool` for connection pooling

**Key implications for development:**
- Write SQL using SQLite syntax (`?` placeholders, `INTEGER PRIMARY KEY AUTOINCREMENT`); pg_adapter translates automatically
- All `sqlite3` imports work normally because they are intercepted by the adapter
- `db_path` parameters in constructors are ignored (all connections go to PostgreSQL)
- `for_write=True` in `get_connection()` is important for PostgreSQL transaction handling
- Direct `psql` access: `psql -U media_downloader -d media_downloader`

**Important:** Do NOT use PostgreSQL-specific SQL syntax in the codebase. Always use SQLite-compatible syntax and let pg_adapter handle translation. This maintains backward compatibility.

**When running direct `psql` commands:** Use standard PostgreSQL syntax (`INSERT ... ON CONFLICT DO NOTHING`, `SERIAL`, `RETURNING`, etc.) and write values as literals; `%s` placeholders are a psycopg2 feature and do not work in psql. The pg_adapter translation ONLY applies to Python code, so raw psql commands must use native PostgreSQL syntax.
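
The syntax translation described in step 4 can be pictured with a small sketch. This is not the code in `modules/pg_adapter.py`: `translate_sql` is a hypothetical helper written for illustration, and it assumes SQL that only uses single-quoted string literals.

```python
import re


def translate_sql(sql: str) -> str:
    """Hypothetical sketch: rewrite SQLite-style SQL for psycopg2.

    Not the real pg_adapter logic; shown only to illustrate the idea.
    """
    out = []
    in_string = False
    for ch in sql:
        if ch == "'":
            # An escaped quote ('') toggles twice, so the state stays correct.
            in_string = not in_string
            out.append(ch)
        elif ch == "%":
            # psycopg2 treats % as a format character, so literal % is doubled.
            out.append("%%")
        elif ch == "?" and not in_string:
            # Placeholder outside a string literal: SQLite ? becomes %s.
            out.append("%s")
        else:
            out.append(ch)
    translated = "".join(out)
    # SQLite auto-increment column becomes a PostgreSQL SERIAL column.
    return re.sub(
        r"INTEGER\s+PRIMARY\s+KEY\s+AUTOINCREMENT",
        "SERIAL PRIMARY KEY",
        translated,
        flags=re.IGNORECASE,
    )


print(translate_sql("SELECT * FROM users WHERE name = ?"))
print(translate_sql("CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT)"))
```

The point for day-to-day work is that application code keeps writing `?` placeholders and never needs to call anything like this directly.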
### Settings Storage

Configuration is stored in the PostgreSQL `settings` table, managed by `modules/settings_manager.py`:
- `settings_manager.get_all()` returns nested dict of all settings
- `settings_manager.set(key, value, category)` stores with type information
- Supports dot notation for nested keys
- Settings are cached and synced via the API at `/api/config`

### General Rules

- Use `unified_database.py` for all database operations
- Always use parameterized queries (never f-strings with user input)
- Use `for_write=True` when writing to database

---

## Known Issues & Technical Debt

### Remaining HIGH Priority Issues

| Issue | Location | Description |
|-------|----------|-------------|
| Missing retry logic | `forum_db_adapter.py` | Read queries fail on database lock |

### Remaining MEDIUM Priority Issues

| Issue | Location | Description |
|-------|----------|-------------|
| 46x `as any` assertions | Frontend | Reduces TypeScript safety |
| WebSocket token in URL | `api.ts:2418-2423` | Security concern |

### Fixed Issues (2025-01-04)

| Issue | Fix Applied |
|-------|-------------|
| Duplicate auth dependencies | Consolidated to `core/dependencies.py`, removed from `api.py` |
| Direct sqlite3 usage for main DB | Changed to `app_state.db.get_connection()` in `media.py` |
| Forum wrapper missing signal handlers | Added `setup_signal_handlers()` and `set_database_reference()` |
| Missing admin check on batch_move | Changed to `require_admin` dependency |
| Duplicate SQL filter constants | Extracted `MEDIA_FILTERS` to `core/utils.py` |
| Logger.log() calls | Changed to `logger.debug()` in `media.py` |

### Code Patterns to Follow

**Database access:**

```python
# Use context manager with for_write flag
with app_state.db.get_connection(for_write=True) as conn:
    cursor = conn.cursor()
    cursor.execute("INSERT INTO ...", (params,))
```

**Error handling:**

```python
# Use the handle_exceptions decorator
from core.exceptions import handle_exceptions

@router.get("/endpoint")
@handle_exceptions("OperationName")
async def endpoint():
    ...
```

**Logging:**

```python
# Use logger methods, not logger.log()
logger.debug("Message", module="ModuleName")
logger.error("Error message", module="ModuleName")
```

---

## API Endpoints Summary

The backend has **21 routers** providing **150+ endpoints**:

| Router | Prefix | Key Endpoints |
|--------|--------|---------------|
| `auth` | `/api/auth` | Login, logout, 2FA, preferences |
| `downloads` | `/api/downloads` | List, search, analytics, filters |
| `media` | `/api/media` | Gallery, batch ops, thumbnails |
| `recycle` | `/api/recycle` | Recycle bin management |
| `review` | `/api/review` | Review queue for new content |
| `face` | `/api/face` | Face recognition, references |
| `video` | `/api/video` | Video streaming, info |
| `video_queue` | `/api/video-queue` | Download queue management |
| `scheduler` | `/api/scheduler` | Task scheduling, status |
| `platforms` | `/api/platforms` | Platform configs, triggers |
| `config` | `/api/config` | Application settings |
| `celebrity` | `/api/celebrity` | Celebrity discovery |
| `appearances` | `/api/appearances` | TV/podcast appearances |
| `semantic` | `/api/semantic` | Semantic search |
| `discovery` | `/api` | Smart folders, timeline |
| `stats` | `/api` | Dashboard stats, errors |
| `health` | `/api/health` | System health checks |
| `maintenance` | `/api/maintenance` | Cleanup operations |
| `scrapers` | `/api/scrapers` | Scraper configurations |
| `manual_import` | `/api/manual-import` | Manual file imports |
| `files` | `/files` | File serving |

---

## Remember

These rules exist because I repeatedly violated the user's trust. The deny list is a technical enforcement because my promises were not sufficient. Always ask before restarting the scheduler, and never touch authentication code.