CLAUDE.md - Media Downloader Project Rules
PROJECT LOCATION: /opt/media-downloader/
All project files live under /opt/media-downloader/. Always use this as the base path.
TL;DR - READ EVERY SESSION:
- NEVER restart the `media-downloader` service - it's BLOCKED. ASK THE USER FIRST.
- NEVER instantiate `AuthManager()` - it resets the admin password.
- ALWAYS use parameterized SQL - never f-strings with user input.
- Use `scripts/get-api-token.sh` for API testing, not `AuthManager`.
CRITICAL RULES - MUST FOLLOW
RULE 1: NEVER RESTART THE SCHEDULER WITHOUT PERMISSION
The media-downloader scheduler service is BLOCKED via a deny list.
BLOCKED COMMANDS (will be rejected):
- systemctl restart media-downloader
- sudo systemctl restart media-downloader
YOU MUST ASK THE USER before restarting the scheduler. It runs long-running tasks (downloads, scans, face recognition, cache building). Restarting kills those tasks.
Why this rule exists: On 2025-12-13, I restarted the scheduler without permission WHILE IT WAS RUNNING A SCAN. This killed the running scan and lost the user's work. The deny list is a technical enforcement because my promises were not sufficient.
RULE 2: NEVER TOUCH AUTHENTICATION
NEVER instantiate AuthManager directly.
NEVER call the AuthManager() constructor - it has side effects that can reset the admin password.
NEVER run any code that could modify user passwords, accounts, or authentication data.
Why this rule exists: On 2025-12-13, I carelessly instantiated AuthManager to generate a test token, which triggered its constructor's _create_default_user() method and RESET THE ADMIN PASSWORD.
RULE 3: USE PARAMETERIZED SQL QUERIES
NEVER use f-strings or string concatenation with user input in SQL queries.
ALWAYS use parameterized queries with ? placeholders.
```python
# WRONG - SQL Injection risk
cursor.execute(f"SELECT * FROM users WHERE name = '{username}'")

# CORRECT - Parameterized query
cursor.execute("SELECT * FROM users WHERE name = ?", (username,))
```
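To see why the f-string form is dangerous, the sketch below runs both forms against a throwaway in-memory database using the stdlib `sqlite3` module directly (not the project's `unified_database.py`; the `users` table and its rows are made up for illustration). A crafted username turns the f-string query into a condition that matches every row, while the `?` placeholder binds the same input as one literal value.

```python
import sqlite3


def demo(malicious: str) -> tuple[int, int]:
    """Return (rows matched by the f-string query, rows matched by the parameterized query)."""
    conn = sqlite3.connect(":memory:")  # throwaway database for the demo
    cur = conn.cursor()
    cur.execute("CREATE TABLE users (name TEXT)")
    cur.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

    # WRONG: the input is spliced into the SQL text, so its quotes rewrite the query
    unsafe = cur.execute(f"SELECT * FROM users WHERE name = '{malicious}'").fetchall()

    # CORRECT: the driver binds the input as a single value; it is never parsed as SQL
    safe = cur.execute("SELECT * FROM users WHERE name = ?", (malicious,)).fetchall()

    conn.close()
    return len(unsafe), len(safe)


if __name__ == "__main__":
    print(demo("' OR '1'='1"))  # → (2, 0)
```

With the payload, the f-string query becomes `WHERE name = '' OR '1'='1'`, which is true for every row.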
RULE 4: USE DATABASE CONTEXT MANAGERS
ALWAYS use the context manager pattern for database connections.
Use for_write=True when performing write operations.
```python
# CORRECT pattern
with db.get_connection(for_write=True) as conn:
    cursor = conn.cursor()
    cursor.execute(...)
# commit is automatic via context manager
```
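The real `get_connection()` lives in `unified_database.py` and is not reproduced here. As a minimal stand-in, assuming the behavior the comment above describes (commit on a clean exit when `for_write=True`, roll back on an exception), the pattern can be sketched with stdlib `sqlite3` and `contextlib.contextmanager`. The class name `TinyDB` is hypothetical, and the real layer pools connections instead of sharing one.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


class TinyDB:
    """Hypothetical stand-in for the project's database object (not the real API)."""

    def __init__(self, path: str = ":memory:") -> None:
        # One shared connection keeps the demo simple; the real layer pools connections.
        self._conn = sqlite3.connect(path)

    @contextmanager
    def get_connection(self, for_write: bool = False) -> Iterator[sqlite3.Connection]:
        try:
            yield self._conn
            if for_write:
                self._conn.commit()  # persist writes only when the block exits cleanly
        except Exception:
            self._conn.rollback()  # discard partial writes, then re-raise
            raise


db = TinyDB()
with db.get_connection(for_write=True) as conn:
    cursor = conn.cursor()
    cursor.execute("CREATE TABLE items (name TEXT)")
    cursor.execute("INSERT INTO items (name) VALUES (?)", ("first",))
```

If an exception escapes the `with` block, the rollback discards that block's uncommitted writes and the exception still propagates to the caller.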
For API Testing Tokens
Use the helper scripts:
```bash
# Get a fresh token and save to /tmp/api_token.txt
/opt/media-downloader/scripts/get-api-token.sh

# Make authenticated API calls (token loaded from /tmp/api_token.txt)
/opt/media-downloader/scripts/api-call.sh "/api/video-queue?limit=2"
/opt/media-downloader/scripts/api-call.sh "/api/health"
```
claude_test credentials:
- Username: `claude_test`
- Password: `ClaudeTest2025Secure`
- Database: PostgreSQL `media_downloader`
Services Reference
Active Systemd Services
| Service | Port | Command | Description |
|---|---|---|---|
| media-downloader | - | BLOCKED | SCHEDULER - Background task runner. REQUIRES PERMISSION TO RESTART |
| media-downloader-api | 8000 | systemctl restart media-downloader-api | API - FastAPI backend. Safe to restart. |
| media-downloader-frontend | 3000 | systemctl restart media-downloader-frontend | FRONTEND - Vite dev server. Usually auto-reloads. |
| xvfb-media-downloader | - | systemctl restart xvfb-media-downloader | VIRTUAL DISPLAY - Xvfb for headless browser. |
| nginx | 80/443 | systemctl restart nginx | REVERSE PROXY - Routes requests to backend/frontend. |
| redis-server | 6379 | systemctl restart redis-server | CACHE - Session storage and caching. |
Active Docker Containers
| Container | Port | Description |
|---|---|---|
| flaresolverr | 8191 | CLOUDFLARE BYPASS - Solves Cloudflare challenges |
| immich_server | 2283 | PHOTO MANAGEMENT - Immich photo server |
| immich_machine_learning | - | ML - Immich machine learning backend |
| immich_redis | 6379 (internal) | CACHE - Redis for Immich |
| immich_postgres | 5432 (internal) | DATABASE - PostgreSQL for Immich |
| immich_power_tools | 8001 | TOOLS - Immich power tools |
Universal Proxy (Docker)
The frontend is served through the `unified-proxy` Docker container, which caches assets.
After frontend changes, ALWAYS clear the cache and reload:
```bash
# Clear nginx cache and reload
docker exec unified-proxy sh -c "rm -rf /var/cache/* 2>/dev/null; nginx -s reload"
```
Or restart the container entirely:
```bash
docker restart unified-proxy
```
When to Restart Which Service
- Python backend changes (`/web/backend/*.py`, `/modules/*.py`): Restart `media-downloader-api`
- Scheduler/module changes (`scheduler.py`, download modules): Restart `media-downloader` (ASK FIRST!)
- Frontend changes (`/web/frontend/src/*`):
  - Build: `cd /opt/media-downloader/web/frontend && npm run build`
  - Clear proxy cache: `docker exec unified-proxy sh -c "rm -rf /var/cache/* 2>/dev/null; nginx -s reload"`
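The rules above can be encoded as a small lookup. The sketch below uses a hypothetical `follow_up_steps()` helper that only returns the steps to take and never runs anything; paths are assumed to be relative to `/opt/media-downloader/`. Because the list names `modules/` under both the API and the scheduler, the sketch flags both for any `modules/*.py` change, and the scheduler step is phrased as a question for the user, per RULE 1.

```python
from pathlib import PurePosixPath

# Hypothetical helper: maps changed files to the follow-up steps listed above.
FRONTEND_STEPS = [
    "cd /opt/media-downloader/web/frontend && npm run build",
    'docker exec unified-proxy sh -c "rm -rf /var/cache/* 2>/dev/null; nginx -s reload"',
]
RESTART_API = "systemctl restart media-downloader-api"
ASK_SCHEDULER = "ASK USER before restarting media-downloader"


def follow_up_steps(changed: list[str]) -> list[str]:
    steps: list[str] = []

    def add(step: str) -> None:
        if step not in steps:  # keep first-seen order, drop duplicates
            steps.append(step)

    for raw in changed:
        path = PurePosixPath(raw)
        parts = path.parts
        if parts[:3] == ("web", "frontend", "src"):
            for step in FRONTEND_STEPS:
                add(step)
        elif path.suffix == ".py" and parts[:2] == ("web", "backend"):
            add(RESTART_API)
        elif path.suffix == ".py" and parts[:1] == ("modules",):
            add(RESTART_API)
            # modules/ also holds scheduler.py and the download modules;
            # a scheduler restart is never automatic.
            add(ASK_SCHEDULER)
        elif raw == "media-downloader.py":  # main scheduler script
            add(ASK_SCHEDULER)
    return steps


if __name__ == "__main__":
    for step in follow_up_steps(["modules/scheduler.py", "web/frontend/src/App.tsx"]):
        print(step)
```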
Disabled Services (Run Manually)
| Service | Description |
|---|---|
| media-cache-builder | Builds media cache/indexes |
| media-celebrity-enrichment | Enriches celebrity data |
| media-embedding-generator | Generates content embeddings |
| media-downloader-db-cleanup | Database cleanup tasks |
Project Structure
```text
/opt/media-downloader/
├── media-downloader.py              # Main scheduler script (~4000 lines)
├── modules/                         # Python modules (40+ files)
│   ├── unified_database.py          # Database layer (5194 lines)
│   ├── scheduler.py                 # Task scheduler (1977 lines)
│   ├── settings_manager.py          # Configuration (8401 lines)
│   ├── pg_adapter.py                # SQLite → PostgreSQL transparency layer
│   ├── db_bootstrap.py              # Database backend initialization
│   ├── cloudflare_handler.py        # Cloudflare bypass (25804 lines)
│   ├── snapchat_client_module.py    # Snapchat via direct HTTP (no Playwright)
│   ├── discovery_system.py          # Content discovery (37895 lines)
│   ├── semantic_search.py           # CLIP embeddings (25713 lines)
│   └── ...
├── web/
│   ├── backend/                     # FastAPI backend (port 8000)
│   │   ├── api.py                   # Main FastAPI app
│   │   ├── routers/                 # API endpoints (21 routers)
│   │   │   ├── auth.py              # Authentication
│   │   │   ├── downloads.py         # Download management
│   │   │   ├── media.py             # Media operations
│   │   │   ├── video.py             # Video streaming
│   │   │   ├── face.py              # Face recognition
│   │   │   ├── celebrity.py         # Celebrity discovery
│   │   │   ├── appearances.py       # TV/Podcast appearances
│   │   │   └── ...
│   │   ├── core/                    # Core utilities
│   │   │   ├── dependencies.py
│   │   │   ├── exceptions.py
│   │   │   └── responses.py
│   │   ├── auth_manager.py          # DO NOT INSTANTIATE
│   │   └── models/                  # API models
│   └── frontend/                    # React/TypeScript frontend
│       └── src/
│           ├── lib/api.ts           # API client (2536 lines)
│           ├── pages/               # React pages (27 pages)
│           └── components/          # UI components
├── wrappers/                        # Subprocess wrappers (8 files)
│   ├── base_subprocess_wrapper.py
│   ├── imginn_subprocess_wrapper.py
│   ├── fastdl_subprocess_wrapper.py
│   ├── instagram_client_subprocess_wrapper.py
│   ├── toolzu_subprocess_wrapper.py
│   ├── snapchat_subprocess_wrapper.py
│   ├── snapchat_client_subprocess_wrapper.py
│   └── forum_subprocess_wrapper.py
├── scripts/                         # Helper scripts
│   ├── get-api-token.sh             # Get API token safely
│   └── api-call.sh                  # Make API calls
├── database/                        # Legacy SQLite databases (PostgreSQL is primary)
│   ├── media_downloader.db          # Legacy, migrated to PostgreSQL
│   └── auth.db                      # Legacy, migrated to PostgreSQL
└── cookies/                         # Session cookies
```
Build & Test Commands
```bash
# Frontend build
cd /opt/media-downloader/web/frontend && npm run build

# Python syntax check
python3 -m py_compile /opt/media-downloader/media-downloader.py

# Check all Python files
for f in /opt/media-downloader/modules/*.py; do python3 -m py_compile "$f"; done
for f in /opt/media-downloader/web/backend/routers/*.py; do python3 -m py_compile "$f"; done

# TypeScript check
cd /opt/media-downloader/web/frontend && npx tsc --noEmit

# View logs
journalctl -u media-downloader-api -f
journalctl -u media-downloader -f
```
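The `py_compile` loops can also be run from Python so that every failure lands in one report. The sketch below is a hedged equivalent built on the stdlib `py_compile` module; the helper name `syntax_errors` is hypothetical. Like `python3 -m py_compile`, it byte-compiles without executing anything and writes `.pyc` files into `__pycache__`.

```python
import py_compile
from pathlib import Path


def syntax_errors(paths: list[Path]) -> dict[str, str]:
    """Byte-compile each file (without running it) and collect failures as {path: message}."""
    errors: dict[str, str] = {}
    for path in paths:
        try:
            # doraise=True raises PyCompileError instead of printing to stderr
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            errors[str(path)] = exc.msg
    return errors


if __name__ == "__main__":
    root = Path("/opt/media-downloader")
    candidates = [root / "media-downloader.py"]
    candidates += sorted((root / "modules").glob("*.py"))
    candidates += sorted((root / "web" / "backend" / "routers").glob("*.py"))
    failures = syntax_errors([p for p in candidates if p.is_file()])
    for path, msg in failures.items():
        print(f"{path}: {msg}")
    print(f"{len(failures)} file(s) with syntax errors")
```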
Database
- Backend: PostgreSQL (database: `media_downloader`, user: `media_downloader`)
- Connection: `postgresql://media_downloader:PNsihOXvvuPwWiIvGlsc9Fh2YmMmB@localhost/media_downloader`
- Environment: controlled by `DATABASE_BACKEND=postgresql` and `DATABASE_URL` in `/opt/media-downloader/.env`
- Legacy SQLite: `database/media_downloader.db` and `database/auth.db` (no longer written to)
pg_adapter (SQLite → PostgreSQL Transparency Layer)
The entire codebase was originally written for SQLite. Rather than rewriting every database call, modules/pg_adapter.py monkey-patches Python's sqlite3 module so all existing SQLite code transparently uses PostgreSQL.
How it works:
- `modules/db_bootstrap.py` loads `.env` and checks `DATABASE_BACKEND`
- If it is `postgresql`, it replaces `sys.modules['sqlite3']` with `pg_adapter`
- All `sqlite3.connect()` calls are intercepted and routed to PostgreSQL via `psycopg2`
- SQL syntax is auto-translated: `?` → `%s` placeholders, `AUTOINCREMENT` → `SERIAL`, etc.
- Uses `psycopg2.pool.ThreadedConnectionPool` for connection pooling
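To make the translation rule concrete, here is a deliberately simplified translator covering only the two rewrites named above. It is an illustration of the technique, not pg_adapter's code: it leaves `?` alone inside single-quoted string literals, but it ignores double-quoted identifiers, comments, and the doubling of literal `%` signs that psycopg2 requires when parameters are passed.

```python
import re

# Simplified sketch of SQLite -> PostgreSQL rewriting (NOT the actual pg_adapter code).
_AUTOINCREMENT = re.compile(r"INTEGER\s+PRIMARY\s+KEY\s+AUTOINCREMENT", re.IGNORECASE)


def translate(sql: str) -> str:
    """Rewrite SQLite-style SQL for psycopg2: AUTOINCREMENT keys and ? placeholders."""
    sql = _AUTOINCREMENT.sub("SERIAL PRIMARY KEY", sql)
    out: list[str] = []
    in_string = False
    for ch in sql:
        if ch == "'":
            # An escaped quote ('') toggles twice, so the in/out state stays correct
            in_string = not in_string
            out.append(ch)
        elif ch == "?" and not in_string:
            out.append("%s")  # SQLite positional placeholder -> psycopg2 placeholder
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(translate("SELECT * FROM users WHERE name = ? AND note = 'why?'"))
    # → SELECT * FROM users WHERE name = %s AND note = 'why?'
```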
Key implications for development:
- Write SQL using SQLite syntax (`?` placeholders, `INTEGER PRIMARY KEY AUTOINCREMENT`); pg_adapter translates automatically
- All `sqlite3` imports work normally; they're intercepted by the adapter
- `db_path` parameters in constructors are ignored (all connections go to PostgreSQL)
- `for_write=True` in `get_connection()` is important for PostgreSQL transaction handling
- Direct `psql` access: `psql -U media_downloader -d media_downloader`
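The interception relies on Python's import system returning whatever object is already registered in `sys.modules`. The sketch below shows that mechanism with a tiny fake module built with `types.ModuleType`; its `connect()` merely records the requested path and returns a placeholder string, echoing the point above that `db_path` is ignored. It is a hypothetical illustration, not the pg_adapter implementation, and it restores the real module afterwards.

```python
import sys
import types


def make_fake_sqlite3() -> types.ModuleType:
    """Build a stand-in 'sqlite3' module whose connect() ignores the file path."""
    fake = types.ModuleType("sqlite3")
    fake.calls = []  # records every path callers asked for

    def connect(db_path, *args, **kwargs):
        fake.calls.append(db_path)
        # A real adapter would hand out a pooled PostgreSQL connection here
        return f"postgres-connection (requested {db_path!r})"

    fake.connect = connect
    return fake


def open_legacy_db():
    import sqlite3  # the import statement checks sys.modules first
    return sqlite3.connect("database/media_downloader.db")


fake = make_fake_sqlite3()
original = sys.modules.get("sqlite3")
sys.modules["sqlite3"] = fake  # must run before other modules import sqlite3
try:
    result = open_legacy_db()
finally:
    # Restore the real module so the rest of this interpreter is unaffected
    if original is not None:
        sys.modules["sqlite3"] = original
    else:
        del sys.modules["sqlite3"]
```

Modules that imported `sqlite3` before the swap keep a reference to the real module, which is why the replacement has to happen early in startup.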
Important: Do NOT use PostgreSQL-specific SQL syntax in the codebase. Always use SQLite-compatible syntax and let pg_adapter handle translation. This maintains backward compatibility.
When running direct psql commands: Use native PostgreSQL syntax (INSERT ... ON CONFLICT DO NOTHING, SERIAL, RETURNING, etc.) with literal values; psql does not understand the ? or %s placeholders used from Python. The pg_adapter translation ONLY applies to Python code, so raw psql commands must be written in native PostgreSQL syntax.
Settings Storage
Configuration is stored in the PostgreSQL `settings` table, managed by `modules/settings_manager.py`:
- `settings_manager.get_all()` returns a nested dict of all settings
- `settings_manager.set(key, value, category)` stores a value with type information
- Supports dot notation for nested keys
- Settings are cached and synced via the API at `/api/config`
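How `settings_manager` implements dot notation is not described here. The sketch below shows the general technique with two hypothetical helpers, `get_dotted()` and `set_dotted()`, over the kind of nested dict `get_all()` returns; the key names used in the test (`instagram.limits.daily`, etc.) are made up.

```python
from typing import Any


def get_dotted(settings: dict[str, Any], key: str, default: Any = None) -> Any:
    """Look up 'a.b.c' in a nested dict; return default if any segment is missing."""
    node: Any = settings
    for part in key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def set_dotted(settings: dict[str, Any], key: str, value: Any) -> None:
    """Assign value at 'a.b.c', creating intermediate dicts as needed."""
    *parents, leaf = key.split(".")
    node = settings
    for part in parents:
        # Assumes existing intermediate values are dicts
        node = node.setdefault(part, {})
    node[leaf] = value
```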
General Rules
- Use `unified_database.py` for all database operations
- Always use parameterized queries (never f-strings with user input)
- Use `for_write=True` when writing to the database
Known Issues & Technical Debt
Remaining HIGH Priority Issues
| Issue | Location | Description |
|---|---|---|
| Missing retry logic | forum_db_adapter.py | Read queries fail on database lock |
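One possible shape for the missing retry logic is a small wrapper around each read: retry with exponential backoff when the error looks like a lock, and re-raise anything else. This is a hedged sketch with a hypothetical `with_retry()` helper. It assumes the lock surfaces as `sqlite3.OperationalError` with "locked" in its message, which holds for stock SQLite but may not match what the adapter raises once pg_adapter routes queries to PostgreSQL; `retry_on` and `is_transient` are parameters for that reason.

```python
import sqlite3
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def _looks_locked(exc: BaseException) -> bool:
    # Stock SQLite reports "database is locked" or "database table is locked"
    return "locked" in str(exc).lower()


def with_retry(
    read: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.1,
    retry_on: tuple[type[BaseException], ...] = (sqlite3.OperationalError,),
    is_transient: Callable[[BaseException], bool] = _looks_locked,
) -> T:
    """Run read(); on a transient lock error, sleep with exponential backoff and retry."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return read()
        except retry_on as exc:
            # Non-lock errors and the final failed attempt propagate unchanged
            if not is_transient(exc) or attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # 0.1s, 0.2s, 0.4s, ...
    raise AssertionError("unreachable: the loop always returns or raises")
```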
Remaining MEDIUM Priority Issues
| Issue | Location | Description |
|---|---|---|
| 46x `as any` assertions | Frontend | Reduces TypeScript safety |
| WebSocket token in URL | api.ts:2418-2423 | Security concern |
Fixed Issues (2025-01-04)
| Issue | Fix Applied |
|---|---|
| Duplicate auth dependencies | Consolidated to core/dependencies.py, removed from api.py |
| Direct sqlite3 usage for main DB | Changed to app_state.db.get_connection() in media.py |
| Forum wrapper missing signal handlers | Added setup_signal_handlers() and set_database_reference() |
| Missing admin check on batch_move | Changed to require_admin dependency |
| Duplicate SQL filter constants | Extracted MEDIA_FILTERS to core/utils.py |
| Logger.log() calls | Changed to logger.debug() in media.py |
Code Patterns to Follow
Database access:
```python
# Use context manager with for_write flag
with app_state.db.get_connection(for_write=True) as conn:
    cursor = conn.cursor()
    cursor.execute("INSERT INTO ...", (params,))
```
Error handling:
```python
# Use the handle_exceptions decorator
from core.exceptions import handle_exceptions

@router.get("/endpoint")
@handle_exceptions("OperationName")
async def endpoint():
    ...
```
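The real decorator in `core/exceptions.py` is not shown here. As a hedged, stdlib-only sketch of how such a decorator can work, the block below uses a hypothetical `OperationError` in place of whatever FastAPI exception the project actually raises, and the 400/500 mapping is an assumption. Decorators apply bottom-up, so in the pattern above `@router.get` registers the function that `handle_exceptions` returns; `functools.wraps` keeps the signature that FastAPI inspects.

```python
import asyncio
import functools
import logging
from typing import Any, Awaitable, Callable

logger = logging.getLogger("api")


class OperationError(Exception):
    """Hypothetical error carrying the operation name and an HTTP-style status code."""

    def __init__(self, operation: str, status_code: int, detail: str) -> None:
        super().__init__(f"{operation} failed: {detail}")
        self.operation = operation
        self.status_code = status_code
        self.detail = detail


AsyncFn = Callable[..., Awaitable[Any]]


def handle_exceptions(operation: str) -> Callable[[AsyncFn], AsyncFn]:
    def decorator(func: AsyncFn) -> AsyncFn:
        # wraps() copies __name__ and sets __wrapped__, which inspect.signature() follows
        @functools.wraps(func)
        async def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                return await func(*args, **kwargs)
            except OperationError:
                raise  # already translated; don't wrap twice
            except ValueError as exc:
                raise OperationError(operation, 400, str(exc)) from exc  # bad input
            except Exception as exc:
                logger.exception("%s failed", operation)  # log the traceback once
                raise OperationError(operation, 500, str(exc)) from exc
        return wrapper
    return decorator


@handle_exceptions("GetStatus")
async def get_status() -> dict[str, str]:
    return {"status": "ok"}


if __name__ == "__main__":
    print(asyncio.run(get_status()))  # → {'status': 'ok'}
```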
Logging:
```python
# Use logger methods, not logger.log()
logger.debug("Message", module="ModuleName")
logger.error("Error message", module="ModuleName")
```
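The `module=` keyword is not part of the standard library's `Logger.debug()`/`Logger.error()` signatures, so the project's logger is presumably its own wrapper, whose implementation is not shown here. A minimal sketch of such a wrapper, with the hypothetical class name `ModuleLogger` and default module name `"Core"`, forwards to stdlib `logging` and carries the module name both in the message and as a record attribute.

```python
import logging


class ModuleLogger:
    """Hypothetical wrapper that accepts module= like the project's logger calls."""

    def __init__(self, name: str = "media_downloader") -> None:
        self._logger = logging.getLogger(name)

    def debug(self, msg: str, module: str = "Core") -> None:
        # 'module' is already a LogRecord attribute, so pass the name under another key
        self._logger.debug("[%s] %s", module, msg, extra={"source_module": module})

    def error(self, msg: str, module: str = "Core") -> None:
        self._logger.error("[%s] %s", module, msg, extra={"source_module": module})


logger = ModuleLogger()
logger.debug("Message", module="ModuleName")
logger.error("Error message", module="ModuleName")
```

Passing `extra={"module": ...}` would fail with a `KeyError`, because `logging` refuses to overwrite built-in `LogRecord` attributes.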
API Endpoints Summary
The backend has 21 routers providing 150+ endpoints:
| Router | Prefix | Key Endpoints |
|---|---|---|
| auth | /api/auth | Login, logout, 2FA, preferences |
| downloads | /api/downloads | List, search, analytics, filters |
| media | /api/media | Gallery, batch ops, thumbnails |
| recycle | /api/recycle | Recycle bin management |
| review | /api/review | Review queue for new content |
| face | /api/face | Face recognition, references |
| video | /api/video | Video streaming, info |
| video_queue | /api/video-queue | Download queue management |
| scheduler | /api/scheduler | Task scheduling, status |
| platforms | /api/platforms | Platform configs, triggers |
| config | /api/config | Application settings |
| celebrity | /api/celebrity | Celebrity discovery |
| appearances | /api/appearances | TV/podcast appearances |
| semantic | /api/semantic | Semantic search |
| discovery | /api | Smart folders, timeline |
| stats | /api | Dashboard stats, errors |
| health | /api/health | System health checks |
| maintenance | /api/maintenance | Cleanup operations |
| scrapers | /api/scrapers | Scraper configurations |
| manual_import | /api/manual-import | Manual file imports |
| files | /files | File serving |
Remember
These rules exist because I repeatedly violated the user's trust. The deny list is a technical enforcement because my promises were not sufficient. Always ask before restarting the scheduler, and never touch authentication code.