media-downloader/CLAUDE.md
Todd 0d7b2b1aab Initial commit
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 22:42:55 -04:00

CLAUDE.md - Media Downloader Project Rules

PROJECT LOCATION: /opt/media-downloader/

All project files live under /opt/media-downloader/. Always use this as the base path.

TL;DR - READ EVERY SESSION:

  1. NEVER restart media-downloader service - it's BLOCKED. ASK USER FIRST.
  2. NEVER instantiate AuthManager() - it resets admin password.
  3. ALWAYS use parameterized SQL - never f-strings with user input.
  4. Use scripts/get-api-token.sh for API testing, not AuthManager.

CRITICAL RULES - MUST FOLLOW

RULE 1: NEVER RESTART THE SCHEDULER WITHOUT PERMISSION

The media-downloader scheduler service is BLOCKED via a deny list.

BLOCKED COMMANDS (will be rejected):
- systemctl restart media-downloader
- sudo systemctl restart media-downloader

YOU MUST ASK THE USER before restarting the scheduler. It runs long-running tasks (downloads, scans, face recognition, cache building). Restarting kills those tasks.

Why this rule exists: On 2025-12-13, I restarted the scheduler without permission WHILE IT WAS RUNNING A SCAN. This killed the running scan and lost the user's work. The deny list is a technical enforcement because my promises were not sufficient.

RULE 2: NEVER TOUCH AUTHENTICATION

NEVER instantiate AuthManager directly: the AuthManager() constructor has side effects that can reset the admin password. NEVER run any code that could modify user passwords, accounts, or authentication data.

Why this rule exists: On 2025-12-13, I carelessly instantiated AuthManager to generate a test token, which triggered its constructor's _create_default_user() method and RESET THE ADMIN PASSWORD.

RULE 3: USE PARAMETERIZED SQL QUERIES

NEVER use f-strings or string concatenation with user input in SQL queries. ALWAYS use parameterized queries with ? placeholders.

# WRONG - SQL Injection risk
cursor.execute(f"SELECT * FROM users WHERE name = '{username}'")

# CORRECT - Parameterized query
cursor.execute("SELECT * FROM users WHERE name = ?", (username,))
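
The difference can be reproduced safely with an in-memory sqlite3 database. This is a minimal sketch: the users table and its two rows are made up for illustration and are not the project's schema.

```python
import sqlite3

# Throwaway in-memory database; table and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

malicious = "nobody' OR '1'='1"

# WRONG: the input becomes part of the SQL text and rewrites the WHERE clause.
unsafe_rows = conn.execute(
    f"SELECT name FROM users WHERE name = '{malicious}'"
).fetchall()

# CORRECT: the input is bound as a value and is only ever compared as a string.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The f-string query returns every row in the table, while the parameterized query returns none, because no user is literally named `nobody' OR '1'='1`.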

RULE 4: USE DATABASE CONTEXT MANAGERS

ALWAYS use the context manager pattern for database connections. Use for_write=True when performing write operations.

# CORRECT pattern
with db.get_connection(for_write=True) as conn:
    cursor = conn.cursor()
    cursor.execute(...)
    # commit is automatic via context manager
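
To make "commit is automatic" concrete, here is a rough sketch of how such a context manager could behave. SketchDatabase is a hypothetical stand-in built on in-memory sqlite3; the real get_connection() in unified_database.py may pool connections and handle errors differently.

```python
import sqlite3
from contextlib import contextmanager


class SketchDatabase:
    """Hypothetical stand-in for the project's database class."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)

    @contextmanager
    def get_connection(self, for_write=False):
        try:
            yield self._conn
            if for_write:
                self._conn.commit()  # commit only when the block succeeded
        except Exception:
            self._conn.rollback()  # undo partial writes on any error
            raise


db = SketchDatabase()

with db.get_connection(for_write=True) as conn:
    conn.execute("CREATE TABLE items (name TEXT)")
    conn.execute("INSERT INTO items (name) VALUES (?)", ("first",))

try:
    with db.get_connection(for_write=True) as conn:
        conn.execute("INSERT INTO items (name) VALUES (?)", ("second",))
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass  # the failed block was rolled back

with db.get_connection() as conn:
    names = [row[0] for row in conn.execute("SELECT name FROM items")]
```

Under these assumptions only the first insert survives, which is the reason to keep each logical write inside a single with block.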

For API Testing Tokens

Use the helper scripts:

# Get a fresh token and save to /tmp/api_token.txt
/opt/media-downloader/scripts/get-api-token.sh

# Make authenticated API calls (token loaded from /tmp/api_token.txt)
/opt/media-downloader/scripts/api-call.sh "/api/video-queue?limit=2"
/opt/media-downloader/scripts/api-call.sh "/api/health"
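
If a test needs to call the API from Python instead of the shell, the saved token can be reused without ever importing AuthManager. The sketch below is an assumption-heavy illustration: it assumes the API accepts the token as a Bearer Authorization header and that the API is reachable on localhost port 8000; build_request and load_token are hypothetical helper names, and api-call.sh may work differently.

```python
from pathlib import Path
from urllib.request import Request

TOKEN_FILE = Path("/tmp/api_token.txt")  # written by get-api-token.sh
BASE_URL = "http://localhost:8000"       # assumed host; 8000 is the API port


def load_token() -> str:
    """Read the token saved by the helper script."""
    return TOKEN_FILE.read_text()


def build_request(path: str, token: str) -> Request:
    # Assumption: the API expects "Authorization: Bearer <token>".
    headers = {"Authorization": f"Bearer {token.strip()}"}
    return Request(BASE_URL + path, headers=headers)


# Building a request does not touch the network or the auth database.
req = build_request("/api/health", "example-token\n")
```

To actually send it, pass build_request("/api/health", load_token()) to urllib.request.urlopen after running get-api-token.sh.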

claude_test credentials:

  • Username: claude_test
  • Password: ClaudeTest2025Secure
  • Database: PostgreSQL media_downloader

Services Reference

Active Systemd Services

| Service | Port | Command | Description |
|---|---|---|---|
| media-downloader | - | BLOCKED | SCHEDULER - Background task runner. REQUIRES PERMISSION TO RESTART |
| media-downloader-api | 8000 | systemctl restart media-downloader-api | API - FastAPI backend. Safe to restart. |
| media-downloader-frontend | 3000 | systemctl restart media-downloader-frontend | FRONTEND - Vite dev server. Usually auto-reloads. |
| xvfb-media-downloader | - | systemctl restart xvfb-media-downloader | VIRTUAL DISPLAY - Xvfb for headless browser. |
| nginx | 80/443 | systemctl restart nginx | REVERSE PROXY - Routes requests to backend/frontend. |
| redis-server | 6379 | systemctl restart redis-server | CACHE - Session storage and caching. |

Active Docker Containers

| Container | Port | Description |
|---|---|---|
| flaresolverr | 8191 | CLOUDFLARE BYPASS - Solves Cloudflare challenges |
| immich_server | 2283 | PHOTO MANAGEMENT - Immich photo server |
| immich_machine_learning | - | ML - Immich machine learning backend |
| immich_redis | 6379 (internal) | CACHE - Redis for Immich |
| immich_postgres | 5432 (internal) | DATABASE - PostgreSQL for Immich |
| immich_power_tools | 8001 | TOOLS - Immich power tools |

Universal Proxy (Docker)

The frontend is served through the unified-proxy Docker container, which caches assets.

After frontend changes, ALWAYS clear cache and reload:

# Clear nginx cache and reload
docker exec unified-proxy sh -c "rm -rf /var/cache/* 2>/dev/null; nginx -s reload"

Or restart the container entirely:

docker restart unified-proxy

When to Restart Which Service

  • Python backend changes (/web/backend/*.py, /modules/*.py): Restart media-downloader-api
  • Scheduler/module changes (scheduler.py, download modules): Restart media-downloader (ASK FIRST!)
  • Frontend changes (/web/frontend/src/*):
    1. Build: cd /opt/media-downloader/web/frontend && npm run build
    2. Clear proxy cache: docker exec unified-proxy sh -c "rm -rf /var/cache/* 2>/dev/null; nginx -s reload"

Disabled Services (Run Manually)

| Service | Description |
|---|---|
| media-cache-builder | Builds media cache/indexes |
| media-celebrity-enrichment | Enriches celebrity data |
| media-embedding-generator | Generates content embeddings |
| media-downloader-db-cleanup | Database cleanup tasks |

Project Structure

/opt/media-downloader/
├── media-downloader.py      # Main scheduler script (~4000 lines)
├── modules/                 # Python modules (40+ files)
│   ├── unified_database.py  # Database layer (5194 lines)
│   ├── scheduler.py         # Task scheduler (1977 lines)
│   ├── settings_manager.py  # Configuration (8401 lines)
│   ├── pg_adapter.py        # SQLite → PostgreSQL transparency layer
│   ├── db_bootstrap.py      # Database backend initialization
│   ├── cloudflare_handler.py # Cloudflare bypass (25804 lines)
│   ├── snapchat_client_module.py # Snapchat via direct HTTP (no Playwright)
│   ├── discovery_system.py  # Content discovery (37895 lines)
│   ├── semantic_search.py   # CLIP embeddings (25713 lines)
│   └── ...
├── web/
│   ├── backend/             # FastAPI backend (port 8000)
│   │   ├── api.py           # Main FastAPI app
│   │   ├── routers/         # API endpoints (21 routers)
│   │   │   ├── auth.py      # Authentication
│   │   │   ├── downloads.py # Download management
│   │   │   ├── media.py     # Media operations
│   │   │   ├── video.py     # Video streaming
│   │   │   ├── face.py      # Face recognition
│   │   │   ├── celebrity.py # Celebrity discovery
│   │   │   ├── appearances.py # TV/Podcast appearances
│   │   │   └── ...
│   │   ├── core/            # Core utilities
│   │   │   ├── dependencies.py
│   │   │   ├── exceptions.py
│   │   │   └── responses.py
│   │   ├── auth_manager.py  # DO NOT INSTANTIATE
│   │   └── models/          # API models
│   └── frontend/            # React/TypeScript frontend
│       └── src/
│           ├── lib/api.ts   # API client (2536 lines)
│           ├── pages/       # React pages (27 pages)
│           └── components/  # UI components
├── wrappers/                # Subprocess wrappers (8 files)
│   ├── base_subprocess_wrapper.py
│   ├── imginn_subprocess_wrapper.py
│   ├── fastdl_subprocess_wrapper.py
│   ├── instagram_client_subprocess_wrapper.py
│   ├── toolzu_subprocess_wrapper.py
│   ├── snapchat_subprocess_wrapper.py
│   ├── snapchat_client_subprocess_wrapper.py
│   └── forum_subprocess_wrapper.py
├── scripts/                 # Helper scripts
│   ├── get-api-token.sh     # Get API token safely
│   └── api-call.sh          # Make API calls
├── database/                # Legacy SQLite databases (PostgreSQL is primary)
│   ├── media_downloader.db  # Legacy — migrated to PostgreSQL
│   └── auth.db              # Legacy — migrated to PostgreSQL
└── cookies/                 # Session cookies

Build & Test Commands

# Frontend build
cd /opt/media-downloader/web/frontend && npm run build

# Python syntax check
python3 -m py_compile /opt/media-downloader/media-downloader.py

# Check all Python files
for f in /opt/media-downloader/modules/*.py; do python3 -m py_compile "$f"; done
for f in /opt/media-downloader/web/backend/routers/*.py; do python3 -m py_compile "$f"; done

# TypeScript check
cd /opt/media-downloader/web/frontend && npx tsc --noEmit

# View logs
journalctl -u media-downloader-api -f
journalctl -u media-downloader -f

Database

  • Backend: PostgreSQL (database: media_downloader, user: media_downloader)
  • Connection: postgresql://media_downloader:PNsihOXvvuPwWiIvGlsc9Fh2YmMmB@localhost/media_downloader
  • Environment: Controlled by DATABASE_BACKEND=postgresql and DATABASE_URL in /opt/media-downloader/.env
  • Legacy SQLite: database/media_downloader.db and database/auth.db (no longer written to)

pg_adapter (SQLite → PostgreSQL Transparency Layer)

The entire codebase was originally written for SQLite. Rather than rewriting every database call, modules/pg_adapter.py monkey-patches Python's sqlite3 module so all existing SQLite code transparently uses PostgreSQL.

How it works:

  1. modules/db_bootstrap.py loads .env and checks DATABASE_BACKEND
  2. If postgresql, it replaces sys.modules['sqlite3'] with pg_adapter
  3. All sqlite3.connect() calls are intercepted and routed to PostgreSQL via psycopg2
  4. SQL syntax is auto-translated: ? → %s placeholders, AUTOINCREMENT → SERIAL, etc.
  5. Uses psycopg2.pool.ThreadedConnectionPool for connection pooling
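
The translation in step 4 can be pictured with a toy rewrite function. This is only a sketch of the idea, not the real pg_adapter logic: translate_sql is a hypothetical name, it covers just the two rewrites named above, and it ignores details such as escaping literal % characters.

```python
import re


def translate_sql(sql: str) -> str:
    """Toy SQLite-to-PostgreSQL rewrite; not the real pg_adapter code."""
    # Split on single-quoted literals so their contents are left untouched.
    parts = re.split(r"('(?:[^']|'')*')", sql)
    for i in range(0, len(parts), 2):  # even indexes are outside quotes
        chunk = parts[i].replace("?", "%s")
        chunk = re.sub(
            r"INTEGER\s+PRIMARY\s+KEY\s+AUTOINCREMENT",
            "SERIAL PRIMARY KEY",
            chunk,
            flags=re.IGNORECASE,
        )
        parts[i] = chunk
    return "".join(parts)


select_sql = translate_sql("SELECT * FROM users WHERE name = ? AND note = 'why?'")
create_sql = translate_sql(
    "CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)
print(select_sql)
print(create_sql)
```

Note that the question mark inside the quoted literal is preserved; a naive global replace would corrupt string values, which is one reason a real adapter needs more care than this sketch.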

Key implications for development:

  • Write SQL using SQLite syntax (? placeholders, INTEGER PRIMARY KEY AUTOINCREMENT) — pg_adapter translates automatically
  • All sqlite3 imports work normally — they're intercepted by the adapter
  • db_path parameters in constructors are ignored (all connections go to PostgreSQL)
  • for_write=True in get_connection() is important for PostgreSQL transaction handling
  • Direct psql access: psql -U media_downloader -d media_downloader

Important: Do NOT use PostgreSQL-specific SQL syntax in the codebase. Always use SQLite-compatible syntax and let pg_adapter handle translation. This maintains backward compatibility.

When running direct psql commands: Use native PostgreSQL syntax (INSERT ... ON CONFLICT DO NOTHING, SERIAL, RETURNING, etc.) and write values as literals. psql does not understand ? placeholders, and %s placeholders belong to psycopg2, not to psql. The pg_adapter translation ONLY applies to Python code, so raw psql commands must use native PostgreSQL syntax.

Settings Storage

Configuration is stored in the PostgreSQL settings table, managed by modules/settings_manager.py:

  • settings_manager.get_all() returns a nested dict of all settings
  • settings_manager.set(key, value, category) stores with type information
  • Supports dot notation for nested keys
  • Settings are cached and synced via the API at /api/config
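
Dot notation over a nested dict can be sketched as follows. The helper names get_nested and set_nested and the example keys are hypothetical; settings_manager.py may resolve keys differently and also tracks type information and categories, which this sketch omits.

```python
def get_nested(settings: dict, dotted_key: str, default=None):
    """Walk a nested dict using a key such as 'downloads.retries'."""
    node = settings
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def set_nested(settings: dict, dotted_key: str, value) -> None:
    """Create intermediate dicts as needed, then store the leaf value."""
    *parents, leaf = dotted_key.split(".")
    node = settings
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


# Illustrative keys only; these are not the project's real settings.
settings = {"downloads": {"retries": 3}}
set_nested(settings, "downloads.proxy.enabled", True)
```

After the call, settings contains a new nested "proxy" dict, and get_nested(settings, "missing.key", "fallback") returns the fallback instead of raising.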

General Rules

  • Use unified_database.py for all database operations
  • Always use parameterized queries (never f-strings with user input)
  • Use for_write=True when writing to database

Known Issues & Technical Debt

Remaining HIGH Priority Issues

| Issue | Location | Description |
|---|---|---|
| Missing retry logic | forum_db_adapter.py | Read queries fail on database lock |
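
One possible shape for the missing retry logic is a small wrapper with exponential backoff. This is a sketch under assumptions, not a patch for forum_db_adapter.py: read_with_retry is a hypothetical helper, and it assumes the lock surfaces as sqlite3.OperationalError with "locked" in the message, which may not hold once pg_adapter routes the call to PostgreSQL.

```python
import sqlite3
import time


def read_with_retry(run_query, attempts=3, base_delay=0.1):
    """Call run_query(), retrying with backoff while the database is locked."""
    for attempt in range(attempts):
        try:
            return run_query()
        except sqlite3.OperationalError as exc:
            last_attempt = attempt == attempts - 1
            if "locked" not in str(exc) or last_attempt:
                raise  # unrelated error, or out of retries
            time.sleep(base_delay * 2 ** attempt)


# Simulated read that fails twice with a lock error, then succeeds.
calls = {"count": 0}


def flaky_read():
    calls["count"] += 1
    if calls["count"] < 3:
        raise sqlite3.OperationalError("database is locked")
    return ["row"]


result = read_with_retry(flaky_read, attempts=3, base_delay=0)
```

Only read queries are safe to retry blindly like this; a write that partially succeeded needs its transaction rolled back first.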

Remaining MEDIUM Priority Issues

| Issue | Location | Description |
|---|---|---|
| 46x `as any` assertions | Frontend | Reduces TypeScript safety |
| WebSocket token in URL | api.ts:2418-2423 | Security concern |

Fixed Issues (2025-01-04)

| Issue | Fix Applied |
|---|---|
| Duplicate auth dependencies | Consolidated to core/dependencies.py, removed from api.py |
| Direct sqlite3 usage for main DB | Changed to app_state.db.get_connection() in media.py |
| Forum wrapper missing signal handlers | Added setup_signal_handlers() and set_database_reference() |
| Missing admin check on batch_move | Changed to require_admin dependency |
| Duplicate SQL filter constants | Extracted MEDIA_FILTERS to core/utils.py |
| Logger.log() calls | Changed to logger.debug() in media.py |

Code Patterns to Follow

Database access:

# Use context manager with for_write flag
with app_state.db.get_connection(for_write=True) as conn:
    cursor = conn.cursor()
    cursor.execute("INSERT INTO ...", (params,))

Error handling:

# Use the handle_exceptions decorator
from core.exceptions import handle_exceptions

@router.get("/endpoint")
@handle_exceptions("OperationName")
async def endpoint():
    ...
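
For orientation, a decorator of this kind could look roughly like the sketch below. handle_exceptions_sketch is a hypothetical look-alike: the real decorator lives in core/exceptions.py, and the error payload shown here is an assumption, not its actual response format.

```python
import asyncio
import functools


def handle_exceptions_sketch(operation: str):
    """Hypothetical look-alike of a handle_exceptions-style decorator."""

    def decorator(func):
        @functools.wraps(func)  # keep the endpoint's name and signature
        async def wrapper(*args, **kwargs):
            try:
                return await func(*args, **kwargs)
            except Exception as exc:
                # Assumed behavior: turn any failure into a uniform payload.
                return {"success": False, "operation": operation, "error": str(exc)}

        return wrapper

    return decorator


@handle_exceptions_sketch("LoadThing")
async def load_thing(fail: bool):
    if fail:
        raise ValueError("boom")
    return {"success": True}


ok = asyncio.run(load_thing(False))
failed = asyncio.run(load_thing(True))
```

Placing the route decorator above the exception decorator, as in the pattern above, means the router registers the already-wrapped function, so errors are handled on every request.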

Logging:

# Use logger methods, not logger.log()
logger.debug("Message", module="ModuleName")
logger.error("Error message", module="ModuleName")

API Endpoints Summary

The backend has 21 routers providing 150+ endpoints:

| Router | Prefix | Key Endpoints |
|---|---|---|
| auth | /api/auth | Login, logout, 2FA, preferences |
| downloads | /api/downloads | List, search, analytics, filters |
| media | /api/media | Gallery, batch ops, thumbnails |
| recycle | /api/recycle | Recycle bin management |
| review | /api/review | Review queue for new content |
| face | /api/face | Face recognition, references |
| video | /api/video | Video streaming, info |
| video_queue | /api/video-queue | Download queue management |
| scheduler | /api/scheduler | Task scheduling, status |
| platforms | /api/platforms | Platform configs, triggers |
| config | /api/config | Application settings |
| celebrity | /api/celebrity | Celebrity discovery |
| appearances | /api/appearances | TV/podcast appearances |
| semantic | /api/semantic | Semantic search |
| discovery | /api | Smart folders, timeline |
| stats | /api | Dashboard stats, errors |
| health | /api/health | System health checks |
| maintenance | /api/maintenance | Cleanup operations |
| scrapers | /api/scrapers | Scraper configurations |
| manual_import | /api/manual-import | Manual file imports |
| files | /files | File serving |

Remember

These rules exist because I repeatedly violated the user's trust. The deny list is a technical enforcement because my promises were not sufficient. Always ask before restarting the scheduler, and never touch authentication code.