Todd 0d7b2b1aab Initial commit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-29 22:42:55 -04:00

CLAUDE.md - Media Downloader Project Rules

PROJECT LOCATION: `/opt/media-downloader/`

All project files live under /opt/media-downloader/. Always use this as the base path.

TL;DR - READ EVERY SESSION:

NEVER restart media-downloader service - it's BLOCKED. ASK USER FIRST.

NEVER instantiate AuthManager() - it resets admin password.

ALWAYS use parameterized SQL - never f-strings with user input.

Use scripts/get-api-token.sh for API testing, not AuthManager.

CRITICAL RULES - MUST FOLLOW

RULE 1: NEVER RESTART THE SCHEDULER WITHOUT PERMISSION

The media-downloader scheduler service is BLOCKED via a deny list.

BLOCKED COMMANDS (will be rejected):
- systemctl restart media-downloader
- sudo systemctl restart media-downloader

YOU MUST ASK THE USER before restarting the scheduler. It runs long-running tasks (downloads, scans, face recognition, cache building). Restarting kills those tasks.

Why this rule exists: On 2025-12-13, I restarted the scheduler without permission WHILE IT WAS RUNNING A SCAN. This killed the running scan and lost the user's work. The deny list is a technical enforcement because my promises were not sufficient.
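
The check below is a minimal sketch of how a helper script could refuse the blocked commands before they reach a shell. The function name `is_blocked` and the tokenizing step are assumptions for illustration; the actual deny list is enforced outside this file and is not shown here.

```python
import shlex

# The two commands listed as blocked under Rule 1.
BLOCKED_COMMANDS = {
    ("systemctl", "restart", "media-downloader"),
    ("sudo", "systemctl", "restart", "media-downloader"),
}


def is_blocked(command: str) -> bool:
    """Return True if the command is a blocked scheduler restart.

    Hypothetical helper: the command is tokenized first so that extra
    whitespace does not defeat the comparison.
    """
    return tuple(shlex.split(command)) in BLOCKED_COMMANDS
```

Restarting media-downloader-api does not match, which is consistent with the services table marking it as safe to restart.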

RULE 2: NEVER TOUCH AUTHENTICATION

- NEVER instantiate AuthManager directly.
- NEVER use the AuthManager() constructor. It has side effects that can reset the admin password.
- NEVER run any code that could modify user passwords, accounts, or authentication data.

Why this rule exists: On 2025-12-13, I carelessly instantiated AuthManager to generate a test token, which triggered its constructor's _create_default_user() method and RESET THE ADMIN PASSWORD.
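
To see why merely constructing an object can be destructive, consider the toy class below. It is not the project's AuthManager: the class name `ToyAuthManager`, the in-memory dict, and both password strings are invented for illustration. It only mirrors the failure described above, a constructor that calls a `_create_default_user()` method.

```python
class ToyAuthManager:
    """Illustrative stand-in; NOT the real AuthManager."""

    def __init__(self, user_store: dict):
        self.user_store = user_store
        # Side effect in the constructor: runs on every instantiation.
        self._create_default_user()

    def _create_default_user(self) -> None:
        # Unconditionally overwrites the admin entry.
        self.user_store["admin"] = "default-password"


users = {"admin": "password-chosen-by-user"}
ToyAuthManager(users)   # looks harmless, but resets the admin entry
print(users["admin"])   # default-password
```

No method was called explicitly, yet the stored value changed. That is why the safe route for test tokens is scripts/get-api-token.sh.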

RULE 3: USE PARAMETERIZED SQL QUERIES

NEVER use f-strings or string concatenation with user input in SQL queries. ALWAYS use parameterized queries with ? placeholders.

# WRONG - SQL Injection risk
cursor.execute(f"SELECT * FROM users WHERE name = '{username}'")

# CORRECT - Parameterized query
cursor.execute("SELECT * FROM users WHERE name = ?", (username,))
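
The difference can be demonstrated end to end. The sketch below uses the standard-library sqlite3 module with an in-memory database, independent of the project database; the table contents and the attack string are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany(
    "INSERT INTO users (name, role) VALUES (?, ?)",
    [("alice", "admin"), ("bob", "viewer")],
)

# Attacker-controlled input that closes the quote and adds an always-true clause.
username = "nobody' OR '1'='1"

# Unsafe: the input becomes part of the SQL text, so every row matches.
unsafe_rows = conn.execute(
    f"SELECT name FROM users WHERE name = '{username}'"
).fetchall()

# Safe: the input is bound as a value and compared literally.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (username,)
).fetchall()

print(len(unsafe_rows))  # 2
print(len(safe_rows))    # 0
```

The f-string version returns every user because the injected text changes the query itself, while the parameterized version searches for a user literally named `nobody' OR '1'='1` and finds none.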

RULE 4: USE DATABASE CONTEXT MANAGERS

ALWAYS use the context manager pattern for database connections. Use for_write=True when performing write operations.

# CORRECT pattern
with db.get_connection(for_write=True) as conn:
    cursor = conn.cursor()
    cursor.execute(...)
    # commit is automatic via context manager
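
For intuition, here is a minimal sketch of how a `get_connection(for_write=True)` context manager could commit on success and roll back on error. `ToyDatabase` is a hypothetical stand-in built on the standard-library sqlite3 module; the project's real implementation in unified_database.py is not shown in this file and may differ.

```python
import sqlite3
from contextlib import contextmanager


class ToyDatabase:
    """Illustrative stand-in for the project's database layer (hypothetical)."""

    def __init__(self, path: str = ":memory:"):
        self._conn = sqlite3.connect(path)

    @contextmanager
    def get_connection(self, for_write: bool = False):
        try:
            yield self._conn
            if for_write:
                self._conn.commit()   # commit only when the block succeeds
        except Exception:
            self._conn.rollback()     # undo partial writes on any error
            raise


db = ToyDatabase()
with db.get_connection(for_write=True) as conn:
    conn.execute("CREATE TABLE items (name TEXT)")
    conn.execute("INSERT INTO items (name) VALUES (?)", ("example",))

with db.get_connection() as conn:
    count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

Because the commit and rollback live in one place, calling code cannot forget them, which is the main argument for the pattern.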

For API Testing Tokens

Use the helper scripts:

# Get a fresh token and save to /tmp/api_token.txt
/opt/media-downloader/scripts/get-api-token.sh

# Make authenticated API calls (token loaded from /tmp/api_token.txt)
/opt/media-downloader/scripts/api-call.sh "/api/video-queue?limit=2"
/opt/media-downloader/scripts/api-call.sh "/api/health"
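
If a test needs the token from Python rather than from the shell, something like the following sketch may work. It rests on several assumptions that this file does not confirm: that /tmp/api_token.txt holds a bare token, that the API accepts an `Authorization: Bearer` header, and that the API is reached directly at `http://localhost:8000`. The function `build_request` is hypothetical. It only reads the saved token and never touches AuthManager.

```python
from pathlib import Path
from urllib.request import Request

TOKEN_FILE = Path("/tmp/api_token.txt")  # written by get-api-token.sh
BASE_URL = "http://localhost:8000"        # assumption: API reached directly on port 8000


def build_request(endpoint: str, token_file: Path = TOKEN_FILE) -> Request:
    """Build an authenticated request without sending it.

    Assumes the file holds a bare token and the API accepts
    an 'Authorization: Bearer <token>' header.
    """
    token = token_file.read_text().strip()
    return Request(BASE_URL + endpoint, headers={"Authorization": f"Bearer {token}"})
```

Passing the result to `urllib.request.urlopen()` would send it, for example `urlopen(build_request("/api/health"))`.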

claude_test credentials:

Username: claude_test
Password: ClaudeTest2025Secure
Database: PostgreSQL media_downloader

Services Reference

Active Systemd Services

Service	Port	Command	Description
`media-downloader`	-	BLOCKED	SCHEDULER - Background task runner. REQUIRES PERMISSION TO RESTART
`media-downloader-api`	8000	`systemctl restart media-downloader-api`	API - FastAPI backend. Safe to restart.
`media-downloader-frontend`	3000	`systemctl restart media-downloader-frontend`	FRONTEND - Vite dev server. Usually auto-reloads.
`xvfb-media-downloader`	-	`systemctl restart xvfb-media-downloader`	VIRTUAL DISPLAY - Xvfb for headless browser.
`nginx`	80/443	`systemctl restart nginx`	REVERSE PROXY - Routes requests to backend/frontend.
`redis-server`	6379	`systemctl restart redis-server`	CACHE - Session storage and caching.

Active Docker Containers

Container	Port	Description
`flaresolverr`	8191	CLOUDFLARE BYPASS - Solves Cloudflare challenges
`immich_server`	2283	PHOTO MANAGEMENT - Immich photo server
`immich_machine_learning`	-	ML - Immich machine learning backend
`immich_redis`	6379 (internal)	CACHE - Redis for Immich
`immich_postgres`	5432 (internal)	DATABASE - PostgreSQL for Immich
`immich_power_tools`	8001	TOOLS - Immich power tools

Unified Proxy (Docker)

The frontend is served through the unified-proxy Docker container, which caches assets.

After frontend changes, ALWAYS clear cache and reload:

# Clear nginx cache and reload
docker exec unified-proxy sh -c "rm -rf /var/cache/* 2>/dev/null; nginx -s reload"

Or restart the container entirely:

docker restart unified-proxy

When to Restart Which Service

- Python backend changes (/web/backend/*.py, /modules/*.py): restart media-downloader-api.
- Scheduler/module changes (scheduler.py, download modules): restart media-downloader (ASK FIRST!).
- Frontend changes (/web/frontend/src/*):
  1. Build: cd /opt/media-downloader/web/frontend && npm run build
  2. Clear proxy cache: docker exec unified-proxy sh -c "rm -rf /var/cache/* 2>/dev/null; nginx -s reload"
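
The mapping above can be encoded as a small lookup, sketched here. The helper `action_for` and the glob patterns are illustrative assumptions; in particular, only scheduler.py is mapped to the scheduler service, because the file names of the download modules are not listed in this document.

```python
from fnmatch import fnmatch

# Ordered (pattern, action) pairs; the first match wins.
# Paths are relative to /opt/media-downloader/.
RESTART_RULES = [
    ("modules/scheduler.py", "ASK USER, then restart media-downloader"),
    ("web/backend/*.py", "restart media-downloader-api"),
    ("modules/*.py", "restart media-downloader-api"),
    ("web/frontend/src/*", "npm run build, then clear unified-proxy cache"),
]


def action_for(changed_path: str) -> str:
    """Return the follow-up action for a changed file (hypothetical helper)."""
    for pattern, action in RESTART_RULES:
        # fnmatch's '*' also matches '/', so nested files are covered.
        if fnmatch(changed_path, pattern):
            return action
    return "no restart rule matched"
```

Putting the scheduler rule first matters: scheduler.py also matches `modules/*.py`, and the stricter "ask first" action has to win.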

Disabled Services (Run Manually)

Service	Description
`media-cache-builder`	Builds media cache/indexes
`media-celebrity-enrichment`	Enriches celebrity data
`media-embedding-generator`	Generates content embeddings
`media-downloader-db-cleanup`	Database cleanup tasks

Project Structure

/opt/media-downloader/
├── media-downloader.py      # Main scheduler script (~4000 lines)
├── modules/                 # Python modules (40+ files)
│   ├── unified_database.py  # Database layer (5194 lines)
│   ├── scheduler.py         # Task scheduler (1977 lines)
│   ├── settings_manager.py  # Configuration (8401 lines)
│   ├── pg_adapter.py        # SQLite → PostgreSQL transparency layer
│   ├── db_bootstrap.py      # Database backend initialization
│   ├── cloudflare_handler.py # Cloudflare bypass (25804 lines)
│   ├── snapchat_client_module.py # Snapchat via direct HTTP (no Playwright)
│   ├── discovery_system.py  # Content discovery (37895 lines)
│   ├── semantic_search.py   # CLIP embeddings (25713 lines)
│   └── ...
├── web/
│   ├── backend/             # FastAPI backend (port 8000)
│   │   ├── api.py           # Main FastAPI app
│   │   ├── routers/         # API endpoints (21 routers)
│   │   │   ├── auth.py      # Authentication
│   │   │   ├── downloads.py # Download management
│   │   │   ├── media.py     # Media operations
│   │   │   ├── video.py     # Video streaming
│   │   │   ├── face.py      # Face recognition
│   │   │   ├── celebrity.py # Celebrity discovery
│   │   │   ├── appearances.py # TV/Podcast appearances
│   │   │   └── ...
│   │   ├── core/            # Core utilities
│   │   │   ├── dependencies.py
│   │   │   ├── exceptions.py
│   │   │   └── responses.py
│   │   ├── auth_manager.py  # DO NOT INSTANTIATE
│   │   └── models/          # API models
│   └── frontend/            # React/TypeScript frontend
│       └── src/
│           ├── lib/api.ts   # API client (2536 lines)
│           ├── pages/       # React pages (27 pages)
│           └── components/  # UI components
├── wrappers/                # Subprocess wrappers (8 files)
│   ├── base_subprocess_wrapper.py
│   ├── imginn_subprocess_wrapper.py
│   ├── fastdl_subprocess_wrapper.py
│   ├── instagram_client_subprocess_wrapper.py
│   ├── toolzu_subprocess_wrapper.py
│   ├── snapchat_subprocess_wrapper.py
│   ├── snapchat_client_subprocess_wrapper.py
│   └── forum_subprocess_wrapper.py
├── scripts/                 # Helper scripts
│   ├── get-api-token.sh     # Get API token safely
│   └── api-call.sh          # Make API calls
├── database/                # Legacy SQLite databases (PostgreSQL is primary)
│   ├── media_downloader.db  # Legacy — migrated to PostgreSQL
│   └── auth.db              # Legacy — migrated to PostgreSQL
└── cookies/                 # Session cookies

Build & Test Commands

# Frontend build
cd /opt/media-downloader/web/frontend && npm run build

# Python syntax check
python3 -m py_compile /opt/media-downloader/media-downloader.py

# Check all Python files
for f in /opt/media-downloader/modules/*.py; do python3 -m py_compile "$f"; done
for f in /opt/media-downloader/web/backend/routers/*.py; do python3 -m py_compile "$f"; done

# TypeScript check
cd /opt/media-downloader/web/frontend && npx tsc --noEmit

# View logs
journalctl -u media-downloader-api -f
journalctl -u media-downloader -f
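
The two shell loops above can also be expressed in Python, which makes it easier to collect all failures instead of reading them from interleaved output. This is a sketch using the standard-library py_compile module; the function name `check_syntax` is an assumption, not an existing project script.

```python
import py_compile
from pathlib import Path


def check_syntax(directory: str) -> list[str]:
    """Compile every .py file directly inside `directory`; return failure messages."""
    failures = []
    for path in sorted(Path(directory).glob("*.py")):
        try:
            # doraise=True turns a syntax error into PyCompileError
            # instead of printing it to stderr.
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            failures.append(f"{path}: {exc.msg}")
    return failures
```

Calling `check_syntax("/opt/media-downloader/modules")` would return an empty list when every module compiles.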

Database

Backend: PostgreSQL (database: media_downloader, user: media_downloader)
Connection: postgresql://media_downloader:PNsihOXvvuPwWiIvGlsc9Fh2YmMmB@localhost/media_downloader
Environment: Controlled by DATABASE_BACKEND=postgresql and DATABASE_URL in /opt/media-downloader/.env
Legacy SQLite: database/media_downloader.db and database/auth.db (no longer written to)

pg_adapter (SQLite → PostgreSQL Transparency Layer)

The entire codebase was originally written for SQLite. Rather than rewriting every database call, modules/pg_adapter.py monkey-patches Python's sqlite3 module so all existing SQLite code transparently uses PostgreSQL.

How it works:

1. modules/db_bootstrap.py loads .env and checks DATABASE_BACKEND.
2. If it is postgresql, it replaces sys.modules['sqlite3'] with pg_adapter.
3. All sqlite3.connect() calls are intercepted and routed to PostgreSQL via psycopg2.
4. SQL syntax is auto-translated: ? → %s placeholders, AUTOINCREMENT → SERIAL, etc.
5. Connections are pooled with psycopg2.pool.ThreadedConnectionPool.
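
The two core ideas, swapping the module in `sys.modules` and rewriting SQL text, can be illustrated with a toy. Everything below is a simplified sketch and not the code in modules/pg_adapter.py: `translate_sql` and `toy_pg_adapter` are invented names, the translation is naive (it would also rewrite a `?` inside a string literal), and the stand-in `connect()` returns a string rather than a psycopg2-backed connection.

```python
import re
import sys
import types


def translate_sql(sql: str) -> str:
    """Naive SQLite -> PostgreSQL rewrite (ignores '?' inside string literals)."""
    sql = re.sub(
        r"INTEGER\s+PRIMARY\s+KEY\s+AUTOINCREMENT",
        "SERIAL PRIMARY KEY",
        sql,
        flags=re.IGNORECASE,
    )
    return sql.replace("?", "%s")


# Stand-in module exposing connect() like sqlite3 does; a real adapter
# would return a wrapper around a PostgreSQL connection here.
toy_pg_adapter = types.ModuleType("toy_pg_adapter")
toy_pg_adapter.connect = lambda *args, **kwargs: "connection routed by adapter"

original = sys.modules.get("sqlite3")
sys.modules["sqlite3"] = toy_pg_adapter
try:
    # The import system consults sys.modules first, so this resolves to the stand-in.
    import sqlite3 as patched_sqlite3
    result = patched_sqlite3.connect("ignored.db")
finally:
    # Restore the real module so the rest of the process is unaffected.
    if original is not None:
        sys.modules["sqlite3"] = original
    else:
        del sys.modules["sqlite3"]

print(result)  # connection routed by adapter
```

The swap only affects imports that run after it, which suggests why the bootstrap step has to happen before other modules import sqlite3.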

Key implications for development:

- Write SQL using SQLite syntax (? placeholders, INTEGER PRIMARY KEY AUTOINCREMENT); pg_adapter translates it automatically.
- All sqlite3 imports work normally because they are intercepted by the adapter.
- db_path parameters in constructors are ignored (all connections go to PostgreSQL).
- for_write=True in get_connection() is important for PostgreSQL transaction handling.
- Direct psql access: psql -U media_downloader -d media_downloader

Important: Do NOT use PostgreSQL-specific SQL syntax in the codebase. Always use SQLite-compatible syntax and let pg_adapter handle translation. This maintains backward compatibility.

When running direct psql commands: use native PostgreSQL syntax (INSERT ... ON CONFLICT DO NOTHING, SERIAL, RETURNING, etc.). The pg_adapter translation ONLY applies to Python code, so raw psql commands are never translated. Note that %s placeholders are a psycopg2 convention, not psql syntax; in psql, write literal values or use PREPARE with $1-style parameters.

Settings Storage

Configuration is stored in the PostgreSQL settings table, managed by modules/settings_manager.py:

- settings_manager.get_all() returns a nested dict of all settings.
- settings_manager.set(key, value, category) stores a value with type information.
- Dot notation is supported for nested keys.
- Settings are cached and synced via the API at /api/config.
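
Dot notation for nested keys typically means that a key such as `a.b.c` addresses a value three levels deep in the nested dict. The helpers below sketch that idea on a plain dict. They are not the settings_manager API: `set_by_path`, `get_by_path`, and the example key are hypothetical.

```python
def set_by_path(settings: dict, dotted_key: str, value) -> None:
    """Store value under a dotted key, creating intermediate dicts as needed."""
    *parents, leaf = dotted_key.split(".")
    node = settings
    for part in parents:
        node = node.setdefault(part, {})
    node[leaf] = value


def get_by_path(settings: dict, dotted_key: str, default=None):
    """Look up a dotted key; return default when any segment is missing."""
    node = settings
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


config = {}
set_by_path(config, "downloads.retry.max_attempts", 3)  # hypothetical key
print(config)  # {'downloads': {'retry': {'max_attempts': 3}}}
```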

General Rules

- Use unified_database.py for all database operations.
- Always use parameterized queries (never f-strings with user input).
- Use for_write=True when writing to the database.

Known Issues & Technical Debt

Remaining HIGH Priority Issues

Issue	Location	Description
Missing retry logic	`forum_db_adapter.py`	Read queries fail on database lock
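
One possible shape for the missing retry logic is sketched below. It is an assumption-laden illustration, not a patch for forum_db_adapter.py: the helper name `execute_with_retry`, the default attempt count, and the backoff delays are invented, and it assumes the lock surfaces as a `sqlite3.OperationalError` whose message contains "locked".

```python
import sqlite3
import time


def execute_with_retry(conn, sql, params=(), attempts=3, base_delay=0.1):
    """Run a read query, retrying when the database reports a lock.

    Hypothetical helper; attempts and base_delay are illustrative defaults.
    """
    for attempt in range(attempts):
        try:
            return conn.execute(sql, params).fetchall()
        except sqlite3.OperationalError as exc:
            # Only lock errors are retried; the final attempt re-raises.
            if "locked" not in str(exc).lower() or attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))  # exponential backoff
```

Errors other than lock contention are re-raised immediately, so a malformed query still fails fast.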

Remaining MEDIUM Priority Issues

Issue	Location	Description
46x `as any` assertions	Frontend	Reduces TypeScript safety
WebSocket token in URL	`api.ts:2418-2423`	Security concern

Fixed Issues (2025-01-04)

Issue	Fix Applied
Duplicate auth dependencies	Consolidated to `core/dependencies.py`, removed from `api.py`
Direct sqlite3 usage for main DB	Changed to `app_state.db.get_connection()` in `media.py`
Forum wrapper missing signal handlers	Added `setup_signal_handlers()` and `set_database_reference()`
Missing admin check on batch_move	Changed to `require_admin` dependency
Duplicate SQL filter constants	Extracted `MEDIA_FILTERS` to `core/utils.py`
Logger.log() calls	Changed to `logger.debug()` in `media.py`

Code Patterns to Follow

Database access:

# Use context manager with for_write flag
with app_state.db.get_connection(for_write=True) as conn:
    cursor = conn.cursor()
    # params is a tuple with one value per ? placeholder
    cursor.execute("INSERT INTO ...", params)

Error handling:

# Use the handle_exceptions decorator
from core.exceptions import handle_exceptions

@router.get("/endpoint")
@handle_exceptions("OperationName")
async def endpoint():
    ...
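
For readers unfamiliar with parameterized decorators, the sketch below shows the general mechanism: a function that takes the operation name and returns a decorator wrapping an async endpoint. It is named `handle_exceptions_sketch` to make clear that it is not the implementation in core/exceptions.py, whose actual behavior (logging, HTTP status mapping) is not documented here.

```python
import asyncio
import functools


def handle_exceptions_sketch(operation_name: str):
    """Hypothetical re-creation: wrap an async endpoint and label its failures."""
    def decorator(func):
        @functools.wraps(func)  # keep the endpoint's name and docstring
        async def wrapper(*args, **kwargs):
            try:
                return await func(*args, **kwargs)
            except Exception as exc:
                # A real implementation would likely log and raise an HTTP error.
                raise RuntimeError(f"{operation_name} failed: {exc}") from exc
        return wrapper
    return decorator


@handle_exceptions_sketch("OperationName")
async def endpoint():
    raise ValueError("bad input")
```

Running `asyncio.run(endpoint())` raises `RuntimeError: OperationName failed: bad input`, with the original ValueError attached as the cause.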

Logging:

# Use logger methods, not logger.log()
logger.debug("Message", module="ModuleName")
logger.error("Error message", module="ModuleName")
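
Note that the standard-library `logging.Logger.debug()` does not accept a `module=` keyword, so the calls above imply a project-specific logger. The wrapper below is a guess at what such a logger could look like; `ModuleLogger`, its default logger name, and the `[Module] message` format are all assumptions.

```python
import logging


class ModuleLogger:
    """Hypothetical wrapper exposing debug()/error() with a module= keyword."""

    def __init__(self, name: str = "media-downloader"):
        self._logger = logging.getLogger(name)

    def _emit(self, level: int, message: str, module: str) -> None:
        # Prefix the message with the module tag before delegating.
        self._logger.log(level, "[%s] %s", module, message)

    def debug(self, message: str, module: str = "Core") -> None:
        self._emit(logging.DEBUG, message, module)

    def error(self, message: str, module: str = "Core") -> None:
        self._emit(logging.ERROR, message, module)
```

With this shape, `ModuleLogger().error("Error message", module="ModuleName")` emits `[ModuleName] Error message` at ERROR level.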

API Endpoints Summary

The backend has 21 routers providing 150+ endpoints:

Router	Prefix	Key Endpoints
`auth`	`/api/auth`	Login, logout, 2FA, preferences
`downloads`	`/api/downloads`	List, search, analytics, filters
`media`	`/api/media`	Gallery, batch ops, thumbnails
`recycle`	`/api/recycle`	Recycle bin management
`review`	`/api/review`	Review queue for new content
`face`	`/api/face`	Face recognition, references
`video`	`/api/video`	Video streaming, info
`video_queue`	`/api/video-queue`	Download queue management
`scheduler`	`/api/scheduler`	Task scheduling, status
`platforms`	`/api/platforms`	Platform configs, triggers
`config`	`/api/config`	Application settings
`celebrity`	`/api/celebrity`	Celebrity discovery
`appearances`	`/api/appearances`	TV/podcast appearances
`semantic`	`/api/semantic`	Semantic search
`discovery`	`/api`	Smart folders, timeline
`stats`	`/api`	Dashboard stats, errors
`health`	`/api/health`	System health checks
`maintenance`	`/api/maintenance`	Cleanup operations
`scrapers`	`/api/scrapers`	Scraper configurations
`manual_import`	`/api/manual-import`	Manual file imports
`files`	`/files`	File serving

Remember

These rules exist because I repeatedly violated the user's trust. The deny list is a technical enforcement because my promises were not sufficient. Always ask before restarting the scheduler, and never touch authentication code.
