Initial commit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Todd
2026-03-29 22:42:55 -04:00
commit 0d7b2b1aab
389 changed files with 280296 additions and 0 deletions

CLAUDE.md Normal file

@@ -0,0 +1,355 @@
# CLAUDE.md - Media Downloader Project Rules
## PROJECT LOCATION: `/opt/media-downloader/`
**All project files live under `/opt/media-downloader/`.** Always use this as the base path.
> **TL;DR - READ EVERY SESSION:**
> 1. **NEVER restart `media-downloader` service** - it's BLOCKED. ASK USER FIRST.
> 2. **NEVER instantiate AuthManager()** - it resets admin password.
> 3. **ALWAYS use parameterized SQL** - never f-strings with user input.
> 4. **Use `scripts/get-api-token.sh`** for API testing, not AuthManager.
---
## CRITICAL RULES - MUST FOLLOW
### RULE 1: NEVER RESTART THE SCHEDULER WITHOUT PERMISSION
**The `media-downloader` scheduler service is BLOCKED via deny list.**
```
BLOCKED COMMANDS (will be rejected):
- systemctl restart media-downloader
- sudo systemctl restart media-downloader
```
**YOU MUST ASK THE USER** before restarting the scheduler. It runs long-running tasks (downloads, scans, face recognition, cache building). Restarting kills those tasks.
**Why this rule exists:** On 2025-12-13, I restarted the scheduler without permission WHILE IT WAS RUNNING A SCAN. This killed the running scan and lost the user's work. The deny list is a technical enforcement because my promises were not sufficient.
### RULE 2: NEVER TOUCH AUTHENTICATION
**NEVER instantiate AuthManager directly.**
**NEVER use `AuthManager()` constructor - it has side effects that can reset the admin password.**
**NEVER run any code that could modify user passwords, accounts, or authentication data.**
**Why this rule exists:** On 2025-12-13, I carelessly instantiated AuthManager to generate a test token, which triggered its constructor's `_create_default_user()` method and RESET THE ADMIN PASSWORD.
### RULE 3: USE PARAMETERIZED SQL QUERIES
**NEVER use f-strings or string concatenation with user input in SQL queries.**
**ALWAYS use parameterized queries with `?` placeholders.**
```python
# WRONG - SQL Injection risk
cursor.execute(f"SELECT * FROM users WHERE name = '{username}'")
# CORRECT - Parameterized query
cursor.execute("SELECT * FROM users WHERE name = ?", (username,))
```
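The difference is easy to see in a standalone run. The sketch below uses Python's stock `sqlite3` module with an in-memory database, not this project's database layer; the `users` table contents and the `malicious` input are made up for illustration.

```python
import sqlite3

# Throwaway in-memory database, unrelated to the project's real data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

# Hypothetical attacker-controlled input.
malicious = "nobody' OR '1'='1"

# WRONG: the input becomes part of the SQL text and rewrites the WHERE clause.
unsafe_rows = conn.execute(
    f"SELECT name FROM users WHERE name = '{malicious}'"
).fetchall()

# CORRECT: the input is bound as a value and only compared against the column.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The f-string version matches every row in the table, while the parameterized version matches none, because no user is literally named `nobody' OR '1'='1`.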
### RULE 4: USE DATABASE CONTEXT MANAGERS
**ALWAYS use the context manager pattern for database connections.**
**Use `for_write=True` when performing write operations.**
```python
# CORRECT pattern
with db.get_connection(for_write=True) as conn:
    cursor = conn.cursor()
    cursor.execute(...)
    # commit is automatic via context manager
```
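For intuition, a context manager with this shape could be built roughly as follows. This is a sketch under assumptions, not the code in `unified_database.py`: `DemoDatabase` is a hypothetical class, it uses the stock `sqlite3` module, and the commit-on-success and rollback-on-error behavior is assumed rather than taken from the project.

```python
import sqlite3
from contextlib import contextmanager


class DemoDatabase:
    """Hypothetical stand-in for the project's database object."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)

    @contextmanager
    def get_connection(self, for_write=False):
        try:
            yield self._conn
            if for_write:
                # Assumed behavior: commit only when the block finished cleanly.
                self._conn.commit()
        except Exception:
            # Assumed behavior: undo partial work, then let the error propagate.
            self._conn.rollback()
            raise


db = DemoDatabase()

with db.get_connection(for_write=True) as conn:
    conn.execute("CREATE TABLE items (name TEXT)")
    conn.execute("INSERT INTO items (name) VALUES (?)", ("kept",))

try:
    with db.get_connection(for_write=True) as conn:
        conn.execute("INSERT INTO items (name) VALUES (?)", ("discarded",))
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass

with db.get_connection() as conn:
    names = [row[0] for row in conn.execute("SELECT name FROM items")]
```

Only the row from the block that completed survives; the row written before the simulated failure is rolled back.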
### For API Testing Tokens
Use the helper scripts:
```bash
# Get a fresh token and save to /tmp/api_token.txt
/opt/media-downloader/scripts/get-api-token.sh
# Make authenticated API calls (token loaded from /tmp/api_token.txt)
/opt/media-downloader/scripts/api-call.sh "/api/video-queue?limit=2"
/opt/media-downloader/scripts/api-call.sh "/api/health"
```
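If a test has to be driven from Python instead of the shell, the saved token can be reused without touching AuthManager. A minimal sketch, assuming the API accepts the token as an `Authorization: Bearer` header and is reachable on `http://localhost:8000`; both are assumptions, and `load_token` and `build_request` are hypothetical helpers, so check `scripts/api-call.sh` for the real header format.

```python
import urllib.request
from pathlib import Path

TOKEN_FILE = Path("/tmp/api_token.txt")   # written by get-api-token.sh
BASE_URL = "http://localhost:8000"        # assumed: API port from the services table


def load_token(path=TOKEN_FILE):
    """Read the token saved by get-api-token.sh, stripping the trailing newline."""
    return Path(path).read_text().strip()


def build_request(api_path, token, base_url=BASE_URL):
    """Build (but do not send) an authenticated GET request."""
    request = urllib.request.Request(base_url + api_path)
    # Assumed header scheme; the helper script is the source of truth.
    request.add_header("Authorization", f"Bearer {token}")
    return request


# Usage (not executed here):
# with urllib.request.urlopen(build_request("/api/health", load_token())) as resp:
#     print(resp.status)
```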
**claude_test credentials:**
- Username: `claude_test`
- Password: `ClaudeTest2025Secure`
- Database: PostgreSQL `media_downloader`
---
## Services Reference
### Active Systemd Services
| Service | Port | Command | Description |
|---------|------|---------|-------------|
| `media-downloader` | - | **BLOCKED** | **SCHEDULER** - Background task runner. **REQUIRES PERMISSION TO RESTART** |
| `media-downloader-api` | 8000 | `systemctl restart media-downloader-api` | **API** - FastAPI backend. Safe to restart. |
| `media-downloader-frontend` | 3000 | `systemctl restart media-downloader-frontend` | **FRONTEND** - Vite dev server. Usually auto-reloads. |
| `xvfb-media-downloader` | - | `systemctl restart xvfb-media-downloader` | **VIRTUAL DISPLAY** - Xvfb for headless browser. |
| `nginx` | 80/443 | `systemctl restart nginx` | **REVERSE PROXY** - Routes requests to backend/frontend. |
| `redis-server` | 6379 | `systemctl restart redis-server` | **CACHE** - Session storage and caching. |
### Active Docker Containers
| Container | Port | Description |
|-----------|------|-------------|
| `flaresolverr` | 8191 | **CLOUDFLARE BYPASS** - Solves Cloudflare challenges |
| `immich_server` | 2283 | **PHOTO MANAGEMENT** - Immich photo server |
| `immich_machine_learning` | - | **ML** - Immich machine learning backend |
| `immich_redis` | 6379 (internal) | **CACHE** - Redis for Immich |
| `immich_postgres` | 5432 (internal) | **DATABASE** - PostgreSQL for Immich |
| `immich_power_tools` | 8001 | **TOOLS** - Immich power tools |
### Unified Proxy (Docker)
The frontend is served through the `unified-proxy` Docker container, which caches assets.
**After frontend changes, ALWAYS clear the cache and reload:**
```bash
# Clear nginx cache and reload
docker exec unified-proxy sh -c "rm -rf /var/cache/* 2>/dev/null; nginx -s reload"
```
**Or restart the container entirely:**
```bash
docker restart unified-proxy
```
### When to Restart Which Service
- **Python backend changes** (`/web/backend/*.py`, `/modules/*.py`): Restart `media-downloader-api`
- **Scheduler/module changes** (`scheduler.py`, download modules): Restart `media-downloader` **(ASK FIRST!)**
- **Frontend changes** (`/web/frontend/src/*`):
  1. Build: `cd /opt/media-downloader/web/frontend && npm run build`
  2. Clear proxy cache: `docker exec unified-proxy sh -c "rm -rf /var/cache/* 2>/dev/null; nginx -s reload"`
### Disabled Services (Run Manually)
| Service | Description |
|---------|-------------|
| `media-cache-builder` | Builds media cache/indexes |
| `media-celebrity-enrichment` | Enriches celebrity data |
| `media-embedding-generator` | Generates content embeddings |
| `media-downloader-db-cleanup` | Database cleanup tasks |
---
## Project Structure
```
/opt/media-downloader/
├── media-downloader.py                 # Main scheduler script (~4000 lines)
├── modules/                            # Python modules (40+ files)
│   ├── unified_database.py             # Database layer (5194 lines)
│   ├── scheduler.py                    # Task scheduler (1977 lines)
│   ├── settings_manager.py             # Configuration (8401 lines)
│   ├── pg_adapter.py                   # SQLite → PostgreSQL transparency layer
│   ├── db_bootstrap.py                 # Database backend initialization
│   ├── cloudflare_handler.py           # Cloudflare bypass (25804 lines)
│   ├── snapchat_client_module.py       # Snapchat via direct HTTP (no Playwright)
│   ├── discovery_system.py             # Content discovery (37895 lines)
│   ├── semantic_search.py              # CLIP embeddings (25713 lines)
│   └── ...
├── web/
│   ├── backend/                        # FastAPI backend (port 8000)
│   │   ├── api.py                      # Main FastAPI app
│   │   ├── routers/                    # API endpoints (21 routers)
│   │   │   ├── auth.py                 # Authentication
│   │   │   ├── downloads.py            # Download management
│   │   │   ├── media.py                # Media operations
│   │   │   ├── video.py                # Video streaming
│   │   │   ├── face.py                 # Face recognition
│   │   │   ├── celebrity.py            # Celebrity discovery
│   │   │   ├── appearances.py          # TV/Podcast appearances
│   │   │   └── ...
│   │   ├── core/                       # Core utilities
│   │   │   ├── dependencies.py
│   │   │   ├── exceptions.py
│   │   │   └── responses.py
│   │   ├── auth_manager.py             # DO NOT INSTANTIATE
│   │   └── models/                     # API models
│   └── frontend/                       # React/TypeScript frontend
│       └── src/
│           ├── lib/api.ts              # API client (2536 lines)
│           ├── pages/                  # React pages (27 pages)
│           └── components/             # UI components
├── wrappers/                           # Subprocess wrappers (8 files)
│   ├── base_subprocess_wrapper.py
│   ├── imginn_subprocess_wrapper.py
│   ├── fastdl_subprocess_wrapper.py
│   ├── instagram_client_subprocess_wrapper.py
│   ├── toolzu_subprocess_wrapper.py
│   ├── snapchat_subprocess_wrapper.py
│   ├── snapchat_client_subprocess_wrapper.py
│   └── forum_subprocess_wrapper.py
├── scripts/                            # Helper scripts
│   ├── get-api-token.sh                # Get API token safely
│   └── api-call.sh                     # Make API calls
├── database/                           # Legacy SQLite databases (PostgreSQL is primary)
│   ├── media_downloader.db             # Legacy, migrated to PostgreSQL
│   └── auth.db                         # Legacy, migrated to PostgreSQL
└── cookies/                            # Session cookies
```
---
## Build & Test Commands
```bash
# Frontend build
cd /opt/media-downloader/web/frontend && npm run build
# Python syntax check
python3 -m py_compile /opt/media-downloader/media-downloader.py
# Check all Python files
for f in /opt/media-downloader/modules/*.py; do python3 -m py_compile "$f"; done
for f in /opt/media-downloader/web/backend/routers/*.py; do python3 -m py_compile "$f"; done
# TypeScript check
cd /opt/media-downloader/web/frontend && npx tsc --noEmit
# View logs
journalctl -u media-downloader-api -f
journalctl -u media-downloader -f
```
---
## Database
- **Backend:** PostgreSQL (database: `media_downloader`, user: `media_downloader`)
- **Connection:** `postgresql://media_downloader:PNsihOXvvuPwWiIvGlsc9Fh2YmMmB@localhost/media_downloader`
- **Environment:** Controlled by `DATABASE_BACKEND=postgresql` and `DATABASE_URL` in `/opt/media-downloader/.env`
- **Legacy SQLite:** `database/media_downloader.db` and `database/auth.db` (no longer written to)
### pg_adapter (SQLite → PostgreSQL Transparency Layer)
The entire codebase was originally written for SQLite. Rather than rewriting every database call, `modules/pg_adapter.py` monkey-patches Python's `sqlite3` module so all existing SQLite code transparently uses PostgreSQL.
**How it works:**
1. `modules/db_bootstrap.py` loads `.env` and checks `DATABASE_BACKEND`
2. If `postgresql`, it replaces `sys.modules['sqlite3']` with `pg_adapter`
3. All `sqlite3.connect()` calls are intercepted and routed to PostgreSQL via `psycopg2`
4. SQL syntax is auto-translated: `?` → `%s` placeholders, `AUTOINCREMENT` → `SERIAL`, etc.
5. Uses `psycopg2.pool.ThreadedConnectionPool` for connection pooling
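The syntax translation in step 4 can be pictured roughly as follows. This is a minimal sketch, not the actual `pg_adapter` code: `translate_sql` is a hypothetical name, and it covers only `?` placeholders and `INTEGER PRIMARY KEY AUTOINCREMENT`, leaving single-quoted string literals untouched.

```python
import re

# Matches a single-quoted SQL string literal, including doubled '' escapes.
_STRING_LITERAL = re.compile(r"('(?:[^']|'')*')")
_AUTOINCREMENT = re.compile(
    r"INTEGER\s+PRIMARY\s+KEY\s+AUTOINCREMENT", re.IGNORECASE
)


def translate_sql(sql):
    """Translate a small subset of SQLite syntax to PostgreSQL/psycopg2 syntax."""
    # Splitting on a capturing group keeps the literals at the odd indexes.
    parts = _STRING_LITERAL.split(sql)
    for i in range(0, len(parts), 2):
        # Even indexes are the SQL text outside string literals.
        chunk = parts[i].replace("?", "%s")
        parts[i] = _AUTOINCREMENT.sub("SERIAL PRIMARY KEY", chunk)
    return "".join(parts)


print(translate_sql("SELECT * FROM users WHERE name = ?"))
print(translate_sql("CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT)"))
```

A real adapter has more to handle (literal `%` characters, comments, quoted identifiers, other dialect differences), which is one reason to stay inside the SQLite subset the codebase already uses.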
**Key implications for development:**
- Write SQL using SQLite syntax (`?` placeholders, `INTEGER PRIMARY KEY AUTOINCREMENT`) — pg_adapter translates automatically
- All `sqlite3` imports work normally — they're intercepted by the adapter
- `db_path` parameters in constructors are ignored (all connections go to PostgreSQL)
- `for_write=True` in `get_connection()` is important for PostgreSQL transaction handling
- Direct `psql` access: `psql -U media_downloader -d media_downloader`
**Important:** Do NOT use PostgreSQL-specific SQL syntax in the codebase. Always use SQLite-compatible syntax and let pg_adapter handle translation. This maintains backward compatibility.
**When running direct `psql` commands:** Use native PostgreSQL syntax (`INSERT ... ON CONFLICT DO NOTHING`, `SERIAL`, `RETURNING`, etc.) and write values as literals, since `psql` accepts neither `?` nor `%s` placeholders (`%s` is the psycopg2 style that pg_adapter emits). The pg_adapter translation ONLY applies to Python code.
### Settings Storage
Configuration is stored in the PostgreSQL `settings` table, managed by `modules/settings_manager.py`:
- `settings_manager.get_all()` returns nested dict of all settings
- `settings_manager.set(key, value, category)` stores with type information
- Supports dot notation for nested keys
- Settings are cached and synced via the API at `/api/config`
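Dot notation for nested keys might work along these lines. A sketch under assumptions: `nest_settings` and `get_by_path` are hypothetical helpers and the setting names are invented; the real `settings_manager` may behave differently.

```python
def nest_settings(flat):
    """Turn {"a.b.c": value} style keys into a nested dict."""
    nested = {}
    for key, value in flat.items():
        *parents, leaf = key.split(".")
        node = nested
        for part in parents:
            # Create intermediate dicts on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


def get_by_path(nested, dotted_key, default=None):
    """Look up a value in a nested dict using a dotted key."""
    node = nested
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


# Invented example settings.
settings = nest_settings({
    "downloads.max_concurrent": 3,
    "downloads.retry.count": 5,
})
print(get_by_path(settings, "downloads.retry.count"))
```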
### General Rules
- Use `unified_database.py` for all database operations
- Always use parameterized queries (never f-strings with user input)
- Use `for_write=True` when writing to database
---
## Known Issues & Technical Debt
### Remaining HIGH Priority Issues
| Issue | Location | Description |
|-------|----------|-------------|
| Missing retry logic | `forum_db_adapter.py` | Read queries fail on database lock |
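One possible shape for the missing retry logic is a small wrapper with exponential backoff. This is a sketch only, not a patch for `forum_db_adapter.py`: `retry_on_lock` is a hypothetical helper, and it assumes a lock surfaces as `sqlite3.OperationalError` with "locked" in the message, which may not hold once pg_adapter has replaced `sqlite3`.

```python
import sqlite3
import time


def retry_on_lock(operation, attempts=3, base_delay=0.05):
    """Call operation(), retrying with backoff while the database is locked."""
    for attempt in range(attempts):
        try:
            return operation()
        except sqlite3.OperationalError as exc:
            is_last_attempt = attempt == attempts - 1
            if "locked" not in str(exc).lower() or is_last_attempt:
                raise
            # Wait 1x, 2x, 4x ... the base delay before trying again.
            time.sleep(base_delay * (2 ** attempt))


# Simulated read that fails twice with a lock error, then succeeds.
calls = {"count": 0}


def flaky_read():
    calls["count"] += 1
    if calls["count"] < 3:
        raise sqlite3.OperationalError("database is locked")
    return ["row"]


result = retry_on_lock(flaky_read, attempts=3, base_delay=0.01)
```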
### Remaining MEDIUM Priority Issues
| Issue | Location | Description |
|-------|----------|-------------|
| 46x `as any` assertions | Frontend | Reduces TypeScript safety |
| WebSocket token in URL | `api.ts:2418-2423` | Security concern |
### Fixed Issues (2025-01-04)
| Issue | Fix Applied |
|-------|-------------|
| Duplicate auth dependencies | Consolidated to `core/dependencies.py`, removed from `api.py` |
| Direct sqlite3 usage for main DB | Changed to `app_state.db.get_connection()` in `media.py` |
| Forum wrapper missing signal handlers | Added `setup_signal_handlers()` and `set_database_reference()` |
| Missing admin check on batch_move | Changed to `require_admin` dependency |
| Duplicate SQL filter constants | Extracted `MEDIA_FILTERS` to `core/utils.py` |
| Logger.log() calls | Changed to `logger.debug()` in `media.py` |
### Code Patterns to Follow
**Database access:**
```python
# Use context manager with for_write flag
with app_state.db.get_connection(for_write=True) as conn:
    cursor = conn.cursor()
    cursor.execute("INSERT INTO ...", (params,))
```
**Error handling:**
```python
# Use the handle_exceptions decorator
from core.exceptions import handle_exceptions

@router.get("/endpoint")
@handle_exceptions("OperationName")
async def endpoint():
    ...
```
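To make the decorator pattern concrete, here is one way a decorator of this kind could be written. It is an illustration only: the behavior of the real `handle_exceptions` in `core/exceptions.py` is not described here, so `handle_exceptions_sketch` and `OperationError` are hypothetical names and the wrap-and-re-raise behavior is an assumption.

```python
import asyncio
import functools


class OperationError(Exception):
    """Hypothetical error type that carries the operation name."""


def handle_exceptions_sketch(operation):
    """Wrap an async endpoint so unexpected errors are reported uniformly."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            try:
                return await func(*args, **kwargs)
            except OperationError:
                # Already in the uniform shape; pass it through unchanged.
                raise
            except Exception as exc:
                raise OperationError(f"{operation} failed: {exc}") from exc
        return wrapper
    return decorator


@handle_exceptions_sketch("DemoOperation")
async def demo(fail):
    if fail:
        raise ValueError("boom")
    return "ok"


print(asyncio.run(demo(False)))
```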
**Logging:**
```python
# Use logger methods, not logger.log()
logger.debug("Message", module="ModuleName")
logger.error("Error message", module="ModuleName")
```
---
## API Endpoints Summary
The backend has **21 routers** providing **150+ endpoints**:
| Router | Prefix | Key Endpoints |
|--------|--------|---------------|
| `auth` | `/api/auth` | Login, logout, 2FA, preferences |
| `downloads` | `/api/downloads` | List, search, analytics, filters |
| `media` | `/api/media` | Gallery, batch ops, thumbnails |
| `recycle` | `/api/recycle` | Recycle bin management |
| `review` | `/api/review` | Review queue for new content |
| `face` | `/api/face` | Face recognition, references |
| `video` | `/api/video` | Video streaming, info |
| `video_queue` | `/api/video-queue` | Download queue management |
| `scheduler` | `/api/scheduler` | Task scheduling, status |
| `platforms` | `/api/platforms` | Platform configs, triggers |
| `config` | `/api/config` | Application settings |
| `celebrity` | `/api/celebrity` | Celebrity discovery |
| `appearances` | `/api/appearances` | TV/podcast appearances |
| `semantic` | `/api/semantic` | Semantic search |
| `discovery` | `/api` | Smart folders, timeline |
| `stats` | `/api` | Dashboard stats, errors |
| `health` | `/api/health` | System health checks |
| `maintenance` | `/api/maintenance` | Cleanup operations |
| `scrapers` | `/api/scrapers` | Scraper configurations |
| `manual_import` | `/api/manual-import` | Manual file imports |
| `files` | `/files` | File serving |
---
## Remember
These rules exist because I repeatedly violated the user's trust. The deny list is a technical enforcement because my promises were not sufficient. Always ask before restarting the scheduler, and never touch authentication code.