# CLAUDE.md - Media Downloader Project Rules

## PROJECT LOCATION: `/opt/media-downloader/`
**All project files live under `/opt/media-downloader/`.** Always use this as the base path.

> **TL;DR - READ EVERY SESSION:**
> 1. **NEVER restart `media-downloader` service** - it's BLOCKED. ASK USER FIRST.
> 2. **NEVER instantiate AuthManager()** - it resets admin password.
> 3. **ALWAYS use parameterized SQL** - never f-strings with user input.
> 4. **Use `scripts/get-api-token.sh`** for API testing, not AuthManager.

---

## CRITICAL RULES - MUST FOLLOW

### RULE 1: NEVER RESTART THE SCHEDULER WITHOUT PERMISSION

**The `media-downloader` scheduler service is BLOCKED via deny list.**

```
BLOCKED COMMANDS (will be rejected):
- systemctl restart media-downloader
- sudo systemctl restart media-downloader
```

**YOU MUST ASK THE USER** before restarting the scheduler. It runs long-running tasks (downloads, scans, face recognition, cache building). Restarting kills those tasks.

**Why this rule exists:** On 2025-12-13, I restarted the scheduler without permission WHILE IT WAS RUNNING A SCAN. This killed the running scan and lost the user's work. The deny list is a technical enforcement because my promises were not sufficient.

### RULE 2: NEVER TOUCH AUTHENTICATION

**NEVER instantiate AuthManager directly.**
**NEVER use the `AuthManager()` constructor - it has side effects that can reset the admin password.**
**NEVER run any code that could modify user passwords, accounts, or authentication data.**

**Why this rule exists:** On 2025-12-13, I carelessly instantiated AuthManager to generate a test token, which triggered its constructor's `_create_default_user()` method and RESET THE ADMIN PASSWORD.

### RULE 3: USE PARAMETERIZED SQL QUERIES

**NEVER use f-strings or string concatenation with user input in SQL queries.**
**ALWAYS use parameterized queries with `?` placeholders.**

```python
# WRONG - SQL Injection risk
cursor.execute(f"SELECT * FROM users WHERE name = '{username}'")

# CORRECT - Parameterized query
cursor.execute("SELECT * FROM users WHERE name = ?", (username,))
```
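
To see why the f-string form is dangerous, here is a minimal, self-contained sketch that runs against a throwaway in-memory SQLite database. The `users` table contents and the `find_user_unsafe` / `find_user_safe` helpers are hypothetical and exist only for this illustration; they are not part of the project.

```python
import sqlite3

def make_demo_db():
    """Create a throwaway in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users (name, role) VALUES (?, ?)",
        [("alice", "admin"), ("bob", "viewer")],
    )
    return conn

def find_user_unsafe(conn, username):
    # WRONG: the input becomes part of the SQL text itself
    return conn.execute(
        f"SELECT name FROM users WHERE name = '{username}'"
    ).fetchall()

def find_user_safe(conn, username):
    # CORRECT: the input is passed separately and treated as a plain value
    return conn.execute(
        "SELECT name FROM users WHERE name = ?", (username,)
    ).fetchall()

conn = make_demo_db()
payload = "nobody' OR '1'='1"
unsafe_rows = find_user_unsafe(conn, payload)  # the OR clause matches every row
safe_rows = find_user_safe(conn, payload)      # no user has this literal name
```

With the crafted `payload`, the unsafe query returns both users while the parameterized query returns nothing, which is the behavior the rule is meant to guarantee.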

### RULE 4: USE DATABASE CONTEXT MANAGERS

**ALWAYS use the context manager pattern for database connections.**
**Use `for_write=True` when performing write operations.**

```python
# CORRECT pattern
with db.get_connection(for_write=True) as conn:
    cursor = conn.cursor()
    cursor.execute(...)
    # commit is automatic via context manager
```
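
For readers unfamiliar with the pattern, the following is a rough sketch of how a connection context manager with a `for_write` flag could behave: commit on success, roll back on error, and always close. It is not the project's `get_connection()` implementation. The `DemoDatabase` class is hypothetical and uses a temporary SQLite file purely so the example runs on its own.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager

class DemoDatabase:
    """Hypothetical stand-in for the project's database layer."""

    def __init__(self, path):
        self.path = path

    @contextmanager
    def get_connection(self, for_write=False):
        conn = sqlite3.connect(self.path)
        try:
            yield conn
            if for_write:
                conn.commit()    # writes are committed only if the block succeeded
        except Exception:
            conn.rollback()      # any error discards the partial write
            raise
        finally:
            conn.close()         # the connection is released in every case

fd, db_path = tempfile.mkstemp(suffix=".db")
os.close(fd)
db = DemoDatabase(db_path)

with db.get_connection(for_write=True) as conn:
    conn.execute("CREATE TABLE items (name TEXT)")
    conn.execute("INSERT INTO items (name) VALUES (?)", ("kept",))

try:
    with db.get_connection(for_write=True) as conn:
        conn.execute("INSERT INTO items (name) VALUES (?)", ("discarded",))
        raise RuntimeError("simulated failure mid-write")
except RuntimeError:
    pass

with db.get_connection() as conn:
    names = [row[0] for row in conn.execute("SELECT name FROM items")]

os.remove(db_path)
```

Only the row from the block that completed normally survives, which is why manual `commit()` calls are unnecessary when the context manager is used.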

### For API Testing Tokens

Use the helper scripts:
```bash
# Get a fresh token and save to /tmp/api_token.txt
/opt/media-downloader/scripts/get-api-token.sh

# Make authenticated API calls (token loaded from /tmp/api_token.txt)
/opt/media-downloader/scripts/api-call.sh "/api/video-queue?limit=2"
/opt/media-downloader/scripts/api-call.sh "/api/health"
```

**claude_test credentials:**
- Username: `claude_test`
- Password: `ClaudeTest2025Secure`
- Database: PostgreSQL `media_downloader`

---

## Services Reference

### Active Systemd Services

| Service | Port | Command | Description |
|---------|------|---------|-------------|
| `media-downloader` | - | **BLOCKED** | **SCHEDULER** - Background task runner. **REQUIRES PERMISSION TO RESTART** |
| `media-downloader-api` | 8000 | `systemctl restart media-downloader-api` | **API** - FastAPI backend. Safe to restart. |
| `media-downloader-frontend` | 3000 | `systemctl restart media-downloader-frontend` | **FRONTEND** - Vite dev server. Usually auto-reloads. |
| `xvfb-media-downloader` | - | `systemctl restart xvfb-media-downloader` | **VIRTUAL DISPLAY** - Xvfb for headless browser. |
| `nginx` | 80/443 | `systemctl restart nginx` | **REVERSE PROXY** - Routes requests to backend/frontend. |
| `redis-server` | 6379 | `systemctl restart redis-server` | **CACHE** - Session storage and caching. |

### Active Docker Containers

| Container | Port | Description |
|-----------|------|-------------|
| `flaresolverr` | 8191 | **CLOUDFLARE BYPASS** - Solves Cloudflare challenges |
| `immich_server` | 2283 | **PHOTO MANAGEMENT** - Immich photo server |
| `immich_machine_learning` | - | **ML** - Immich machine learning backend |
| `immich_redis` | 6379 (internal) | **CACHE** - Redis for Immich |
| `immich_postgres` | 5432 (internal) | **DATABASE** - PostgreSQL for Immich |
| `immich_power_tools` | 8001 | **TOOLS** - Immich power tools |

### Unified Proxy (Docker)

The frontend is served through the `unified-proxy` Docker container, which caches assets.

**After frontend changes, ALWAYS clear cache and reload:**
```bash
# Clear nginx cache and reload
docker exec unified-proxy sh -c "rm -rf /var/cache/* 2>/dev/null; nginx -s reload"
```

**Or restart the container entirely:**
```bash
docker restart unified-proxy
```

### When to Restart Which Service

- **Python backend changes** (`/web/backend/*.py`, `/modules/*.py`): Restart `media-downloader-api`
- **Scheduler/module changes** (`scheduler.py`, download modules): Restart `media-downloader` **(ASK FIRST!)**
- **Frontend changes** (`/web/frontend/src/*`):
  1. Build: `cd /opt/media-downloader/web/frontend && npm run build`
  2. Clear proxy cache: `docker exec unified-proxy sh -c "rm -rf /var/cache/* 2>/dev/null; nginx -s reload"`

### Disabled Services (Run Manually)

| Service | Description |
|---------|-------------|
| `media-cache-builder` | Builds media cache/indexes |
| `media-celebrity-enrichment` | Enriches celebrity data |
| `media-embedding-generator` | Generates content embeddings |
| `media-downloader-db-cleanup` | Database cleanup tasks |

---

## Project Structure

```
/opt/media-downloader/
├── media-downloader.py            # Main scheduler script (~4000 lines)
├── modules/                       # Python modules (40+ files)
│   ├── unified_database.py        # Database layer (5194 lines)
│   ├── scheduler.py               # Task scheduler (1977 lines)
│   ├── settings_manager.py        # Configuration (8401 lines)
│   ├── pg_adapter.py              # SQLite → PostgreSQL transparency layer
│   ├── db_bootstrap.py            # Database backend initialization
│   ├── cloudflare_handler.py      # Cloudflare bypass (25804 lines)
│   ├── snapchat_client_module.py  # Snapchat via direct HTTP (no Playwright)
│   ├── discovery_system.py        # Content discovery (37895 lines)
│   ├── semantic_search.py         # CLIP embeddings (25713 lines)
│   └── ...
├── web/
│   ├── backend/                   # FastAPI backend (port 8000)
│   │   ├── api.py                 # Main FastAPI app
│   │   ├── routers/               # API endpoints (21 routers)
│   │   │   ├── auth.py            # Authentication
│   │   │   ├── downloads.py       # Download management
│   │   │   ├── media.py           # Media operations
│   │   │   ├── video.py           # Video streaming
│   │   │   ├── face.py            # Face recognition
│   │   │   ├── celebrity.py       # Celebrity discovery
│   │   │   ├── appearances.py     # TV/Podcast appearances
│   │   │   └── ...
│   │   ├── core/                  # Core utilities
│   │   │   ├── dependencies.py
│   │   │   ├── exceptions.py
│   │   │   └── responses.py
│   │   ├── auth_manager.py        # DO NOT INSTANTIATE
│   │   └── models/                # API models
│   └── frontend/                  # React/TypeScript frontend
│       └── src/
│           ├── lib/api.ts         # API client (2536 lines)
│           ├── pages/             # React pages (27 pages)
│           └── components/        # UI components
├── wrappers/                      # Subprocess wrappers (8 files)
│   ├── base_subprocess_wrapper.py
│   ├── imginn_subprocess_wrapper.py
│   ├── fastdl_subprocess_wrapper.py
│   ├── instagram_client_subprocess_wrapper.py
│   ├── toolzu_subprocess_wrapper.py
│   ├── snapchat_subprocess_wrapper.py
│   ├── snapchat_client_subprocess_wrapper.py
│   └── forum_subprocess_wrapper.py
├── scripts/                       # Helper scripts
│   ├── get-api-token.sh           # Get API token safely
│   └── api-call.sh                # Make API calls
├── database/                      # Legacy SQLite databases (PostgreSQL is primary)
│   ├── media_downloader.db        # Legacy - migrated to PostgreSQL
│   └── auth.db                    # Legacy - migrated to PostgreSQL
└── cookies/                       # Session cookies
```

---

## Build & Test Commands

```bash
# Frontend build
cd /opt/media-downloader/web/frontend && npm run build

# Python syntax check
python3 -m py_compile /opt/media-downloader/media-downloader.py

# Check all Python files
for f in /opt/media-downloader/modules/*.py; do python3 -m py_compile "$f"; done
for f in /opt/media-downloader/web/backend/routers/*.py; do python3 -m py_compile "$f"; done

# TypeScript check
cd /opt/media-downloader/web/frontend && npx tsc --noEmit

# View logs
journalctl -u media-downloader-api -f
journalctl -u media-downloader -f
```

---

## Database

- **Backend:** PostgreSQL (database: `media_downloader`, user: `media_downloader`)
- **Connection:** `postgresql://media_downloader:PNsihOXvvuPwWiIvGlsc9Fh2YmMmB@localhost/media_downloader`
- **Environment:** Controlled by `DATABASE_BACKEND=postgresql` and `DATABASE_URL` in `/opt/media-downloader/.env`
- **Legacy SQLite:** `database/media_downloader.db` and `database/auth.db` (no longer written to)

### pg_adapter (SQLite → PostgreSQL Transparency Layer)

The entire codebase was originally written for SQLite. Rather than rewriting every database call, `modules/pg_adapter.py` monkey-patches Python's `sqlite3` module so all existing SQLite code transparently uses PostgreSQL.

**How it works:**
1. `modules/db_bootstrap.py` loads `.env` and checks `DATABASE_BACKEND`
2. If `postgresql`, it replaces `sys.modules['sqlite3']` with `pg_adapter`
3. All `sqlite3.connect()` calls are intercepted and routed to PostgreSQL via `psycopg2`
4. SQL syntax is auto-translated: `?` → `%s` placeholders, `AUTOINCREMENT` → `SERIAL`, etc.
5. Uses `psycopg2.pool.ThreadedConnectionPool` for connection pooling
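
The syntax translation in step 4 can be pictured roughly as follows. This is an illustrative sketch only, not the code in `modules/pg_adapter.py`: the function name `translate_sql` is hypothetical, it covers just the two rewrites named above, and it ignores edge cases (such as literal `%` characters) that a real adapter would need to handle.

```python
import re

def translate_sql(sql):
    """Translate a small subset of SQLite syntax to PostgreSQL (sketch).

    - `?` placeholders become `%s`, except inside single-quoted strings
    - `INTEGER PRIMARY KEY AUTOINCREMENT` becomes `SERIAL PRIMARY KEY`
    """
    out = []
    in_string = False
    for ch in sql:
        if ch == "'":
            # An escaped quote ('') toggles twice, so the state stays correct
            in_string = not in_string
            out.append(ch)
        elif ch == "?" and not in_string:
            out.append("%s")
        else:
            out.append(ch)
    translated = "".join(out)
    return re.sub(
        r"INTEGER\s+PRIMARY\s+KEY\s+AUTOINCREMENT",
        "SERIAL PRIMARY KEY",
        translated,
        flags=re.IGNORECASE,
    )

query = translate_sql("SELECT * FROM users WHERE name = ?")
ddl = translate_sql(
    "CREATE TABLE t (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)
```

Here `query` ends in `name = %s` and `ddl` declares `id SERIAL PRIMARY KEY`, while a `?` inside a quoted string literal is left untouched.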

**Key implications for development:**
- Write SQL using SQLite syntax (`?` placeholders, `INTEGER PRIMARY KEY AUTOINCREMENT`); pg_adapter translates it automatically
- All `sqlite3` imports work normally because they are intercepted by the adapter
- `db_path` parameters in constructors are ignored (all connections go to PostgreSQL)
- `for_write=True` in `get_connection()` is important for PostgreSQL transaction handling
- Direct `psql` access: `psql -U media_downloader -d media_downloader`

**Important:** Do NOT use PostgreSQL-specific SQL syntax in the codebase. Always use SQLite-compatible syntax and let pg_adapter handle translation. This maintains backward compatibility.

**When running direct `psql` commands:** Use native PostgreSQL syntax (`INSERT ... ON CONFLICT DO NOTHING`, `SERIAL`, `RETURNING`, etc.) and write values inline. `psql` understands neither the `?` placeholders used in the codebase nor psycopg2's `%s` placeholders. The pg_adapter translation ONLY applies to Python code.

### Settings Storage

Configuration is stored in the PostgreSQL `settings` table, managed by `modules/settings_manager.py`:
- `settings_manager.get_all()` returns nested dict of all settings
- `settings_manager.set(key, value, category)` stores with type information
- Supports dot notation for nested keys
- Settings are cached and synced via the API at `/api/config`
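
Dot notation for nested keys can be pictured as follows. This sketch only illustrates the idea and is not the `settings_manager` implementation; `set_nested` and `get_nested` are hypothetical helpers, and the sample keys are made up.

```python
def set_nested(settings, dotted_key, value):
    """Store value under a dotted key such as 'downloads.retries'."""
    parts = dotted_key.split(".")
    node = settings
    for part in parts[:-1]:
        node = node.setdefault(part, {})  # create intermediate dicts as needed
    node[parts[-1]] = value

def get_nested(settings, dotted_key, default=None):
    """Look up a dotted key, returning default if any segment is missing."""
    node = settings
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node

settings = {}
set_nested(settings, "downloads.retries", 3)
set_nested(settings, "downloads.timeout_seconds", 30)
retries = get_nested(settings, "downloads.retries")
missing = get_nested(settings, "downloads.unknown", default="n/a")
```

After the two `set_nested` calls, `settings` holds a single `downloads` dict with both values, which matches the nested shape that `get_all()` is described as returning.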

### General Rules
- Use `unified_database.py` for all database operations
- Always use parameterized queries (never f-strings with user input)
- Use `for_write=True` when writing to database

---

## Known Issues & Technical Debt

### Remaining HIGH Priority Issues

| Issue | Location | Description |
|-------|----------|-------------|
| Missing retry logic | `forum_db_adapter.py` | Read queries fail on database lock |
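
One possible shape for the missing retry logic is a small decorator with exponential backoff. This is a sketch under assumptions, not a fix taken from the codebase: `retry_on_lock` is a hypothetical name, and catching `sqlite3.OperationalError` with a "locked" message is an assumption about how the failure surfaces in `forum_db_adapter.py`.

```python
import functools
import sqlite3
import time

def retry_on_lock(max_attempts=3, base_delay=0.1):
    """Retry a function when the database reports a lock (hypothetical helper)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except sqlite3.OperationalError as exc:
                    # Only lock errors are retried; anything else propagates,
                    # as does the lock error itself on the final attempt
                    if "locked" not in str(exc) or attempt == max_attempts:
                        raise
                    time.sleep(base_delay * 2 ** (attempt - 1))
        return wrapper
    return decorator

calls = {"count": 0}

@retry_on_lock(max_attempts=3, base_delay=0.01)
def flaky_read():
    # Simulates a read that hits a lock twice before succeeding
    calls["count"] += 1
    if calls["count"] < 3:
        raise sqlite3.OperationalError("database is locked")
    return ["row"]

result = flaky_read()
```

The simulated read succeeds on its third attempt. The exception type and message check would need to be adjusted to whatever the adapter actually raises.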

### Remaining MEDIUM Priority Issues

| Issue | Location | Description |
|-------|----------|-------------|
| 46x `as any` assertions | Frontend | Reduces TypeScript safety |
| WebSocket token in URL | `api.ts:2418-2423` | Security concern |

### Fixed Issues (2025-01-04)

| Issue | Fix Applied |
|-------|-------------|
| Duplicate auth dependencies | Consolidated to `core/dependencies.py`, removed from `api.py` |
| Direct sqlite3 usage for main DB | Changed to `app_state.db.get_connection()` in `media.py` |
| Forum wrapper missing signal handlers | Added `setup_signal_handlers()` and `set_database_reference()` |
| Missing admin check on batch_move | Changed to `require_admin` dependency |
| Duplicate SQL filter constants | Extracted `MEDIA_FILTERS` to `core/utils.py` |
| Logger.log() calls | Changed to `logger.debug()` in `media.py` |

### Code Patterns to Follow

**Database access:**
```python
# Use context manager with for_write flag
with app_state.db.get_connection(for_write=True) as conn:
    cursor = conn.cursor()
    cursor.execute("INSERT INTO ...", (params,))
```

**Error handling:**
```python
# Use the handle_exceptions decorator
from core.exceptions import handle_exceptions

@router.get("/endpoint")
@handle_exceptions("OperationName")
async def endpoint():
    ...
```

**Logging:**
```python
# Use logger methods, not logger.log()
logger.debug("Message", module="ModuleName")
logger.error("Error message", module="ModuleName")
```

---

## API Endpoints Summary

The backend has **21 routers** providing **150+ endpoints**:

| Router | Prefix | Key Endpoints |
|--------|--------|---------------|
| `auth` | `/api/auth` | Login, logout, 2FA, preferences |
| `downloads` | `/api/downloads` | List, search, analytics, filters |
| `media` | `/api/media` | Gallery, batch ops, thumbnails |
| `recycle` | `/api/recycle` | Recycle bin management |
| `review` | `/api/review` | Review queue for new content |
| `face` | `/api/face` | Face recognition, references |
| `video` | `/api/video` | Video streaming, info |
| `video_queue` | `/api/video-queue` | Download queue management |
| `scheduler` | `/api/scheduler` | Task scheduling, status |
| `platforms` | `/api/platforms` | Platform configs, triggers |
| `config` | `/api/config` | Application settings |
| `celebrity` | `/api/celebrity` | Celebrity discovery |
| `appearances` | `/api/appearances` | TV/podcast appearances |
| `semantic` | `/api/semantic` | Semantic search |
| `discovery` | `/api` | Smart folders, timeline |
| `stats` | `/api` | Dashboard stats, errors |
| `health` | `/api/health` | System health checks |
| `maintenance` | `/api/maintenance` | Cleanup operations |
| `scrapers` | `/api/scrapers` | Scraper configurations |
| `manual_import` | `/api/manual-import` | Manual file imports |
| `files` | `/files` | File serving |

---

## Remember

These rules exist because I repeatedly violated the user's trust. The deny list is a technical enforcement because my promises were not sufficient. Always ask before restarting the scheduler, and never touch authentication code.