Initial commit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 22:42:55 -04:00
commit 0d7b2b1aab
389 changed files with 280296 additions and 0 deletions
--- a/docs/TECHNICAL_DEBT_ANALYSIS.md
+++ b/docs/TECHNICAL_DEBT_ANALYSIS.md
@@ -0,0 +1,591 @@
+# Technical Debt Analysis & Immediate Improvements
+**Date:** 2025-10-31
+**Version:** 6.3.6
+**Analyst:** Automated Code Review
+
+---
+
+## Executive Summary
+
+This document identifies technical debt, code smells, and immediate improvement opportunities in the Media Downloader codebase.
+
+---
+
+## Critical Technical Debt
+
+### 1. Monolithic API File (2,649 lines)
+**File:** `/opt/media-downloader/web/backend/api.py`
+**Severity:** HIGH
+**Impact:** Maintainability, Testing, Code Review
+
+**Current State:**
+- Single file contains all API endpoints
+- 50+ routes in one file
+- Multiple responsibilities (auth, downloads, media, scheduler, config)
+- Difficult to test individual components
+- High cognitive load for developers
+
+**Recommendation:**
+Refactor into modular structure:
+```
+web/backend/
+├── main.py (app initialization, 100-150 lines)
+├── routers/
+│   ├── auth.py (authentication endpoints)
+│   ├── downloads.py (download management)
+│   ├── media.py (media serving)
+│   ├── scheduler.py (scheduler management)
+│   ├── platforms.py (platform configuration)
+│   └── health.py (health & monitoring)
+├── services/
+│   ├── download_service.py (business logic)
+│   ├── media_service.py (media processing)
+│   └── scheduler_service.py (scheduling logic)
+└── models/
+    ├── requests.py (Pydantic request models)
+    └── responses.py (Pydantic response models)
+```
+
+**Effort:** 16-24 hours
+**Priority:** HIGH
+**Benefits:**
+- Easier to test individual routers
+- Better separation of concerns
+- Reduced merge conflicts
+- Faster development velocity
+
+---
+
+### 2. Large Module Files
+**Severity:** HIGH
+**Impact:** Maintainability
+
+**Problem Files:**
+- `modules/forum_downloader.py` (3,971 lines)
+- `modules/imginn_module.py` (2,542 lines)
+- `media-downloader.py` (2,653 lines)
+
+**Common Issues:**
+- God objects (classes doing too much)
+- Long methods (100+ lines)
+- Deep nesting (5+ levels)
+- Code duplication
+- Difficult to unit test
+
+**Recommendations:**
+
+#### Forum Downloader Refactoring:
+```
+modules/forum/
+├── __init__.py
+├── base.py (base forum class)
+├── authentication.py (login, 2FA)
+├── thread_parser.py (HTML parsing)
+├── image_extractor.py (image extraction)
+├── download_manager.py (download logic)
+└── sites/
+    ├── hqcelebcorner.py (site-specific)
+    └── picturepub.py (site-specific)
+```
+
+#### Instagram Module Refactoring:
+```
+modules/instagram/
+├── __init__.py
+├── base_instagram.py (shared logic)
+├── fastdl.py (FastDL implementation)
+├── imginn.py (ImgInn implementation)
+├── toolzu.py (Toolzu implementation)
+├── cookie_manager.py (cookie handling)
+├── flaresolverr.py (FlareSolverr integration)
+└── content_parser.py (HTML parsing)
+```
+
+**Effort:** 32-48 hours
+**Priority:** MEDIUM
+
+---
+
+### 3. Code Duplication in Instagram Modules
+**Severity:** MEDIUM
+**Impact:** Maintainability, Bug Fixes
+
+**Duplication Analysis:**
+- `fastdl_module.py`, `imginn_module.py`, and `toolzu_module.py` share 60-70% of their code
+- Cookie management duplicated 3x
+- FlareSolverr integration duplicated 3x
+- HTML parsing logic duplicated 3x
+- Download logic is nearly identical across the three modules
+
+**Example Duplication:**
+```python
+# Appears in 3 files with minor variations
+def _get_flaresolverr_session(self):
+    response = requests.post(
+        f"{self.flaresolverr_url}/v1/sessions/create",
+        json={"maxTimeout": 60000}
+    )
+    if response.status_code == 200:
+        return response.json()['solution']['sessionId']
+```
+
+**Solution:** Create base class with shared logic
+```python
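+# modules/instagram/base_instagram.py
+from abc import ABC, abstractmethod
+from typing import Dict, List
+
+# CookieManager and FlareSolverrClient are the proposed shared helpers
+# (cookie_manager.py and flaresolverr.py in the layout above).
+class BaseInstagramDownloader(ABC):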
+    """Base class for Instagram-like services"""
+
+    def __init__(self, config, unified_db):
+        self.config = config
+        self.unified_db = unified_db
+        self.cookie_manager = CookieManager(config.get('cookie_file'))
+        self.flaresolverr = FlareSolverrClient(config.get('flaresolverr_url'))
+
+    def _get_or_create_session(self):
+        """Shared session management logic"""
+        # Common implementation
+
+    def _parse_stories(self, html: str) -> List[Dict]:
+        """Shared HTML parsing logic"""
+        # Common implementation
+
+    @abstractmethod
+    def _get_content_urls(self, username: str) -> List[str]:
+        """Platform-specific URL extraction"""
+        pass
+```
+
+**Effort:** 12-16 hours
+**Priority:** MEDIUM
+**Benefits:**
+- A bug fixed once is fixed in all modules
+- Easier to add new Instagram-like platforms
+- Less code to maintain
+- Consistent behavior
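+
+As a rough illustration of how a concrete module might plug into such a base class, the sketch below uses a simplified stand-in base (`BaseDownloader`) and a hypothetical `FastDLDownloader`. The class names, the `collect` method, the placeholder URLs, and the `base_url` config key are assumptions made for the example, not the project's code.
+
+```python
+from abc import ABC, abstractmethod
+from typing import Dict, List
+
+
+class BaseDownloader(ABC):
+    """Simplified stand-in for the proposed base class (no cookies or FlareSolverr)."""
+
+    def __init__(self, config: Dict):
+        self.config = config
+
+    def collect(self, username: str) -> List[str]:
+        """Shared workflow: fetch platform URLs, then drop duplicates in order."""
+        urls = self._get_content_urls(username)
+        return list(dict.fromkeys(urls))
+
+    @abstractmethod
+    def _get_content_urls(self, username: str) -> List[str]:
+        """Platform-specific URL extraction."""
+
+
+class FastDLDownloader(BaseDownloader):
+    """Hypothetical subclass; only the platform-specific part lives here."""
+
+    def _get_content_urls(self, username: str) -> List[str]:
+        # Placeholder data standing in for real scraping logic
+        base = self.config.get("base_url", "https://example.invalid")
+        return [
+            f"{base}/{username}/1.jpg",
+            f"{base}/{username}/1.jpg",
+            f"{base}/{username}/2.jpg",
+        ]
+```
+
+The shared workflow lives in one place, so a fix to `collect` reaches every subclass, while instantiating the base class directly raises `TypeError` because the abstract method is unimplemented.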
+
+---
+
+## Medium Priority Technical Debt
+
+### 4. Inconsistent Logging
+**Severity:** MEDIUM
+**Impact:** Debugging, Monitoring
+
+**Current State:**
+- Mix of `print()`, callbacks, and the `logging` module
+- No structured logging
+- Difficult to filter/search logs
+- No log levels in many places
+- No request IDs for tracing
+
+**Examples:**
+```python
+# Different logging approaches in codebase
+print(f"Downloading {filename}")                          # Style 1
+if self.log_callback:                                     # Style 2
+    self.log_callback(f"[{platform}] {message}", "info")
+logger.info(f"Download complete: {filename}")             # Style 3
+```
+
+**Recommendation:** Standardize on structured logging
+```python
+# modules/structured_logger.py
+import logging
+import json
+from datetime import datetime
+from typing import Dict, Optional
+
+class StructuredLogger:
+    def __init__(self, name: str, context: Optional[Dict] = None):
+        self.logger = logging.getLogger(name)
+        self.context = context or {}
+
+    def log(self, level: str, message: str, **extra):
+        """Log with structured data"""
+        log_entry = {
+            'timestamp': datetime.now().isoformat(),
+            'level': level.upper(),
+            'logger': self.logger.name,
+            'message': message,
+            **self.context,
+            **extra
+        }
+
+        # default=str keeps non-JSON-serializable extras (e.g. Path, datetime) from raising
+        getattr(self.logger, level.lower())(json.dumps(log_entry, default=str))
+
+    def info(self, message: str, **extra):
+        self.log('info', message, **extra)
+
+    def error(self, message: str, **extra):
+        self.log('error', message, **extra)
+
+    def warning(self, message: str, **extra):
+        self.log('warning', message, **extra)
+
+    def with_context(self, **context) -> 'StructuredLogger':
+        """Create logger with additional context"""
+        new_context = {**self.context, **context}
+        return StructuredLogger(self.logger.name, new_context)
+
+# Usage
+logger = StructuredLogger('downloader')
+request_logger = logger.with_context(request_id='abc123', user_id=42)
+
+request_logger.info('Starting download',
+    platform='instagram',
+    username='testuser',
+    content_type='stories'
+)
+# Output (only emitted if the 'downloader' logger is enabled for INFO):
+#   {"timestamp": "2025-10-31T13:00:00", "level": "INFO",
+#    "logger": "downloader", "message": "Starting download",
+#    "request_id": "abc123", "user_id": 42, "platform": "instagram", ...}
+```
+
+**Effort:** 8-12 hours
+**Priority:** MEDIUM
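+
+During a migration, existing modules that call `log_callback(message, level)` could keep working if the callback simply forwards to the `logging` module. The following is a minimal sketch under that assumption; `make_log_callback` is a hypothetical helper name, and the fallback to INFO for unknown levels is a design choice made for the example.
+
+```python
+import logging
+from typing import Callable
+
+
+def make_log_callback(logger: logging.Logger) -> Callable[[str, str], None]:
+    """Return a callback with the legacy (message, level) signature that forwards to `logger`."""
+
+    def log_callback(message: str, level: str = "info") -> None:
+        # getLevelName maps a registered name such as "WARNING" to its number;
+        # unknown names come back as a string, so fall back to INFO
+        levelno = logging.getLevelName(level.upper())
+        if not isinstance(levelno, int):
+            levelno = logging.INFO
+        logger.log(levelno, message)
+
+    return log_callback
+```
+
+A module would then receive `make_log_callback(logging.getLogger('downloader'))` in place of its current callback, which lets the three styles converge on one backend before call sites are rewritten.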
+
+---
+
+### 5. Missing Database Migrations System
+**Severity:** MEDIUM
+**Impact:** Deployment, Upgrades
+
+**Current State:**
+- Schema changes via ad-hoc ALTER TABLE statements
+- No version tracking
+- No rollback capability
+- Difficult to deploy across environments
+- Manual schema updates error-prone
+
+**Recommendation:** Implement Alembic migrations
+```bash
+# Install Alembic
+pip install alembic
+
+# Initialize
+alembic init alembic
+
+# Create migration
+alembic revision --autogenerate -m "Add user preferences column"
+
+# Apply migrations
+alembic upgrade head
+
+# Rollback
+alembic downgrade -1
+```
+
+**Migration Example:**
+```python
+# alembic/versions/001_add_user_preferences.py
+from alembic import op
+import sqlalchemy as sa
+
+def upgrade():
+    op.add_column('users', sa.Column('preferences', sa.JSON(), nullable=True))
+    op.create_index('idx_users_username', 'users', ['username'])
+
+def downgrade():
+    op.drop_index('idx_users_username', 'users')
+    op.drop_column('users', 'preferences')
+```
+
+**Effort:** 6-8 hours
+**Priority:** MEDIUM
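+
+To show what version tracking buys over ad-hoc `ALTER TABLE` statements, here is a minimal sketch that does not use Alembic. It relies on SQLite's built-in `PRAGMA user_version` counter and assumes the database is SQLite, which this analysis does not state. The `MIGRATIONS` list and the table and column names are illustrative only.
+
+```python
+import sqlite3
+
+# Ordered list of schema changes; position N upgrades the schema to version N + 1.
+# Table and column names are illustrative only.
+MIGRATIONS = [
+    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL)",
+    "ALTER TABLE users ADD COLUMN preferences TEXT",
+]
+
+
+def migrate(conn: sqlite3.Connection) -> int:
+    """Apply pending migrations and return the resulting schema version."""
+    current = conn.execute("PRAGMA user_version").fetchone()[0]
+    for version, statement in enumerate(MIGRATIONS[current:], start=current + 1):
+        conn.execute(statement)
+        # PRAGMA does not accept bound parameters; version is an int from enumerate
+        conn.execute(f"PRAGMA user_version = {version}")
+        conn.commit()
+    return conn.execute("PRAGMA user_version").fetchone()[0]
+```
+
+Running `migrate` twice is safe because already-applied steps are skipped. Alembic adds what this sketch lacks: downgrade paths, autogeneration, and support for databases other than SQLite.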
+
+---
+
+### 6. No API Documentation (OpenAPI/Swagger)
+**Severity:** MEDIUM
+**Impact:** Integration, Developer Experience
+
+**Current State:**
+- No interactive API documentation
+- No schema validation documentation
+- Difficult for third-party integrations
+- Manual endpoint discovery
+
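+**Solution:** FastAPI already generates an OpenAPI schema automatically; the remaining work is to set app metadata, tag the routes, and write endpoint docstrings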
+```python
+# main.py
+app = FastAPI(
+    title="Media Downloader API",
+    description="Unified media downloading system",
+    version="6.3.6",
+    docs_url="/api/docs",
+    redoc_url="/api/redoc"
+)
+
+# Add tags for organization
+@app.get("/api/downloads", tags=["Downloads"])
+async def get_downloads():
+    """
+    Get list of downloads with filtering.
+
+    Returns:
+        List of download records with metadata
+
+    Raises:
+        401: Unauthorized - Missing or invalid authentication
+        500: Internal Server Error - Database or system error
+    """
+    pass
+```
+
+**Access docs at:**
+- Swagger UI: `http://localhost:8000/api/docs`
+- ReDoc: `http://localhost:8000/api/redoc`
+
+**Effort:** 4-6 hours (adding descriptions, examples)
+**Priority:** MEDIUM
+
+---
+
+## Low Priority Technical Debt
+
+### 7. Frontend Type Safety Gaps
+**Severity:** LOW
+**Impact:** Development Velocity
+
+**Remaining Issues:**
+- Some components still use `any` type
+- API response types not fully typed
+- Props interfaces could be more specific
+- Missing null checks in places
+
+**Solution:** Progressive enhancement with new types file
+```typescript
+// Update components to use types from types/index.ts
+import { Download, Platform, User } from '../types'
+
+interface DownloadListProps {
+  downloads: Download[]
+  onSelect: (download: Download) => void
+  currentUser: User
+}
+
+const DownloadList: React.FC<DownloadListProps> = ({
+  downloads,
+  onSelect,
+  currentUser
+}) => {
+  // Fully typed component body; React.FC requires a JSX element or null to be returned
+  return null
+}
+```
+
+**Effort:** 6-8 hours
+**Priority:** LOW
+
+---
+
+### 8. Hardcoded Configuration Values
+**Severity:** LOW
+**Impact:** Flexibility
+
+**Examples:**
+```python
+# Hardcoded paths
+base_path = Path("/opt/immich/md")
+media_base = Path("/opt/immich/md")
+
+# Hardcoded timeouts
+timeout=10.0
+timeout=30
+
+# Hardcoded limits
+limit: int = 100
+```
+
+**Solution:** Move to configuration
+```python
+# config/defaults.py
+DEFAULTS = {
+    'media_base_path': '/opt/immich/md',
+    'database_timeout': 10.0,
+    'api_timeout': 30.0,
+    'default_page_limit': 100,
+    'max_page_limit': 1000,
+    'thumbnail_size': (300, 300),
+    'cache_ttl': 300
+}
+
+# Usage
+from config import get_config
+config = get_config()
+base_path = Path(config.get('media_base_path'))
+```
+
+**Effort:** 4-6 hours
+**Priority:** LOW
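+
+One possible shape for `get_config` is sketched below: defaults overridden by environment variables. The `MD_` prefix, the `env` parameter, and the type-coercion rule are assumptions made for the example; the real loader might read a file or the database instead.
+
+```python
+import os
+from typing import Any, Dict, Mapping, Optional
+
+DEFAULTS: Dict[str, Any] = {
+    "media_base_path": "/opt/immich/md",
+    "database_timeout": 10.0,
+    "api_timeout": 30.0,
+    "default_page_limit": 100,
+}
+
+
+def get_config(env: Optional[Mapping[str, str]] = None, prefix: str = "MD_") -> Dict[str, Any]:
+    """Return DEFAULTS overridden by environment variables such as MD_API_TIMEOUT."""
+    env = os.environ if env is None else env
+    config = dict(DEFAULTS)
+    for key, default in DEFAULTS.items():
+        raw = env.get(prefix + key.upper())
+        if raw is not None:
+            # Coerce the string to the default's type (float, int, or str)
+            config[key] = type(default)(raw)
+    return config
+```
+
+Passing `env` explicitly keeps the function easy to test without touching the process environment.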
+
+---
+
+## Code Quality Improvements
+
+### 9. Add Pre-commit Hooks
+**Effort:** 2-3 hours
+**Priority:** MEDIUM
+
+**Setup:**
+```yaml
+# .pre-commit-config.yaml
+repos:
+  - repo: https://github.com/psf/black
+    rev: 23.12.1
+    hooks:
+      - id: black
+        language_version: python3.12
+
+  - repo: https://github.com/PyCQA/flake8
+    rev: 7.0.0
+    hooks:
+      - id: flake8
+        args: [--max-line-length=120]
+
+  - repo: https://github.com/pre-commit/mirrors-mypy
+    rev: v1.8.0
+    hooks:
+      - id: mypy
+        # List only the stub packages the project needs; types-all no longer installs cleanly
+        additional_dependencies: [types-requests]
+
+  - repo: https://github.com/pre-commit/mirrors-eslint
+    rev: v8.56.0
+    hooks:
+      - id: eslint
+        files: \.(js|ts|tsx)$
+        types: [file]
+```
+
+**Benefits:**
+- Automatic code formatting
+- Catch errors before commit
+- Enforce code style
+- Prevent bad commits
+
+---
+
+### 10. Add GitHub Actions CI/CD
+**Effort:** 4-6 hours
+**Priority:** MEDIUM
+
+**Workflow:**
+```yaml
+# .github/workflows/ci.yml
+name: CI
+
+on: [push, pull_request]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - uses: actions/setup-python@v4
+        with:
+          python-version: '3.12'
+      - run: pip install -r requirements.txt
+      - run: pytest tests/
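+      # compileall recurses into subdirectories; a bare **/*.py glob does not
+      # recurse in the runner's default bash shell (globstar is off)
+      - run: python -m compileall -q .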
+
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - run: pip install black flake8
+      - run: black --check .
+      - run: flake8 .
+
+  frontend:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v3
+      - uses: actions/setup-node@v3
+      - run: npm install
+      - run: npm run build
+      - run: npm run lint
+```
+
+---
+
+## Immediate Quick Wins (< 2 hours each)
+
+### 1. Add Request ID Tracking
+```python
+import uuid
+from fastapi import Request
+
+@app.middleware("http")
+async def add_request_id(request: Request, call_next):
+    request.state.request_id = str(uuid.uuid4())
+    response = await call_next(request)
+    response.headers["X-Request-ID"] = request.state.request_id
+    return response
+```
+
+### 2. Add Response Time Logging
+```python
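+import logging
+import time
+
+from fastapi import Request
+
+logger = logging.getLogger(__name__)
+
+@app.middleware("http")
+async def log_response_time(request: Request, call_next):
+    start = time.perf_counter()  # monotonic clock, unaffected by system clock changes
+    response = await call_next(request)
+    duration = time.perf_counter() - start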
+    logger.info(f"{request.method} {request.url.path} - {duration:.3f}s")
+    return response
+```
+
+### 3. Add Health Check Versioning
+```python
+import sys
+
+@app.get("/api/health")
+async def health():
+    return {
+        "status": "healthy",
+        "version": "6.3.6",
+        "build_date": "2025-10-31",
+        "python_version": sys.version,
+        "uptime": get_uptime()
+    }
+```
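+
+The health check calls `get_uptime()`, which is not defined in this document. A minimal sketch of such a helper, assuming uptime means seconds since the module holding it was first imported at application startup:
+
+```python
+import time
+
+# Captured once at import; assumes the module is imported at application startup
+_START = time.monotonic()
+
+
+def get_uptime() -> float:
+    """Seconds elapsed since this module was imported."""
+    return time.monotonic() - _START
+```
+
+`time.monotonic()` is used rather than `time.time()` so that system clock adjustments cannot produce a negative or jumping uptime.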
+
+### 4. Add CORS Configuration
+```python
+from fastapi.middleware.cors import CORSMiddleware
+
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["https://your-domain.com"],
+    allow_credentials=True,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+```
+
+### 5. Add Compression Middleware
+```python
+from fastapi.middleware.gzip import GZipMiddleware
+
+app.add_middleware(GZipMiddleware, minimum_size=1000)
+```
+
+---
+
+## Summary
+
+**Total Technical Debt Identified:** 10 major items
+**Estimated Total Effort:** 100-140 hours
+**Recommended Priority Order:**
+
+1. **Immediate (< 2h each):** Quick wins listed above
+2. **Week 1-2 (16-24h):** Refactor api.py into modules
+3. **Week 3-4 (16-24h):** Implement testing suite (not itemized in the sections above)
+4. **Month 2 (32-48h):** Refactor large module files
+5. **Month 3 (30-40h):** Address remaining items
+
+**ROI Analysis:**
+- High ROI: API refactoring, testing suite, logging standardization
+- Medium ROI: Database migrations, code deduplication
+- Low ROI (but important): Type safety, pre-commit hooks
+
+**Next Steps:**
+1. Review and prioritize with team
+2. Create issues for each item
+3. Start with quick wins for immediate impact
+4. Tackle high-impact items in sprints