Technical Debt Analysis & Immediate Improvements

Date: 2025-10-31 Version: 6.3.6 Analyst: Automated Code Review


Executive Summary

This document identifies technical debt, code smells, and immediate improvement opportunities in the Media Downloader codebase.


Critical Technical Debt

1. Monolithic API File (2,649 lines)

File: /opt/media-downloader/web/backend/api.py Severity: HIGH Impact: Maintainability, Testing, Code Review

Current State:

  • Single file contains all API endpoints
  • 50+ routes in one file
  • Multiple responsibilities (auth, downloads, media, scheduler, config)
  • Difficult to test individual components
  • High cognitive load for developers

Recommendation: Refactor into modular structure:

web/backend/
├── main.py (app initialization, 100-150 lines)
├── routers/
│   ├── auth.py (authentication endpoints)
│   ├── downloads.py (download management)
│   ├── media.py (media serving)
│   ├── scheduler.py (scheduler management)
│   ├── platforms.py (platform configuration)
│   └── health.py (health & monitoring)
├── services/
│   ├── download_service.py (business logic)
│   ├── media_service.py (media processing)
│   └── scheduler_service.py (scheduling logic)
└── models/
    ├── requests.py (Pydantic request models)
    └── responses.py (Pydantic response models)

Effort: 16-24 hours Priority: HIGH

Benefits:

  • Easier to test individual routers
  • Better separation of concerns
  • Reduced merge conflicts
  • Faster development velocity
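One practical payoff of this split is that logic moved into services/ can be unit-tested without starting the web application. The following is a minimal sketch, not the project's actual code: DownloadService and DownloadRecord are hypothetical names, and an in-memory list stands in for the database layer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DownloadRecord:
    """Hypothetical record shape; the real schema may differ."""
    id: int
    platform: str
    filename: str


class DownloadService:
    """Business logic only: no FastAPI imports and no request objects."""

    def __init__(self, records: List[DownloadRecord]):
        # In the real service this would wrap the database layer.
        self._records = records

    def list_downloads(
        self, platform: Optional[str] = None, limit: int = 100
    ) -> List[DownloadRecord]:
        """Return up to `limit` records, optionally filtered by platform."""
        if limit < 1:
            raise ValueError("limit must be at least 1")
        matches = [
            r for r in self._records
            if platform is None or r.platform == platform
        ]
        return matches[:limit]


service = DownloadService([
    DownloadRecord(1, "instagram", "a.jpg"),
    DownloadRecord(2, "forum", "b.jpg"),
    DownloadRecord(3, "instagram", "c.jpg"),
])
instagram_ids = [r.id for r in service.list_downloads(platform="instagram")]
```

A router in routers/downloads.py would then only translate HTTP parameters into a call to list_downloads(), which keeps the route handlers thin.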

2. Large Module Files

Severity: HIGH Impact: Maintainability

Problem Files:

  • modules/forum_downloader.py (3,971 lines)
  • modules/imginn_module.py (2,542 lines)
  • media-downloader.py (2,653 lines)

Common Issues:

  • God objects (classes doing too much)
  • Long methods (100+ lines)
  • Deep nesting (5+ levels)
  • Code duplication
  • Difficult to unit test

Recommendations:

Forum Downloader Refactoring:

modules/forum/
├── __init__.py
├── base.py (base forum class)
├── authentication.py (login, 2FA)
├── thread_parser.py (HTML parsing)
├── image_extractor.py (image extraction)
├── download_manager.py (download logic)
└── sites/
    ├── hqcelebcorner.py (site-specific)
    └── picturepub.py (site-specific)

Instagram Module Refactoring:

modules/instagram/
├── __init__.py
├── base_instagram.py (shared logic)
├── fastdl.py (FastDL implementation)
├── imginn.py (ImgInn implementation)
├── toolzu.py (Toolzu implementation)
├── cookie_manager.py (cookie handling)
├── flaresolverr.py (FlareSolverr integration)
└── content_parser.py (HTML parsing)

Effort: 32-48 hours Priority: MEDIUM


3. Code Duplication in Instagram Modules

Severity: MEDIUM Impact: Maintainability, Bug Fixes

Duplication Analysis:

  • fastdl_module.py, imginn_module.py, and toolzu_module.py share 60-70% of their code
  • Cookie management duplicated 3x
  • FlareSolverr integration duplicated 3x
  • HTML parsing logic duplicated 3x
  • Download logic very similar

Example Duplication:

# Appears in 3 files with minor variations
def _get_flaresolverr_session(self):
    response = requests.post(
        f"{self.flaresolverr_url}/v1/sessions/create",
        json={"maxTimeout": 60000}
    )
    if response.status_code == 200:
        return response.json()['solution']['sessionId']

Solution: Create base class with shared logic

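# modules/instagram/base_instagram.py
from abc import ABC, abstractmethod
from typing import Dict, List

# CookieManager and FlareSolverrClient are the proposed helpers from
# cookie_manager.py and flaresolverr.py in the layout above.
from .cookie_manager import CookieManager
from .flaresolverr import FlareSolverrClient

class BaseInstagramDownloader(ABC):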
    """Base class for Instagram-like services"""

    def __init__(self, config, unified_db):
        self.config = config
        self.unified_db = unified_db
        self.cookie_manager = CookieManager(config.get('cookie_file'))
        self.flaresolverr = FlareSolverrClient(config.get('flaresolverr_url'))

    def _get_or_create_session(self):
        """Shared session management logic"""
        # Common implementation

    def _parse_stories(self, html: str) -> List[Dict]:
        """Shared HTML parsing logic"""
        # Common implementation

    @abstractmethod
    def _get_content_urls(self, username: str) -> List[str]:
        """Platform-specific URL extraction"""
        pass

Effort: 12-16 hours Priority: MEDIUM

Benefits:

  • Bugs are fixed once, in the shared base class, and the fix applies to all modules
  • Easier to add new Instagram-like platforms
  • Less code to maintain
  • Consistent behavior
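To show how a new platform would plug into such a base class, here is a small runnable sketch. BaseDownloader is a simplified stand-in for BaseInstagramDownloader, and ExampleDownloader, collect_urls, and the example.invalid URLs are all hypothetical; the shared step shown (de-duplicating URLs) is only an illustration of logic that would live in the base class.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseDownloader(ABC):
    """Simplified stand-in for BaseInstagramDownloader."""

    def collect_urls(self, username: str) -> List[str]:
        """Shared step: de-duplicate URLs while preserving their order."""
        seen = set()
        unique = []
        for url in self._get_content_urls(username):
            if url not in seen:
                seen.add(url)
                unique.append(url)
        return unique

    @abstractmethod
    def _get_content_urls(self, username: str) -> List[str]:
        """Platform-specific URL extraction."""


class ExampleDownloader(BaseDownloader):
    """Hypothetical platform; returns canned URLs instead of scraping."""

    def _get_content_urls(self, username: str) -> List[str]:
        base = f"https://example.invalid/{username}"
        return [f"{base}/1.jpg", f"{base}/2.jpg", f"{base}/1.jpg"]


urls = ExampleDownloader().collect_urls("testuser")
```

The subclass implements only the platform-specific method; everything else is inherited, which is where the "fix once" benefit comes from.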

Medium Priority Technical Debt

4. Inconsistent Logging

Severity: MEDIUM Impact: Debugging, Monitoring

Current State:

  • Mix of print(), callbacks, and the logging module
  • No structured logging
  • Difficult to filter/search logs
  • No log levels in many places
  • No request IDs for tracing

Examples:

# Different logging approaches in codebase
print(f"Downloading {filename}")                          # Style 1
if self.log_callback:                                     # Style 2
    self.log_callback(f"[{platform}] {message}", "info")
logger.info(f"Download complete: {filename}")             # Style 3

Recommendation: Standardize on structured logging

# modules/structured_logger.py
import logging
import json
from datetime import datetime
from typing import Dict, Optional

class StructuredLogger:
    def __init__(self, name: str, context: Optional[Dict] = None):
        self.logger = logging.getLogger(name)
        self.context = context or {}

    def log(self, level: str, message: str, **extra):
        """Log with structured data"""
        log_entry = {
            'timestamp': datetime.now().isoformat(),
            'level': level.upper(),
            'logger': self.logger.name,
            'message': message,
            **self.context,
            **extra
        }

        getattr(self.logger, level.lower())(json.dumps(log_entry))

    def info(self, message: str, **extra):
        self.log('info', message, **extra)

    def error(self, message: str, **extra):
        self.log('error', message, **extra)

    def warning(self, message: str, **extra):
        self.log('warning', message, **extra)

    def with_context(self, **context) -> 'StructuredLogger':
        """Create logger with additional context"""
        new_context = {**self.context, **context}
        return StructuredLogger(self.logger.name, new_context)

# Usage
logger = StructuredLogger('downloader')
request_logger = logger.with_context(request_id='abc123', user_id=42)

request_logger.info('Starting download',
    platform='instagram',
    username='testuser',
    content_type='stories'
)
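# Output (emitted only once logging is configured at INFO or lower,
# e.g. logging.basicConfig(level=logging.INFO)):
# {"timestamp": "2025-10-31T13:00:00", "level": "INFO",
#  "logger": "downloader", "message": "Starting download",
#  "request_id": "abc123", "user_id": 42, "platform": "instagram", ...}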

Effort: 8-12 hours Priority: MEDIUM
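During a migration, the existing callback style (Style 2 above) does not have to be rewritten all at once. One possible approach, sketched here with the standard logging module only, is an adapter that keeps the (message, level) callback signature but forwards to a logger. The names make_log_callback and _ListHandler are hypothetical, and the fallback to INFO for unknown level names is an assumption rather than existing behavior.

```python
import logging
from typing import Callable

# Level names accepted by the legacy callbacks (assumed set).
_LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
}


def make_log_callback(logger: logging.Logger) -> Callable[..., None]:
    """Return a callback with the (message, level) signature of Style 2."""

    def callback(message: str, level: str = "info") -> None:
        # Unknown level names fall back to INFO instead of raising.
        logger.log(_LEVELS.get(level.lower(), logging.INFO), message)

    return callback


class _ListHandler(logging.Handler):
    """Collects records in memory so the example can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


logger = logging.getLogger("downloader.legacy")
logger.setLevel(logging.DEBUG)
captured = _ListHandler()
logger.addHandler(captured)

log_callback = make_log_callback(logger)
log_callback("[instagram] Download complete", "info")
log_callback("[forum] Retrying request", "warning")
log_callback("[forum] Unknown level name", "verbose")
```

Modules that currently accept a log_callback could be handed this adapter unchanged, so their output reaches the same handlers as the rest of the application.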


5. Missing Database Migrations System

Severity: MEDIUM Impact: Deployment, Upgrades

Current State:

  • Schema changes via ad-hoc ALTER TABLE statements
  • No version tracking
  • No rollback capability
  • Difficult to deploy across environments
  • Manual schema updates error-prone

Recommendation: Implement Alembic migrations

# Install Alembic
pip install alembic

# Initialize
alembic init alembic

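# Create migration (--autogenerate needs SQLAlchemy models registered as
# target_metadata in alembic/env.py; omit the flag to write the migration by hand)
alembic revision --autogenerate -m "Add user preferences column"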

# Apply migrations
alembic upgrade head

# Rollback
alembic downgrade -1

Migration Example:

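# alembic/versions/001_add_user_preferences.py
# (revision identifiers generated by `alembic revision` are omitted here)
from alembic import op
import sqlalchemy as sa

def upgrade():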
    op.add_column('users', sa.Column('preferences', sa.JSON(), nullable=True))
    op.create_index('idx_users_username', 'users', ['username'])

def downgrade():
    op.drop_index('idx_users_username', 'users')
    op.drop_column('users', 'preferences')

Effort: 6-8 hours Priority: MEDIUM
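To make concrete what version tracking and rollback add over ad-hoc ALTER TABLE statements, the sketch below implements the idea in miniature with the standard sqlite3 module. It is an illustration of the concept, not a substitute for Alembic: SQLite, the MIGRATIONS list, and the helper names are assumptions made for the example, and the schema version is stored in SQLite's built-in PRAGMA user_version.

```python
import sqlite3

# Ordered (upgrade_sql, downgrade_sql) pairs; list index + 1 is the version.
MIGRATIONS = [
    ("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)",
     "DROP TABLE users"),
    ("CREATE INDEX idx_users_username ON users (username)",
     "DROP INDEX idx_users_username"),
]


def get_version(conn: sqlite3.Connection) -> int:
    """Read the schema version stored in the database file itself."""
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn: sqlite3.Connection, target: int) -> None:
    """Apply upgrades or downgrades one step at a time until `target`."""
    if not 0 <= target <= len(MIGRATIONS):
        raise ValueError(f"unknown schema version: {target}")
    current = get_version(conn)
    while current < target:
        conn.execute(MIGRATIONS[current][0])
        current += 1
        conn.execute(f"PRAGMA user_version = {current}")
    while current > target:
        conn.execute(MIGRATIONS[current - 1][1])
        current -= 1
        conn.execute(f"PRAGMA user_version = {current}")
    conn.commit()


def index_exists(conn: sqlite3.Connection, name: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'index' AND name = ?",
        (name,),
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
migrate(conn, 2)  # equivalent in spirit to `alembic upgrade head`
after_upgrade = (get_version(conn), index_exists(conn, "idx_users_username"))
migrate(conn, 1)  # equivalent in spirit to `alembic downgrade -1`
after_rollback = (get_version(conn), index_exists(conn, "idx_users_username"))
```

Alembic provides the same guarantees (a recorded version, ordered steps, and paired downgrades) along with revision files, branching, and multi-database support.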


6. No API Documentation (OpenAPI/Swagger)

Severity: MEDIUM Impact: Integration, Developer Experience

Current State:

  • No interactive API documentation
  • No schema validation documentation
  • Difficult for third-party integrations
  • Manual endpoint discovery

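Solution: FastAPI generates the OpenAPI schema and interactive docs from the route definitions; the remaining work is to serve them under the /api prefix and add tags, descriptions, and examples.

# main.py
from fastapi import FastAPI

app = FastAPI(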
    title="Media Downloader API",
    description="Unified media downloading system",
    version="6.3.6",
    docs_url="/api/docs",
    redoc_url="/api/redoc"
)

# Add tags for organization
@app.get("/api/downloads", tags=["Downloads"])
async def get_downloads():
    """
    Get list of downloads with filtering.

    Returns:
        List of download records with metadata

    Raises:
        401: Unauthorized - Missing or invalid authentication
        500: Internal Server Error - Database or system error
    """
    pass

Access docs at:

  • Swagger UI: http://localhost:8000/api/docs
  • ReDoc: http://localhost:8000/api/redoc

Effort: 4-6 hours (adding descriptions, examples) Priority: MEDIUM


Low Priority Technical Debt

7. Frontend Type Safety Gaps

Severity: LOW Impact: Development Velocity

Remaining Issues:

  • Some components still use any type
  • API response types not fully typed
  • Props interfaces could be more specific
  • Missing null checks in places

Solution: Migrate components incrementally to the shared types file

// Update components to use types from types/index.ts
import React from 'react'
import { Download, Platform, User } from '../types'

interface DownloadListProps {
  downloads: Download[]
  onSelect: (download: Download) => void
  currentUser: User
}

const DownloadList: React.FC<DownloadListProps> = ({
  downloads,
  onSelect,
  currentUser
}) => {
  // Fully typed component; rendering is omitted in this sketch
  return null
}

Effort: 6-8 hours Priority: LOW


8. Hardcoded Configuration Values

Severity: LOW Impact: Flexibility

Examples:

# Hardcoded paths
base_path = Path("/opt/immich/md")
media_base = Path("/opt/immich/md")

# Hardcoded timeouts
timeout=10.0
timeout=30

# Hardcoded limits
limit: int = 100

Solution: Move to configuration

# config/defaults.py
DEFAULTS = {
    'media_base_path': '/opt/immich/md',
    'database_timeout': 10.0,
    'api_timeout': 30.0,
    'default_page_limit': 100,
    'max_page_limit': 1000,
    'thumbnail_size': (300, 300),
    'cache_ttl': 300
}

# Usage
from pathlib import Path
from config import get_config

config = get_config()
base_path = Path(config.get('media_base_path'))

Effort: 4-6 hours Priority: LOW
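The usage above calls get_config() without showing it. One way it could work is sketched below, under stated assumptions: defaults are overlaid with environment variables, the MEDIA_DL_ prefix is hypothetical, and only a subset of the defaults is repeated to keep the example short.

```python
import os
from typing import Any, Dict, Mapping, Optional

DEFAULTS: Dict[str, Any] = {
    "media_base_path": "/opt/immich/md",
    "database_timeout": 10.0,
    "api_timeout": 30.0,
    "default_page_limit": 100,
}

ENV_PREFIX = "MEDIA_DL_"  # hypothetical prefix, not from the codebase


def get_config(environ: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Return DEFAULTS overlaid with MEDIA_DL_* environment overrides."""
    env = os.environ if environ is None else environ
    config = dict(DEFAULTS)
    for key, default in DEFAULTS.items():
        raw = env.get(ENV_PREFIX + key.upper())
        if raw is not None:
            # Coerce the string to the default's type (str, float, or int).
            config[key] = type(default)(raw)
    return config


# Passing a mapping instead of os.environ keeps the example deterministic.
config = get_config({
    "MEDIA_DL_API_TIMEOUT": "5",
    "MEDIA_DL_DEFAULT_PAGE_LIMIT": "50",
})
```

Accepting the environment as a parameter also makes the function easy to test, since no process-wide state has to be patched.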


Code Quality Improvements

9. Add Pre-commit Hooks

Effort: 2-3 hours Priority: MEDIUM

Setup:

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/psf/black
    rev: 23.12.1
    hooks:
      - id: black
        language_version: python3.12

  - repo: https://github.com/PyCQA/flake8
    rev: 7.0.0
    hooks:
      - id: flake8
        args: [--max-line-length=120]

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.8.0
    hooks:
      - id: mypy
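        # types-all is unmaintained and may fail to install; list only the
        # stub packages the project needs (types-requests is one example)
        additional_dependencies: [types-requests]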

  - repo: https://github.com/pre-commit/mirrors-eslint
    rev: v8.56.0
    hooks:
      - id: eslint
        files: \.(js|ts|tsx)$
        types: [file]

Benefits:

  • Automatic code formatting
  • Catch errors before commit
  • Enforce code style
  • Prevent bad commits

10. Add GitHub Actions CI/CD

Effort: 4-6 hours Priority: MEDIUM

Workflow:

# .github/workflows/ci.yml
name: CI

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install -r requirements.txt
      # compileall recurses into subdirectories; a bare **/*.py glob does not
      # unless the shell has globstar enabled
      - run: python -m compileall -q .
      - run: pytest tests/

  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: pip install black flake8
      - run: black --check .
      # Match the line length used by the pre-commit flake8 hook
      - run: flake8 --max-line-length=120 .

  frontend:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm install
      - run: npm run lint
      - run: npm run build

Immediate Quick Wins (< 2 hours each)

1. Add Request ID Tracking

import uuid
from fastapi import Request

@app.middleware("http")
async def add_request_id(request: Request, call_next):
    request.state.request_id = str(uuid.uuid4())
    response = await call_next(request)
    response.headers["X-Request-ID"] = request.state.request_id
    return response

2. Add Response Time Logging

import time

@app.middleware("http")
async def log_response_time(request: Request, call_next):
    # perf_counter() is monotonic, so clock adjustments cannot skew the duration
    start = time.perf_counter()
    response = await call_next(request)
    duration = time.perf_counter() - start
    logger.info(f"{request.method} {request.url.path} - {duration:.3f}s")
    return response

3. Add Health Check Versioning

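import sys
import time

# get_uptime() is not defined elsewhere in this document; a minimal version:
START_TIME = time.monotonic()  # recorded once, when the module is imported

def get_uptime() -> float:
    return time.monotonic() - START_TIME  # seconds since startup

@app.get("/api/health")
async def health():
    return {
        "status": "healthy",
        "version": "6.3.6",
        "build_date": "2025-10-31",
        "python_version": sys.version,
        "uptime": get_uptime()
    }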

4. Add CORS Configuration

from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://your-domain.com"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

5. Add Compression Middleware

from fastapi.middleware.gzip import GZipMiddleware

app.add_middleware(GZipMiddleware, minimum_size=1000)

Summary

Total Technical Debt Identified: 10 major items
Estimated Total Effort: 100-140 hours

Recommended Priority Order:

  1. Immediate (< 2h each): Quick wins listed above
  2. Week 1-2 (16-24h): Refactor api.py into modules
  3. Week 3-4 (16-24h): Implement testing suite
  4. Month 2 (32-48h): Refactor large module files
  5. Month 3 (30-40h): Address remaining items
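As a cross-check on the total, the per-item estimates for items 1-10 can be summed directly. The quick wins are excluded from this sum; five of them at under 2 hours each add at most 10 hours. The item labels in the dictionary are shortened versions of the section titles.

```python
# (low, high) effort estimates in hours, copied from items 1-10 above.
estimates = {
    "1. Monolithic API file": (16, 24),
    "2. Large module files": (32, 48),
    "3. Instagram code duplication": (12, 16),
    "4. Inconsistent logging": (8, 12),
    "5. Database migrations": (6, 8),
    "6. API documentation": (4, 6),
    "7. Frontend type safety": (6, 8),
    "8. Hardcoded configuration": (4, 6),
    "9. Pre-commit hooks": (2, 3),
    "10. GitHub Actions CI/CD": (4, 6),
}

low = sum(lo for lo, _ in estimates.values())
high = sum(hi for _, hi in estimates.values())
print(f"{low}-{high} hours")  # prints "94-137 hours"
```

The result, 94-137 hours, is consistent with the rounded 100-140 hour figure once the quick wins are included.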

ROI Analysis:

  • High ROI: API refactoring, testing suite, logging standardization
  • Medium ROI: Database migrations, code deduplication
  • Low ROI (but important): Type safety, pre-commit hooks

Next Steps:

  1. Review and prioritize with team
  2. Create issues for each item
  3. Start with quick wins for immediate impact
  4. Tackle high-impact items in sprints