Todd 0d7b2b1aab Initial commit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-29 22:42:55 -04:00

14 KiB

Technical Debt Analysis & Immediate Improvements

Date: 2025-10-31 Version: 6.3.6 Analyst: Automated Code Review

Executive Summary

This document identifies technical debt, code smells, and immediate improvement opportunities in the Media Downloader codebase.

Critical Technical Debt

1. Monolithic API File (2,649 lines)

File: /opt/media-downloader/web/backend/api.py Severity: HIGH Impact: Maintainability, Testing, Code Review

Current State:

Single file contains all API endpoints
50+ routes in one file
Multiple responsibilities (auth, downloads, media, scheduler, config)
Difficult to test individual components
High cognitive load for developers

Recommendation: Refactor into modular structure:

web/backend/
├── main.py (app initialization, 100-150 lines)
├── routers/
│   ├── auth.py (authentication endpoints)
│   ├── downloads.py (download management)
│   ├── media.py (media serving)
│   ├── scheduler.py (scheduler management)
│   ├── platforms.py (platform configuration)
│   └── health.py (health & monitoring)
├── services/
│   ├── download_service.py (business logic)
│   ├── media_service.py (media processing)
│   └── scheduler_service.py (scheduling logic)
└── models/
    ├── requests.py (Pydantic request models)
    └── responses.py (Pydantic response models)

Effort: 16-24 hours Priority: HIGH Benefits:

Easier to test individual routers
Better separation of concerns
Reduced merge conflicts
Faster development velocity
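The testing benefit of the proposed services/ layer can be sketched without any web framework. This is a minimal illustration, not the project's actual code: `DownloadRecord`, `DownloadRepository`, `InMemoryDownloadRepository` and `DownloadService` are hypothetical names, and the record fields are assumed for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol


@dataclass(frozen=True)
class DownloadRecord:
    """Hypothetical shape of one download row."""
    id: int
    platform: str
    filename: str


class DownloadRepository(Protocol):
    """Anything that can list download records (database, in-memory fake, ...)."""

    def list_downloads(self) -> List[DownloadRecord]: ...


class InMemoryDownloadRepository:
    """Test double standing in for the real database layer."""

    def __init__(self, records: List[DownloadRecord]):
        self._records = list(records)

    def list_downloads(self) -> List[DownloadRecord]:
        return list(self._records)


class DownloadService:
    """Business logic with no HTTP or framework imports."""

    def __init__(self, repository: DownloadRepository):
        self._repository = repository

    def get_downloads(self, platform: Optional[str] = None, limit: int = 100) -> List[Dict]:
        records = self._repository.list_downloads()
        if platform is not None:
            # Keep only rows for the requested platform
            records = [r for r in records if r.platform == platform]
        return [
            {"id": r.id, "platform": r.platform, "filename": r.filename}
            for r in records[:limit]
        ]
```

A router in routers/downloads.py would then only translate HTTP parameters into a `get_downloads()` call, so the filtering logic can be unit tested with the in-memory repository and no running server.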

2. Large Module Files

Severity: HIGH Impact: Maintainability

Problem Files:

modules/forum_downloader.py (3,971 lines)
modules/imginn_module.py (2,542 lines)
media-downloader.py (2,653 lines)

Common Issues:

God objects (classes doing too much)
Long methods (100+ lines)
Deep nesting (5+ levels)
Code duplication
Difficult to unit test

Recommendations:

Forum Downloader Refactoring:

modules/forum/
├── __init__.py
├── base.py (base forum class)
├── authentication.py (login, 2FA)
├── thread_parser.py (HTML parsing)
├── image_extractor.py (image extraction)
├── download_manager.py (download logic)
└── sites/
    ├── hqcelebcorner.py (site-specific)
    └── picturepub.py (site-specific)

Instagram Module Refactoring:

modules/instagram/
├── __init__.py
├── base_instagram.py (shared logic)
├── fastdl.py (FastDL implementation)
├── imginn.py (ImgInn implementation)
├── toolzu.py (Toolzu implementation)
├── cookie_manager.py (cookie handling)
├── flaresolverr.py (FlareSolverr integration)
└── content_parser.py (HTML parsing)

Effort: 32-48 hours Priority: MEDIUM

3. Code Duplication in Instagram Modules

Severity: MEDIUM Impact: Maintainability, Bug Fixes

Duplication Analysis:

fastdl_module.py, imginn_module.py, toolzu_module.py share 60-70% code
Cookie management duplicated 3x
FlareSolverr integration duplicated 3x
HTML parsing logic duplicated 3x
Download logic very similar

Example Duplication:

# Appears in 3 files with minor variations
def _get_flaresolverr_session(self):
    response = requests.post(
        f"{self.flaresolverr_url}/v1/sessions/create",
        json={"maxTimeout": 60000}
    )
    if response.status_code == 200:
        return response.json()['solution']['sessionId']

Solution: Create base class with shared logic

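# modules/instagram/base_instagram.py
from abc import ABC, abstractmethod
from typing import Dict, List

# CookieManager and FlareSolverrClient are the proposed helpers from
# cookie_manager.py and flaresolverr.py in the layout above.

class BaseInstagramDownloader(ABC):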
    """Base class for Instagram-like services"""

    def __init__(self, config, unified_db):
        self.config = config
        self.unified_db = unified_db
        self.cookie_manager = CookieManager(config.get('cookie_file'))
        self.flaresolverr = FlareSolverrClient(config.get('flaresolverr_url'))

    def _get_or_create_session(self):
        """Shared session management logic"""
        # Common implementation

    def _parse_stories(self, html: str) -> List[Dict]:
        """Shared HTML parsing logic"""
        # Common implementation

    @abstractmethod
    def _get_content_urls(self, username: str) -> List[str]:
        """Platform-specific URL extraction"""
        pass

Effort: 12-16 hours Priority: MEDIUM Benefits:

Fix bugs once, applies to all modules
Easier to add new Instagram-like platforms
Less code to maintain
Consistent behavior
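How a platform module would plug into such a base class can be sketched as follows. This is a simplified stand-in, not the real modules: `BaseDownloader`, `collect()` and `FakeImgInn` are hypothetical, and the URLs are placeholders. The shared flow lives in the base class and each platform supplies only `_get_content_urls()`.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseDownloader(ABC):
    """Simplified stand-in for the proposed BaseInstagramDownloader."""

    def collect(self, username: str) -> List[str]:
        """Shared flow: normalise the username, delegate, drop duplicate URLs."""
        username = username.strip().lstrip("@")
        unique: List[str] = []
        seen = set()
        for url in self._get_content_urls(username):
            if url not in seen:
                seen.add(url)
                unique.append(url)
        return unique

    @abstractmethod
    def _get_content_urls(self, username: str) -> List[str]:
        """Platform-specific URL extraction."""


class FakeImgInn(BaseDownloader):
    """Hypothetical platform module; returns canned URLs instead of scraping."""

    def _get_content_urls(self, username: str) -> List[str]:
        first = f"https://example.invalid/imginn/{username}/1.jpg"
        second = f"https://example.invalid/imginn/{username}/2.jpg"
        return [first, second, first]  # the duplicate is removed by collect()
```

A fix to the de-duplication or normalisation step is then made once in `collect()` and every subclass picks it up.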

Medium Priority Technical Debt

4. Inconsistent Logging

Severity: MEDIUM Impact: Debugging, Monitoring

Current State:

Mix of print(), callbacks, logging module
No structured logging
Difficult to filter/search logs
No log levels in many places
No request IDs for tracing

Examples:

# Different logging approaches in codebase
print(f"Downloading {filename}")                          # Style 1
if self.log_callback:                                     # Style 2
    self.log_callback(f"[{platform}] {message}", "info")
logger.info(f"Download complete: {filename}")             # Style 3

Recommendation: Standardize on structured logging

# modules/structured_logger.py
import logging
import json
from datetime import datetime
from typing import Dict, Optional

class StructuredLogger:
    def __init__(self, name: str, context: Optional[Dict] = None):
        self.logger = logging.getLogger(name)
        self.context = context or {}

    def log(self, level: str, message: str, **extra):
        """Log with structured data"""
        log_entry = {
            'timestamp': datetime.now().isoformat(),
            'level': level.upper(),
            'logger': self.logger.name,
            'message': message,
            **self.context,
            **extra
        }

        getattr(self.logger, level.lower())(json.dumps(log_entry))

    def info(self, message: str, **extra):
        self.log('info', message, **extra)

    def error(self, message: str, **extra):
        self.log('error', message, **extra)

    def warning(self, message: str, **extra):
        self.log('warning', message, **extra)

    def with_context(self, **context) -> 'StructuredLogger':
        """Create logger with additional context"""
        new_context = {**self.context, **context}
        return StructuredLogger(self.logger.name, new_context)

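# Usage (INFO records are dropped unless the logging level is configured)
logging.basicConfig(level=logging.INFO, format="%(message)s")
logger = StructuredLogger('downloader')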
request_logger = logger.with_context(request_id='abc123', user_id=42)

request_logger.info('Starting download',
    platform='instagram',
    username='testuser',
    content_type='stories'
)
# Output: {"timestamp": "2025-10-31T13:00:00", "level": "INFO",
#          "logger": "downloader", "message": "Starting download",
#          "request_id": "abc123", "user_id": 42, "platform": "instagram", ...}

Effort: 8-12 hours Priority: MEDIUM
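An alternative worth weighing is to put the JSON rendering in a `logging.Formatter` rather than in a wrapper class, so that records from existing `logger.info(...)` calls are also emitted as JSON. The sketch below uses only the standard library. `JsonFormatter`, `build_logger` and the `fields` key passed through `extra` are names chosen for this example, not existing project code.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each LogRecord as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Structured fields passed as extra={"fields": {...}} are merged in
        entry.update(getattr(record, "fields", {}))
        return json.dumps(entry)


def build_logger(name: str, stream) -> logging.Logger:
    """Return a logger that writes JSON lines to the given stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()
    logger.propagate = False  # avoid duplicate output through the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()
log = build_logger("downloader.sketch", buffer)
log.info(
    "Starting download",
    extra={"fields": {"platform": "instagram", "request_id": "abc123"}},
)
```

The trade-off is that call sites pass structured data through `extra` instead of keyword arguments, which is slightly more verbose than the wrapper's `info(message, **extra)` style.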

5. Missing Database Migrations System

Severity: MEDIUM Impact: Deployment, Upgrades

Current State:

Schema changes via ad-hoc ALTER TABLE statements
No version tracking
No rollback capability
Difficult to deploy across environments
Manual schema updates error-prone

Recommendation: Implement Alembic migrations

# Install Alembic
pip install alembic

# Initialize
alembic init alembic

# Create migration (--autogenerate requires target_metadata to be set in alembic/env.py)
alembic revision --autogenerate -m "Add user preferences column"

# Apply migrations
alembic upgrade head

# Rollback
alembic downgrade -1

Migration Example:

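# alembic/versions/001_add_user_preferences.py
# (revision / down_revision identifiers omitted for brevity)
from alembic import op
import sqlalchemy as sa
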
def upgrade():
    op.add_column('users', sa.Column('preferences', sa.JSON(), nullable=True))
    op.create_index('idx_users_username', 'users', ['username'])

def downgrade():
    op.drop_index('idx_users_username', 'users')
    op.drop_column('users', 'preferences')

Effort: 6-8 hours Priority: MEDIUM
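The version-tracking idea behind a migration system can be illustrated with the standard library alone. The sketch below is not Alembic and is not a replacement for it. It assumes a SQLite database and uses SQLite's `PRAGMA user_version` as the version counter, and the `MIGRATIONS` list and table layout are invented for the example.

```python
import sqlite3
from typing import List

# Ordered schema changes; position + 1 is the schema version they produce.
MIGRATIONS: List[str] = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT NOT NULL)",
    "ALTER TABLE users ADD COLUMN preferences TEXT",
    "CREATE INDEX idx_users_username ON users (username)",
]


def current_version(conn: sqlite3.Connection) -> int:
    """Read the schema version stored in the database file."""
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply every migration newer than the stored version; return the new version."""
    version = current_version(conn)
    for number, statement in enumerate(MIGRATIONS[version:], start=version + 1):
        conn.execute(statement)
        # PRAGMA does not accept bound parameters, hence the formatted integer
        conn.execute(f"PRAGMA user_version = {number:d}")
        conn.commit()
    return current_version(conn)


conn = sqlite3.connect(":memory:")
first_run = migrate(conn)   # applies all three statements
second_run = migrate(conn)  # nothing left to apply
```

Running `migrate()` twice is safe because already-applied statements are skipped. What this sketch lacks, and Alembic provides, is downgrade support and branching revision history.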

6. No API Documentation (OpenAPI/Swagger)

Severity: MEDIUM Impact: Integration, Developer Experience

Current State:

No interactive API documentation
No schema validation documentation
Difficult for third-party integrations
Manual endpoint discovery

Solution: FastAPI automatically generates OpenAPI docs

# main.py
from fastapi import FastAPI

app = FastAPI(
    title="Media Downloader API",
    description="Unified media downloading system",
    version="6.3.6",
    docs_url="/api/docs",
    redoc_url="/api/redoc"
)

# Add tags for organization
@app.get("/api/downloads", tags=["Downloads"])
async def get_downloads():
    """
    Get list of downloads with filtering.

    Returns:
        List of download records with metadata

    Raises:
        401: Unauthorized - Missing or invalid authentication
        500: Internal Server Error - Database or system error
    """
    pass

Access docs at:

Swagger UI: http://localhost:8000/api/docs
ReDoc: http://localhost:8000/api/redoc

Effort: 4-6 hours (adding descriptions, examples) Priority: MEDIUM

Low Priority Technical Debt

7. Frontend Type Safety Gaps

Severity: LOW Impact: Development Velocity

Remaining Issues:

Some components still use any type
API response types not fully typed
Props interfaces could be more specific
Missing null checks in places

Solution: Progressive enhancement with new types file

// Update components to use types from types/index.ts
import React from 'react'
import { Download, Platform, User } from '../types'

interface DownloadListProps {
  downloads: Download[]
  onSelect: (download: Download) => void
  currentUser: User
}

const DownloadList: React.FC<DownloadListProps> = ({
  downloads,
  onSelect,
  currentUser
}) => {
  // Fully typed component; React.FC requires a returned node (placeholder here)
  return null
}

Effort: 6-8 hours Priority: LOW

8. Hardcoded Configuration Values

Severity: LOW Impact: Flexibility

Examples:

# Hardcoded paths
base_path = Path("/opt/immich/md")
media_base = Path("/opt/immich/md")

# Hardcoded timeouts
timeout=10.0
timeout=30

# Hardcoded limits
limit: int = 100

Solution: Move to configuration

# config/defaults.py
DEFAULTS = {
    'media_base_path': '/opt/immich/md',
    'database_timeout': 10.0,
    'api_timeout': 30.0,
    'default_page_limit': 100,
    'max_page_limit': 1000,
    'thumbnail_size': (300, 300),
    'cache_ttl': 300
}

# Usage
from config import get_config
config = get_config()
base_path = Path(config.get('media_base_path'))

Effort: 4-6 hours Priority: LOW
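One possible shape for `get_config()` is sketched below, assuming overrides come from environment variables. The `MEDIA_DL_` prefix and the override mechanism are assumptions for illustration. The document does not specify where configuration should be loaded from.

```python
import os
from typing import Any, Dict, Mapping, Optional

DEFAULTS: Dict[str, Any] = {
    "media_base_path": "/opt/immich/md",
    "database_timeout": 10.0,
    "api_timeout": 30.0,
    "default_page_limit": 100,
}

ENV_PREFIX = "MEDIA_DL_"  # hypothetical prefix for override variables


def get_config(environ: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Return DEFAULTS overlaid with MEDIA_DL_* environment variables."""
    environ = os.environ if environ is None else environ
    config = dict(DEFAULTS)
    for key, default in DEFAULTS.items():
        raw = environ.get(ENV_PREFIX + key.upper())
        if raw is not None:
            # Coerce to the default's type so timeouts stay floats and limits stay ints
            config[key] = type(default)(raw)
    return config
```

For example, exporting `MEDIA_DL_API_TIMEOUT=5` would change the API timeout to `5.0` without a code change, while unset keys keep their defaults.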

Code Quality Improvements

9. Add Pre-commit Hooks

Effort: 2-3 hours Priority: MEDIUM

Setup:

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/psf/black
    rev: 23.12.1
    hooks:
      - id: black
        language_version: python3.12

  - repo: https://github.com/PyCQA/flake8
    rev: 7.0.0
    hooks:
      - id: flake8
        args: [--max-line-length=120]

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.8.0
    hooks:
      - id: mypy
        additional_dependencies: [types-requests]  # list only the stub packages the project needs

  - repo: https://github.com/pre-commit/mirrors-eslint
    rev: v8.56.0
    hooks:
      - id: eslint
        files: \.(js|ts|tsx)$
        types: [file]

Benefits:

Automatic code formatting
Catch errors before commit
Enforce code style
Prevent bad commits

10. Add GitHub Actions CI/CD

Effort: 4-6 hours Priority: MEDIUM

Workflow:

# .github/workflows/ci.yml
name: CI

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.12'
      - run: pip install -r requirements.txt
      - run: pytest tests/
      # Byte-compile every module recursively (a bare **/*.py glob is not recursive in bash by default)
      - run: python -m compileall -q .

  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: pip install black flake8
      - run: black --check .
      - run: flake8 .

  frontend:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
      - run: npm install
      - run: npm run build
      - run: npm run lint

Immediate Quick Wins (< 2 hours each)

1. Add Request ID Tracking

import uuid
from fastapi import Request

@app.middleware("http")
async def add_request_id(request: Request, call_next):
    request.state.request_id = str(uuid.uuid4())
    response = await call_next(request)
    response.headers["X-Request-ID"] = request.state.request_id
    return response

2. Add Response Time Logging

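import logging
import time

from fastapi import Request

logger = logging.getLogger(__name__)

@app.middleware("http")
async def log_response_time(request: Request, call_next):
    start = time.perf_counter()  # monotonic clock, unaffected by system time changes
    response = await call_next(request)
    duration = time.perf_counter() - start
    logger.info(f"{request.method} {request.url.path} - {duration:.3f}s")
    return response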

3. Add Health Check Versioning

import sys

# get_uptime() is an application-level helper, not part of FastAPI
@app.get("/api/health")
async def health():
    return {
        "status": "healthy",
        "version": "6.3.6",
        "build_date": "2025-10-31",
        "python_version": sys.version,
        "uptime": get_uptime()
    }
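The health endpoint relies on a `get_uptime()` helper that is not shown. A minimal sketch, assuming uptime is measured from module import and reported in seconds, could look like this. `format_uptime()` is an optional extra introduced here for readability.

```python
import time

# Captured once at import time, i.e. roughly when the process started serving
_START = time.monotonic()


def get_uptime() -> float:
    """Seconds elapsed since this module was imported."""
    return time.monotonic() - _START


def format_uptime(seconds: float) -> str:
    """Render a duration in seconds as 'Nd HH:MM:SS'."""
    total = int(seconds)
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{days}d {hours:02d}:{minutes:02d}:{secs:02d}"
```

`time.monotonic()` is used instead of `time.time()` so that clock adjustments on the host do not produce negative or jumping uptime values.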

4. Add CORS Configuration

from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://your-domain.com"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)

5. Add Compression Middleware

from fastapi.middleware.gzip import GZipMiddleware

app.add_middleware(GZipMiddleware, minimum_size=1000)

Summary

Total Technical Debt Identified: 10 major items Estimated Total Effort: 100-140 hours Recommended Priority Order:

Immediate (< 2h each): Quick wins listed above
Week 1-2 (16-24h): Refactor api.py into modules
Week 3-4 (16-24h): Implement testing suite (not itemized above; this effort is separate from the ten estimates)
Month 2 (32-48h): Refactor large module files
Month 3 (30-40h): Address remaining items
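As a cross-check on the total, the per-item estimates given in sections 1 to 10 can be summed directly. The figures below are copied from those sections. The quick wins and the testing suite are not included because they have no itemized estimate in this document.

```python
# (low, high) effort in hours, taken from items 1-10
ESTIMATES = {
    "1. Monolithic API file": (16, 24),
    "2. Large module files": (32, 48),
    "3. Instagram code duplication": (12, 16),
    "4. Inconsistent logging": (8, 12),
    "5. Database migrations": (6, 8),
    "6. API documentation": (4, 6),
    "7. Frontend type safety": (6, 8),
    "8. Hardcoded configuration": (4, 6),
    "9. Pre-commit hooks": (2, 3),
    "10. GitHub Actions CI/CD": (4, 6),
}

total_low = sum(low for low, _ in ESTIMATES.values())
total_high = sum(high for _, high in ESTIMATES.values())
print(f"{total_low}-{total_high} hours")  # 94-137 hours
```

The itemized range of 94-137 hours is close to the stated 100-140 hours once a few hours of quick wins are added.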

ROI Analysis:

High ROI: API refactoring, testing suite, logging standardization
Medium ROI: Database migrations, code deduplication
Low ROI (but important): Type safety, pre-commit hooks

Next Steps:

Review and prioritize with team
Create issues for each item
Start with quick wins for immediate impact
Tackle high-impact items in sprints
