Technical Debt Analysis & Immediate Improvements
Date: 2025-10-31
Version: 6.3.6
Analyst: Automated Code Review
Executive Summary
This document identifies technical debt, code smells, and immediate improvement opportunities in the Media Downloader codebase.
Critical Technical Debt
1. Monolithic API File (2,649 lines)
File: /opt/media-downloader/web/backend/api.py
Severity: HIGH
Impact: Maintainability, Testing, Code Review
Current State:
- Single file contains all API endpoints
- 50+ routes in one file
- Multiple responsibilities (auth, downloads, media, scheduler, config)
- Difficult to test individual components
- High cognitive load for developers
Recommendation: Refactor into modular structure:
web/backend/
├── main.py (app initialization, 100-150 lines)
├── routers/
│   ├── auth.py (authentication endpoints)
│   ├── downloads.py (download management)
│   ├── media.py (media serving)
│   ├── scheduler.py (scheduler management)
│   ├── platforms.py (platform configuration)
│   └── health.py (health & monitoring)
├── services/
│   ├── download_service.py (business logic)
│   ├── media_service.py (media processing)
│   └── scheduler_service.py (scheduling logic)
└── models/
    ├── requests.py (Pydantic request models)
    └── responses.py (Pydantic response models)
Effort: 16-24 hours Priority: HIGH
Benefits:
- Easier to test individual routers
- Better separation of concerns
- Reduced merge conflicts
- Faster development velocity
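To illustrate why the split makes testing easier, here is a minimal sketch of how business logic might live in services/download_service.py with no dependency on the web framework. The names Download, DownloadRepository, DownloadService, list_recent, and InMemoryRepo are hypothetical and not taken from the codebase; a router in routers/downloads.py would only translate HTTP parameters into a call like this.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Download:
    id: int
    platform: str
    filename: str


class DownloadRepository(Protocol):
    """Storage interface the service depends on (hypothetical)."""

    def fetch(self, platform: str, limit: int) -> List[Download]: ...


class DownloadService:
    """Business logic only: no routing, no request or response objects."""

    MAX_LIMIT = 1000

    def __init__(self, repo: DownloadRepository) -> None:
        self.repo = repo

    def list_recent(self, platform: str, limit: int = 100) -> List[Download]:
        # Clamp the page size so callers cannot request unbounded results.
        limit = max(1, min(limit, self.MAX_LIMIT))
        return self.repo.fetch(platform, limit)


class InMemoryRepo:
    """Test double standing in for the real database layer."""

    def __init__(self, rows: List[Download]) -> None:
        self.rows = rows

    def fetch(self, platform: str, limit: int) -> List[Download]:
        return [r for r in self.rows if r.platform == platform][:limit]
```

Because the service receives its repository as an argument, a unit test can pass InMemoryRepo and exercise the clamping logic without starting the API or touching a database.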
2. Large Module Files
Severity: HIGH Impact: Maintainability
Problem Files:
- modules/forum_downloader.py (3,971 lines)
- modules/imginn_module.py (2,542 lines)
- media-downloader.py (2,653 lines)
Common Issues:
- God objects (classes doing too much)
- Long methods (100+ lines)
- Deep nesting (5+ levels)
- Code duplication
- Difficult to unit test
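The "long methods" item can be measured rather than estimated. The following is a rough sketch using the standard-library ast module; the function name find_long_functions and the 100-line default are illustrative choices, not part of the codebase.

```python
import ast
from typing import List, Tuple


def find_long_functions(source: str, max_lines: int = 100) -> List[Tuple[str, int]]:
    """Return (name, line_count) for each function longer than max_lines."""
    tree = ast.parse(source)
    results = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on Python 3.8 and later.
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                results.append((node.name, length))
    # Longest functions first, so the worst offenders lead the report.
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Running it over the text of each file under modules/ would give a ranked list of candidates to split first.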
Recommendations:
Forum Downloader Refactoring:
modules/forum/
├── __init__.py
├── base.py (base forum class)
├── authentication.py (login, 2FA)
├── thread_parser.py (HTML parsing)
├── image_extractor.py (image extraction)
├── download_manager.py (download logic)
└── sites/
    ├── hqcelebcorner.py (site-specific)
    └── picturepub.py (site-specific)
Instagram Module Refactoring:
modules/instagram/
├── __init__.py
├── base_instagram.py (shared logic)
├── fastdl.py (FastDL implementation)
├── imginn.py (ImgInn implementation)
├── toolzu.py (Toolzu implementation)
├── cookie_manager.py (cookie handling)
├── flaresolverr.py (FlareSolverr integration)
└── content_parser.py (HTML parsing)
Effort: 32-48 hours Priority: MEDIUM
3. Code Duplication in Instagram Modules
Severity: MEDIUM Impact: Maintainability, Bug Fixes
Duplication Analysis:
- fastdl_module.py, imginn_module.py, toolzu_module.py share 60-70% code
- Cookie management duplicated 3x
- FlareSolverr integration duplicated 3x
- HTML parsing logic duplicated 3x
- Download logic very similar
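The shared-code percentage can be re-checked as the modules evolve. Below is a rough sketch using difflib from the standard library, assuming line-level similarity is an acceptable proxy for duplication. The helper names are illustrative, and this is not necessarily how the 60-70% figure above was produced.

```python
import difflib
from typing import List


def _normalized_lines(source: str) -> List[str]:
    """Strip whitespace and drop blank lines and comment-only lines."""
    lines = (line.strip() for line in source.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def similarity(source_a: str, source_b: str) -> float:
    """Return a 0.0-1.0 ratio of matching lines between two sources."""
    matcher = difflib.SequenceMatcher(
        None,
        _normalized_lines(source_a),
        _normalized_lines(source_b),
        autojunk=False,  # keep frequently repeated lines in the comparison
    )
    return matcher.ratio()
```

Comparing the text of fastdl_module.py against imginn_module.py and toolzu_module.py in pairs gives three ratios that can be tracked before and after the refactoring.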
Example Duplication:
# Appears in 3 files with minor variations
def _get_flaresolverr_session(self):
    response = requests.post(
        f"{self.flaresolverr_url}/v1/sessions/create",
        json={"maxTimeout": 60000}
    )
    if response.status_code == 200:
        return response.json()['solution']['sessionId']
Solution: Create base class with shared logic
# modules/instagram/base_instagram.py
from abc import ABC, abstractmethod
from typing import Dict, List
# CookieManager and FlareSolverrClient are the helpers proposed in
# cookie_manager.py and flaresolverr.py above
class BaseInstagramDownloader(ABC):
    """Base class for Instagram-like services"""
    def __init__(self, config, unified_db):
        self.config = config
        self.unified_db = unified_db
        self.cookie_manager = CookieManager(config.get('cookie_file'))
        self.flaresolverr = FlareSolverrClient(config.get('flaresolverr_url'))
    def _get_or_create_session(self):
        """Shared session management logic"""
        # Common implementation
    def _parse_stories(self, html: str) -> List[Dict]:
        """Shared HTML parsing logic"""
        # Common implementation
    @abstractmethod
    def _get_content_urls(self, username: str) -> List[str]:
        """Platform-specific URL extraction"""
        pass
Effort: 12-16 hours Priority: MEDIUM
Benefits:
- Fix bugs once, applies to all modules
- Easier to add new Instagram-like platforms
- Less code to maintain
- Consistent behavior
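To make the "easier to add new platforms" benefit concrete, here is a simplified, self-contained sketch of the same pattern. BaseDownloader stands in for BaseInstagramDownloader with the network and database parts removed; ExampleDownloader, download_user, and the example.invalid URLs are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import List


class BaseDownloader(ABC):
    """Simplified stand-in for BaseInstagramDownloader (no network access)."""

    def download_user(self, username: str) -> List[str]:
        # Shared workflow: collect URLs, drop duplicates, keep original order.
        seen = set()
        unique = []
        for url in self._get_content_urls(username):
            if url not in seen:
                seen.add(url)
                unique.append(url)
        return unique

    @abstractmethod
    def _get_content_urls(self, username: str) -> List[str]:
        """Platform-specific URL extraction."""


class ExampleDownloader(BaseDownloader):
    """Hypothetical new platform: only the URL extraction is written."""

    def _get_content_urls(self, username: str) -> List[str]:
        base = f"https://example.invalid/{username}"
        return [f"{base}/1.jpg", f"{base}/2.jpg", f"{base}/1.jpg"]
```

The subclass contains only what differs between platforms; a fix to the shared de-duplication step in the base class would reach every subclass at once.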
Medium Priority Technical Debt
4. Inconsistent Logging
Severity: MEDIUM Impact: Debugging, Monitoring
Current State:
- Mix of print(), callbacks, and the logging module
- No structured logging
- Difficult to filter/search logs
- No log levels in many places
- No request IDs for tracing
Examples:
# Different logging approaches in codebase
print(f"Downloading {filename}")  # Style 1
if self.log_callback:  # Style 2
    self.log_callback(f"[{platform}] {message}", "info")
logger.info(f"Download complete: {filename}")  # Style 3
Recommendation: Standardize on structured logging
# modules/structured_logger.py
import logging
import json
from datetime import datetime
from typing import Dict, Optional
class StructuredLogger:
    def __init__(self, name: str, context: Optional[Dict] = None):
        self.logger = logging.getLogger(name)
        self.context = context or {}
    def log(self, level: str, message: str, **extra):
        """Log with structured data"""
        log_entry = {
            'timestamp': datetime.now().isoformat(),
            'level': level.upper(),
            'logger': self.logger.name,
            'message': message,
            **self.context,
            **extra
        }
        # default=str keeps non-JSON values (paths, datetimes) from raising
        getattr(self.logger, level.lower())(json.dumps(log_entry, default=str))
    def info(self, message: str, **extra):
        self.log('info', message, **extra)
    def error(self, message: str, **extra):
        self.log('error', message, **extra)
    def warning(self, message: str, **extra):
        self.log('warning', message, **extra)
    def with_context(self, **context) -> 'StructuredLogger':
        """Create logger with additional context"""
        new_context = {**self.context, **context}
        return StructuredLogger(self.logger.name, new_context)
# Usage
logger = StructuredLogger('downloader')
request_logger = logger.with_context(request_id='abc123', user_id=42)
request_logger.info(
    'Starting download',
    platform='instagram',
    username='testuser',
    content_type='stories'
)
# Output: {"timestamp": "2025-10-31T13:00:00", "level": "INFO",
#          "logger": "downloader", "message": "Starting download",
#          "request_id": "abc123", "user_id": 42, "platform": "instagram", ...}
Effort: 8-12 hours Priority: MEDIUM
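An alternative sketch, not the wrapper approach proposed above: attaching a JSON formatter to a standard logging handler formats records from every logger that uses the handler, including third-party libraries, without changing call sites. The names JsonFormatter and build_logger and the `context` attribute are assumptions for illustration.

```python
import io
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": datetime.fromtimestamp(record.created, timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Fields passed via logger.info(..., extra={"context": {...}}).
        entry.update(getattr(record, "context", {}))
        return json.dumps(entry, default=str)


def build_logger(name: str, stream) -> logging.Logger:
    """Create a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


# Usage: write to an in-memory buffer so the output can be inspected.
buffer = io.StringIO()
log = build_logger("downloader", buffer)
log.info("Starting download", extra={"context": {"platform": "instagram"}})
```

The trade-off is that extra fields travel through the `extra` argument instead of keyword arguments, which is more verbose at the call site than StructuredLogger.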
5. Missing Database Migrations System
Severity: MEDIUM Impact: Deployment, Upgrades
Current State:
- Schema changes via ad-hoc ALTER TABLE statements
- No version tracking
- No rollback capability
- Difficult to deploy across environments
- Manual schema updates error-prone
Recommendation: Implement Alembic migrations
# Install Alembic
pip install alembic
# Initialize
alembic init alembic
# Create migration
alembic revision --autogenerate -m "Add user preferences column"
# Apply migrations
alembic upgrade head
# Rollback
alembic downgrade -1
Migration Example:
# alembic/versions/001_add_user_preferences.py
from alembic import op
import sqlalchemy as sa
def upgrade():
    op.add_column('users', sa.Column('preferences', sa.JSON(), nullable=True))
    op.create_index('idx_users_username', 'users', ['username'])
def downgrade():
    # Undo the upgrade steps in reverse order
    op.drop_index('idx_users_username', 'users')
    op.drop_column('users', 'preferences')
Effort: 6-8 hours Priority: MEDIUM
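To make "version tracking" concrete, the sketch below shows the kind of bookkeeping a migration tool performs, using only the standard-library sqlite3 module. It is not Alembic's implementation, it assumes a SQLite database (the source does not state which engine is used), and the table and column names are illustrative.

```python
import sqlite3

# Ordered (version, upgrade SQL) pairs; names are illustrative only.
MIGRATIONS = [
    (1, "CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)"),
    (2, "ALTER TABLE users ADD COLUMN preferences TEXT"),
]


def current_version(conn: sqlite3.Connection) -> int:
    """Read the schema version stored in the database file header."""
    return conn.execute("PRAGMA user_version").fetchone()[0]


def upgrade(conn: sqlite3.Connection) -> int:
    """Apply every migration newer than the stored schema version."""
    version = current_version(conn)
    for target, sql in MIGRATIONS:
        if target > version:
            conn.execute(sql)
            # PRAGMA does not accept bound parameters; target is a trusted int.
            conn.execute(f"PRAGMA user_version = {target}")
            conn.commit()
    return current_version(conn)
```

Running upgrade twice is safe because already-applied versions are skipped, which is the property that ad-hoc ALTER TABLE statements lack.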
6. No API Documentation (OpenAPI/Swagger)
Severity: MEDIUM Impact: Integration, Developer Experience
Current State:
- No interactive API documentation
- No schema validation documentation
- Difficult for third-party integrations
- Manual endpoint discovery
Solution: FastAPI automatically generates OpenAPI docs
# main.py
from fastapi import FastAPI
app = FastAPI(
    title="Media Downloader API",
    description="Unified media downloading system",
    version="6.3.6",
    docs_url="/api/docs",
    redoc_url="/api/redoc"
)
# Add tags for organization
@app.get("/api/downloads", tags=["Downloads"])
async def get_downloads():
    """
    Get list of downloads with filtering.
    Returns:
        List of download records with metadata
    Raises:
        401: Unauthorized - Missing or invalid authentication
        500: Internal Server Error - Database or system error
    """
    pass
Access docs at:
- Swagger UI: http://localhost:8000/api/docs
- ReDoc: http://localhost:8000/api/redoc
Effort: 4-6 hours (adding descriptions, examples) Priority: MEDIUM
Low Priority Technical Debt
7. Frontend Type Safety Gaps
Severity: LOW Impact: Development Velocity
Remaining Issues:
- Some components still use the any type
- API response types not fully typed
- Props interfaces could be more specific
- Missing null checks in places
Solution: Progressive enhancement with new types file
// Update components to use types from types/index.ts
import { Download, Platform, User } from '../types'
interface DownloadListProps {
  downloads: Download[]
  onSelect: (download: Download) => void
  currentUser: User
}
const DownloadList: React.FC<DownloadListProps> = ({
  downloads,
  onSelect,
  currentUser
}) => {
  // Fully typed component
}
Effort: 6-8 hours Priority: LOW
8. Hardcoded Configuration Values
Severity: LOW Impact: Flexibility
Examples:
# Hardcoded paths
base_path = Path("/opt/immich/md")
media_base = Path("/opt/immich/md")
# Hardcoded timeouts
timeout=10.0
timeout=30
# Hardcoded limits
limit: int = 100
Solution: Move to configuration
# config/defaults.py
DEFAULTS = {
    'media_base_path': '/opt/immich/md',
    'database_timeout': 10.0,
    'api_timeout': 30.0,
    'default_page_limit': 100,
    'max_page_limit': 1000,
    'thumbnail_size': (300, 300),
    'cache_ttl': 300
}
# Usage
from config import get_config
config = get_config()
base_path = Path(config.get('media_base_path'))
Effort: 4-6 hours Priority: LOW
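The usage above calls get_config() without showing it. One possible shape is sketched below, assuming overrides come from environment variables. The MEDIA_DL_ prefix and the type-coercion rule are hypothetical choices, not an existing convention in the project, and only the scalar defaults are included to keep the coercion simple.

```python
import os
from typing import Any, Dict, Mapping, Optional

DEFAULTS: Dict[str, Any] = {
    "media_base_path": "/opt/immich/md",
    "database_timeout": 10.0,
    "api_timeout": 30.0,
    "default_page_limit": 100,
}

ENV_PREFIX = "MEDIA_DL_"  # hypothetical prefix, not an existing convention


def get_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Return DEFAULTS overridden by MEDIA_DL_<KEY> environment variables."""
    env = os.environ if env is None else env
    config = dict(DEFAULTS)
    for key, default in DEFAULTS.items():
        raw = env.get(ENV_PREFIX + key.upper())
        if raw is not None:
            # Coerce the string to the type of the default (str, int, float).
            config[key] = type(default)(raw)
    return config
```

Accepting the environment as a parameter keeps the function testable: a test can pass a plain dict instead of modifying os.environ.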
Code Quality Improvements
9. Add Pre-commit Hooks
Effort: 2-3 hours Priority: MEDIUM
Setup:
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/psf/black
    rev: 23.12.1
    hooks:
      - id: black
        language_version: python3.12
  - repo: https://github.com/PyCQA/flake8
    rev: 7.0.0
    hooks:
      - id: flake8
        args: [--max-line-length=120]
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.8.0
    hooks:
      - id: mypy
        # List the stub packages the project needs; types-all is unmaintained
        additional_dependencies: [types-requests]
  - repo: https://github.com/pre-commit/mirrors-eslint
    rev: v8.56.0
    hooks:
      - id: eslint
        files: \.(js|ts|tsx)$
        types: [file]
Benefits:
- Automatic code formatting
- Catch errors before commit
- Enforce code style
- Prevent bad commits
10. Add GitHub Actions CI/CD
Effort: 4-6 hours Priority: MEDIUM
Workflow:
# .github/workflows/ci.yml
name: CI
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.12'
      - run: pip install -r requirements.txt
      - run: pytest tests/
      # compileall recurses on its own; a bare **/*.py glob needs bash globstar
      - run: python -m compileall -q .
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: pip install black flake8
      - run: black --check .
      - run: flake8 .
  frontend:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
      - run: npm install
      - run: npm run build
      - run: npm run lint
Immediate Quick Wins (< 2 hours each)
1. Add Request ID Tracking
import uuid
from fastapi import Request
@app.middleware("http")
async def add_request_id(request: Request, call_next):
    request.state.request_id = str(uuid.uuid4())
    response = await call_next(request)
    response.headers["X-Request-ID"] = request.state.request_id
    return response
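A request ID is most useful when it also appears in log lines. The sketch below shows one way to carry it there, assuming the middleware stores the ID in a contextvars.ContextVar in addition to request.state. The names request_id_var and RequestIdFilter are hypothetical, and the example writes to an in-memory stream so it runs without a web server.

```python
import contextvars
import io
import logging
import uuid

# Holds the ID of the request being handled in the current task or thread.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Copy the current request ID onto every log record."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RequestIdFilter())
handler.setFormatter(logging.Formatter("%(request_id)s %(message)s"))

logger = logging.getLogger("request_demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

# In the middleware this would be: request_id_var.set(request.state.request_id)
request_id = str(uuid.uuid4())
token = request_id_var.set(request_id)
logger.info("Starting download")
request_id_var.reset(token)  # restore the default once the request is done
logger.info("Outside any request")
```

Each asyncio task gets its own copy of the context, so concurrent requests do not overwrite each other's IDs.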
2. Add Response Time Logging
import time
@app.middleware("http")
async def log_response_time(request: Request, call_next):
    # perf_counter is monotonic, so system clock adjustments cannot skew it
    start = time.perf_counter()
    response = await call_next(request)
    duration = time.perf_counter() - start
    logger.info(f"{request.method} {request.url.path} - {duration:.3f}s")
    return response
3. Add Health Check Versioning
import sys
@app.get("/api/health")
async def health():
    return {
        "status": "healthy",
        "version": "6.3.6",
        "build_date": "2025-10-31",
        "python_version": sys.version,
        "uptime": get_uptime()  # get_uptime() must be provided by the application
    }
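The health check calls get_uptime(), which the snippet does not define. A minimal sketch follows, assuming uptime means seconds since the module was first imported; format_uptime is an extra hypothetical helper for human-readable output.

```python
import time

# Captured once at import time, i.e. roughly when the process starts.
_START = time.monotonic()


def get_uptime() -> float:
    """Seconds elapsed since this module was imported."""
    return time.monotonic() - _START


def format_uptime(seconds: float) -> str:
    """Render a duration as 'Xd HH:MM:SS' for display."""
    days, rem = divmod(int(seconds), 86400)
    hours, rem = divmod(rem, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{days}d {hours:02d}:{minutes:02d}:{secs:02d}"
```

time.monotonic() is used instead of time.time() so that clock adjustments on the host cannot produce a negative or jumping uptime.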
4. Add CORS Configuration
from fastapi.middleware.cors import CORSMiddleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://your-domain.com"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
5. Add Compression Middleware
from fastapi.middleware.gzip import GZipMiddleware
app.add_middleware(GZipMiddleware, minimum_size=1000)
Summary
Total Technical Debt Identified: 10 major items
Estimated Total Effort: 100-140 hours
Recommended Priority Order:
- Immediate (< 2h each): Quick wins listed above
- Week 1-2 (16-24h): Refactor api.py into modules
- Week 3-4 (16-24h): Implement testing suite
- Month 2 (32-48h): Refactor large module files
- Month 3 (30-40h): Address remaining items
ROI Analysis:
- High ROI: API refactoring, testing suite, logging standardization
- Medium ROI: Database migrations, code deduplication
- Low ROI (but important): Type safety, pre-commit hooks
Next Steps:
- Review and prioritize with team
- Create issues for each item
- Start with quick wins for immediate impact
- Tackle high-impact items in sprints