docs/TECHNICAL_DEBT_ANALYSIS.md
# Technical Debt Analysis & Immediate Improvements
**Date:** 2025-10-31
**Version:** 6.3.6
**Analyst:** Automated Code Review

---

## Executive Summary

This document identifies technical debt, code smells, and immediate improvement opportunities in the Media Downloader codebase.

---

## Critical Technical Debt

### 1. Monolithic API File (2,649 lines)
**File:** `/opt/media-downloader/web/backend/api.py`
**Severity:** HIGH
**Impact:** Maintainability, Testing, Code Review

**Current State:**
- Single file contains all API endpoints
- 50+ routes in one file
- Multiple responsibilities (auth, downloads, media, scheduler, config)
- Difficult to test individual components
- High cognitive load for developers

**Recommendation:**
Refactor into modular structure:
```
web/backend/
├── main.py                  (app initialization, 100-150 lines)
├── routers/
│   ├── auth.py              (authentication endpoints)
│   ├── downloads.py         (download management)
│   ├── media.py             (media serving)
│   ├── scheduler.py         (scheduler management)
│   ├── platforms.py         (platform configuration)
│   └── health.py            (health & monitoring)
├── services/
│   ├── download_service.py  (business logic)
│   ├── media_service.py     (media processing)
│   └── scheduler_service.py (scheduling logic)
└── models/
    ├── requests.py          (Pydantic request models)
    └── responses.py         (Pydantic response models)
```

**Effort:** 16-24 hours
**Priority:** HIGH
**Benefits:**
- Easier to test individual routers
- Better separation of concerns
- Reduced merge conflicts
- Faster development velocity

---

### 2. Large Module Files
**Severity:** HIGH
**Impact:** Maintainability

**Problem Files:**
- `modules/forum_downloader.py` (3,971 lines)
- `modules/imginn_module.py` (2,542 lines)
- `media-downloader.py` (2,653 lines)

**Common Issues:**
- God objects (classes doing too much)
- Long methods (100+ lines)
- Deep nesting (5+ levels)
- Code duplication
- Difficult to unit test

**Recommendations:**

#### Forum Downloader Refactoring:
```
modules/forum/
├── __init__.py
├── base.py              (base forum class)
├── authentication.py    (login, 2FA)
├── thread_parser.py     (HTML parsing)
├── image_extractor.py   (image extraction)
├── download_manager.py  (download logic)
└── sites/
    ├── hqcelebcorner.py (site-specific)
    └── picturepub.py    (site-specific)
```

#### Instagram Module Refactoring:
```
modules/instagram/
├── __init__.py
├── base_instagram.py    (shared logic)
├── fastdl.py            (FastDL implementation)
├── imginn.py            (ImgInn implementation)
├── toolzu.py            (Toolzu implementation)
├── cookie_manager.py    (cookie handling)
├── flaresolverr.py      (FlareSolverr integration)
└── content_parser.py    (HTML parsing)
```

**Effort:** 32-48 hours
**Priority:** MEDIUM

---

### 3. Code Duplication in Instagram Modules
**Severity:** MEDIUM
**Impact:** Maintainability, Bug Fixes

**Duplication Analysis:**
- `fastdl_module.py`, `imginn_module.py`, and `toolzu_module.py` share 60-70% of their code
- Cookie management duplicated 3x
- FlareSolverr integration duplicated 3x
- HTML parsing logic duplicated 3x
- Download logic is nearly identical across the three modules

**Example Duplication:**
```python
# Appears in 3 files with minor variations
def _get_flaresolverr_session(self):
    response = requests.post(
        f"{self.flaresolverr_url}/v1/sessions/create",
        json={"maxTimeout": 60000}
    )
    if response.status_code == 200:
        return response.json()['solution']['sessionId']
```

**Solution:** Create base class with shared logic
```python
# modules/instagram/base_instagram.py
from abc import ABC, abstractmethod
from typing import Dict, List

# CookieManager and FlareSolverrClient are the proposed shared helpers
# (cookie_manager.py, flaresolverr.py); these import paths are assumed.
from .cookie_manager import CookieManager
from .flaresolverr import FlareSolverrClient


class BaseInstagramDownloader(ABC):
    """Base class for Instagram-like services"""

    def __init__(self, config, unified_db):
        self.config = config
        self.unified_db = unified_db
        self.cookie_manager = CookieManager(config.get('cookie_file'))
        self.flaresolverr = FlareSolverrClient(config.get('flaresolverr_url'))

    def _get_or_create_session(self):
        """Shared session management logic"""
        # Common implementation

    def _parse_stories(self, html: str) -> List[Dict]:
        """Shared HTML parsing logic"""
        # Common implementation

    @abstractmethod
    def _get_content_urls(self, username: str) -> List[str]:
        """Platform-specific URL extraction"""
        pass
```

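A minimal, self-contained sketch of how a platform module might plug into a base class of this shape. The names `SketchDownloader`, `ImginnSketch`, and the `base_url` config key are hypothetical, and the network fetch is replaced by fixed data so the pattern stays visible; the real modules would keep their own fetching and parsing.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class SketchDownloader(ABC):
    """Simplified stand-in for the proposed base class (hypothetical)."""

    def __init__(self, config: Dict[str, str]):
        self.config = config

    def download(self, username: str) -> List[str]:
        """Shared workflow: collect URLs, drop duplicates, keep order."""
        seen = set()
        unique_urls = []
        for url in self._get_content_urls(username):
            if url not in seen:
                seen.add(url)
                unique_urls.append(url)
        return unique_urls

    @abstractmethod
    def _get_content_urls(self, username: str) -> List[str]:
        """Platform-specific URL extraction."""


class ImginnSketch(SketchDownloader):
    """Hypothetical subclass: only the platform-specific part lives here."""

    def _get_content_urls(self, username: str) -> List[str]:
        # A real module would fetch and parse HTML; fixed data keeps this runnable.
        base = self.config["base_url"]
        return [
            f"{base}/{username}/1.jpg",
            f"{base}/{username}/1.jpg",  # duplicate, removed by download()
            f"{base}/{username}/2.jpg",
        ]


downloader = ImginnSketch({"base_url": "https://example.invalid"})
print(downloader.download("testuser"))
```

The shared `download()` method is written once in the base class, so a fix to the de-duplication logic reaches every subclass without further changes.
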
**Effort:** 12-16 hours
**Priority:** MEDIUM
**Benefits:**
- Fix bugs once, applies to all modules
- Easier to add new Instagram-like platforms
- Less code to maintain
- Consistent behavior

---

## Medium Priority Technical Debt

### 4. Inconsistent Logging
**Severity:** MEDIUM
**Impact:** Debugging, Monitoring

**Current State:**
- Mix of `print()`, callbacks, `logging` module
- No structured logging
- Difficult to filter/search logs
- No log levels in many places
- No request IDs for tracing

**Examples:**
```python
# Different logging approaches in codebase
print(f"Downloading {filename}")  # Style 1
if self.log_callback:  # Style 2
    self.log_callback(f"[{platform}] {message}", "info")
logger.info(f"Download complete: {filename}")  # Style 3
```

**Recommendation:** Standardize on structured logging
```python
# modules/structured_logger.py
import logging
import json
from datetime import datetime
from typing import Dict, Optional

class StructuredLogger:
    def __init__(self, name: str, context: Optional[Dict] = None):
        self.logger = logging.getLogger(name)
        self.context = context or {}

    def log(self, level: str, message: str, **extra):
        """Log with structured data"""
        log_entry = {
            'timestamp': datetime.now().isoformat(),
            'level': level.upper(),
            'logger': self.logger.name,
            'message': message,
            **self.context,
            **extra
        }

        getattr(self.logger, level.lower())(json.dumps(log_entry))

    def info(self, message: str, **extra):
        self.log('info', message, **extra)

    def error(self, message: str, **extra):
        self.log('error', message, **extra)

    def warning(self, message: str, **extra):
        self.log('warning', message, **extra)

    def with_context(self, **context) -> 'StructuredLogger':
        """Create logger with additional context"""
        new_context = {**self.context, **context}
        return StructuredLogger(self.logger.name, new_context)

# Usage
# Without a configured handler and level, INFO records are dropped
logging.basicConfig(level=logging.INFO, format='%(message)s')
logger = StructuredLogger('downloader')
request_logger = logger.with_context(request_id='abc123', user_id=42)

request_logger.info('Starting download',
    platform='instagram',
    username='testuser',
    content_type='stories'
)
# Output: {"timestamp": "2025-10-31T13:00:00", "level": "INFO",
#          "logger": "downloader", "message": "Starting download",
#          "request_id": "abc123", "user_id": 42, "platform": "instagram", ...}
```

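During a gradual migration, the existing callback style (Style 2 above) could be kept working while its output is routed through the `logging` module. The sketch below assumes the legacy callback signature is `(message, level)` as in the example; `make_log_callback` and `ListHandler` are hypothetical names introduced for illustration.

```python
import logging
from typing import Callable

# Map the legacy level strings to logging levels.
_LEVELS = {
    "debug": logging.DEBUG,
    "info": logging.INFO,
    "warning": logging.WARNING,
    "error": logging.ERROR,
}


def make_log_callback(logger: logging.Logger) -> Callable[..., None]:
    """Return a callback with the legacy (message, level) signature."""

    def log_callback(message: str, level: str = "info") -> None:
        # Unknown level names fall back to INFO instead of raising.
        logger.log(_LEVELS.get(level.lower(), logging.INFO), message)

    return log_callback


class ListHandler(logging.Handler):
    """Collects records in memory so the example can show what was logged."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


demo_logger = logging.getLogger("downloader.legacy")
demo_logger.setLevel(logging.DEBUG)
demo_handler = ListHandler()
demo_logger.addHandler(demo_handler)

log_callback = make_log_callback(demo_logger)
log_callback("[instagram] Starting download", "info")
log_callback("[instagram] Rate limited", "warning")
print([record.levelname for record in demo_handler.records])
```

Modules that accept a `log_callback` can then be handed this adapter unchanged, which lets the callback call sites be replaced one module at a time.
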
**Effort:** 8-12 hours
**Priority:** MEDIUM

---

### 5. Missing Database Migrations System
**Severity:** MEDIUM
**Impact:** Deployment, Upgrades

**Current State:**
- Schema changes via ad-hoc ALTER TABLE statements
- No version tracking
- No rollback capability
- Difficult to deploy across environments
- Manual schema updates are error-prone

**Recommendation:** Implement Alembic migrations
```bash
# Install Alembic
pip install alembic

# Initialize
alembic init alembic

# Create migration
# (--autogenerate needs SQLAlchemy model metadata configured in env.py;
#  without it, omit the flag and write the migration by hand)
alembic revision --autogenerate -m "Add user preferences column"

# Apply migrations
alembic upgrade head

# Rollback
alembic downgrade -1
```

**Migration Example:**
```python
# alembic/versions/001_add_user_preferences.py
# (revision identifiers generated by Alembic are omitted here)
from alembic import op
import sqlalchemy as sa

def upgrade():
    op.add_column('users', sa.Column('preferences', sa.JSON(), nullable=True))
    op.create_index('idx_users_username', 'users', ['username'])

def downgrade():
    op.drop_index('idx_users_username', 'users')
    op.drop_column('users', 'preferences')
```

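To make the "version tracking" gap concrete, here is a stdlib-only sketch of what a migration tool does at its core. It is not Alembic and offers no rollback; it assumes a SQLite database (the document does not state which database is used) and stores the schema version in SQLite's `PRAGMA user_version`. The `MIGRATIONS` list and `migrate` function are hypothetical.

```python
import sqlite3

# Ordered schema changes; position N in the list produces schema version N + 1.
MIGRATIONS = [
    "ALTER TABLE users ADD COLUMN preferences TEXT",
    "CREATE INDEX idx_users_username ON users (username)",
]


def current_version(conn: sqlite3.Connection) -> int:
    """Read the schema version stored in the database file header."""
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply every migration newer than the stored version, in order."""
    for index in range(current_version(conn), len(MIGRATIONS)):
        conn.execute(MIGRATIONS[index])
        # PRAGMA does not accept bound parameters; index is always an int here.
        conn.execute(f"PRAGMA user_version = {index + 1}")
        conn.commit()
    return current_version(conn)


demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
print(migrate(demo))  # applies both migrations
print(migrate(demo))  # second run finds nothing left to apply
```

Because the version is recorded in the database itself, running the same deployment step twice is harmless, which is the property the ad-hoc `ALTER TABLE` approach lacks.
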
**Effort:** 6-8 hours
**Priority:** MEDIUM

---

### 6. No API Documentation (OpenAPI/Swagger)
**Severity:** MEDIUM
**Impact:** Integration, Developer Experience

**Current State:**
- No interactive API documentation
- No schema validation documentation
- Difficult for third-party integrations
- Manual endpoint discovery

**Solution:** FastAPI generates OpenAPI docs automatically; configure the docs URLs and add metadata to each endpoint
```python
# main.py
from fastapi import FastAPI

app = FastAPI(
    title="Media Downloader API",
    description="Unified media downloading system",
    version="6.3.6",
    docs_url="/api/docs",
    redoc_url="/api/redoc"
)

# Add tags for organization
@app.get("/api/downloads", tags=["Downloads"])
async def get_downloads():
    """
    Get list of downloads with filtering.

    Returns:
        List of download records with metadata

    Raises:
        401: Unauthorized - Missing or invalid authentication
        500: Internal Server Error - Database or system error
    """
    pass
```

**Access docs at:**
- Swagger UI: `http://localhost:8000/api/docs`
- ReDoc: `http://localhost:8000/api/redoc`

**Effort:** 4-6 hours (adding descriptions, examples)
**Priority:** MEDIUM

---

## Low Priority Technical Debt

### 7. Frontend Type Safety Gaps
**Severity:** LOW
**Impact:** Development Velocity

**Remaining Issues:**
- Some components still use `any` type
- API response types not fully typed
- Props interfaces could be more specific
- Missing null checks in places

**Solution:** Migrate components incrementally to the new shared types file
```typescript
// Update components to use types from types/index.ts
import React from 'react'
import { Download, Platform, User } from '../types'

interface DownloadListProps {
  downloads: Download[]
  onSelect: (download: Download) => void
  currentUser: User
}

const DownloadList: React.FC<DownloadListProps> = ({
  downloads,
  onSelect,
  currentUser
}) => {
  // Fully typed component
  return null // placeholder: render the download list here
}
```

**Effort:** 6-8 hours
**Priority:** LOW

---

### 8. Hardcoded Configuration Values
**Severity:** LOW
**Impact:** Flexibility

**Examples:**
```python
# Hardcoded paths
base_path = Path("/opt/immich/md")
media_base = Path("/opt/immich/md")

# Hardcoded timeouts
timeout=10.0
timeout=30

# Hardcoded limits
limit: int = 100
```

**Solution:** Move to configuration
```python
# config/defaults.py
DEFAULTS = {
    'media_base_path': '/opt/immich/md',
    'database_timeout': 10.0,
    'api_timeout': 30.0,
    'default_page_limit': 100,
    'max_page_limit': 1000,
    'thumbnail_size': (300, 300),
    'cache_ttl': 300
}

# Usage
from pathlib import Path
from config import get_config
config = get_config()
base_path = Path(config.get('media_base_path'))
```

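The `get_config()` helper used above is not defined in this document. One possible shape, sketched under the assumption that deployments override defaults through environment variables with a hypothetical `MD_` prefix, is shown below; only scalar settings are included so that each override can be cast to the type of its default.

```python
import os
from typing import Any, Dict, Mapping, Optional

# Subset of the defaults above, limited to scalar values.
DEFAULTS: Dict[str, Any] = {
    "media_base_path": "/opt/immich/md",
    "database_timeout": 10.0,
    "default_page_limit": 100,
}


def get_config(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Merge DEFAULTS with MD_<KEY> overrides (the prefix is an assumption)."""
    source = os.environ if env is None else env
    config = dict(DEFAULTS)
    for key, default in DEFAULTS.items():
        raw = source.get(f"MD_{key.upper()}")
        if raw is not None:
            # Cast to the default's type so "50" becomes 50 and "2.5" becomes 2.5.
            config[key] = type(default)(raw)
    return config


# Passing a mapping instead of reading os.environ keeps the example deterministic.
config = get_config({"MD_DATABASE_TIMEOUT": "2.5", "MD_DEFAULT_PAGE_LIMIT": "50"})
print(config["database_timeout"], config["default_page_limit"])
```

Settings without an override keep their default, so existing deployments behave the same until a variable is set.
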
**Effort:** 4-6 hours
**Priority:** LOW

---

## Code Quality Improvements

### 9. Add Pre-commit Hooks
**Effort:** 2-3 hours
**Priority:** MEDIUM

**Setup:**
```yaml
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/psf/black
    rev: 23.12.1
    hooks:
      - id: black
        language_version: python3.12

  - repo: https://github.com/PyCQA/flake8
    rev: 7.0.0
    hooks:
      - id: flake8
        args: [--max-line-length=120]

  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.8.0
    hooks:
      - id: mypy
        # List the stub packages the project actually needs; the umbrella
        # `types-all` package is unmaintained and may fail to install.
        additional_dependencies: [types-requests]

  - repo: https://github.com/pre-commit/mirrors-eslint
    rev: v8.56.0
    hooks:
      - id: eslint
        files: \.(js|ts|tsx)$
        types: [file]
```

**Benefits:**
- Automatic code formatting
- Catch errors before commit
- Enforce code style
- Prevent bad commits

---

### 10. Add GitHub Actions CI/CD
**Effort:** 4-6 hours
**Priority:** MEDIUM

**Workflow:**
```yaml
# .github/workflows/ci.yml
name: CI

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.12'
      - run: pip install -r requirements.txt
      - run: pytest tests/
      # compileall recurses into subdirectories; a `**/*.py` glob does not
      # recurse in the default shell
      - run: python -m compileall -q .

  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: pip install black flake8
      - run: black --check .
      - run: flake8 .

  frontend:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-node@v3
      - run: npm install
      - run: npm run build
      - run: npm run lint
```

---

## Immediate Quick Wins (< 2 hours each)

### 1. Add Request ID Tracking
```python
import uuid
from fastapi import Request

# `app` is the FastAPI instance created in main.py
@app.middleware("http")
async def add_request_id(request: Request, call_next):
    request.state.request_id = str(uuid.uuid4())
    response = await call_next(request)
    response.headers["X-Request-ID"] = request.state.request_id
    return response
```

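For the request ID to appear in log lines as well as in the response header, it has to reach the logger. A possible approach, sketched here with the standard library only, is to store the ID in a `contextvars.ContextVar` and attach it to each record with a `logging.Filter`. The names `request_id_var` and `RequestIdFilter` are hypothetical, and the middleware wiring is indicated only in a comment.

```python
import contextvars
import logging

# Holds the ID of the request currently being handled; "-" outside a request.
request_id_var = contextvars.ContextVar("request_id", default="-")


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to every record passing through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


handler = logging.StreamHandler()
handler.addFilter(RequestIdFilter())
handler.setFormatter(logging.Formatter("%(request_id)s %(levelname)s %(message)s"))

request_logger = logging.getLogger("downloader.requests")
request_logger.addHandler(handler)
request_logger.setLevel(logging.INFO)

# In the middleware this would be: request_id_var.set(request.state.request_id)
token = request_id_var.set("abc123")
request_logger.info("Starting download")
request_id_var.reset(token)  # restore the previous value after the request
```

Context variables are tracked per asyncio task, so concurrent requests handled by the same worker do not see each other's IDs.
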
### 2. Add Response Time Logging
```python
import logging
import time

from fastapi import Request

logger = logging.getLogger(__name__)

@app.middleware("http")
async def log_response_time(request: Request, call_next):
    start = time.perf_counter()  # monotonic, unaffected by system clock changes
    response = await call_next(request)
    duration = time.perf_counter() - start
    logger.info(f"{request.method} {request.url.path} - {duration:.3f}s")
    return response
```

### 3. Add Health Check Versioning
```python
import sys

@app.get("/api/health")
async def health():
    return {
        "status": "healthy",
        "version": "6.3.6",
        "build_date": "2025-10-31",
        "python_version": sys.version,
        "uptime": get_uptime()  # helper not shown here; must be defined by the application
    }
```

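The health endpoint calls `get_uptime()`, which this document does not define. A minimal sketch, assuming uptime means seconds since the module was first imported, could look like this; `format_uptime` is an optional, hypothetical helper for human-readable output.

```python
import time

# Captured once at import time, i.e. roughly when the process starts.
_STARTED_AT = time.monotonic()


def get_uptime() -> float:
    """Seconds elapsed since this module was imported."""
    return time.monotonic() - _STARTED_AT


def format_uptime(seconds: float) -> str:
    """Render seconds as 'Xd HH:MM:SS' for display in the health payload."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return f"{days}d {hours:02d}:{minutes:02d}:{secs:02d}"


print(format_uptime(93784))  # 1d 02:03:04
```

`time.monotonic()` is used rather than wall-clock time so that the reported uptime is not affected by system clock adjustments.
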
### 4. Add CORS Configuration
```python
from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://your-domain.com"],
    allow_credentials=True,
    allow_methods=["*"],
    allow_headers=["*"],
)
```

### 5. Add Compression Middleware
```python
from fastapi.middleware.gzip import GZipMiddleware

app.add_middleware(GZipMiddleware, minimum_size=1000)
```

---

## Summary

**Total Technical Debt Identified:** 10 major items
**Estimated Total Effort:** 100-140 hours
**Recommended Priority Order:**

1. **Immediate (< 2h each):** Quick wins listed above
2. **Week 1-2 (16-24h):** Refactor api.py into modules
3. **Week 3-4 (16-24h):** Implement testing suite
4. **Month 2 (32-48h):** Refactor large module files
5. **Month 3 (30-40h):** Address remaining items

**ROI Analysis:**
- High ROI: API refactoring, testing suite, logging standardization
- Medium ROI: Database migrations, code deduplication
- Low ROI (but important): Type safety, pre-commit hooks

**Next Steps:**
1. Review and prioritize with team
2. Create issues for each item
3. Start with quick wins for immediate impact
4. Tackle high-impact items in sprints