909 lines
25 KiB
Markdown
909 lines
25 KiB
Markdown
# Media Downloader - Comprehensive Code Review
|
||
**Date:** 2025-10-31
|
||
**Version:** 6.3.4
|
||
**Reviewer:** Claude Code (Automated Analysis)
|
||
**Scope:** Full codebase - Backend, Frontend, Database, Architecture
|
||
|
||
---
|
||
|
||
## Executive Summary
|
||
|
||
The Media Downloader is a **feature-rich, architecturally sound application** with excellent modular design and modern technology choices. The codebase demonstrates solid engineering principles with a unified database, clear separation of concerns, and comprehensive feature coverage.
|
||
|
||
**Overall Assessment:**
|
||
- **Code Quality:** 6.5/10 - Good structure but needs refactoring
|
||
- **Security:** 4/10 - **CRITICAL issues** requiring immediate attention
|
||
- **Performance:** 7/10 - Generally good with optimization opportunities
|
||
- **Maintainability:** 6/10 - Large files, some duplication, limited tests
|
||
- **Architecture:** 8/10 - Excellent modular design
|
||
|
||
### Key Statistics
|
||
- **Total Lines of Code:** 37,966
|
||
- **Python Files:** 49 (including 20 modules, 2 backend files)
|
||
- **TypeScript Files:** 20
|
||
- **Documentation Files:** 11 (in docs/)
|
||
- **Test Files:** 0 ⚠️
|
||
|
||
### Critical Findings
|
||
🔴 **4 Critical Security Issues** - Require immediate action
|
||
🟠 **4 High Priority Issues** - Fix within 1-2 weeks
|
||
🟡 **7 Medium Priority Issues** - Address within 1-3 months
|
||
🟢 **5 Low Priority Issues** - Nice to have improvements
|
||
|
||
---
|
||
|
||
## Critical Issues (🔴 Fix Immediately)
|
||
|
||
### 1. Hardcoded Secrets in Configuration
|
||
**Severity:** CRITICAL | **Effort:** 2-4 hours | **Risk:** Data breach
|
||
|
||
**Location:** `/opt/media-downloader/config/settings.json`
|
||
|
||
**Problem:**
|
||
```json
|
||
{
|
||
"password": "cpc6rvm!wvf_wft2EHN",
|
||
"totp_secret": "OVLX4K6NHTUJTUJVL4TLHXJ55SIEDOOY",
|
||
"api_key": "SC1dje6Zo5VhGPmy9vyfkeuBY0MZ7VfgrhI8wIvjOM",
|
||
"api_token": "a3jmhwnhecq9k9dz3tzv2bdk7uc29p"
|
||
}
|
||
```
|
||
|
||
Credentials are stored in plaintext and tracked in version control. Anyone with repository access has full account credentials. Git history cannot be cleaned without force-pushing.
|
||
|
||
**Impact:**
|
||
- All forum passwords, API keys, and TOTP secrets exposed
|
||
- Cannot rotate credentials without code changes
|
||
- Violates OWASP A02:2021 – Cryptographic Failures
|
||
|
||
**Solution:**
|
||
```bash
|
||
# 1. Immediate: Add to .gitignore
|
||
echo "config/settings.json" >> .gitignore
|
||
echo ".env" >> .gitignore
|
||
|
||
# 2. Create environment variable template
|
||
cat > config/settings.example.json <<EOF
|
||
{
|
||
"forums": {
|
||
"password": "FORUM_PASSWORD",
|
||
"totp_secret": "FORUM_TOTP_SECRET"
|
||
},
|
||
"snapchat": {
|
||
"password": "SNAPCHAT_PASSWORD"
|
||
},
|
||
"tiktok": {
|
||
"api_key": "TIKTOK_API_KEY",
|
||
"api_token": "TIKTOK_API_TOKEN"
|
||
}
|
||
}
|
||
EOF
|
||
|
||
# 3. Create .env file (add to .gitignore)
|
||
cat > .env.example <<EOF
|
||
FORUM_PASSWORD=your_password_here
|
||
FORUM_TOTP_SECRET=your_totp_secret_here
|
||
SNAPCHAT_PASSWORD=your_password_here
|
||
TIKTOK_API_KEY=your_api_key_here
|
||
TIKTOK_API_TOKEN=your_api_token_here
|
||
EOF
|
||
```
|
||
|
||
**Implementation:**
|
||
```python
|
||
# modules/secrets_manager.py
|
||
import os
|
||
from pathlib import Path
|
||
from dotenv import load_dotenv
|
||
from typing import Optional
|
||
|
||
class SecretsManager:
|
||
"""Secure secrets management using environment variables"""
|
||
|
||
def __init__(self, env_file: Optional[Path] = None):
|
||
if env_file is None:
|
||
env_file = Path(__file__).parent.parent / '.env'
|
||
|
||
if env_file.exists():
|
||
load_dotenv(env_file)
|
||
|
||
def get_secret(self, key: str, default: Optional[str] = None) -> str:
|
||
"""Get secret from environment, raise if not found and no default"""
|
||
value = os.getenv(key, default)
|
||
if value is None:
|
||
raise ValueError(f"Secret '{key}' not found in environment")
|
||
return value
|
||
|
||
def get_optional_secret(self, key: str) -> Optional[str]:
|
||
"""Get secret from environment, return None if not found"""
|
||
return os.getenv(key)
|
||
|
||
# Usage in modules
|
||
secrets = SecretsManager()
|
||
forum_password = secrets.get_secret('FORUM_PASSWORD')
|
||
```
|
||
|
||
**Rollout Plan:**
|
||
1. Create `.env.example` with placeholder values
|
||
2. Add `.gitignore` entries for `.env` and `config/settings.json`
|
||
3. Document secret setup in `INSTALL.md`
|
||
4. Update all modules to use `SecretsManager`
|
||
5. Notify team to create local `.env` files
|
||
6. Remove secrets from `settings.json` (keep structure)
|
||
|
||
---
|
||
|
||
### 2. SQL Injection Vulnerabilities
|
||
**Severity:** CRITICAL | **Effort:** 4-6 hours | **Risk:** Database compromise
|
||
|
||
**Location:** `/opt/media-downloader/web/backend/api.py` (multiple locations)
|
||
|
||
**Problem:**
|
||
F-string SQL queries with user-controlled input:
|
||
|
||
```python
|
||
# Line ~478-482 (VULNERABLE)
|
||
cursor.execute(f"""
|
||
SELECT COUNT(*) FROM downloads
|
||
WHERE download_date >= datetime('now', '-1 day')
|
||
AND {filters}
|
||
""")
|
||
|
||
# Line ~830-850 (VULNERABLE)
|
||
query = f"SELECT * FROM downloads WHERE platform = '{platform}'"
|
||
cursor.execute(query)
|
||
```
|
||
|
||
The `filters` variable is constructed from user input (`platform`, `source`, `search`) without proper sanitization.
|
||
|
||
**Impact:**
|
||
- Attackers can inject arbitrary SQL commands
|
||
- Can drop tables: `'; DROP TABLE downloads; --`
|
||
- Can exfiltrate data: `' OR 1=1 UNION SELECT * FROM users --`
|
||
- Can bypass authentication
|
||
- OWASP A03:2021 – Injection
|
||
|
||
**Solution:**
|
||
```python
|
||
# BEFORE (VULNERABLE)
|
||
platform = request.query_params.get('platform')
|
||
query = f"SELECT * FROM downloads WHERE platform = '{platform}'"
|
||
cursor.execute(query)
|
||
|
||
# AFTER (SECURE)
|
||
platform = request.query_params.get('platform')
|
||
query = "SELECT * FROM downloads WHERE platform = ?"
|
||
cursor.execute(query, (platform,))
|
||
|
||
# For dynamic filters
|
||
def build_safe_query(filters: dict) -> tuple[str, tuple]:
|
||
"""Build parameterized query from filters"""
|
||
conditions = []
|
||
params = []
|
||
|
||
if filters.get('platform'):
|
||
conditions.append("platform = ?")
|
||
params.append(filters['platform'])
|
||
|
||
if filters.get('source'):
|
||
conditions.append("source = ?")
|
||
params.append(filters['source'])
|
||
|
||
if filters.get('search'):
|
||
conditions.append("(filename LIKE ? OR source LIKE ?)")
|
||
search_pattern = f"%{filters['search']}%"
|
||
params.extend([search_pattern, search_pattern])
|
||
|
||
where_clause = " AND ".join(conditions) if conditions else "1=1"
|
||
return where_clause, tuple(params)
|
||
|
||
# Usage
|
||
filters = build_safe_query(request.query_params)
|
||
query = f"SELECT * FROM downloads WHERE {filters[0]}"
|
||
cursor.execute(query, filters[1])
|
||
```
|
||
|
||
**Files Requiring Fixes:**
|
||
- `/opt/media-downloader/web/backend/api.py` (17+ instances)
|
||
- Lines 478-482, 520-540, 830-850, 910-930
|
||
- `/opt/media-downloader/utilities/db_manager.py` (2 instances)
|
||
|
||
**Testing:**
|
||
```python
|
||
# Test case for SQL injection prevention
|
||
def test_sql_injection_prevention():
|
||
# Try to inject SQL
|
||
malicious_input = "'; DROP TABLE downloads; --"
|
||
response = client.get(f"/api/downloads?platform={malicious_input}")
|
||
|
||
# Should not execute injection
|
||
assert response.status_code in [400, 404] # Bad request or not found
|
||
|
||
# Verify table still exists
|
||
assert db.table_exists('downloads')
|
||
```
|
||
|
||
---
|
||
|
||
### 3. Path Traversal Vulnerabilities
|
||
**Severity:** HIGH | **Effort:** 3-4 hours | **Risk:** File system access
|
||
|
||
**Location:** `/opt/media-downloader/web/backend/api.py` (media endpoints)
|
||
|
||
**Problem:**
|
||
File paths from user input are not validated:
|
||
|
||
```python
|
||
# Lines ~1920+ (VULNERABLE)
|
||
@app.get("/api/media/preview")
|
||
async def get_media_preview(file_path: str, ...):
|
||
# No validation - attacker could use ../../etc/passwd
|
||
return FileResponse(file_path)
|
||
|
||
@app.get("/api/media/thumbnail")
|
||
async def get_media_thumbnail(file_path: str, ...):
|
||
# No validation
|
||
requested_path = Path(file_path)
|
||
return FileResponse(requested_path)
|
||
```
|
||
|
||
**Impact:**
|
||
- Read arbitrary files: `/etc/passwd`, `/etc/shadow`, database files
|
||
- Access configuration with secrets
|
||
- Data exfiltration via media endpoints
|
||
- OWASP A01:2021 – Broken Access Control
|
||
|
||
**Solution:**
|
||
```python
|
||
from pathlib import Path
|
||
from fastapi import HTTPException
|
||
|
||
ALLOWED_MEDIA_BASE = Path("/opt/immich/md")
|
||
|
||
def validate_file_path(file_path: str, allowed_base: Path) -> Path:
|
||
"""
|
||
Ensure file_path is within allowed directory.
|
||
Prevents directory traversal attacks.
|
||
"""
|
||
try:
|
||
# Resolve to absolute path
|
||
requested = Path(file_path).resolve()
|
||
|
||
# Check if within allowed directory
|
||
if not requested.is_relative_to(allowed_base):
|
||
raise ValueError(f"Path outside allowed directory")
|
||
|
||
# Check file exists
|
||
if not requested.exists():
|
||
raise FileNotFoundError()
|
||
|
||
# Check it's a file, not directory
|
||
if not requested.is_file():
|
||
raise ValueError("Path is not a file")
|
||
|
||
return requested
|
||
|
||
except (ValueError, FileNotFoundError) as e:
|
||
raise HTTPException(
|
||
status_code=403,
|
||
detail="Access denied: Invalid file path"
|
||
)
|
||
|
||
@app.get("/api/media/preview")
|
||
async def get_media_preview(
|
||
file_path: str,
|
||
current_user: Dict = Depends(get_current_user_media)
|
||
):
|
||
"""Serve media file with path validation"""
|
||
safe_path = validate_file_path(file_path, ALLOWED_MEDIA_BASE)
|
||
return FileResponse(safe_path)
|
||
```
|
||
|
||
**Test Cases:**
|
||
```python
|
||
# Path traversal attack attempts
|
||
test_cases = [
|
||
"../../etc/passwd",
|
||
"/etc/passwd",
|
||
"../../../root/.ssh/id_rsa",
|
||
"....//....//etc/passwd",
|
||
"%2e%2e%2f%2e%2e%2fetc%2fpasswd", # URL encoded
|
||
]
|
||
|
||
for attack in test_cases:
|
||
response = client.get(f"/api/media/preview?file_path={attack}")
|
||
assert response.status_code == 403, f"Failed to block: {attack}"
|
||
```
|
||
|
||
---
|
||
|
||
### 4. Command Injection Risk
|
||
**Severity:** HIGH | **Effort:** 2-3 hours | **Risk:** Code execution
|
||
|
||
**Location:** `/opt/media-downloader/web/backend/api.py`
|
||
|
||
**Problem:**
|
||
Subprocess calls with user input:
|
||
|
||
```python
|
||
# Line ~1314
|
||
@app.post("/api/platforms/{platform}/trigger")
|
||
async def trigger_platform_download(platform: str, ...):
|
||
cmd = ["python3", "/opt/media-downloader/media-downloader.py", "--platform", platform]
|
||
process = await asyncio.create_subprocess_exec(*cmd, ...)
|
||
```
|
||
|
||
While using a list (safer than shell=True), the `platform` parameter is not validated against a whitelist.
|
||
|
||
**Impact:**
|
||
- Could inject commands if platform validation is bypassed
|
||
- Potential code execution via crafted platform names
|
||
- OWASP A03:2021 – Injection
|
||
|
||
**Solution:**
|
||
```python
|
||
from enum import Enum
|
||
from typing import Literal
|
||
|
||
# Define allowed platforms as enum
|
||
class Platform(str, Enum):
|
||
INSTAGRAM = "instagram"
|
||
FASTDL = "fastdl"
|
||
IMGINN = "imginn"
|
||
TOOLZU = "toolzu"
|
||
SNAPCHAT = "snapchat"
|
||
TIKTOK = "tiktok"
|
||
FORUMS = "forums"
|
||
ALL = "all"
|
||
|
||
@app.post("/api/platforms/{platform}/trigger")
|
||
async def trigger_platform_download(
|
||
platform: Platform, # Type hint enforces validation
|
||
trigger_data: TriggerRequest,
|
||
background_tasks: BackgroundTasks,
|
||
current_user: Dict = Depends(get_current_user)
|
||
):
|
||
"""Trigger download with validated platform"""
|
||
# FastAPI automatically validates against enum
|
||
cmd = [
|
||
"python3",
|
||
"/opt/media-downloader/media-downloader.py",
|
||
"--platform",
|
||
platform.value # Safe - enum member
|
||
]
|
||
|
||
process = await asyncio.create_subprocess_exec(
|
||
*cmd,
|
||
stdout=asyncio.subprocess.PIPE,
|
||
stderr=asyncio.subprocess.PIPE
|
||
)
|
||
```
|
||
|
||
**Additional Hardening:**
|
||
```python
|
||
# Subprocess wrapper with additional safety
|
||
import shlex
|
||
|
||
def safe_subprocess_exec(cmd: List[str], allowed_commands: Set[str]):
|
||
"""Execute subprocess with command whitelist"""
|
||
if cmd[0] not in allowed_commands:
|
||
raise ValueError(f"Command not allowed: {cmd[0]}")
|
||
|
||
# Validate all arguments are safe
|
||
for arg in cmd:
|
||
if any(char in arg for char in [';', '&', '|', '$', '`']):
|
||
raise ValueError(f"Dangerous character in argument: {arg}")
|
||
|
||
return subprocess.run(cmd, capture_output=True, text=True, timeout=300)
|
||
```
|
||
|
||
---
|
||
|
||
## High Priority Issues (🟠 Fix Soon)
|
||
|
||
### 5. Massive Files - Maintainability Crisis
|
||
**Severity:** HIGH | **Effort:** 24-36 hours | **Risk:** Technical debt
|
||
|
||
**Problem:**
|
||
Several files exceed 2,000 lines, violating single responsibility principle:
|
||
|
||
| File | Lines | Size |
|
||
|------|-------|------|
|
||
| `modules/forum_downloader.py` | 3,971 | 167 KB |
|
||
| `media-downloader.py` | 2,653 | - |
|
||
| `web/backend/api.py` | 2,649 | 94 KB |
|
||
| `modules/imginn_module.py` | 2,542 | 129 KB |
|
||
|
||
**Impact:**
|
||
- Difficult to navigate and understand
|
||
- Hard to test individual components
|
||
- Increases cognitive load
|
||
- Higher bug density
|
||
- Makes code reviews painful
|
||
- Merge conflicts more frequent
|
||
|
||
**Recommended Structure:**
|
||
|
||
```
|
||
# For api.py refactoring:
|
||
web/backend/
|
||
├── main.py (FastAPI app initialization, 100-150 lines)
|
||
├── dependencies.py (auth dependencies, 50-100 lines)
|
||
├── middleware.py (CORS, rate limiting, 50-100 lines)
|
||
├── routers/
|
||
│ ├── __init__.py
|
||
│ ├── auth.py (authentication endpoints, 150-200 lines)
|
||
│ ├── downloads.py (download endpoints, 200-300 lines)
|
||
│ ├── scheduler.py (scheduler endpoints, 150-200 lines)
|
||
│ ├── media.py (media endpoints, 150-200 lines)
|
||
│ ├── health.py (health/monitoring, 100-150 lines)
|
||
│ └── config.py (configuration endpoints, 100-150 lines)
|
||
├── services/
|
||
│ ├── download_service.py (download business logic)
|
||
│ ├── scheduler_service.py (scheduler business logic)
|
||
│ └── media_service.py (media processing logic)
|
||
├── models/
|
||
│ ├── requests.py (Pydantic request models)
|
||
│ ├── responses.py (Pydantic response models)
|
||
│ └── schemas.py (database schemas)
|
||
└── utils/
|
||
├── validators.py (input validation)
|
||
└── helpers.py (utility functions)
|
||
```
|
||
|
||
**Migration Plan:**
|
||
1. Create new directory structure
|
||
2. Extract routers one at a time (start with health, least dependencies)
|
||
3. Move business logic to services
|
||
4. Extract Pydantic models
|
||
5. Update imports gradually
|
||
6. Test after each extraction
|
||
7. Remove old code once verified
|
||
|
||
---
|
||
|
||
### 6. Database Connection Pool Exhaustion
|
||
**Severity:** HIGH | **Effort:** 4-6 hours | **Risk:** Application hang
|
||
|
||
**Location:** `/opt/media-downloader/modules/unified_database.py`
|
||
|
||
**Problem:**
|
||
Connection pool implementation has potential leaks:
|
||
|
||
```python
|
||
# Line 119-130 (PROBLEMATIC)
|
||
def get_connection(self, for_write=False):
|
||
try:
|
||
if self.pool:
|
||
with self.pool.get_connection(for_write=for_write) as conn:
|
||
yield conn
|
||
else:
|
||
conn = sqlite3.connect(...)
|
||
# ⚠️ No try/finally - connection might not close on error
|
||
yield conn
|
||
```
|
||
|
||
**Impact:**
|
||
- Connection leaks under error conditions
|
||
- Pool exhaustion causes application hang
|
||
- No monitoring of pool health
|
||
- Memory leaks
|
||
|
||
**Solution:**
|
||
```python
|
||
from contextlib import contextmanager
|
||
from typing import Generator
|
||
import sqlite3
|
||
|
||
@contextmanager
|
||
def get_connection(
|
||
self,
|
||
for_write: bool = False
|
||
) -> Generator[sqlite3.Connection, None, None]:
|
||
"""
|
||
Get database connection with guaranteed cleanup.
|
||
|
||
Args:
|
||
for_write: If True, ensures exclusive write access
|
||
|
||
Yields:
|
||
sqlite3.Connection: Database connection
|
||
|
||
Raises:
|
||
sqlite3.Error: On connection/query errors
|
||
"""
|
||
conn = None
|
||
try:
|
||
if self.pool:
|
||
conn = self.pool.get_connection(for_write=for_write)
|
||
else:
|
||
conn = sqlite3.connect(
|
||
str(self.db_path),
|
||
timeout=30,
|
||
check_same_thread=False
|
||
)
|
||
conn.row_factory = sqlite3.Row
|
||
|
||
yield conn
|
||
|
||
# Commit if no exceptions
|
||
if for_write:
|
||
conn.commit()
|
||
|
||
except sqlite3.Error as e:
|
||
# Rollback on error
|
||
if conn and for_write:
|
||
conn.rollback()
|
||
logger.error(f"Database error: {e}")
|
||
raise
|
||
|
||
finally:
|
||
# Always close connection
|
||
if conn:
|
||
conn.close()
|
||
|
||
# Add pool monitoring
|
||
def get_pool_stats(self) -> dict:
|
||
"""Get connection pool statistics"""
|
||
if not self.pool:
|
||
return {'pool_enabled': False}
|
||
|
||
return {
|
||
'pool_enabled': True,
|
||
'active_connections': self.pool.active_connections,
|
||
'max_connections': self.pool.max_connections,
|
||
'available': self.pool.max_connections - self.pool.active_connections,
|
||
'wait_count': self.pool.wait_count,
|
||
'timeout_count': self.pool.timeout_count
|
||
}
|
||
|
||
# Add to health endpoint
|
||
@app.get("/api/health/database")
|
||
async def get_database_health():
|
||
stats = app_state.db.get_pool_stats()
|
||
|
||
# Alert if low on connections
|
||
if stats.get('available', 0) < 2:
|
||
logger.warning("Database connection pool nearly exhausted")
|
||
|
||
return stats
|
||
```
|
||
|
||
---
|
||
|
||
### 7. No Authentication Rate Limiting (Already Fixed)
|
||
**Severity:** HIGH | **Status:** ✅ FIXED in 6.3.4
|
||
|
||
Rate limiting has been implemented in version 6.3.4 using slowapi:
|
||
- Login: 5 requests/minute
|
||
- Auth endpoints: 10 requests/minute
|
||
- Read endpoints: 100 requests/minute
|
||
|
||
No additional action required.
|
||
|
||
---
|
||
|
||
### 8. Missing CSRF Protection
|
||
**Severity:** HIGH | **Effort:** 2-3 hours | **Risk:** Unauthorized actions
|
||
|
||
**Problem:**
|
||
No CSRF tokens on state-changing operations. Attackers can craft malicious pages that trigger actions on behalf of authenticated users.
|
||
|
||
**Impact:**
|
||
- Delete downloads via CSRF
|
||
- Trigger new downloads
|
||
- Modify configuration
|
||
- Stop running tasks
|
||
- OWASP A01:2021 – Broken Access Control
|
||
|
||
**Solution:**
|
||
```bash
|
||
# Install CSRF protection
|
||
pip install fastapi-csrf-protect
|
||
```
|
||
|
||
```python
|
||
# web/backend/main.py
|
||
from fastapi_csrf_protect import CsrfProtect
|
||
from fastapi_csrf_protect.exceptions import CsrfProtectError
|
||
from pydantic import BaseModel
|
||
|
||
class CsrfSettings(BaseModel):
|
||
secret_key: str = os.getenv('CSRF_SECRET_KEY', secrets.token_urlsafe(32))
|
||
cookie_samesite: str = 'strict'
|
||
|
||
@CsrfProtect.load_config
|
||
def get_csrf_config():
|
||
return CsrfSettings()
|
||
|
||
# Apply to state-changing endpoints
|
||
@app.post("/api/platforms/{platform}/trigger")
|
||
async def trigger_download(
|
||
request: Request,
|
||
csrf_protect: CsrfProtect = Depends()
|
||
):
|
||
# Validate CSRF token
|
||
await csrf_protect.validate_csrf(request)
|
||
# Rest of code...
|
||
|
||
# Frontend: Include CSRF token
|
||
// api.ts
|
||
async post<T>(endpoint: string, data: any): Promise<T> {
|
||
const csrfToken = this.getCsrfToken()
|
||
return fetch(`${API_BASE}${endpoint}`, {
|
||
method: 'POST',
|
||
headers: {
|
||
'Content-Type': 'application/json',
|
||
'X-CSRF-Token': csrfToken
|
||
},
|
||
body: JSON.stringify(data)
|
||
})
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
## Medium Priority Issues (🟡 Address This Quarter)
|
||
|
||
### 9. TypeScript 'any' Type Overuse
|
||
**Severity:** MEDIUM | **Effort:** 4-6 hours
|
||
|
||
70+ instances of `any` type defeat TypeScript's purpose.
|
||
|
||
**Solution:**
|
||
```typescript
|
||
// Define proper interfaces
|
||
interface User {
|
||
id: number
|
||
username: string
|
||
role: 'admin' | 'user' | 'viewer'
|
||
email?: string
|
||
preferences: UserPreferences
|
||
}
|
||
|
||
interface UserPreferences {
|
||
theme: 'light' | 'dark'
|
||
notifications: boolean
|
||
}
|
||
|
||
interface PlatformConfig {
|
||
enabled: boolean
|
||
check_interval_hours: number
|
||
accounts?: Account[]
|
||
usernames?: string[]
|
||
run_at_start?: boolean
|
||
}
|
||
|
||
// Replace any with proper types
|
||
async getMe(): Promise<User> {
|
||
return this.get<User>('/auth/me')
|
||
}
|
||
```
|
||
|
||
---
|
||
|
||
### 10. No Comprehensive Error Handling
|
||
**Severity:** MEDIUM | **Effort:** 6-8 hours
|
||
|
||
115 try/except blocks with generic `except Exception` catching.
|
||
|
||
**Solution:**
|
||
```python
|
||
# modules/exceptions.py
|
||
class MediaDownloaderError(Exception):
|
||
"""Base exception"""
|
||
pass
|
||
|
||
class DownloadError(MediaDownloaderError):
|
||
"""Download failed"""
|
||
pass
|
||
|
||
class AuthenticationError(MediaDownloaderError):
|
||
"""Authentication failed"""
|
||
pass
|
||
|
||
class RateLimitError(MediaDownloaderError):
|
||
"""Rate limit exceeded"""
|
||
pass
|
||
|
||
class ValidationError(MediaDownloaderError):
|
||
"""Input validation failed"""
|
||
pass
|
||
|
||
# Structured error responses
|
||
@app.exception_handler(MediaDownloaderError)
|
||
async def handle_app_error(request: Request, exc: MediaDownloaderError):
|
||
return JSONResponse(
|
||
status_code=400,
|
||
content={
|
||
'error': exc.__class__.__name__,
|
||
'message': str(exc),
|
||
'timestamp': datetime.now().isoformat()
|
||
}
|
||
)
|
||
```
|
||
|
||
---
|
||
|
||
### 11. Code Duplication Across Modules
|
||
**Severity:** MEDIUM | **Effort:** 6-8 hours
|
||
|
||
Instagram modules share 60-70% similar code.
|
||
|
||
**Solution:**
|
||
```python
|
||
# modules/base_downloader.py
|
||
from abc import ABC, abstractmethod
|
||
|
||
class BaseDownloader(ABC):
|
||
"""Base class for all downloaders"""
|
||
|
||
def __init__(self, unified_db, log_callback, show_progress):
|
||
self.unified_db = unified_db
|
||
self.log_callback = log_callback
|
||
self.show_progress = show_progress
|
||
|
||
def log(self, message: str, level: str = "info"):
|
||
"""Centralized logging"""
|
||
if self.log_callback:
|
||
self.log_callback(f"[{self.platform_name}] {message}", level)
|
||
|
||
def is_downloaded(self, media_id: str) -> bool:
|
||
return self.unified_db.is_downloaded(media_id, self.platform_name)
|
||
|
||
@abstractmethod
|
||
def download(self, username: str) -> int:
|
||
"""Implement in subclass"""
|
||
pass
|
||
```
|
||
|
||
---
|
||
|
||
### 12. Inconsistent Logging
|
||
**Severity:** MEDIUM | **Effort:** 4-6 hours
|
||
|
||
Mix of print(), custom callbacks, and logging module.
|
||
|
||
**Solution:**
|
||
```python
|
||
import logging
|
||
import json
|
||
|
||
class StructuredLogger:
|
||
def __init__(self, name: str):
|
||
self.logger = logging.getLogger(name)
|
||
handler = logging.FileHandler('logs/media-downloader.log')
|
||
handler.setFormatter(logging.Formatter('%(message)s'))
|
||
self.logger.addHandler(handler)
|
||
self.logger.setLevel(logging.INFO)
|
||
|
||
def log(self, message: str, level: str = "info", **extra):
|
||
log_entry = {
|
||
'timestamp': datetime.now().isoformat(),
|
||
'level': level.upper(),
|
||
'message': message,
|
||
**extra
|
||
}
|
||
getattr(self.logger, level)(json.dumps(log_entry))
|
||
```
|
||
|
||
---
|
||
|
||
### 13. No Database Migration Strategy
|
||
**Severity:** MEDIUM | **Effort:** 4-6 hours
|
||
|
||
Schema changes via ad-hoc ALTER TABLE statements.
|
||
|
||
**Solution:** Implement Alembic or custom migration system.
|
||
|
||
---
|
||
|
||
### 14. Missing API Validation
|
||
**Severity:** MEDIUM | **Effort:** 3-4 hours
|
||
|
||
Some endpoints lack Pydantic models.
|
||
|
||
**Solution:** Add comprehensive request/response models.
|
||
|
||
---
|
||
|
||
### 15. No Tests
|
||
**Severity:** MEDIUM | **Effort:** 40-60 hours
|
||
|
||
Zero test coverage.
|
||
|
||
**Solution:** Implement pytest with unit, integration, and E2E tests.
|
||
|
||
---
|
||
|
||
## Low Priority Issues (🟢 Nice to Have)
|
||
|
||
### 16. Frontend Re-render Optimization
|
||
Multiple independent polling timers. Consider WebSocket-only updates.
|
||
|
||
### 17. TypeScript Strict Mode Leverage
|
||
Enable additional strict checks.
|
||
|
||
### 18. API Response Caching
|
||
Add caching for expensive queries.
|
||
|
||
### 19. Database Indexes
|
||
Add indexes on frequently queried columns.
|
||
|
||
### 20. API Versioning
|
||
Implement `/api/v1` prefix for future compatibility.
|
||
|
||
---
|
||
|
||
## Strengths
|
||
|
||
✅ **Excellent Modular Architecture** - Clear separation of concerns
|
||
✅ **Comprehensive Database Design** - WAL mode, connection pooling
|
||
✅ **Modern Frontend Stack** - TypeScript, React, TanStack Query
|
||
✅ **Good Type Hints** - Python type hints improve clarity
|
||
✅ **Rate Limiting** - Sophisticated anti-detection measures
|
||
✅ **WebSocket Real-time** - Live updates for better UX
|
||
✅ **Feature Complete** - Multi-platform support, deduplication, notifications
|
||
|
||
---
|
||
|
||
## Implementation Priorities
|
||
|
||
### Week 1 (Critical - 11-17 hours)
|
||
- [ ] Remove secrets from version control
|
||
- [ ] Fix SQL injection vulnerabilities
|
||
- [ ] Add file path validation
|
||
- [ ] Validate subprocess inputs
|
||
|
||
### Month 1 (High Priority - 32-48 hours)
|
||
- [ ] Refactor large files
|
||
- [ ] Fix connection pool handling
|
||
- [ ] Add CSRF protection
|
||
|
||
### Quarter 1 (Medium Priority - 67-98 hours)
|
||
- [ ] Replace TypeScript any types
|
||
- [ ] Implement error handling strategy
|
||
- [ ] Eliminate code duplication
|
||
- [ ] Standardize logging
|
||
- [ ] Add database migrations
|
||
- [ ] Implement test suite
|
||
|
||
### Ongoing (Low Priority - 15-23 hours)
|
||
- [ ] Optimize frontend performance
|
||
- [ ] Leverage TypeScript strict mode
|
||
- [ ] Add API caching
|
||
- [ ] Add database indexes
|
||
- [ ] Implement API versioning
|
||
|
||
---
|
||
|
||
## Metrics
|
||
|
||
**Current State:**
|
||
- Code Quality Score: 6.5/10
|
||
- Security Score: 4/10
|
||
- Test Coverage: 0%
|
||
- Technical Debt: HIGH
|
||
|
||
**Target State (After Improvements):**
|
||
- Code Quality Score: 8.5/10
|
||
- Security Score: 9/10
|
||
- Test Coverage: 70%+
|
||
- Technical Debt: LOW
|
||
|
||
---
|
||
|
||
## Conclusion
|
||
|
||
The Media Downloader is a well-architected application that demonstrates solid engineering principles. However, **critical security issues must be addressed immediately** to prevent data breaches and system compromise.
|
||
|
||
With systematic implementation of these recommendations, this will evolve into a production-ready, enterprise-grade system with excellent security, maintainability, and performance.
|
||
|
||
**Total Estimated Effort:** 125-186 hours (3-4 months at 10-15 hrs/week)
|
||
|
||
**Next Steps:**
|
||
1. Review and prioritize recommendations
|
||
2. Create GitHub issues for each item
|
||
3. Begin with Week 1 critical fixes
|
||
4. Establish regular review cadence
|