908
docs/archive/CODE_REVIEW_2025-10-31.md
Normal file
908
docs/archive/CODE_REVIEW_2025-10-31.md
Normal file
@@ -0,0 +1,908 @@
|
||||
# Media Downloader - Comprehensive Code Review
|
||||
**Date:** 2025-10-31
|
||||
**Version:** 6.3.4
|
||||
**Reviewer:** Claude Code (Automated Analysis)
|
||||
**Scope:** Full codebase - Backend, Frontend, Database, Architecture
|
||||
|
||||
---
|
||||
|
||||
## Executive Summary
|
||||
|
||||
The Media Downloader is a **feature-rich, architecturally sound application** with excellent modular design and modern technology choices. The codebase demonstrates solid engineering principles with a unified database, clear separation of concerns, and comprehensive feature coverage.
|
||||
|
||||
**Overall Assessment:**
|
||||
- **Code Quality:** 6.5/10 - Good structure but needs refactoring
|
||||
- **Security:** 4/10 - **CRITICAL issues** requiring immediate attention
|
||||
- **Performance:** 7/10 - Generally good with optimization opportunities
|
||||
- **Maintainability:** 6/10 - Large files, some duplication, limited tests
|
||||
- **Architecture:** 8/10 - Excellent modular design
|
||||
|
||||
### Key Statistics
|
||||
- **Total Lines of Code:** 37,966
|
||||
- **Python Files:** 49 (including 20 modules, 2 backend files)
|
||||
- **TypeScript Files:** 20
|
||||
- **Documentation Files:** 11 (in docs/)
|
||||
- **Test Files:** 0 ⚠️
|
||||
|
||||
### Critical Findings
|
||||
🔴 **4 Critical Security Issues** - Require immediate action
|
||||
🟠 **4 High Priority Issues** - Fix within 1-2 weeks
|
||||
🟡 **7 Medium Priority Issues** - Address within 1-3 months
|
||||
🟢 **5 Low Priority Issues** - Nice to have improvements
|
||||
|
||||
---
|
||||
|
||||
## Critical Issues (🔴 Fix Immediately)
|
||||
|
||||
### 1. Hardcoded Secrets in Configuration
|
||||
**Severity:** CRITICAL | **Effort:** 2-4 hours | **Risk:** Data breach
|
||||
|
||||
**Location:** `/opt/media-downloader/config/settings.json`
|
||||
|
||||
**Problem:**
|
||||
```json
|
||||
{
|
||||
"password": "cpc6rvm!wvf_wft2EHN",
|
||||
"totp_secret": "OVLX4K6NHTUJTUJVL4TLHXJ55SIEDOOY",
|
||||
"api_key": "SC1dje6Zo5VhGPmy9vyfkeuBY0MZ7VfgrhI8wIvjOM",
|
||||
"api_token": "a3jmhwnhecq9k9dz3tzv2bdk7uc29p"
|
||||
}
|
||||
```
|
||||
|
||||
Credentials are stored in plaintext and tracked in version control. Anyone with repository access has full account credentials. Git history cannot be cleaned without force-pushing.
|
||||
|
||||
**Impact:**
|
||||
- All forum passwords, API keys, and TOTP secrets exposed
|
||||
- Cannot rotate credentials without code changes
|
||||
- Violates OWASP A02:2021 – Cryptographic Failures
|
||||
|
||||
**Solution:**
|
||||
```bash
|
||||
# 1. Immediate: Add to .gitignore
|
||||
echo "config/settings.json" >> .gitignore
|
||||
echo ".env" >> .gitignore
|
||||
|
||||
# 2. Create environment variable template
|
||||
cat > config/settings.example.json <<EOF
|
||||
{
|
||||
"forums": {
|
||||
"password": "FORUM_PASSWORD",
|
||||
"totp_secret": "FORUM_TOTP_SECRET"
|
||||
},
|
||||
"snapchat": {
|
||||
"password": "SNAPCHAT_PASSWORD"
|
||||
},
|
||||
"tiktok": {
|
||||
"api_key": "TIKTOK_API_KEY",
|
||||
"api_token": "TIKTOK_API_TOKEN"
|
||||
}
|
||||
}
|
||||
EOF
|
||||
|
||||
# 3. Create .env file (add to .gitignore)
|
||||
cat > .env.example <<EOF
|
||||
FORUM_PASSWORD=your_password_here
|
||||
FORUM_TOTP_SECRET=your_totp_secret_here
|
||||
SNAPCHAT_PASSWORD=your_password_here
|
||||
TIKTOK_API_KEY=your_api_key_here
|
||||
TIKTOK_API_TOKEN=your_api_token_here
|
||||
EOF
|
||||
```
|
||||
|
||||
**Implementation:**
|
||||
```python
|
||||
# modules/secrets_manager.py
|
||||
import os
|
||||
from pathlib import Path
|
||||
from dotenv import load_dotenv
|
||||
from typing import Optional
|
||||
|
||||
class SecretsManager:
|
||||
"""Secure secrets management using environment variables"""
|
||||
|
||||
def __init__(self, env_file: Optional[Path] = None):
|
||||
if env_file is None:
|
||||
env_file = Path(__file__).parent.parent / '.env'
|
||||
|
||||
if env_file.exists():
|
||||
load_dotenv(env_file)
|
||||
|
||||
def get_secret(self, key: str, default: Optional[str] = None) -> str:
|
||||
"""Get secret from environment, raise if not found and no default"""
|
||||
value = os.getenv(key, default)
|
||||
if value is None:
|
||||
raise ValueError(f"Secret '{key}' not found in environment")
|
||||
return value
|
||||
|
||||
def get_optional_secret(self, key: str) -> Optional[str]:
|
||||
"""Get secret from environment, return None if not found"""
|
||||
return os.getenv(key)
|
||||
|
||||
# Usage in modules
|
||||
secrets = SecretsManager()
|
||||
forum_password = secrets.get_secret('FORUM_PASSWORD')
|
||||
```
|
||||
|
||||
**Rollout Plan:**
|
||||
1. Create `.env.example` with placeholder values
|
||||
2. Add `.gitignore` entries for `.env` and `config/settings.json`
|
||||
3. Document secret setup in `INSTALL.md`
|
||||
4. Update all modules to use `SecretsManager`
|
||||
5. Notify team to create local `.env` files
|
||||
6. Remove secrets from `settings.json` (keep structure)
|
||||
|
||||
---
|
||||
|
||||
### 2. SQL Injection Vulnerabilities
|
||||
**Severity:** CRITICAL | **Effort:** 4-6 hours | **Risk:** Database compromise
|
||||
|
||||
**Location:** `/opt/media-downloader/web/backend/api.py` (multiple locations)
|
||||
|
||||
**Problem:**
|
||||
F-string SQL queries with user-controlled input:
|
||||
|
||||
```python
|
||||
# Line ~478-482 (VULNERABLE)
|
||||
cursor.execute(f"""
|
||||
SELECT COUNT(*) FROM downloads
|
||||
WHERE download_date >= datetime('now', '-1 day')
|
||||
AND {filters}
|
||||
""")
|
||||
|
||||
# Line ~830-850 (VULNERABLE)
|
||||
query = f"SELECT * FROM downloads WHERE platform = '{platform}'"
|
||||
cursor.execute(query)
|
||||
```
|
||||
|
||||
The `filters` variable is constructed from user input (`platform`, `source`, `search`) without proper sanitization.
|
||||
|
||||
**Impact:**
|
||||
- Attackers can inject arbitrary SQL commands
|
||||
- Can drop tables: `'; DROP TABLE downloads; --`
|
||||
- Can exfiltrate data: `' OR 1=1 UNION SELECT * FROM users --`
|
||||
- Can bypass authentication
|
||||
- OWASP A03:2021 – Injection
|
||||
|
||||
**Solution:**
|
||||
```python
|
||||
# BEFORE (VULNERABLE)
|
||||
platform = request.query_params.get('platform')
|
||||
query = f"SELECT * FROM downloads WHERE platform = '{platform}'"
|
||||
cursor.execute(query)
|
||||
|
||||
# AFTER (SECURE)
|
||||
platform = request.query_params.get('platform')
|
||||
query = "SELECT * FROM downloads WHERE platform = ?"
|
||||
cursor.execute(query, (platform,))
|
||||
|
||||
# For dynamic filters
|
||||
def build_safe_query(filters: dict) -> tuple[str, tuple]:
|
||||
"""Build parameterized query from filters"""
|
||||
conditions = []
|
||||
params = []
|
||||
|
||||
if filters.get('platform'):
|
||||
conditions.append("platform = ?")
|
||||
params.append(filters['platform'])
|
||||
|
||||
if filters.get('source'):
|
||||
conditions.append("source = ?")
|
||||
params.append(filters['source'])
|
||||
|
||||
if filters.get('search'):
|
||||
conditions.append("(filename LIKE ? OR source LIKE ?)")
|
||||
search_pattern = f"%{filters['search']}%"
|
||||
params.extend([search_pattern, search_pattern])
|
||||
|
||||
where_clause = " AND ".join(conditions) if conditions else "1=1"
|
||||
return where_clause, tuple(params)
|
||||
|
||||
# Usage
|
||||
filters = build_safe_query(request.query_params)
|
||||
query = f"SELECT * FROM downloads WHERE {filters[0]}"
|
||||
cursor.execute(query, filters[1])
|
||||
```
|
||||
|
||||
**Files Requiring Fixes:**
|
||||
- `/opt/media-downloader/web/backend/api.py` (17+ instances)
|
||||
- Lines 478-482, 520-540, 830-850, 910-930
|
||||
- `/opt/media-downloader/utilities/db_manager.py` (2 instances)
|
||||
|
||||
**Testing:**
|
||||
```python
|
||||
# Test case for SQL injection prevention
|
||||
def test_sql_injection_prevention():
|
||||
# Try to inject SQL
|
||||
malicious_input = "'; DROP TABLE downloads; --"
|
||||
response = client.get(f"/api/downloads?platform={malicious_input}")
|
||||
|
||||
# Should not execute injection
|
||||
assert response.status_code in [400, 404] # Bad request or not found
|
||||
|
||||
# Verify table still exists
|
||||
assert db.table_exists('downloads')
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 3. Path Traversal Vulnerabilities
|
||||
**Severity:** HIGH | **Effort:** 3-4 hours | **Risk:** File system access
|
||||
|
||||
**Location:** `/opt/media-downloader/web/backend/api.py` (media endpoints)
|
||||
|
||||
**Problem:**
|
||||
File paths from user input are not validated:
|
||||
|
||||
```python
|
||||
# Lines ~1920+ (VULNERABLE)
|
||||
@app.get("/api/media/preview")
|
||||
async def get_media_preview(file_path: str, ...):
|
||||
# No validation - attacker could use ../../etc/passwd
|
||||
return FileResponse(file_path)
|
||||
|
||||
@app.get("/api/media/thumbnail")
|
||||
async def get_media_thumbnail(file_path: str, ...):
|
||||
# No validation
|
||||
requested_path = Path(file_path)
|
||||
return FileResponse(requested_path)
|
||||
```
|
||||
|
||||
**Impact:**
|
||||
- Read arbitrary files: `/etc/passwd`, `/etc/shadow`, database files
|
||||
- Access configuration with secrets
|
||||
- Data exfiltration via media endpoints
|
||||
- OWASP A01:2021 – Broken Access Control
|
||||
|
||||
**Solution:**
|
||||
```python
|
||||
from pathlib import Path
|
||||
from fastapi import HTTPException
|
||||
|
||||
ALLOWED_MEDIA_BASE = Path("/opt/immich/md")
|
||||
|
||||
def validate_file_path(file_path: str, allowed_base: Path) -> Path:
|
||||
"""
|
||||
Ensure file_path is within allowed directory.
|
||||
Prevents directory traversal attacks.
|
||||
"""
|
||||
try:
|
||||
# Resolve to absolute path
|
||||
requested = Path(file_path).resolve()
|
||||
|
||||
# Check if within allowed directory
|
||||
if not requested.is_relative_to(allowed_base):
|
||||
raise ValueError(f"Path outside allowed directory")
|
||||
|
||||
# Check file exists
|
||||
if not requested.exists():
|
||||
raise FileNotFoundError()
|
||||
|
||||
# Check it's a file, not directory
|
||||
if not requested.is_file():
|
||||
raise ValueError("Path is not a file")
|
||||
|
||||
return requested
|
||||
|
||||
except (ValueError, FileNotFoundError) as e:
|
||||
raise HTTPException(
|
||||
status_code=403,
|
||||
detail="Access denied: Invalid file path"
|
||||
)
|
||||
|
||||
@app.get("/api/media/preview")
|
||||
async def get_media_preview(
|
||||
file_path: str,
|
||||
current_user: Dict = Depends(get_current_user_media)
|
||||
):
|
||||
"""Serve media file with path validation"""
|
||||
safe_path = validate_file_path(file_path, ALLOWED_MEDIA_BASE)
|
||||
return FileResponse(safe_path)
|
||||
```
|
||||
|
||||
**Test Cases:**
|
||||
```python
|
||||
# Path traversal attack attempts
|
||||
test_cases = [
|
||||
"../../etc/passwd",
|
||||
"/etc/passwd",
|
||||
"../../../root/.ssh/id_rsa",
|
||||
"....//....//etc/passwd",
|
||||
"%2e%2e%2f%2e%2e%2fetc%2fpasswd", # URL encoded
|
||||
]
|
||||
|
||||
for attack in test_cases:
|
||||
response = client.get(f"/api/media/preview?file_path={attack}")
|
||||
assert response.status_code == 403, f"Failed to block: {attack}"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 4. Command Injection Risk
|
||||
**Severity:** HIGH | **Effort:** 2-3 hours | **Risk:** Code execution
|
||||
|
||||
**Location:** `/opt/media-downloader/web/backend/api.py`
|
||||
|
||||
**Problem:**
|
||||
Subprocess calls with user input:
|
||||
|
||||
```python
|
||||
# Line ~1314
|
||||
@app.post("/api/platforms/{platform}/trigger")
|
||||
async def trigger_platform_download(platform: str, ...):
|
||||
cmd = ["python3", "/opt/media-downloader/media-downloader.py", "--platform", platform]
|
||||
process = await asyncio.create_subprocess_exec(*cmd, ...)
|
||||
```
|
||||
|
||||
While using a list (safer than shell=True), the `platform` parameter is not validated against a whitelist.
|
||||
|
||||
**Impact:**
|
||||
- Could inject commands if platform validation is bypassed
|
||||
- Potential code execution via crafted platform names
|
||||
- OWASP A03:2021 – Injection
|
||||
|
||||
**Solution:**
|
||||
```python
|
||||
from enum import Enum
|
||||
from typing import Literal
|
||||
|
||||
# Define allowed platforms as enum
|
||||
class Platform(str, Enum):
|
||||
INSTAGRAM = "instagram"
|
||||
FASTDL = "fastdl"
|
||||
IMGINN = "imginn"
|
||||
TOOLZU = "toolzu"
|
||||
SNAPCHAT = "snapchat"
|
||||
TIKTOK = "tiktok"
|
||||
FORUMS = "forums"
|
||||
ALL = "all"
|
||||
|
||||
@app.post("/api/platforms/{platform}/trigger")
|
||||
async def trigger_platform_download(
|
||||
platform: Platform, # Type hint enforces validation
|
||||
trigger_data: TriggerRequest,
|
||||
background_tasks: BackgroundTasks,
|
||||
current_user: Dict = Depends(get_current_user)
|
||||
):
|
||||
"""Trigger download with validated platform"""
|
||||
# FastAPI automatically validates against enum
|
||||
cmd = [
|
||||
"python3",
|
||||
"/opt/media-downloader/media-downloader.py",
|
||||
"--platform",
|
||||
platform.value # Safe - enum member
|
||||
]
|
||||
|
||||
process = await asyncio.create_subprocess_exec(
|
||||
*cmd,
|
||||
stdout=asyncio.subprocess.PIPE,
|
||||
stderr=asyncio.subprocess.PIPE
|
||||
)
|
||||
```
|
||||
|
||||
**Additional Hardening:**
|
||||
```python
|
||||
# Subprocess wrapper with additional safety
|
||||
import shlex
|
||||
|
||||
def safe_subprocess_exec(cmd: List[str], allowed_commands: Set[str]):
|
||||
"""Execute subprocess with command whitelist"""
|
||||
if cmd[0] not in allowed_commands:
|
||||
raise ValueError(f"Command not allowed: {cmd[0]}")
|
||||
|
||||
# Validate all arguments are safe
|
||||
for arg in cmd:
|
||||
if any(char in arg for char in [';', '&', '|', '$', '`']):
|
||||
raise ValueError(f"Dangerous character in argument: {arg}")
|
||||
|
||||
return subprocess.run(cmd, capture_output=True, text=True, timeout=300)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## High Priority Issues (🟠 Fix Soon)
|
||||
|
||||
### 5. Massive Files - Maintainability Crisis
|
||||
**Severity:** HIGH | **Effort:** 24-36 hours | **Risk:** Technical debt
|
||||
|
||||
**Problem:**
|
||||
Several files exceed 2,000 lines, violating single responsibility principle:
|
||||
|
||||
| File | Lines | Size |
|
||||
|------|-------|------|
|
||||
| `modules/forum_downloader.py` | 3,971 | 167 KB |
|
||||
| `media-downloader.py` | 2,653 | - |
|
||||
| `web/backend/api.py` | 2,649 | 94 KB |
|
||||
| `modules/imginn_module.py` | 2,542 | 129 KB |
|
||||
|
||||
**Impact:**
|
||||
- Difficult to navigate and understand
|
||||
- Hard to test individual components
|
||||
- Increases cognitive load
|
||||
- Higher bug density
|
||||
- Makes code reviews painful
|
||||
- Merge conflicts more frequent
|
||||
|
||||
**Recommended Structure:**
|
||||
|
||||
```
|
||||
# For api.py refactoring:
|
||||
web/backend/
|
||||
├── main.py (FastAPI app initialization, 100-150 lines)
|
||||
├── dependencies.py (auth dependencies, 50-100 lines)
|
||||
├── middleware.py (CORS, rate limiting, 50-100 lines)
|
||||
├── routers/
|
||||
│ ├── __init__.py
|
||||
│ ├── auth.py (authentication endpoints, 150-200 lines)
|
||||
│ ├── downloads.py (download endpoints, 200-300 lines)
|
||||
│ ├── scheduler.py (scheduler endpoints, 150-200 lines)
|
||||
│ ├── media.py (media endpoints, 150-200 lines)
|
||||
│ ├── health.py (health/monitoring, 100-150 lines)
|
||||
│ └── config.py (configuration endpoints, 100-150 lines)
|
||||
├── services/
|
||||
│ ├── download_service.py (download business logic)
|
||||
│ ├── scheduler_service.py (scheduler business logic)
|
||||
│ └── media_service.py (media processing logic)
|
||||
├── models/
|
||||
│ ├── requests.py (Pydantic request models)
|
||||
│ ├── responses.py (Pydantic response models)
|
||||
│ └── schemas.py (database schemas)
|
||||
└── utils/
|
||||
├── validators.py (input validation)
|
||||
└── helpers.py (utility functions)
|
||||
```
|
||||
|
||||
**Migration Plan:**
|
||||
1. Create new directory structure
|
||||
2. Extract routers one at a time (start with health, least dependencies)
|
||||
3. Move business logic to services
|
||||
4. Extract Pydantic models
|
||||
5. Update imports gradually
|
||||
6. Test after each extraction
|
||||
7. Remove old code once verified
|
||||
|
||||
---
|
||||
|
||||
### 6. Database Connection Pool Exhaustion
|
||||
**Severity:** HIGH | **Effort:** 4-6 hours | **Risk:** Application hang
|
||||
|
||||
**Location:** `/opt/media-downloader/modules/unified_database.py`
|
||||
|
||||
**Problem:**
|
||||
Connection pool implementation has potential leaks:
|
||||
|
||||
```python
|
||||
# Line 119-130 (PROBLEMATIC)
|
||||
def get_connection(self, for_write=False):
|
||||
try:
|
||||
if self.pool:
|
||||
with self.pool.get_connection(for_write=for_write) as conn:
|
||||
yield conn
|
||||
else:
|
||||
conn = sqlite3.connect(...)
|
||||
# ⚠️ No try/finally - connection might not close on error
|
||||
yield conn
|
||||
```
|
||||
|
||||
**Impact:**
|
||||
- Connection leaks under error conditions
|
||||
- Pool exhaustion causes application hang
|
||||
- No monitoring of pool health
|
||||
- Memory leaks
|
||||
|
||||
**Solution:**
|
||||
```python
|
||||
from contextlib import contextmanager
|
||||
from typing import Generator
|
||||
import sqlite3
|
||||
|
||||
@contextmanager
|
||||
def get_connection(
|
||||
self,
|
||||
for_write: bool = False
|
||||
) -> Generator[sqlite3.Connection, None, None]:
|
||||
"""
|
||||
Get database connection with guaranteed cleanup.
|
||||
|
||||
Args:
|
||||
for_write: If True, ensures exclusive write access
|
||||
|
||||
Yields:
|
||||
sqlite3.Connection: Database connection
|
||||
|
||||
Raises:
|
||||
sqlite3.Error: On connection/query errors
|
||||
"""
|
||||
conn = None
|
||||
try:
|
||||
if self.pool:
|
||||
conn = self.pool.get_connection(for_write=for_write)
|
||||
else:
|
||||
conn = sqlite3.connect(
|
||||
str(self.db_path),
|
||||
timeout=30,
|
||||
check_same_thread=False
|
||||
)
|
||||
conn.row_factory = sqlite3.Row
|
||||
|
||||
yield conn
|
||||
|
||||
# Commit if no exceptions
|
||||
if for_write:
|
||||
conn.commit()
|
||||
|
||||
except sqlite3.Error as e:
|
||||
# Rollback on error
|
||||
if conn and for_write:
|
||||
conn.rollback()
|
||||
logger.error(f"Database error: {e}")
|
||||
raise
|
||||
|
||||
finally:
|
||||
# Always close connection
|
||||
if conn:
|
||||
conn.close()
|
||||
|
||||
# Add pool monitoring
|
||||
def get_pool_stats(self) -> dict:
|
||||
"""Get connection pool statistics"""
|
||||
if not self.pool:
|
||||
return {'pool_enabled': False}
|
||||
|
||||
return {
|
||||
'pool_enabled': True,
|
||||
'active_connections': self.pool.active_connections,
|
||||
'max_connections': self.pool.max_connections,
|
||||
'available': self.pool.max_connections - self.pool.active_connections,
|
||||
'wait_count': self.pool.wait_count,
|
||||
'timeout_count': self.pool.timeout_count
|
||||
}
|
||||
|
||||
# Add to health endpoint
|
||||
@app.get("/api/health/database")
|
||||
async def get_database_health():
|
||||
stats = app_state.db.get_pool_stats()
|
||||
|
||||
# Alert if low on connections
|
||||
if stats.get('available', 0) < 2:
|
||||
logger.warning("Database connection pool nearly exhausted")
|
||||
|
||||
return stats
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 7. No Authentication Rate Limiting (Already Fixed)
|
||||
**Severity:** HIGH | **Status:** ✅ FIXED in 6.3.4
|
||||
|
||||
Rate limiting has been implemented in version 6.3.4 using slowapi:
|
||||
- Login: 5 requests/minute
|
||||
- Auth endpoints: 10 requests/minute
|
||||
- Read endpoints: 100 requests/minute
|
||||
|
||||
No additional action required.
|
||||
|
||||
---
|
||||
|
||||
### 8. Missing CSRF Protection
|
||||
**Severity:** HIGH | **Effort:** 2-3 hours | **Risk:** Unauthorized actions
|
||||
|
||||
**Problem:**
|
||||
No CSRF tokens on state-changing operations. Attackers can craft malicious pages that trigger actions on behalf of authenticated users.
|
||||
|
||||
**Impact:**
|
||||
- Delete downloads via CSRF
|
||||
- Trigger new downloads
|
||||
- Modify configuration
|
||||
- Stop running tasks
|
||||
- OWASP A01:2021 – Broken Access Control
|
||||
|
||||
**Solution:**
|
||||
```bash
|
||||
# Install CSRF protection
|
||||
pip install fastapi-csrf-protect
|
||||
```
|
||||
|
||||
```python
|
||||
# web/backend/main.py
|
||||
from fastapi_csrf_protect import CsrfProtect
|
||||
from fastapi_csrf_protect.exceptions import CsrfProtectError
|
||||
from pydantic import BaseModel
|
||||
|
||||
class CsrfSettings(BaseModel):
|
||||
secret_key: str = os.getenv('CSRF_SECRET_KEY', secrets.token_urlsafe(32))
|
||||
cookie_samesite: str = 'strict'
|
||||
|
||||
@CsrfProtect.load_config
|
||||
def get_csrf_config():
|
||||
return CsrfSettings()
|
||||
|
||||
# Apply to state-changing endpoints
|
||||
@app.post("/api/platforms/{platform}/trigger")
|
||||
async def trigger_download(
|
||||
request: Request,
|
||||
csrf_protect: CsrfProtect = Depends()
|
||||
):
|
||||
# Validate CSRF token
|
||||
await csrf_protect.validate_csrf(request)
|
||||
# Rest of code...
|
||||
|
||||
# Frontend: Include CSRF token
|
||||
// api.ts
|
||||
async post<T>(endpoint: string, data: any): Promise<T> {
|
||||
const csrfToken = this.getCsrfToken()
|
||||
return fetch(`${API_BASE}${endpoint}`, {
|
||||
method: 'POST',
|
||||
headers: {
|
||||
'Content-Type': 'application/json',
|
||||
'X-CSRF-Token': csrfToken
|
||||
},
|
||||
body: JSON.stringify(data)
|
||||
})
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Medium Priority Issues (🟡 Address This Quarter)
|
||||
|
||||
### 9. TypeScript 'any' Type Overuse
|
||||
**Severity:** MEDIUM | **Effort:** 4-6 hours
|
||||
|
||||
70+ instances of `any` type defeat TypeScript's purpose.
|
||||
|
||||
**Solution:**
|
||||
```typescript
|
||||
// Define proper interfaces
|
||||
interface User {
|
||||
id: number
|
||||
username: string
|
||||
role: 'admin' | 'user' | 'viewer'
|
||||
email?: string
|
||||
preferences: UserPreferences
|
||||
}
|
||||
|
||||
interface UserPreferences {
|
||||
theme: 'light' | 'dark'
|
||||
notifications: boolean
|
||||
}
|
||||
|
||||
interface PlatformConfig {
|
||||
enabled: boolean
|
||||
check_interval_hours: number
|
||||
accounts?: Account[]
|
||||
usernames?: string[]
|
||||
run_at_start?: boolean
|
||||
}
|
||||
|
||||
// Replace any with proper types
|
||||
async getMe(): Promise<User> {
|
||||
return this.get<User>('/auth/me')
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 10. No Comprehensive Error Handling
|
||||
**Severity:** MEDIUM | **Effort:** 6-8 hours
|
||||
|
||||
115 try/except blocks with generic `except Exception` catching.
|
||||
|
||||
**Solution:**
|
||||
```python
|
||||
# modules/exceptions.py
|
||||
class MediaDownloaderError(Exception):
|
||||
"""Base exception"""
|
||||
pass
|
||||
|
||||
class DownloadError(MediaDownloaderError):
|
||||
"""Download failed"""
|
||||
pass
|
||||
|
||||
class AuthenticationError(MediaDownloaderError):
|
||||
"""Authentication failed"""
|
||||
pass
|
||||
|
||||
class RateLimitError(MediaDownloaderError):
|
||||
"""Rate limit exceeded"""
|
||||
pass
|
||||
|
||||
class ValidationError(MediaDownloaderError):
|
||||
"""Input validation failed"""
|
||||
pass
|
||||
|
||||
# Structured error responses
|
||||
@app.exception_handler(MediaDownloaderError)
|
||||
async def handle_app_error(request: Request, exc: MediaDownloaderError):
|
||||
return JSONResponse(
|
||||
status_code=400,
|
||||
content={
|
||||
'error': exc.__class__.__name__,
|
||||
'message': str(exc),
|
||||
'timestamp': datetime.now().isoformat()
|
||||
}
|
||||
)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 11. Code Duplication Across Modules
|
||||
**Severity:** MEDIUM | **Effort:** 6-8 hours
|
||||
|
||||
Instagram modules share 60-70% similar code.
|
||||
|
||||
**Solution:**
|
||||
```python
|
||||
# modules/base_downloader.py
|
||||
from abc import ABC, abstractmethod
|
||||
|
||||
class BaseDownloader(ABC):
|
||||
"""Base class for all downloaders"""
|
||||
|
||||
def __init__(self, unified_db, log_callback, show_progress):
|
||||
self.unified_db = unified_db
|
||||
self.log_callback = log_callback
|
||||
self.show_progress = show_progress
|
||||
|
||||
def log(self, message: str, level: str = "info"):
|
||||
"""Centralized logging"""
|
||||
if self.log_callback:
|
||||
self.log_callback(f"[{self.platform_name}] {message}", level)
|
||||
|
||||
def is_downloaded(self, media_id: str) -> bool:
|
||||
return self.unified_db.is_downloaded(media_id, self.platform_name)
|
||||
|
||||
@abstractmethod
|
||||
def download(self, username: str) -> int:
|
||||
"""Implement in subclass"""
|
||||
pass
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 12. Inconsistent Logging
|
||||
**Severity:** MEDIUM | **Effort:** 4-6 hours
|
||||
|
||||
Mix of print(), custom callbacks, and logging module.
|
||||
|
||||
**Solution:**
|
||||
```python
|
||||
import logging
|
||||
import json
|
||||
|
||||
class StructuredLogger:
|
||||
def __init__(self, name: str):
|
||||
self.logger = logging.getLogger(name)
|
||||
handler = logging.FileHandler('logs/media-downloader.log')
|
||||
handler.setFormatter(logging.Formatter('%(message)s'))
|
||||
self.logger.addHandler(handler)
|
||||
self.logger.setLevel(logging.INFO)
|
||||
|
||||
def log(self, message: str, level: str = "info", **extra):
|
||||
log_entry = {
|
||||
'timestamp': datetime.now().isoformat(),
|
||||
'level': level.upper(),
|
||||
'message': message,
|
||||
**extra
|
||||
}
|
||||
getattr(self.logger, level)(json.dumps(log_entry))
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### 13. No Database Migration Strategy
|
||||
**Severity:** MEDIUM | **Effort:** 4-6 hours
|
||||
|
||||
Schema changes via ad-hoc ALTER TABLE statements.
|
||||
|
||||
**Solution:** Implement Alembic or custom migration system.
|
||||
|
||||
---
|
||||
|
||||
### 14. Missing API Validation
|
||||
**Severity:** MEDIUM | **Effort:** 3-4 hours
|
||||
|
||||
Some endpoints lack Pydantic models.
|
||||
|
||||
**Solution:** Add comprehensive request/response models.
|
||||
|
||||
---
|
||||
|
||||
### 15. No Tests
|
||||
**Severity:** MEDIUM | **Effort:** 40-60 hours
|
||||
|
||||
Zero test coverage.
|
||||
|
||||
**Solution:** Implement pytest with unit, integration, and E2E tests.
|
||||
|
||||
---
|
||||
|
||||
## Low Priority Issues (🟢 Nice to Have)
|
||||
|
||||
### 16. Frontend Re-render Optimization
|
||||
Multiple independent polling timers. Consider WebSocket-only updates.
|
||||
|
||||
### 17. TypeScript Strict Mode Leverage
|
||||
Enable additional strict checks.
|
||||
|
||||
### 18. API Response Caching
|
||||
Add caching for expensive queries.
|
||||
|
||||
### 19. Database Indexes
|
||||
Add indexes on frequently queried columns.
|
||||
|
||||
### 20. API Versioning
|
||||
Implement `/api/v1` prefix for future compatibility.
|
||||
|
||||
---
|
||||
|
||||
## Strengths
|
||||
|
||||
✅ **Excellent Modular Architecture** - Clear separation of concerns
|
||||
✅ **Comprehensive Database Design** - WAL mode, connection pooling
|
||||
✅ **Modern Frontend Stack** - TypeScript, React, TanStack Query
|
||||
✅ **Good Type Hints** - Python type hints improve clarity
|
||||
✅ **Rate Limiting** - Sophisticated anti-detection measures
|
||||
✅ **WebSocket Real-time** - Live updates for better UX
|
||||
✅ **Feature Complete** - Multi-platform support, deduplication, notifications
|
||||
|
||||
---
|
||||
|
||||
## Implementation Priorities
|
||||
|
||||
### Week 1 (Critical - 11-17 hours)
|
||||
- [ ] Remove secrets from version control
|
||||
- [ ] Fix SQL injection vulnerabilities
|
||||
- [ ] Add file path validation
|
||||
- [ ] Validate subprocess inputs
|
||||
|
||||
### Month 1 (High Priority - 32-48 hours)
|
||||
- [ ] Refactor large files
|
||||
- [ ] Fix connection pool handling
|
||||
- [ ] Add CSRF protection
|
||||
|
||||
### Quarter 1 (Medium Priority - 67-98 hours)
|
||||
- [ ] Replace TypeScript any types
|
||||
- [ ] Implement error handling strategy
|
||||
- [ ] Eliminate code duplication
|
||||
- [ ] Standardize logging
|
||||
- [ ] Add database migrations
|
||||
- [ ] Implement test suite
|
||||
|
||||
### Ongoing (Low Priority - 15-23 hours)
|
||||
- [ ] Optimize frontend performance
|
||||
- [ ] Leverage TypeScript strict mode
|
||||
- [ ] Add API caching
|
||||
- [ ] Add database indexes
|
||||
- [ ] Implement API versioning
|
||||
|
||||
---
|
||||
|
||||
## Metrics
|
||||
|
||||
**Current State:**
|
||||
- Code Quality Score: 6.5/10
|
||||
- Security Score: 4/10
|
||||
- Test Coverage: 0%
|
||||
- Technical Debt: HIGH
|
||||
|
||||
**Target State (After Improvements):**
|
||||
- Code Quality Score: 8.5/10
|
||||
- Security Score: 9/10
|
||||
- Test Coverage: 70%+
|
||||
- Technical Debt: LOW
|
||||
|
||||
---
|
||||
|
||||
## Conclusion
|
||||
|
||||
The Media Downloader is a well-architected application that demonstrates solid engineering principles. However, **critical security issues must be addressed immediately** to prevent data breaches and system compromise.
|
||||
|
||||
With systematic implementation of these recommendations, this will evolve into a production-ready, enterprise-grade system with excellent security, maintainability, and performance.
|
||||
|
||||
**Total Estimated Effort:** 125-186 hours (3-4 months at 10-15 hrs/week)
|
||||
|
||||
**Next Steps:**
|
||||
1. Review and prioritize recommendations
|
||||
2. Create GitHub issues for each item
|
||||
3. Begin with Week 1 critical fixes
|
||||
4. Establish regular review cadence
|
||||
Reference in New Issue
Block a user