Code Improvements Implementation Guide
Generated: 2025-11-09
Estimated Total Time: 7-11 hours
Tasks: 18
PHASE 1: CRITICAL SECURITY (Priority: HIGHEST)
1. Fix Token Exposure in URLs ⏱️ 45min
Problem: Tokens passed as query parameters are exposed in server logs, browser history, and Referer headers
Current Code (web/frontend/src/lib/api.ts:558-568):
getMediaThumbnailUrl(filePath: string, mediaType: 'image' | 'video') {
  const token = localStorage.getItem('auth_token')
  const tokenParam = token ? `&token=${encodeURIComponent(token)}` : ''
  return `${API_BASE}/media/thumbnail?file_path=${encodeURIComponent(filePath)}&media_type=${mediaType}${tokenParam}`
}
Solution: Use session cookies for media endpoints
Backend Changes:
# web/backend/api.py - Remove token parameter, rely on cookie auth
@app.get("/api/media/thumbnail")
async def get_media_thumbnail(
    request: Request,
    file_path: str,
    media_type: str,
    current_user: Dict = Depends(get_current_user_from_cookie)  # Use cookie only
):
    # Removed: token: str = None parameter
    pass
Frontend Changes:
// web/frontend/src/lib/api.ts
getMediaThumbnailUrl(filePath: string, mediaType: 'image' | 'video') {
  // Remove token handling - browser will send cookie automatically
  return `${API_BASE}/media/thumbnail?file_path=${encodeURIComponent(filePath)}&media_type=${mediaType}`
}
Testing:
- Thumbnails still load after login
- 401 returned when not authenticated
- No tokens visible in browser Network tab URLs
2. Add Path Traversal Validation ⏱️ 30min
Problem: File paths from the frontend are not validated, allowing ../../../etc/passwd-style traversal attacks
Solution: Create path validation utility
New File (web/backend/security.py):
from pathlib import Path
from fastapi import HTTPException
def validate_file_path(file_path: str, allowed_base: Path) -> Path:
    """
    Validate a file path to prevent directory traversal.

    Args:
        file_path: User-provided file path
        allowed_base: Base directory the file must be under

    Returns:
        Resolved Path object

    Raises:
        HTTPException: If path traversal is detected or the file is missing
    """
    try:
        # Resolve both sides to absolute paths (collapses ../ segments)
        real_path = Path(file_path).resolve()
        allowed_base = allowed_base.resolve()
    except OSError as e:
        raise HTTPException(status_code=400, detail=f"Invalid file path: {e}")

    # Containment check. A plain startswith() string comparison would also
    # accept sibling directories like /downloads-evil, so compare path parts.
    try:
        real_path.relative_to(allowed_base)
    except ValueError:
        raise HTTPException(
            status_code=403,
            detail="Access denied: Path traversal detected"
        )

    # Check if file exists
    if not real_path.exists():
        raise HTTPException(status_code=404, detail="File not found")

    return real_path
Usage in endpoints:
from web.backend.security import validate_file_path

@app.get("/api/media/preview")
async def get_media_preview(file_path: str, ...):
    # Validate path
    downloads_base = Path("/opt/media-downloader/downloads")
    safe_path = validate_file_path(file_path, downloads_base)
    # Use safe_path from here on
    return FileResponse(safe_path)
Testing:
- Normal paths work: /downloads/user/image.jpg
- Traversal blocked: /downloads/../../etc/passwd → 403
- Absolute paths blocked: /etc/passwd → 403
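The containment check is the easy part to get wrong: a plain prefix comparison also matches sibling directories such as /downloads-evil. The core logic can be exercised in isolation with this sketch, which returns a boolean instead of raising HTTPException so it runs without FastAPI (the helper name is illustrative):

```python
from pathlib import Path

def is_within_base(file_path: str, allowed_base: Path) -> bool:
    """Return True only if file_path resolves inside allowed_base."""
    real = Path(file_path).resolve()
    base = allowed_base.resolve()
    try:
        real.relative_to(base)  # raises ValueError if outside
        return True
    except ValueError:
        return False

base = Path("/opt/media-downloader/downloads")
assert is_within_base("/opt/media-downloader/downloads/user/image.jpg", base)
assert not is_within_base("/opt/media-downloader/downloads/../../etc/passwd", base)
assert not is_within_base("/etc/passwd", base)
# The case a startswith() check misses: a sibling directory sharing the prefix
assert not is_within_base("/opt/media-downloader/downloads-evil/x.jpg", base)
```

The same cases map directly onto the testing checklist above.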
3. Add CSRF Protection ⏱️ 40min
Problem: No CSRF tokens; state-changing POST/PUT/DELETE endpoints are vulnerable to cross-site request forgery
Solution: Add CSRF middleware
Install dependency:
pip install starlette-csrf
Backend Changes (web/backend/api.py):
from starlette_csrf import CSRFMiddleware

# Add after other middleware
app.add_middleware(
    CSRFMiddleware,
    secret="<GENERATE-STRONG-SECRET>",  # Dedicated secret; avoid reusing the JWT secret
    cookie_name="csrftoken",
    header_name="X-CSRFToken",
    cookie_secure=True,      # HTTPS only in production
    cookie_httponly=False,   # JS needs to read it for the SPA
    cookie_samesite="strict"
)
Frontend Changes (web/frontend/src/lib/api.ts):
private async request<T>(
  method: string,
  endpoint: string,
  data?: any
): Promise<T> {
  const token = localStorage.getItem('auth_token')

  // Get CSRF token from cookie
  const csrfToken = document.cookie
    .split('; ')
    .find(row => row.startsWith('csrftoken='))
    ?.split('=')[1]

  const headers: Record<string, string> = {
    'Content-Type': 'application/json',
  }
  if (token) {
    headers['Authorization'] = `Bearer ${token}`
  }

  // Add CSRF token to non-GET requests
  if (method !== 'GET' && csrfToken) {
    headers['X-CSRFToken'] = csrfToken
  }

  // ... rest of request
}
Testing:
- GET requests work without CSRF token
- POST/PUT/DELETE work with CSRF token
- POST/PUT/DELETE fail (403) without CSRF token
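What the middleware enforces is the double-submit pattern: the token the browser holds in the csrftoken cookie must be echoed back in the X-CSRFToken header, and the two must match. The check itself can be sketched in a few lines of framework-free Python (function name and parsing are illustrative, not starlette-csrf internals):

```python
from http.cookies import SimpleCookie

def csrf_ok(cookie_header, header_token):
    """Double-submit check: the header token must match the csrftoken cookie."""
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    cookie_token = jar["csrftoken"].value if "csrftoken" in jar else None
    return bool(cookie_token) and cookie_token == header_token

assert csrf_ok("csrftoken=abc123; theme=dark", "abc123")   # matching pair passes
assert not csrf_ok("csrftoken=abc123", "wrong")            # mismatched header fails
assert not csrf_ok("", "abc123")                           # no cookie at all fails
```

A cross-site attacker can force the browser to send the cookie but cannot read it, so they cannot supply a matching header.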
4. Add Rate Limiting to Endpoints ⏱️ 20min
Problem: Rate limiting configured but not applied to most routes
Solution: Add @limiter.limit() decorators
Current State (web/backend/api.py:320-325):
limiter = Limiter(
    key_func=get_remote_address,
    default_limits=["200/minute"]
)
# But not applied to routes!
Fix - Add to all sensitive endpoints:
# Auth endpoints - strict
@app.post("/api/auth/login")
@limiter.limit("5/minute")  # Add this
async def login(credentials: LoginRequest, request: Request):
    pass

# Config updates - moderate
@app.put("/api/settings/config")
@limiter.limit("30/minute")  # Add this
async def update_config(...):
    pass

# Download triggers - moderate
@app.post("/api/scheduler/trigger")
@limiter.limit("10/minute")  # Add this
async def trigger_download(...):
    pass

# Media endpoints already have limits - verify they work
@app.get("/api/media/thumbnail")
@limiter.limit("5000/minute")  # Already present ✓
async def get_media_thumbnail(...):
    pass
Testing:
- Login limited to 5 attempts/minute
- Repeated config updates return 429 after limit
- Rate limit resets after time window
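To see why "resets after the time window" holds, the sliding-window behavior a limiter approximates can be modeled in a few lines. This is an illustration of the semantics of "5/minute", not slowapi's actual implementation:

```python
import time
from collections import deque

class WindowLimiter:
    """Allow at most `limit` calls per `window` seconds (sliding window)."""
    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.calls = deque()  # timestamps of accepted calls

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Drop timestamps that have aged out of the window
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            self.calls.append(now)
            return True
        return False

login_limiter = WindowLimiter(limit=5, window=60.0)
results = [login_limiter.allow(now=float(i)) for i in range(6)]
assert results == [True] * 5 + [False]   # 6th attempt inside the window rejected
assert login_limiter.allow(now=61.0)     # earliest attempts aged out; allowed again
```

Testing the real endpoints amounts to the same thing: six rapid logins, expect the sixth to return 429.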
5. Add Input Validation on Config Updates ⏱️ 35min
Problem: Config updates lack validation, could set invalid values
Solution: Use Pydantic models for validation
Create validation models (web/backend/models.py):
from pydantic import BaseModel, Field, validator
from typing import Optional

class PushoverConfig(BaseModel):
    enabled: bool
    user_key: Optional[str] = Field(None, min_length=30, max_length=30)
    api_token: Optional[str] = Field(None, min_length=30, max_length=30)
    priority: int = Field(0, ge=-2, le=2)
    sound: str = Field("pushover", regex="^[a-z_]+$")  # Pydantic v1 syntax; on v2 use pattern= and @field_validator

    @validator('user_key', 'api_token')
    def validate_keys(cls, v):
        if v and not v.isalnum():
            raise ValueError("Keys must be alphanumeric")
        return v

class SchedulerConfig(BaseModel):
    enabled: bool
    interval_hours: int = Field(24, ge=1, le=168)  # 1 hour to 1 week
    randomize: bool = True
    randomize_minutes: int = Field(30, ge=0, le=180)

class ConfigUpdate(BaseModel):
    pushover: Optional[PushoverConfig]
    scheduler: Optional[SchedulerConfig]
    # ... other config sections
Use in endpoint:
@app.put("/api/settings/config")
@limiter.limit("30/minute")
async def update_config(
    config: ConfigUpdate,  # Pydantic validates on parse
    current_user: Dict = Depends(get_current_user)
):
    # Config is already validated by Pydantic; safe to use
    pass
Testing:
- Valid config updates succeed
- Invalid values return 422 with details
- Malformed payloads (e.g. injection strings in constrained fields) are rejected before they reach storage
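A quick way to confirm the 422 behavior is to exercise the model directly, since FastAPI turns Pydantic's ValidationError into a 422 response. This sketch mirrors the SchedulerConfig above, using only constraints that behave the same on Pydantic v1 and v2:

```python
from pydantic import BaseModel, Field, ValidationError

class SchedulerConfig(BaseModel):
    enabled: bool
    interval_hours: int = Field(24, ge=1, le=168)  # 1 hour to 1 week

# A valid payload parses cleanly
ok = SchedulerConfig(enabled=True, interval_hours=12)
assert ok.interval_hours == 12

# An out-of-range value raises ValidationError (FastAPI → HTTP 422)
try:
    SchedulerConfig(enabled=True, interval_hours=0)
    raised = False
except ValidationError:
    raised = True
assert raised
```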
PHASE 2: PERFORMANCE (Priority: HIGH)
6. Add Database Indexes ⏱️ 15min
Problem: Missing composite index for deduplication queries
Solution: Add indexes to unified_database.py
# modules/unified_database.py - In _create_indexes()
def _create_indexes(self, cursor):
    """Create indexes for better query performance"""
    # Existing indexes...

    # NEW: Composite index for deduplication
    cursor.execute('''
        CREATE INDEX IF NOT EXISTS idx_file_hash_platform
        ON downloads(file_hash, platform)
        WHERE file_hash IS NOT NULL
    ''')

    # NEW: Index for metadata searches (if using JSON_EXTRACT)
    cursor.execute('''
        CREATE INDEX IF NOT EXISTS idx_metadata_media_id
        ON downloads(json_extract(metadata, '$.media_id'))
        WHERE metadata IS NOT NULL
    ''')
Testing:
EXPLAIN QUERY PLAN
SELECT * FROM downloads
WHERE file_hash = 'abc123' AND platform = 'fastdl';
-- Should show "USING INDEX idx_file_hash_platform"
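The EXPLAIN QUERY PLAN check can be automated against an in-memory database (table trimmed to the indexed columns for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE downloads (id INTEGER PRIMARY KEY, file_hash TEXT, platform TEXT)")
cur.execute("""
    CREATE INDEX IF NOT EXISTS idx_file_hash_platform
    ON downloads(file_hash, platform)
    WHERE file_hash IS NOT NULL
""")

plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM downloads "
    "WHERE file_hash = 'abc123' AND platform = 'fastdl'"
).fetchall()
# The last column of each plan row is the human-readable detail string;
# SQLite can use the partial index because file_hash = ? implies NOT NULL
assert any("idx_file_hash_platform" in row[-1] for row in plan)
```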
7. Fix JSON Metadata Searches ⏱️ 45min
Problem: LIKE '%...%' searches over JSON metadata are slow and force full table scans
Current Code (modules/unified_database.py:576-590):
cursor.execute('''
    SELECT ... WHERE metadata LIKE ? OR metadata LIKE ?
''', (f'%"media_id": "{media_id}"%', f'%"media_id"%{media_id}%'))
Solution Option 1: Extract media_id to separate column (BEST)
# Add column
cursor.execute('ALTER TABLE downloads ADD COLUMN media_id TEXT')
cursor.execute('CREATE INDEX idx_media_id ON downloads(media_id)')

# When inserting:
media_id = metadata_dict.get('media_id')
cursor.execute('''
    INSERT INTO downloads (..., metadata, media_id)
    VALUES (..., ?, ?)
''', (json.dumps(metadata), media_id))

# Query becomes fast:
cursor.execute('SELECT * FROM downloads WHERE media_id = ?', (media_id,))
Solution Option 2: Use JSON_EXTRACT (if SQLite 3.38+)
cursor.execute('''
    SELECT * FROM downloads
    WHERE json_extract(metadata, '$.media_id') = ?
''', (media_id,))
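For Option 1, rows that already exist also need the new column populated. A one-time migration sketch (the UPDATE backfill is an addition not shown above; it assumes an SQLite build with the JSON1 functions, which Python's bundled SQLite normally has):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE downloads (id INTEGER PRIMARY KEY, metadata TEXT)")
cur.execute("INSERT INTO downloads (metadata) VALUES (?)",
            (json.dumps({"media_id": "m-42", "title": "demo"}),))

# One-time migration: add the column, backfill from existing JSON, index it
cur.execute("ALTER TABLE downloads ADD COLUMN media_id TEXT")
cur.execute("""
    UPDATE downloads
    SET media_id = json_extract(metadata, '$.media_id')
    WHERE metadata IS NOT NULL
""")
cur.execute("CREATE INDEX idx_media_id ON downloads(media_id)")

# Old rows are now reachable through the fast indexed path
row = cur.execute("SELECT id FROM downloads WHERE media_id = ?", ("m-42",)).fetchone()
assert row is not None
```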
8. Add Redis Result Caching ⏱️ 60min
Requires: Redis server
Install: pip install redis
Setup (web/backend/cache.py):
import hashlib
import json
import redis
from functools import wraps

redis_client = redis.Redis(
    host='localhost',
    port=6379,
    decode_responses=True
)

def cache_result(ttl: int = 300):
    """
    Decorator to cache function results.

    Args:
        ttl: Time to live in seconds
    """
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            # Build a stable cache key. The built-in hash() varies between
            # processes (PYTHONHASHSEED), so use a digest instead.
            raw = repr((args, sorted(kwargs.items())))
            key = f"cache:{func.__name__}:{hashlib.sha256(raw.encode()).hexdigest()}"

            # Try the cache first
            cached = redis_client.get(key)
            if cached:
                return json.loads(cached)

            # Cache miss: execute and store
            result = await func(*args, **kwargs)
            redis_client.setex(key, ttl, json.dumps(result))
            return result
        return wrapper
    return decorator
Usage:
from web.backend.cache import cache_result

@app.get("/api/stats/platforms")
@cache_result(ttl=300)  # Cache for 5 minutes
async def get_platform_stats():
    # Expensive database query
    return stats
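The decorator's behavior can be verified without a Redis server by swapping the backend for a dict. This keeps the same shape as cache_result above; the in-memory variant is illustrative only:

```python
import asyncio
import hashlib
import time
from functools import wraps

def cache_result_memory(ttl: int = 300):
    """Same decorator shape as cache_result, backed by a dict instead of Redis."""
    store = {}
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            raw = repr((args, sorted(kwargs.items())))
            key = f"{func.__name__}:{hashlib.sha256(raw.encode()).hexdigest()}"
            hit = store.get(key)
            if hit and hit[0] > time.monotonic():
                return hit[1]                       # cache hit, still fresh
            result = await func(*args, **kwargs)    # miss: compute and store
            store[key] = (time.monotonic() + ttl, result)
            return result
        return wrapper
    return decorator

calls = 0

@cache_result_memory(ttl=60)
async def expensive():
    global calls
    calls += 1
    return {"total": 42}

async def main():
    a = await expensive()
    b = await expensive()
    assert a == b == {"total": 42}
    assert calls == 1  # second call served from cache

asyncio.run(main())
```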
PHASE 3-5: Additional Tasks
Due to space constraints, see separate files:
- docs/IMPLEMENTATION_CODE_QUALITY.md - Tasks 9-12
- docs/IMPLEMENTATION_RELIABILITY.md - Tasks 13-16
- docs/IMPLEMENTATION_UI.md - Tasks 17-18
Quick Start Checklist
Today (30-60 min):
- Task 2: Path validation (30min) - Highest security ROI
- Task 4: Rate limiting (20min) - Easy win
- Task 6: Database indexes (15min) - Instant performance boost
This Week (2-3 hours):
- Task 1: Token exposure fix
- Task 3: CSRF protection
- Task 5: Input validation
Next Week (4-6 hours):
- Performance optimizations (Tasks 7-8)
- Code quality improvements (Tasks 9-12)
Later (2-3 hours):
- Reliability improvements (Tasks 13-16)
- UI enhancements (Tasks 17-18)