# Code Improvements Implementation Guide

**Generated**: 2025-11-09

**Estimated Total Time**: 7-11 hours

**Tasks**: 18

---

## PHASE 1: CRITICAL SECURITY (Priority: HIGHEST)

### 1. Fix Token Exposure in URLs ⏱️ 45min

**Problem**: Tokens passed as query parameters end up in server and proxy logs, browser history, and `Referer` headers.

**Current Code** (`web/frontend/src/lib/api.ts:558-568`):

```typescript
getMediaThumbnailUrl(filePath: string, mediaType: 'image' | 'video') {
  const token = localStorage.getItem('auth_token')
  const tokenParam = token ? `&token=${encodeURIComponent(token)}` : ''
  return `${API_BASE}/media/thumbnail?file_path=${encodeURIComponent(filePath)}&media_type=${mediaType}${tokenParam}`
}
```

**Solution**: Use cookie authentication for media endpoints. `<img>` and `<video>` elements cannot send an `Authorization` header, which is why the token ended up in the URL, so the login endpoint must also set the token as an `HttpOnly` cookie that the browser attaches automatically.

**Backend Changes**:

```python
# web/backend/api.py - Remove token parameter, rely on cookie auth
from typing import Dict

from fastapi import Depends, Request


@app.get("/api/media/thumbnail")
async def get_media_thumbnail(
    request: Request,
    file_path: str,
    media_type: str,
    # get_current_user_from_cookie: dependency that reads the auth cookie
    current_user: Dict = Depends(get_current_user_from_cookie)  # Use cookie only
):
    # Removed: token: str = None parameter
    pass
```

**Frontend Changes**:

```typescript
// web/frontend/src/lib/api.ts
getMediaThumbnailUrl(filePath: string, mediaType: 'image' | 'video') {
  // No token handling: the browser sends the auth cookie automatically
  return `${API_BASE}/media/thumbnail?file_path=${encodeURIComponent(filePath)}&media_type=${mediaType}`
}
```

**Testing**:

- [ ] Thumbnails still load after login
- [ ] 401 returned when not authenticated
- [ ] No tokens visible in request URLs in the browser Network tab
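A minimal stdlib sketch of the two cookie pieces this change needs, assuming a cookie named `auth_token` (hypothetical, as are both helper names). In FastAPI the login handler would set the cookie with Starlette's `Response.set_cookie(...)` and `get_current_user_from_cookie` would read `request.cookies`; the helpers below only show which attributes the cookie should carry and how the token is recovered from a raw `Cookie` header.

```python
from http.cookies import SimpleCookie
from typing import Optional

# Hypothetical cookie name; the guide does not fix one
AUTH_COOKIE = "auth_token"


def build_auth_cookie(token: str, max_age: int = 86400) -> str:
    """Return the Set-Cookie value a login response could send."""
    cookie = SimpleCookie()
    cookie[AUTH_COOKIE] = token
    morsel = cookie[AUTH_COOKIE]
    morsel["httponly"] = True      # not readable from JavaScript
    morsel["secure"] = True        # only sent over HTTPS
    morsel["samesite"] = "Strict"  # not sent on cross-site requests (Python 3.8+)
    morsel["path"] = "/"
    morsel["max-age"] = max_age
    return morsel.OutputString()


def token_from_cookie_header(header: Optional[str]) -> Optional[str]:
    """Extract the auth token from a raw Cookie request header, if present."""
    if not header:
        return None
    cookie = SimpleCookie()
    cookie.load(header)
    morsel = cookie.get(AUTH_COOKIE)
    return morsel.value if morsel is not None else None
```

Because the cookie is `HttpOnly`, a script injected into the page cannot read it, which is the main advantage over keeping the token in `localStorage` and copying it into URLs.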
---

### 2. Add Path Traversal Validation ⏱️ 30min

**Problem**: File paths from the frontend are not validated, so a request such as `../../../etc/passwd` can read files outside the downloads directory.

**Solution**: Create a path validation utility.

**New File** (`web/backend/security.py`):

```python
from pathlib import Path

from fastapi import HTTPException


def validate_file_path(file_path: str, allowed_base: Path) -> Path:
    """
    Validate file path to prevent directory traversal

    Args:
        file_path: User-provided file path (absolute, or relative to allowed_base)
        allowed_base: Base directory that file must be under

    Returns:
        Resolved Path object

    Raises:
        HTTPException: 400 if the path cannot be resolved, 403 if it escapes
            allowed_base, 404 if the file does not exist
    """
    try:
        base = allowed_base.resolve()
        # resolve() collapses ".." and follows symlinks; joining an absolute
        # file_path onto base yields file_path itself, so it is still checked below
        real_path = (base / file_path).resolve()
    except (OSError, RuntimeError, ValueError):
        raise HTTPException(status_code=400, detail="Invalid file path") from None

    # Compare path components, not string prefixes: startswith() would also
    # accept siblings such as /opt/media-downloader/downloads-old
    try:
        real_path.relative_to(base)
    except ValueError:
        raise HTTPException(
            status_code=403,
            detail="Access denied: Path traversal detected"
        ) from None

    # Check that the file exists (and is not a directory)
    if not real_path.is_file():
        raise HTTPException(status_code=404, detail="File not found")

    return real_path
```

**Usage in endpoints**:

```python
from pathlib import Path

from fastapi.responses import FileResponse

from web.backend.security import validate_file_path


@app.get("/api/media/preview")
async def get_media_preview(file_path: str, ...):
    # Validate path
    downloads_base = Path("/opt/media-downloader/downloads")
    safe_path = validate_file_path(file_path, downloads_base)

    # Use safe_path from here on
    return FileResponse(safe_path)
```

**Testing**:

- [ ] Normal paths work: `/opt/media-downloader/downloads/user/image.jpg` (or `user/image.jpg` relative to the base)
- [ ] Traversal blocked: `/opt/media-downloader/downloads/../../../etc/passwd` → 403
- [ ] Absolute paths blocked: `/etc/passwd` → 403
- [ ] Prefix look-alikes blocked: `/opt/media-downloader/downloads-old/x.jpg` → 403
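The checklist can be exercised without FastAPI. This framework-free sketch repeats the containment check (the helper names `resolve_under` and `run_traversal_checks` are hypothetical) and runs the cases against a throwaway directory tree, including a sibling directory whose name merely starts with `downloads`, which is exactly what a plain `startswith()` comparison would let through.

```python
import tempfile
from pathlib import Path
from typing import Dict, Optional


def resolve_under(base: Path, user_path: str) -> Optional[Path]:
    """Return the resolved path if it stays inside base, otherwise None."""
    base = base.resolve()
    candidate = (base / user_path).resolve()
    try:
        candidate.relative_to(base)  # raises ValueError if outside base
    except ValueError:
        return None
    return candidate


def run_traversal_checks() -> Dict[str, bool]:
    """Build a temporary downloads tree and report whether each case behaves."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        downloads = root / "downloads"
        (downloads / "user").mkdir(parents=True)
        (downloads / "user" / "image.jpg").write_bytes(b"")
        # Sibling whose name shares the "downloads" prefix
        (root / "downloads-old").mkdir()

        inside = str(downloads / "user" / "image.jpg")
        sibling = str(root / "downloads-old" / "x.jpg")
        return {
            "normal": resolve_under(downloads, "user/image.jpg") is not None,
            "absolute_inside": resolve_under(downloads, inside) is not None,
            "traversal": resolve_under(downloads, "../../etc/passwd") is None,
            "absolute_outside": resolve_under(downloads, "/etc/passwd") is None,
            "sibling_prefix": resolve_under(downloads, sibling) is None,
        }
```

Every value in the returned dictionary should be `True`; a failing key names the case that regressed.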
---

### 3. Add CSRF Protection ⏱️ 40min

**Problem**: There are no CSRF tokens. Requests authenticated only by the `Authorization: Bearer` header are not exposed to CSRF, because browsers never attach that header on their own, but once cookie authentication is introduced (Task 1) every POST/PUT/DELETE endpoint that accepts the cookie becomes vulnerable.

**Solution**: Add CSRF middleware.

**Install dependency**:

```bash
pip install starlette-csrf
```

**Backend Changes** (`web/backend/api.py`):

```python
from starlette_csrf import CSRFMiddleware

# Add after other middleware
app.add_middleware(
    CSRFMiddleware,
    secret="",  # Use the same secret as JWT signing; must never be left empty
    cookie_name="csrftoken",
    header_name="X-CSRFToken",
    cookie_secure=True,  # HTTPS only in production
    cookie_httponly=False,  # JS needs to read it for the SPA
    cookie_samesite="strict"
)
```

**Frontend Changes** (`web/frontend/src/lib/api.ts`):

```typescript
private async request<T>(
  method: string,
  endpoint: string,
  data?: any
): Promise<T> {
  const token = localStorage.getItem('auth_token')

  // Get CSRF token from cookie (substring keeps any '=' inside the value)
  const csrfToken = document.cookie
    .split('; ')
    .find(row => row.startsWith('csrftoken='))
    ?.substring('csrftoken='.length)

  const headers: Record<string, string> = {
    'Content-Type': 'application/json',
  }

  if (token) {
    headers['Authorization'] = `Bearer ${token}`
  }

  // Add CSRF token to non-GET requests
  if (method !== 'GET' && csrfToken) {
    headers['X-CSRFToken'] = csrfToken
  }

  // ... rest of request
}
```

**Testing**:

- [ ] GET requests work without a CSRF token
- [ ] POST/PUT/DELETE work with the CSRF token
- [ ] POST/PUT/DELETE without the CSRF token return 403
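To make the mechanism concrete, here is a stdlib sketch of the signed double-submit check that CSRF middleware of this kind performs. It is an illustration, not starlette-csrf's actual code; the function names and the `nonce.tag` token format are assumptions.

```python
import hashlib
import hmac
import secrets
from typing import Optional

# Methods that must not change state, so they skip the check
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}


def _sign(secret: bytes, nonce: str) -> str:
    return hmac.new(secret, nonce.encode(), hashlib.sha256).hexdigest()


def issue_csrf_token(secret: bytes) -> str:
    """Random nonce plus an HMAC tag, so a cookie planted by an attacker fails verification."""
    nonce = secrets.token_urlsafe(32)  # URL-safe base64, never contains "."
    return f"{nonce}.{_sign(secret, nonce)}"


def csrf_check_passes(method: str, cookie_token: Optional[str],
                      header_token: Optional[str], secret: bytes) -> bool:
    """Return True if the request may proceed."""
    if method.upper() in SAFE_METHODS:
        return True
    if not cookie_token or not header_token:
        return False
    # Double submit: a cross-site page can make the browser send the cookie,
    # but cannot read it to copy the value into the X-CSRFToken header
    if not hmac.compare_digest(cookie_token.encode(), header_token.encode()):
        return False
    nonce, _, tag = cookie_token.rpartition(".")
    return hmac.compare_digest(tag.encode(), _sign(secret, nonce).encode())
```

This is also why the CSRF cookie is left readable by JavaScript: the SPA has to copy its value into the header.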
---

### 4. Add Rate Limiting to Endpoints ⏱️ 20min

**Problem**: A rate limiter is configured but not applied to most routes.

**Solution**: Add `@limiter.limit()` decorators.

**Current State** (`web/backend/api.py:320-325`):

```python
limiter = Limiter(
    key_func=get_remote_address,
    default_limits=["200/minute"]
)
# But not applied to routes!
```

Assuming this is slowapi (the `Limiter(key_func=get_remote_address, ...)` call matches its API), `default_limits` are only enforced when `SlowAPIMiddleware` is installed, and every endpoint decorated with `@limiter.limit()` must accept a `request: Request` argument or slowapi raises an error.

**Fix - Add to all sensitive endpoints**:

```python
# Auth endpoints - strict
@app.post("/api/auth/login")
@limiter.limit("5/minute")  # Add this
async def login(credentials: LoginRequest, request: Request):
    pass

# Config updates - moderate
@app.put("/api/settings/config")
@limiter.limit("30/minute")  # Add this
async def update_config(request: Request, ...):  # slowapi needs `request`
    pass

# Download triggers - moderate
@app.post("/api/scheduler/trigger")
@limiter.limit("10/minute")  # Add this
async def trigger_download(request: Request, ...):  # slowapi needs `request`
    pass

# Media endpoints already have limits - verify they work
@app.get("/api/media/thumbnail")
@limiter.limit("5000/minute")  # Already present ✓
async def get_media_thumbnail(request: Request, ...):
    pass
```

**Testing**:

- [ ] Login limited to 5 attempts/minute (the 6th returns 429)
- [ ] Repeated config updates return 429 after the limit
- [ ] Rate limit resets after the time window
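What a limit such as `5/minute` means can be shown with a minimal in-memory fixed-window counter. This is not slowapi's implementation; the class name and the injectable `clock` (used so the reset can be tested without sleeping) are assumptions for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` hits per `window` seconds per key (e.g. client IP)."""

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        # key -> (window start time, hits counted in this window)
        self._windows: Dict[str, Tuple[float, int]] = {}

    def hit(self, key: str) -> bool:
        """Record a request; return False when the caller should get HTTP 429."""
        now = self.clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # window expired: start a fresh one
        if count >= self.limit:
            return False
        self._windows[key] = (start, count + 1)
        return True
```

Each key gets its own counter, so one client hitting the login limit does not affect others; a real deployment would keep these counters in shared storage when running several workers.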
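---

### 5. Add Input Validation on Config Updates ⏱️ 35min

**Problem**: Config updates are not validated, so clients can store invalid values.

**Solution**: Use Pydantic models for validation.

**Create validation models** (`web/backend/models.py`):

```python
# Pydantic v1 syntax; on v2 use `pattern=` instead of `regex=` and `@field_validator`
from typing import Optional

from pydantic import BaseModel, Field, validator


class PushoverConfig(BaseModel):
    enabled: bool
    user_key: Optional[str] = Field(None, min_length=30, max_length=30)
    api_token: Optional[str] = Field(None, min_length=30, max_length=30)
    priority: int = Field(0, ge=-2, le=2)
    sound: str = Field("pushover", regex="^[a-z_]+$")

    @validator('user_key', 'api_token')
    def validate_keys(cls, v):
        if v and not v.isalnum():
            raise ValueError("Keys must be alphanumeric")
        return v


class SchedulerConfig(BaseModel):
    enabled: bool
    interval_hours: int = Field(24, ge=1, le=168)  # 1 hour to 1 week
    randomize: bool = True
    randomize_minutes: int = Field(30, ge=0, le=180)


class ConfigUpdate(BaseModel):
    # Explicit None defaults keep sections optional on both Pydantic v1 and v2
    pushover: Optional[PushoverConfig] = None
    scheduler: Optional[SchedulerConfig] = None
    # ... other config sections
```

**Use in endpoint**:

```python
@app.put("/api/settings/config")
@limiter.limit("30/minute")
async def update_config(
    request: Request,  # Required by @limiter.limit
    config: ConfigUpdate,  # Pydantic will validate; invalid bodies get 422
    current_user: Dict = Depends(get_current_user)
):
    # Config is already validated by Pydantic (types and ranges checked)
    pass
```

**Testing**:

- [ ] Valid config updates succeed
- [ ] Invalid values return 422 with details
- [ ] Injection-style strings (e.g. `'; DROP TABLE downloads; --` or `<script>`) in constrained fields such as `sound` return 422; free-text fields are not sanitized, so parameterized queries and output escaping are still required

---

## PHASE 2: PERFORMANCE (Priority: HIGH)

### 6. Add Database Indexes ⏱️ 15min

**Problem**: There is no composite index for the deduplication queries on `file_hash` and `platform`.

**Solution**: Add indexes to `unified_database.py`.

```python
# modules/unified_database.py - In _create_indexes()
def _create_indexes(self, cursor):
    """Create indexes for better query performance"""
    # Existing indexes...

    # NEW: Composite index for deduplication
    cursor.execute('''
        CREATE INDEX IF NOT EXISTS idx_file_hash_platform
        ON downloads(file_hash, platform)
        WHERE file_hash IS NOT NULL
    ''')

    # NEW: Expression index for metadata searches (if using json_extract)
    # The query must use exactly json_extract(metadata, '$.media_id').
    # No partial WHERE clause: SQLite only uses a "metadata IS NOT NULL" partial
    # index when the query compares metadata itself, which these lookups do not.
    cursor.execute('''
        CREATE INDEX IF NOT EXISTS idx_metadata_media_id
        ON downloads(json_extract(metadata, '$.media_id'))
    ''')
```

**Testing**:

```sql
EXPLAIN QUERY PLAN
SELECT * FROM downloads WHERE file_hash = 'abc123' AND platform = 'fastdl';
-- Should show "USING INDEX idx_file_hash_platform"
```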
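The `EXPLAIN QUERY PLAN` check can also be scripted with Python's built-in `sqlite3` module. The sketch below uses a hypothetical minimal `downloads` table containing only the columns the index touches, not the real schema; because the exact plan wording differs between SQLite versions, callers should only look for the index name.

```python
import sqlite3


def plan_for_dedup_lookup() -> str:
    """Return SQLite's query plan text for the deduplication lookup."""
    conn = sqlite3.connect(":memory:")
    try:
        cur = conn.cursor()
        # Hypothetical minimal schema; the real table has more columns
        cur.execute(
            "CREATE TABLE downloads ("
            "id INTEGER PRIMARY KEY, file_hash TEXT, platform TEXT, metadata TEXT)"
        )
        cur.execute("""
            CREATE INDEX IF NOT EXISTS idx_file_hash_platform
            ON downloads(file_hash, platform)
            WHERE file_hash IS NOT NULL
        """)
        rows = cur.execute(
            "EXPLAIN QUERY PLAN "
            "SELECT * FROM downloads WHERE file_hash = ? AND platform = ?",
            ("abc123", "fastdl"),
        ).fetchall()
    finally:
        conn.close()
    # The last column of each plan row is the human-readable detail text
    return " | ".join(str(row[-1]) for row in rows)
```

The partial index is still usable here because an equality test on `file_hash` implies `file_hash IS NOT NULL`, which is what its `WHERE` clause requires.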
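---

### 7. Fix JSON Metadata Searches ⏱️ 45min

**Problem**: `LIKE '%...%'` searches on the JSON `metadata` column are slow. A pattern with a leading wildcard cannot use an index, so every lookup scans the whole table.

**Current Code** (`modules/unified_database.py:576-590`):

```python
cursor.execute('''
    SELECT ... WHERE metadata LIKE ? OR metadata LIKE ?
''', (f'%"media_id": "{media_id}"%', f'%"media_id"%{media_id}%'))
```

**Solution Option 1**: Extract `media_id` to a separate column (BEST)

```python
# Add column (one-time migration; ALTER TABLE fails if the column already exists)
cursor.execute('ALTER TABLE downloads ADD COLUMN media_id TEXT')
cursor.execute('CREATE INDEX IF NOT EXISTS idx_media_id ON downloads(media_id)')

# Backfill existing rows, otherwise lookups miss older downloads
# (uses SQLite's JSON functions)
cursor.execute('''
    UPDATE downloads SET media_id = json_extract(metadata, '$.media_id')
    WHERE media_id IS NULL AND metadata IS NOT NULL
''')

# When inserting:
media_id = metadata_dict.get('media_id')
cursor.execute('''
    INSERT INTO downloads (..., metadata, media_id)
    VALUES (..., ?, ?)
''', (json.dumps(metadata_dict), media_id))

# Query becomes fast:
cursor.execute('SELECT * FROM downloads WHERE media_id = ?', (media_id,))
```

**Solution Option 2**: Use `json_extract()` (requires SQLite's JSON functions, built in by default since SQLite 3.38.0 and available in earlier versions compiled with JSON1)

```python
cursor.execute('''
    SELECT * FROM downloads
    WHERE json_extract(metadata, '$.media_id') = ?
''', (media_id,))
```

On its own this still scans the table; it is only fast together with the `idx_metadata_media_id` expression index from Task 6, whose expression must match this one exactly.

---

### 8. Add Redis Result Caching ⏱️ 60min

**Requires**: Redis server

**Install**: `pip install redis`

**Setup** (`web/backend/cache.py`):

```python
import hashlib
import json
from functools import wraps

# Async client (redis-py 4.2+) so cache lookups don't block the event loop
import redis.asyncio as redis

redis_client = redis.Redis(
    host='localhost',
    port=6379,
    decode_responses=True
)


def cache_result(ttl: int = 300):
    """
    Decorator to cache function results

    Args:
        ttl: Time to live in seconds

    Only suitable for endpoints whose arguments are plain values and whose
    result is JSON-serializable (objects such as Request make every key unique).
    """
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            # Stable cache key: built-in hash() of a str is randomized per
            # process, so keys would differ between workers and restarts
            raw = json.dumps([args, kwargs], sort_keys=True, default=str)
            key = f"cache:{func.__name__}:{hashlib.sha256(raw.encode()).hexdigest()}"

            # Try to get from cache
            cached = await redis_client.get(key)
            if cached is not None:
                return json.loads(cached)

            # Execute function
            result = await func(*args, **kwargs)

            # Store in cache
            await redis_client.setex(key, ttl, json.dumps(result))
            return result
        return wrapper
    return decorator
```

**Usage**:

```python
from web.backend.cache import cache_result

@app.get("/api/stats/platforms")
@cache_result(ttl=300)  # Cache 5 minutes
async def get_platform_stats():
    # Expensive database query
    return stats
```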
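The caching logic can be tested without a Redis server by swapping in an in-memory stand-in. In this sketch `InMemoryTTLStore` and `make_key` are hypothetical helpers: the store mimics only the `get`/`setex` calls the decorator uses, and the decorator takes the store as an argument (a change from the Task 8 version) so tests can inject it.

```python
import functools
import hashlib
import json
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryTTLStore:
    """Mimics the async Redis get/setex subset used by the cache decorator."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)
        self._clock = clock

    async def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None or item[0] <= self._clock():
            self._data.pop(key, None)  # drop expired entries lazily
            return None
        return item[1]

    async def setex(self, key: str, ttl: int, value: str) -> None:
        self._data[key] = (self._clock() + ttl, value)


def make_key(func_name: str, args: tuple, kwargs: dict) -> str:
    """Deterministic key: same arguments give the same key in every process."""
    raw = json.dumps([args, kwargs], sort_keys=True, default=str)
    return f"cache:{func_name}:{hashlib.sha256(raw.encode()).hexdigest()}"


def cache_result(store, ttl: int = 300):
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            key = make_key(func.__name__, args, kwargs)
            cached = await store.get(key)
            if cached is not None:
                return json.loads(cached)
            result = await func(*args, **kwargs)
            await store.setex(key, ttl, json.dumps(result))
            return result
        return wrapper
    return decorator
```

With a fake clock, a second call inside the TTL returns the cached value without running the function, and a call after the TTL runs it again.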
---

## PHASE 3-5: Additional Tasks

Due to space constraints, see separate files:

- `docs/IMPLEMENTATION_CODE_QUALITY.md` - Tasks 9-12
- `docs/IMPLEMENTATION_RELIABILITY.md` - Tasks 13-16
- `docs/IMPLEMENTATION_UI.md` - Tasks 17-18

---

## Quick Start Checklist

**Today (about 65 min):**

- [ ] Task 2: Path validation (30min) - Highest security ROI
- [ ] Task 4: Rate limiting (20min) - Easy win
- [ ] Task 6: Database indexes (15min) - Immediate speedup for deduplication lookups

**This Week (2-3 hours):**

- [ ] Task 1: Token exposure fix
- [ ] Task 3: CSRF protection
- [ ] Task 5: Input validation

**Next Week (4-6 hours):**

- [ ] Performance optimizations (Tasks 7-8)
- [ ] Code quality improvements (Tasks 9-12)

**Later (2-3 hours):**

- [ ] Reliability improvements (Tasks 13-16)
- [ ] UI enhancements (Tasks 17-18)