
Code Improvements Implementation Guide

Generated: 2025-11-09
Estimated Total Time: 7-11 hours
Tasks: 18


PHASE 1: CRITICAL SECURITY (Priority: HIGHEST)

1. Fix Token Exposure in URLs ⏱️ 45min

Problem: Tokens passed as query parameters are exposed in server logs, browser history, and Referer headers

Current Code (web/frontend/src/lib/api.ts:558-568):

getMediaThumbnailUrl(filePath: string, mediaType: 'image' | 'video') {
  const token = localStorage.getItem('auth_token')
  const tokenParam = token ? `&token=${encodeURIComponent(token)}` : ''
  return `${API_BASE}/media/thumbnail?file_path=${encodeURIComponent(filePath)}&media_type=${mediaType}${tokenParam}`
}

Solution: Use session cookies for media endpoints

Backend Changes:

# web/backend/api.py - Remove token parameter, rely on cookie auth
@app.get("/api/media/thumbnail")
async def get_media_thumbnail(
    request: Request,
    file_path: str,
    media_type: str,
    current_user: Dict = Depends(get_current_user_from_cookie)  # Use cookie only
):
    # Remove: token: str = None parameter
    pass

Frontend Changes:

// web/frontend/src/lib/api.ts
getMediaThumbnailUrl(filePath: string, mediaType: 'image' | 'video') {
  // Remove token handling - browser will send cookie automatically
  return `${API_BASE}/media/thumbnail?file_path=${encodeURIComponent(filePath)}&media_type=${mediaType}`
}

Testing:

  • Thumbnails still load after login
  • 401 returned when not authenticated
  • No tokens visible in browser Network tab URLs

2. Add Path Traversal Validation ⏱️ 30min

Problem: File paths supplied by the frontend are not validated, leaving endpoints open to ../../../etc/passwd-style traversal attacks

Solution: Create path validation utility

New File (web/backend/security.py):

from pathlib import Path
from fastapi import HTTPException

def validate_file_path(file_path: str, allowed_base: Path) -> Path:
    """
    Validate a file path to prevent directory traversal
    
    Args:
        file_path: User-provided file path
        allowed_base: Base directory that file must be under
        
    Returns:
        Resolved Path object
        
    Raises:
        HTTPException: 400 if unparseable, 403 on traversal, 404 if missing
    """
    try:
        # Resolve symlinks and ".." components to an absolute path
        real_path = Path(file_path).resolve()
        allowed_base = allowed_base.resolve()
    except (OSError, ValueError) as e:
        # Keep the try block narrow so the 403/404 below aren't swallowed
        # and re-raised as 400s
        raise HTTPException(status_code=400, detail=f"Invalid file path: {e}")
        
    # Component-wise containment check. A plain str.startswith() is not
    # enough: it would also accept sibling paths like /downloads-evil/x
    if real_path != allowed_base and allowed_base not in real_path.parents:
        raise HTTPException(
            status_code=403,
            detail="Access denied: Path traversal detected"
        )
        
    # Check if file exists
    if not real_path.exists():
        raise HTTPException(status_code=404, detail="File not found")
        
    return real_path

Usage in endpoints:

from web.backend.security import validate_file_path

@app.get("/api/media/preview")
async def get_media_preview(file_path: str, ...):
    # Validate path
    downloads_base = Path("/opt/media-downloader/downloads")
    safe_path = validate_file_path(file_path, downloads_base)
    
    # Use safe_path from here on
    return FileResponse(safe_path)

Testing:

  • Normal paths work: /downloads/user/image.jpg
  • Traversal blocked: /downloads/../../etc/passwd → 403
  • Absolute paths blocked: /etc/passwd → 403
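
A plain string-prefix comparison on resolved paths can be fooled by sibling directories (e.g. /downloads-evil would pass a startswith("/downloads") check). The containment test can be sketched standalone with only the standard library; the helper name and paths below are illustrative, not from the codebase:

```python
import os.path

def is_within_base(path: str, base: str) -> bool:
    """True if path, after normalization, lies under base (or equals it)."""
    # realpath collapses ".." components and resolves symlinks
    real = os.path.realpath(path)
    base = os.path.realpath(base)
    # commonpath compares whole path components, so /srv/downloads-evil
    # does not count as being inside /srv/downloads
    return os.path.commonpath([real, base]) == base

print(is_within_base("/srv/downloads/user/image.jpg", "/srv/downloads"))    # True
print(is_within_base("/srv/downloads/../../etc/passwd", "/srv/downloads")) # False
print(is_within_base("/srv/downloads-evil/x.jpg", "/srv/downloads"))       # False
```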

3. Add CSRF Protection ⏱️ 40min

Problem: No CSRF tokens, POST/PUT/DELETE endpoints vulnerable

Solution: Add CSRF middleware

Install dependency:

pip install starlette-csrf

Backend Changes (web/backend/api.py):

from starlette_csrf import CSRFMiddleware

# Add after other middleware
app.add_middleware(
    CSRFMiddleware,
    secret="<GENERATE-STRONG-SECRET>",  # Load from env/config; prefer a dedicated secret over reusing the JWT secret
    cookie_name="csrftoken",
    header_name="X-CSRFToken",
    cookie_secure=True,  # HTTPS only in production
    cookie_httponly=False,  # JS needs to read for SPA
    cookie_samesite="strict"
)

Frontend Changes (web/frontend/src/lib/api.ts):

private async request<T>(
  method: string,
  endpoint: string,
  data?: any
): Promise<T> {
  const token = localStorage.getItem('auth_token')
  
  // Get CSRF token from cookie
  const csrfToken = document.cookie
    .split('; ')
    .find(row => row.startsWith('csrftoken='))
    ?.split('=')[1]

  const headers: Record<string, string> = {
    'Content-Type': 'application/json',
  }

  if (token) {
    headers['Authorization'] = `Bearer ${token}`
  }
  
  // Add CSRF token to non-GET requests
  if (method !== 'GET' && csrfToken) {
    headers['X-CSRFToken'] = csrfToken
  }

  // ... rest of request
}

Testing:

  • GET requests work without CSRF token
  • POST/PUT/DELETE work with CSRF token
  • POST/PUT/DELETE fail (403) without CSRF token
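
Conceptually, the middleware enforces the double-submit-cookie pattern: non-safe requests must echo the cookie value back in a header, which a cross-site attacker cannot read. A minimal sketch of that check (a hypothetical helper, not starlette-csrf's actual internals):

```python
import hmac
from typing import Optional

SAFE_METHODS = {"GET", "HEAD", "OPTIONS"}

def csrf_ok(method: str, cookie_token: Optional[str], header_token: Optional[str]) -> bool:
    """Double-submit-cookie check: X-CSRFToken header must match the csrftoken cookie."""
    if method in SAFE_METHODS:
        return True  # safe methods are exempt
    if not cookie_token or not header_token:
        return False
    # Constant-time comparison avoids timing side channels
    return hmac.compare_digest(cookie_token, header_token)

print(csrf_ok("GET", None, None))      # True
print(csrf_ok("POST", "abc", "abc"))   # True
print(csrf_ok("POST", "abc", None))    # False
```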

4. Add Rate Limiting to Endpoints ⏱️ 20min

Problem: Rate limiting configured but not applied to most routes

Solution: Add @limiter.limit() decorators

Current State (web/backend/api.py:320-325):

limiter = Limiter(
    key_func=get_remote_address,
    default_limits=["200/minute"]
)
# But not applied to routes!

Fix - Wire the limiter into the app, then add limits to all sensitive endpoints:

# Required once, or the decorators have no effect and 429s are never returned
from slowapi import _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded

app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

# Auth endpoints - strict
@app.post("/api/auth/login")
@limiter.limit("5/minute")  # Add this
async def login(credentials: LoginRequest, request: Request):
    pass

# Config updates - moderate
@app.put("/api/settings/config")
@limiter.limit("30/minute")  # Add this
async def update_config(...):
    pass

# Download triggers - moderate
@app.post("/api/scheduler/trigger")
@limiter.limit("10/minute")  # Add this
async def trigger_download(...):
    pass

# Media endpoints already have limits - verify they work
@app.get("/api/media/thumbnail")
@limiter.limit("5000/minute")  # Already present ✓
async def get_media_thumbnail(...):
    pass

Testing:

  • Login limited to 5 attempts/minute
  • Repeated config updates return 429 after limit
  • Rate limit resets after time window
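
To make the testing bullets concrete: a "5/minute" limit behaves like a fixed-window counter, where the first five requests in a window pass, the sixth is rejected (429), and the counter resets when the window rolls over. A toy standalone model of that behavior (slowapi's real implementation differs; the injectable clock exists only for deterministic testing):

```python
import time

class FixedWindowLimiter:
    """Toy '<limit> per <window> seconds' counter, not slowapi's implementation."""
    def __init__(self, limit: int, window: float, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable for tests
        self.window_start = None
        self.count = 0

    def allow(self) -> bool:
        now = self.clock()
        # Start a fresh window if none exists or the old one expired
        if self.window_start is None or now - self.window_start >= self.window:
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # over the limit -> caller responds with HTTP 429

now = [0.0]
login_limiter = FixedWindowLimiter(5, 60, clock=lambda: now[0])
print([login_limiter.allow() for _ in range(6)])  # five True, then False
now[0] = 61.0  # window rolled over
print(login_limiter.allow())  # True again
```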

5. Add Input Validation on Config Updates ⏱️ 35min

Problem: Config updates lack validation, could set invalid values

Solution: Use Pydantic models for validation

Create validation models (web/backend/models.py):

from pydantic import BaseModel, Field, field_validator
from typing import Optional

class PushoverConfig(BaseModel):
    enabled: bool
    user_key: Optional[str] = Field(None, min_length=30, max_length=30)
    api_token: Optional[str] = Field(None, min_length=30, max_length=30)
    priority: int = Field(0, ge=-2, le=2)
    sound: str = Field("pushover", pattern="^[a-z_]+$")  # Pydantic v2; use regex= on v1
    
    @field_validator('user_key', 'api_token')  # @validator on Pydantic v1
    @classmethod
    def validate_keys(cls, v):
        if v and not v.isalnum():
            raise ValueError("Keys must be alphanumeric")
        return v

class SchedulerConfig(BaseModel):
    enabled: bool
    interval_hours: int = Field(24, ge=1, le=168)  # 1 hour to 1 week
    randomize: bool = True
    randomize_minutes: int = Field(30, ge=0, le=180)

class ConfigUpdate(BaseModel):
    pushover: Optional[PushoverConfig] = None  # defaults make sections optional in v2
    scheduler: Optional[SchedulerConfig] = None
    # ... other config sections

Use in endpoint:

@app.put("/api/settings/config")
@limiter.limit("30/minute")
async def update_config(
    config: ConfigUpdate,  # Pydantic will validate
    current_user: Dict = Depends(get_current_user)
):
    # Config is already validated by Pydantic
    # Safe to use
    pass

Testing:

  • Valid config updates succeed
  • Invalid values return 422 with validation details
  • Injection payloads (SQL/XSS strings) rejected by the length and pattern constraints

PHASE 2: PERFORMANCE (Priority: HIGH)

6. Add Database Indexes ⏱️ 15min

Problem: Missing composite index for deduplication queries

Solution: Add indexes to unified_database.py

# modules/unified_database.py - In _create_indexes()
def _create_indexes(self, cursor):
    """Create indexes for better query performance"""
    
    # Existing indexes...
    
    # NEW: Composite index for deduplication
    cursor.execute('''
        CREATE INDEX IF NOT EXISTS idx_file_hash_platform 
        ON downloads(file_hash, platform)
        WHERE file_hash IS NOT NULL
    ''')
    
    # NEW: Index for metadata searches (if using JSON_EXTRACT)
    cursor.execute('''
        CREATE INDEX IF NOT EXISTS idx_metadata_media_id
        ON downloads(json_extract(metadata, '$.media_id'))
        WHERE metadata IS NOT NULL
    ''')

Testing:

EXPLAIN QUERY PLAN 
SELECT * FROM downloads 
WHERE file_hash = 'abc123' AND platform = 'fastdl';
-- Should show "USING INDEX idx_file_hash_platform"
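
The EXPLAIN check can be scripted with Python's built-in sqlite3 module against a throwaway schema (columns reduced to the ones the index touches):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""
    CREATE TABLE downloads (
        id INTEGER PRIMARY KEY,
        file_hash TEXT,
        platform TEXT,
        metadata TEXT
    )
""")
# Partial composite index, as in _create_indexes()
cur.execute("""
    CREATE INDEX IF NOT EXISTS idx_file_hash_platform
    ON downloads(file_hash, platform)
    WHERE file_hash IS NOT NULL
""")

# file_hash = 'abc123' implies file_hash IS NOT NULL,
# so the partial index is eligible
plan = cur.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM downloads
    WHERE file_hash = 'abc123' AND platform = 'fastdl'
""").fetchall()
print(plan)  # detail column should mention idx_file_hash_platform
```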

7. Fix JSON Metadata Searches ⏱️ 45min

Problem: LIKE '%...%' substring matches against the JSON metadata column cannot use an index and force full table scans

Current Code (modules/unified_database.py:576-590):

cursor.execute('''
    SELECT ... WHERE metadata LIKE ? OR metadata LIKE ?
''', (f'%"media_id": "{media_id}"%', f'%"media_id"%{media_id}%'))

Solution Option 1: Extract media_id to separate column (BEST)

# Add column
cursor.execute('ALTER TABLE downloads ADD COLUMN media_id TEXT')
cursor.execute('CREATE INDEX idx_media_id ON downloads(media_id)')

# When inserting:
media_id = metadata_dict.get('media_id')
cursor.execute('''
    INSERT INTO downloads (..., metadata, media_id)
    VALUES (..., ?, ?)
''', (json.dumps(metadata), media_id))

# Query becomes fast:
cursor.execute('SELECT * FROM downloads WHERE media_id = ?', (media_id,))

Solution Option 2: Use JSON_EXTRACT (if SQLite 3.38+)

cursor.execute('''
    SELECT * FROM downloads 
    WHERE json_extract(metadata, '$.media_id') = ?
''', (media_id,))
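
Option 1's migration plus backfill can be sketched end-to-end with the stdlib sqlite3 module (table reduced to the relevant columns; existing rows are backfilled via json_extract, which needs a JSON1-enabled SQLite build):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE downloads (id INTEGER PRIMARY KEY, metadata TEXT)")
cur.execute(
    "INSERT INTO downloads (metadata) VALUES (?)",
    (json.dumps({"media_id": "m123", "width": 1080}),),
)

# Migration: new column + index, then backfill from the JSON blob
cur.execute("ALTER TABLE downloads ADD COLUMN media_id TEXT")
cur.execute("CREATE INDEX IF NOT EXISTS idx_media_id ON downloads(media_id)")
cur.execute("UPDATE downloads SET media_id = json_extract(metadata, '$.media_id')")

# The former LIKE '%...%' scan becomes an indexed equality lookup
row = cur.execute(
    "SELECT id FROM downloads WHERE media_id = ?", ("m123",)
).fetchone()
print(row)
```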

8. Add Redis Result Caching ⏱️ 60min

Requires: Redis server
Install: pip install redis

Setup (web/backend/cache.py):

import hashlib
import json
import redis
from functools import wraps

redis_client = redis.Redis(
    host='localhost',
    port=6379,
    decode_responses=True
)

def cache_result(ttl: int = 300):
    """
    Decorator to cache async function results in Redis
    
    Results must be JSON-serializable.
    
    Args:
        ttl: Time to live in seconds
    """
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            # Build a stable cache key. Built-in hash() is randomized per
            # process (PYTHONHASHSEED), so use a real digest instead
            raw = f"{func.__name__}:{repr(args)}:{repr(sorted(kwargs.items()))}"
            key = "cache:" + hashlib.sha256(raw.encode()).hexdigest()
            
            # Try to get from cache
            cached = redis_client.get(key)
            if cached is not None:
                return json.loads(cached)
            
            # Cache miss: execute function and store the result with a TTL
            result = await func(*args, **kwargs)
            redis_client.setex(key, ttl, json.dumps(result))
            
            return result
        return wrapper
    return decorator

Usage:

from web.backend.cache import cache_result

@app.get("/api/stats/platforms")
@cache_result(ttl=300)  # Cache 5 minutes
async def get_platform_stats():
    # Expensive database query
    return stats
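
The cache-hit behaviour can be exercised without a running Redis server by swapping the client for a plain dict. In this stand-in, TTL expiry is not enforced and the setex call becomes a dict assignment; the key construction and hit/miss logic mirror the decorator above:

```python
import asyncio
import hashlib
import json
from functools import wraps

_store = {}  # dict stand-in for Redis; TTL is ignored here

def cache_result(ttl: int = 300):
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            # Stable key: built-in hash() varies between processes
            raw = f"{func.__name__}:{repr(args)}:{repr(sorted(kwargs.items()))}"
            key = "cache:" + hashlib.sha256(raw.encode()).hexdigest()
            if key in _store:
                return json.loads(_store[key])
            result = await func(*args, **kwargs)
            _store[key] = json.dumps(result)  # real code: setex(key, ttl, ...)
            return result
        return wrapper
    return decorator

calls = 0

@cache_result(ttl=300)
async def platform_stats():
    global calls
    calls += 1  # counts how often the "expensive query" actually runs
    return {"platforms": 3}

async def main():
    first = await platform_stats()
    second = await platform_stats()  # served from cache
    return first, second

first, second = asyncio.run(main())
print(first, second, calls)  # both results equal, body ran once
```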

PHASE 3-5: Additional Tasks

Due to space constraints, see separate files:

  • docs/IMPLEMENTATION_CODE_QUALITY.md - Tasks 9-12
  • docs/IMPLEMENTATION_RELIABILITY.md - Tasks 13-16
  • docs/IMPLEMENTATION_UI.md - Tasks 17-18

Quick Start Checklist

Today (30-60 min):

  • Task 2: Path validation (30min) - Highest security ROI
  • Task 4: Rate limiting (20min) - Easy win
  • Task 6: Database indexes (15min) - Instant performance boost

This Week (2-3 hours):

  • Task 1: Token exposure fix
  • Task 3: CSRF protection
  • Task 5: Input validation

Next Week (4-6 hours):

  • Performance optimizations (Tasks 7-8)
  • Code quality improvements (Tasks 9-12)

Later (2-3 hours):

  • Reliability improvements (Tasks 13-16)
  • UI enhancements (Tasks 17-18)