Initial commit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Todd
2026-03-29 22:42:55 -04:00
commit 0d7b2b1aab
389 changed files with 280296 additions and 0 deletions

# Code Improvements Implementation Guide
**Generated**: 2025-11-09

**Estimated Total Time**: 7-11 hours

**Tasks**: 18

---
## PHASE 1: CRITICAL SECURITY (Priority: HIGHEST)
### 1. Fix Token Exposure in URLs ⏱️ 45min
**Problem**: Passing auth tokens as query parameters exposes them in server and proxy logs, browser history, and `Referer` headers.

**Current Code** (`web/frontend/src/lib/api.ts:558-568`):
```typescript
getMediaThumbnailUrl(filePath: string, mediaType: 'image' | 'video') {
  const token = localStorage.getItem('auth_token')
  const tokenParam = token ? `&token=${encodeURIComponent(token)}` : ''
  return `${API_BASE}/media/thumbnail?file_path=${encodeURIComponent(filePath)}&media_type=${mediaType}${tokenParam}`
}
```
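**Solution**: Authenticate media requests with an `HttpOnly` session cookie. `<img>` and `<video>` tags cannot attach an `Authorization` header (likely why the token was put in the URL), but the browser sends cookies with them automatically. This assumes the login endpoint also sets that cookie.

**Backend Changes**: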
```python
# web/backend/api.py - Remove the token parameter and rely on cookie auth
from typing import Dict

from fastapi import Depends, Request


@app.get("/api/media/thumbnail")
async def get_media_thumbnail(
    request: Request,
    file_path: str,
    media_type: str,
    current_user: Dict = Depends(get_current_user_from_cookie),  # Cookie auth only
):
    # Removed: the `token: str = None` query parameter
    pass
```
**Frontend Changes**:
```typescript
// web/frontend/src/lib/api.ts
getMediaThumbnailUrl(filePath: string, mediaType: 'image' | 'video') {
  // Remove token handling - browser will send cookie automatically
  return `${API_BASE}/media/thumbnail?file_path=${encodeURIComponent(filePath)}&media_type=${mediaType}`
}
```
**Testing**:
- [ ] Thumbnails still load after login
- [ ] 401 returned when not authenticated
- [ ] No tokens visible in browser Network tab URLs
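The cookie itself is not shown above. A minimal standard-library sketch of both ends, assuming a cookie named `session_token` (a hypothetical name; use whatever `get_current_user_from_cookie` reads): building the `Set-Cookie` value a login response would send, and pulling the token back out of a request's `Cookie` header. In FastAPI the same attributes map onto Starlette's `Response.set_cookie(...)` and the parsed `request.cookies` dict.

```python
from http.cookies import CookieError, SimpleCookie
from typing import Optional

# Hypothetical cookie name; must match what the auth dependency reads
SESSION_COOKIE = "session_token"


def build_session_cookie(token: str, max_age: int = 3600) -> str:
    """Return a Set-Cookie header value carrying the auth token."""
    cookie = SimpleCookie()
    cookie[SESSION_COOKIE] = token
    morsel = cookie[SESSION_COOKIE]
    morsel["path"] = "/"
    morsel["max-age"] = max_age
    morsel["httponly"] = True       # not readable from JavaScript
    morsel["secure"] = True         # only sent over HTTPS
    morsel["samesite"] = "Strict"   # not sent on cross-site requests (Python 3.8+)
    return morsel.OutputString()


def token_from_cookie_header(header: Optional[str]) -> Optional[str]:
    """Extract the session token from a raw Cookie request header, if present."""
    if not header:
        return None
    cookie = SimpleCookie()
    try:
        cookie.load(header)
    except CookieError:
        return None
    morsel = cookie.get(SESSION_COOKIE)
    return morsel.value if morsel is not None else None
```

Because the cookie is `HttpOnly`, frontend code can no longer read the token, which is the point: it never needs to appear in a URL again.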
---
### 2. Add Path Traversal Validation ⏱️ 30min
**Problem**: File paths received from the frontend are not validated, so a value such as `../../../etc/passwd` can reach files outside the downloads directory.

**Solution**: Create a path validation utility

**New File** (`web/backend/security.py`):
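```python
from pathlib import Path

from fastapi import HTTPException


def validate_file_path(file_path: str, allowed_base: Path) -> Path:
    """
    Validate that a file path cannot escape allowed_base (directory traversal)
    Args:
        file_path: User-provided file path (absolute, or relative to allowed_base)
        allowed_base: Base directory that file must be under
    Returns:
        Resolved Path object
    Raises:
        HTTPException: 400 if malformed, 403 if outside allowed_base, 404 if missing
    """
    base = allowed_base.resolve()
    try:
        # Join before resolving so relative paths are anchored at the base, not
        # the process CWD (an absolute file_path replaces the base). resolve()
        # also follows symlinks, so a link pointing outside the base is caught.
        real_path = (base / file_path).resolve()
    except (OSError, RuntimeError, ValueError) as e:
        # Only wrap path errors; the 403/404 below must not turn into a 400
        raise HTTPException(status_code=400, detail=f"Invalid file path: {e}")
    # Compare whole components (Python 3.9+). A string prefix check would
    # also accept a sibling such as /opt/media-downloader/downloads_evil
    if not real_path.is_relative_to(base):
        raise HTTPException(
            status_code=403,
            detail="Access denied: Path traversal detected"
        )
    # Check the file exists (and is a regular file, not a directory)
    if not real_path.is_file():
        raise HTTPException(status_code=404, detail="File not found")
    return real_path
```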
**Usage in endpoints**:
```python
from pathlib import Path

from fastapi.responses import FileResponse

from web.backend.security import validate_file_path


@app.get("/api/media/preview")
async def get_media_preview(file_path: str):  # plus the endpoint's other parameters
    # Validate the path before touching the filesystem
    downloads_base = Path("/opt/media-downloader/downloads")
    safe_path = validate_file_path(file_path, downloads_base)
    # Use safe_path from here on
    return FileResponse(safe_path)
```
**Testing**:
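- [ ] Paths under the base work: `/opt/media-downloader/downloads/user/image.jpg`
- [ ] Sibling directories sharing the prefix blocked: `/opt/media-downloader/downloads_evil/x.jpg` → 403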
- [ ] Traversal blocked: `/downloads/../../etc/passwd` → 403
- [ ] Absolute paths blocked: `/etc/passwd` → 403
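Why the containment check should compare path components rather than string prefixes, as a standard-library sketch on plain strings (no filesystem access). Here `posixpath.normpath` collapses `..` lexically as a stand-in for `resolve()`, which additionally follows symlinks; the `downloads_evil` directory is a made-up example.

```python
import posixpath
from pathlib import PurePosixPath

BASE = "/opt/media-downloader/downloads"


def prefix_check(path: str, base: str = BASE) -> bool:
    """A plain string-prefix comparison (the approach to avoid)."""
    return posixpath.normpath(path).startswith(base)


def component_check(path: str, base: str = BASE) -> bool:
    """Compare whole path components (PurePath.is_relative_to, Python 3.9+)."""
    return PurePosixPath(posixpath.normpath(path)).is_relative_to(base)


CASES = [
    "/opt/media-downloader/downloads/user/image.jpg",       # inside the base
    "/opt/media-downloader/downloads/../../../etc/passwd",  # traversal to /etc/passwd
    "/opt/media-downloader/downloads_evil/secret.txt",      # sibling sharing the prefix
]

for case in CASES:
    print(f"{case}: prefix={prefix_check(case)} component={component_check(case)}")
```

Only the sibling case disagrees: the prefix check accepts it, the component check rejects it.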
---
### 3. Add CSRF Protection ⏱️ 40min
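**Problem**: State-changing endpoints (POST/PUT/DELETE) have no CSRF tokens. This matters once requests are authenticated by cookies (Task 1): the browser attaches cookies to cross-site requests automatically, whereas an `Authorization: Bearer` header is only sent when the app's own JavaScript adds it.

**Solution**: Add CSRF middleware

**Install dependency**: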
```bash
pip install starlette-csrf
```
**Backend Changes** (`web/backend/api.py`):
```python
from starlette_csrf import CSRFMiddleware

# Add after the other middleware. Check these keyword names against the
# installed starlette-csrf version.
app.add_middleware(
    CSRFMiddleware,
    secret="<GENERATE-STRONG-SECRET>",  # Dedicated secret; do not reuse the JWT secret
    cookie_name="csrftoken",
    header_name="X-CSRFToken",
    cookie_secure=True,  # HTTPS only in production
    cookie_httponly=False,  # JS needs to read it for the SPA
    cookie_samesite="strict",
)
```
**Frontend Changes** (`web/frontend/src/lib/api.ts`):
```typescript
private async request<T>(
  method: string,
  endpoint: string,
  data?: any
): Promise<T> {
  const token = localStorage.getItem('auth_token')

  // Get CSRF token from cookie
  const csrfToken = document.cookie
    .split('; ')
    .find(row => row.startsWith('csrftoken='))
    ?.split('=')[1]

  const headers: Record<string, string> = {
    'Content-Type': 'application/json',
  }
  if (token) {
    headers['Authorization'] = `Bearer ${token}`
  }
  // Add CSRF token to non-GET requests
  if (method !== 'GET' && csrfToken) {
    headers['X-CSRFToken'] = csrfToken
  }
  // ... rest of request
}
```
**Testing**:
- [ ] GET requests work without CSRF token
- [ ] POST/PUT/DELETE work with CSRF token
- [ ] POST/PUT/DELETE fail (403) without CSRF token
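The middleware relies on the double-submit cookie pattern. A standard-library sketch of that core check (not starlette-csrf's actual code; the token format and function names are illustrative): the server mints a signed random token into the `csrftoken` cookie, and an unsafe request passes only if the `X-CSRFToken` header echoes the cookie and the signature verifies.

```python
import hashlib
import hmac
import secrets
from typing import Optional

SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}


def _sign(nonce: str, secret: bytes) -> str:
    return hmac.new(secret, nonce.encode(), hashlib.sha256).hexdigest()


def issue_csrf_token(secret: bytes) -> str:
    """Random nonce plus an HMAC tag, so the server can recognize tokens it minted."""
    nonce = secrets.token_urlsafe(32)  # URL-safe alphabet, never contains "."
    return f"{nonce}.{_sign(nonce, secret)}"


def csrf_check_passes(
    method: str,
    cookie_token: Optional[str],
    header_token: Optional[str],
    secret: bytes,
) -> bool:
    """Double-submit check: the header must echo the cookie, and the cookie must be signed."""
    if method.upper() in SAFE_METHODS:
        return True
    if not cookie_token or not header_token:
        return False
    # Compare as bytes: compare_digest rejects non-ASCII str arguments
    if not hmac.compare_digest(cookie_token.encode(), header_token.encode()):
        return False
    nonce, sep, tag = cookie_token.rpartition(".")
    if not sep:
        return False
    return hmac.compare_digest(tag.encode(), _sign(nonce, secret).encode())
```

A cross-site form can make the browser send the cookie but cannot read it to copy into the header, which is what the comparison relies on; `hmac.compare_digest` keeps the comparison constant-time.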
---
### 4. Add Rate Limiting to Endpoints ⏱️ 20min
**Problem**: A rate limiter is configured, but its limits are not applied to most routes.

**Solution**: Add `@limiter.limit()` decorators

**Current State** (`web/backend/api.py:320-325`):
```python
limiter = Limiter(
    key_func=get_remote_address,
    default_limits=["200/minute"]
)
# But not applied to routes!
```
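**Fix - Add to all sensitive endpoints**: assuming this `Limiter` is slowapi's, each decorated endpoint must take a `request: Request` argument, and the `default_limits` above only apply once `SlowAPIMiddleware` is added to the app.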
```python
# Auth endpoints - strict
@app.post("/api/auth/login")
@limiter.limit("5/minute")  # Add this
async def login(credentials: LoginRequest, request: Request):
    pass

# Config updates - moderate
@app.put("/api/settings/config")
@limiter.limit("30/minute")  # Add this
async def update_config(
    request: Request,  # Required by @limiter.limit
    # ... existing parameters
):
    pass

# Download triggers - moderate
@app.post("/api/scheduler/trigger")
@limiter.limit("10/minute")  # Add this
async def trigger_download(
    request: Request,  # Required by @limiter.limit
    # ... existing parameters
):
    pass

# Media endpoints already have limits - verify they work
@app.get("/api/media/thumbnail")
@limiter.limit("5000/minute")  # Already present ✓
async def get_media_thumbnail(
    request: Request,
    # ... existing parameters
):
    pass
```
**Testing**:
- [ ] Login limited to 5 attempts/minute
- [ ] Repeated config updates return 429 after limit
- [ ] Rate limit resets after time window
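What `"5/minute"` means in practice, as a minimal in-memory fixed-window counter keyed by client address. This is a sketch for reasoning about the limits, not slowapi's implementation (slowapi delegates to the `limits` package, which offers several strategies); the class name and the injectable clock are illustrative.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key (e.g. client IP)."""

    def __init__(self, limit: int, window: float, clock: Callable[[], float] = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so tests can control time
        self._counts: Dict[str, Tuple[int, int]] = {}  # key -> (window index, hits)

    def hit(self, key: str) -> bool:
        """Count one request; False means the caller should answer 429."""
        window_index = int(self.clock() // self.window)
        stored_window, count = self._counts.get(key, (window_index, 0))
        if stored_window != window_index:
            count = 0  # a new window started, so the counter resets
        if count >= self.limit:
            return False
        self._counts[key] = (window_index, count + 1)
        return True


# "5/minute" for the login endpoint
login_limiter = FixedWindowLimiter(limit=5, window=60)
```

Two consequences worth testing: a client can send up to twice the limit across a window boundary (5 requests at 0:59, 5 more at 1:00), and in-memory counts are per process, so with several workers each keeps its own count unless the limiter is configured with shared storage (slowapi's `Limiter` accepts a `storage_uri`, e.g. for Redis).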
---
### 5. Add Input Validation on Config Updates ⏱️ 35min
**Problem**: Config updates are not validated, so a client can save out-of-range or malformed values.

**Solution**: Use Pydantic models for validation

**Create validation models** (`web/backend/models.py`):
```python
# Pydantic v2 syntax. On v1, use `regex=` instead of `pattern=`
# (v1 silently ignores an unknown `pattern=` keyword).
from typing import Optional

from pydantic import BaseModel, Field


class PushoverConfig(BaseModel):
    enabled: bool
    # Pushover user keys and API tokens are 30 ASCII letters/digits
    user_key: Optional[str] = Field(None, pattern=r"^[A-Za-z0-9]{30}$")
    api_token: Optional[str] = Field(None, pattern=r"^[A-Za-z0-9]{30}$")
    priority: int = Field(0, ge=-2, le=2)
    sound: str = Field("pushover", pattern=r"^[a-z_]+$")


class SchedulerConfig(BaseModel):
    enabled: bool
    interval_hours: int = Field(24, ge=1, le=168)  # 1 hour to 1 week
    randomize: bool = True
    randomize_minutes: int = Field(30, ge=0, le=180)


class ConfigUpdate(BaseModel):
    # Default to None so a partial update may omit a section
    # (in v2, Optional[...] without a default is still required)
    pushover: Optional[PushoverConfig] = None
    scheduler: Optional[SchedulerConfig] = None
    # ... other config sections
```
**Use in endpoint**:
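```python
from typing import Dict

from fastapi import Depends, Request


@app.put("/api/settings/config")
@limiter.limit("30/minute")
async def update_config(
    request: Request,  # Required by @limiter.limit
    config: ConfigUpdate,  # Pydantic validates the body; invalid input gets a 422
    current_user: Dict = Depends(get_current_user),
):
    # Config is already validated by Pydantic, so it is safe to use
    pass
```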
**Testing**:
- [ ] Valid config updates succeed
- [ ] Invalid values return 422 with details
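- [ ] Injection-style strings in constrained fields (e.g. `"sound": "<script>"`) return 422
- [ ] Unconstrained strings still go through parameterized queries and output escaping; Pydantic validates input but does not sanitize it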
---
## PHASE 2: PERFORMANCE (Priority: HIGH)
### 6. Add Database Indexes ⏱️ 15min
**Problem**: No composite index covers the deduplication lookup on `(file_hash, platform)`.

**Solution**: Add indexes in `modules/unified_database.py`
```python
# modules/unified_database.py - In _create_indexes()
def _create_indexes(self, cursor):
    """Create indexes for better query performance"""
    # Existing indexes...

    # NEW: Composite index for deduplication. Partial index: rows without a
    # hash are skipped, and queries with `file_hash = ?` can still use it.
    cursor.execute('''
        CREATE INDEX IF NOT EXISTS idx_file_hash_platform
        ON downloads(file_hash, platform)
        WHERE file_hash IS NOT NULL
    ''')

    # NEW: Expression index for metadata searches (if using json_extract).
    # Queries must use the identical expression to hit it. No partial WHERE:
    # the planner would only use it if queries also said `metadata IS NOT NULL`.
    cursor.execute('''
        CREATE INDEX IF NOT EXISTS idx_metadata_media_id
        ON downloads(json_extract(metadata, '$.media_id'))
    ''')
```
**Testing**:
```sql
EXPLAIN QUERY PLAN
SELECT * FROM downloads
WHERE file_hash = 'abc123' AND platform = 'fastdl';
-- Should show "USING INDEX idx_file_hash_platform"
```
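The same check can be scripted with Python's built-in `sqlite3` module against a throwaway in-memory table. The schema is a hypothetical minimum (only the columns the index needs), not the real `downloads` table, and the plan wording varies slightly between SQLite versions, so the sketch only looks for the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE downloads (
        id INTEGER PRIMARY KEY,
        file_hash TEXT,
        platform TEXT,
        metadata TEXT
    )
""")
conn.execute("""
    CREATE INDEX IF NOT EXISTS idx_file_hash_platform
    ON downloads(file_hash, platform)
    WHERE file_hash IS NOT NULL
""")

# Each EXPLAIN QUERY PLAN row has the human-readable detail in its last column
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT * FROM downloads
    WHERE file_hash = 'abc123' AND platform = 'fastdl'
""").fetchall()

details = [row[-1] for row in plan]
print(details)
uses_index = any("idx_file_hash_platform" in d for d in details)
print("uses idx_file_hash_platform:", uses_index)
```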
---
### 7. Fix JSON Metadata Searches ⏱️ 45min
**Problem**: `LIKE` patterns that start with `%` cannot use an index, so every lookup scans the whole `downloads` table.

**Current Code** (`modules/unified_database.py:576-590`):
```python
cursor.execute('''
SELECT ... WHERE metadata LIKE ? OR metadata LIKE ?
''', (f'%"media_id": "{media_id}"%', f'%"media_id"%{media_id}%'))
```
**Solution Option 1**: Extract media_id to separate column (BEST)
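```python
# One-time migration. SQLite has no ADD COLUMN IF NOT EXISTS,
# so check PRAGMA table_info(downloads) before re-running this.
cursor.execute('ALTER TABLE downloads ADD COLUMN media_id TEXT')
cursor.execute('CREATE INDEX IF NOT EXISTS idx_media_id ON downloads(media_id)')
# Backfill existing rows; json_valid() skips malformed metadata
cursor.execute('''
    UPDATE downloads
    SET media_id = json_extract(metadata, '$.media_id')
    WHERE media_id IS NULL AND json_valid(metadata)
''')

# When inserting, write the column alongside the JSON blob
media_id = metadata.get('media_id')
cursor.execute('''
    INSERT INTO downloads (..., metadata, media_id)
    VALUES (..., ?, ?)
''', (json.dumps(metadata), media_id))

# Query becomes an indexed equality match:
cursor.execute('SELECT * FROM downloads WHERE media_id = ?', (media_id,))
```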
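**Solution Option 2**: Use `json_extract` (built into SQLite by default since 3.38.0; older builds need the JSON1 extension). Without an expression index on `json_extract(metadata, '$.media_id')`, this still scans every row.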
```python
cursor.execute('''
SELECT * FROM downloads
WHERE json_extract(metadata, '$.media_id') = ?
''', (media_id,))
```
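A re-runnable version of the Option 1 migration, as a sketch using only `sqlite3` and `json`. It guards `ALTER TABLE` with `PRAGMA table_info`, creates the index, and backfills `media_id` in Python so rows with malformed JSON are skipped instead of aborting the migration. The function name and the `id` primary-key column are assumptions about the schema.

```python
import json
import sqlite3


def migrate_media_id(conn: sqlite3.Connection) -> int:
    """Add, index and backfill downloads.media_id; safe to run repeatedly.

    Returns the number of rows backfilled.
    """
    columns = {row[1] for row in conn.execute("PRAGMA table_info(downloads)")}
    if "media_id" not in columns:
        # SQLite has no ADD COLUMN IF NOT EXISTS, so check first
        conn.execute("ALTER TABLE downloads ADD COLUMN media_id TEXT")
    conn.execute("CREATE INDEX IF NOT EXISTS idx_media_id ON downloads(media_id)")

    rows = conn.execute(
        "SELECT id, metadata FROM downloads "
        "WHERE media_id IS NULL AND metadata IS NOT NULL"
    ).fetchall()
    updated = 0
    for row_id, raw in rows:
        try:
            media_id = json.loads(raw).get("media_id")
        except (TypeError, ValueError, AttributeError):
            continue  # malformed JSON, or JSON that is not an object
        if media_id is not None:
            # Store as text, matching the column's TEXT affinity
            conn.execute(
                "UPDATE downloads SET media_id = ? WHERE id = ?", (str(media_id), row_id)
            )
            updated += 1
    conn.commit()
    return updated
```

Running it a second time finds nothing left to backfill, so it can sit in the normal startup path.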
---
### 8. Add Redis Result Caching ⏱️ 60min
**Requires**: Redis server

**Install**: `pip install redis`

**Setup** (`web/backend/cache.py`):
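```python
import hashlib
import json
from functools import wraps

import redis.asyncio as redis  # asyncio client (redis-py 4.2+); does not block the event loop
from redis.exceptions import RedisError

redis_client = redis.Redis(
    host='localhost',
    port=6379,
    decode_responses=True
)


def cache_result(ttl: int = 300):
    """
    Decorator to cache an async function's JSON-serializable result
    Args:
        ttl: Time to live in seconds
    """
    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            # Create a stable cache key. Built-in hash() of a str is salted per
            # process, so each worker and restart would compute different keys.
            # Arguments need stable reprs (a Request or DB session does not).
            raw_key = repr((args, sorted(kwargs.items())))
            digest = hashlib.sha256(raw_key.encode()).hexdigest()
            key = f"cache:{func.__module__}.{func.__qualname__}:{digest}"
            # Try to get from cache; fail open if Redis is unavailable
            try:
                cached = await redis_client.get(key)
            except RedisError:
                cached = None
            if cached is not None:
                return json.loads(cached)
            # Execute function
            result = await func(*args, **kwargs)
            # Store in cache
            try:
                await redis_client.setex(key, ttl, json.dumps(result))
            except RedisError:
                pass
            return result
        return wrapper
    return decorator
```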
**Usage**:
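```python
from web.backend.cache import cache_result


@app.get("/api/stats/platforms")  # Outermost, so the cached wrapper is what gets registered
@cache_result(ttl=300)  # Cache 5 minutes; results may be up to 5 minutes stale
async def get_platform_stats():
    stats = ...  # Expensive database query (elided)
    return stats
```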
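For local development or tests without a Redis server, an in-process stand-in can be useful. A sketch with illustrative names: the key is a SHA-256 of the function's qualified name and arguments, which stays identical across processes (unlike built-in `hash()` on strings), and the cache is per process, so it does not replace Redis when several workers serve traffic.

```python
import asyncio
import hashlib
import json
import time
from functools import wraps
from typing import Callable, Dict, Tuple


def make_cache_key(func: Callable, args: tuple, kwargs: dict) -> str:
    """SHA-256 of the qualified name and arguments: identical in every process."""
    raw = repr((args, sorted(kwargs.items())))
    digest = hashlib.sha256(raw.encode()).hexdigest()
    return f"cache:{func.__module__}.{func.__qualname__}:{digest}"


def memory_cache_result(ttl: float = 300, clock: Callable[[], float] = time.monotonic):
    """In-process TTL cache for async functions with JSON-serializable results."""
    store: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, JSON text)

    def decorator(func):
        @wraps(func)
        async def wrapper(*args, **kwargs):
            key = make_cache_key(func, args, kwargs)
            entry = store.get(key)
            if entry is not None and entry[0] > clock():
                return json.loads(entry[1])  # fresh copy, like a Redis round trip
            result = await func(*args, **kwargs)
            store[key] = (clock() + ttl, json.dumps(result))
            return result
        return wrapper
    return decorator


if __name__ == "__main__":
    @memory_cache_result(ttl=60)
    async def slow_square(n: int) -> int:
        await asyncio.sleep(0.1)  # stands in for an expensive query
        return n * n

    print(asyncio.run(slow_square(4)))  # computed
    print(asyncio.run(slow_square(4)))  # served from the cache
```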
---
## PHASE 3-5: Additional Tasks
Due to space constraints, see separate files:
- `docs/IMPLEMENTATION_CODE_QUALITY.md` - Tasks 9-12
- `docs/IMPLEMENTATION_RELIABILITY.md` - Tasks 13-16
- `docs/IMPLEMENTATION_UI.md` - Tasks 17-18
---
## Quick Start Checklist
**Today (~65 min):**

- [ ] Task 2: Path validation (30min) - Highest security ROI
- [ ] Task 4: Rate limiting (20min) - Easy win
- [ ] Task 6: Database indexes (15min) - Instant performance boost

**This Week (2-3 hours):**

- [ ] Task 1: Token exposure fix
- [ ] Task 3: CSRF protection
- [ ] Task 5: Input validation

**Next Week (4-6 hours):**

- [ ] Performance optimizations (Tasks 7-8)
- [ ] Code quality improvements (Tasks 9-12)

**Later (2-3 hours):**

- [ ] Reliability improvements (Tasks 13-16)
- [ ] UI enhancements (Tasks 17-18)