20 KiB
Media Downloader - Comprehensive Code Review
Executive Summary
The Media Downloader application is a sophisticated multi-platform media download system with ~30,775 lines of Python and TypeScript code. It integrates Instagram, TikTok, Forums, Snapchat, and other platforms with a web-based management interface. Overall architecture is well-designed with proper separation of concerns, but there are several security, performance, and code quality issues that need attention.
Overall Assessment: B+ (Good with room for improvement in specific areas)
1. ARCHITECTURE & DESIGN PATTERNS
Strengths
-
Unified Database Architecture (
/opt/media-downloader/modules/unified_database.py)- Excellent consolidation of multiple platform databases into single unified DB
- Connection pooling implemented correctly (lines 21-92)
- Proper use of context managers for resource management
- Well-designed adapter pattern for platform-specific compatibility (lines 1707-2080)
-
Module Organization
- Clean separation: downloaders, database, UI, utilities
- Each platform has dedicated module (fastdl, tiktok, instagram, snapchat, etc.)
- Settings manager provides centralized configuration
-
Authentication Layer
- Proper use of JWT tokens with bcrypt password hashing
- Rate limiting on login attempts (5 attempts, 15-min lockout)
- Support for 2FA (TOTP, Passkeys, Duo)
Issues
-
Tight Coupling in Main Application
- Location:
/opt/media-downloader/media-downloader.py(lines 1-100) - Issue: Core class imports 20+ modules directly, making it tightly coupled
- Impact: Hard to test individual components; difficult to extend
- Recommendation: Create dependency injection container or factory pattern
- Location:
-
Incomplete Separation of Concerns
- Location:
/opt/media-downloader/modules/fastdl_module.py(lines 35-70) - Issue: Browser automation logic mixed with download logic
- Recommendation: Extract Playwright interactions into separate browser manager class
- Location:
-
Missing Interface Definitions
- No clear contracts between modules
- Recommendation: Add type hints and Protocol classes for module boundaries
2. SECURITY ISSUES
Critical Issues
-
Token Exposure in URLs
- Location:
/opt/media-downloader/web/frontend/src/lib/api.ts(lines 558-568) - Issue: Authentication tokens passed as query parameters for media preview/thumbnails
getMediaThumbnailUrl(filePath: string, mediaType: 'image' | 'video') { const token = localStorage.getItem('auth_token') const tokenParam = token ? `&token=${encodeURIComponent(token)}` : '' return `${API_BASE}/media/thumbnail?file_path=${encodeURIComponent(filePath)}&media_type=${mediaType}${tokenParam}` }- Risk: Tokens visible in browser history, server logs, referrer headers
- Fix: Use Authorization header instead; implement server-side session validation for media endpoints
- Location:
-
Weak File Path Validation
- Location:
/opt/media-downloader/web/backend/api.py(likely in file handling endpoints) - Issue: File paths received from frontend may not be properly validated
- Risk: Path traversal attacks (../ sequences)
- Fix:
from pathlib import Path def validate_file_path(file_path: str, allowed_base: Path) -> Path: real_path = Path(file_path).resolve() if not str(real_path).startswith(str(allowed_base)): raise ValueError("Path traversal detected") return real_path
- Location:
-
Missing CSRF Protection
- Location:
/opt/media-downloader/web/backend/api.py(lines 318-320) - Issue: SessionMiddleware added but no CSRF tokens implemented
- Impact: POST/PUT/DELETE requests vulnerable to CSRF
- Fix: Add CSRF middleware (
starlette-csrf)
- Location:
High Priority Issues
-
Subprocess Usage Without Validation
- Location:
/opt/media-downloader/modules/tiktok_module.py(lines 294, 422, 440) - Issue: Uses subprocess.run() for yt-dlp commands
result = subprocess.run(cmd, capture_output=True, text=True, cwd=output_dir)- Risk: If
usernameor other params are unsanitized, could lead to command injection - Fix: Use list form of subprocess.run (which is safer) and validate all inputs
- Location:
-
SQL Injection Protection Issues
- Location:
/opt/media-downloader/modules/unified_database.py(lines 576-577) - Issue: Uses LIKE patterns with string formatting:
pattern1 = f'%"media_id": "{media_id}"%' # Potential SQL injection if media_id not sanitized- Current State: Properly uses parameterized queries, but patterns could be safer
- Recommendation: Add explicit input validation before using in LIKE patterns
- Location:
-
Credentials in Environment & Files
- Location:
/opt/media-downloader/.jwt_secret,/opt/media-downloader/.env - Issue: Sensitive files with improper permissions
- Fix:
- Ensure .jwt_secret is mode 0600 (already done in auth_manager.py line 38)
- .env should not be committed to git
- Consider using vault/secrets manager for production
- Location:
-
No Input Validation on Config Updates
- Location:
/opt/media-downloader/web/backend/api.py(lines 349-351) - Issue: Config updates from frontend lack validation
- Impact: Could set invalid/malicious values
- Fix: Add Pydantic validators for all config fields
- Location:
-
Missing Rate Limiting on API Endpoints
- Location:
/opt/media-downloader/web/backend/api.py(lines 322-325) - Issue: Rate limiter configured but not applied to routes
- Fix: Add
@limiter.limit()decorators on endpoints, especially:- Media downloads
- Configuration updates
- Scheduler triggers
- Location:
Medium Priority Issues
-
Insufficient Error Message Sanitization
- Location: Various modules show detailed error messages in logs
- Risk: Error messages may expose internal paths/configuration
- Fix: Return generic messages to clients, detailed logs server-side only
-
Missing Security Headers
- Location:
/opt/media-downloader/web/backend/api.py(app creation) - Missing: Content-Security-Policy, X-Frame-Options, X-Content-Type-Options
- Fix: Add security headers middleware
- Location:
3. PERFORMANCE OPTIMIZATIONS
Database Performance
-
Connection Pool Configuration ✓ GOOD
/opt/media-downloader/modules/unified_database.py(lines 21-45)- Pool size of 5 (default), configurable to 20 for API
- WAL mode enabled for better concurrency
- Good index strategy (lines 338-377)
-
Query Optimization Issues
a) N+1 Problem in Face Recognition
- Location:
/opt/media-downloader/modules/face_recognition_module.py - Issue: Likely fetches file list, then queries metadata for each file
- Recommendation: Join operations or batch queries
b) Missing Indexes
- Location:
/opt/media-downloader/modules/unified_database.py(lines 338-377) - Current Indexes: ✓ Platform, source, status, dates (good)
- Missing:
- Composite index on (file_hash, platform) for deduplication checks
- Index on metadata field (though JSON search is problematic)
c) JSON Metadata Searches
- Location:
/opt/media-downloader/modules/unified_database.py(lines 576-590) - Issue: Uses LIKE on JSON metadata field - very inefficient
cursor.execute('''SELECT ... WHERE metadata LIKE ? OR metadata LIKE ?''', (f'%"media_id": "{media_id}"%', f'%"media_id"%{media_id}%'))- Impact: Full table scans on large datasets
- Fix: Use JSON_EXTRACT() for JSON queries (if database supports) or extract media_id to separate column
- Location:
-
File I/O Bottlenecks
a) Hash Calculation on Every Download
- Location:
/opt/media-downloader/modules/unified_database.py(lines 437-461) - Issue: SHA256 hash computed for every file download
- Fix: Cache hashes, compute asynchronously, or skip for non-deduplicated files
b) Synchronous File Operations in Async Context
- Location:
/opt/media-downloader/web/backend/api.py(likely file operations) - Issue: Could block event loop
- Fix: Use
aiofilesorasyncio.to_thread()for file I/O
- Location:
-
Image Processing Performance
- Location:
/opt/media-downloader/modules/face_recognition_module.py - Issue: Face recognition runs on main thread, blocks other operations
- Current: Semaphore limits to 1 concurrent (good)
- Suggestion: Make async, use process pool for CPU-bound face detection
- Location:
-
Caching Opportunities
- Missing: Result caching for frequently accessed data
- Recommendation: Add Redis/in-memory caching for:
- Platform stats (cache 5 minutes)
- Download filters (cache 15 minutes)
- System health (cache 1 minute)
Frontend Performance
-
No Pagination Implementation Found
- Location:
/opt/media-downloader/web/frontend/src/lib/api.ts(lines 225-289) - Issue: API supports pagination but unclear if UI implements infinite scroll
- Recommendation: Implement virtual scrolling for large media galleries
- Location:
-
Unoptimized Asset Loading
- Location: Built assets in
/opt/media-downloader/web/backend/static/assets/ - Issue: Multiple .js chunks loaded (index-*.js variations suggest no optimization)
- Recommendation: Check Vite build config for code splitting optimization
- Location: Built assets in
4. CODE QUALITY
Code Duplication
-
Adapter Pattern Duplication
- Location:
/opt/media-downloader/modules/unified_database.py(lines 1708-2080) - Issue: Multiple adapter classes (FastDLDatabaseAdapter, TikTokDatabaseAdapter, etc.) with similar structure
- Lines Affected: ~372 lines of repetitive code
- Fix: Create generic adapter base class with template method pattern
- Location:
-
Download Manager Pattern Repeated
- Location: Each platform module has similar download logic
- Recommendation: Extract to common base class
-
Cookie/Session Management Duplicated
- Location: fastdl_module, imginn_module, toolzu_module, snapchat_module
- Recommendation: Create shared CookieManager utility
Error Handling
-
Bare Exception Handlers
- Locations:
/opt/media-downloader/modules/fastdl_module.py(line 100+)/opt/media-downloader/media-downloader.py(lines 2084-2085)
except: # Too broad! break- Risk: Suppresses unexpected errors
- Fix: Catch specific exceptions
- Locations:
-
Missing Error Recovery
- Location:
/opt/media-downloader/modules/forum_downloader.py(lines 83+) - Issue: ForumDownloader has minimal retry logic
- Recommendation: Add exponential backoff with jitter
- Location:
-
Logging Inconsistency
- Location: Throughout codebase
- Issue: Mix of logger.info(), print(), and log() callbacks
- Fix: Standardize on logger module everywhere
Complexity Issues
-
Long Functions
- Location:
/opt/media-downloader/media-downloader.py - Issue: Main class likely has 200+ line methods
- Recommendation: Break into smaller, testable methods
- Location:
-
Complex Conditional Logic
- Location:
2FA implementation in auth_manager.py - Issue: Multiple nested if/elif chains for 2FA method selection
- Fix: Strategy pattern with 2FA providers
- Location:
Missing Type Hints
- Inconsistent Type Coverage
- Status: Backend has some type hints, but inconsistent
- Examples:
/opt/media-downloader/modules/download_manager.py: ✓ Good type hints/opt/media-downloader/modules/fastdl_module.py: ✗ Minimal type hints
- Recommendation: Use
mypy --stricton entire codebase
5. FEATURE OPPORTUNITIES
User Experience
-
Download Scheduling Enhancements
- Current: Basic interval-based scheduling
- Suggestion: Add cron expression support
- Effort: Medium
-
Batch Operations
- Current: Single file operations
- Suggestion: Queue system for batch config changes
- Effort: Medium
-
Search & Filters
- Current: Basic platform/source filters
- Suggestions:
- Date range picker UI
- File size filters
- Content type hierarchy
- Effort: Low
-
Advanced Metadata Editing
- Current: Read-only metadata display
- Suggestion: Edit post dates, tags, descriptions
- Effort: Medium
-
Duplicate Detection Improvements
- Current: File hash based
- Suggestion: Perceptual hashing for images (detect same photo at different resolutions)
- Effort: High
Integration Features
-
Webhook Support
- Use Case: Trigger downloads from external services
- Effort: Medium
-
API Key Authentication
- Current: JWT only
- Suggestion: Support API keys for programmatic access
- Effort: Low
-
Export/Import Functionality
- Suggestion: Export download history, settings to JSON/CSV
- Effort: Low
Platform Support
- Additional Platforms
- Missing: LinkedIn, Pinterest, X/Twitter, Reddit
- Effort: High per platform
6. BUG RISKS
Race Conditions
-
Database Write Conflicts
- Location:
/opt/media-downloader/modules/unified_database.py(lines 728-793) - Issue: Multiple processes writing simultaneously could hit database locks
- Current Mitigation: WAL mode, write locks, retries (good!)
- Enhancement: Add distributed lock if scaling to multiple servers
- Location:
-
Face Recognition Concurrent Access
- Location:
/opt/media-downloader/web/backend/api.py(line 225) - Issue: Face recognition limited to 1 concurrent via semaphore
- Status: ✓ Protected
- Note: But blocking may cause timeouts if many requests queue
- Location:
-
Cookie/Session File Access
- Location:
/opt/media-downloader/modules/fastdl_module.py(line 77) - Issue: Multiple downloader instances reading/writing cookies.json simultaneously
- Risk: File corruption or lost updates
- Fix: Add file locking
- Location:
Memory Leaks
-
Unclosed File Handles
- Location:
/opt/media-downloader/modules/download_manager.py(streams) - Review: Check all file operations use context managers
- Status: Need to verify
- Location:
-
WebSocket Connection Leaks
- Location:
/opt/media-downloader/web/backend/api.py(lines 334-348) - Issue: ConnectionManager stores WebSocket refs
- Risk: Disconnected clients not properly cleaned up
- Fix: Add timeout/heartbeat for stale connections
- Location:
-
Large Image Processing
- Location: Image thumbnail generation
- Risk: In-memory image processing could OOM with large files
- Recommendation: Stream processing or size limits
Data Integrity
-
Incomplete Download Tracking
- Location:
/opt/media-downloader/modules/download_manager.py(DownloadResult) - Issue: If database insert fails after successful download, file orphaned
- Fix: Transactional approach - record first, then download
- Location:
-
Timestamp Modification
- Location:
/opt/media-downloader/media-downloader.py(lines 2033-2035) - Issue: Using
os.utime()may fail silently
os.utime(dest_file, (ts, ts)) # No error handling- Fix: Check return value and log failures
- Location:
-
Partial Recycle Bin Operations
- Location:
/opt/media-downloader/modules/unified_database.py(lines 1472-1533) - Issue: If file move fails but DB updates success, inconsistent state
- Fix: Rollback DB changes if file move fails
- Location:
7. SPECIFIC CODE ISSUES
Path Handling
-
Hardcoded Paths
- Location:
/opt/media-downloader/modules/unified_database.pyline 1432:/opt/immich/recycle- Various modules hardcode
/opt/media-downloader
- Issue: Not portable, breaks if deployed elsewhere
- Fix: Use environment variables with fallbacks
- Location:
-
Path Validation Missing
- Location: Media file serving endpoints
- Issue: No symlink attack prevention
- Fix: Use
Path.resolve()and verify within allowed directory
Settings Management
- Settings Validation
- Location:
/opt/media-downloader/modules/settings_manager.py - Issue: No schema validation for settings
- Recommendation: Use Pydantic models for all settings
- Location:
API Design
-
Inconsistent Response Formats
- Issue: Some endpoints return {success, data}, others just data
- Recommendation: Standardize on single response envelope
-
Missing API Documentation
- Suggestion: Add OpenAPI/Swagger documentation
- Benefit: Self-documenting API, auto-generated client SDKs
RECOMMENDATIONS PRIORITY LIST
IMMEDIATE (Week 1)
- Remove tokens from URL queries - Use Authorization header only
- Add CSRF protection - Use starlette-csrf
- Fix bare except clauses - Catch specific exceptions
- Add file path validation - Prevent directory traversal
- Add security headers - CSP, X-Frame-Options, etc.
SHORT TERM (Week 2-4)
- Implement rate limiting on routes - Protect all write operations
- Fix JSON search performance - Use proper JSON queries or separate columns
- Add input validation on config - Validate all settings updates
- Extract adapter duplications - Create generic base adapter
- Standardize logging - Remove print(), use logger everywhere
- Add type hints - Run mypy on entire codebase
MEDIUM TERM (Month 2)
- Implement caching layer - Redis/in-memory for hot data
- Add async file I/O - Use aiofiles for media operations
- Extract browser logic - Separate Playwright concerns
- Add WebSocket heartbeat - Prevent connection leaks
- Implement distributed locking - If scaling to multiple instances
LONG TERM (Month 3+)
- Add perceptual hashing - Better duplicate detection
- Implement API key auth - Support programmatic access
- Add webhook support - External service integration
- Refactor main class - Implement dependency injection
TESTING RECOMMENDATIONS
Current State
- Test directory exists (
/opt/media-downloader/tests/) with 10 test files - Status: Need to verify test coverage
Recommendations
- Add unit tests for core database operations
- Add integration tests for download pipeline
- Add security tests (SQL injection, path traversal, CSRF)
- Add load tests for concurrent downloads
- Add UI tests for critical flows (login, config, downloads)
DEPLOYMENT RECOMMENDATIONS
-
Environment Configuration
- Move all hardcoded paths to environment variables
- Document all required env vars
- Use
.env.exampletemplate
-
Database
- Regular backups of media_downloader.db
- Monitor database file size
- Implement retention policies for old records
-
Security
- Use strong JWT secret (already implemented, good)
- Enable HTTPS only in production
- Implement rate limiting on all API endpoints
- Regular security audits
-
Monitoring
- Add health check endpoint monitoring
- Set up alerts for database locks
- Monitor disk space for media/recycle bin
- Log critical errors to centralized system
-
Scaling
- Current design assumes single instance
- For multi-instance: implement distributed locking, session sharing
- Consider message queue for download jobs (Redis/RabbitMQ)
CONCLUSION
The Media Downloader application is well-architected with good separation of concerns, proper database design, and thoughtful authentication implementation. The main areas for improvement are:
- Security: Primarily around token handling, path validation, and CSRF protection
- Performance: Database query optimization, especially JSON searches and file I/O
- Code Quality: Reducing duplication, standardizing error handling and logging
- Testing: Expanding test coverage, especially for security-critical paths
With the recommended fixes prioritized by the provided list, the application can achieve production-grade quality suitable for enterprise deployment.
Overall Code Grade: B+ (Good with specific improvements needed)