media-downloader/docs/GUI_DESIGN_PLAN.md
Todd 0d7b2b1aab Initial commit
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 22:42:55 -04:00


# Media Downloader - GUI Design & Implementation Plan
**Version:** 1.0
**Date:** October 25, 2025
**Status:** Planning Phase

---
## Table of Contents
1. [Executive Summary](#executive-summary)
2. [Current System Analysis](#current-system-analysis)
3. [GUI Architecture Options](#gui-architecture-options)
4. [Recommended Approach](#recommended-approach)
5. [Technology Stack](#technology-stack)
6. [Implementation Phases](#implementation-phases)
7. [Feature Roadmap](#feature-roadmap)
8. [API Specification](#api-specification)
9. [UI/UX Design](#uiux-design)
10. [Database Integration](#database-integration)
11. [Real-time Updates](#real-time-updates)
12. [Security Considerations](#security-considerations)
13. [Development Timeline](#development-timeline)
---
## Executive Summary
The Media Downloader GUI project aims to create a modern, user-friendly web interface for managing automated media downloads from multiple platforms (Instagram, TikTok, Snapchat, and forums). The GUI will be modeled on the proven **backup-central** architecture, using a Node.js/Express backend with a vanilla JavaScript frontend.
### Key Goals:
- **Maintain existing Python backend** - Preserve all battle-tested scraping logic
- **Modern web interface** - Real-time updates, responsive design, dark/light themes
- **Easy management** - Visual account configuration, manual triggers, scheduler control
- **Enterprise-grade** - Similar to backup-central's polished UI and reliability
---
## Current System Analysis
### Existing Architecture
```
media-downloader.py (Python Orchestrator)
├── Unified Database (SQLite with WAL mode)
│   ├── downloads table (1,183+ records)
│   ├── forum_threads, forum_posts
│   ├── scheduler_state, download_queue
│   └── File hash deduplication (NEW)
├── Platform Modules (16 modules)
│   ├── instaloader_module.py (Instagram via API)
│   ├── fastdl_module.py (Instagram web scraper)
│   ├── imginn_module.py (Instagram alternative)
│   ├── toolzu_module.py (High-res Instagram 1920x1440)
│   ├── snapchat_scraper.py (direct Playwright scraper)
│   ├── tiktok_module.py (yt-dlp wrapper)
│   └── forum_downloader.py (7 forum types)
├── Subprocess Wrappers (Playwright automation)
│   ├── fastdl_subprocess_wrapper.py
│   ├── imginn_subprocess_wrapper.py
│   ├── toolzu_subprocess_wrapper.py
│   ├── snapchat_subprocess_wrapper.py
│   └── forum_subprocess_wrapper.py
├── Support Systems
│   ├── scheduler.py (randomized intervals, persistent state)
│   ├── move_module.py (file operations + deduplication)
│   ├── pushover_notifier.py (push notifications)
│   ├── download_manager.py (multi-threaded downloads)
│   └── unified_database.py (connection pooling, WAL mode)
└── Configuration
    └── config/settings.json (100+ parameters)
```
### Current Capabilities
**Supported Platforms:**
- Instagram (4 methods: InstaLoader, FastDL, ImgInn, Toolzu)
- TikTok (via yt-dlp)
- Snapchat Stories
- Forums (XenForo, vBulletin, phpBB, Discourse, IPB, MyBB, SMF)

**Advanced Features:**
- Quality upgrade merging (FastDL + Toolzu)
- File hash deduplication (SHA256-based)
- Timestamp preservation (EXIF metadata)
- Randomized scheduler intervals
- Pushover notifications with thumbnails
- Immich photo library integration
- Cookie-based authentication
- 2captcha CAPTCHA solving
- Browser automation (Playwright)

**Statistics:**
- 19,100+ lines of production Python code
- 1,183+ downloads tracked
- 213 files with SHA256 hashes
- 30 duplicate groups detected
- 8 database tables with 17 indexes
---
## GUI Architecture Options
### Option 1: Hybrid Approach ⭐ **RECOMMENDED**
**Architecture:**
```
┌─────────────────────────────────────┐
│ Node.js Web GUI │
│ - Express.js API server │
│ - Vanilla JS frontend │
│ - Real-time WebSocket updates │
│ - Chart.js analytics │
└──────────────┬──────────────────────┘
│ REST API + WebSocket
┌─────────────────────────────────────┐
│ Existing Python Backend │
│ - All platform downloaders │
│ - Database layer │
│ - Scheduler │
│ - Browser automation │
└─────────────────────────────────────┘
```
**Pros:**
- ✅ Preserves all battle-tested scraping logic
- ✅ Modern, responsive web UI
- ✅ Lower risk, faster development (4-8 weeks)
- ✅ Python ecosystem better for web scraping
- ✅ Can develop frontend and API simultaneously

**Cons:**
- ⚠️ Two codebases to maintain (Node.js + Python)
- ⚠️ Inter-process communication overhead

---
### Option 2: Full Node.js Rewrite
**Architecture:**
```
┌─────────────────────────────────────┐
│ Full Node.js/TypeScript Stack │
│ - Express/Fastify API │
│ - React/Next.js frontend │
│ - Playwright Node.js bindings │
│ - Prisma ORM │
└─────────────────────────────────────┘
```
**Pros:**
- ✅ Unified JavaScript/TypeScript codebase
- ✅ Modern tooling, better IDE support
- ✅ Easier for full-stack JS developers

**Cons:**
- ❌ 3-6 months minimum development time
- ❌ Need to reimplement all platform scraping
- ❌ Risk of losing subtle platform-specific fixes
- ❌ No instaloader equivalent in Node.js
- ❌ Complex authentication flows need rediscovery

**Verdict:** Only consider this option if planning a long-term open-source project with JavaScript contributors.

---
### Option 3: Simple Dashboard (Quickest)
**Architecture:**
```
Node.js Dashboard (read-only)
├── Reads SQLite database directly
├── Displays stats, history, schedules
├── Tails Python logs
└── No control features (view-only)
```
**Timeline:** 1-2 weeks

**Use Case:** Quick visibility without control features

---
## Recommended Approach
### **Hybrid Architecture with Backup-Central Design Pattern**
After analyzing `/opt/backup-central`, we recommend adopting its proven architecture:
**Backend Stack:**
- Express.js (HTTP server)
- WebSocket (ws package) for real-time updates
- SQLite3 (reuse existing unified database)
- Winston (structured logging)
- node-cron (scheduler coordination)
- Helmet + Compression (security & performance)

**Frontend Stack:**
- **Vanilla JavaScript** (no React/Vue - faster, simpler)
- Chart.js (analytics visualizations)
- Font Awesome (icons)
- Inter font (modern typography)
- Mobile-responsive CSS
- Dark/Light theme support

**Why Backup-Central's Approach:**
1. Proven in production
2. Simple to understand and maintain
3. Fast loading (no framework overhead)
4. Real-time updates work flawlessly
5. Beautiful, modern UI without complexity
---
## Technology Stack
### Backend (Node.js)
```json
{
  "dependencies": {
    "express": "^4.18.2",
    "ws": "^8.14.2",
    "sqlite3": "^5.1.7",
    "winston": "^3.18.3",
    "node-cron": "^4.2.1",
    "compression": "^1.8.1",
    "helmet": "^8.1.0",
    "dotenv": "^17.2.3",
    "express-session": "^1.18.2",
    "jsonwebtoken": "^9.0.2"
  }
}
```
### Frontend (Vanilla JS)
```html
<!-- Libraries -->
<script src="chart.min.js"></script>
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.0/css/all.min.css">
<link rel="stylesheet" href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700">
```
### Python Integration
```javascript
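// Subprocess execution for Python backend
const { spawn } = require('child_process');

// Prefer the venv interpreter from .env (PYTHON_PATH); fall back to python3 on PATH.
const PYTHON = process.env.PYTHON_PATH || 'python3';

function triggerDownload(platform, username) {
  // Arguments are passed as an array, so no shell is involved in parsing them.
  return spawn(PYTHON, [
    'media-downloader.py',
    '--platform', platform,
    '--username', username
  ]);
}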
```
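Because `platform` and `username` would arrive from an HTTP request, it may be worth validating them before they reach `spawn`. The following is a minimal sketch, not part of the existing code: `buildDownloadArgs`, `ALLOWED_PLATFORMS`, and `USERNAME_PATTERN` are hypothetical names, the platform list is taken from the platforms named in this plan, and the username pattern is an assumption rather than any platform's real rule.

```javascript
// Hypothetical allowlist; adjust to whatever media-downloader.py actually accepts.
const ALLOWED_PLATFORMS = new Set(['instagram', 'tiktok', 'snapchat', 'forums']);

// Assumed username shape. The first character may not be '-' or '.',
// so a value can never be mistaken for a command-line flag.
const USERNAME_PATTERN = /^[A-Za-z0-9_][A-Za-z0-9._-]{0,63}$/;

// Returns the argument array for spawn(), or throws on invalid input.
function buildDownloadArgs(platform, username) {
  if (!ALLOWED_PLATFORMS.has(platform)) {
    throw new Error(`Unsupported platform: ${platform}`);
  }
  if (typeof username !== 'string' || !USERNAME_PATTERN.test(username)) {
    throw new Error('Invalid username');
  }
  return ['media-downloader.py', '--platform', platform, '--username', username];
}
```

The returned array could be handed to `spawn` in place of the inline argument list, so rejected input never starts a Python process.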
---
## Implementation Phases
### **Phase 1: Backend API Foundation** (Week 1-2)
**Deliverables:**
```
media-downloader-gui/
├── server.js (Express + WebSocket)
├── .env.example
├── package.json
└── lib/
    ├── db-helper.js (SQLite wrapper)
    ├── python-bridge.js (subprocess manager)
    ├── logger.js (Winston)
    └── api-v1/
        ├── downloads.js
        ├── accounts.js
        ├── stats.js
        ├── scheduler.js
        └── config.js
```
**API Endpoints:**
- `GET /api/downloads` - Query download history
- `GET /api/downloads/recent` - Last 100 downloads
- `POST /api/downloads/trigger` - Manual download trigger
- `GET /api/accounts` - List all configured accounts
- `POST /api/accounts` - Add new account
- `PUT /api/accounts/:id` - Update account
- `DELETE /api/accounts/:id` - Remove account
- `GET /api/stats` - Platform statistics
- `GET /api/scheduler/status` - Scheduler state
- `POST /api/scheduler/start` - Start scheduler
- `POST /api/scheduler/stop` - Stop scheduler
- `GET /api/config` - Read configuration
- `PUT /api/config` - Update configuration
- `GET /api/logs` - Tail Python logs
- `WS /api/live` - Real-time updates
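
Endpoints such as `GET /api/downloads` take `limit` and `offset` from the query string, so the handler would need to turn those strings into safe numbers before building a SQL `LIMIT`/`OFFSET`. A minimal sketch follows; `parsePaging` is a hypothetical helper, and the default of 50 and the cap of 500 are assumptions, not values from the existing system.

```javascript
// Assumed paging bounds; tune to the real dataset size.
const DEFAULT_LIMIT = 50;
const MAX_LIMIT = 500;

// Converts raw query-string values (strings or undefined) into bounded integers.
function parsePaging(query) {
  const limit = Number.parseInt(query.limit, 10);
  const offset = Number.parseInt(query.offset, 10);
  return {
    // Fall back to the default when missing or non-numeric, then clamp to 1..MAX_LIMIT.
    limit: Number.isNaN(limit) ? DEFAULT_LIMIT : Math.min(Math.max(limit, 1), MAX_LIMIT),
    // Negative or invalid offsets become 0.
    offset: Number.isNaN(offset) ? 0 : Math.max(offset, 0),
  };
}
```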
---
### **Phase 2: Core Frontend UI** (Week 3-4)
**Dashboard Layout:**
```
┌─────────────────────────────────────────────────────┐
│ Header: Media Downloader | [Theme] [Profile] [⚙️] │
├─────────────────────────────────────────────────────┤
│ Platform Cards │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │Instagram │ │ TikTok │ │ Snapchat │ │
│ │ 523 DL │ │ 87 DL │ │ 142 DL │ │
│ │ ▶️ Trigger│ │ ▶️ Trigger│ │ ▶️ Trigger│ │
│ └──────────┘ └──────────┘ └──────────┘ │
├─────────────────────────────────────────────────────┤
│ Recent Downloads (Live Feed) │
│ 🟢 evalongoria_20251025... (Instagram/evalongoria) │
│ 🟢 20251025_TikTok... (TikTok/evalongoria) │
│ ⚠️ Duplicate skipped: photo.jpg (hash match) │
├─────────────────────────────────────────────────────┤
│ Statistics (Chart.js) │
│ 📊 Downloads per Platform | 📈 Timeline Graph │
└─────────────────────────────────────────────────────┘
```
**Components:**
1. **Dashboard** (`public/index.html`)
   - Platform overview cards
   - Live download feed (WebSocket)
   - Quick stats
2. **Accounts Manager** (`public/accounts.html`)
   - Add/Edit/Delete Instagram usernames
   - Add/Edit/Delete TikTok accounts
   - Add/Edit/Delete Forum configurations
   - Per-account interval settings
3. **Download History** (`public/history.html`)
   - Searchable table
   - Filter by platform/source/date
   - Thumbnail previews
   - Duplicate indicators
4. **Scheduler Control** (`public/scheduler.html`)
   - Enable/Disable scheduler
   - View next run times
   - Adjust global intervals
   - Force run specific tasks
5. **Configuration Editor** (`public/config.html`)
   - JSON editor with validation
   - Platform-specific settings
   - Notification configuration
   - Immich integration settings
6. **Logs Viewer** (`public/logs.html`)
   - Tail Python application logs
   - Filter by level (DEBUG/INFO/WARNING/ERROR)
   - Search functionality
   - Auto-scroll toggle
---
### **Phase 3: Advanced Features** (Week 5-6)
**Real-time Features:**
```javascript
// WebSocket message types (one object per message)

// Download started
{
  type: 'download_start',
  platform: 'instagram',
  username: 'evalongoria',
  content_type: 'story'
}

// Download completed
{
  type: 'download_complete',
  platform: 'instagram',
  filename: 'evalongoria_20251025_123456.jpg',
  file_size: 245678,
  duplicate: false
}

// Duplicate detected
{
  type: 'duplicate_detected',
  filename: 'photo.jpg',
  existing_file: 'photo_original.jpg',
  platform: 'instagram'
}

// Scheduler update
{
  type: 'scheduler_update',
  task_id: 'instagram:evalongoria',
  next_run: '2025-10-25T23:00:00Z'
}
```
**Features:**
- Live download progress bars
- Duplicate detection alerts
- Scheduler countdown timers
- Platform health indicators
- Download speed metrics
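
For the scheduler countdown timers, the frontend could derive the remaining time from the `next_run` timestamp carried by a `scheduler_update` message. This is a minimal sketch under that assumption; `formatCountdown` is a hypothetical helper, and the `HH:MM:SS` output format is only one possible choice.

```javascript
// Formats the time remaining until nextRunIso (an ISO 8601 string) as HH:MM:SS.
// `now` is injectable so the function can be tested deterministically.
function formatCountdown(nextRunIso, now = new Date()) {
  const remainingMs = new Date(nextRunIso).getTime() - now.getTime();
  if (Number.isNaN(remainingMs)) return 'unknown'; // unparseable timestamp
  if (remainingMs <= 0) return 'due now';          // run time already passed

  const totalSeconds = Math.floor(remainingMs / 1000);
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  const pad = (n) => String(n).padStart(2, '0');
  return `${pad(hours)}:${pad(minutes)}:${pad(seconds)}`;
}
```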
---
### **Phase 4: Polish & Deploy** (Week 7-8)
**Final Touches:**
- Mobile-responsive design
- Dark mode implementation
- Keyboard shortcuts
- Toast notifications (success/error)
- Loading skeletons
- Error boundary handling
- Performance optimization
- Security hardening
- Documentation
- Deployment scripts
---
## Feature Roadmap
### **MVP Features** (Phase 1-2)
- ✅ View download history
- ✅ See platform statistics
- ✅ Manual download triggers
- ✅ Account management (CRUD)
- ✅ Real-time download feed
- ✅ Dark/Light theme
- ✅ Mobile responsive
### **Enhanced Features** (Phase 3)
- 🔄 Scheduler control (start/stop/adjust)
- 🔄 Configuration editor
- 🔄 Logs viewer
- 🔄 Advanced search/filtering
- 🔄 Duplicate management UI
- 🔄 Download queue management
### **Future Features** (Phase 4+)
- 📋 Batch operations (delete/retry multiple)
- 📋 Download rules engine (auto-skip based on criteria)
- 📋 Analytics dashboard (trends, insights)
- 📋 Export/Import configurations
- 📋 Webhook integrations
- 📋 Multi-user support with authentication
- 📋 API key management
- 📋 Browser screenshot viewer (see Playwright automation)
- 📋 Cookie editor (manage authentication)

---
## API Specification
### REST API Endpoints
#### Downloads
**GET /api/downloads**
```javascript
// Query downloads with filters
GET /api/downloads?platform=instagram&limit=50&offset=0
Response:
{
"total": 1183,
"downloads": [
{
"id": 1,
"url": "https://...",
"url_hash": "sha256...",
"platform": "instagram",
"source": "evalongoria",
"content_type": "story",
"filename": "evalongoria_20251025_123456.jpg",
"file_path": "/opt/immich/md/social media/instagram/...",
"file_size": 245678,
"file_hash": "sha256...",
"post_date": "2025-10-25T12:34:56Z",
"download_date": "2025-10-25T12:35:00Z",
"status": "completed",
"metadata": {}
}
]
}
```
**POST /api/downloads/trigger**
```javascript
// Trigger manual download
POST /api/downloads/trigger
{
"platform": "instagram",
"username": "evalongoria",
"content_types": ["stories", "posts"]
}
Response:
{
"status": "started",
"job_id": "instagram_evalongoria_1729900000",
"message": "Download started in background"
}
```
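The `job_id` in the response above looks like `platform_username_timestamp`. The sketch below builds an ID of that shape, assuming the trailing number is a Unix timestamp in seconds; that reading, and the helper names `buildJobId` and `buildTriggerResponse`, are assumptions rather than confirmed behaviour.

```javascript
// Builds a job ID such as "instagram_evalongoria_1729900000".
// `now` is milliseconds since the epoch and is injectable for testing.
function buildJobId(platform, username, now = Date.now()) {
  const unixSeconds = Math.floor(now / 1000);
  return `${platform}_${username}_${unixSeconds}`;
}

// Shapes the response body shown in the specification above.
function buildTriggerResponse(platform, username, now = Date.now()) {
  return {
    status: 'started',
    job_id: buildJobId(platform, username, now),
    message: 'Download started in background',
  };
}
```

With second-level resolution, two triggers for the same account within one second would produce the same ID, so a real implementation might add a counter or random suffix.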
#### Accounts
**GET /api/accounts**
```javascript
GET /api/accounts?platform=instagram
Response:
{
"instagram": [
{
"username": "evalongoria",
"enabled": true,
"check_interval_hours": 6,
"content_types": {
"posts": true,
"stories": true,
"reels": false
}
}
],
"tiktok": [...],
"snapchat": [...]
}
```
**POST /api/accounts**
```javascript
POST /api/accounts
{
"platform": "instagram",
"username": "newuser",
"check_interval_hours": 12,
"content_types": {
"posts": true,
"stories": false
}
}
Response:
{
"success": true,
"account": { ... }
}
```
#### Statistics
**GET /api/stats**
```javascript
GET /api/stats
Response:
{
"platforms": {
"instagram": {
"total": 523,
"completed": 520,
"failed": 3,
"duplicates": 15,
"total_size": 1234567890
},
"tiktok": { ... },
"snapchat": { ... }
},
"recent_activity": {
"last_24h": 45,
"last_7d": 312
}
}
```
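The per-platform part of this response could be derived from rows of the `downloads` table. The sketch below folds rows into that shape in plain JavaScript. It assumes each row has `platform`, `status`, and `file_size` fields as in the `GET /api/downloads` example, and that failed downloads carry `status: 'failed'`, which is an assumption. The `duplicates` counter is left out because this plan does not say how duplicates are recorded.

```javascript
// Folds download rows into { platform: { total, completed, failed, total_size } }.
function aggregateStats(rows) {
  const byPlatform = new Map();
  for (const row of rows) {
    if (!byPlatform.has(row.platform)) {
      byPlatform.set(row.platform, { total: 0, completed: 0, failed: 0, total_size: 0 });
    }
    const entry = byPlatform.get(row.platform);
    entry.total += 1;
    if (row.status === 'completed') entry.completed += 1;
    if (row.status === 'failed') entry.failed += 1; // assumed status value
    entry.total_size += row.file_size || 0;         // treat NULL sizes as 0
  }
  return Object.fromEntries(byPlatform);
}
```

For large tables, the same totals would more likely come from a single SQL `GROUP BY platform` query; the loop only illustrates the target shape.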
#### Scheduler
**GET /api/scheduler/status**
```javascript
GET /api/scheduler/status
Response:
{
"running": true,
"tasks": [
{
"task_id": "instagram:evalongoria",
"last_run": "2025-10-25T12:00:00Z",
"next_run": "2025-10-25T18:00:00Z",
"interval_hours": 6,
"status": "active"
}
]
}
```
#### Configuration
**GET /api/config**
```javascript
GET /api/config
Response:
{
"instagram": { ... },
"tiktok": { ... },
"pushover": { ... },
"immich": { ... }
}
```
**PUT /api/config**
```javascript
PUT /api/config
{
"instagram": {
"enabled": true,
"check_interval_hours": 8
}
}
Response:
{
"success": true,
"config": { ... }
}
```
### WebSocket Events
**Client → Server:**
```javascript
// Subscribe to live updates
{
"action": "subscribe",
"channels": ["downloads", "scheduler", "duplicates"]
}
```
**Server → Client:**
```javascript
// Download started
{
"type": "download_start",
"timestamp": "2025-10-25T12:34:56Z",
"platform": "instagram",
"username": "evalongoria"
}
// Download completed
{
"type": "download_complete",
"timestamp": "2025-10-25T12:35:00Z",
"platform": "instagram",
"filename": "evalongoria_20251025_123456.jpg",
"file_size": 245678,
"duplicate": false
}
// Duplicate detected
{
"type": "duplicate_detected",
"timestamp": "2025-10-25T12:35:05Z",
"filename": "photo.jpg",
"existing_file": {
"filename": "photo_original.jpg",
"platform": "instagram",
"source": "evalongoria"
}
}
```
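To honour the `subscribe` message, the server would need to remember each client's channels and route events accordingly. The sketch below shows one way to do that. The mapping from event types to channels is an assumption (the plan names the channels and the event types but not how they pair up), and `parseSubscription` and `shouldDeliver` are hypothetical helpers.

```javascript
// Assumed pairing of event types with the channels from the subscribe example.
const CHANNEL_BY_TYPE = new Map([
  ['download_start', 'downloads'],
  ['download_complete', 'downloads'],
  ['duplicate_detected', 'duplicates'],
  ['scheduler_update', 'scheduler'],
]);

// Parses a raw client message; returns a Set of known channels, or null if
// the message is not a well-formed subscribe request.
function parseSubscription(rawMessage) {
  let message;
  try {
    message = JSON.parse(rawMessage);
  } catch {
    return null;
  }
  if (!message || message.action !== 'subscribe' || !Array.isArray(message.channels)) {
    return null;
  }
  const known = new Set(CHANNEL_BY_TYPE.values());
  return new Set(message.channels.filter((channel) => known.has(channel)));
}

// True when the event belongs to a channel the client subscribed to.
function shouldDeliver(subscribedChannels, event) {
  const channel = CHANNEL_BY_TYPE.get(event.type);
  return channel !== undefined && subscribedChannels.has(channel);
}
```

A broadcast loop could store the returned set on each connection and call `shouldDeliver` before `client.send`.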
---
## UI/UX Design
### Design System (Inspired by Backup-Central)
**Colors:**
```css
:root {
/* Light Theme */
--primary-color: #2563eb;
--secondary-color: #64748b;
--success-color: #10b981;
--warning-color: #f59e0b;
--error-color: #ef4444;
--bg-color: #f8fafc;
--card-bg: #ffffff;
--text-color: #1e293b;
--border-color: #e2e8f0;
}
[data-theme="dark"] {
/* Dark Theme */
--primary-color: #3b82f6;
--bg-color: #0f172a;
--card-bg: #1e293b;
--text-color: #f1f5f9;
--border-color: #334155;
}
```
**Typography:**
```css
font-family: 'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
```
**Components:**
- Cards with subtle shadows
- Rounded corners (8px border-radius)
- Smooth transitions (0.3s ease)
- Gradient accents on hover
- Loading skeletons
- Toast notifications (top-right)
---
## Database Integration
### Database Access Strategy
**Read Operations (Node.js):**
```javascript
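// Direct SQLite reads for fast queries.
// Note: this uses better-sqlite3's synchronous API, while the dependency list
// above names sqlite3; whichever package is chosen, the two should agree.
const Database = require('better-sqlite3');

// Open read-only: all writes stay with the Python backend.
const db = new Database('/opt/media-downloader/database/media_downloader.db', {
  readonly: true,
  fileMustExist: true
});

const downloads = db.prepare(`
  SELECT * FROM downloads
  WHERE platform = ?
  ORDER BY download_date DESC
  LIMIT ?
`).all('instagram', 50);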
```
**Write Operations (Python):**
```javascript
// Route through Python backend for consistency
const { spawn } = require('child_process');

function addAccount(platform, username) {
  // 1. Update config/settings.json with the new account
  // 2. Trigger the Python process to reload its configuration
}
```
**Why This Approach:**
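- Python remains the only writer to the database, which keeps write logic and consistency checks in one place
- Node.js reads directly for fast UI queries; SQLite's WAL mode lets these reads run while Python is writing
- No duplicate database logic
- Writes keep using the Python side's existing connection pooling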
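
If the GUI does edit `config/settings.json` itself before asking Python to reload, writing the file atomically would keep the Python process from ever reading a half-written file. The sketch below uses a write-then-rename approach. `writeJsonAtomic` and `updateSettings` are hypothetical helpers, nothing here assumes a particular layout inside the settings file, and it does not guard against Python writing the same file at the same moment.

```javascript
const fs = require('fs');
const path = require('path');

// Writes JSON to a temp file in the same directory, then renames it over the
// target. A rename within one filesystem replaces the file in a single step.
function writeJsonAtomic(filePath, data) {
  const tempPath = path.join(
    path.dirname(filePath),
    `.${path.basename(filePath)}.${process.pid}.tmp`
  );
  fs.writeFileSync(tempPath, JSON.stringify(data, null, 2) + '\n', 'utf8');
  fs.renameSync(tempPath, filePath);
}

// Reads the current settings, applies `update` (a function that returns the
// new settings object), writes the result back, and returns it.
function updateSettings(filePath, update) {
  const current = JSON.parse(fs.readFileSync(filePath, 'utf8'));
  const next = update(current);
  writeJsonAtomic(filePath, next);
  return next;
}
```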
---
## Real-time Updates
### WebSocket Architecture
**Server-Side (Node.js):**
```javascript
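const WebSocket = require('ws');
const readline = require('readline');
const { spawn } = require('child_process');

// `server` is the http.Server that hosts the Express app;
// the path matches the WS /api/live endpoint listed in Phase 1.
const wss = new WebSocket.Server({ server, path: '/api/live' });

// Broadcast to all connected clients
function broadcast(message) {
  const payload = JSON.stringify(message);
  wss.clients.forEach(client => {
    if (client.readyState === WebSocket.OPEN) {
      client.send(payload);
    }
  });
}

// Watch Python logs for events
const pythonProcess = spawn('python3', ['media-downloader.py', '--daemon']);

// Read stdout line by line: raw 'data' chunks can split or merge log lines.
const lines = readline.createInterface({ input: pythonProcess.stdout });
lines.on('line', (line) => {
  // parseLogEvent is still to be written; it returns an event object or null
  const event = parseLogEvent(line);
  if (event) broadcast(event);
});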
```
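The server code above relies on a `parseLogEvent` function that this plan does not define. One possible shape is sketched below. It assumes the Python side would print each event as a single line starting with the marker `EVENT ` followed by JSON; that marker and format are hypothetical and would have to be added to the Python logging output.

```javascript
// Hypothetical marker; the existing Python logs are not known to emit it.
const EVENT_PREFIX = 'EVENT ';

// Returns the parsed event object for a marked log line, or null for
// ordinary log lines and malformed payloads.
function parseLogEvent(line) {
  const trimmed = line.trim();
  if (!trimmed.startsWith(EVENT_PREFIX)) return null;
  try {
    const event = JSON.parse(trimmed.slice(EVENT_PREFIX.length));
    // Only objects with a string `type` count as events.
    return event && typeof event.type === 'string' ? event : null;
  } catch {
    return null;
  }
}
```

Emitting structured lines from Python would avoid pattern-matching free-form log text, which tends to break whenever a log message is reworded.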
**Client-Side (JavaScript):**
```javascript
const ws = new WebSocket('ws://localhost:3000/api/live');

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  switch (data.type) {
    case 'download_complete':
      addToDownloadFeed(data);
      updateStats();
      showToast(`Downloaded ${data.filename}`, 'success');
      break;
    case 'duplicate_detected':
      showToast(`Duplicate skipped: ${data.filename}`, 'warning');
      break;
  }
};
```
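The client above does not reconnect if the socket drops, for example when the Node.js server restarts. A minimal sketch of reconnection with exponential backoff follows. `reconnectDelayMs` and `connectLive` are hypothetical helpers, and the 1 second base and 30 second cap are assumed values.

```javascript
// Delay before reconnect attempt N (0-based): 1s, 2s, 4s, ... capped at maxMs.
function reconnectDelayMs(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Browser-side wiring. `onEvent` receives each parsed server message.
function connectLive(url, onEvent, attempt = 0) {
  const ws = new WebSocket(url);
  ws.onopen = () => { attempt = 0; }; // a successful connection resets the backoff
  ws.onmessage = (message) => onEvent(JSON.parse(message.data));
  ws.onclose = () => {
    setTimeout(() => connectLive(url, onEvent, attempt + 1), reconnectDelayMs(attempt));
  };
  return ws;
}
```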
---
## Security Considerations
### Authentication (Optional for Single-User)
**Simple Auth:**
- Environment variable password
- Session-based auth (express-session)
- No registration needed

**Enhanced Auth (Future):**
- TOTP/2FA (speakeasy)
- Passkeys (WebAuthn)
- JWT tokens
- Per-user configurations
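
For the Simple Auth option, the login handler would compare the submitted password with the one from the environment. The sketch below does that comparison in constant time using Node's built-in `crypto` module. `passwordMatches` is a hypothetical helper, and the environment variable name `GUI_PASSWORD` in the comment is an assumption.

```javascript
const { createHash, timingSafeEqual } = require('crypto');

// Compares a submitted password with the expected one without leaking,
// through timing, how many leading characters matched.
function passwordMatches(submitted, expected) {
  if (typeof submitted !== 'string' || typeof expected !== 'string' || expected.length === 0) {
    return false; // also rejects the case where no password is configured
  }
  // Hash both sides so the buffers have equal length, as timingSafeEqual requires.
  const submittedHash = createHash('sha256').update(submitted).digest();
  const expectedHash = createHash('sha256').update(expected).digest();
  return timingSafeEqual(submittedHash, expectedHash);
}

// Possible use in a login route (variable name assumed):
//   if (passwordMatches(req.body.password, process.env.GUI_PASSWORD)) { ... }
```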
### API Security
```javascript
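const helmet = require('helmet');
const cors = require('cors');                    // not yet in the dependency list above
const rateLimit = require('express-rate-limit'); // likewise

// Helmet for security headers
app.use(helmet());

// CORS configuration: browsers reject a wildcard origin combined with
// credentials, so cross-origin access stays off unless origins are listed.
const allowedOrigins = process.env.ALLOWED_ORIGINS?.split(',').map(o => o.trim());
app.use(cors({
  origin: allowedOrigins || false,
  credentials: true
}));

// Rate limiting
const limiter = rateLimit({
  windowMs: 15 * 60 * 1000, // 15 minutes
  max: 100 // limit each IP to 100 requests per windowMs
});
app.use('/api/', limiter);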
```
### Environment Variables
```bash
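# .env
NODE_ENV=production
PORT=3000
# Replace with a long random value, e.g. the output of: openssl rand -hex 32
SESSION_SECRET=random_secret_key
PYTHON_PATH=/opt/media-downloader/venv/bin/python3
DATABASE_PATH=/opt/media-downloader/database/media_downloader.db
CONFIG_PATH=/opt/media-downloader/config/settings.json
# Optional: comma-separated origins read by the CORS setup (ALLOWED_ORIGINS)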
```
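A server that starts with a missing variable tends to fail later in confusing ways, so checking the environment once at startup can help. The sketch below validates the variables listed above. `loadConfig` is a hypothetical helper, and the choice of which variables are required, along with the `development` default for `NODE_ENV`, is an assumption.

```javascript
// Assumed to be mandatory; PORT and NODE_ENV fall back to defaults.
const REQUIRED_VARS = ['SESSION_SECRET', 'PYTHON_PATH', 'DATABASE_PATH', 'CONFIG_PATH'];

// Takes an env object (normally process.env) and returns a typed config,
// throwing a descriptive error if anything is missing or malformed.
function loadConfig(env) {
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  const port = Number.parseInt(env.PORT || '3000', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  return {
    nodeEnv: env.NODE_ENV || 'development',
    port,
    sessionSecret: env.SESSION_SECRET,
    pythonPath: env.PYTHON_PATH,
    databasePath: env.DATABASE_PATH,
    configPath: env.CONFIG_PATH,
  };
}
```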
---
## Development Timeline
### **Estimated Timeline: 8 Weeks**
**Week 1-2: Backend API**
- Express server setup
- Database integration
- Python subprocess bridge
- Basic API endpoints
- WebSocket setup

**Week 3-4: Core Frontend**
- Dashboard layout
- Platform cards
- Download feed
- Account management UI
- Basic stats

**Week 5-6: Advanced Features**
- Real-time updates
- Scheduler control
- Config editor
- Logs viewer
- Search/filtering

**Week 7-8: Polish**
- Mobile responsive
- Dark mode
- Error handling
- Testing
- Documentation
- Deployment
---
## Next Steps
### Immediate Actions:
1. **✅ File Hash Deduplication** - COMPLETED
   - Added SHA256 hashing to unified_database.py
   - Implemented automatic duplicate detection in move_module.py
   - Created utilities for backfilling and managing hashes
   - Scanned 213 existing files and found 30 duplicate groups
2. **✅ Directory Cleanup** - COMPLETED
   - Moved test files to `tests/` directory
   - Moved one-time scripts to `archive/`
   - Organized utilities in `utilities/` directory
   - Removed obsolete documentation
3. **📋 Begin GUI Development**
   - Initialize Node.js project
   - Set up Express server
   - Create basic API endpoints
   - Build dashboard prototype
---
## References
- **Backup-Central:** `/opt/backup-central` - Reference implementation
- **Python Backend:** `/opt/media-downloader/media-downloader.py`
- **Database Schema:** `/opt/media-downloader/modules/unified_database.py`
- **Existing Docs:** `/opt/media-downloader/archive/` (old GUI plans)
---
## Appendix
### Directory Structure After Cleanup
```
/opt/media-downloader/
├── media-downloader.py (main application)
├── setup.py (installation script)
├── INSTALL.md (installation guide)
├── GUI_DESIGN_PLAN.md (this document)
├── requirements.txt
├── config/
│   └── settings.json
├── database/
│   ├── media_downloader.db
│   └── scheduler_state.db
├── modules/ (16 Python modules)
│   ├── unified_database.py
│   ├── scheduler.py
│   ├── move_module.py
│   ├── instaloader_module.py
│   ├── fastdl_module.py
│   ├── imginn_module.py
│   ├── toolzu_module.py
│   ├── snapchat_module.py
│   ├── tiktok_module.py
│   ├── forum_downloader.py
│   └── ... (10 more modules)
├── utilities/
│   ├── backfill_file_hashes.py
│   ├── cleanup_database_filenames.py
│   └── scan_and_hash_files.py
├── archive/ (old docs, one-time scripts)
│   ├── HIGH_RES_DOWNLOAD.md
│   ├── SNAPCHAT_*.md
│   ├── TOOLZU-TIMESTAMPS.md
│   ├── WEB_GUI_*.md (4 old GUI docs)
│   ├── cleanup_last_week.py
│   ├── merge-quality-upgrade.py
│   ├── reset_database.py
│   └── debug_snapchat.py
├── tests/ (7 test scripts)
│   ├── test_all_notifications.py
│   ├── test_pushover.py
│   └── ... (5 more tests)
├── subprocess wrappers/ (5 wrappers)
│   ├── fastdl_subprocess_wrapper.py
│   ├── imginn_subprocess_wrapper.py
│   ├── toolzu_subprocess_wrapper.py
│   ├── snapchat_subprocess_wrapper.py
│   └── forum_subprocess_wrapper.py
├── venv/ (Python virtual environment)
├── logs/ (application logs)
├── temp/ (temporary download directories)
└── ... (other directories)
```
---
**End of Document**

This document is the single source of truth for GUI development planning; direct questions and updates here.