# Scraper Proxy Configuration System

## Overview

This document describes the design and implementation plan for a centralized scraper configuration system that provides:

1. **Per-scraper proxy settings** - Configure different proxies for different scrapers
2. **Centralized cookie management** - Store cookies in the database instead of scattered files
3. **FlareSolverr integration** - Test connections and refresh Cloudflare cookies
4. **Cookie upload support** - Upload cookies from browser extensions for authenticated access
5. **Unified Settings UI** - A single place to manage all scraper configurations

## Background

### Problem Statement

- Proxy settings are not configurable per module
- Cookies are stored in scattered JSON files
- There is no UI to test FlareSolverr connections or manage cookies
- Adding new forums requires code changes
- There is no visibility into cookie freshness or scraper health

### Solution

A new `scrapers` database table that:

- Stores configuration for all automated scrapers
- Provides per-scraper proxy settings
- Centralizes cookie storage with merge logic
- Syncs automatically with platform configurations
- Exposes management via the Settings UI

---

## Database Schema

### Table: `scrapers`

```sql
CREATE TABLE scrapers (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    type TEXT NOT NULL,               -- 'direct', 'proxy', 'forum', 'cli_tool'
    module TEXT,                      -- Python module name, NULL for cli_tool
    base_url TEXT,                    -- Primary URL for the scraper
    target_platform TEXT,             -- 'instagram', 'snapchat', 'tiktok', NULL for forums/cli
    enabled INTEGER DEFAULT 1,        -- Enable/disable scraper

    -- Proxy settings
    proxy_enabled INTEGER DEFAULT 0,
    proxy_url TEXT,                   -- e.g., "socks5://user:pass@host:port"

    -- Cloudflare/cookie settings
    flaresolverr_required INTEGER DEFAULT 0,
    cookies_json TEXT,                -- JSON blob of cookies
    cookies_updated_at TEXT,          -- ISO timestamp of last cookie update

    -- Test status
    last_test_at TEXT,                -- ISO timestamp of last test
    last_test_status TEXT,            -- 'success', 'failed', 'timeout'
    last_test_message TEXT,           -- Error message if failed

    -- Module-specific settings
    settings_json TEXT,               -- Additional JSON settings per scraper

    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);
```

### Column Definitions

| Column | Type | Description |
|--------|------|-------------|
| `id` | TEXT | Unique identifier (e.g., 'imginn', 'forum_phun') |
| `name` | TEXT | Display name shown in UI |
| `type` | TEXT | One of: 'direct', 'proxy', 'forum', 'cli_tool' |
| `module` | TEXT | Python module name (e.g., 'imginn_module'), NULL for CLI tools |
| `base_url` | TEXT | Primary URL for the service |
| `target_platform` | TEXT | What platform this scraper downloads from (instagram, snapchat, tiktok, NULL) |
| `enabled` | INTEGER | 1=enabled, 0=disabled |
| `proxy_enabled` | INTEGER | 1=use proxy, 0=direct connection |
| `proxy_url` | TEXT | Proxy URL (http, https, socks5 supported) |
| `flaresolverr_required` | INTEGER | 1=needs FlareSolverr for Cloudflare bypass |
| `cookies_json` | TEXT | JSON array of cookie objects |
| `cookies_updated_at` | TEXT | When cookies were last updated |
| `last_test_at` | TEXT | When the connection was last tested |
| `last_test_status` | TEXT | Result of last test: 'success', 'failed', 'timeout' |
| `last_test_message` | TEXT | Error message from the last failed test |
| `settings_json` | TEXT | Module-specific settings as JSON |

### Scraper Types

| Type | Description | Examples |
|------|-------------|----------|
| `direct` | Downloads directly from the platform | instagram, tiktok, snapchat, coppermine |
| `proxy` | Uses a proxy service to download | imginn, fastdl, toolzu |
| `forum` | Forum scraper | forum_phun, forum_hqcelebcorner, forum_picturepub |
| `cli_tool` | Command-line tool wrapper | ytdlp, gallerydl |

### Target Platforms

The `target_platform` field indicates what platform the scraper actually downloads content from:

| Scraper | Target Platform | Notes |
|---------|-----------------|-------|
| imginn | instagram | Proxy service for Instagram |
| fastdl | instagram | Proxy service for Instagram |
| toolzu | instagram | Proxy service for Instagram |
| snapchat | snapchat | Direct via Playwright scraper |
| instagram | instagram | Direct via Instaloader |
| tiktok | tiktok | Direct via yt-dlp internally |
| coppermine | NULL | Not a social platform |
| forum_* | NULL | Not a social platform |
| ytdlp | NULL | Generic tool, multiple platforms |
| gallerydl | NULL | Generic tool, multiple platforms |

---

## Seed Data

Initial scrapers to populate on first run:

| id | name | type | module | base_url | target_platform | flaresolverr_required |
|----|------|------|--------|----------|-----------------|----------------------|
| imginn | Imginn | proxy | imginn_module | https://imginn.com | instagram | 1 |
| fastdl | FastDL | proxy | fastdl_module | https://fastdl.app | instagram | 1 |
| toolzu | Toolzu | proxy | toolzu_module | https://toolzu.com | instagram | 1 |
| snapchat | Snapchat Direct | direct | snapchat_scraper | https://snapchat.com | snapchat | 0 |
| instagram | Instagram (Direct) | direct | instaloader_module | https://instagram.com | instagram | 0 |
| tiktok | TikTok | direct | tiktok_module | https://tiktok.com | tiktok | 0 |
| coppermine | Coppermine | direct | coppermine_module | https://hqdiesel.net | NULL | 1 |
| forum_phun | Phun.org | forum | forum_downloader | https://forum.phun.org | NULL | 1 |
| forum_hqcelebcorner | HQCelebCorner | forum | forum_downloader | https://hqcelebcorner.com | NULL | 0 |
| forum_picturepub | PicturePub | forum | forum_downloader | https://picturepub.net | NULL | 0 |
| ytdlp | yt-dlp | cli_tool | NULL | NULL | NULL | 0 |
| gallerydl | gallery-dl | cli_tool | NULL | NULL | NULL | 0 |

### Notes on Seed Data

1. **Snapchat**: Uses the direct Playwright-based scraper with optional proxy support (configured per scraper in the Scrapers settings page)

2. **Forums**: Derived from existing `forum_threads` table entries and cookie files

3. **Excluded scrapers**: YouTube and Bilibili are NOT included - they are on-demand downloaders from the Video Downloader page, not scheduled scrapers

---

## Auto-Sync Logic

The scrapers table stays in sync with platform configurations automatically:

### When Forums Change

- New forum added in Forums settings → create a scraper entry with `type='forum'`
- Forum removed from settings → remove the scraper entry

### When Modules Are Enabled/Disabled

- Module enabled → ensure a scraper entry exists
- Module disabled → the scraper entry remains, but with `enabled=0`

### No Manual Add/Delete

- The Scrapers UI does NOT have Add or Delete buttons
- Scrapers are managed through their respective platform configuration pages
- The Scrapers UI only manages: proxy settings, testing, cookies
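
The forum sync rules above could be implemented as an idempotent reconciliation pass. A sketch, assuming the Forums settings are available as a list of dicts with `id`, `name`, and `base_url` (the function name and input shape are illustrative, not existing code):

```python
import sqlite3


def sync_forum_scrapers(conn: sqlite3.Connection, configured_forums: list[dict]):
    """Reconcile scraper rows of type='forum' with the Forums settings."""
    existing = {row[0] for row in conn.execute(
        "SELECT id FROM scrapers WHERE type = 'forum'")}
    wanted = {f['id'] for f in configured_forums}

    # New forum added in Forums settings -> create a scraper entry
    for forum in configured_forums:
        if forum['id'] not in existing:
            conn.execute(
                "INSERT OR IGNORE INTO scrapers (id, name, type, module, base_url) "
                "VALUES (?, ?, 'forum', 'forum_downloader', ?)",
                (forum['id'], forum['name'], forum['base_url']))

    # Forum removed from settings -> remove the scraper entry
    for stale_id in existing - wanted:
        conn.execute("DELETE FROM scrapers WHERE id = ?", (stale_id,))
    conn.commit()
```

Running the pass twice with the same input is a no-op, so it can safely be invoked on every settings save.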

---

## Cookie Management

### Storage Format

Cookies are stored as JSON in the `cookies_json` column:

```json
{
  "cookies": [
    {
      "name": "cf_clearance",
      "value": "abc123...",
      "domain": ".imginn.com",
      "path": "/",
      "expiry": 1735689600
    },
    {
      "name": "session_id",
      "value": "xyz789...",
      "domain": "imginn.com",
      "path": "/",
      "expiry": -1
    }
  ],
  "user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36..."
}
```

An `expiry` of `-1` marks a session cookie with no fixed expiration.

### Cookie Merge Logic

**CRITICAL**: When updating cookies, MERGE with existing cookies - never wipe them:

```python
def merge_cookies(existing_cookies: list, new_cookies: list) -> list:
    """
    Merge new cookies into existing, preserving non-updated cookies.

    This ensures:
    - Cloudflare cookies (cf_clearance, __cf_bm) get refreshed
    - Site session/auth cookies are preserved
    - No data loss on test/refresh
    """
    # Index existing cookies by name
    cookie_map = {c['name']: c for c in existing_cookies}

    # Update/add from new cookies
    for cookie in new_cookies:
        cookie_map[cookie['name']] = cookie

    return list(cookie_map.values())
```

### Cookie Sources

1. **FlareSolverr** - Automated Cloudflare bypass; returns CF cookies
2. **Upload** - User uploads JSON from a browser extension (EditThisCookie, Cookie-Editor)
3. **Module** - Some modules save cookies during operation
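
Uploads do not arrive in the stored format exactly: Cookie-Editor/EditThisCookie exports typically use a float `expirationDate` field and omit it for session cookies, while the storage format above uses an integer `expiry` with `-1` for session cookies. A normalization sketch (the extension field names are an assumption based on those common export formats; verify against real exports):

```python
def normalize_uploaded_cookies(raw_cookies: list[dict]) -> list[dict]:
    """Convert a browser-extension cookie export to the stored format."""
    normalized = []
    for c in raw_cookies:
        # Prefer an already-normalized 'expiry'; fall back to the
        # extension-style 'expirationDate'; default to -1 (session cookie).
        expiry = c.get('expiry', c.get('expirationDate', -1))
        normalized.append({
            'name': c['name'],
            'value': c['value'],
            'domain': c.get('domain', ''),
            'path': c.get('path', '/'),
            'expiry': int(expiry),
        })
    return normalized
```

The result can then go straight through `merge_cookies()` above.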

### Cookie File Migration

Existing cookie files to migrate on first run:

| File | Scraper ID |
|------|------------|
| `cookies/coppermine_cookies.json` | coppermine |
| `cookies/imginn_cookies.json` | imginn |
| `cookies/fastdl_cookies.json` | fastdl |
| `cookies/snapchat_cookies.json` | snapchat |
| `cookies/forum_cookies_phun.org.json` | forum_phun |
| `cookies/forum_cookies_HQCelebCorner.json` | forum_hqcelebcorner |
| `cookies/forum_cookies_PicturePub.json` | forum_picturepub |

---

## Proxy Configuration

### Supported Proxy Formats

```
http://host:port
http://user:pass@host:port
https://host:port
https://user:pass@host:port
socks5://host:port
socks5://user:pass@host:port
```
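
A sketch of validating these formats before saving `proxy_url` (the allowed-scheme set simply mirrors the list above; requiring an explicit port is an assumption drawn from the listed formats):

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {'http', 'https', 'socks5'}


def validate_proxy_url(proxy_url: str) -> bool:
    """Accept only scheme://[user:pass@]host:port with a supported scheme."""
    try:
        parts = urlsplit(proxy_url)
        return (parts.scheme in ALLOWED_SCHEMES
                and parts.hostname is not None
                and parts.port is not None)
    except ValueError:  # raised on malformed ports like "host:abc"
        return False
```

The PUT endpoint below could reject a request with an invalid `proxy_url` before it is stored.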

### FlareSolverr Proxy Integration

When a scraper has `proxy_enabled=1`, the proxy is passed to FlareSolverr:

```python
payload = {
    "cmd": "request.get",
    "url": url,
    "maxTimeout": 120000
}
if proxy_url:
    payload["proxy"] = {"url": proxy_url}
```

**Important**: Cloudflare cookies are tied to the client IP address. If FlareSolverr uses a proxy, subsequent requests MUST use the same proxy, or the cookies will be invalid.
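
A sketch of building the follow-up HTTP client from FlareSolverr's output, routed through the same proxy (this assumes the `requests` library, which the coppermine module already uses; SOCKS proxies additionally require `requests[socks]` to be installed):

```python
import requests


def build_session(cookies: list[dict], user_agent: str,
                  proxy_url: str = None) -> requests.Session:
    """Build a requests session that replays FlareSolverr's cookies.

    Because Cloudflare ties cf_clearance to the client IP, the session
    must route through the same proxy FlareSolverr used (if any), and
    the User-Agent must match the browser that solved the challenge.
    """
    session = requests.Session()
    session.headers['User-Agent'] = user_agent
    for c in cookies:
        session.cookies.set(c['name'], c['value'],
                            domain=c.get('domain'), path=c.get('path', '/'))
    if proxy_url:
        session.proxies = {'http': proxy_url, 'https': proxy_url}
    return session
```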

### Per-Module Proxy Usage

| Module | How Proxy is Used |
|--------|-------------------|
| coppermine_module | `requests.Session(proxies={...})` |
| imginn_module | Playwright `proxy` option |
| fastdl_module | Playwright `proxy` option |
| toolzu_module | Playwright `proxy` option |
| snapchat_scraper | Playwright `proxy` option (optional, configured in Scrapers page) |
| instaloader_module | Instaloader `proxy` parameter |
| tiktok_module | yt-dlp `--proxy` flag |
| forum_downloader | Playwright `proxy` option + requests |
| ytdlp | `--proxy` flag |
| gallerydl | `--proxy` flag |

---

## API Endpoints

### GET /api/scrapers

List all scrapers, with an optional type filter.

**Query Parameters:**

- `type` (optional): Filter by type ('direct', 'proxy', 'forum', 'cli_tool')

**Response:**

```json
{
  "scrapers": [
    {
      "id": "imginn",
      "name": "Imginn",
      "type": "proxy",
      "module": "imginn_module",
      "base_url": "https://imginn.com",
      "target_platform": "instagram",
      "enabled": true,
      "proxy_enabled": false,
      "proxy_url": null,
      "flaresolverr_required": true,
      "cookies_count": 23,
      "cookies_updated_at": "2025-12-01T10:30:00",
      "cookies_fresh": true,
      "last_test_at": "2025-12-01T10:30:00",
      "last_test_status": "success",
      "last_test_message": null
    }
  ]
}
```
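
The `cookies_count` and `cookies_fresh` fields are derived from the stored columns, not persisted. One way to compute them (the 24-hour freshness window is an assumption, not a spec):

```python
import json
import time
from datetime import datetime

FRESHNESS_WINDOW_SECONDS = 24 * 3600  # assumed threshold, tune as needed


def compute_cookie_status(cookies_json: str, cookies_updated_at: str) -> dict:
    """Derive the cookies_count / cookies_fresh response fields."""
    if not cookies_json:
        return {'cookies_count': 0, 'cookies_fresh': False}
    cookies = json.loads(cookies_json).get('cookies', [])
    updated = datetime.fromisoformat(cookies_updated_at)
    age_ok = (datetime.now() - updated).total_seconds() < FRESHNESS_WINDOW_SECONDS
    # expiry == -1 marks a session cookie (never counted as expired here)
    none_expired = all(c.get('expiry', -1) == -1 or c['expiry'] > time.time()
                       for c in cookies)
    return {'cookies_count': len(cookies),
            'cookies_fresh': age_ok and none_expired}
```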

### GET /api/scrapers/{id}

Get a single scraper configuration.

### PUT /api/scrapers/{id}

Update scraper settings.

**Request Body:**

```json
{
  "enabled": true,
  "proxy_enabled": true,
  "proxy_url": "socks5://user:pass@host:port",
  "base_url": "https://new-domain.com"
}
```

### POST /api/scrapers/{id}/test

Test the connection via FlareSolverr (if required) and save cookies on success.

**Response:**

```json
{
  "success": true,
  "message": "Connection successful, 23 cookies saved",
  "cookies_count": 23
}
```

### POST /api/scrapers/{id}/cookies

Upload cookies from a JSON file. Merges with existing cookies.

**Request Body:**

```json
{
  "cookies": [
    {"name": "session", "value": "abc123", "domain": ".example.com"}
  ]
}
```

**Response:**

```json
{
  "success": true,
  "message": "Merged 5 cookies (total: 28)",
  "cookies_count": 28
}
```

### DELETE /api/scrapers/{id}/cookies

Clear all cookies for a scraper.

---

## Frontend UI

### Settings > Scrapers Tab

The Scrapers tab displays all scrapers grouped by type/platform:

```
┌───────────────────────────────────────────────────────────────────────┐
│ Settings > Scrapers                                                   │
├───────────────────────────────────────────────────────────────────────┤
│ Filter: [All Types ▼]                                                 │
│                                                                       │
│ ─── Instagram Proxies ─────────────────────────────────────────────── │
│                                                                       │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ ● Imginn                                              [Enabled ▼] │ │
│ │ https://imginn.com                                                │ │
│ │ ☐ Use Proxy  [                    ]                               │ │
│ │ Cloudflare: Required │ Cookies: ✓ Fresh (2h ago, 23 cookies)      │ │
│ │ [Test Connection] [Upload Cookies] [Clear Cookies]                │ │
│ └───────────────────────────────────────────────────────────────────┘ │
│                                                                       │
│ ─── Direct ────────────────────────────────────────────────────────── │
│                                                                       │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ ● Instagram (Direct)                                  [Enabled ▼] │ │
│ │ https://instagram.com                                             │ │
│ │ ☐ Use Proxy  [                    ]                               │ │
│ │ Cloudflare: Not Required │ Cookies: ✓ 12 cookies                  │ │
│ │ [Test Connection] [Upload Cookies] [Clear Cookies]                │ │
│ └───────────────────────────────────────────────────────────────────┘ │
│                                                                       │
│ ─── Forums ────────────────────────────────────────────────────────── │
│                                                                       │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ ● Phun.org                                            [Enabled ▼] │ │
│ │ https://forum.phun.org                                            │ │
│ │ ☐ Use Proxy  [                    ]                               │ │
│ │ Cloudflare: Required │ Cookies: ⚠ Expired (3 days)                │ │
│ │ [Test Connection] [Upload Cookies] [Clear Cookies]                │ │
│ └───────────────────────────────────────────────────────────────────┘ │
│                                                                       │
│ ─── CLI Tools ─────────────────────────────────────────────────────── │
│                                                                       │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ ● yt-dlp                                              [Enabled ▼] │ │
│ │ Generic video downloader                                          │ │
│ │ ☐ Use Proxy  [                    ]                               │ │
│ │ [Test Connection] [Upload Cookies]                                │ │
│ └───────────────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────────────┘
```

### Button Visibility

| Button | When Shown |
|--------|------------|
| Test Connection | Always |
| Upload Cookies | Always |
| Clear Cookies | When cookies exist |

### No Add/Delete Buttons

Scrapers are NOT added or deleted from this UI. They are managed through:

- Forums settings (for forum scrapers)
- Platform settings (for other scrapers)

This UI only manages:

- Enable/disable
- Proxy configuration
- Cookie testing/upload/clear

---

## Module Integration

### Common Pattern

All modules follow this pattern to load scraper configuration:

```python
class SomeModule:
    def __init__(self, unified_db=None, scraper_id='some_scraper', ...):
        self.db = unified_db
        self.scraper_id = scraper_id

        # Load config from DB (fall back to {} if missing)
        self.config = (self.db.get_scraper(scraper_id) or {}) if self.db else {}

        # Check if enabled
        if not self.config.get('enabled', True):
            raise ScraperDisabledError(f"{scraper_id} is disabled")

        # Get base URL from DB (not hardcoded)
        self.base_url = self.config.get('base_url', 'https://default.com')

        # Get proxy config
        self.proxy_url = None
        if self.config.get('proxy_enabled') and self.config.get('proxy_url'):
            self.proxy_url = self.config['proxy_url']

        # Initialize CloudflareHandler with DB storage
        self.cf_handler = CloudflareHandler(
            module_name=self.scraper_id,
            scraper_id=self.scraper_id,
            unified_db=self.db,
            proxy_url=self.proxy_url,
            ...
        )
```

### CloudflareHandler Changes

```python
class CloudflareHandler:
    def __init__(self,
                 module_name: str,
                 scraper_id: str = None,    # For DB cookie storage
                 unified_db=None,           # DB reference
                 proxy_url: str = None,     # Proxy support
                 cookie_file: str = None,   # DEPRECATED: backwards compat
                 ...):
        self.scraper_id = scraper_id
        self.db = unified_db
        self.proxy_url = proxy_url

    def get_cookies_via_flaresolverr(self, url: str, max_retries: int = 2) -> bool:
        payload = {
            "cmd": "request.get",
            "url": url,
            "maxTimeout": 120000
        }
        # Add proxy if configured
        if self.proxy_url:
            payload["proxy"] = {"url": self.proxy_url}

        # ... rest of implementation

        # On success, merge cookies (don't replace)
        if success:
            existing = self.load_cookies_from_db()
            merged = self.merge_cookies(existing, new_cookies)
            self.save_cookies_to_db(merged)

    def load_cookies_from_db(self) -> list:
        if self.db and self.scraper_id:
            config = self.db.get_scraper(self.scraper_id)
            if config and config.get('cookies_json'):
                data = json.loads(config['cookies_json'])
                return data.get('cookies', [])
        return []

    def save_cookies_to_db(self, cookies: list, user_agent: str = None):
        if self.db and self.scraper_id:
            data = {
                'cookies': cookies,
                'user_agent': user_agent
            }
            self.db.update_scraper_cookies(self.scraper_id, json.dumps(data))

    def merge_cookies(self, existing: list, new: list) -> list:
        cookie_map = {c['name']: c for c in existing}
        for cookie in new:
            cookie_map[cookie['name']] = cookie
        return list(cookie_map.values())
```

---

## Scheduler Integration

The scheduler uses the scrapers table to determine what to run:

```python
def run_scheduled_downloads(self):
    # Get all enabled scrapers
    scrapers = self.db.get_all_scrapers()
    enabled_scrapers = [s for s in scrapers if s['enabled']]

    for scraper in enabled_scrapers:
        if scraper['type'] == 'forum':
            self.run_forum_download(scraper['id'])
        elif scraper['id'] == 'coppermine':
            self.run_coppermine_download()
        elif scraper['id'] == 'instagram':
            self.run_instagram_download()
        elif scraper['id'] == 'tiktok':
            self.run_tiktok_download()
        # etc.
```

---

## Migration Plan

### Step 1: Create Table

Add to `unified_database.py`:

```python
def _create_scrapers_table(self):
    self.cursor.execute('''
        CREATE TABLE IF NOT EXISTS scrapers (
            id TEXT PRIMARY KEY,
            name TEXT NOT NULL,
            type TEXT NOT NULL,
            module TEXT,
            base_url TEXT,
            target_platform TEXT,
            enabled INTEGER DEFAULT 1,
            proxy_enabled INTEGER DEFAULT 0,
            proxy_url TEXT,
            flaresolverr_required INTEGER DEFAULT 0,
            cookies_json TEXT,
            cookies_updated_at TEXT,
            last_test_at TEXT,
            last_test_status TEXT,
            last_test_message TEXT,
            settings_json TEXT,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP,
            updated_at TEXT DEFAULT CURRENT_TIMESTAMP
        )
    ''')
```

### Step 2: Seed Initial Data

```python
def _seed_scrapers(self):
    scrapers = [
        ('imginn', 'Imginn', 'proxy', 'imginn_module', 'https://imginn.com', 'instagram', 1),
        ('fastdl', 'FastDL', 'proxy', 'fastdl_module', 'https://fastdl.app', 'instagram', 1),
        ('toolzu', 'Toolzu', 'proxy', 'toolzu_module', 'https://toolzu.com', 'instagram', 1),
        ('snapchat', 'Snapchat Direct', 'direct', 'snapchat_scraper', 'https://snapchat.com', 'snapchat', 0),
        ('instagram', 'Instagram (Direct)', 'direct', 'instaloader_module', 'https://instagram.com', 'instagram', 0),
        ('tiktok', 'TikTok', 'direct', 'tiktok_module', 'https://tiktok.com', 'tiktok', 0),
        ('coppermine', 'Coppermine', 'direct', 'coppermine_module', 'https://hqdiesel.net', None, 1),
        ('forum_phun', 'Phun.org', 'forum', 'forum_downloader', 'https://forum.phun.org', None, 1),
        ('forum_hqcelebcorner', 'HQCelebCorner', 'forum', 'forum_downloader', 'https://hqcelebcorner.com', None, 0),
        ('forum_picturepub', 'PicturePub', 'forum', 'forum_downloader', 'https://picturepub.net', None, 0),
        ('ytdlp', 'yt-dlp', 'cli_tool', None, None, None, 0),
        ('gallerydl', 'gallery-dl', 'cli_tool', None, None, None, 0),
    ]

    for s in scrapers:
        self.cursor.execute('''
            INSERT OR IGNORE INTO scrapers
            (id, name, type, module, base_url, target_platform, flaresolverr_required)
            VALUES (?, ?, ?, ?, ?, ?, ?)
        ''', s)
```

### Step 3: Migrate Cookies

```python
def _migrate_cookies_to_db(self):
    cookie_files = {
        'coppermine': '/opt/media-downloader/cookies/coppermine_cookies.json',
        'imginn': '/opt/media-downloader/cookies/imginn_cookies.json',
        'fastdl': '/opt/media-downloader/cookies/fastdl_cookies.json',
        'snapchat': '/opt/media-downloader/cookies/snapchat_cookies.json',
        'forum_phun': '/opt/media-downloader/cookies/forum_cookies_phun.org.json',
        'forum_hqcelebcorner': '/opt/media-downloader/cookies/forum_cookies_HQCelebCorner.json',
        'forum_picturepub': '/opt/media-downloader/cookies/forum_cookies_PicturePub.json',
    }

    for scraper_id, cookie_file in cookie_files.items():
        if os.path.exists(cookie_file):
            try:
                with open(cookie_file, 'r') as f:
                    data = json.load(f)

                # Store in DB
                self.cursor.execute('''
                    UPDATE scrapers
                    SET cookies_json = ?, cookies_updated_at = ?
                    WHERE id = ?
                ''', (json.dumps(data), datetime.now().isoformat(), scraper_id))

                self.logger.info(f"Migrated cookies for {scraper_id}")
            except Exception as e:
                self.logger.error(f"Failed to migrate cookies for {scraper_id}: {e}")
```

### Step 4: Migrate Snapchat proxy_domain

```python
def _migrate_snapchat_proxy_domain(self):
    # Get the current proxy_domain from settings
    settings = self.get_setting('snapchat')
    if settings and 'proxy_domain' in settings:
        proxy_domain = settings['proxy_domain']
        base_url = f"https://{proxy_domain}"

        self.cursor.execute('''
            UPDATE scrapers SET base_url = ? WHERE id = 'snapchat'
        ''', (base_url,))

        # Remove from settings (now lives in the scrapers table)
        del settings['proxy_domain']
        self.save_setting('snapchat', settings)
```

---

## Implementation Order

| Step | Task | Files to Modify |
|------|------|-----------------|
| 1 | Database schema + migration | `unified_database.py` |
| 2 | Backend API endpoints | `api.py` |
| 3 | CloudflareHandler proxy + DB storage + merge logic | `cloudflare_handler.py` |
| 4 | Frontend Scrapers tab | `ScrapersTab.tsx`, `Settings.tsx`, `api.ts` |
| 5 | Update coppermine_module (test case) | `coppermine_module.py` |
| 6 | Test end-to-end | - |
| 7 | Update remaining modules | `imginn_module.py`, `fastdl_module.py`, `toolzu_module.py`, `snapchat_scraper.py`, `instaloader_module.py`, `tiktok_module.py`, `forum_downloader.py` |
| 8 | Update scheduler | `scheduler.py` |
| 9 | Cookie file cleanup | Remove old cookie files after verification |

---

## Testing Checklist

### Database

- [ ] Table created on first run
- [ ] Seed data populated correctly
- [ ] Cookies migrated from files
- [ ] Snapchat proxy_domain migrated

### API

- [ ] GET /api/scrapers returns all scrapers
- [ ] GET /api/scrapers?type=forum filters correctly
- [ ] PUT /api/scrapers/{id} updates settings
- [ ] POST /api/scrapers/{id}/test works with FlareSolverr
- [ ] POST /api/scrapers/{id}/test works with proxy
- [ ] POST /api/scrapers/{id}/cookies merges correctly
- [ ] DELETE /api/scrapers/{id}/cookies clears cookies

### Frontend

- [ ] Scrapers tab displays all scrapers
- [ ] Grouping by type works
- [ ] Filter dropdown works
- [ ] Enable/disable toggle works
- [ ] Proxy checkbox and URL input work
- [ ] Test Connection button works
- [ ] Upload Cookies button works
- [ ] Clear Cookies button works
- [ ] Cookie status shows correctly (fresh/expired/none)

### Modules

- [ ] coppermine_module loads config from DB
- [ ] coppermine_module uses proxy when configured
- [ ] coppermine_module uses cookies from DB
- [ ] All other modules updated and working

### Scheduler

- [ ] Only runs enabled scrapers
- [ ] Passes the correct scraper_id to modules

---

## Rollback Plan

If issues occur:

1. **Database**: The old cookie files are preserved as backups
2. **Modules**: Can fall back to reading cookie files if the DB fails
3. **API**: Add backwards compatibility for old endpoints if needed

---

## Future Enhancements

Potential additions not in the initial scope:

1. **Rotating proxies** - Support proxy pools with rotation
2. **Proxy health monitoring** - Track proxy success/failure rates
3. **Auto-refresh cookies** - Background job to refresh expiring cookies
4. **Cookie export** - Download cookies as JSON for backup
5. **Scraper metrics** - Track download success rates per scraper