Initial commit

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Todd
2026-03-29 22:42:55 -04:00
commit 0d7b2b1aab

# Scraper Proxy Configuration System
## Overview
This document describes the design and implementation plan for a centralized scraper configuration system that provides:
1. **Per-scraper proxy settings** - Configure different proxies for different scrapers
2. **Centralized cookie management** - Store cookies in database instead of files
3. **FlareSolverr integration** - Test connections and refresh Cloudflare cookies
4. **Cookie upload support** - Upload cookies from browser extensions for authenticated access
5. **Unified Settings UI** - Single place to manage all scraper configurations
## Background
### Problem Statement
- Proxy settings are not configurable per-module
- Cookies are stored in scattered JSON files
- No UI to test FlareSolverr connections or manage cookies
- Adding new forums requires code changes
- No visibility into cookie freshness or scraper health
### Solution
A new `scrapers` database table that:
- Stores configuration for all automated scrapers
- Provides proxy settings per-scraper
- Centralizes cookie storage with merge logic
- Syncs automatically with platform configurations
- Exposes management via Settings UI
---
## Database Schema
### Table: `scrapers`
```sql
CREATE TABLE scrapers (
    id TEXT PRIMARY KEY,
    name TEXT NOT NULL,
    type TEXT NOT NULL,                  -- 'direct', 'proxy', 'forum', 'cli_tool'
    module TEXT,                         -- Python module name, NULL for cli_tool
    base_url TEXT,                       -- Primary URL for the scraper
    target_platform TEXT,                -- 'instagram', 'snapchat', 'tiktok', NULL for forums/cli
    enabled INTEGER DEFAULT 1,           -- Enable/disable scraper

    -- Proxy settings
    proxy_enabled INTEGER DEFAULT 0,
    proxy_url TEXT,                      -- e.g., "socks5://user:pass@host:port"

    -- Cloudflare/Cookie settings
    flaresolverr_required INTEGER DEFAULT 0,
    cookies_json TEXT,                   -- JSON blob of cookies
    cookies_updated_at TEXT,             -- ISO timestamp of last cookie update

    -- Test status
    last_test_at TEXT,                   -- ISO timestamp of last test
    last_test_status TEXT,               -- 'success', 'failed', 'timeout'
    last_test_message TEXT,              -- Error message if failed

    -- Module-specific settings
    settings_json TEXT,                  -- Additional JSON settings per-scraper

    created_at TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_at TEXT DEFAULT CURRENT_TIMESTAMP
);
```
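One subtlety of this schema: SQLite's `DEFAULT CURRENT_TIMESTAMP` fires only on insert, so `updated_at` goes stale unless every UPDATE sets it explicitly. A trigger is one way to maintain it automatically; a minimal sketch against a trimmed-down version of the table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scrapers (
        id TEXT PRIMARY KEY,
        name TEXT NOT NULL,
        enabled INTEGER DEFAULT 1,
        updated_at TEXT DEFAULT CURRENT_TIMESTAMP
    );
    -- Keep updated_at current on every UPDATE. recursive_triggers is off
    -- by default, so the inner UPDATE does not re-fire the trigger.
    CREATE TRIGGER scrapers_touch AFTER UPDATE ON scrapers
    BEGIN
        UPDATE scrapers SET updated_at = CURRENT_TIMESTAMP WHERE id = NEW.id;
    END;
""")
conn.execute("INSERT INTO scrapers (id, name) VALUES ('imginn', 'Imginn')")
conn.execute("UPDATE scrapers SET enabled = 0 WHERE id = 'imginn'")
row = conn.execute(
    "SELECT enabled, updated_at FROM scrapers WHERE id = 'imginn'"
).fetchone()
```

The alternative is simply setting `updated_at = ?` in every UPDATE statement; the trigger trades a little hidden behavior for not having to remember that.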
### Column Definitions
| Column | Type | Description |
|--------|------|-------------|
| `id` | TEXT | Unique identifier (e.g., 'imginn', 'forum_phun') |
| `name` | TEXT | Display name shown in UI |
| `type` | TEXT | One of: 'direct', 'proxy', 'forum', 'cli_tool' |
| `module` | TEXT | Python module name (e.g., 'imginn_module'), NULL for CLI tools |
| `base_url` | TEXT | Primary URL for the service |
| `target_platform` | TEXT | What platform this scraper downloads from (instagram, snapchat, tiktok, NULL) |
| `enabled` | INTEGER | 1=enabled, 0=disabled |
| `proxy_enabled` | INTEGER | 1=use proxy, 0=direct connection |
| `proxy_url` | TEXT | Proxy URL (http, https, socks5 supported) |
| `flaresolverr_required` | INTEGER | 1=needs FlareSolverr for Cloudflare bypass |
| `cookies_json` | TEXT | JSON array of cookie objects |
| `cookies_updated_at` | TEXT | When cookies were last updated |
| `last_test_at` | TEXT | When connection was last tested |
| `last_test_status` | TEXT | Result of last test: 'success', 'failed', 'timeout' |
| `last_test_message` | TEXT | Error message from last failed test |
| `settings_json` | TEXT | Module-specific settings as JSON |
### Scraper Types
| Type | Description | Examples |
|------|-------------|----------|
| `direct` | Downloads directly from the platform | instagram, tiktok, snapchat, coppermine |
| `proxy` | Uses a proxy service to download | imginn, fastdl, toolzu |
| `forum` | Forum scraper | forum_phun, forum_hqcelebcorner, forum_picturepub |
| `cli_tool` | Command-line tool wrapper | ytdlp, gallerydl |
### Target Platforms
The `target_platform` field indicates what platform the scraper actually downloads content from:
| Scraper | Target Platform | Notes |
|---------|-----------------|-------|
| imginn | instagram | Proxy service for Instagram |
| fastdl | instagram | Proxy service for Instagram |
| toolzu | instagram | Proxy service for Instagram |
| snapchat | snapchat | Direct via Playwright scraper |
| instagram | instagram | Direct via Instaloader |
| tiktok | tiktok | Direct via yt-dlp internally |
| coppermine | NULL | Not a social platform |
| forum_* | NULL | Not a social platform |
| ytdlp | NULL | Generic tool, multiple platforms |
| gallerydl | NULL | Generic tool, multiple platforms |
---
## Seed Data
Initial scrapers to populate on first run:
| id | name | type | module | base_url | target_platform | flaresolverr_required |
|----|------|------|--------|----------|-----------------|----------------------|
| imginn | Imginn | proxy | imginn_module | https://imginn.com | instagram | 1 |
| fastdl | FastDL | proxy | fastdl_module | https://fastdl.app | instagram | 1 |
| toolzu | Toolzu | proxy | toolzu_module | https://toolzu.com | instagram | 1 |
| snapchat | Snapchat Direct | direct | snapchat_scraper | https://snapchat.com | snapchat | 0 |
| instagram | Instagram (Direct) | direct | instaloader_module | https://instagram.com | instagram | 0 |
| tiktok | TikTok | direct | tiktok_module | https://tiktok.com | tiktok | 0 |
| coppermine | Coppermine | direct | coppermine_module | https://hqdiesel.net | NULL | 1 |
| forum_phun | Phun.org | forum | forum_downloader | https://forum.phun.org | NULL | 1 |
| forum_hqcelebcorner | HQCelebCorner | forum | forum_downloader | https://hqcelebcorner.com | NULL | 0 |
| forum_picturepub | PicturePub | forum | forum_downloader | https://picturepub.net | NULL | 0 |
| ytdlp | yt-dlp | cli_tool | NULL | NULL | NULL | 0 |
| gallerydl | gallery-dl | cli_tool | NULL | NULL | NULL | 0 |
### Notes on Seed Data
1. **Snapchat**: Uses direct Playwright-based scraper with optional proxy support (configured per-scraper in Scrapers settings page)
2. **Forums**: Derived from existing `forum_threads` table entries and cookie files
3. **Excluded scrapers**: YouTube and Bilibili are NOT included - they are on-demand downloaders from the Video Downloader page, not scheduled scrapers
---
## Auto-Sync Logic
The scrapers table stays in sync with platform configurations automatically:
### When Forums Change
- New forum added in Forums settings → Create scraper entry with `type='forum'`
- Forum removed from settings → Remove scraper entry
### When Modules Are Enabled/Disabled
- Module enabled → Ensure scraper entry exists
- Module disabled → Scraper entry remains but `enabled=0`
### No Manual Add/Delete
- The Scrapers UI does NOT have Add or Delete buttons
- Scrapers are managed through their respective platform configuration pages
- Scrapers UI only manages: proxy settings, testing, cookies
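The create/remove rules above reduce to a set difference between the forum configs and the existing forum-type scraper rows. A sketch of the pure logic (the surrounding DB calls that would consume these sets, like `upsert_scraper`, are illustrative names, not the real API):

```python
def diff_forum_scrapers(forums: list, forum_scrapers: list) -> tuple:
    """Return (ids_to_create, ids_to_delete) so the scrapers table
    mirrors the Forums settings."""
    forum_ids = {f["id"] for f in forums}
    scraper_ids = {s["id"] for s in forum_scrapers}
    return forum_ids - scraper_ids, scraper_ids - forum_ids

# Example: one forum was added (forum_new), one removed (forum_old)
to_create, to_delete = diff_forum_scrapers(
    [{"id": "forum_phun"}, {"id": "forum_new"}],
    [{"id": "forum_phun"}, {"id": "forum_old"}],
)
```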
---
## Cookie Management
### Storage Format
Cookies are stored as JSON in the `cookies_json` column:
```json
{
  "cookies": [
    {
      "name": "cf_clearance",
      "value": "abc123...",
      "domain": ".imginn.com",
      "path": "/",
      "expiry": 1735689600
    },
    {
      "name": "session_id",
      "value": "xyz789...",
      "domain": "imginn.com",
      "path": "/",
      "expiry": -1
    }
  ],
  "user_agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36..."
}
```
### Cookie Merge Logic
**CRITICAL**: When updating cookies, MERGE with existing - never wipe:
```python
def merge_cookies(existing_cookies: list, new_cookies: list) -> list:
    """
    Merge new cookies into existing, preserving non-updated cookies.

    This ensures:
    - Cloudflare cookies (cf_clearance, __cf_bm) get refreshed
    - Site session/auth cookies are preserved
    - No data loss on test/refresh
    """
    # Index existing by name
    cookie_map = {c['name']: c for c in existing_cookies}
    # Update/add from new cookies
    for cookie in new_cookies:
        cookie_map[cookie['name']] = cookie
    return list(cookie_map.values())
```
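As a concrete check of the merge behavior, refreshing a Cloudflare cookie leaves an unrelated session cookie untouched (the function is repeated here so the snippet runs standalone):

```python
def merge_cookies(existing_cookies: list, new_cookies: list) -> list:
    cookie_map = {c['name']: c for c in existing_cookies}
    for cookie in new_cookies:
        cookie_map[cookie['name']] = cookie
    return list(cookie_map.values())

existing = [
    {"name": "cf_clearance", "value": "old"},
    {"name": "session_id", "value": "keep-me"},
]
new = [{"name": "cf_clearance", "value": "new"}]

# cf_clearance is refreshed, session_id is preserved
merged = {c["name"]: c["value"] for c in merge_cookies(existing, new)}
```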
### Cookie Sources
1. **FlareSolverr** - Automated Cloudflare bypass, returns CF cookies
2. **Upload** - User uploads JSON from browser extension (EditThisCookie, Cookie-Editor)
3. **Module** - Some modules save cookies during operation
### Cookie File Migration
Existing cookie files to migrate on first run:
| File | Scraper ID |
|------|------------|
| `cookies/coppermine_cookies.json` | coppermine |
| `cookies/imginn_cookies.json` | imginn |
| `cookies/fastdl_cookies.json` | fastdl |
| `cookies/snapchat_cookies.json` | snapchat |
| `cookies/forum_cookies_phun.org.json` | forum_phun |
| `cookies/forum_cookies_HQCelebCorner.json` | forum_hqcelebcorner |
| `cookies/forum_cookies_PicturePub.json` | forum_picturepub |
---
## Proxy Configuration
### Supported Proxy Formats
```
http://host:port
http://user:pass@host:port
https://host:port
https://user:pass@host:port
socks5://host:port
socks5://user:pass@host:port
```
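A lightweight validation sketch for these formats, using only the standard library. The scheme list mirrors the formats above; this is a sanity check for the Settings form, not a full URL validator:

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https", "socks5"}

def validate_proxy_url(proxy_url: str) -> tuple:
    """Return (ok, reason) for a user-supplied proxy URL."""
    parts = urlsplit(proxy_url)
    if parts.scheme not in ALLOWED_SCHEMES:
        return False, f"unsupported scheme: {parts.scheme!r}"
    if not parts.hostname:
        return False, "missing host"
    try:
        parts.port  # raises ValueError on a non-numeric port
    except ValueError:
        return False, "invalid port"
    return True, "ok"

ok, reason = validate_proxy_url("socks5://user:pass@host:1080")
```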
### FlareSolverr Proxy Integration
When a scraper has `proxy_enabled=1`, the proxy is passed to FlareSolverr:
```python
payload = {
    "cmd": "request.get",
    "url": url,
    "maxTimeout": 120000
}
if proxy_url:
    payload["proxy"] = {"url": proxy_url}
```
**Important**: Cloudflare cookies are tied to IP address. If FlareSolverr uses a proxy, subsequent requests MUST use the same proxy or cookies will be invalid.
### Per-Module Proxy Usage
| Module | How Proxy is Used |
|--------|-------------------|
| coppermine_module | `requests.Session(proxies={...})` |
| imginn_module | Playwright `proxy` option |
| fastdl_module | Playwright `proxy` option |
| toolzu_module | Playwright `proxy` option |
| snapchat_scraper | Playwright `proxy` option (optional, configured in Scrapers page) |
| instaloader_module | Instaloader `proxy` parameter |
| tiktok_module | yt-dlp `--proxy` flag |
| forum_downloader | Playwright `proxy` option + requests |
| ytdlp | `--proxy` flag |
| gallerydl | `--proxy` flag |
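Since one `proxy_url` string feeds three consumer shapes (requests' scheme map, Playwright's server/credentials dict, and a CLI flag), a small set of adapters can live in one shared helper. A sketch; the Playwright dict follows its documented `proxy` option, but verify the field names against the installed version:

```python
from urllib.parse import urlsplit

def proxy_for_requests(proxy_url: str) -> dict:
    # requests.Session expects a scheme -> proxy mapping
    return {"http": proxy_url, "https": proxy_url}

def proxy_for_playwright(proxy_url: str) -> dict:
    # Playwright takes the server and credentials as separate fields
    p = urlsplit(proxy_url)
    proxy = {"server": f"{p.scheme}://{p.hostname}:{p.port}"}
    if p.username:
        proxy["username"] = p.username
        proxy["password"] = p.password or ""
    return proxy

def proxy_args_for_cli(proxy_url: str) -> list:
    # yt-dlp and gallery-dl both accept --proxy
    return ["--proxy", proxy_url]

pw = proxy_for_playwright("socks5://user:pass@host:1080")
```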
---
## API Endpoints
### GET /api/scrapers
List all scrapers with optional type filter.
**Query Parameters:**
- `type` (optional): Filter by type ('direct', 'proxy', 'forum', 'cli_tool')
**Response:**
```json
{
  "scrapers": [
    {
      "id": "imginn",
      "name": "Imginn",
      "type": "proxy",
      "module": "imginn_module",
      "base_url": "https://imginn.com",
      "target_platform": "instagram",
      "enabled": true,
      "proxy_enabled": false,
      "proxy_url": null,
      "flaresolverr_required": true,
      "cookies_count": 23,
      "cookies_updated_at": "2025-12-01T10:30:00",
      "cookies_fresh": true,
      "last_test_at": "2025-12-01T10:30:00",
      "last_test_status": "success",
      "last_test_message": null
    }
  ]
}
```
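The derived fields `cookies_count` and `cookies_fresh` are not stored; they can be computed from `cookies_json` and `cookies_updated_at` at response time. A sketch, assuming a 24-hour freshness window (the real cutoff is a tuning choice):

```python
import json
from datetime import datetime, timedelta

FRESHNESS_WINDOW = timedelta(hours=24)  # assumed threshold

def cookie_summary(cookies_json, cookies_updated_at, now=None) -> dict:
    """Compute the derived cookie fields for one API response row."""
    now = now or datetime.now()
    cookies = json.loads(cookies_json).get("cookies", []) if cookies_json else []
    fresh = bool(
        cookies_updated_at
        and now - datetime.fromisoformat(cookies_updated_at) < FRESHNESS_WINDOW
    )
    return {"cookies_count": len(cookies), "cookies_fresh": fresh}

summary = cookie_summary(
    '{"cookies": [{"name": "cf_clearance", "value": "abc"}]}',
    "2025-12-01T10:30:00",
    now=datetime(2025, 12, 1, 12, 0, 0),
)
```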
### GET /api/scrapers/{id}
Get single scraper configuration.
### PUT /api/scrapers/{id}
Update scraper settings.
**Request Body:**
```json
{
  "enabled": true,
  "proxy_enabled": true,
  "proxy_url": "socks5://user:pass@host:port",
  "base_url": "https://new-domain.com"
}
```
### POST /api/scrapers/{id}/test
Test connection via FlareSolverr (if required) and save cookies on success.
**Response:**
```json
{
  "success": true,
  "message": "Connection successful, 23 cookies saved",
  "cookies_count": 23
}
```
### POST /api/scrapers/{id}/cookies
Upload cookies from JSON file. Merges with existing cookies.
**Request Body:**
```json
{
  "cookies": [
    {"name": "session", "value": "abc123", "domain": ".example.com"}
  ]
}
```
**Response:**
```json
{
  "success": true,
  "message": "Merged 5 cookies (total: 28)",
  "cookies_count": 28
}
```
### DELETE /api/scrapers/{id}/cookies
Clear all cookies for a scraper.
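For reference, a minimal client for these endpoints using only the standard library. The host/port is an assumption, and only request construction is shown since actually sending it requires the running server:

```python
import json
import urllib.request

BASE = "http://localhost:8080"  # assumed API host/port

def build_request(method: str, path: str, body=None) -> urllib.request.Request:
    """Construct an HTTP request for a scraper API call."""
    data = json.dumps(body).encode() if body is not None else None
    return urllib.request.Request(
        f"{BASE}{path}", data=data, method=method,
        headers={"Content-Type": "application/json"},
    )

# Typical flow: point imginn at a proxy, then re-test the connection
req = build_request("PUT", "/api/scrapers/imginn",
                    {"proxy_enabled": True,
                     "proxy_url": "socks5://user:pass@host:1080"})
# urllib.request.urlopen(req) would perform the call against a live server
```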
---
## Frontend UI
### Settings > Scrapers Tab
The Scrapers tab displays all scrapers grouped by type/platform:
```
┌───────────────────────────────────────────────────────────────────────┐
│ Settings > Scrapers │
├───────────────────────────────────────────────────────────────────────┤
│ Filter: [All Types ▼] │
│ │
│ ─── Instagram Proxies ────────────────────────────────────────────── │
│ │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ ● Imginn [Enabled ▼] │ │
│ │ https://imginn.com │ │
│ │ ☐ Use Proxy [ ] │ │
│ │ Cloudflare: Required │ Cookies: ✓ Fresh (2h ago, 23 cookies) │ │
│ │ [Test Connection] [Upload Cookies] [Clear Cookies] │ │
│ └───────────────────────────────────────────────────────────────────┘ │
│ │
│ ─── Direct ───────────────────────────────────────────────────────── │
│ │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ ● Instagram (Direct) [Enabled ▼] │ │
│ │ https://instagram.com │ │
│ │ ☐ Use Proxy [ ] │ │
│ │ Cloudflare: Not Required │ Cookies: ✓ 12 cookies │ │
│ │ [Test Connection] [Upload Cookies] [Clear Cookies] │ │
│ └───────────────────────────────────────────────────────────────────┘ │
│ │
│ ─── Forums ───────────────────────────────────────────────────────── │
│ │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ ● Phun.org [Enabled ▼] │ │
│ │ https://forum.phun.org │ │
│ │ ☐ Use Proxy [ ] │ │
│ │ Cloudflare: Required │ Cookies: ⚠ Expired (3 days) │ │
│ │ [Test Connection] [Upload Cookies] [Clear Cookies] │ │
│ └───────────────────────────────────────────────────────────────────┘ │
│ │
│ ─── CLI Tools ────────────────────────────────────────────────────── │
│ │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ ● yt-dlp [Enabled ▼] │ │
│ │ Generic video downloader │ │
│ │ ☐ Use Proxy [ ] │ │
│ │ [Test Connection] [Upload Cookies] │ │
│ └───────────────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────────────┘
```
### Button Visibility
| Button | When Shown |
|--------|------------|
| Test Connection | Always |
| Upload Cookies | Always |
| Clear Cookies | When cookies exist |
### No Add/Delete Buttons
Scrapers are NOT added or deleted from this UI. They are managed through:
- Forums settings (for forum scrapers)
- Platform settings (for other scrapers)
This UI only manages:
- Enable/disable
- Proxy configuration
- Cookie testing/upload/clear
---
## Module Integration
### Common Pattern
All modules follow this pattern to load scraper configuration:
```python
class SomeModule:
    def __init__(self, unified_db=None, scraper_id='some_scraper', ...):
        self.db = unified_db
        self.scraper_id = scraper_id

        # Load config from DB
        self.config = self.db.get_scraper(scraper_id) if self.db else {}

        # Check if enabled
        if not self.config.get('enabled', True):
            raise ScraperDisabledError(f"{scraper_id} is disabled")

        # Get base URL from DB (not hardcoded)
        self.base_url = self.config.get('base_url', 'https://default.com')

        # Get proxy config
        self.proxy_url = None
        if self.config.get('proxy_enabled') and self.config.get('proxy_url'):
            self.proxy_url = self.config['proxy_url']

        # Initialize CloudflareHandler with DB storage
        self.cf_handler = CloudflareHandler(
            module_name=self.scraper_id,
            scraper_id=self.scraper_id,
            unified_db=self.db,
            proxy_url=self.proxy_url,
            ...
        )
```
### CloudflareHandler Changes
```python
class CloudflareHandler:
    def __init__(self,
                 module_name: str,
                 scraper_id: str = None,    # For DB cookie storage
                 unified_db = None,         # DB reference
                 proxy_url: str = None,     # Proxy support
                 cookie_file: str = None,   # DEPRECATED: backwards compat
                 ...):
        self.scraper_id = scraper_id
        self.db = unified_db
        self.proxy_url = proxy_url

    def get_cookies_via_flaresolverr(self, url: str, max_retries: int = 2) -> bool:
        payload = {
            "cmd": "request.get",
            "url": url,
            "maxTimeout": 120000
        }
        # Add proxy if configured
        if self.proxy_url:
            payload["proxy"] = {"url": self.proxy_url}

        # ... rest of implementation

        # On success, merge cookies (don't replace)
        if success:
            existing = self.load_cookies_from_db()
            merged = self.merge_cookies(existing, new_cookies)
            self.save_cookies_to_db(merged)

    def load_cookies_from_db(self) -> list:
        if self.db and self.scraper_id:
            config = self.db.get_scraper(self.scraper_id)
            if config and config.get('cookies_json'):
                data = json.loads(config['cookies_json'])
                return data.get('cookies', [])
        return []

    def save_cookies_to_db(self, cookies: list, user_agent: str = None):
        if self.db and self.scraper_id:
            data = {
                'cookies': cookies,
                'user_agent': user_agent
            }
            self.db.update_scraper_cookies(self.scraper_id, json.dumps(data))

    def merge_cookies(self, existing: list, new: list) -> list:
        cookie_map = {c['name']: c for c in existing}
        for cookie in new:
            cookie_map[cookie['name']] = cookie
        return list(cookie_map.values())
```
---
## Scheduler Integration
The scheduler uses the scrapers table to determine what to run:
```python
def run_scheduled_downloads(self):
    # Get all enabled scrapers
    scrapers = self.db.get_all_scrapers()
    enabled_scrapers = [s for s in scrapers if s['enabled']]

    for scraper in enabled_scrapers:
        if scraper['type'] == 'forum':
            self.run_forum_download(scraper['id'])
        elif scraper['id'] == 'coppermine':
            self.run_coppermine_download()
        elif scraper['id'] == 'instagram':
            self.run_instagram_download()
        elif scraper['id'] == 'tiktok':
            self.run_tiktok_download()
        # etc.
```
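The id-based `elif` chain grows with every new scraper; a dispatch table is one way to keep it flat, with forums sharing a single runner parameterized by id. A sketch with stand-in runner callables (the real runners are the `run_*` methods above):

```python
def pick_job(scraper: dict, runners: dict, forum_runner):
    """Resolve one scraper row to a zero-arg job, or None if unknown."""
    if scraper["type"] == "forum":
        return lambda: forum_runner(scraper["id"])
    return runners.get(scraper["id"])

# Stand-in runners that just record what ran
log = []
runners = {"tiktok": lambda: log.append("tiktok")}
forum_runner = lambda forum_id: log.append(f"forum:{forum_id}")

pick_job({"id": "forum_phun", "type": "forum"}, runners, forum_runner)()
pick_job({"id": "tiktok", "type": "direct"}, runners, forum_runner)()
```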
---
## Migration Plan
### Step 1: Create Table
Add to `unified_database.py`:
```python
def _create_scrapers_table(self):
    self.cursor.execute('''
        CREATE TABLE IF NOT EXISTS scrapers (
            id TEXT PRIMARY KEY,
            name TEXT NOT NULL,
            type TEXT NOT NULL,
            module TEXT,
            base_url TEXT,
            target_platform TEXT,
            enabled INTEGER DEFAULT 1,
            proxy_enabled INTEGER DEFAULT 0,
            proxy_url TEXT,
            flaresolverr_required INTEGER DEFAULT 0,
            cookies_json TEXT,
            cookies_updated_at TEXT,
            last_test_at TEXT,
            last_test_status TEXT,
            last_test_message TEXT,
            settings_json TEXT,
            created_at TEXT DEFAULT CURRENT_TIMESTAMP,
            updated_at TEXT DEFAULT CURRENT_TIMESTAMP
        )
    ''')
```
### Step 2: Seed Initial Data
```python
def _seed_scrapers(self):
    scrapers = [
        ('imginn', 'Imginn', 'proxy', 'imginn_module', 'https://imginn.com', 'instagram', 1),
        ('fastdl', 'FastDL', 'proxy', 'fastdl_module', 'https://fastdl.app', 'instagram', 1),
        ('toolzu', 'Toolzu', 'proxy', 'toolzu_module', 'https://toolzu.com', 'instagram', 1),
        ('snapchat', 'Snapchat Direct', 'direct', 'snapchat_scraper', 'https://snapchat.com', 'snapchat', 0),
        ('instagram', 'Instagram (Direct)', 'direct', 'instaloader_module', 'https://instagram.com', 'instagram', 0),
        ('tiktok', 'TikTok', 'direct', 'tiktok_module', 'https://tiktok.com', 'tiktok', 0),
        ('coppermine', 'Coppermine', 'direct', 'coppermine_module', 'https://hqdiesel.net', None, 1),
        ('forum_phun', 'Phun.org', 'forum', 'forum_downloader', 'https://forum.phun.org', None, 1),
        ('forum_hqcelebcorner', 'HQCelebCorner', 'forum', 'forum_downloader', 'https://hqcelebcorner.com', None, 0),
        ('forum_picturepub', 'PicturePub', 'forum', 'forum_downloader', 'https://picturepub.net', None, 0),
        ('ytdlp', 'yt-dlp', 'cli_tool', None, None, None, 0),
        ('gallerydl', 'gallery-dl', 'cli_tool', None, None, None, 0),
    ]
    for s in scrapers:
        self.cursor.execute('''
            INSERT OR IGNORE INTO scrapers
            (id, name, type, module, base_url, target_platform, flaresolverr_required)
            VALUES (?, ?, ?, ?, ?, ?, ?)
        ''', s)
```
### Step 3: Migrate Cookies
```python
def _migrate_cookies_to_db(self):
    cookie_files = {
        'coppermine': '/opt/media-downloader/cookies/coppermine_cookies.json',
        'imginn': '/opt/media-downloader/cookies/imginn_cookies.json',
        'fastdl': '/opt/media-downloader/cookies/fastdl_cookies.json',
        'snapchat': '/opt/media-downloader/cookies/snapchat_cookies.json',
        'forum_phun': '/opt/media-downloader/cookies/forum_cookies_phun.org.json',
        'forum_hqcelebcorner': '/opt/media-downloader/cookies/forum_cookies_HQCelebCorner.json',
        'forum_picturepub': '/opt/media-downloader/cookies/forum_cookies_PicturePub.json',
    }
    for scraper_id, cookie_file in cookie_files.items():
        if os.path.exists(cookie_file):
            try:
                with open(cookie_file, 'r') as f:
                    data = json.load(f)
                # Store in DB
                self.cursor.execute('''
                    UPDATE scrapers
                    SET cookies_json = ?, cookies_updated_at = ?
                    WHERE id = ?
                ''', (json.dumps(data), datetime.now().isoformat(), scraper_id))
                self.logger.info(f"Migrated cookies for {scraper_id}")
            except Exception as e:
                self.logger.error(f"Failed to migrate cookies for {scraper_id}: {e}")
```
### Step 4: Migrate Snapchat proxy_domain
```python
def _migrate_snapchat_proxy_domain(self):
    # Get current proxy_domain from settings
    settings = self.get_setting('snapchat')
    if settings and 'proxy_domain' in settings:
        proxy_domain = settings['proxy_domain']
        base_url = f"https://{proxy_domain}"
        self.cursor.execute('''
            UPDATE scrapers SET base_url = ? WHERE id = 'snapchat'
        ''', (base_url,))
        # Remove from settings (now in scrapers table)
        del settings['proxy_domain']
        self.save_setting('snapchat', settings)
```
---
## Implementation Order
| Step | Task | Files to Modify |
|------|------|-----------------|
| 1 | Database schema + migration | `unified_database.py` |
| 2 | Backend API endpoints | `api.py` |
| 3 | CloudflareHandler proxy + DB storage + merge logic | `cloudflare_handler.py` |
| 4 | Frontend Scrapers tab | `ScrapersTab.tsx`, `Settings.tsx`, `api.ts` |
| 5 | Update coppermine_module (test case) | `coppermine_module.py` |
| 6 | Test end-to-end | - |
| 7 | Update remaining modules | `imginn_module.py`, `fastdl_module.py`, `toolzu_module.py`, `snapchat_scraper.py`, `instaloader_module.py`, `tiktok_module.py`, `forum_downloader.py` |
| 8 | Update scheduler | `scheduler.py` |
| 9 | Cookie file cleanup | Remove old cookie files after verification |
---
## Testing Checklist
### Database
- [ ] Table created on first run
- [ ] Seed data populated correctly
- [ ] Cookies migrated from files
- [ ] Snapchat proxy_domain migrated
### API
- [ ] GET /api/scrapers returns all scrapers
- [ ] GET /api/scrapers?type=forum filters correctly
- [ ] PUT /api/scrapers/{id} updates settings
- [ ] POST /api/scrapers/{id}/test works with FlareSolverr
- [ ] POST /api/scrapers/{id}/test works with proxy
- [ ] POST /api/scrapers/{id}/cookies merges correctly
- [ ] DELETE /api/scrapers/{id}/cookies clears cookies
### Frontend
- [ ] Scrapers tab displays all scrapers
- [ ] Grouping by type works
- [ ] Filter dropdown works
- [ ] Enable/disable toggle works
- [ ] Proxy checkbox and URL input work
- [ ] Test Connection button works
- [ ] Upload Cookies button works
- [ ] Clear Cookies button works
- [ ] Cookie status shows correctly (fresh/expired/none)
### Modules
- [ ] coppermine_module loads config from DB
- [ ] coppermine_module uses proxy when configured
- [ ] coppermine_module uses cookies from DB
- [ ] All other modules updated and working
### Scheduler
- [ ] Only runs enabled scrapers
- [ ] Passes correct scraper_id to modules
---
## Rollback Plan
If issues occur:
1. **Database**: The old cookie files are preserved as backups
2. **Modules**: Can fall back to reading cookie files if DB fails
3. **API**: Add backwards compatibility for old endpoints if needed
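Point 2 can be implemented as a DB-first read with a file fallback. A sketch; `get_scraper` and `cookies_json` follow the schema in this document, and the fake DB class exists only so the snippet runs standalone:

```python
import json
import os

def load_cookies(db, scraper_id: str, cookie_file) -> list:
    """Load cookies from the scrapers table, falling back to the legacy
    cookie file if the DB is unavailable or empty."""
    try:
        config = db.get_scraper(scraper_id) if db else None
        if config and config.get("cookies_json"):
            return json.loads(config["cookies_json"]).get("cookies", [])
    except Exception:
        pass  # fall through to the legacy file
    if cookie_file and os.path.exists(cookie_file):
        with open(cookie_file) as f:
            return json.load(f).get("cookies", [])
    return []

class _FakeDB:  # stand-in for unified_db, illustration only
    def get_scraper(self, scraper_id):
        return {"cookies_json": json.dumps(
            {"cookies": [{"name": "s", "value": "1"}]})}

cookies = load_cookies(_FakeDB(), "imginn", cookie_file=None)
```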
---
## Future Enhancements
Potential additions not in initial scope:
1. **Rotating proxies** - Support proxy pools with rotation
2. **Proxy health monitoring** - Track proxy success/failure rates
3. **Auto-refresh cookies** - Background job to refresh expiring cookies
4. **Cookie export** - Download cookies as JSON for backup
5. **Scraper metrics** - Track download success rates per scraper