Universal Cloudflare Handler
Version: 12.0.1
Module: modules/cloudflare_handler.py
Status: Production
Overview
The Universal Cloudflare Handler provides centralized Cloudflare bypass, error detection, cookie management, and dynamic browser fingerprinting for all download modules in the media-downloader system.
Features
1. Site Status Detection
Before attempting downloads, the handler checks if the target site is accessible:
- WORKING - Site is accessible and responding normally
- SERVER_ERROR - HTTP 500, 502, 503, 504 errors (site is down)
- CLOUDFLARE_CHALLENGE - Cloudflare challenge page detected
- FORBIDDEN - HTTP 403 access denied
- TIMEOUT - Request timed out
- UNKNOWN_ERROR - Other errors
2. Smart Skip Logic
Downloads are automatically skipped when:
- Site returns server errors (500, 502, 503, 504)
- Request times out
- Unknown errors occur
This prevents wasting time and resources on unavailable sites.
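The status list and the skip rule can be sketched together as a response classifier plus a skip predicate. This is a hypothetical illustration, not the handler's code. The enum values, the challenge markers (the `cf-mitigated: challenge` response header and the "Just a moment..." page title) and passing a timed-out request as `status_code=None` are all assumptions. Challenge detection runs first because Cloudflare commonly serves challenge pages with 403 or 503, which would otherwise be misread as FORBIDDEN or SERVER_ERROR.

```python
from enum import Enum
from typing import Mapping, Optional


class SiteStatus(Enum):
    # Mirrors the statuses listed above; the real enum's values may differ
    WORKING = "working"
    SERVER_ERROR = "server_error"
    CLOUDFLARE_CHALLENGE = "cloudflare_challenge"
    FORBIDDEN = "forbidden"
    TIMEOUT = "timeout"
    UNKNOWN_ERROR = "unknown_error"


# Heuristic markers of a Cloudflare challenge page (assumed, not exhaustive)
CHALLENGE_MARKERS = ("Just a moment...", "challenge-platform", "cf-chl")

# Statuses that make a download pointless (see Smart Skip Logic)
SKIP_STATUSES = {SiteStatus.SERVER_ERROR, SiteStatus.TIMEOUT, SiteStatus.UNKNOWN_ERROR}


def classify_response(status_code: Optional[int], headers: Mapping[str, str], body: str) -> SiteStatus:
    """Map an HTTP result to a SiteStatus; status_code=None means the request timed out.

    `headers` is assumed to use lowercase keys (a plain dict is case-sensitive).
    """
    if status_code is None:
        return SiteStatus.TIMEOUT
    # Check for a challenge before looking at 403/5xx codes
    if headers.get("cf-mitigated") == "challenge" or any(m in body for m in CHALLENGE_MARKERS):
        return SiteStatus.CLOUDFLARE_CHALLENGE
    if status_code in (500, 502, 503, 504):
        return SiteStatus.SERVER_ERROR
    if status_code == 403:
        return SiteStatus.FORBIDDEN
    if 200 <= status_code < 400:
        return SiteStatus.WORKING
    return SiteStatus.UNKNOWN_ERROR


def should_skip_download(status: SiteStatus) -> bool:
    # Challenges and 403s are not skipped: a FlareSolverr bypass may still succeed
    return status in SKIP_STATUSES
```

Keeping FORBIDDEN and CLOUDFLARE_CHALLENGE out of the skip set matches the list above: only server errors, timeouts and unknown errors abort a download outright.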
3. FlareSolverr Integration
- Automatic Cloudflare bypass using FlareSolverr
- Configurable retry logic (default: 2 attempts)
- 120-second timeout for difficult challenges
- Detects cf_clearance cookie presence
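A minimal sketch of one FlareSolverr round-trip with retries, using only the standard library. The `request.get` command, the `maxTimeout` field (in milliseconds) and the `solution.cookies` / `solution.userAgent` reply fields are part of FlareSolverr's v1 API. The function names and the injectable `send` parameter are hypothetical; `send` exists so the retry logic can be exercised without a running server.

```python
import json
import urllib.request
from typing import Callable, Optional

FLARESOLVERR_URL = "http://localhost:8191/v1"  # default endpoint (see Configuration)


def build_payload(url: str, timeout_s: int = 120) -> dict:
    # FlareSolverr's request.get command takes maxTimeout in milliseconds
    return {"cmd": "request.get", "url": url, "maxTimeout": timeout_s * 1000}


def post_json(payload: dict, timeout_s: int) -> dict:
    """Send one request to FlareSolverr and decode the JSON reply."""
    req = urllib.request.Request(
        FLARESOLVERR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # Allow the HTTP call a little longer than FlareSolverr's own timeout
    with urllib.request.urlopen(req, timeout=timeout_s + 10) as resp:
        return json.load(resp)


def extract_clearance(reply: dict) -> Optional[dict]:
    """Return the solution if it carries a cf_clearance cookie, else None."""
    if reply.get("status") != "ok":
        return None
    solution = reply.get("solution") or {}
    cookies = solution.get("cookies") or []
    if any(c.get("name") == "cf_clearance" for c in cookies):
        return solution
    return None


def solve_challenge(url: str, max_retries: int = 2, timeout_s: int = 120,
                    send: Callable[[dict, int], dict] = post_json) -> Optional[dict]:
    """Try up to max_retries times; return the first solution with cf_clearance."""
    payload = build_payload(url, timeout_s)
    for _attempt in range(max_retries):
        try:
            solution = extract_clearance(send(payload, timeout_s))
        except (OSError, ValueError):
            # URLError/HTTPError/timeouts are OSError subclasses; bad JSON is a ValueError
            solution = None
        if solution is not None:
            return solution  # includes the cookies and the userAgent FlareSolverr used
    return None
```

The returned `userAgent` matters as much as the cookies: it is the value that must later be stored alongside them and reused by Playwright.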
4. Cookie Management
For Playwright (Browser Automation)
```python
# Load cookies into browser context
cf_handler.load_cookies_to_playwright(context)

# Save cookies from browser
cf_handler.save_cookies_from_playwright(context)

# Get cookies as list
cookies = cf_handler.get_cookies_list()
```
For Requests (HTTP Library)
```python
# Load cookies into session
cf_handler.load_cookies_to_requests(session)

# Get cookies as dictionary
cookies = cf_handler.get_cookies_dict()
```
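The handler hides format differences between the two libraries. A hypothetical sketch of the conversions involved, assuming the stored cookie layout shown under Cookie Storage (which uses an `expiry` key). Playwright's `context.add_cookies()` and `context.cookies()` instead use `expires`, in Unix seconds, with `-1` marking a session cookie. The function names are illustrative, not the module's API.

```python
from typing import Dict, List


def to_playwright(cookies: List[dict]) -> List[dict]:
    """Convert stored cookies ("expiry" key) to the shape context.add_cookies() accepts."""
    converted = []
    for c in cookies:
        pw = {
            "name": c["name"],
            "value": c["value"],
            "domain": c["domain"],
            "path": c.get("path", "/"),
        }
        if c.get("expiry") is not None:
            pw["expires"] = float(c["expiry"])  # Playwright calls this field "expires"
        converted.append(pw)
    return converted


def from_playwright(cookies: List[dict]) -> List[dict]:
    """Convert context.cookies() output back to the stored shape."""
    stored = []
    for c in cookies:
        item = {"name": c["name"], "value": c["value"], "domain": c["domain"], "path": c["path"]}
        if c.get("expires", -1) != -1:  # -1 marks a session cookie: no expiry stored
            item["expiry"] = int(c["expires"])
        stored.append(item)
    return stored


def to_requests_dict(cookies: List[dict]) -> Dict[str, str]:
    # Flat name -> value mapping, as returned by a get_cookies_dict()-style helper
    return {c["name"]: c["value"] for c in cookies}
```

For `requests`, the flat mapping can be applied with `session.cookies.update(...)`; when the cookie domain matters, `session.cookies.set(name, value, domain=..., path=...)` keeps it.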
5. Cookie Expiration Strategies
Aggressive Mode (Default)
- Cookies are treated as expired if they were saved more than 12 hours ago
- Cookies are treated as expired if any stored cookie expires within the next 7 days
- Used by: imginn, fastdl, toolzu, snapchat
Conservative Mode
- Cookies are treated as expired only once the cf_clearance cookie itself has expired
- Minimizes FlareSolverr calls
- Used by: coppermine
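Both strategies reduce to one predicate over the stored JSON shown under Cookie Storage (`cookies` list plus an ISO `timestamp`). A minimal sketch; the function name is hypothetical, and treating a missing cf_clearance cookie as expired in conservative mode is an assumption the description above does not settle.

```python
from datetime import datetime, timedelta
from typing import Optional

MAX_AGE = timedelta(hours=12)      # aggressive: refresh cookies saved more than 12 hours ago
EXPIRY_MARGIN = timedelta(days=7)  # aggressive: refresh if any cookie expires within 7 days


def cookies_expired(data: dict, aggressive: bool = True, now: Optional[datetime] = None) -> bool:
    """Return True if the stored cookies should be refreshed via FlareSolverr.

    `data` is {"cookies": [...], "timestamp": "YYYY-MM-DDTHH:MM:SS"}; each cookie's
    "expiry" is a Unix timestamp. Naive datetimes are interpreted as local time.
    """
    if now is None:
        now = datetime.now()
    cookies = data.get("cookies", [])
    if not cookies:
        return True
    if aggressive:
        saved_at = datetime.fromisoformat(data["timestamp"])
        if now - saved_at > MAX_AGE:
            return True
        deadline = (now + EXPIRY_MARGIN).timestamp()
        return any(c.get("expiry") is not None and c["expiry"] < deadline for c in cookies)
    # Conservative: only the cf_clearance cookie's own expiry matters
    clearance = next((c for c in cookies if c.get("name") == "cf_clearance"), None)
    if clearance is None:
        return True  # assumption: no cf_clearance at all means a refresh is needed
    expiry = clearance.get("expiry")
    return expiry is not None and expiry <= now.timestamp()
```

The trade-off is visible in the code: aggressive mode refreshes on age alone, while conservative mode keeps a cf_clearance cookie until the moment it lapses, which is why it minimizes FlareSolverr calls.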
6. Dynamic Browser Fingerprinting (v12.0.1)
This is required for cf_clearance cookies to be accepted.
The cf_clearance cookie is tied to the browser fingerprint (User-Agent, headers, etc.). If Playwright uses a different fingerprint than FlareSolverr, the cookies will be rejected.
Key Functions
```python
from modules.cloudflare_handler import (
    get_flaresolverr_fingerprint,
    get_playwright_context_options,
    get_playwright_stealth_scripts,
    set_fingerprint_database,
)

# Initialize database persistence (call once at startup)
set_fingerprint_database(unified_db)

# Get complete fingerprint (instant from cache/database)
fingerprint = get_flaresolverr_fingerprint()
# Returns: user_agent, sec_ch_ua, locale, timezone, viewport, etc.

# Get ready-to-use Playwright context options
# (`browser` is an already-launched Playwright browser)
context_options = get_playwright_context_options()
context = browser.new_context(**context_options)

# Add anti-detection scripts before the first navigation
page = context.new_page()
page.add_init_script(get_playwright_stealth_scripts())
```
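A hypothetical sketch of what a helper like `get_playwright_context_options()` has to produce from the fingerprint. The fingerprint keys are assumed from the `# Returns:` comment above; `user_agent`, `locale`, `timezone_id`, `viewport` and `extra_http_headers` are real keyword arguments of Playwright's `browser.new_context()`. Only keys present in the fingerprint are passed, so Playwright's own defaults apply to the rest.

```python
from typing import Any, Dict

# Fingerprint key -> browser.new_context() keyword (fingerprint keys are assumed)
_KEY_MAP = (("locale", "locale"), ("timezone", "timezone_id"), ("viewport", "viewport"))


def context_options_from_fingerprint(fp: Dict[str, Any]) -> Dict[str, Any]:
    """Build new_context() kwargs that reproduce FlareSolverr's browser fingerprint."""
    # The user agent must match FlareSolverr's exactly or cf_clearance is rejected
    options: Dict[str, Any] = {"user_agent": fp["user_agent"]}
    for fp_key, pw_key in _KEY_MAP:
        if fp.get(fp_key) is not None:
            options[pw_key] = fp[fp_key]
    if fp.get("sec_ch_ua"):
        # Client-hint header that should agree with the user agent
        options["extra_http_headers"] = {"sec-ch-ua": fp["sec_ch_ua"]}
    return options
```

Usage: `context = browser.new_context(**context_options_from_fingerprint(fingerprint))`.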
Fingerprint Persistence
Fingerprints are cached in three layers:
- Memory cache - Instant access during session
- Database - Persists across restarts (key_value_store table)
- FlareSolverr fetch - Fallback if no cache available
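The three-layer lookup can be sketched as a small cache class. The `KeyValueStore` interface, the `DB_KEY` name and the `fetch` callable are hypothetical stand-ins for the `key_value_store` table and the FlareSolverr request; the module's real persistence API may differ.

```python
import json
from typing import Callable, Dict, Optional, Protocol


class KeyValueStore(Protocol):
    # Hypothetical interface standing in for the key_value_store table
    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...


class FingerprintCache:
    """Memory -> database -> FlareSolverr lookup (sketch)."""

    DB_KEY = "flaresolverr_fingerprint"  # hypothetical key name

    def __init__(self, db: Optional[KeyValueStore], fetch: Callable[[], Dict]):
        self._db = db
        self._fetch = fetch  # e.g. asks FlareSolverr which user agent it uses
        self._memory: Optional[Dict] = None

    def get(self) -> Dict:
        if self._memory is not None:  # layer 1: memory
            return self._memory
        if self._db is not None:  # layer 2: database
            raw = self._db.get(self.DB_KEY)
            if raw:
                self._memory = json.loads(raw)
                return self._memory
        fingerprint = self._fetch()  # layer 3: FlareSolverr
        self._memory = fingerprint
        if self._db is not None:
            # Persist so the next process start can skip the FlareSolverr round-trip
            self._db.set(self.DB_KEY, json.dumps(fingerprint))
        return fingerprint
```

Writing the fetched fingerprint back to the database is what makes later lookups "instant from cache/database" even after a restart.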
Important: Save Cookies with user_agent
When saving cookies to the database, always include the user_agent:
```python
# CORRECT - includes user_agent
self.unified_db.save_scraper_cookies(
    self.scraper_id,
    cookies,
    user_agent=self.user_agent,  # REQUIRED for cf_clearance!
    merge=True,
)

# WRONG - missing user_agent (cookies won't work)
self.unified_db.save_scraper_cookies(self.scraper_id, cookies)
```
Usage
Basic Initialization
```python
from modules.cloudflare_handler import CloudflareHandler, SiteStatus

handler = CloudflareHandler(
    module_name="MyModule",
    cookie_file="/path/to/cookies.json",
    user_agent="Mozilla/5.0...",
    logger=logger,  # Optional
    aggressive_expiry=True,  # or False for conservative
)
```
Check Site Status
```python
def download(handler):  # illustrative download method
    status, error_msg = handler.check_site_status("https://example.com/", timeout=10)
    if handler.should_skip_download(status):
        print(f"Skipping download - site unavailable: {error_msg}")
        return []
    elif status == SiteStatus.CLOUDFLARE_CHALLENGE:
        print("Cloudflare challenge detected, will attempt bypass")
```
Get Fresh Cookies via FlareSolverr
```python
success = handler.get_cookies_via_flaresolverr("https://example.com/", max_retries=2)
if success:
    print("Got fresh cookies from FlareSolverr")
else:
    print("FlareSolverr failed")
```
Ensure Cookies Are Valid
```python
# Checks expiration and gets new cookies if needed
if handler.ensure_cookies("https://example.com/"):
    print("Cookies are valid")
else:
    print("Failed to get valid cookies")
```
Check and Bypass Automatically
```python
# Checks site status and automatically attempts FlareSolverr if needed
status, cookies_obtained = handler.check_and_bypass("https://example.com/")
if handler.should_skip_download(status):
    print("Site is down, skipping")
else:
    print("Site is accessible, proceeding")
```
Integration Examples
ImgInn Module
```python
class ImgInnDownloader:
    def __init__(self, ...):
        # Initialize CloudflareHandler
        self.cf_handler = CloudflareHandler(
            module_name="ImgInn",
            cookie_file=str(self.cookie_file),
            user_agent=self.user_agent,
            logger=self.logger,
            aggressive_expiry=True,
        )

    def download_posts(self, username, ...):
        # Check site status before downloading
        status, error_msg = self.cf_handler.check_site_status(
            "https://imginn.com/",
            timeout=10,
        )
        if self.cf_handler.should_skip_download(status):
            self.log(f"Skipping - ImgInn unavailable: {error_msg}", "warning")
            return []
        # Proceed with download...
```
Coppermine Module (Conservative Mode)
```python
class CoppermineDownloader:
    def __init__(self, ...):
        # Use conservative mode
        self.cf_handler = CloudflareHandler(
            module_name="Coppermine",
            cookie_file=str(self.cookie_file),
            user_agent=self.user_agent,
            logger=self.logger,
            aggressive_expiry=False,  # Conservative
        )
```
Configuration
FlareSolverr Setup
The handler expects FlareSolverr to be running at http://localhost:8191/v1. It can be started with Docker:
```bash
docker run -d \
  --name flaresolverr \
  -p 8191:8191 \
  -e LOG_LEVEL=info \
  --restart unless-stopped \
  ghcr.io/flaresolverr/flaresolverr:latest
```
Cookie Storage
Cookies are stored in JSON format:
```json
{
  "cookies": [
    {
      "name": "cf_clearance",
      "value": "...",
      "domain": ".example.com",
      "path": "/",
      "expiry": 1234567890
    }
  ],
  "timestamp": "2025-11-18T12:00:00"
}
```
Location: /opt/media-downloader/cookies/{module}_cookies.json
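A sketch of reading and writing this file format. The helper names are hypothetical, and the write-to-temp-then-rename step is a design choice of this sketch (it keeps a concurrent reader from seeing a half-written file), not something the module is documented to do.

```python
import json
import os
import tempfile
from datetime import datetime
from pathlib import Path
from typing import List

COOKIE_DIR = Path("/opt/media-downloader/cookies")  # location given above


def cookie_path(module: str, base: Path = COOKIE_DIR) -> Path:
    return base / f"{module}_cookies.json"


def save_cookies(path: Path, cookies: List[dict]) -> None:
    """Write {"cookies": [...], "timestamp": ...} atomically."""
    payload = {"cookies": cookies, "timestamp": datetime.now().isoformat(timespec="seconds")}
    path.parent.mkdir(parents=True, exist_ok=True)
    # Temp file in the same directory so os.replace() is an atomic rename
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(payload, fh, indent=2)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # don't leave partial temp files behind
        raise


def load_cookies(path: Path) -> dict:
    """Return the stored payload, or an empty one if the file is missing or corrupt."""
    try:
        with path.open() as fh:
            return json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        return {"cookies": [], "timestamp": None}
```

Treating a missing or corrupt file as "no cookies" lets the expiry check fall through to a fresh FlareSolverr request instead of crashing.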
Error Handling
Server Errors (500, 502, 503, 504)
```python
# Inside a download method
if status == SiteStatus.SERVER_ERROR:
    # Site is down, skip downloads
    return []
```
Cloudflare Challenges
```python
if status == SiteStatus.CLOUDFLARE_CHALLENGE:
    # Attempt FlareSolverr bypass
    if handler.get_cookies_via_flaresolverr(url):
        # Retry with new cookies
        pass
```
Timeouts
```python
# Inside a download method
if status == SiteStatus.TIMEOUT:
    # Site not responding, skip
    return []
```
Benefits
- Centralized Logic - All Cloudflare handling in one place
- Reduced Duplication - Eliminates 500+ lines of duplicate code across modules
- Better Error Detection - Distinguishes server errors from Cloudflare challenges
- Automatic Skipping - No wasted time on unavailable sites
- Unified Cookie Management - Same cookie handling for all modules
- Backwards Compatible - Existing modules work without changes
Performance Impact
Before CloudflareHandler
- ImgInn down with 500 error
- Wait 120 seconds for Cloudflare challenge that never resolves
- Launch browser, waste resources
- Eventually timeout with error
After CloudflareHandler
- Check site status (at most 10 seconds, the status-check timeout)
- Detect 500 error immediately
- Skip download with clear message
- No browser launch, no wasted resources
Time Saved: up to 110 seconds per failed attempt (a 120-second challenge wait replaced by a status check of at most 10 seconds)
Module Integration
All 5 download modules now use CloudflareHandler:
| Module | Expiry Mode | Site URL | Notes |
|---|---|---|---|
| imginn | Aggressive | https://imginn.com/ | Instagram proxy |
| fastdl | Aggressive | https://fastdl.app/ | Instagram API |
| toolzu | Aggressive | https://toolzu.com/ | Instagram downloader |
| snapchat | Aggressive | https://storiesdown.com/ | Snapchat proxy |
| coppermine | Conservative | Dynamic (gallery URL) | Photo galleries |
Future Enhancements
Potential improvements:
- Rate limiting integration
- Proxy rotation support
- Multi-FlareSolverr failover
- Cookie pool management
- Site health monitoring
- Automatic retry scheduling
Troubleshooting
FlareSolverr Not Available
- The handler automatically disables FlareSolverr for the rest of the session
- It falls back to a Playwright-based bypass
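A sketch of the "disable for the session" behaviour, assuming a simple reachability probe of FlareSolverr's base URL (port 8191 from the Docker command under Configuration). The class and its probe are hypothetical, and the Playwright fallback itself is not shown.

```python
import urllib.error
import urllib.request


class FlareSolverrGate:
    """Probe FlareSolverr once; if unreachable, stay disabled for the session (sketch)."""

    def __init__(self, base_url: str = "http://localhost:8191", probe_timeout: float = 5.0):
        self.base_url = base_url
        self.probe_timeout = probe_timeout
        self.disabled = False
        self._checked = False

    def _probe(self) -> bool:
        try:
            # Any HTTP answer from the service root means the process is up
            with urllib.request.urlopen(self.base_url, timeout=self.probe_timeout):
                return True
        except urllib.error.HTTPError:
            return True  # server responded, just not with a 2xx status
        except OSError:
            return False  # connection refused, DNS failure or timeout

    def available(self) -> bool:
        # Only the first call probes; later calls reuse the result
        if not self._checked:
            self._checked = True
            self.disabled = not self._probe()
        return not self.disabled
```

Probing once per session avoids paying a connection timeout on every download when FlareSolverr is not running.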
Cookies Not Refreshing
- Check cookie file permissions
- Verify FlareSolverr is running
- Check logs for error messages
Site Status Always Returns Error
- Verify network connectivity
- Check firewall rules
- Ensure the target site is actually accessible