# Universal Cloudflare Handler

**Version:** 12.0.1
**Module:** `modules/cloudflare_handler.py`
**Status:** Production

## Overview

The Universal Cloudflare Handler provides centralized Cloudflare bypass, error detection, cookie management, and **dynamic browser fingerprinting** for all download modules in the media-downloader system.

## Features

### 1. **Site Status Detection**

Before attempting downloads, the handler checks if the target site is accessible:

- **WORKING** - Site is accessible and responding normally
- **SERVER_ERROR** - HTTP 500, 502, 503, 504 errors (site is down)
- **CLOUDFLARE_CHALLENGE** - Cloudflare challenge page detected
- **FORBIDDEN** - HTTP 403 access denied
- **TIMEOUT** - Request timed out
- **UNKNOWN_ERROR** - Other errors

### 2. **Smart Skip Logic**

Downloads are automatically skipped when:

- Site returns server errors (500, 502, 503, 504)
- Request times out
- Unknown errors occur

This prevents wasting time and resources on unavailable sites.

### 3. **FlareSolverr Integration**

- Automatic Cloudflare bypass using FlareSolverr
- Configurable retry logic (default: 2 attempts)
- 120-second timeout for difficult challenges
- Detects cf_clearance cookie presence

### 4. **Cookie Management**

#### For Playwright (Browser Automation)

```python
# Load cookies into browser context
cf_handler.load_cookies_to_playwright(context)

# Save cookies from browser
cf_handler.save_cookies_from_playwright(context)

# Get cookies as list
cookies = cf_handler.get_cookies_list()
```

#### For Requests (HTTP Library)

```python
# Load cookies into session
cf_handler.load_cookies_to_requests(session)

# Get cookies as dictionary
cookies = cf_handler.get_cookies_dict()
```

### 5. **Cookie Expiration Strategies**

#### Aggressive Mode (Default)

- Cookies expire if older than 12 hours
- Cookies expire if any cookie will expire within 7 days
- Used by: imginn, fastdl, toolzu, snapchat

#### Conservative Mode

- Only expires if cf_clearance cookie is actually expired
- Minimizes FlareSolverr calls
- Used by: coppermine

### 6. **Dynamic Browser Fingerprinting** (v12.0.1)

**Critical for cf_clearance cookies to work!**

The cf_clearance cookie is tied to the browser fingerprint (User-Agent, headers, etc.). If Playwright uses a different fingerprint than FlareSolverr, the cookies will be rejected.

#### Key Functions

```python
from modules.cloudflare_handler import (
    get_flaresolverr_fingerprint,
    get_playwright_context_options,
    get_playwright_stealth_scripts,
    set_fingerprint_database
)

# Initialize database persistence (call once at startup)
set_fingerprint_database(unified_db)

# Get complete fingerprint (instant from cache/database)
fingerprint = get_flaresolverr_fingerprint()
# Returns: user_agent, sec_ch_ua, locale, timezone, viewport, etc.

# Get ready-to-use Playwright context options
context_options = get_playwright_context_options()
context = browser.new_context(**context_options)

# Add anti-detection scripts
page.add_init_script(get_playwright_stealth_scripts())
```

#### Fingerprint Persistence

Fingerprints are cached in three layers:

1. **Memory cache** - Instant access during session
2. **Database** - Persists across restarts (key_value_store table)
3. **FlareSolverr fetch** - Fallback if no cache available

#### Important: Save Cookies with user_agent

When saving cookies to the database, **always include the user_agent**:

```python
# CORRECT - includes user_agent
self.unified_db.save_scraper_cookies(
    self.scraper_id,
    cookies,
    user_agent=self.user_agent,  # REQUIRED for cf_clearance!
    merge=True
)

# WRONG - missing user_agent (cookies won't work)
self.unified_db.save_scraper_cookies(self.scraper_id, cookies)
```

## Usage

### Basic Initialization

```python
from modules.cloudflare_handler import CloudflareHandler, SiteStatus

handler = CloudflareHandler(
    module_name="MyModule",
    cookie_file="/path/to/cookies.json",
    user_agent="Mozilla/5.0...",
    logger=logger,  # Optional
    aggressive_expiry=True  # or False for conservative
)
```

### Check Site Status

```python
status, error_msg = handler.check_site_status("https://example.com/", timeout=10)

if handler.should_skip_download(status):
    print(f"Skipping download - site unavailable: {error_msg}")
    return []
elif status == SiteStatus.CLOUDFLARE_CHALLENGE:
    print("Cloudflare challenge detected, will attempt bypass")
```

### Get Fresh Cookies via FlareSolverr

```python
success = handler.get_cookies_via_flaresolverr("https://example.com/", max_retries=2)

if success:
    print("Got fresh cookies from FlareSolverr")
else:
    print("FlareSolverr failed")
```

### Ensure Cookies Are Valid

```python
# Checks expiration and gets new cookies if needed
if handler.ensure_cookies("https://example.com/"):
    print("Cookies are valid")
else:
    print("Failed to get valid cookies")
```

### Check and Bypass Automatically

```python
# Checks site status and automatically attempts FlareSolverr if needed
status, cookies_obtained = handler.check_and_bypass("https://example.com/")

if handler.should_skip_download(status):
    print("Site is down, skipping")
else:
    print("Site is accessible, proceeding")
```

## Integration Examples

### ImgInn Module

```python
class ImgInnDownloader:
    def __init__(self, ...):
        # Initialize CloudflareHandler
        self.cf_handler = CloudflareHandler(
            module_name="ImgInn",
            cookie_file=str(self.cookie_file),
            user_agent=self.user_agent,
            logger=self.logger,
            aggressive_expiry=True
        )

    def download_posts(self, username, ...):
        # Check site status before downloading
        status, error_msg = self.cf_handler.check_site_status(
            "https://imginn.com/",
            timeout=10
        )

        if self.cf_handler.should_skip_download(status):
            self.log(f"Skipping - ImgInn unavailable: {error_msg}", "warning")
            return []

        # Proceed with download...
```

### Coppermine Module (Conservative Mode)

```python
class CoppermineDownloader:
    def __init__(self, ...):
        # Use conservative mode
        self.cf_handler = CloudflareHandler(
            module_name="Coppermine",
            cookie_file=str(self.cookie_file),
            user_agent=self.user_agent,
            logger=self.logger,
            aggressive_expiry=False  # Conservative
        )
```

## Configuration

### FlareSolverr Setup

The handler expects FlareSolverr running at `http://localhost:8191/v1`:

```bash
docker run -d \
  --name flaresolverr \
  -p 8191:8191 \
  -e LOG_LEVEL=info \
  --restart unless-stopped \
  ghcr.io/flaresolverr/flaresolverr:latest
```

### Cookie Storage

Cookies are stored in JSON format:

```json
{
  "cookies": [
    {
      "name": "cf_clearance",
      "value": "...",
      "domain": ".example.com",
      "path": "/",
      "expiry": 1234567890
    }
  ],
  "timestamp": "2025-11-18T12:00:00"
}
```

Location: `/opt/media-downloader/cookies/{module}_cookies.json`

## Error Handling

### Server Errors (500, 502, 503, 504)

```python
if status == SiteStatus.SERVER_ERROR:
    # Site is down, skip downloads
    return []
```

### Cloudflare Challenges

```python
if status == SiteStatus.CLOUDFLARE_CHALLENGE:
    # Attempt FlareSolverr bypass
    if handler.get_cookies_via_flaresolverr(url):
        # Retry with new cookies
        pass
```

### Timeouts

```python
if status == SiteStatus.TIMEOUT:
    # Site not responding, skip
    return []
```

## Benefits

1. **Centralized Logic** - All Cloudflare handling in one place
2. **Reduced Duplication** - Eliminates 500+ lines of duplicate code across modules
3. **Better Error Detection** - Distinguishes server errors from Cloudflare challenges
4. **Automatic Skipping** - No wasted time on unavailable sites
5. **Unified Cookie Management** - Same cookie handling for all modules
6. **Backwards Compatible** - Existing modules work without changes

## Performance Impact

### Before CloudflareHandler

- ImgInn down with 500 error
- Wait 120 seconds for Cloudflare challenge that never resolves
- Launch browser, waste resources
- Eventually timeout with error

### After CloudflareHandler

- Check site status (10 seconds)
- Detect 500 error immediately
- Skip download with clear message
- No browser launch, no wasted resources

**Time Saved:** 110 seconds per failed attempt

## Module Integration

All 5 download modules now use CloudflareHandler:

| Module | Expiry Mode | Site URL | Notes |
|--------|-------------|----------|-------|
| imginn | Aggressive | https://imginn.com/ | Instagram proxy |
| fastdl | Aggressive | https://fastdl.app/ | Instagram API |
| toolzu | Aggressive | https://toolzu.com/ | Instagram downloader |
| snapchat | Aggressive | https://storiesdown.com/ | Snapchat proxy |
| coppermine | Conservative | Dynamic (gallery URL) | Photo galleries |

## Future Enhancements

Potential improvements:

- Rate limiting integration
- Proxy rotation support
- Multi-FlareSolverr failover
- Cookie pool management
- Site health monitoring
- Automatic retry scheduling

## Troubleshooting

### FlareSolverr Not Available

```python
# Handler will automatically disable FlareSolverr for session
# Falls back to Playwright-based bypass
```

### Cookies Not Refreshing

```python
# Check cookie file permissions
# Verify FlareSolverr is running
# Check logs for error messages
```

### Site Status Always Returns Error

```python
# Verify network connectivity
# Check firewall rules
# Ensure target site is actually accessible
```

## See Also

- [FlareSolverr Integration](FLARESOLVERR.md)
- [Download Module Architecture](DOWNLOAD_MODULES.md)
- [Cookie Management](COOKIES.md)
- [Error Handling Best Practices](ERROR_HANDLING.md)
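
## Appendix: Expiry Mode Sketch

The aggressive and conservative rules listed under Cookie Expiration Strategies can be illustrated with a small standalone function. This is only a sketch under stated assumptions, not the code in `modules/cloudflare_handler.py`: the function name `cookies_need_refresh`, the helper `_expiry`, the constants, and the handling of session cookies and missing cookies are all hypothetical. It reads the JSON layout shown under Cookie Storage (`cookies`, `expiry`, `timestamp`).

```python
from datetime import datetime, timedelta

# Thresholds taken from the Aggressive Mode description above.
MAX_COOKIE_AGE = timedelta(hours=12)
EXPIRY_MARGIN = timedelta(days=7)


def _expiry(cookie):
    """Return the cookie's expiry as a Unix timestamp, or None.

    Assumption: a missing or non-positive expiry marks a session cookie,
    which this sketch ignores when checking expiration.
    """
    value = cookie.get("expiry")
    if value is None or value <= 0:
        return None
    return value


def cookies_need_refresh(cookie_data, aggressive=True, now=None):
    """Hypothetical helper: decide whether stored cookies should be refreshed."""
    now = now or datetime.now()
    now_ts = now.timestamp()
    cookies = cookie_data.get("cookies", [])
    if not cookies:
        return True  # nothing stored yet (assumed to require a refresh)

    if aggressive:
        # Rule 1: the cookie file is older than 12 hours.
        saved_at = datetime.fromisoformat(cookie_data["timestamp"])
        if now - saved_at > MAX_COOKIE_AGE:
            return True
        # Rule 2: any cookie expires within the next 7 days.
        cutoff = now_ts + EXPIRY_MARGIN.total_seconds()
        return any(
            _expiry(c) is not None and _expiry(c) < cutoff for c in cookies
        )

    # Conservative: refresh only once cf_clearance itself has expired.
    for c in cookies:
        if c.get("name") == "cf_clearance":
            exp = _expiry(c)
            return exp is not None and exp <= now_ts
    return True  # no cf_clearance cookie at all (assumed to require a refresh)
```

With a cookie file saved a day ago whose `cf_clearance` is still valid for a month, the aggressive branch asks for a refresh while the conservative branch does not, which matches the stated goal of minimizing FlareSolverr calls for coppermine.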