media-downloader/docs/CLOUDFLARE_HANDLER.md
Todd 0d7b2b1aab Initial commit
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 22:42:55 -04:00

Universal Cloudflare Handler

Version: 12.0.1
Module: modules/cloudflare_handler.py
Status: Production

Overview

The Universal Cloudflare Handler provides centralized Cloudflare bypass, error detection, cookie management, and dynamic browser fingerprinting for all download modules in the media-downloader system.

Features

1. Site Status Detection

Before attempting downloads, the handler checks if the target site is accessible:

  • WORKING - Site is accessible and responding normally
  • SERVER_ERROR - HTTP 500, 502, 503, 504 errors (site is down)
  • CLOUDFLARE_CHALLENGE - Cloudflare challenge page detected
  • FORBIDDEN - HTTP 403 access denied
  • TIMEOUT - Request timed out
  • UNKNOWN_ERROR - Other errors
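
As a rough illustration of how these states could be derived from an HTTP response, here is a minimal sketch. The `classify_response` helper, the enum values, and the challenge marker strings are assumptions made for this example and are not taken from `cloudflare_handler.py`. `TIMEOUT` is not covered because it would come from a request exception, not from a response.

```python
from enum import Enum


class SiteStatus(Enum):
    # Member names follow the list above; the values are illustrative.
    WORKING = "working"
    SERVER_ERROR = "server_error"
    CLOUDFLARE_CHALLENGE = "cloudflare_challenge"
    FORBIDDEN = "forbidden"
    TIMEOUT = "timeout"
    UNKNOWN_ERROR = "unknown_error"


# Assumed challenge markers; the real handler may look for different strings.
CHALLENGE_MARKERS = ("just a moment", "cf-chl", "challenge-platform")


def classify_response(status_code, body=""):
    """Map an HTTP status code and response body to a SiteStatus (hypothetical helper)."""
    text = body.lower()
    # Challenge pages commonly arrive with 403 or 503, so check for them first.
    if status_code in (403, 503) and any(m in text for m in CHALLENGE_MARKERS):
        return SiteStatus.CLOUDFLARE_CHALLENGE
    if status_code in (500, 502, 503, 504):
        return SiteStatus.SERVER_ERROR
    if status_code == 403:
        return SiteStatus.FORBIDDEN
    if 200 <= status_code < 400:
        return SiteStatus.WORKING
    return SiteStatus.UNKNOWN_ERROR
```

Checking for challenge markers before the plain status-code rules keeps a 503 challenge page from being misread as a server outage.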

2. Smart Skip Logic

Downloads are automatically skipped when:

  • Site returns server errors (500, 502, 503, 504)
  • Request times out
  • Unknown errors occur

This prevents wasting time and resources on unavailable sites.
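
The skip rule above can be sketched as a simple membership check. This is a standalone illustration under the assumption that only the three listed conditions trigger a skip; the actual `should_skip_download` method on the handler may differ.

```python
from enum import Enum, auto


class SiteStatus(Enum):
    WORKING = auto()
    SERVER_ERROR = auto()
    CLOUDFLARE_CHALLENGE = auto()
    FORBIDDEN = auto()
    TIMEOUT = auto()
    UNKNOWN_ERROR = auto()


# Statuses for which a bypass attempt cannot help (assumed from the list above).
SKIP_STATUSES = frozenset(
    {SiteStatus.SERVER_ERROR, SiteStatus.TIMEOUT, SiteStatus.UNKNOWN_ERROR}
)


def should_skip_download(status):
    """Return True when the site is unavailable and the download should be skipped."""
    return status in SKIP_STATUSES
```

Under this sketch, `CLOUDFLARE_CHALLENGE` and `FORBIDDEN` do not trigger a skip, since fresh cookies may still resolve them.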

3. FlareSolverr Integration

  • Automatic Cloudflare bypass using FlareSolverr
  • Configurable retry logic (default: 2 attempts)
  • 120-second timeout for difficult challenges
  • Detects cf_clearance cookie presence
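
For orientation, the sketch below shows what a retrying FlareSolverr call might look like. The `request.get` command, `maxTimeout` (in milliseconds), and the `solution.cookies` response field come from the FlareSolverr v1 API. The helper names, the injected `post` callable, and the choice to treat a missing cf_clearance cookie as a failed attempt are assumptions for this example.

```python
FLARESOLVERR_URL = "http://localhost:8191/v1"


def build_payload(url, timeout_s=120):
    """Build a FlareSolverr v1 request body; maxTimeout is in milliseconds."""
    return {"cmd": "request.get", "url": url, "maxTimeout": timeout_s * 1000}


def extract_cookies(response):
    """Return the solved cookies, or None if the solve failed or cf_clearance is missing."""
    if response.get("status") != "ok":
        return None
    cookies = response.get("solution", {}).get("cookies", [])
    if any(c.get("name") == "cf_clearance" for c in cookies):
        return cookies
    return None


def fetch_cookies(url, post, max_retries=2):
    """Try up to max_retries times; `post(endpoint, payload)` must return parsed JSON."""
    for _ in range(max_retries):
        try:
            cookies = extract_cookies(post(FLARESOLVERR_URL, build_payload(url)))
        except Exception:
            cookies = None  # a network or parsing failure counts as a failed attempt
        if cookies:
            return cookies
    return None
```

With the requests library, `post` could be `lambda endpoint, payload: requests.post(endpoint, json=payload, timeout=130).json()`. Injecting it keeps the retry logic testable without a running FlareSolverr instance.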

4. Cookie Management

For Playwright (Browser Automation)

# Load cookies into browser context
cf_handler.load_cookies_to_playwright(context)

# Save cookies from browser
cf_handler.save_cookies_from_playwright(context)

# Get cookies as list
cookies = cf_handler.get_cookies_list()

For Requests (HTTP Library)

# Load cookies into session
cf_handler.load_cookies_to_requests(session)

# Get cookies as dictionary
cookies = cf_handler.get_cookies_dict()
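
To show roughly what these helpers have to do internally, here is a sketch of the two conversions. The function names are hypothetical. The input records are assumed to use the stored cookie format with an `expiry` field, whereas Playwright's `add_cookies` expects `expires`.

```python
def to_cookie_dict(cookies):
    """Flatten cookie records to {name: value}, a shape requests can consume."""
    return {c["name"]: c["value"] for c in cookies}


def to_playwright_cookies(cookies):
    """Convert stored cookie records to the dicts BrowserContext.add_cookies accepts."""
    converted = []
    for c in cookies:
        item = {
            "name": c["name"],
            "value": c["value"],
            "domain": c["domain"],
            "path": c.get("path", "/"),
        }
        # The stored format uses "expiry"; Playwright names the field "expires".
        if "expiry" in c:
            item["expires"] = c["expiry"]
        converted.append(item)
    return converted
```

The results could then be applied with `session.cookies.update(to_cookie_dict(cookies))` for requests and `context.add_cookies(to_playwright_cookies(cookies))` for Playwright.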

5. Cookie Expiry Modes

Aggressive Mode (Default)

  • Cookies are treated as expired if they were saved more than 12 hours ago
  • Cookies are also treated as expired if any cookie is due to expire within the next 7 days
  • Used by: imginn, fastdl, toolzu, snapchat

Conservative Mode

  • Cookies are treated as expired only when the cf_clearance cookie has actually expired
  • Minimizes FlareSolverr calls
  • Used by: coppermine
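
The two modes can be compared side by side in a short sketch. The function name, the epoch-second `saved_at` argument, and the decision to treat a missing cf_clearance cookie as expired in conservative mode are assumptions for illustration, not the handler's confirmed behavior.

```python
import time

MAX_AGE_S = 12 * 3600            # aggressive: saved cookies older than 12 hours
EXPIRY_MARGIN_S = 7 * 24 * 3600  # aggressive: any cookie expiring within 7 days


def cookies_expired(cookies, saved_at, aggressive=True, now=None):
    """Decide whether saved cookies should be refreshed (hypothetical helper).

    `saved_at` and each cookie's "expiry" are Unix timestamps in seconds.
    """
    now = time.time() if now is None else now
    if aggressive:
        if now - saved_at > MAX_AGE_S:
            return True
        return any(
            "expiry" in c and c["expiry"] - now < EXPIRY_MARGIN_S for c in cookies
        )
    # Conservative: only the cf_clearance cookie matters.
    clearance = [c for c in cookies if c["name"] == "cf_clearance"]
    if not clearance:
        return True  # assumption: no cf_clearance cookie means a refresh is needed
    return any(c.get("expiry", float("inf")) <= now for c in clearance)
```

Passing `now` explicitly makes the decision deterministic, which is convenient when testing both modes against the same cookie set.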

6. Dynamic Browser Fingerprinting (v12.0.1)

Critical for cf_clearance cookies to work!

The cf_clearance cookie is tied to the browser fingerprint (User-Agent, headers, etc.). If Playwright uses a different fingerprint than FlareSolverr, the cookies will be rejected.

Key Functions

from modules.cloudflare_handler import (
    get_flaresolverr_fingerprint,
    get_playwright_context_options,
    get_playwright_stealth_scripts,
    set_fingerprint_database
)

# Initialize database persistence (call once at startup)
set_fingerprint_database(unified_db)

# Get complete fingerprint (instant from cache/database)
fingerprint = get_flaresolverr_fingerprint()
# Returns: user_agent, sec_ch_ua, locale, timezone, viewport, etc.

# Get ready-to-use Playwright context options
# (browser is an already-launched Playwright Browser instance)
context_options = get_playwright_context_options()
context = browser.new_context(**context_options)

# Add anti-detection scripts before any navigation
page = context.new_page()
page.add_init_script(get_playwright_stealth_scripts())

Fingerprint Persistence

Fingerprints are cached in three layers:

  1. Memory cache - Instant access during session
  2. Database - Persists across restarts (key_value_store table)
  3. FlareSolverr fetch - Fallback if no cache available

Important: Save Cookies with user_agent

When saving cookies to the database, always include the user_agent:

# CORRECT - includes user_agent
self.unified_db.save_scraper_cookies(
    self.scraper_id,
    cookies,
    user_agent=self.user_agent,  # REQUIRED for cf_clearance!
    merge=True
)

# WRONG - missing user_agent (cookies won't work)
self.unified_db.save_scraper_cookies(self.scraper_id, cookies)

Usage

Basic Initialization

from modules.cloudflare_handler import CloudflareHandler, SiteStatus

handler = CloudflareHandler(
    module_name="MyModule",
    cookie_file="/path/to/cookies.json",
    user_agent="Mozilla/5.0...",
    logger=logger,  # Optional
    aggressive_expiry=True  # or False for conservative
)

Check Site Status

status, error_msg = handler.check_site_status("https://example.com/", timeout=10)

if handler.should_skip_download(status):
    print(f"Skipping download - site unavailable: {error_msg}")
    return []
elif status == SiteStatus.CLOUDFLARE_CHALLENGE:
    print("Cloudflare challenge detected, will attempt bypass")

Get Fresh Cookies via FlareSolverr

success = handler.get_cookies_via_flaresolverr("https://example.com/", max_retries=2)

if success:
    print("Got fresh cookies from FlareSolverr")
else:
    print("FlareSolverr failed")

Ensure Cookies Are Valid

# Checks expiration and gets new cookies if needed
if handler.ensure_cookies("https://example.com/"):
    print("Cookies are valid")
else:
    print("Failed to get valid cookies")

Check and Bypass Automatically

# Checks site status and automatically attempts FlareSolverr if needed
status, cookies_obtained = handler.check_and_bypass("https://example.com/")

if handler.should_skip_download(status):
    print("Site is down, skipping")
else:
    print("Site is accessible, proceeding")

Integration Examples

ImgInn Module

class ImgInnDownloader:
    def __init__(self, ...):
        # Initialize CloudflareHandler
        self.cf_handler = CloudflareHandler(
            module_name="ImgInn",
            cookie_file=str(self.cookie_file),
            user_agent=self.user_agent,
            logger=self.logger,
            aggressive_expiry=True
        )

    def download_posts(self, username, ...):
        # Check site status before downloading
        status, error_msg = self.cf_handler.check_site_status(
            "https://imginn.com/",
            timeout=10
        )

        if self.cf_handler.should_skip_download(status):
            self.log(f"Skipping - ImgInn unavailable: {error_msg}", "warning")
            return []

        # Proceed with download...

Coppermine Module (Conservative Mode)

class CoppermineDownloader:
    def __init__(self, ...):
        # Use conservative mode
        self.cf_handler = CloudflareHandler(
            module_name="Coppermine",
            cookie_file=str(self.cookie_file),
            user_agent=self.user_agent,
            logger=self.logger,
            aggressive_expiry=False  # Conservative
        )

Configuration

FlareSolverr Setup

The handler expects FlareSolverr running at http://localhost:8191/v1:

docker run -d \
  --name flaresolverr \
  -p 8191:8191 \
  -e LOG_LEVEL=info \
  --restart unless-stopped \
  ghcr.io/flaresolverr/flaresolverr:latest

Cookie Storage

Cookies are stored in JSON format:

{
  "cookies": [
    {
      "name": "cf_clearance",
      "value": "...",
      "domain": ".example.com",
      "path": "/",
      "expiry": 1234567890
    }
  ],
  "timestamp": "2025-11-18T12:00:00"
}

Location: /opt/media-downloader/cookies/{module}_cookies.json
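
A minimal sketch of reading and writing a file in this format is shown below. The helper names are hypothetical, and falling back to an empty cookie list when the file is missing or malformed is an assumption, not the handler's confirmed behavior.

```python
import json
from datetime import datetime
from pathlib import Path


def save_cookie_file(path, cookies):
    """Write cookies plus an ISO 8601 timestamp in the format shown above."""
    payload = {
        "cookies": cookies,
        "timestamp": datetime.now().isoformat(timespec="seconds"),
    }
    Path(path).write_text(json.dumps(payload, indent=2), encoding="utf-8")


def load_cookie_file(path):
    """Return (cookies, saved_at), or ([], None) if the file is missing or invalid."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
        return data["cookies"], datetime.fromisoformat(data["timestamp"])
    except (OSError, ValueError, KeyError, TypeError):
        return [], None
```

Returning an empty result instead of raising lets a caller treat a missing or damaged file the same way as expired cookies.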

Error Handling

Server Errors (500, 502, 503, 504)

if status == SiteStatus.SERVER_ERROR:
    # Site is down, skip downloads
    return []

Cloudflare Challenges

if status == SiteStatus.CLOUDFLARE_CHALLENGE:
    # Attempt FlareSolverr bypass
    if handler.get_cookies_via_flaresolverr(url):
        # Retry with new cookies
        pass

Timeouts

if status == SiteStatus.TIMEOUT:
    # Site not responding, skip
    return []

Benefits

  1. Centralized Logic - All Cloudflare handling in one place
  2. Reduced Duplication - Eliminates 500+ lines of duplicate code across modules
  3. Better Error Detection - Distinguishes server errors from Cloudflare challenges
  4. Automatic Skipping - No wasted time on unavailable sites
  5. Unified Cookie Management - Same cookie handling for all modules
  6. Backwards Compatible - Existing modules work without changes

Performance Impact

Before CloudflareHandler

  • ImgInn down with 500 error
  • Wait 120 seconds for Cloudflare challenge that never resolves
  • Launch browser, waste resources
  • Eventually timeout with error

After CloudflareHandler

  • Check site status (10 seconds)
  • Detect 500 error immediately
  • Skip download with clear message
  • No browser launch, no wasted resources

Time Saved: about 110 seconds per failed attempt (a 120-second challenge wait replaced by a status check of at most 10 seconds)

Module Integration

All 5 download modules now use CloudflareHandler:

| Module     | Expiry Mode  | Site URL                 | Notes                |
|------------|--------------|--------------------------|----------------------|
| imginn     | Aggressive   | https://imginn.com/      | Instagram proxy      |
| fastdl     | Aggressive   | https://fastdl.app/      | Instagram API        |
| toolzu     | Aggressive   | https://toolzu.com/      | Instagram downloader |
| snapchat   | Aggressive   | https://storiesdown.com/ | Snapchat proxy       |
| coppermine | Conservative | Dynamic (gallery URL)    | Photo galleries      |

Future Enhancements

Potential improvements:

  • Rate limiting integration
  • Proxy rotation support
  • Multi-FlareSolverr failover
  • Cookie pool management
  • Site health monitoring
  • Automatic retry scheduling

Troubleshooting

FlareSolverr Not Available

The handler automatically disables FlareSolverr for the rest of the session and falls back to Playwright-based bypass.

Cookies Not Refreshing

  • Check cookie file permissions
  • Verify FlareSolverr is running
  • Check logs for error messages

Site Status Always Returns Error

  • Verify network connectivity
  • Check firewall rules
  • Ensure the target site is actually accessible

See Also