Media Downloader

Version: 13.13.1
Status: Production-Ready

A production-grade media archival system that automates downloads from Instagram, TikTok, Snapchat, YouTube, Coppermine photo galleries, and web forums. It features two-factor authentication (2FA), face recognition, FlareSolverr Cloudflare bypass, advanced deduplication, quality merging, timestamp preservation, database CLI management, and Immich integration.


🌟 Key Features

  • Pinned Post Highlighting, Mobile Header Layout, Feed Navigation Fixes (v13.13.1): Pinned posts in paid content Feed now have amber background tint across all views (desktop/mobile). Mobile post header moves date to its own line. Collapsing pinned section auto-selects first regular post. Keyboard navigation skips collapsed pinned posts
  • Instagram Authenticated API Toggle, Cookie Management UI (v13.13.0): Per-creator "Use Authenticated API" toggle — when enabled, Instagram sync uses browser cookies as the primary fetch method with automatic fallback to unauthenticated GraphQL if cookies expire. Instagram (Authenticated) scraper now visible on Scrapers page with cookie upload/paste UI. Creator cards show green "Authenticated" badge and purple "Filtered" badge for tagged user filters. Cookie health alerts on 401 via WebSocket + Pushover. mds service manager updated with cloud backup services
  • Sync Service Button on Creators Page (v13.12.0): New "Sync All" button on each service header in the Creators page queues background sync for all enabled creators in that service at once. New POST endpoint for bulk service sync with sequential creator processing. Visual feedback with spinning icon and disabled state during active syncs
  • Instagram User ID Fix, Unviewed Count Fix, Bundle Sidebar Default (v13.11.2): Instagram user ID lookup switched to HTML scraping as primary method (bypasses 401 errors from stale cookies). User ID cache now persists to database across API restarts. Unviewed posts count now respects per-creator filter_tagged_users settings. Bundle sidebar collapsed by default in both lightboxes
  • Instagram GraphQL Post Fetching, Scan Progress, Feed Page Size (v13.11.1): Instagram post sync switched from ImgInn API (480-post server-side limit) to direct Instagram GraphQL API, enabling full history retrieval (10,000+ posts) with tagged users and full-res CDN URLs. Real-time progress reporting during scanning phase shows post count after each page. GraphQL delay tuned to 3-5s to avoid rate limiting. Feed initial page size increased from 30 to 50 posts
  • Media Gallery Page, GalleryLightbox, Server-Side Shuffle, Timeline Fix (v13.11.0): New /gallery page replacing /media with Immich-style justified thumbnail layout, daily date grouping, timeline scrubber, slideshow with server-side shuffle (PostgreSQL md5-based deterministic ordering), and infinite scroll. New GalleryLightbox component with embedded metadata display (ffprobe/exiftool), zoom/pan, and swipe navigation. Timeline jump fixed for both galleries by bounding date queries with date_from + date_to. Feed unread count no longer shows phantom items from empty posts. Paid gallery videos now autoplay immediately
  • Justified Gallery Layout, Timeline Navigation, Health Check Fixes (v13.10.0): Paid content gallery now uses Immich-style justified thumbnail rows showing images at their natural aspect ratios. Timeline scrubber supports jumping to any month (even unloaded ones) via date-filtered API reload with dismissible filter chip. Timeline extends full viewport height. Gallery rendering optimized with memoized row computation, callback refs, and React.memo sections. Fixed service health checks for Coppermine, BestEyeCandy, and Reddit (were falling through to wrong generic API check)
  • Press Monitoring, Cloud Backup, New Scrapers, Feed Shuffle (v13.9.0): Celebrity press monitoring via Google News RSS + GDELT with full article extraction, dedicated Press page, and Pushover notifications. Cloud backup system with rclone B2/S3 support, inotify daemon, daily PostgreSQL dumps, and live Dashboard progress widget. Three new paid content scrapers: BestEyeCandy (celebrity photos), Coppermine (PHP galleries), Reddit (via gallery-dl). Backend-driven shuffle for paid content slideshow with deterministic seeding. Mobile feed fix (page size 10→30 to load past pinned posts)
  • Instagram Sync Settings, Appearance Role Combining, Tagged User Filter (v13.7.0): Per-creator Instagram sync settings with toggles for posts, stories, and highlights. Tagged user filter for Instagram feed display — only show posts where selected users are tagged. Dashboard upcoming appearances widget now shows credit type badges (Acting, Host, Producing, etc.). Fixed multi-role appearance combining on upcoming tab, episode deduplication in detail modal, inflated episode counts, notification pending states. All paid content Instagram operations now route exclusively through ImgInn API
  • ImgInn Paid Content Adapter, Reddit Timezone Fix, Lightbox Improvements (v13.6.0): ImgInn API adapter for Paid Content Instagram sync — no Instagram credentials needed. Reddit private gallery timezone offset fix (UTC to local). Bundle sidebar fixes and improvements. Dashboard platform label updates. Codebase cleanup of 36 stale scripts
  • Landscape Lightbox, Min Resolution Filter, TypeScript Fixes (v13.5.1): Private Gallery lightbox now supports landscape phone orientation with touch detection, fixed positioning, and compact controls. Per-person-group minimum resolution filter skips low-res images during import and hides them in gallery view. Fixed TypeScript type errors in api.ts, Config.tsx, and Scheduler.tsx. Cleaned up stale migration/database files and pycache directories
  • ImgInn API Module, Auth Circuit Breaker, Gallery Bridge, FastDL Consolidation (v13.5.0): New ImgInn API scraper module for direct API access (faster than browser scraping). Instagram auth failure circuit breaker — detects expired cookies during download runs, skips stories/tagged, continues posts/reels, sends Pushover alert and triggers cookie health banner. Private gallery bridge now works with instagram_client downloads (was previously missing). FastDL consolidated to single browser session per user. Dashboard "New In" cards fixed with ID-based ordering. Instagram client TLS fingerprint matched to Edge cookies (edge101). Phrase search accounts now visible in gallery account mapping
  • Private Gallery Unread Fixes, Paid Content Banners (v13.4.1): Fixed Private Gallery "View unread" showing nothing (missing SQL JOINs), mark-all-read not refreshing (resetQueries), and no post navigation after upload (pendingSelectPostId). New unviewed posts banner on Paid Content Feed with view/mark-all buttons. New unread messages banner on Dashboard and Feed (violet gradient) with link to Messages page. Consistent gradient+border banner styling
  • Instagram Client API, Module Enable/Disable, Image Info Bar (v13.4.0): New direct Instagram API downloader using curl_cffi with browser TLS fingerprinting — 10-20x faster than ImgInn scraping. Downloads posts (public GraphQL), stories, reels, and tagged posts (authenticated REST). New Snapchat Client module with same direct API approach. Module enable/disable system allows toggling individual platforms from Configuration. Image info bar added to lightbox for parity with video info bar. Cookie health monitoring banner
  • Reddit Community Monitor, Private Gallery Fixes (v13.3.0): Automated Reddit community monitoring for Private Gallery - map subreddits to persons, automatically download new posts (including imgur/redgifs via gallery-dl), encrypt and import with 'reddit' tag. Batch subreddit input, configurable check intervals, real-time progress tracking, two-layer duplicate detection (post history + file hash). Fixed Content-Disposition header encoding for filenames with special characters (emojis, smart quotes). Standardized scheduler and log viewer naming. Dashboard and scheduler UI updated with Reddit Monitor entries
  • Profile Image Caching, FastDL Stories, TikTok Bio Fix (v13.2.0): Avatar and banner images now cached locally during creator sync across all platforms (Instagram, YouTube, TikTok, Twitch, Fansly, OnlyFans, etc.), eliminating broken images from expired CDN URLs. Instagram stories now fetched via FastDL for real post timestamps (ImgInn fallback). TikTok bio emoji corruption fixed. PaidContent logs now visible in log viewer. curl_cffi added for Instagram CDN downloads with browser TLS impersonation
  • PostgreSQL Boolean Fixes, Quality Recheck Scheduler, Dashboard Pinned Posts (v13.1.1): Fixed celebrity appearances ON CONFLICT with partial unique indexes, forum boolean column handling (TRUE/FALSE vs 1/0), paid_content_posts.downloaded and forum_posts.has_images columns converted to boolean, FastDL URL redirect fix (/en to /en2), Fansly quality recheck now runs on scheduled syncs (was only running on manual API syncs), Dashboard recent posts properly excludes pinned posts via new skip_pinned filter
  • PostgreSQL Compatibility Fixes & Security Hardening (v13.1.0): Comprehensive fixes for 15+ PostgreSQL edge cases uncovered in production (datetime parameter translation, boolean column mismatches, strftime/datetime function translations, GROUP BY strictness, text vs timestamp comparisons, passkey schema alignment). Discovery queue endpoints now functional. Paid content image proxy supports Fansly, TikTok, and xHamster CDNs via suffix-based domain matching. SQL injection defense-in-depth with server-side sort_order and table_alias validation. Performance fixes for review page and media gallery (correlated subqueries replaced with LEFT JOINs)
  • PostgreSQL Migration & Private Gallery Fix (v13.0.0): Runtime SQLite→PostgreSQL adapter eliminates database locking under concurrent load. All 87 tables migrated from 6 separate SQLite databases to a single PostgreSQL instance via transparent monkey-patching (zero SQL query changes). Fixed Private Gallery feature toggle failing due to stale feature paths. Critical bug fixes in pg_adapter.py (% escaping, cursor leaks). Consolidated lock-error handling for PostgreSQL compatibility. Added psycopg2-binary to requirements and dependency checker. Updated installer with PostgreSQL setup
  • Username List Editor for Configuration (v12.15.0): New reusable UsernameListEditor component replaces 6 comma-separated username input fields across FastDL, ImgInn, Toolzu, and Snapchat sections. Features alphabetical sorting, duplicate detection with amber badges, bulk paste import modal, individual add/remove, user count badge, and lowercase normalization
  • Streaming Decryption & Download Reliability (v12.14.0): Generator-based streaming decryption for all encrypted videos (memory-efficient, no full-file loading). Auto-migration converts single-shot encrypted files >50MB to chunked format on gallery unlock. All direct downloads now multi-threaded with 5 parallel segments. Stall detection (30s timeout) with 3-retry resume logic for stuck downloads. Duplicate filename check verifies .enc file exists on disk. Subprocess stdin double-close fix. Scheduler sync_tmdb_appearances fix. Scraper error-to-warning log level changes with smart escalation (5+ failures escalate to error)
  • Security Hardening & Message Read Tracking (v12.13.0): 12 security fixes across backend routers (auth, CSRF, DB patterns, path traversal, rate limiting). Messages mark as read on scroll (IntersectionObserver), auto-scroll to first unread. Message sync downloads attachments automatically. Easynews search fixed (API Content-Type header change). Dashboard dismiss button timezone fix
  • Health Check & Sync Fixes (v12.12.1): Health checks complete in ~1s (was minutes), OnlyFans scheduled sync stops at already-seen posts (was paginating all content), push notifications for new messages during paid content sync, PiP menu desktop-only cleanup
  • Messages & Chat Support (v12.12.0): Direct message viewing for OnlyFans and Fansly creators with two-panel chat UI, conversation list with search, message bubbles with PPV/tip badges, inline media lightbox. OnlyFans Direct credential setup (cookie paste, HAR upload, Cookie-Editor import). Picture-in-Picture in lightboxes. Auto health checks. Multi-word search. Mobile video unmute fix
  • Lightbox Navigation Fixes (v12.11.1): Shuffle-aware arrow/keyboard/swipe navigation in both lightboxes, arrows render above video with auto-hide on hover, wider hover detection area, video autoplay during slideshow, Private Gallery file_type filter now correctly filters media items within posts
  • Config Redesign & Multi-URL Import (v12.11.0): Private Gallery Config page redesigned from sidebar to horizontal underline tabs matching other settings pages. Upload modal now supports pasting multiple URLs (Bunkr, Pixeldrain, Gofile, Cyberdrop, or direct links) to import media from the web with background download, dedup, encryption, and progress tracking
  • Slideshow Mode & Gallery Improvements (v12.10.0): Slideshow mode for Private Gallery and Paid Content lightboxes with auto-advance, configurable interval (3s/5s/8s/10s), shuffle mode (Fisher-Yates), and video-aware advancement (videos play to completion). Context-aware: gallery view slideshows all items, feed/social view slideshows selected post. Server-side total_media counts, smart filter bar labels, auto-loading gallery pages on filter change
  • Reliable 4K HLS Downloads (v12.9.1): Fixed 4K video stalling at ~1.2GB by replacing ffmpeg HLS networking with direct segment downloads (aiohttp, 5 concurrent, per-segment retry). Quality recheck now deletes and re-syncs for clean upgrades. Exact duration matching via playlist trim
  • Thumbnail Progress Modal & Async Gallery (v12.9.0): Visual thumbnail-grid progress tracking with per-file status overlays for Copy to Gallery, Upload, and Manual Import. Private Gallery copy/upload operations now run asynchronously with real-time polling. Added Instagram filename patterns for browser extension formats (video/photo/shortcode). Video thumbnail fallback for short videos
  • Private Gallery Enhancements (v12.8.0): Tags sorted by usage count, redesigned Copy to Gallery modal with thumbnail preview and searchable tag selector, copy from Recycle Bin with original filename preservation, EXIF-aware thumbnails, multi-page selection persistence, feature-gated buttons
  • Private Gallery & Features Management (v12.7.0): Password-protected encrypted media gallery with AES-256 encryption, obfuscated UUID file storage, person/relationship management, auto-generated albums, URL import from social media. New Features Management system allows enabling/disabling sidebar items, custom labels, and drag-and-drop reordering
  • Cloudflare Bypass Fixes (v12.5.1): Fixed critical user-agent mismatch bug causing Cloudflare cookie failures, added rate limiting between post downloads (3-8s) and content types (15-25s), cookie clearing before refresh, debug screenshots on failures
  • Mobile Social View & Bundle Lightbox (v12.5.0): Mobile-only social view with full post modal, video thumbnails with play buttons for performance, Fansly-style Bundle sidebar in lightbox (vertical collapsible thumbnails), pinch-to-zoom for images, fixed infinite scroll on mobile
  • Keyboard Navigation & Video Sync (v12.3.0): Arrow key navigation for posts in Feed/Social views, video state sync between inline and lightbox players (time and play/pause preserved), consistent Fansly-style avatar designs
  • 4K Video & Image Quality (v12.2.4): Fixed CloudFront signed HLS 4K video downloads, real-time progress tracking for streaming downloads, images now download at original full resolution, utility scripts for checking/upgrading to 4K
  • Podcast Notification Fix (v12.2.3): Push notifications now include podcast artwork, Taddy API fallback account support, whitelist-based interview filtering, mobile-optimized appearance detail modal
  • Mobile Touch Fixes (v12.2.2): Feed items no longer open accidentally while scrolling, touch movement detection (20px + 50ms threshold), mobile lightbox shows essential buttons only, proper event isolation
  • Paid Content Notifications (v12.2.1): Full metadata in lightbox (post content, date, platform, resolution), Feed-style thumbnail overlays, scheduler priority fix for most overdue task, YouTube/Twitch sync 10x faster with date filters
  • Error Monitor System (v6.52.11): Dashboard banner for new log errors since last visit, push alerts for unreviewed errors after 24 hours, log page filtering with URL params, automatic error deduplication and 7-day retention
  • Scrapers Settings Page (v6.52.0): Per-scraper proxy configuration, centralized cookie management in database, cookie upload from browser extension, FlareSolverr test connection, proxy URL builder with separate fields for protocol/host/port/user/pass
  • Discovery Page Enhancements (v6.51.0): Recent Activity tab, clickable timeline/heatmap navigation, Smart Folder file counts and preview thumbnails, quick filter builder with dropdowns
  • AI Semantic Search (v6.47.0): CLIP-based natural language search across entire media library, find images by describing them
  • Manual Import Enhancements (v6.45.0): Files properly appear in date sorting, manual date/time entry for custom services, GIFs treated as videos with play icon overlay
  • Forum Downloader Improvements (v6.44.0): ImageTwist rate limiting fix, phun.org support with Cloudflare bypass, scheduler auto-sync with forum config
  • Face Recognition Fallback (v6.43.13): Dual-encoding with InsightFace + face_recognition library fallback, automatic detection when primary detector fails on difficult lighting, improved tolerance (0.20) for better matching
  • Face Reference Storage (v6.43.12): Reference images stored in dedicated directory with UUID filenames, thumbnails cached in database for instant page loading, automatic cleanup on deletion
  • Security Hardening (v6.43.9): Path traversal fixes, regex injection prevention, PowerShell escaping, race condition fixes, configurable secure cookies
  • Gallery Delete & Lightbox (v6.43.5): Delete all files from gallery downloads at once, lightbox shows all gallery files with navigation between images/videos
  • Database & Reliability (v6.43.0): Increased connection pool (5→20), periodic WAL checkpoints, graceful shutdown handler, memory leak fixes, temp cleanup on startup
  • Universal Video Downloader (v6.41.0): Multi-platform video downloads supporting YouTube, Vimeo, Dailymotion, Bilibili, and gallery-dl sites (Erome, Bunkr, Cyberdrop, etc.)
  • Enhanced Notifications Page (v6.40.5): Full Media page parity with action button overlays (Review/Add Reference/Delete), complete lightbox functionality with metadata enrichment, and face recognition integration
  • Content-Hash Thumbnail Caching (v6.39.0): Intelligent thumbnail caching using SHA256 content hash - cache survives file moves between locations (instant loading in recycle bin)
  • RecycleBin Filter Improvements (v6.39.0): Unified dropdown filter UI with accurate pagination, cleaner interface matching Review page design
  • Real-Time Log Updates (v6.38.10): Live log streaming with 2-second auto-refresh, toggle on/off, web UI visibility for all operations including cleanup
  • Enhanced Cleanup Logging (v6.38.10): Log cleanup script uses universal logger - see all cleanup actions in web UI in real-time
  • Advanced Sorting & Filtering (v6.36.0): Multi-field sorting across Downloads, Media, and Review pages with post date/download date, newest/oldest ordering, optimized with database indexes for sub-10ms performance
  • Universal Cloudflare Handler (v6.35.0): Centralized Cloudflare bypass with site status detection and automatic skip logic for down sites
  • File Inventory System (v6.34.0): Database-first architecture with 50-100x performance boost - all pages load in <100ms instead of 5-10 seconds
  • Performance Optimizations (v6.20.0): 10-100x faster metadata searches with indexed columns, Redis caching for instant stats/analytics
  • Two-Factor Authentication (v6.13.0): Complete 2FA with TOTP, Duo Security, and Passkeys/WebAuthn
  • Passkey Support (v6.13.0): Biometric authentication with Face ID, Touch ID, Windows Hello, security keys
  • 2FA Configuration UI (v6.13.0): Manage all 2FA methods, backup codes, and devices from web interface
  • Security Hardening (v6.19.0-6.19.2): CSRF protection, path traversal prevention, input validation, cookie-based auth, rate limiting
  • Recycle Bin (v6.9.0): Soft delete with restore capability, lightbox preview, and statistics
  • Quick Delete (v6.9.0): Delete icon overlay on Media page thumbnails for instant deletion
  • Database Settings (v6.9.0): All configuration migrated from JSON to database
  • Face Recognition Filtering (v6.6.0): Filter downloads by face match status (Matched/No Match/Not Scanned)
  • Face Recognition (v6.5.0): Automatic face detection and matching for images AND videos
  • Review Queue (v6.5.0): Web UI for managing unmatched media with batch operations
  • FlareSolverr Integration: Automatic Cloudflare bypass across all download modules with intelligent error detection
  • Multi-Platform Support: Instagram (5 methods), TikTok, Snapchat (2 methods), YouTube, Coppermine galleries, 7 forum types
  • HEIC Format Support: Full support for Apple HEIC images including EXIF processing
  • Smart Deduplication: SHA256 file hash-based duplicate detection across all platforms
  • Database CLI: Self-service database management via ./db command
  • Quality Merging: Combine FastDL timestamps with Toolzu high-res images
  • Automated Scheduling: Randomized intervals with persistent state
  • Push Notifications: Pushover integration with thumbnail previews and smart platform names
  • Immich Integration: Automatic photo library scanning
  • Browser Automation: Playwright-based scraping with anti-detection
  • Version Control: Automated backups with version tagging
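
The server-side shuffle listed under v13.11.0 orders items by an md5 hash so that a given seed always produces the same order. The project does this inside PostgreSQL; the Python sketch below only illustrates the idea. The function name, the `seed` string, and the `seed:id` key format are assumptions made for this example.

```python
import hashlib


def deterministic_shuffle(item_ids, seed):
    """Return item_ids in a pseudo-random order that is stable for a given seed.

    Hypothetical helper: each ID is ranked by md5(seed + ":" + id), which mirrors
    an ORDER BY md5(...) clause but is not the project's actual query.
    """
    def sort_key(item_id):
        return hashlib.md5(f"{seed}:{item_id}".encode("utf-8")).hexdigest()

    return sorted(item_ids, key=sort_key)
```

Because the order depends only on the seed and the IDs, a slideshow can request page after page with the same seed and never see an item twice, while a new seed gives a fresh order.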

📊 Statistics

  • 67,000+ lines of production Python code
  • 89 database tables across PostgreSQL (migrated from 6 SQLite databases in v13.0.0)
  • 13 platforms supported (Instagram×5, TikTok, Snapchat×2, YouTube, Coppermine, BestEyeCandy, Reddit, Forums)
  • 280+ REST API endpoints across 28 routers
  • 38 frontend pages with TanStack Query data fetching

🚀 Quick Start

Prerequisites

PostgreSQL 16+ (Required, v13.0.0+):

sudo apt install postgresql postgresql-contrib
sudo -u postgres createuser media_downloader -P
sudo -u postgres createdb -O media_downloader media_downloader

FlareSolverr (Required for Cloudflare bypass):

docker run -d \
  --name flaresolverr \
  -p 8191:8191 \
  -e LOG_LEVEL=info \
  --restart unless-stopped \
  ghcr.io/flaresolverr/flaresolverr:latest
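
Once the container is running, a quick way to confirm it works is to ask it to fetch a page. The sketch below uses FlareSolverr's `/v1` endpoint with the `request.get` command and only the Python standard library. It assumes the port mapping from the command above (`localhost:8191`); the helper names are illustrative and are not part of this project.

```python
import json
import urllib.request

# Assumes the -p 8191:8191 mapping from the docker command above.
FLARESOLVERR_URL = "http://localhost:8191/v1"


def build_flaresolverr_request(target_url, max_timeout_ms=60000):
    """Build (but do not send) a POST asking FlareSolverr to fetch target_url."""
    payload = {"cmd": "request.get", "url": target_url, "maxTimeout": max_timeout_ms}
    return urllib.request.Request(
        FLARESOLVERR_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_via_flaresolverr(target_url):
    """Send the request and return FlareSolverr's parsed JSON response."""
    request = build_flaresolverr_request(target_url)
    with urllib.request.urlopen(request, timeout=90) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `fetch_via_flaresolverr("https://example.com")` should return a JSON object whose `status` field reports whether the challenge was solved.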

Installation

# Run installation script
cd /opt/media-downloader
sudo ./scripts/install.sh

# Or manual setup
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
playwright install chromium firefox

Configuration

All settings are now managed through the Web UI at http://your-server:5173/config. Settings are stored in the database (PostgreSQL, or SQLite fallback) for better reliability and management.

Access the configuration page to set up:

  • Platform credentials (Instagram, TikTok, Snapchat, Forums)
  • Face recognition settings
  • Notification preferences (Pushover)
  • Download directories
  • Scheduler intervals

Run

# Manual run (all platforms)
media-downloader

# With scheduler (continuous mode with random intervals)
media-downloader --scheduler

# Check scheduler status
media-downloader --scheduler-status

# Specific platform only
media-downloader --platform instagram
media-downloader --platform coppermine
media-downloader --platform forums

# Test mode (minimal download for testing)
media-downloader --test --platform instagram

# Database management
media-downloader --db stats              # Show database statistics
media-downloader --db list --limit 20    # List recent downloads
media-downloader --db delete-user user   # Delete all downloads from user
media-downloader --db delete POST_ID     # Delete specific post by ID

# Alternative: use ./db wrapper
./db stats
./db list --limit 20
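
The scheduler mode runs tasks at randomized intervals and keeps its state across restarts. The sketch below shows one way such a loop could compute and persist run times. It is not the code in modules/scheduler.py: the JSON state file, the jitter fraction, and all function names are assumptions made for illustration.

```python
import json
import random
import time
from pathlib import Path


def next_run_time(base_interval_s, jitter=0.25, now=None, rng=random):
    """Return a timestamp one base interval from now, shifted by up to +/- jitter."""
    now = time.time() if now is None else now
    factor = rng.uniform(1.0 - jitter, 1.0 + jitter)
    return now + base_interval_s * factor


def load_state(path):
    """Load the {task_name: next_run_timestamp} mapping, or an empty dict."""
    path = Path(path)
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(path, state):
    """Persist the schedule so a restart does not reset pending runs."""
    Path(path).write_text(json.dumps(state, indent=2), encoding="utf-8")


def due_tasks(state, now=None):
    """Return task names whose next run time has passed, most overdue first."""
    now = time.time() if now is None else now
    overdue = [(timestamp, name) for name, timestamp in state.items() if timestamp <= now]
    return [name for _timestamp, name in sorted(overdue)]
```

Randomizing each interval avoids hitting a platform at a fixed, predictable cadence, and storing the next run time rather than the last run time means a restart picks up where the previous process left off.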

Face Recognition (v6.5.0+)

Train reference faces and review unmatched media:

# Add a reference face for a person
venv/bin/python3 scripts/add_reference_face.py "Person Name" "/path/to/photo.jpg"

# Test face recognition
venv/bin/python3 scripts/test_face_recognition.py "/path/to/test.jpg"

# View review queue via Web UI
# Navigate to http://your-server:5173/review
# - Keep: Move to destination
# - Add Reference: Add face + move
# - Delete: Remove from queue
# - Batch operations: Select multiple

# Manual review queue
ls /opt/immich/review/              # List unmatched media

How it works:

  1. Downloaded media → Hash check (skip duplicates)
  2. Face detection (images: direct, videos: extract frame @ 1s)
  3. Match against reference faces (tolerance: 0.6)
  4. Match → Final destination | No match → Review queue
  5. Web UI for manual review with batch operations
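
The routing in steps 1 to 4 can be sketched in a few lines of Python. This is a simplified illustration, not the code in modules/move_module.py: `route_media`, the in-memory `seen_hashes` set, and the caller-supplied `face_distance` function are hypothetical, and the sketch assumes that a smaller distance means a closer match.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large videos are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def route_media(path, seen_hashes, face_distance, tolerance=0.6):
    """Decide what happens to one downloaded file.

    face_distance is a caller-supplied function (hypothetical) that returns the
    smallest distance to any reference face, or None when no face is detected.
    Returns "duplicate", "destination", or "review".
    """
    file_hash = sha256_of(path)
    if file_hash in seen_hashes:
        return "duplicate"  # step 1: skip files that were already downloaded
    seen_hashes.add(file_hash)

    distance = face_distance(Path(path))  # steps 2-3: detect a face and compare
    if distance is not None and distance <= tolerance:
        return "destination"  # step 4: match
    return "review"  # step 4: no match, queue for manual review
```

Keeping face matching behind a plain function makes the routing logic easy to test without loading a detection model.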

Reference Storage (v6.43.12):

  • Reference images copied to /opt/media-downloader/data/face_references/
  • UUID filenames prevent conflicts (e.g., a1b2c3d4-e5f6-7890-abcd-ef1234567890.jpg)
  • Thumbnails pre-generated and cached in database for instant config page loading
  • Files automatically deleted when references are removed
  • Independent of source files - safe if originals are moved/deleted
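
As a rough illustration of the storage scheme above, the following sketch copies an image into a reference directory under a fresh UUID filename and removes it again on request. The function names are hypothetical, and the database-side thumbnail caching is left out.

```python
import shutil
import uuid
from pathlib import Path


def store_reference_image(source_path, reference_dir):
    """Copy source_path into reference_dir under a fresh UUID name; return the new path."""
    source = Path(source_path)
    target_dir = Path(reference_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{uuid.uuid4()}{source.suffix.lower()}"
    shutil.copy2(source, target)  # a copy, so the reference survives if the original moves
    return target


def remove_reference_image(stored_path):
    """Delete a stored reference file; a missing file is not an error."""
    Path(stored_path).unlink(missing_ok=True)
```

Because the stored name comes from `uuid.uuid4()`, two source files called photo.jpg never collide, and the stored filename reveals nothing about the original.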

See docs/FACE_RECOGNITION.md for full documentation.

Private Gallery (v12.7.0+)

A secure, password-protected section for organizing personal media:

Security Features:

  • Password protection with Argon2id key derivation
  • AES-256-GCM encryption for all stored files
  • Obfuscated storage - files stored with UUID names, no personal info in filesystem
  • Session-based authentication with configurable auto-lock timeout
  • Database fields encrypted - unreadable without password

Features:

  • Person Management: Create persons with configurable relationship types (Friend, Family, etc.)
  • Auto-Generated Albums: Each person automatically has an album with cover image and count
  • URL Import: Import from Instagram, YouTube, TikTok, and image hosts (Pixhost, Imagetwist, etc.)
  • Date Auto-Extraction: From EXIF metadata, filename patterns, and social media timestamps
  • Duplicate Detection: File hash comparison during upload/import
  • Batch Export: Download decrypted files with original filenames as ZIP

Access:

  • Navigate to /private-gallery or use keyboard shortcut Ctrl+Shift+P
  • First visit prompts password creation
  • Sidebar link hidden when locked, visible when unlocked

Copy to Private Gallery: Integration available on Downloads, Media, Review, Recycle Bin, and Paid Content pages:

  1. Select items using select mode (selections persist across pages)
  2. Click "Copy to Private Gallery" (hidden when Private Gallery feature is disabled)
  3. Assign person (auto-populates default tags), optional tags via searchable selector
  4. Preview selected files as thumbnail grid
  5. Files are encrypted and stored securely with original filenames preserved

Features Management (v12.7.0+)

Customize the sidebar navigation from Configuration → Features tab:

  • Enable/Disable Features: Toggle sidebar menu items on/off
  • Custom Labels: Rename any menu item to your preference
  • Reorder Items: Drag-and-drop to reorder items within groups
  • Reorder Groups: Drag-and-drop to reorder entire menu groups
  • Route Protection: Disabled features redirect to home (prevents direct URL access)
  • Reset to Defaults: One-click restore of all feature settings

📁 Directory Structure

/opt/media-downloader/
├── media-downloader.py          # Main application
├── db                            # Database CLI wrapper
├── setup.py                      # Installation setup
├── requirements.txt              # Python dependencies
├── VERSION                       # Version number (13.13.1)
├── CHANGELOG.md                  # Release notes
├── README.md                     # This file
├── INSTALL.md                    # Installation guide
│
├── docs/                         # Documentation
│   ├── VERSIONING.md            # Version control & backup guide
│   ├── GUI_DESIGN_PLAN.md       # Web GUI design documentation
│   └── archive/                 # Archived documentation
│
├── config/                       # Legacy directory (settings now in database)
│
├── data/
│   ├── face_references/         # Face recognition reference images (UUID filenames)
│   └── cache/profile_images/    # Locally cached creator avatars/banners (v13.2.0)
│
├── database/
│   └── (PostgreSQL via pg_adapter — was SQLite, migrated in v13.0.0)
│
├── modules/                      # 42 Python modules
│   ├── pg_adapter.py            # PostgreSQL adapter — drop-in sqlite3 replacement (v13.0.0)
│   ├── db_bootstrap.py          # Database backend bootstrap (v13.0.0)
│   ├── unified_database.py      # Database layer with file hash deduplication
│   ├── scheduler.py             # Randomized scheduling engine
│   ├── move_module.py           # File operations + duplicate detection + face recognition
│   ├── face_recognition_module.py # Face detection & matching
│   ├── semantic_search.py       # AI semantic search with CLIP embeddings
│   ├── pushover_notifier.py     # Push notifications
│   ├── download_manager.py      # Multi-threaded downloads
│   ├── date_utils.py            # EXIF timestamp handling
│   ├── instaloader_module.py    # Instagram API method
│   ├── instagram_client_module.py # Instagram direct API client (v13.4.0, curl_cffi)
│   ├── fastdl_module.py         # Instagram web scraper (640×640)
│   ├── imginn_module.py         # Instagram alternative scraper
│   ├── toolzu_module.py         # High-res Instagram (1920×1440)
│   ├── snapchat_scraper.py      # Snapchat direct scraper (Playwright, optional proxy)
│   ├── snapchat_client_module.py # Snapchat direct API client (v13.4.0, curl_cffi)
│   ├── tiktok_module.py         # TikTok via yt-dlp
│   ├── forum_downloader.py      # Multi-forum support
│   ├── forum_db_adapter.py      # Forum database adapter
│   └── tiktok_db_adapter.py     # TikTok database adapter
│
├── scripts/                      # Utility and shell scripts
│   ├── add_reference_face.py    # Add face to recognition database
│   ├── test_face_recognition.py # Test face matching
│   ├── install.sh               # Installation script
│   ├── uninstall.sh            # Uninstallation script
│   ├── create-version-backup.sh # Create locked version backup
│   ├── add-backup-profile.sh   # Re-add backup-central profile
│   ├── cleanup-old-logs.py     # Log cleanup with universal logger (v6.38.10)
│   ├── cleanup-old-logs.sh     # Log cleanup (deprecated, use .py version)
│   ├── runme.sh                # Quick run script
│   └── run-with-xvfb.sh        # Run with virtual display
│
├── wrappers/                     # Subprocess isolation wrappers
│   ├── fastdl_subprocess_wrapper.py
│   ├── imginn_subprocess_wrapper.py
│   ├── instagram_client_subprocess_wrapper.py  # v13.4.0
│   ├── toolzu_subprocess_wrapper.py
│   ├── snapchat_subprocess_wrapper.py
│   ├── snapchat_client_subprocess_wrapper.py   # v13.4.0
│   └── forum_subprocess_wrapper.py
│
├── utilities/                    # Maintenance utilities
│   ├── db_manager.py                    # Database CLI management (stats, list, delete)
│   ├── backfill_file_hashes.py         # Calculate hashes for existing files
│   ├── backfill_file_paths.py          # Find and update missing file paths
│   ├── scan_and_hash_files.py          # Scan directory and populate hashes
│   ├── cleanup_database_filenames.py   # Fix filename field (basename only)
│   ├── cleanup_recycle_duplicates.py   # Remove recycle bin duplicates with downloads
│   ├── cleanup_review_duplicates.py    # Remove review queue duplicates with downloads
│   ├── cleanup_recycle_internal_dupes.py # Remove internal recycle bin duplicates (v6.38.9)
│   └── test_recycle_duplicate_prevention.py # Test recycle bin duplicate prevention
│
├── tests/                        # Test scripts
│   ├── test_all_notifications.py
│   ├── test_pushover.py
│   ├── test_instagram_notification.py
│   └── ... (7 test files)
│
├── archive/                      # Archived scripts
│   └── ... (one-time migration and debug scripts)
│
├── cookies/                      # Browser cookies for authentication
├── downloads/                    # Temporary download staging
├── logs/                         # Application logs
├── sessions/                     # Instagram session files
└── venv/                         # Python virtual environment

🛠️ Utilities

File Hash Management

# Scan a directory and populate file hashes
python3 utilities/scan_and_hash_files.py "/opt/immich/md/social media"

# Find duplicate files by hash
python3 utilities/scan_and_hash_files.py --find-duplicates

# Backfill hashes for existing database records
python3 utilities/backfill_file_hashes.py --backfill

# Remove duplicate files (keeps oldest)
python3 utilities/backfill_file_hashes.py --remove-duplicates --dry-run  # Test first
python3 utilities/backfill_file_hashes.py --remove-duplicates             # For real

# Cleanup duplicate files in recycle bin (v6.38.9)
python3 utilities/cleanup_recycle_duplicates.py      # Duplicates with downloads table
python3 utilities/cleanup_review_duplicates.py       # Duplicates with review queue
python3 utilities/cleanup_recycle_internal_dupes.py  # Internal recycle bin duplicates

# Test recycle bin duplicate prevention
python3 utilities/test_recycle_duplicate_prevention.py

Database Cleanup

# Fix filename field (remove paths, keep basename only)
python3 utilities/cleanup_database_filenames.py --dry-run  # Test first
python3 utilities/cleanup_database_filenames.py            # Apply changes

Log Cleanup (v6.38.10)

# Cleanup old logs (removes logs older than 7 days) - Uses universal logger!
python3 scripts/cleanup-old-logs.py

# Run with virtual environment
/opt/media-downloader/venv/bin/python3 scripts/cleanup-old-logs.py

# Add to crontab for automated daily cleanup at midnight
0 0 * * * /opt/media-downloader/venv/bin/python3 /opt/media-downloader/scripts/cleanup-old-logs.py

# Old bash version (deprecated, but still works)
bash scripts/cleanup-old-logs.sh

Benefits of Python version:

  • All cleanup actions logged to web UI (component: 'LogCleanup')
  • See exactly what files were removed and how much space was freed
  • Real-time visibility with live log updates
  • Error tracking and reporting
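
As a rough illustration of the age-based cleanup described above, the sketch below removes `*.log` files older than a given number of days and reports what was freed. It is not the contents of scripts/cleanup-old-logs.py: the function name `cleanup_old_logs`, the `*.log` glob, and the use of file modification time are assumptions made for this example.

```python
import time
from pathlib import Path
from typing import Optional, Tuple


def cleanup_old_logs(log_dir: Path, max_age_days: int = 7,
                     now: Optional[float] = None) -> Tuple[int, int]:
    """Delete *.log files older than max_age_days.

    Returns (files_removed, bytes_freed). Age is judged by modification
    time, which is an assumption of this sketch.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed = 0
    freed = 0
    for path in log_dir.glob("*.log"):
        stat = path.stat()
        if stat.st_mtime < cutoff:
            path.unlink()
            removed += 1
            freed += stat.st_size
    return removed, freed
```

Returning the counts makes it straightforward to log how many files were removed and how much space was freed.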

🎯 Supported Platforms

Instagram (5 Methods)

  1. Instagram Client - Direct API client (v13.4.0, recommended)

    • 10-20x faster than web scraping
    • curl_cffi with browser TLS fingerprinting
    • Posts via public GraphQL, stories/reels/tagged via authenticated REST
    • File naming compatible with ImgInn for dedup
  2. InstaLoader - Native Instagram API (requires login)

    • Highest reliability
    • Session-based authentication
    • TOTP 2FA support
  3. FastDL - Web scraper (no login required)

    • 640×640 resolution
    • Accurate timestamps
    • No authentication needed
  4. ImgInn - Alternative scraper with Cloudflare bypass

    • Backup method
    • 2captcha integration
  5. Toolzu - High-resolution downloads

    • 1920×1440 resolution
    • Can merge with FastDL for best quality + accurate dates

TikTok

  • Via yt-dlp (command-line tool)
  • Supports videos and audio

Snapchat (2 Methods)

  1. Snapchat Client - Direct API client (v13.4.0, recommended)

    • curl_cffi with browser TLS fingerprinting
    • Fast direct API downloads
  2. Snapchat Scraper - Playwright-based scraper

    • Optional proxy support
    • Stories and Spotlight content
    • Highlight stitching (combines multiple story segments)
    • EXIF metadata preservation
    • Per-scraper proxy configuration via Scrapers settings page

Forums (7 Types)

  • XenForo (1.x, 2.x)
  • vBulletin (3.x, 4.x, 5.x)
  • phpBB
  • Discourse
  • Invision Power Board (IPB 4.x)
  • MyBB
  • Simple Machines Forum (SMF)

⚙️ Advanced Features

File Hash Deduplication (NEW!)

Automatically detects and prevents duplicate file downloads:

[WARNING] Skipping duplicate file (same content): photo2.jpg
          [Already exists: photo1.jpg from instagram/username]
[DEBUG] Deleted duplicate: photo2.jpg

How it works:

  1. Calculates SHA256 hash of file before moving
  2. Checks database for existing file with same hash across:
    • Downloads table (active files)
    • Recycle bin (deleted files)
    • File inventory (review queue, media files)
  3. If duplicate found:
    • Logs warning with original file details
    • Deletes duplicate from temp directory
    • Skips the move operation
  4. Tracks duplicates in statistics
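
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `sha256_of_file`, `move_if_unique`, and the in-memory `known_hashes` dict (a stand-in for the database lookups across the downloads table, recycle bin, and file inventory) are hypothetical names.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Dict


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large videos are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def move_if_unique(src: Path, dest_dir: Path, known_hashes: Dict[str, str]) -> bool:
    """Move src into dest_dir unless its content hash is already known.

    known_hashes maps file_hash -> original filename (hypothetical stand-in
    for the database). Returns True if moved, False if skipped.
    """
    file_hash = sha256_of_file(src)
    original = known_hashes.get(file_hash)
    if original is not None:
        print(f"[WARNING] Skipping duplicate file (same content): {src.name}")
        print(f"          [Already exists: {original}]")
        src.unlink()  # delete the duplicate from the temp directory
        return False
    dest_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dest_dir / src.name))
    known_hashes[file_hash] = src.name
    return True
```

Hashing before the move means a duplicate never reaches the destination directory, so no cleanup is needed there.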

Recycle Bin Protection (v6.38.9):

  • Prevents internal duplicates when the same file is deleted from multiple locations
  • If the hash already exists in the recycle bin, keeps the original and deletes the duplicate
  • Automatically cleans up database records
  • Freed 143.80 MB from historical duplicate cleanup

Benefits:

  • Saves disk space across all storage locations
  • Prevents redundant storage in downloads, recycle bin, and review queue
  • Works across platforms (detects same file from different sources)
  • Fast lookups via the indexed file_hash field
  • No manual intervention needed

Quality Merging

Combine the best of both worlds:

  • FastDL: Accurate timestamps (640×640)
  • Toolzu: High resolution (1920×1440)
  • Result: High-res image with correct EXIF timestamp

Scheduler

  • Randomized intervals within time windows (e.g., 3-6 hours)
  • Per-account customization
  • Persistent state across restarts
  • Missed run detection and recovery
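
A randomized interval within a time window could be computed along these lines. This is only a sketch: `next_run_time` and `is_missed` are hypothetical helpers, and the real scheduler may choose and persist its run times differently.

```python
import random
from datetime import datetime, timedelta
from typing import Optional


def next_run_time(last_run: datetime, min_hours: float, max_hours: float,
                  rng: Optional[random.Random] = None) -> datetime:
    """Pick the next run at a random offset inside [min_hours, max_hours]."""
    rng = rng or random.Random()
    delay_hours = rng.uniform(min_hours, max_hours)
    return last_run + timedelta(hours=delay_hours)


def is_missed(next_run: datetime, now: datetime) -> bool:
    """Treat a run as missed if its scheduled time has already passed."""
    return now > next_run
```

Storing the computed `next_run` value (for example in the scheduler_state table) is what would let a restart detect a missed run by comparing it with the current time.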

Notifications

Pushover push notifications with:

  • Thumbnail image attachments
  • Platform/content type grouping
  • Custom priority levels
  • Per-platform enable/disable
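
For orientation, the sketch below builds the form fields for a Pushover message with a priority level and a per-platform switch. The endpoint and the `token`, `user`, `title`, `message`, `priority`, `retry`, and `expire` fields belong to Pushover's public message API; the function name, the `enabled` mapping, and the message wording are assumptions for this example and do not reflect the project's notification code.

```python
from typing import Optional

PUSHOVER_URL = "https://api.pushover.net/1/messages.json"


def build_pushover_payload(app_token: str, user_key: str, platform: str,
                           content_type: str, count: int, priority: int = 0,
                           enabled: Optional[dict] = None) -> Optional[dict]:
    """Return form fields for a Pushover message, or None if the platform is muted."""
    enabled = enabled or {}
    if not enabled.get(platform, True):
        return None  # per-platform disable (hypothetical config shape)
    if priority not in (-2, -1, 0, 1, 2):
        raise ValueError("Pushover priority must be between -2 and 2")
    payload = {
        "token": app_token,
        "user": user_key,
        "title": f"{platform.title()}: new {content_type}",
        "message": f"Downloaded {count} new {content_type} item(s)",
        "priority": priority,
    }
    if priority == 2:
        # Emergency priority requires retry and expire parameters.
        payload.update({"retry": 60, "expire": 3600})
    return payload
```

The resulting fields would be sent as a POST to `PUSHOVER_URL`, with a thumbnail supplied as a multipart `attachment` file if one is available.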

Immich Integration

Automatic photo library scanning after downloads:

  • Direct API integration
  • Configurable scan triggers
  • Library ID targeting

📊 Database Schema

Main Tables

downloads (unified table for all platforms)

  • url_hash (SHA256) - URL deduplication
  • file_hash (SHA256) - File content deduplication (NEW)
  • platform, source, content_type
  • filename, file_path, file_size
  • post_date, download_date
  • metadata (JSON)
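
To make the shape of a downloads row concrete, here is a hedged sketch of the listed columns as a Python dataclass, with url_hash derived from the source URL. The class name `DownloadRecord`, the `url` attribute, and the choice to hash the raw UTF-8 URL without normalization are assumptions; the actual table definition may differ.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class DownloadRecord:
    url: str                       # source URL (assumed input, not a listed column)
    platform: str
    source: str
    content_type: str
    filename: str
    file_path: str
    file_size: int
    file_hash: Optional[str] = None        # SHA256 of file content, if computed
    post_date: Optional[datetime] = None
    download_date: Optional[datetime] = None
    metadata: dict = field(default_factory=dict)

    @property
    def url_hash(self) -> str:
        """SHA256 hex digest of the URL, used for URL-level deduplication."""
        return hashlib.sha256(self.url.encode("utf-8")).hexdigest()
```

Keeping url_hash and file_hash separate lets the same content be recognized even when it arrives from two different URLs.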

forum_threads, forum_posts

  • Thread tracking and monitoring
  • Post-level granularity

scheduler_state

  • Persistent task scheduling
  • Next run times, intervals

download_queue

  • Queue-based download management
  • Priority levels, retry logic

Indexes

  • 17 optimized indexes for fast queries
  • file_hash index for fast duplicate detection
  • Composite indexes for common queries

🌐 Web GUI

Stack: React 18 + TypeScript + Tailwind CSS + FastAPI + TanStack Query

35 Pages including:

  • Dashboard with real-time stats and scheduler monitoring
  • Media, Downloads, and Review pages with batch operations
  • Paid Content: Dashboard, Feed, Creators, Messages, Settings, Queue, RecycleBin, Notifications
  • Private Gallery: Encrypted media gallery with person management
  • Configuration, Logs, Internet Discovery, and more
  • Dark/Light theme, mobile responsive, PWA support

🔧 Configuration

Main Config File

config/settings.json - 100+ configurable parameters

Key Sections:

  • instagram, fastdl, imginn, toolzu - Instagram methods
  • snapchat, tiktok - Other platforms
  • forums - Forum configurations
  • pushover - Notification settings
  • immich - Photo library integration
  • scheduler - Scheduling parameters
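
A quick sanity check of the config file might look like the following. It assumes the sections listed above are top-level keys of config/settings.json, which the list suggests but does not confirm; `load_settings` and `missing_sections` are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import List

# Section names taken from the list above; treating them as top-level
# JSON keys is an assumption of this sketch.
EXPECTED_SECTIONS = (
    "instagram", "fastdl", "imginn", "toolzu",
    "snapchat", "tiktok", "forums",
    "pushover", "immich", "scheduler",
)


def load_settings(path: Path) -> dict:
    """Read the JSON settings file into a dict."""
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def missing_sections(settings: dict) -> List[str]:
    """Return the expected section names that are absent from settings."""
    return [name for name in EXPECTED_SECTIONS if name not in settings]
```

For example, `missing_sections(load_settings(Path("config/settings.json")))` would return an empty list when every section is present.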

Environment Variables

# Optional OMDB API key for date extraction
OMDB_API_KEY=your_api_key

# Display for headless browser automation
DISPLAY=:100

📝 Logging

Logs are written to the logs/ directory, with the date in each filename:

logs/
├── media-downloader_20251025.log
├── media-downloader_20251024.log
└── ...

Log Levels:

  • DEBUG: Verbose operational details
  • INFO: Normal operations
  • WARNING: Potential issues (duplicates, skipped files)
  • ERROR: Failures and exceptions

Format:

[2025-10-25 12:34:56] [Module] [LEVEL] Message
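
The format above can be reproduced with Python's standard logging module, as in this sketch. The project uses its own universal logger, so `make_logger` and the constants here are illustrative assumptions rather than its real configuration.

```python
import logging

LOG_FORMAT = "[%(asctime)s] [%(name)s] [%(levelname)s] %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def make_logger(module_name: str) -> logging.Logger:
    """Return a logger whose lines look like
    [2025-10-25 12:34:56] [Module] [LEVEL] Message
    """
    logger = logging.getLogger(module_name)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT))
        logger.addHandler(handler)
    logger.setLevel(logging.DEBUG)
    return logger
```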

🔐 Security

Authentication

  • Instagram: Session-based with TOTP 2FA support
  • Forums: Cookie-based authentication
  • 2captcha: API key for CAPTCHA solving
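
For readers unfamiliar with TOTP, the sketch below shows how a time-based one-time code is derived from a shared secret (RFC 6238, HMAC-SHA1). The dependency list names pyotp for this purpose; this standard-library version is only an illustration of the algorithm, and the function name `totp` is an assumption.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, for_time: Optional[float] = None,
         digits: int = 6, step: int = 30) -> str:
    """Compute an RFC 6238 TOTP code from a base32-encoded secret."""
    cleaned = secret_b32.upper().replace(" ", "")
    cleaned += "=" * (-len(cleaned) % 8)  # restore any stripped padding
    key = base64.b32decode(cleaned)
    now = time.time() if for_time is None else for_time
    counter = int(now // step)  # number of time steps since the epoch
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)
```

Because the code depends only on the secret and the current 30-second window, the system clock must be reasonably accurate for logins to succeed.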

Files

  • Cookies stored in cookies/ directory
  • Sessions stored in sessions/ directory
  • NEVER commit these to version control

Browser Automation

  • Playwright with stealth mode
  • Virtual display (Xvfb) for running headed browsers on a headless server
  • Custom browser path: .playwright/

🐛 Troubleshooting

Common Issues

"Database is locked"

  • The database uses WAL mode with connection pooling
  • Retry logic handles temporary locks
  • Check for long-running queries
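
Retry logic for temporary locks generally follows the pattern below: retry only errors judged transient, and back off between attempts. This is a generic sketch; `with_retry` and `is_transient` are hypothetical names, and the project's own retry handling may use different limits and delays.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(operation: Callable[[], T],
               is_transient: Callable[[Exception], bool],
               attempts: int = 5, base_delay: float = 0.1) -> T:
    """Run operation, retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except Exception as exc:
            # Re-raise on non-transient errors or once attempts are exhausted.
            if not is_transient(exc) or attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

A caller might pass `lambda exc: "locked" in str(exc).lower()` as `is_transient` so that only lock errors are retried and real failures surface immediately.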

"Playwright browser not found"

playwright install chromium

"Import error: No module named 'X'"

pip install -r requirements.txt

"Duplicate detection not working"

  • Ensure file_hash field is populated
  • Run utilities/scan_and_hash_files.py to backfill hashes

📚 Documentation

User Documentation

Developer Documentation


🔄 Version Updates

To update the version number across all files:

# Automated update (recommended)
bash scripts/update-version.sh 6.7.0

# Then manually update:
# - data/changelog.json
# - CHANGELOG.md

# Restart services
sudo systemctl restart media-downloader-api

# Create version backup
bash scripts/create-version-backup.sh

See VERSION_UPDATE.md for a quick guide or docs/VERSION_UPDATE_CHECKLIST.md for the complete checklist.


🧪 Testing

Test scripts are located in the tests/ directory:

# Test Pushover notifications
python3 tests/test_pushover.py

# Test all notifications
python3 tests/test_all_notifications.py

# Test Instagram notifications
python3 tests/test_instagram_notification.py

📦 Dependencies

Core:

  • Python 3.12+
  • instaloader (Instagram API)
  • yt-dlp (TikTok/video downloads)
  • playwright (browser automation)
  • requests, beautifulsoup4 (HTTP/parsing)
  • PostgreSQL 16+ with psycopg2 (database, v13.0.0+)

Optional:

  • 2captcha-python (CAPTCHA solving)
  • pyotp (2FA TOTP)
  • Pillow (image manipulation)
  • ffmpeg (video frame extraction)

See requirements.txt for full list.


🤝 Contributing

This is a personal media archival project. For questions or suggestions, refer to the documentation in this repository.


📄 License

Private use. Not licensed for redistribution.


🔄 Version History

v5.0 (October 25, 2025)

  • NEW: File hash-based deduplication (SHA256)
  • NEW: Automatic duplicate detection and deletion
  • NEW: Comprehensive GUI design documentation
  • 🧹 IMPROVED: Directory structure organization
  • 🧹 IMPROVED: Database schema with file_hash field
  • 🧹 IMPROVED: Utilities for hash management
  • 📊 STATS: 213 files hashed, 30 duplicate groups found

v4.x (Previous)

  • Multi-platform support
  • Quality merging
  • Scheduler with randomization
  • Pushover notifications
  • Immich integration

📞 Support

For issues or questions:

  1. Check logs in logs/ directory
  2. Review configuration in config/settings.json
  3. Refer to GUI_DESIGN_PLAN.md for GUI development
  4. Check archive/ for historical documentation

Happy Archiving! 📸🎥
