# Smart Download Workflow with Face Recognition & Deduplication

**Your Perfect Workflow**: Download → Check Face → Check Duplicate → Auto-Sort or Review

---

## 🎯 Your Exact Requirements

### What You Want

1. **Download image**
2. **Check if face matches** (using Immich face recognition)
3. **Check if duplicate** (using existing SHA256 hash system)
4. **Decision**:
   - ✅ **Match + Not Duplicate** → Move to final destination (`/faces/person_name/`)
   - ⚠️ **No Match OR Duplicate** → Move to holding/review directory (`/review/`)

The implementation below runs the duplicate check first: hashing is cheap, and it avoids an unnecessary Immich scan for files that are already recorded in the database.

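The decision rule above is small enough to state as a pure function. The following is a minimal sketch, not part of the implementation later in this document; the names `Route` and `route_image` are hypothetical and exist only for illustration.

```python
from enum import Enum


class Route(Enum):
    """Where a downloaded image ends up (hypothetical helper type)."""
    FINAL = "final"    # /faces/person_name/
    REVIEW = "review"  # /review/


def route_image(face_matches: bool, is_duplicate: bool) -> Route:
    """Return FINAL only for a matching face on a file not seen before."""
    if face_matches and not is_duplicate:
        return Route.FINAL
    return Route.REVIEW


if __name__ == "__main__":
    # Print the full truth table for the two checks
    for face_matches in (True, False):
        for is_duplicate in (True, False):
            route = route_image(face_matches, is_duplicate)
            print(f"match={face_matches} duplicate={is_duplicate} -> {route.value}")
```

Only one of the four combinations reaches the final destination; everything else is held for review, which is what keeps the curated collection clean.
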
### Why This Makes Sense

- ✅ **Automatic for good images** - Hands-off for images you want
- ✅ **Manual review for uncertain images** - You decide on edge cases
- ✅ **No duplicates** - Leverages the existing deduplication system
- ✅ **Clean organization** - The final destination stays curated and high-quality
- ✅ **Nothing lost** - Everything goes somewhere (review or final)

---

## 🏗️ Complete Workflow Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                         DOWNLOAD IMAGE                          │
└───────────────────────────┬─────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                  STEP 1: Calculate SHA256 Hash                  │
└───────────────────────────┬─────────────────────────────────────┘
                            │
                            ▼
                    ┌───────────────┐
                    │ Is Duplicate? │
                    └───────┬───────┘
                            │
                ┌───────────┴────────────┐
                │                        │
               YES                       NO
                │                        │
                ▼                        ▼
         ┌─────────────┐        ┌─────────────────┐
         │ Move to     │        │ STEP 2: Trigger │
         │ REVIEW/     │        │ Immich Scan     │
         │ duplicates/ │        └────────┬────────┘
         └─────────────┘                 │
                                         ▼
                                 ┌───────────────┐
                                 │ Wait for Face │
                                 │ Detection     │
                                 └───────┬───────┘
                                         │
                                         ▼
                                 ┌───────────────────┐
                                 │ Query Immich DB:  │
                                 │ Who's in photo?   │
                                 └───────┬───────────┘
                                         │
                        ┌────────────────┴────────────────┐
                        │                                 │
                   IDENTIFIED                      NOT IDENTIFIED
                 (in whitelist)                  (unknown/unwanted)
                        │                                 │
                        ▼                                 ▼
               ┌─────────────────┐               ┌─────────────────┐
               │ Move to FINAL   │               │ Move to REVIEW/ │
               │ /faces/john/    │               │ unidentified/   │
               └─────────────────┘               └─────────────────┘
                        │
                        ▼
               ┌─────────────────┐
               │ Update Database │
               │ - Record path   │
               │ - Record person │
               │ - Mark complete │
               └─────────────────┘
```

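The "Wait for Face Detection" step in the diagram can be implemented as a bounded poll rather than a single fixed pause. The sketch below is one possible approach under stated assumptions: `wait_for_faces` is a hypothetical helper, `lookup` stands in for whatever callable returns the face records for a file, and the defaults mirror the `wait_time_seconds` and `max_retries` values from the Configuration section.

```python
import time
from typing import Callable, Dict, List


def wait_for_faces(
    lookup: Callable[[str], List[Dict]],
    file_path: str,
    wait_seconds: float = 5.0,
    max_retries: int = 2,
    sleep: Callable[[float], None] = time.sleep,
) -> List[Dict]:
    """Poll `lookup` until it returns faces or the retry budget is spent.

    Makes one initial attempt plus `max_retries` retries, pausing
    `wait_seconds` before each attempt.
    """
    for _ in range(max_retries + 1):
        sleep(wait_seconds)
        faces = lookup(file_path)
        if faces:
            return faces
    return []


if __name__ == "__main__":
    calls = []

    def fake_lookup(path: str) -> List[Dict]:
        # Simulates a backend that only has results on the second query
        calls.append(path)
        if len(calls) >= 2:
            return [{"person_name": "john_doe", "confidence": 0.9}]
        return []

    faces = wait_for_faces(fake_lookup, "/tmp/example.jpg", sleep=lambda s: None)
    print(len(calls), faces)
```

An empty result after the last retry cannot distinguish "not processed yet" from "no faces in the image", so such files still belong in the review directory rather than being discarded.
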
---

## 📁 Directory Structure

```
/mnt/storage/Downloads/
│
├── temp_downloads/          # Temporary download location
│   └── [images downloaded here first]
│
├── faces/                   # Final curated collection
│   ├── john_doe/            # Auto-sorted, verified
│   │   ├── 20250131_120000.jpg
│   │   └── 20250131_130000.jpg
│   │
│   ├── sarah_smith/         # Auto-sorted, verified
│   │   └── 20250131_140000.jpg
│   │
│   └── family_member/
│       └── 20250131_150000.jpg
│
└── review/                  # Holding directory for manual review
    ├── duplicates/          # Duplicate images
    │   ├── duplicate_20250131_120000.jpg
    │   └── duplicate_20250131_130000.jpg
    │
    ├── unidentified/        # No faces or unknown faces
    │   ├── unknown_20250131_120000.jpg
    │   └── noface_20250131_130000.jpg
    │
    ├── low_confidence/      # Face detected but low match confidence
    │   └── lowconf_20250131_120000.jpg
    │
    ├── multiple_faces/      # Multiple people in image
    │   └── multi_20250131_120000.jpg
    │
    └── unwanted_person/     # Blacklisted person detected
        └── unwanted_20250131_120000.jpg
```

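The review tree above pairs each outcome with a subdirectory and a filename prefix. A small lookup table makes that pairing explicit. This is an illustrative sketch: `REVIEW_ROUTES`, its reason keys, and `review_path` are hypothetical names, while the category directories and prefixes are taken from this document.

```python
import posixpath

# reason -> (review subdirectory, filename prefix)
REVIEW_ROUTES = {
    "duplicate": ("duplicates", "duplicate_"),
    "no_faces": ("unidentified", "noface_"),
    "unknown_person": ("unidentified", "unknown_"),
    "not_in_whitelist": ("unidentified", "notwhitelisted_"),
    "low_confidence": ("low_confidence", "lowconf_"),
    "multiple_faces": ("multiple_faces", "multi_"),
    "blacklisted_person": ("unwanted_person", "unwanted_"),
}


def review_path(review_base: str, reason: str, filename: str) -> str:
    """Build the review destination for a file; raises KeyError on unknown reasons."""
    category, prefix = REVIEW_ROUTES[reason]
    return posixpath.join(review_base, category, prefix + filename)


if __name__ == "__main__":
    print(review_path("/mnt/storage/Downloads/review", "low_confidence",
                      "20250131_120000.jpg"))
```

Keeping the mapping in one place means a new review category needs one new entry rather than another branch in the sorting code.
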
---

## 💻 Complete Implementation

### Core Smart Download Class

```python
#!/usr/bin/env python3
"""
Smart Download with Face Recognition & Deduplication
Downloads, checks faces, checks duplicates, auto-sorts or reviews
"""

import os
import re
import shutil
import hashlib
import logging
import time
import sqlite3
from datetime import datetime
from typing import Dict, Optional

import requests

logger = logging.getLogger(__name__)


class SmartDownloader:
    """Intelligent download with face recognition and deduplication"""

    def __init__(self, config, immich_db, unified_db):
        self.config = config
        self.immich_db = immich_db
        self.unified_db = unified_db

        # All settings live under config['smart_download'] and follow the
        # nesting shown in the Configuration section
        settings = config.get('smart_download', {})
        directories = settings.get('directories', {})
        thresholds = settings.get('thresholds', {})
        immich = settings.get('immich', {})

        # Directories
        self.temp_dir = directories.get(
            'temp_dir', '/mnt/storage/Downloads/temp_downloads')
        self.final_base = directories.get(
            'final_base', '/mnt/storage/Downloads/faces')
        self.review_base = directories.get(
            'review_base', '/mnt/storage/Downloads/review')

        # Whitelist / blacklist
        self.whitelist = settings.get('whitelist', [])
        self.blacklist = settings.get('blacklist', [])

        # Thresholds
        self.min_confidence = thresholds.get('min_confidence', 0.6)
        self.immich_wait_time = immich.get('wait_time_seconds', 5)

        # Create directories
        self._create_directories()

    def _create_directories(self):
        """Create all required directories"""
        dirs = [
            self.temp_dir,
            self.final_base,
            self.review_base,
            os.path.join(self.review_base, 'duplicates'),
            os.path.join(self.review_base, 'unidentified'),
            os.path.join(self.review_base, 'low_confidence'),
            os.path.join(self.review_base, 'multiple_faces'),
            os.path.join(self.review_base, 'unwanted_person'),
        ]

        for d in dirs:
            os.makedirs(d, exist_ok=True)

    def smart_download(self, url: str, source: Optional[str] = None) -> Dict:
        """
        Smart download workflow: Download → Check → Sort or Review

        Args:
            url: URL to download
            source: Source identifier (e.g., 'instagram', 'forum')

        Returns:
            dict: {
                'status': 'success'|'error',
                'action': 'sorted'|'reviewed'|'skipped',
                'destination': str,
                'reason': str,
                'person': str or None
            }
        """
        try:
            # STEP 1: Download to temp
            temp_path = self._download_to_temp(url)
            if not temp_path:
                return {'status': 'error', 'action': 'skipped',
                        'reason': 'download_failed'}

            # STEP 2: Check for duplicates
            file_hash = self._calculate_hash(temp_path)
            if self._is_duplicate(file_hash):
                return self._handle_duplicate(temp_path, file_hash)

            # STEP 3: Trigger Immich scan
            self._trigger_immich_scan(temp_path)

            # STEP 4: Wait for Immich to process
            time.sleep(self.immich_wait_time)

            # STEP 5: Check faces
            faces = self.immich_db.get_faces_for_file(temp_path)

            # STEP 6: Make decision based on faces
            return self._process_faces(temp_path, faces, file_hash, source)

        except Exception as e:
            logger.error(f"Smart download failed for {url}: {e}")
            return {'status': 'error', 'action': 'skipped', 'reason': str(e)}

    def _download_to_temp(self, url: str) -> Optional[str]:
        """Download file to temporary location"""
        try:
            filename = f"temp_{datetime.now().strftime('%Y%m%d_%H%M%S')}.jpg"
            temp_path = os.path.join(self.temp_dir, filename)

            # Minimal HTTP download; swap in your existing download logic
            # (yt-dlp, authenticated sessions, etc.) where needed
            response = requests.get(url, stream=True, timeout=30)
            response.raise_for_status()
            with open(temp_path, 'wb') as f:
                for chunk in response.iter_content(chunk_size=8192):
                    f.write(chunk)

            logger.info(f"Downloaded to temp: {temp_path}")
            return temp_path

        except Exception as e:
            logger.error(f"Download failed for {url}: {e}")
            return None

    def _calculate_hash(self, file_path: str) -> str:
        """Calculate SHA256 hash of file"""
        sha256_hash = hashlib.sha256()

        # Read in blocks so large files are never loaded into memory at once
        with open(file_path, "rb") as f:
            for byte_block in iter(lambda: f.read(4096), b""):
                sha256_hash.update(byte_block)

        return sha256_hash.hexdigest()

    def _is_duplicate(self, file_hash: str) -> bool:
        """Check if file hash already exists in database"""
        with sqlite3.connect(self.unified_db.db_path) as conn:
            cursor = conn.execute(
                "SELECT COUNT(*) FROM downloads WHERE file_hash = ?",
                (file_hash,)
            )
            count = cursor.fetchone()[0]

        return count > 0

    def _handle_duplicate(self, temp_path: str, file_hash: str) -> Dict:
        """Handle duplicate file - move to review/duplicates"""
        filename = os.path.basename(temp_path)
        review_path = os.path.join(
            self.review_base,
            'duplicates',
            f"duplicate_{filename}"
        )

        shutil.move(temp_path, review_path)
        logger.info(f"Duplicate detected: {filename} → review/duplicates/")

        return {
            'status': 'success',
            'action': 'reviewed',
            'destination': review_path,
            'reason': 'duplicate',
            'hash': file_hash
        }

    def _trigger_immich_scan(self, file_path: str):
        """Trigger Immich to scan new file"""
        try:
            immich_url = self.config.get('immich', {}).get('url')
            api_key = self.config.get('immich', {}).get('api_key')

            if immich_url and api_key:
                # NOTE: the scan endpoint differs between Immich versions
                # (scans are typically addressed per library ID); verify
                # this path against your server's API documentation.
                response = requests.post(
                    f"{immich_url}/api/library/scan",
                    headers={'x-api-key': api_key},
                    timeout=30
                )
                logger.debug(f"Triggered Immich scan: {response.status_code}")

        except Exception as e:
            logger.warning(f"Could not trigger Immich scan: {e}")

    def _process_faces(self, temp_path: str, faces: list, file_hash: str,
                       source: Optional[str] = None) -> Dict:
        """
        Process faces and decide: final destination or review

        Returns:
            dict with status, action, destination, reason
        """
        filename = os.path.basename(temp_path)

        # NO FACES DETECTED
        if not faces:
            return self._move_to_review(
                temp_path,
                'unidentified',
                f"noface_{filename}",
                'no_faces_detected'
            )

        # MULTIPLE FACES
        if len(faces) > 1:
            return self._move_to_review(
                temp_path,
                'multiple_faces',
                f"multi_{filename}",
                f'multiple_faces ({len(faces)} people)'
            )

        # SINGLE FACE - Process
        face = faces[0]
        person_name = face.get('person_name')
        confidence = face.get('confidence', 1.0)

        # UNKNOWN PERSON - face detected but not assigned to anyone
        if not person_name:
            return self._move_to_review(
                temp_path,
                'unidentified',
                f"unknown_{filename}",
                'unknown_person'
            )

        # BLACKLIST CHECK
        if self.blacklist and person_name in self.blacklist:
            return self._move_to_review(
                temp_path,
                'unwanted_person',
                f"unwanted_{filename}",
                f'blacklisted_person: {person_name}'
            )

        # WHITELIST CHECK
        if self.whitelist and person_name not in self.whitelist:
            return self._move_to_review(
                temp_path,
                'unidentified',
                f"notwhitelisted_{filename}",
                f'not_in_whitelist: {person_name}'
            )

        # CONFIDENCE CHECK (defaults to 1.0 when no confidence data exists)
        if confidence < self.min_confidence:
            return self._move_to_review(
                temp_path,
                'low_confidence',
                f"lowconf_{filename}",
                f'low_confidence: {confidence:.2f}'
            )

        # ALL CHECKS PASSED - Move to final destination
        return self._move_to_final(
            temp_path,
            person_name,
            file_hash,
            source
        )

    def _move_to_final(self, temp_path: str, person_name: str,
                       file_hash: str, source: Optional[str] = None) -> Dict:
        """Move to final destination and record in database"""

        # Create person directory
        person_dir_name = self._sanitize_name(person_name)
        person_dir = os.path.join(self.final_base, person_dir_name)
        os.makedirs(person_dir, exist_ok=True)

        # Move file
        filename = os.path.basename(temp_path)
        final_path = os.path.join(person_dir, filename)

        # Handle filename collisions in destination
        if os.path.exists(final_path):
            base, ext = os.path.splitext(filename)
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            filename = f"{base}_{timestamp}{ext}"
            final_path = os.path.join(person_dir, filename)

        shutil.move(temp_path, final_path)

        # Record in database
        self._record_download(final_path, person_name, file_hash, source)

        logger.info(f"✓ Auto-sorted: {filename} → {person_name}/")

        return {
            'status': 'success',
            'action': 'sorted',
            'destination': final_path,
            'reason': 'face_match_verified',
            'person': person_name,
            'hash': file_hash
        }

    def _move_to_review(self, temp_path: str, category: str,
                        new_filename: str, reason: str) -> Dict:
        """Move to review directory for manual processing"""

        review_dir = os.path.join(self.review_base, category)
        review_path = os.path.join(review_dir, new_filename)

        # Handle filename collisions
        if os.path.exists(review_path):
            base, ext = os.path.splitext(new_filename)
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            new_filename = f"{base}_{timestamp}{ext}"
            review_path = os.path.join(review_dir, new_filename)

        shutil.move(temp_path, review_path)

        logger.info(f"⚠ Needs review: {new_filename} → review/{category}/ ({reason})")

        return {
            'status': 'success',
            'action': 'reviewed',
            'destination': review_path,
            'reason': reason,
            'category': category
        }

    def _record_download(self, file_path: str, person_name: str,
                         file_hash: str, source: Optional[str] = None):
        """Record successful download in database"""

        with sqlite3.connect(self.unified_db.db_path) as conn:
            conn.execute("""
                INSERT INTO downloads
                (file_path, filename, file_hash, source, person_name,
                 download_date, auto_sorted)
                VALUES (?, ?, ?, ?, ?, ?, 1)
            """, (
                file_path,
                os.path.basename(file_path),
                file_hash,
                source,
                person_name,
                datetime.now().isoformat()
            ))
            conn.commit()

    def _sanitize_name(self, name: str) -> str:
        """Convert person name to safe directory name"""
        safe = re.sub(r'[^\w\s-]', '', name)
        safe = re.sub(r'[-\s]+', '_', safe)
        return safe.lower()

    # REVIEW QUEUE MANAGEMENT

    def get_review_queue(self, category: Optional[str] = None) -> list:
        """Get files in review queue, newest first"""

        if category:
            categories = [category]
        else:
            categories = ['duplicates', 'unidentified', 'low_confidence',
                          'multiple_faces', 'unwanted_person']

        queue = []

        for cat in categories:
            cat_dir = os.path.join(self.review_base, cat)
            if os.path.isdir(cat_dir):
                for f in os.listdir(cat_dir):
                    path = os.path.join(cat_dir, f)
                    queue.append({
                        'category': cat,
                        'filename': f,
                        'path': path,
                        'size': os.path.getsize(path),
                        'modified': os.path.getmtime(path)
                    })

        return sorted(queue, key=lambda x: x['modified'], reverse=True)

    def approve_review_item(self, file_path: str, person_name: str) -> Dict:
        """Manually approve a review item and move to final destination"""

        if not os.path.exists(file_path):
            return {'status': 'error', 'reason': 'file_not_found'}

        # Calculate hash
        file_hash = self._calculate_hash(file_path)

        # Move to final destination
        return self._move_to_final(file_path, person_name, file_hash, source='manual_review')

    def reject_review_item(self, file_path: str) -> Dict:
        """Delete a review item"""

        if not os.path.exists(file_path):
            return {'status': 'error', 'reason': 'file_not_found'}

        os.remove(file_path)
        logger.info(f"Rejected and deleted: {file_path}")

        return {
            'status': 'success',
            'action': 'deleted',
            'path': file_path
        }
```

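The class relies on `immich_db.get_faces_for_file()` returning a list of dictionaries with a `person_name` key and an optional `confidence` key, but that interface is never spelled out. The sketch below writes it down as a `typing.Protocol` together with an in-memory stand-in for tests. `FaceLookup`, `InMemoryFaceLookup`, and `describe` are hypothetical names, and the record shape is inferred from how `_process_faces` reads it, not from Immich itself.

```python
from typing import Dict, List, Protocol


class FaceLookup(Protocol):
    """Interface the downloader expects from its `immich_db` argument."""

    def get_faces_for_file(self, file_path: str) -> List[Dict]:
        ...


class InMemoryFaceLookup:
    """Test double that returns canned face records keyed by file path."""

    def __init__(self, faces_by_path: Dict[str, List[Dict]]):
        self._faces_by_path = faces_by_path

    def get_faces_for_file(self, file_path: str) -> List[Dict]:
        # Copy the list so callers cannot mutate the canned data
        return list(self._faces_by_path.get(file_path, []))


def describe(lookup: FaceLookup, file_path: str) -> str:
    """Summarize the lookup result the same way the sorting logic branches."""
    faces = lookup.get_faces_for_file(file_path)
    if not faces:
        return "no faces"
    if len(faces) > 1:
        return f"{len(faces)} faces"
    face = faces[0]
    return f"{face.get('person_name')} ({face.get('confidence', 1.0):.2f})"


if __name__ == "__main__":
    lookup = InMemoryFaceLookup({
        "/tmp/a.jpg": [{"person_name": "john_doe", "confidence": 0.91}],
    })
    print(describe(lookup, "/tmp/a.jpg"))
    print(describe(lookup, "/tmp/missing.jpg"))
```

A stub like this allows the sorting branches to be exercised without a running Immich instance.
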
---

## ⚙️ Configuration

### Add to `config.json`:

```json
{
  "smart_download": {
    "enabled": true,

    "directories": {
      "temp_dir": "/mnt/storage/Downloads/temp_downloads",
      "final_base": "/mnt/storage/Downloads/faces",
      "review_base": "/mnt/storage/Downloads/review"
    },

    "whitelist": [
      "john_doe",
      "sarah_smith",
      "family_member_1"
    ],

    "blacklist": [
      "ex_partner",
      "stranger"
    ],

    "thresholds": {
      "min_confidence": 0.6,
      "max_faces_per_image": 1
    },

    "immich": {
      "wait_time_seconds": 5,
      "trigger_scan": true,
      "retry_if_no_faces": true,
      "max_retries": 2
    },

    "deduplication": {
      "check_hash": true,
      "action_on_duplicate": "move_to_review"
    },

    "review_categories": {
      "duplicates": true,
      "unidentified": true,
      "low_confidence": true,
      "multiple_faces": true,
      "unwanted_person": true
    }
  }
}
```

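Because the settings above are nested (`directories`, `thresholds`, `immich`), it helps to flatten and validate them once at startup. The following is a minimal sketch under stated assumptions: `SmartDownloadSettings` and `load_settings` are hypothetical names, and the two validation rules (confidence between 0 and 1, no name on both lists) are suggestions rather than existing behavior.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class SmartDownloadSettings:
    """Flattened view of the `smart_download` config section."""
    temp_dir: str = "/mnt/storage/Downloads/temp_downloads"
    final_base: str = "/mnt/storage/Downloads/faces"
    review_base: str = "/mnt/storage/Downloads/review"
    whitelist: List[str] = field(default_factory=list)
    blacklist: List[str] = field(default_factory=list)
    min_confidence: float = 0.6
    wait_time_seconds: float = 5
    max_retries: int = 2


def load_settings(config: dict) -> SmartDownloadSettings:
    """Read the nested config, falling back to defaults for missing keys."""
    section = config.get("smart_download", {})
    directories = section.get("directories", {})
    thresholds = section.get("thresholds", {})
    immich = section.get("immich", {})
    defaults = SmartDownloadSettings()

    settings = SmartDownloadSettings(
        temp_dir=directories.get("temp_dir", defaults.temp_dir),
        final_base=directories.get("final_base", defaults.final_base),
        review_base=directories.get("review_base", defaults.review_base),
        whitelist=list(section.get("whitelist", [])),
        blacklist=list(section.get("blacklist", [])),
        min_confidence=thresholds.get("min_confidence", defaults.min_confidence),
        wait_time_seconds=immich.get("wait_time_seconds", defaults.wait_time_seconds),
        max_retries=immich.get("max_retries", defaults.max_retries),
    )

    if not 0.0 <= settings.min_confidence <= 1.0:
        raise ValueError("min_confidence must be between 0 and 1")
    overlap = set(settings.whitelist) & set(settings.blacklist)
    if overlap:
        raise ValueError(f"names on both whitelist and blacklist: {sorted(overlap)}")
    return settings


if __name__ == "__main__":
    raw = '{"smart_download": {"thresholds": {"min_confidence": 0.75}, "whitelist": ["john_doe"]}}'
    settings = load_settings(json.loads(raw))
    print(settings.min_confidence, settings.whitelist, settings.temp_dir)
```

Failing fast on a contradictory whitelist/blacklist avoids silently routing a wanted person to `unwanted_person/`.
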
---

## 🔄 Integration with Existing Download System

### Modify Download Completion Hook

```python
def on_download_complete(url: str, temp_path: str, source: str):
    """
    Called when download completes
    Now uses smart download workflow
    """

    if config.get('smart_download', {}).get('enabled', False):
        # Use smart download workflow. Note that smart_download() fetches
        # the URL into its own temp directory; temp_path is only passed on
        # to the legacy handler below.
        smart = SmartDownloader(config, immich_db, unified_db)
        result = smart.smart_download(url, source)

        logger.info(f"Smart download result: {result}")

        # Send notification (.get() because error results may omit 'action')
        action = result.get('action')
        if action == 'sorted':
            send_notification(
                f"✓ Auto-sorted to {result['person']}",
                result['destination']
            )
        elif action == 'reviewed':
            send_notification(
                f"⚠ Needs review: {result['reason']}",
                result['destination']
            )

        return result
    else:
        # Fall back to old workflow
        return legacy_download_handler(url, temp_path, source)
```

---

## 📊 Database Schema Addition

```sql
-- Add person_name and auto_sorted columns to downloads table
ALTER TABLE downloads ADD COLUMN person_name TEXT;
ALTER TABLE downloads ADD COLUMN auto_sorted INTEGER DEFAULT 0;

-- Create index for quick person lookups
CREATE INDEX idx_downloads_person ON downloads(person_name);
CREATE INDEX idx_downloads_auto_sorted ON downloads(auto_sorted);

-- Create review queue table
CREATE TABLE review_queue (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    file_path TEXT NOT NULL,
    category TEXT NOT NULL,  -- duplicates, unidentified, etc.
    file_hash TEXT,
    reason TEXT,
    faces_detected INTEGER DEFAULT 0,
    suggested_person TEXT,
    created_at TEXT,
    reviewed_at TEXT,
    reviewed_by TEXT,
    action TEXT              -- approved, rejected, pending
);

CREATE INDEX idx_review_category ON review_queue(category);
CREATE INDEX idx_review_action ON review_queue(action);
```

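Run as-is, the statements above fail on a second execution, because SQLite rejects `ALTER TABLE ... ADD COLUMN` for a column that already exists. The sketch below shows one way to apply the same schema idempotently from Python. `column_exists` and `migrate` are hypothetical helpers, and the minimal `downloads` table in the demo is an assumption standing in for the real one.

```python
import sqlite3


def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """Check a column via PRAGMA table_info (the name is the second field)."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def migrate(conn: sqlite3.Connection) -> None:
    """Apply the schema additions; safe to call more than once."""
    new_columns = (
        ("person_name", "TEXT"),
        ("auto_sorted", "INTEGER DEFAULT 0"),
    )
    for name, ddl in new_columns:
        if not column_exists(conn, "downloads", name):
            conn.execute(f"ALTER TABLE downloads ADD COLUMN {name} {ddl}")

    conn.execute("""
        CREATE TABLE IF NOT EXISTS review_queue (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            file_path TEXT NOT NULL,
            category TEXT NOT NULL,
            file_hash TEXT,
            reason TEXT,
            faces_detected INTEGER DEFAULT 0,
            suggested_person TEXT,
            created_at TEXT,
            reviewed_at TEXT,
            reviewed_by TEXT,
            action TEXT
        )
    """)
    for index, target in (
        ("idx_downloads_person", "downloads(person_name)"),
        ("idx_downloads_auto_sorted", "downloads(auto_sorted)"),
        ("idx_review_category", "review_queue(category)"),
        ("idx_review_action", "review_queue(action)"),
    ):
        conn.execute(f"CREATE INDEX IF NOT EXISTS {index} ON {target}")
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Assumed minimal shape of the existing table, for demonstration only
    conn.execute("CREATE TABLE downloads (id INTEGER PRIMARY KEY, file_path TEXT, "
                 "filename TEXT, file_hash TEXT, source TEXT, download_date TEXT)")
    migrate(conn)
    migrate(conn)  # second run changes nothing and raises no error
    print(column_exists(conn, "downloads", "person_name"))
```
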
---

## 🎨 Web UI - Review Queue Page

### Review Queue Interface

```
┌─────────────────────────────────────────────────────────────────┐
│ Review Queue (42 items)                                         │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│ Filter: [All ▼] [Duplicates: 5] [Unidentified: 28]              │
│         [Low Confidence: 6] [Multiple Faces: 3]                 │
│                                                                 │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │ [Image Thumbnail]                                       │   │
│   │                                                         │   │
│   │ Category: Unidentified                                  │   │
│   │ Reason: No faces detected by Immich                     │   │
│   │ File: instagram_profile_20250131_120000.jpg             │   │
│   │ Size: 2.4 MB                                            │   │
│   │ Downloaded: 2025-01-31 12:00:00                         │   │
│   │                                                         │   │
│   │ This is: [Select Person ▼] or [New Person...]           │   │
│   │                                                         │   │
│   │ [✓ Approve & Sort] [✗ Delete] [→ Skip]                  │   │
│   └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│   [◄ Previous]                1 of 42                [Next ►]   │
│                                                                 │
│ Bulk Actions: [Select All] [Delete Selected] [Export List]      │
└─────────────────────────────────────────────────────────────────┘
```

---

## 📡 API Endpoints (New)

```
# Review Queue
GET  /api/smart-download/review/queue              # Get all review items
GET  /api/smart-download/review/queue/{category}   # By category
POST /api/smart-download/review/{id}/approve       # Approve and move to person
POST /api/smart-download/review/{id}/reject        # Delete item
GET  /api/smart-download/review/stats              # Queue statistics

# Smart Download Control
GET  /api/smart-download/status
POST /api/smart-download/enable
POST /api/smart-download/disable

# Configuration
GET  /api/smart-download/config
PUT  /api/smart-download/config/whitelist
PUT  /api/smart-download/config/blacklist

# Statistics
GET  /api/smart-download/stats/today
GET  /api/smart-download/stats/summary
```

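The endpoint list does not fix a web framework or response format, so the handler logic is best kept framework-free. As an illustration, the payload for `GET /api/smart-download/review/stats` could be derived from the queue entries as sketched below. The function name `review_stats` and the response shape are assumptions; the input matches the dictionaries returned by `get_review_queue()`.

```python
from collections import Counter
from typing import Dict, Iterable


def review_stats(queue: Iterable[Dict]) -> Dict:
    """Summarize review queue entries by category."""
    counts = Counter(item["category"] for item in queue)
    return {
        "total": sum(counts.values()),
        "by_category": dict(counts),
    }


if __name__ == "__main__":
    sample_queue = [
        {"category": "unidentified", "filename": "noface_a.jpg"},
        {"category": "unidentified", "filename": "unknown_b.jpg"},
        {"category": "duplicates", "filename": "duplicate_c.jpg"},
    ]
    print(review_stats(sample_queue))
```

Whichever framework serves the route can serialize the returned dictionary directly.
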
---

## 📈 Statistics & Reporting

```python
import sqlite3


def get_smart_download_stats(db_path: str, days: int = 30) -> dict:
    """Get smart download statistics"""

    with sqlite3.connect(db_path) as conn:
        # Auto-sorted count. download_date is stored as a local ISO string,
        # so normalize it with datetime() and compare against local time.
        auto_sorted = conn.execute("""
            SELECT COUNT(*)
            FROM downloads
            WHERE auto_sorted = 1
            AND datetime(download_date) >= datetime('now', 'localtime', ? || ' days')
        """, (f'-{days}',)).fetchone()[0]

        # Review queue count
        in_review = conn.execute("""
            SELECT COUNT(*)
            FROM review_queue
            WHERE action = 'pending'
        """).fetchone()[0]

        # By person
        by_person = conn.execute("""
            SELECT person_name, COUNT(*)
            FROM downloads
            WHERE auto_sorted = 1
            AND datetime(download_date) >= datetime('now', 'localtime', ? || ' days')
            GROUP BY person_name
        """, (f'-{days}',)).fetchall()

        # By review category
        by_category = conn.execute("""
            SELECT category, COUNT(*)
            FROM review_queue
            WHERE action = 'pending'
            GROUP BY category
        """).fetchall()

    total = auto_sorted + in_review
    return {
        'auto_sorted': auto_sorted,
        'in_review': in_review,
        'by_person': dict(by_person),
        'by_category': dict(by_category),
        'success_rate': (auto_sorted / total * 100) if total > 0 else 0
    }

# Example output (success_rate shown rounded):
# {
#     'auto_sorted': 145,
#     'in_review': 23,
#     'by_person': {'john_doe': 85, 'sarah_smith': 60},
#     'by_category': {'unidentified': 15, 'duplicates': 5, 'multiple_faces': 3},
#     'success_rate': 86.3
# }
```

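The `in_review` and `by_category` figures depend on rows in `review_queue`, yet the class shown earlier only moves files into the review directories. One way to close that gap is to insert a pending row whenever a file is moved to review and to update it once a decision is made. The sketch below assumes the `review_queue` schema from this document; `record_review_item` and `resolve_review_item` are hypothetical helpers.

```python
import sqlite3
from datetime import datetime
from typing import Optional

REVIEW_QUEUE_DDL = """
CREATE TABLE IF NOT EXISTS review_queue (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    file_path TEXT NOT NULL,
    category TEXT NOT NULL,
    file_hash TEXT,
    reason TEXT,
    faces_detected INTEGER DEFAULT 0,
    suggested_person TEXT,
    created_at TEXT,
    reviewed_at TEXT,
    reviewed_by TEXT,
    action TEXT
)
"""


def record_review_item(conn: sqlite3.Connection, file_path: str, category: str,
                       reason: str, file_hash: Optional[str] = None,
                       faces_detected: int = 0,
                       suggested_person: Optional[str] = None) -> int:
    """Insert a pending review row and return its id."""
    cursor = conn.execute(
        """
        INSERT INTO review_queue
            (file_path, category, file_hash, reason, faces_detected,
             suggested_person, created_at, action)
        VALUES (?, ?, ?, ?, ?, ?, ?, 'pending')
        """,
        (file_path, category, file_hash, reason, faces_detected,
         suggested_person, datetime.now().isoformat()),
    )
    conn.commit()
    return cursor.lastrowid


def resolve_review_item(conn: sqlite3.Connection, item_id: int, action: str,
                        reviewed_by: Optional[str] = None) -> None:
    """Mark a review row as approved or rejected."""
    if action not in ("approved", "rejected"):
        raise ValueError(f"unsupported action: {action}")
    conn.execute(
        "UPDATE review_queue SET action = ?, reviewed_at = ?, reviewed_by = ? "
        "WHERE id = ?",
        (action, datetime.now().isoformat(), reviewed_by, item_id),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(REVIEW_QUEUE_DDL)
    item_id = record_review_item(conn, "/review/unidentified/noface_a.jpg",
                                 "unidentified", "no_faces_detected")
    resolve_review_item(conn, item_id, "approved", reviewed_by="admin")
    print(conn.execute("SELECT action FROM review_queue WHERE id = ?",
                       (item_id,)).fetchone()[0])
```

Calling the first helper from `_move_to_review` and the second from the approve/reject methods would keep the table and the directories in step.
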
---

## 🎯 Example Usage

### Example 1: Download Instagram Profile

```python
# Download profile with smart workflow
downloader = SmartDownloader(config, immich_db, unified_db)

images = get_instagram_profile_images('username')

results = {
    'sorted': 0,
    'reviewed': 0,
    'errors': 0
}

for image_url in images:
    result = downloader.smart_download(image_url, source='instagram')
    action = result.get('action')  # error results may omit 'action'

    if action == 'sorted':
        results['sorted'] += 1
        print(f"✓ {result['person']}: {result['destination']}")
    elif action == 'reviewed':
        results['reviewed'] += 1
        print(f"⚠ Review needed ({result['reason']}): {result['destination']}")
    else:
        results['errors'] += 1

print(f"\nResults: {results['sorted']} sorted, {results['reviewed']} need review, "
      f"{results['errors']} errors")

# Output:
# ✓ john_doe: /faces/john_doe/image1.jpg
# ✓ john_doe: /faces/john_doe/image2.jpg
# ⚠ Review needed (not_in_whitelist): /review/unidentified/image3.jpg
# ⚠ Review needed (duplicate): /review/duplicates/image4.jpg
# ✓ john_doe: /faces/john_doe/image5.jpg
#
# Results: 3 sorted, 2 need review, 0 errors
```

### Example 2: Process Review Queue

```python
# Get pending reviews
queue = downloader.get_review_queue()

print(f"Review queue: {len(queue)} items")

for item in queue:
    print(f"\nFile: {item['filename']}")
    print(f"Category: {item['category']}")
    print(f"Path: {item['path']}")

    # Manual decision
    action = input("Action (approve/reject/skip): ")

    if action == 'approve':
        person = input("Person name: ")
        result = downloader.approve_review_item(item['path'], person)
        if result['status'] == 'success':
            print(f"✓ Approved and sorted to {person}")
        else:
            print(f"✗ Could not approve: {result['reason']}")

    elif action == 'reject':
        downloader.reject_review_item(item['path'])
        print("✗ Deleted")

    else:
        print("→ Skipped")
```

---

## ✅ Advantages of This System

### 1. **Fully Automated for Good Cases**
- Matching face + not duplicate = auto-sorted
- No manual intervention for the bulk of images (target: more than 80%, see Success Metrics)

### 2. **Safe Review for Edge Cases**
- Duplicates flagged for review
- Unknown faces queued for identification
- Multiple faces queued for decision

### 3. **Leverages Existing Systems**
- Uses your SHA256 deduplication
- Uses Immich's face recognition
- Clean integration

### 4. **Nothing Lost**
- Every image goes somewhere
- Easy to find and review
- Can always approve later

### 5. **Flexible Configuration**
- Whitelist/blacklist
- Confidence thresholds
- Review categories

### 6. **Clear Audit Trail**
- Database tracks everything
- Statistics available
- Can generate reports

---

## 🚀 Implementation Timeline

### Week 1: Core Workflow
- [ ] Create SmartDownloader class
- [ ] Implement download to temp
- [ ] Add hash checking
- [ ] Basic face checking
- [ ] Move to final/review logic

### Week 2: Immich Integration
- [ ] Connect to Immich DB
- [ ] Query face data
- [ ] Trigger Immich scans
- [ ] Handle face results

### Week 3: Review System
- [ ] Create review directories
- [ ] Review queue database
- [ ] Get/approve/reject methods
- [ ] Statistics

### Week 4: Web UI
- [ ] Review queue page
- [ ] Approve/reject interface
- [ ] Statistics dashboard
- [ ] Configuration page

### Week 5: Polish
- [ ] Error handling
- [ ] Notifications
- [ ] Documentation
- [ ] Testing

---

## 🎯 Success Metrics

After implementation, track:

- **Auto-sort rate**: % of images auto-sorted vs reviewed
  - **Target**: >80% auto-sorted
- **Duplicate catch rate**: % of byte-identical duplicates caught (SHA256 does not match re-encoded or resized copies)
  - **Target**: 100%
- **False positive rate**: % of incorrectly sorted images
  - **Target**: <5%
- **Review queue size**: Average pending items
  - **Target**: <50 items

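The targets above can be checked mechanically once the counts are available. The following is a rough sketch with hypothetical function names. It assumes `wrongly_sorted` comes from a manual spot check of the final directories, and it uses the current pending count as a stand-in for the average review queue size.

```python
from typing import Dict


def auto_sort_rate(auto_sorted: int, in_review: int) -> float:
    """Percentage of processed images that were auto-sorted."""
    total = auto_sorted + in_review
    return auto_sorted / total * 100 if total else 0.0


def check_targets(auto_sorted: int, in_review: int, wrongly_sorted: int) -> Dict[str, bool]:
    """Compare measured values against the targets listed above."""
    false_positive_rate = wrongly_sorted / auto_sorted * 100 if auto_sorted else 0.0
    return {
        "auto_sort_rate_ok": auto_sort_rate(auto_sorted, in_review) > 80,
        "false_positive_rate_ok": false_positive_rate < 5,
        "review_queue_ok": in_review < 50,
    }


if __name__ == "__main__":
    # Counts taken from the example statistics output in this document
    print(round(auto_sort_rate(145, 23), 1))
    print(check_targets(145, 23, wrongly_sorted=3))
```
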
---

## ✅ Your Perfect Workflow - Summary

```
Download → Hash Check → Face Check → Decision
               ↓               ↓
          Duplicate?       Matches?
               ↓               ↓
           ┌───┴───┐       ┌───┴────┐
          YES      NO     YES       NO
           ↓       ↓       ↓        ↓
        REVIEW Continue  FINAL   REVIEW
```

**Final Destinations**:
- ✅ `/faces/john_doe/` - Verified, auto-sorted
- ⚠️ `/review/duplicates/` - Needs duplicate review
- ⚠️ `/review/unidentified/` - Needs face identification
- ⚠️ `/review/low_confidence/` - Low match confidence
- ⚠️ `/review/multiple_faces/` - Multiple people
- ⚠️ `/review/unwanted_person/` - Blacklisted person detected

**This is exactly what you wanted!**

---

**Last Updated**: 2025-10-31