# Smart Download Workflow with Face Recognition & Deduplication

**Your Perfect Workflow**: Download → Check Face → Check Duplicate → Auto-Sort or Review

---

## 🎯 Your Exact Requirements

### What You Want

1. **Download image**
2. **Check if face matches** (using Immich face recognition)
3. **Check if duplicate** (using existing SHA256 hash system)
4. **Decision**:
   - ✅ **Match + Not Duplicate** → Move to final destination (`/faces/person_name/`)
   - ⚠️ **No Match OR Duplicate** → Move to holding/review directory (`/review/`)

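The decision rule above can be sketched as a small pure function. This is only an illustration: `Decision`, `decide`, and the default base paths are hypothetical names chosen for the sketch, not part of the implementation further down.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Decision:
    action: str        # 'sorted' or 'reviewed'
    destination: str   # directory the image should be moved to


def decide(person_name: Optional[str], is_duplicate: bool,
           final_base: str = "/faces",
           review_base: str = "/review") -> Decision:
    """Only a matched, non-duplicate image is auto-sorted; everything else is reviewed."""
    if person_name and not is_duplicate:
        return Decision("sorted", f"{final_base}/{person_name}/")
    return Decision("reviewed", f"{review_base}/")


print(decide("john_doe", is_duplicate=False))
print(decide("john_doe", is_duplicate=True))
print(decide(None, is_duplicate=False))
```

Keeping the rule free of file and database access makes it easy to test on its own before wiring it into the downloader.
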

### Why This Makes Sense

- ✅ **Automatic for good images** - Hands-off for images you want
- ✅ **Manual review for uncertain** - You decide on edge cases
- ✅ **No duplicates** - Leverages existing deduplication system
- ✅ **Clean organization** - Final destination is curated, high-quality
- ✅ **Nothing lost** - Everything goes somewhere (review or final)

---

## 🏗️ Complete Workflow Architecture

```
┌─────────────────────────────────────────────────────────────────┐
│                         DOWNLOAD IMAGE                          │
└───────────────────────────┬─────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────────┐
│                  STEP 1: Calculate SHA256 Hash                  │
└───────────────────────────┬─────────────────────────────────────┘
                            │
                            ▼
                    ┌───────────────┐
                    │ Is Duplicate? │
                    └───────┬───────┘
                            │
                ┌───────────┴────────────┐
                │                        │
               YES                       NO
                │                        │
                ▼                        ▼
         ┌─────────────┐        ┌─────────────────┐
         │ Move to     │        │ STEP 2: Trigger │
         │ REVIEW/     │        │ Immich Scan     │
         │ duplicates/ │        └────────┬────────┘
         └─────────────┘                 │
                                         ▼
                                 ┌───────────────┐
                                 │ Wait for Face │
                                 │ Detection     │
                                 └───────┬───────┘
                                         │
                                         ▼
                                 ┌───────────────────┐
                                 │ Query Immich DB:  │
                                 │ Who's in photo?   │
                                 └───────┬───────────┘
                                         │
                        ┌────────────────┴────────────────┐
                        │                                 │
                   IDENTIFIED                       NOT IDENTIFIED
                 (in whitelist)                   (unknown/unwanted)
                        │                                 │
                        ▼                                 ▼
               ┌─────────────────┐               ┌─────────────────┐
               │ Move to FINAL   │               │ Move to REVIEW/ │
               │ /faces/john/    │               │ unidentified/   │
               └─────────────────┘               └─────────────────┘
                        │
                        ▼
               ┌─────────────────┐
               │ Update Database │
               │ - Record path   │
               │ - Record person │
               │ - Mark complete │
               └─────────────────┘
```

---

## 📁 Directory Structure

```
/mnt/storage/Downloads/
│
├── temp_downloads/              # Temporary download location
│   └── [images downloaded here first]
│
├── faces/                       # Final curated collection
│   ├── john_doe/                # Auto-sorted, verified
│   │   ├── 20250131_120000.jpg
│   │   └── 20250131_130000.jpg
│   │
│   ├── sarah_smith/             # Auto-sorted, verified
│   │   └── 20250131_140000.jpg
│   │
│   └── family_member/
│       └── 20250131_150000.jpg
│
└── review/                      # Holding directory for manual review
    ├── duplicates/              # Duplicate images
    │   ├── duplicate_20250131_120000.jpg
    │   └── duplicate_20250131_130000.jpg
    │
    ├── unidentified/            # No faces or unknown faces
    │   ├── unknown_20250131_120000.jpg
    │   └── noface_20250131_130000.jpg
    │
    ├── low_confidence/          # Face detected but low match confidence
    │   └── lowconf_20250131_120000.jpg
    │
    ├── multiple_faces/          # Multiple people in image
    │   └── multi_20250131_120000.jpg
    │
    └── unwanted_person/         # Blacklisted person detected
        └── unwanted_20250131_120000.jpg
```

---

## 💻 Complete Implementation

### Core Smart Download Class

```python
#!/usr/bin/env python3
"""
Smart Download with Face Recognition & Deduplication
Downloads, checks faces, checks duplicates, auto-sorts or reviews
"""

import os
import shutil
import hashlib
import logging
import time
import sqlite3
from datetime import datetime
from typing import Dict, Optional

logger = logging.getLogger(__name__)


class SmartDownloader:
    """Intelligent download with face recognition and deduplication"""

    def __init__(self, config, immich_db, unified_db):
        self.config = config
        self.immich_db = immich_db
        self.unified_db = unified_db

        # Settings are read from the nested sections of config.json
        # ("directories", "thresholds", "immich") under "smart_download"
        smart_cfg = config.get('smart_download', {})
        dirs_cfg = smart_cfg.get('directories', {})
        thresholds_cfg = smart_cfg.get('thresholds', {})
        immich_cfg = smart_cfg.get('immich', {})

        # Directories
        self.temp_dir = dirs_cfg.get('temp_dir',
                                     '/mnt/storage/Downloads/temp_downloads')
        self.final_base = dirs_cfg.get('final_base',
                                       '/mnt/storage/Downloads/faces')
        self.review_base = dirs_cfg.get('review_base',
                                        '/mnt/storage/Downloads/review')

        # Whitelist / blacklist
        self.whitelist = smart_cfg.get('whitelist', [])
        self.blacklist = smart_cfg.get('blacklist', [])

        # Thresholds
        self.min_confidence = thresholds_cfg.get('min_confidence', 0.6)
        self.immich_wait_time = immich_cfg.get('wait_time_seconds', 5)

        # Create directories
        self._create_directories()

    def _create_directories(self):
        """Create all required directories"""
        dirs = [
            self.temp_dir,
            self.final_base,
            self.review_base,
            os.path.join(self.review_base, 'duplicates'),
            os.path.join(self.review_base, 'unidentified'),
            os.path.join(self.review_base, 'low_confidence'),
            os.path.join(self.review_base, 'multiple_faces'),
            os.path.join(self.review_base, 'unwanted_person'),
        ]

        for d in dirs:
            os.makedirs(d, exist_ok=True)

    def smart_download(self, url: str, source: str = None) -> Dict:
        """
        Smart download workflow: Download → Check → Sort or Review

        Args:
            url: URL to download
            source: Source identifier (e.g., 'instagram', 'forum')

        Returns:
            dict: {
                'status': 'success'|'error',
                'action': 'sorted'|'reviewed'|'skipped',
                'destination': str,
                'reason': str,
                'person': str or None
            }
        """
        try:
            # STEP 1: Download to temp
            temp_path = self._download_to_temp(url)
            if not temp_path:
                return {'status': 'error', 'reason': 'download_failed'}

            # STEP 2: Check for duplicates
            file_hash = self._calculate_hash(temp_path)
            if self._is_duplicate(file_hash):
                return self._handle_duplicate(temp_path, file_hash)

            # STEP 3: Trigger Immich scan
            self._trigger_immich_scan(temp_path)

            # STEP 4: Wait for Immich to process
            time.sleep(self.immich_wait_time)

            # STEP 5: Check faces
            faces = self.immich_db.get_faces_for_file(temp_path)

            # STEP 6: Make decision based on faces
            return self._process_faces(temp_path, faces, file_hash, source)

        except Exception as e:
            logger.error(f"Smart download failed for {url}: {e}")
            return {'status': 'error', 'reason': str(e)}

    def _download_to_temp(self, url: str) -> Optional[str]:
        """Download file to temporary location"""
        try:
            import requests

            # Microseconds keep names unique when several downloads
            # start within the same second
            filename = f"temp_{datetime.now().strftime('%Y%m%d_%H%M%S_%f')}.jpg"
            temp_path = os.path.join(self.temp_dir, filename)

            # Minimal HTTP download; swap in your existing download
            # logic (yt-dlp, site-specific scrapers, etc.) here
            response = requests.get(url, stream=True, timeout=30)
            response.raise_for_status()
            with open(temp_path, 'wb') as f:
                for chunk in response.iter_content(chunk_size=8192):
                    f.write(chunk)

            logger.info(f"Downloaded to temp: {temp_path}")
            return temp_path

        except Exception as e:
            logger.error(f"Download failed for {url}: {e}")
            return None

    def _calculate_hash(self, file_path: str) -> str:
        """Calculate SHA256 hash of file"""
        sha256_hash = hashlib.sha256()

        with open(file_path, "rb") as f:
            for byte_block in iter(lambda: f.read(4096), b""):
                sha256_hash.update(byte_block)

        return sha256_hash.hexdigest()

    def _is_duplicate(self, file_hash: str) -> bool:
        """Check if file hash already exists in database"""
        with sqlite3.connect(self.unified_db.db_path) as conn:
            cursor = conn.execute(
                "SELECT COUNT(*) FROM downloads WHERE file_hash = ?",
                (file_hash,)
            )
            count = cursor.fetchone()[0]

        return count > 0

    def _handle_duplicate(self, temp_path: str, file_hash: str) -> Dict:
        """Handle duplicate file - move to review/duplicates"""
        filename = os.path.basename(temp_path)
        review_path = os.path.join(
            self.review_base,
            'duplicates',
            f"duplicate_{filename}"
        )

        shutil.move(temp_path, review_path)
        logger.info(f"Duplicate detected: {filename} → review/duplicates/")

        return {
            'status': 'success',
            'action': 'reviewed',
            'destination': review_path,
            'reason': 'duplicate',
            'hash': file_hash
        }

    def _trigger_immich_scan(self, file_path: str):
        """Trigger Immich to scan new file"""
        try:
            import requests

            immich_url = self.config.get('immich', {}).get('url')
            api_key = self.config.get('immich', {}).get('api_key')

            if immich_url and api_key:
                # NOTE: the scan endpoint path differs between Immich
                # versions; check it against your server's API documentation
                response = requests.post(
                    f"{immich_url}/api/library/scan",
                    headers={'x-api-key': api_key},
                    timeout=10
                )
                logger.debug(f"Triggered Immich scan: {response.status_code}")

        except Exception as e:
            logger.warning(f"Could not trigger Immich scan: {e}")

    def _process_faces(self, temp_path: str, faces: list, file_hash: str,
                       source: str = None) -> Dict:
        """
        Process faces and decide: final destination or review

        Returns:
            dict with status, action, destination, reason
        """
        filename = os.path.basename(temp_path)

        # NO FACES DETECTED
        if not faces:
            return self._move_to_review(
                temp_path,
                'unidentified',
                f"noface_{filename}",
                'no_faces_detected'
            )

        # MULTIPLE FACES
        if len(faces) > 1:
            return self._move_to_review(
                temp_path,
                'multiple_faces',
                f"multi_{filename}",
                f'multiple_faces ({len(faces)} people)'
            )

        # SINGLE FACE - Process
        face = faces[0]
        person_name = face.get('person_name')
        confidence = face.get('confidence', 1.0)

        # FACE DETECTED BUT NOT MATCHED TO A KNOWN PERSON
        # (without this check, an empty whitelist would pass None
        # on to _sanitize_name and crash)
        if not person_name:
            return self._move_to_review(
                temp_path,
                'unidentified',
                f"unknown_{filename}",
                'unknown_face'
            )

        # BLACKLIST CHECK
        if self.blacklist and person_name in self.blacklist:
            return self._move_to_review(
                temp_path,
                'unwanted_person',
                f"unwanted_{filename}",
                f'blacklisted_person: {person_name}'
            )

        # WHITELIST CHECK
        if self.whitelist and person_name not in self.whitelist:
            return self._move_to_review(
                temp_path,
                'unidentified',
                f"notwhitelisted_{filename}",
                f'not_in_whitelist: {person_name}'
            )

        # CONFIDENCE CHECK (if we have confidence data)
        if confidence < self.min_confidence:
            return self._move_to_review(
                temp_path,
                'low_confidence',
                f"lowconf_{filename}",
                f'low_confidence: {confidence:.2f}'
            )

        # ALL CHECKS PASSED - Move to final destination
        return self._move_to_final(
            temp_path,
            person_name,
            file_hash,
            source
        )

    def _move_to_final(self, temp_path: str, person_name: str,
                       file_hash: str, source: str = None) -> Dict:
        """Move to final destination and record in database"""

        # Create person directory
        person_dir_name = self._sanitize_name(person_name)
        person_dir = os.path.join(self.final_base, person_dir_name)
        os.makedirs(person_dir, exist_ok=True)

        # Move file
        filename = os.path.basename(temp_path)
        final_path = os.path.join(person_dir, filename)

        # Handle duplicates in destination
        if os.path.exists(final_path):
            base, ext = os.path.splitext(filename)
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            filename = f"{base}_{timestamp}{ext}"
            final_path = os.path.join(person_dir, filename)

        shutil.move(temp_path, final_path)

        # Record in database
        self._record_download(final_path, person_name, file_hash, source)

        logger.info(f"✓ Auto-sorted: {filename} → {person_name}/")

        return {
            'status': 'success',
            'action': 'sorted',
            'destination': final_path,
            'reason': 'face_match_verified',
            'person': person_name,
            'hash': file_hash
        }

    def _move_to_review(self, temp_path: str, category: str,
                        new_filename: str, reason: str) -> Dict:
        """Move to review directory for manual processing"""

        review_dir = os.path.join(self.review_base, category)
        review_path = os.path.join(review_dir, new_filename)

        # Handle duplicates
        if os.path.exists(review_path):
            base, ext = os.path.splitext(new_filename)
            timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
            new_filename = f"{base}_{timestamp}{ext}"
            review_path = os.path.join(review_dir, new_filename)

        shutil.move(temp_path, review_path)

        logger.info(f"⚠ Needs review: {new_filename} → review/{category}/ ({reason})")

        return {
            'status': 'success',
            'action': 'reviewed',
            'destination': review_path,
            'reason': reason,
            'category': category
        }

    def _record_download(self, file_path: str, person_name: str,
                         file_hash: str, source: str = None):
        """Record successful download in database"""

        with sqlite3.connect(self.unified_db.db_path) as conn:
            conn.execute("""
                INSERT INTO downloads
                (file_path, filename, file_hash, source, person_name,
                 download_date, auto_sorted)
                VALUES (?, ?, ?, ?, ?, ?, 1)
            """, (
                file_path,
                os.path.basename(file_path),
                file_hash,
                source,
                person_name,
                datetime.now().isoformat()
            ))
            conn.commit()

    def _sanitize_name(self, name: str) -> str:
        """Convert person name to safe directory name"""
        import re
        safe = re.sub(r'[^\w\s-]', '', name)
        safe = re.sub(r'[-\s]+', '_', safe)
        return safe.lower()

    # REVIEW QUEUE MANAGEMENT

    def get_review_queue(self, category: str = None) -> list:
        """Get files in review queue, most recently modified first"""

        if category:
            categories = [category]
        else:
            categories = ['duplicates', 'unidentified', 'low_confidence',
                          'multiple_faces', 'unwanted_person']

        queue = []

        for cat in categories:
            cat_dir = os.path.join(self.review_base, cat)
            if os.path.exists(cat_dir):
                for f in os.listdir(cat_dir):
                    path = os.path.join(cat_dir, f)
                    queue.append({
                        'category': cat,
                        'filename': f,
                        'path': path,
                        'size': os.path.getsize(path),
                        'modified': os.path.getmtime(path)
                    })

        return sorted(queue, key=lambda x: x['modified'], reverse=True)

    def approve_review_item(self, file_path: str, person_name: str) -> Dict:
        """Manually approve a review item and move to final destination"""

        if not os.path.exists(file_path):
            return {'status': 'error', 'reason': 'file_not_found'}

        # Calculate hash
        file_hash = self._calculate_hash(file_path)

        # Move to final destination
        return self._move_to_final(file_path, person_name, file_hash, source='manual_review')

    def reject_review_item(self, file_path: str) -> Dict:
        """Delete a review item"""

        if not os.path.exists(file_path):
            return {'status': 'error', 'reason': 'file_not_found'}

        os.remove(file_path)
        logger.info(f"Rejected and deleted: {file_path}")

        return {
            'status': 'success',
            'action': 'deleted',
            'path': file_path
        }
```

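The workflow waits a fixed number of seconds for Immich, which can be too short on a busy server. The configuration in this document also lists `retry_if_no_faces` and `max_retries`, so one possible refinement is to poll for face results instead. The sketch below is an assumption about how that could look: `wait_for_faces` and its parameters are hypothetical, and the face lookup is passed in as a callable so the helper does not depend on any particular Immich client.

```python
import time
from typing import Callable, List


def wait_for_faces(fetch_faces: Callable[[], List[dict]],
                   max_retries: int = 2,
                   wait_seconds: float = 5.0,
                   sleep: Callable[[float], None] = time.sleep) -> List[dict]:
    """Poll fetch_faces until it returns results or the retries run out."""
    attempts = max_retries + 1  # one initial attempt plus the retries
    for _ in range(attempts):
        sleep(wait_seconds)     # give Immich time to process the file
        faces = fetch_faces()
        if faces:
            return faces
    return []                   # still nothing: caller treats it as "no faces"


# Simulated lookup that only succeeds on the second attempt
attempts_seen = []

def fake_lookup() -> List[dict]:
    attempts_seen.append(1)
    return [{'person_name': 'john_doe'}] if len(attempts_seen) == 2 else []

result = wait_for_faces(fake_lookup, max_retries=2, wait_seconds=0,
                        sleep=lambda seconds: None)
print(result, len(attempts_seen))
```

Inside `smart_download`, the fixed `time.sleep` and the single lookup could then be replaced by a call such as `wait_for_faces(lambda: self.immich_db.get_faces_for_file(temp_path))`, keeping the rest of the workflow unchanged.
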
---

## ⚙️ Configuration

### Add to `config.json`:

```json
{
  "smart_download": {
    "enabled": true,

    "directories": {
      "temp_dir": "/mnt/storage/Downloads/temp_downloads",
      "final_base": "/mnt/storage/Downloads/faces",
      "review_base": "/mnt/storage/Downloads/review"
    },

    "whitelist": [
      "john_doe",
      "sarah_smith",
      "family_member_1"
    ],

    "blacklist": [
      "ex_partner",
      "stranger"
    ],

    "thresholds": {
      "min_confidence": 0.6,
      "max_faces_per_image": 1
    },

    "immich": {
      "wait_time_seconds": 5,
      "trigger_scan": true,
      "retry_if_no_faces": true,
      "max_retries": 2
    },

    "deduplication": {
      "check_hash": true,
      "action_on_duplicate": "move_to_review"
    },

    "review_categories": {
      "duplicates": true,
      "unidentified": true,
      "low_confidence": true,
      "multiple_faces": true,
      "unwanted_person": true
    }
  }
}
```

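Because the settings are nested under `directories`, `thresholds`, and `immich`, any code that reads them has to walk the same structure. A minimal sketch of a loader with defaults and two sanity checks follows; `load_smart_download_settings` is a hypothetical helper, and the validation rules are assumptions rather than existing behavior.

```python
import json


def load_smart_download_settings(config: dict) -> dict:
    """Flatten the nested smart_download section into one settings dict."""
    smart_cfg = config.get("smart_download", {})
    dirs = smart_cfg.get("directories", {})
    thresholds = smart_cfg.get("thresholds", {})
    immich = smart_cfg.get("immich", {})

    settings = {
        "enabled": bool(smart_cfg.get("enabled", False)),
        "temp_dir": dirs.get("temp_dir", "/mnt/storage/Downloads/temp_downloads"),
        "final_base": dirs.get("final_base", "/mnt/storage/Downloads/faces"),
        "review_base": dirs.get("review_base", "/mnt/storage/Downloads/review"),
        "whitelist": list(smart_cfg.get("whitelist", [])),
        "blacklist": list(smart_cfg.get("blacklist", [])),
        "min_confidence": float(thresholds.get("min_confidence", 0.6)),
        "immich_wait_time": immich.get("wait_time_seconds", 5),
        "max_retries": immich.get("max_retries", 2),
    }

    # A name on both lists is almost certainly a mistake: fail loudly.
    overlap = set(settings["whitelist"]) & set(settings["blacklist"])
    if overlap:
        raise ValueError(f"Names on both whitelist and blacklist: {sorted(overlap)}")
    if not 0.0 <= settings["min_confidence"] <= 1.0:
        raise ValueError("min_confidence must be between 0 and 1")
    return settings


sample = json.loads("""
{
  "smart_download": {
    "enabled": true,
    "directories": {"review_base": "/tmp/review"},
    "whitelist": ["john_doe"],
    "thresholds": {"min_confidence": 0.75}
  }
}
""")
settings = load_smart_download_settings(sample)
print(settings["review_base"], settings["min_confidence"], settings["temp_dir"])
```

Keys missing from the file fall back to the defaults shown in the JSON above, so a partial `config.json` still yields a complete settings dict.
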
---

## 🔄 Integration with Existing Download System

### Modify Download Completion Hook

```python
def on_download_complete(url: str, temp_path: str, source: str):
    """
    Called when download completes
    Now uses smart download workflow
    """

    if config.get('smart_download', {}).get('enabled', False):
        # Use smart download workflow
        smart = SmartDownloader(config, immich_db, unified_db)
        result = smart.smart_download(url, source)

        logger.info(f"Smart download result: {result}")

        # Send notification (error results carry no 'action' key,
        # so look it up with .get to avoid a KeyError)
        action = result.get('action')
        if action == 'sorted':
            send_notification(
                f"✓ Auto-sorted to {result['person']}",
                result['destination']
            )
        elif action == 'reviewed':
            send_notification(
                f"⚠ Needs review: {result['reason']}",
                result['destination']
            )

        return result
    else:
        # Fall back to old workflow
        return legacy_download_handler(url, temp_path, source)
```

---

## 📊 Database Schema Addition

```sql
-- Add person_name and auto_sorted columns to downloads table
ALTER TABLE downloads ADD COLUMN person_name TEXT;
ALTER TABLE downloads ADD COLUMN auto_sorted INTEGER DEFAULT 0;

-- Create indexes for quick person and auto-sort lookups
CREATE INDEX idx_downloads_person ON downloads(person_name);
CREATE INDEX idx_downloads_auto_sorted ON downloads(auto_sorted);

-- Create review queue table
CREATE TABLE review_queue (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    file_path TEXT NOT NULL,
    category TEXT NOT NULL,          -- duplicates, unidentified, etc.
    file_hash TEXT,
    reason TEXT,
    faces_detected INTEGER DEFAULT 0,
    suggested_person TEXT,
    created_at TEXT,
    reviewed_at TEXT,
    reviewed_by TEXT,
    action TEXT DEFAULT 'pending'    -- approved, rejected, pending
);

CREATE INDEX idx_review_category ON review_queue(category);
CREATE INDEX idx_review_action ON review_queue(action);
```

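The statistics queries later in this document count `review_queue` rows with `action = 'pending'`, so something has to insert a row whenever a file is moved to review. The class above does not do this. A minimal sketch of how it might be done with `sqlite3` follows; `enqueue_review_item` and `resolve_review_item` are hypothetical helpers, and the example runs against an in-memory database rather than the real one.

```python
import sqlite3
from datetime import datetime
from typing import Optional

REVIEW_QUEUE_DDL = """
CREATE TABLE IF NOT EXISTS review_queue (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    file_path TEXT NOT NULL,
    category TEXT NOT NULL,
    file_hash TEXT,
    reason TEXT,
    faces_detected INTEGER DEFAULT 0,
    suggested_person TEXT,
    created_at TEXT,
    reviewed_at TEXT,
    reviewed_by TEXT,
    action TEXT
)
"""


def enqueue_review_item(conn: sqlite3.Connection, file_path: str, category: str,
                        reason: str, file_hash: Optional[str] = None,
                        faces_detected: int = 0,
                        suggested_person: Optional[str] = None) -> int:
    """Insert a pending review row and return its id."""
    cursor = conn.execute(
        """
        INSERT INTO review_queue
            (file_path, category, file_hash, reason,
             faces_detected, suggested_person, created_at, action)
        VALUES (?, ?, ?, ?, ?, ?, ?, 'pending')
        """,
        (file_path, category, file_hash, reason,
         faces_detected, suggested_person, datetime.now().isoformat()),
    )
    conn.commit()
    return cursor.lastrowid


def resolve_review_item(conn: sqlite3.Connection, item_id: int,
                        action: str, reviewed_by: str) -> None:
    """Mark a review row as approved or rejected."""
    if action not in ("approved", "rejected"):
        raise ValueError(f"Unsupported action: {action}")
    conn.execute(
        "UPDATE review_queue SET action = ?, reviewed_at = ?, reviewed_by = ? WHERE id = ?",
        (action, datetime.now().isoformat(), reviewed_by, item_id),
    )
    conn.commit()


def count_pending(conn: sqlite3.Connection) -> int:
    return conn.execute(
        "SELECT COUNT(*) FROM review_queue WHERE action = 'pending'"
    ).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(REVIEW_QUEUE_DDL)
item_id = enqueue_review_item(conn, "/review/unidentified/noface_example.jpg",
                              "unidentified", "no_faces_detected")
print(count_pending(conn))
resolve_review_item(conn, item_id, "approved", reviewed_by="admin")
print(count_pending(conn))
```

`enqueue_review_item` would naturally be called from `_move_to_review` and `_handle_duplicate`, and `resolve_review_item` from the approve and reject paths, so the table and the review directories stay in step.
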
---

## 🎨 Web UI - Review Queue Page

### Review Queue Interface

```
┌─────────────────────────────────────────────────────────────────┐
│  Review Queue (42 items)                                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  Filter: [All ▼] [Duplicates: 5] [Unidentified: 28]             │
│          [Low Confidence: 6] [Multiple Faces: 3]                │
│                                                                 │
│   ┌─────────────────────────────────────────────────────────┐   │
│   │ [Image Thumbnail]                                       │   │
│   │                                                         │   │
│   │ Category: Unidentified                                  │   │
│   │ Reason: No faces detected by Immich                     │   │
│   │ File: instagram_profile_20250131_120000.jpg             │   │
│   │ Size: 2.4 MB                                            │   │
│   │ Downloaded: 2025-01-31 12:00:00                         │   │
│   │                                                         │   │
│   │ This is: [Select Person ▼] or [New Person...]           │   │
│   │                                                         │   │
│   │ [✓ Approve & Sort] [✗ Delete] [→ Skip]                  │   │
│   └─────────────────────────────────────────────────────────┘   │
│                                                                 │
│  [◄ Previous] 1 of 42 [Next ►]                                  │
│                                                                 │
│  Bulk Actions: [Select All] [Delete Selected] [Export List]     │
└─────────────────────────────────────────────────────────────────┘
```

---

## 📡 API Endpoints (New)

```
# Review Queue
GET  /api/smart-download/review/queue             # Get all review items
GET  /api/smart-download/review/queue/{category}  # By category
POST /api/smart-download/review/{id}/approve      # Approve and move to person
POST /api/smart-download/review/{id}/reject       # Delete item
GET  /api/smart-download/review/stats             # Queue statistics

# Smart Download Control
GET  /api/smart-download/status
POST /api/smart-download/enable
POST /api/smart-download/disable

# Configuration
GET  /api/smart-download/config
PUT  /api/smart-download/config/whitelist
PUT  /api/smart-download/config/blacklist

# Statistics
GET  /api/smart-download/stats/today
GET  /api/smart-download/stats/summary
```

---

## 📈 Statistics & Reporting

```python
import sqlite3


def get_smart_download_stats(db_path: str, days: int = 30) -> dict:
    """Get smart download statistics"""

    with sqlite3.connect(db_path) as conn:
        # Auto-sorted count
        auto_sorted = conn.execute("""
            SELECT COUNT(*)
            FROM downloads
            WHERE auto_sorted = 1
            AND download_date >= datetime('now', ? || ' days')
        """, (f'-{days}',)).fetchone()[0]

        # Review queue count
        in_review = conn.execute("""
            SELECT COUNT(*)
            FROM review_queue
            WHERE action = 'pending'
        """).fetchone()[0]

        # By person
        by_person = conn.execute("""
            SELECT person_name, COUNT(*)
            FROM downloads
            WHERE auto_sorted = 1
            AND download_date >= datetime('now', ? || ' days')
            GROUP BY person_name
        """, (f'-{days}',)).fetchall()

        # By review category
        by_category = conn.execute("""
            SELECT category, COUNT(*)
            FROM review_queue
            WHERE action = 'pending'
            GROUP BY category
        """).fetchall()

    # Share of processed images that needed no manual review
    total = auto_sorted + in_review
    success_rate = round(auto_sorted / total * 100, 1) if total > 0 else 0

    return {
        'auto_sorted': auto_sorted,
        'in_review': in_review,
        'by_person': dict(by_person),
        'by_category': dict(by_category),
        'success_rate': success_rate
    }

# Example output:
# {
#     'auto_sorted': 145,
#     'in_review': 23,
#     'by_person': {'john_doe': 85, 'sarah_smith': 60},
#     'by_category': {'unidentified': 15, 'duplicates': 5, 'multiple_faces': 3},
#     'success_rate': 86.3
# }
```

---

## 🎯 Example Usage

### Example 1: Download Instagram Profile

```python
# Download profile with smart workflow
downloader = SmartDownloader(config, immich_db, unified_db)

images = get_instagram_profile_images('username')

results = {
    'sorted': 0,
    'reviewed': 0,
    'errors': 0
}

for image_url in images:
    result = downloader.smart_download(image_url, source='instagram')

    # Error results carry no 'action' key, so use .get
    action = result.get('action')
    if action == 'sorted':
        results['sorted'] += 1
        print(f"✓ {result['person']}: {result['destination']}")
    elif action == 'reviewed':
        results['reviewed'] += 1
        print(f"⚠ Review needed ({result['reason']}): {result['destination']}")
    else:
        results['errors'] += 1

print(f"\nResults: {results['sorted']} sorted, {results['reviewed']} need review")

# Output:
# ✓ john_doe: /faces/john_doe/image1.jpg
# ✓ john_doe: /faces/john_doe/image2.jpg
# ⚠ Review needed (not_in_whitelist): /review/unidentified/image3.jpg
# ⚠ Review needed (duplicate): /review/duplicates/image4.jpg
# ✓ john_doe: /faces/john_doe/image5.jpg
#
# Results: 3 sorted, 2 need review
```

### Example 2: Process Review Queue

```python
# Get pending reviews
queue = downloader.get_review_queue()

print(f"Review queue: {len(queue)} items")

for item in queue:
    print(f"\nFile: {item['filename']}")
    print(f"Category: {item['category']}")
    print(f"Path: {item['path']}")

    # Manual decision
    action = input("Action (approve/reject/skip): ")

    if action == 'approve':
        person = input("Person name: ")
        result = downloader.approve_review_item(item['path'], person)
        print(f"✓ Approved and sorted to {person}")

    elif action == 'reject':
        downloader.reject_review_item(item['path'])
        print("✗ Deleted")

    else:
        print("→ Skipped")
```

---

## ✅ Advantages of This System

### 1. **Fully Automated for Good Cases**
- Matching face + not duplicate = auto-sorted
- No manual intervention needed for most images (the target is more than 80% auto-sorted)

### 2. **Safe Review for Edge Cases**
- Duplicates flagged for review
- Unknown faces queued for identification
- Multiple faces queued for decision

### 3. **Leverages Existing Systems**
- Uses your SHA256 deduplication
- Uses Immich's face recognition
- Clean integration

### 4. **Nothing Lost**
- Every image goes somewhere
- Easy to find and review
- Can always approve later

### 5. **Flexible Configuration**
- Whitelist/blacklist
- Confidence thresholds
- Review categories

### 6. **Clear Audit Trail**
- Database tracks everything
- Statistics available
- Can generate reports

---

## 🚀 Implementation Timeline

### Week 1: Core Workflow
- [ ] Create SmartDownloader class
- [ ] Implement download to temp
- [ ] Add hash checking
- [ ] Basic face checking
- [ ] Move to final/review logic

### Week 2: Immich Integration
- [ ] Connect to Immich DB
- [ ] Query face data
- [ ] Trigger Immich scans
- [ ] Handle face results

### Week 3: Review System
- [ ] Create review directories
- [ ] Review queue database
- [ ] Get/approve/reject methods
- [ ] Statistics

### Week 4: Web UI
- [ ] Review queue page
- [ ] Approve/reject interface
- [ ] Statistics dashboard
- [ ] Configuration page

### Week 5: Polish
- [ ] Error handling
- [ ] Notifications
- [ ] Documentation
- [ ] Testing

---

## 🎯 Success Metrics

After implementation, track:

- **Auto-sort rate**: % of images auto-sorted vs reviewed
  - **Target**: >80% auto-sorted
- **Duplicate catch rate**: % of byte-identical duplicates caught (SHA256 does not match re-encoded or resized copies)
  - **Target**: 100%
- **False positive rate**: % of incorrectly sorted images
  - **Target**: <5%
- **Review queue size**: Average pending items
  - **Target**: <50 items

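As a rough illustration, three of these targets can be checked from simple counts. The function and its inputs below are hypothetical: the auto-sorted and reviewed figures reuse the example statistics output from this document, while the number of misfiled images is invented for the example and would in practice come from manual spot checks. The duplicate catch rate is left out because it needs a ground-truth count of duplicates.

```python
def percentage(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place; 0.0 when there is no data."""
    return round(part / whole * 100, 1) if whole else 0.0


def evaluate_metrics(auto_sorted: int, reviewed: int,
                     misfiled: int, pending_queue: int) -> dict:
    """Compare observed counts with the targets listed above."""
    auto_sort_rate = percentage(auto_sorted, auto_sorted + reviewed)
    false_positive_rate = percentage(misfiled, auto_sorted)
    return {
        'auto_sort_rate': auto_sort_rate,
        'auto_sort_ok': auto_sort_rate > 80,            # target: >80%
        'false_positive_rate': false_positive_rate,
        'false_positive_ok': false_positive_rate < 5,   # target: <5%
        'queue_ok': pending_queue < 50,                 # target: <50 items
    }


report = evaluate_metrics(auto_sorted=145, reviewed=23,
                          misfiled=4, pending_queue=23)
print(report)
```

With these inputs the auto-sort rate is 145 / (145 + 23), which rounds to 86.3%, matching the `success_rate` in the example statistics output.
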
---

## ✅ Your Perfect Workflow - Summary

```
Download → Hash Check → Face Check → Decision
               ↓            ↓
           Duplicate?    Matches?
               ↓            ↓
           ┌───┴───┐    ┌───┴────┐
          YES      NO  YES       NO
           ↓       ↓    ↓        ↓
        REVIEW Continue FINAL  REVIEW
```

**Final Destinations**:
- ✅ `/faces/john_doe/` - Verified, auto-sorted
- ⚠️ `/review/duplicates/` - Needs duplicate review
- ⚠️ `/review/unidentified/` - Needs face identification
- ⚠️ `/review/low_confidence/` - Low match confidence
- ⚠️ `/review/multiple_faces/` - Multiple people
- ⚠️ `/review/unwanted_person/` - Blacklisted person detected

**This is exactly what you wanted!**

---

**Last Updated**: 2025-10-31