Instagram Repost Detection - Testing & Deployment Guide
Status: ✅ Implementation Complete - Ready for Testing
Default State: 🔒 DISABLED (feature flag off)
Implementation Summary
All code has been safely integrated with backward-compatible changes:
✅ ImgInn Module Updated - Added optional skip_database and max_age_hours parameters (default behavior unchanged)
✅ Move Module Updated - Added repost detection hooks with feature flag check (disabled by default)
✅ Database Settings Added - Settings entry created with enabled: false
✅ Frontend UI Added - Configuration page includes repost detection settings panel
✅ Module Tested - Core detection logic validated with real example file
Safety Guarantees
Backward Compatibility
- All new parameters have defaults that preserve existing behavior
- Feature is completely disabled by default
- No changes to existing workflows when disabled
- Can be toggled on/off without code changes
Error Handling
- If repost detection fails, original file processing continues normally
- Missing dependencies don't break downloads
- Failed OCR/hashing doesn't stop the move operation
Database Safety
- New tables created only when feature is used
- Existing tables remain untouched
- Can be disabled instantly via SQL or UI
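The error-handling guarantees above amount to wrapping the detection step so that any failure falls back to the original file. The following is a minimal sketch of that pattern, not the actual hook in move_module.py: `resolve_file` and the `detect` callable are hypothetical names introduced here for illustration.

```python
import logging
from pathlib import Path
from typing import Callable, Optional

log = logging.getLogger("repost_sketch")


def resolve_file(
    source: Path,
    detect: Callable[[Path], Optional[Path]],
    enabled: bool = False,
) -> Path:
    """Return the file that should be moved: the original if one was found, else the source."""
    if not enabled:
        # Feature flag off: behave exactly as before.
        return source
    try:
        replacement = detect(source)
    except Exception as exc:
        # Missing dependencies, OCR errors, or hashing errors must not stop the move.
        log.warning("Repost detection failed for %s: %s", source, exc)
        return source
    # No matching original: keep the repost.
    return replacement if replacement is not None else source
```

Catching a broad `Exception` is deliberate in this sketch, since the stated goal is that no detection failure ever blocks a download.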
Testing Plan
Phase 1: Verify Feature is Disabled (Recommended First Step)
Purpose: Confirm existing functionality is unchanged
# 1. Check database setting
sqlite3 /opt/media-downloader/data/backup_cache.db \
"SELECT key, json_extract(value, '$.enabled') FROM settings WHERE key = 'repost_detection';"
# Expected output:
# repost_detection|0 (0 = disabled)
# 2. Download some Instagram stories (any module)
# - Stories should download normally
# - No repost detection messages in logs
# - No temp files in /tmp/repost_detection/
# 3. Check frontend
# - Open Configuration page
# - Find "Instagram Repost Detection" section
# - Verify toggle is OFF by default
Expected Result: Everything works exactly as before
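The database check in step 1 can also be scripted, which is handy in an automated smoke test. This is a sketch under two assumptions taken from the query above: the `settings` table has `key` and `value` columns, and `value` holds a JSON object. The function name is hypothetical.

```python
import json
import sqlite3


def repost_detection_enabled(conn: sqlite3.Connection) -> bool:
    """Return True only if the repost_detection row exists and its 'enabled' field is truthy."""
    row = conn.execute(
        "SELECT value FROM settings WHERE key = 'repost_detection'"
    ).fetchone()
    if row is None:
        # No settings row at all is treated as disabled.
        return False
    return bool(json.loads(row[0]).get("enabled", False))
```

Usage would look like `repost_detection_enabled(sqlite3.connect("/opt/media-downloader/data/backup_cache.db"))`, which should return False during Phase 1.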
Phase 2: Enable and Test Detection
Step 2.1: Enable via Frontend (Recommended)
- Open Configuration page: http://localhost:8000/configuration
- Scroll to "Instagram Repost Detection" section
- Toggle "Enabled" to ON
- Adjust settings if desired:
  - Hash Distance Threshold: 10 (default)
  - Fetch Cache Duration: 12 hours (default)
  - Max Posts Age: 24 hours (default)
  - Cleanup Temp Files: ON (recommended)
- Click "Save Configuration"
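To build intuition for the Hash Distance Threshold, here is a small sketch. It assumes the threshold is the maximum Hamming distance (number of differing bits) between two perceptual hashes at which they still count as the same content. This guide does not spell that out, so treat the comparison rule, including the inclusive `<=`, as an assumption. Plain integers stand in for 64-bit hashes.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(hash_a ^ hash_b).count("1")


def is_match(hash_a: int, hash_b: int, threshold: int = 10) -> bool:
    """Treat two hashes as the same content if they differ in at most `threshold` bits."""
    return hamming_distance(hash_a, hash_b) <= threshold
```

Under this reading, raising the threshold accepts more distant hashes (more matches, more false positives), and lowering it does the opposite.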
Step 2.2: Enable via SQL (Alternative)
sqlite3 /opt/media-downloader/data/backup_cache.db << 'EOF'
UPDATE settings
SET value = json_set(value, '$.enabled', json('true'))
WHERE key = 'repost_detection';
SELECT 'Feature enabled. Current settings:';
SELECT value FROM settings WHERE key = 'repost_detection';
EOF
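The same toggle can be done from Python, which keeps `enabled` a real JSON boolean and leaves the other settings untouched. This is a sketch assuming the `settings` table layout used in the SQL above (`key`, `value` as JSON text). `set_repost_detection` is a hypothetical helper, not part of the codebase.

```python
import json
import sqlite3


def set_repost_detection(conn: sqlite3.Connection, enabled: bool) -> dict:
    """Flip the 'enabled' flag in the repost_detection settings row and return the new settings."""
    row = conn.execute(
        "SELECT value FROM settings WHERE key = 'repost_detection'"
    ).fetchone()
    if row is None:
        raise LookupError("repost_detection settings row not found")
    settings = json.loads(row[0])
    settings["enabled"] = enabled
    conn.execute(
        "UPDATE settings SET value = ? WHERE key = 'repost_detection'",
        (json.dumps(settings),),
    )
    conn.commit()
    return settings
```

Calling it with `enabled=False` gives the same effect as the SQL rollback.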
Step 2.3: Test with Known Repost
Use the example file from testing:
/media/d$/OneDrive - LIComputerGuy/Celebrities/Eva Longoria/4. Media/social media/instagram/stories/evalongoria_20251109_154548_story6.mp4
This is a repost of @globalgiftfoundation content.
# Manual test with the detection script
python3 /opt/media-downloader/tests/test_repost_detection_manual.py \
"/media/.../evalongoria_20251109_154548_story6.mp4" \
"evalongoria" \
--live
# Expected output:
# ✅ OCR extraction: @globalgiftfoundation
# ℹ️ @globalgiftfoundation NOT monitored (using temp queue)
# ⏬ Downloading stories and posts via ImgInn
# ✓ Found matching original
# ✓ Replaced repost with original
Phase 3: Monitor Live Downloads
Step 3.1: Enable Logging
Watch logs for repost detection activity:
# Terminal 1: Backend logs
sudo journalctl -u media-downloader-api -f | grep -i repost
# Terminal 2: Download logs
tail -f /opt/media-downloader/logs/downloads.log | grep -i repost
# Look for messages like:
# [RepostDetector] [INFO] Detected repost from @username
# [RepostDetector] [SUCCESS] ✓ Found original
# [MoveManager] [SUCCESS] ✓ Replaced repost with original from @username
Step 3.2: Check Database Tracking
# View repost replacements
sqlite3 /opt/media-downloader/data/backup_cache.db << 'EOF'
SELECT
repost_source,
original_username,
repost_filename,
detected_at
FROM repost_replacements
ORDER BY detected_at DESC
LIMIT 10;
EOF
# View fetch cache (avoid re-downloading)
sqlite3 /opt/media-downloader/data/backup_cache.db << 'EOF'
SELECT
username,
last_fetched,
content_count
FROM repost_fetch_cache
ORDER BY last_fetched DESC;
EOF
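The fetch cache exists to avoid re-downloading an account's content too often. A plausible freshness rule is sketched below, assuming that an account is re-fetched once `last_fetched` is at least `fetch_cache_hours` old. The function name and the exact boundary behavior are assumptions, not taken from the detector module.

```python
from datetime import datetime, timedelta
from typing import Optional


def needs_refetch(
    last_fetched: Optional[datetime],
    now: datetime,
    fetch_cache_hours: int = 12,
) -> bool:
    """Return True if the account was never fetched or its cache entry has expired."""
    if last_fetched is None:
        return True
    return now - last_fetched >= timedelta(hours=fetch_cache_hours)
```

With this rule, increasing `fetch_cache_hours` directly reduces the number of ImgInn requests for accounts that are reposted repeatedly.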
Step 3.3: Monitor Disk Usage
# Check temp directory (should be empty or small if cleanup enabled)
du -sh /tmp/repost_detection/
# Check for successful cleanups in logs
grep "Cleaned up.*temporary files" /opt/media-downloader/logs/*.log
Phase 4: Performance Testing
Test Scenario 1: Monitored Account Repost
Source: evalongoria (monitored)
Reposts: @originaluser (also monitored)
Expected: Downloads to normal path, no cleanup
Test Scenario 2: Non-Monitored Account Repost
Source: evalongoria (monitored)
Reposts: @randomuser (NOT monitored)
Expected: Downloads to /tmp, cleanup after matching
Test Scenario 3: No @username Detected
Source: evalongoria (monitored)
Story: Regular story (not a repost)
Expected: Skip detection, process normally
Test Scenario 4: No Matching Original Found
Source: evalongoria (monitored)
Reposts: @oldaccount (deleted or no stories/posts)
Expected: Keep repost, log warning, continue
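Scenarios 1 to 3 boil down to one decision: whether a username was detected, and whether that account is already monitored. The sketch below encodes the expected outcomes so they can be checked in a test. `Plan` and `plan_for` are hypothetical names, and the real logic in the detector may be structured differently. Scenario 4 (no match found) is decided later, after the download, and is not modeled here.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Plan:
    run_detection: bool     # look for the original at all?
    download_to_temp: bool  # fetch into /tmp instead of the normal path?
    cleanup_after: bool     # delete fetched files after matching?


def plan_for(
    detected_username: Optional[str],
    monitored: Set[str],
    cleanup_temp_files: bool = True,
) -> Plan:
    """Map a detected @username to the expected handling from the test scenarios."""
    if detected_username is None:
        # Scenario 3: regular story, skip detection.
        return Plan(False, False, False)
    is_monitored = detected_username.lstrip("@").lower() in monitored
    if is_monitored:
        # Scenario 1: normal download path, nothing to clean up.
        return Plan(True, False, False)
    # Scenario 2: temp download, cleaned up only if the setting allows it.
    return Plan(True, True, cleanup_temp_files)
```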
Rollback Procedures
Option 1: Disable via Frontend (Instant)
- Open Configuration page
- Toggle "Instagram Repost Detection" to OFF
- Save
Option 2: Disable via SQL (Instant)
sqlite3 /opt/media-downloader/data/backup_cache.db \
"UPDATE settings SET value = json_set(value, '$.enabled', json('false')) WHERE key = 'repost_detection';"
Option 3: Comment Out Hook (Permanent Disable)
Edit /opt/media-downloader/modules/move_module.py around line 454:
# Disable repost detection permanently:
# if self._is_instagram_story(source) and self.batch_context:
# ...
Troubleshooting
Issue: "Missing dependencies" warning
Solution:
pip3 install --break-system-packages pytesseract opencv-python imagehash
sudo apt-get install tesseract-ocr tesseract-ocr-eng
Issue: OCR not detecting usernames
Possible causes:
- Username has special characters
- Low image quality
- Unusual font/styling
Solution: Adjust ocr_confidence_threshold in settings (lower = more permissive)
Issue: No matching original found
Possible causes:
- Original content deleted or made private
- Post older than the max_posts_age_hours setting
- Hash distance too strict
Solution:
- Increase max_posts_age_hours (check older posts)
- Increase hash_distance_threshold (looser matching)
Issue: Temp files not being cleaned up
Check:
ls -lah /tmp/repost_detection/
Solution: Verify cleanup_temp_files is enabled in settings
Issue: Too many API requests to ImgInn
Solution:
- Increase fetch_cache_hours (cache longer)
- Reduce max_posts_age_hours (check fewer posts)
Monitoring & Metrics
Key Metrics to Track
-- Replacement totals (replaced reposts, affected sources, distinct original accounts)
SELECT
COUNT(*) as total_replacements,
COUNT(DISTINCT repost_source) as affected_sources,
COUNT(DISTINCT original_username) as original_accounts
FROM repost_replacements;
-- Most frequently detected original accounts
SELECT
original_username,
COUNT(*) as repost_count
FROM repost_replacements
GROUP BY original_username
ORDER BY repost_count DESC
LIMIT 10;
-- Recent activity
SELECT
DATE(detected_at) as date,
COUNT(*) as replacements
FROM repost_replacements
GROUP BY DATE(detected_at)
ORDER BY date DESC
LIMIT 7;
Performance Metrics
- Average processing time: 5-10 seconds per repost
- Disk usage (temp): ~50-200MB per non-monitored account (cleaned after use)
- Cache hit rate: Monitor the repost_fetch_cache table for efficiency
Best Practices
Recommended Settings
Conservative (Low Resource Usage):
{
"enabled": true,
"hash_distance_threshold": 8,
"fetch_cache_hours": 24,
"max_posts_age_hours": 12,
"cleanup_temp_files": true
}
Aggressive (Best Quality):
{
"enabled": true,
"hash_distance_threshold": 12,
"fetch_cache_hours": 6,
"max_posts_age_hours": 48,
"cleanup_temp_files": true
}
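When applying either preset, it helps to merge it over the defaults and sanity-check the numeric fields before saving. This is a sketch only: the default values are the ones listed in Step 2.1, `load_settings` is a hypothetical helper, and the validation rules (non-negative integers) are an assumption about what the application accepts. Unknown keys such as ocr_confidence_threshold are passed through unchanged.

```python
import json

# Defaults as listed in Step 2.1; "enabled" is off by default.
DEFAULTS = {
    "enabled": False,
    "hash_distance_threshold": 10,
    "fetch_cache_hours": 12,
    "max_posts_age_hours": 24,
    "cleanup_temp_files": True,
}

NUMERIC_KEYS = ("hash_distance_threshold", "fetch_cache_hours", "max_posts_age_hours")


def load_settings(raw_json: str) -> dict:
    """Merge a JSON settings object over the defaults and validate the numeric fields."""
    settings = {**DEFAULTS, **json.loads(raw_json)}
    for key in NUMERIC_KEYS:
        value = settings[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{key} must be a non-negative integer, got {value!r}")
    return settings
```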
When to Use
✅ Good for:
- Accounts that frequently repost other users' stories
- High-profile accounts with quality concerns
- Archival purposes (want original high-res content)
❌ Not needed for:
- Accounts that rarely repost
- Already monitored original accounts
- Low-storage situations
Gradual Rollout Strategy
Week 1: Silent Monitoring
- Enable feature
- Monitor logs for detection rate
- Don't interfere with workflow
- Identify common patterns
Week 2: Selective Enable
- Enable for 2-3 high-repost accounts
- Verify replacements are correct
- Check false positive rate
- Monitor performance impact
Week 3: Broader Enable
- Enable for all Instagram story downloaders
- Monitor database growth
- Check temp file cleanup
- Validate quality improvements
Week 4+: Full Production
- Feature stable and validated
- Document edge cases found
- Tune settings based on results
- Consider expanding to other platforms
Support & Documentation
Documentation:
- Design spec: /opt/media-downloader/docs/instagram_repost_detection_design.md
- Test results: /opt/media-downloader/docs/repost_detection_test_results.md
- This guide: /opt/media-downloader/docs/repost_detection_testing_guide.md
Test Scripts:
- Unit tests: /opt/media-downloader/tests/test_instagram_repost_detector.py
- Manual tests: /opt/media-downloader/tests/test_repost_detection_manual.py
Module Files:
- Detector: /opt/media-downloader/modules/instagram_repost_detector.py
- ImgInn: /opt/media-downloader/modules/imginn_module.py
- Move: /opt/media-downloader/modules/move_module.py
Success Criteria
✅ Feature is ready for production when:
- Disabled state doesn't affect existing functionality
- Enabled state successfully detects and replaces reposts
- No errors in logs during normal operation
- Temp files are cleaned up properly
- Database tracking works correctly
- Performance impact is acceptable
- False positive rate is low (<5%)
- Quality of replacements is consistently better
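The false positive criterion can be checked with simple arithmetic once a sample of replacements has been reviewed by hand. The sketch below assumes the counts come from such a manual review; nothing in the repost_replacements table is known to record correctness, and both function names are hypothetical.

```python
def false_positive_rate(reviewed: int, incorrect: int) -> float:
    """Fraction of manually reviewed replacements that turned out to be wrong."""
    if reviewed <= 0:
        raise ValueError("reviewed must be positive")
    if not 0 <= incorrect <= reviewed:
        raise ValueError("incorrect must be between 0 and reviewed")
    return incorrect / reviewed


def meets_criterion(reviewed: int, incorrect: int, limit: float = 0.05) -> bool:
    """True if the observed rate is strictly below the limit (5% by default)."""
    return false_positive_rate(reviewed, incorrect) < limit
```

For example, 3 incorrect replacements out of 80 reviewed is 3.75%, which passes; 5 out of 80 is 6.25%, which does not.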
Ready to test! Start with Phase 1 to verify everything is safe, then gradually enable and test.