media-downloader/docs/archive/WEB_GUI_LIVE_SCREENSHOTS.md
Todd 0d7b2b1aab Initial commit
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-29 22:42:55 -04:00


Live Screenshot Streaming Feature

Overview

Stream live browser screenshots from the Playwright scrapers to the web UI in real time, so users can watch scraping progress as it happens.


Technical Implementation

1. Backend - Screenshot Capture

Modify Download Workers:

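# backend/workers/download_worker.py
import base64
import logging
from datetime import datetime

from backend.core.websocket_manager import broadcast_screenshot

logger = logging.getLogger(__name__)

@celery_app.task(bind=True)
def download_instagram_posts(self, queue_item_id: int, config: dict):
    """Background task with live screenshot streaming"""

    # Screenshot callback. It is async because the downloader uses
    # Playwright's async API, where page.screenshot() must be awaited.
    # Note: Celery workers run in a separate process from the FastAPI app,
    # so this assumes broadcast_screenshot can reach the API's
    # ConnectionManager (e.g. through a relay); called directly inside the
    # worker process it finds no subscribers.
    async def screenshot_callback(page, action: str):
        """Called at key points during scraping"""
        try:
            # Capture the visible viewport as a compressed JPEG
            screenshot_bytes = await page.screenshot(type='jpeg', quality=60)

            # Encode to base64 so the image can travel inside a JSON message
            screenshot_b64 = base64.b64encode(screenshot_bytes).decode('utf-8')

            # Broadcast via WebSocket to clients subscribed to this queue item
            await broadcast_screenshot({
                'type': 'scraper_screenshot',
                'queue_id': queue_item_id,
                'platform': 'instagram',
                'action': action,
                'screenshot': screenshot_b64,
                'timestamp': datetime.now().isoformat()
            })
        except Exception as e:
            # A failed screenshot must never abort the download
            logger.debug(f"Screenshot capture error: {e}")

    # Initialize downloader with screenshot callback
    downloader = FastDLDownloader(
        unified_db=get_unified_db(),
        log_callback=log_callback,
        screenshot_callback=screenshot_callback  # New parameter
    )

    # Rest of download logic...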
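Pulling the broadcast payload into a small helper makes its shape testable without a browser. The sketch below assumes the field names used in the callback above. `build_screenshot_message` is a hypothetical helper, not part of the existing code.

```python
import base64
from datetime import datetime


def build_screenshot_message(queue_id: int, platform: str, action: str,
                             jpeg_bytes: bytes) -> dict:
    """Build the 'scraper_screenshot' WebSocket message (hypothetical helper).

    The JPEG bytes are base64-encoded so they can travel inside JSON.
    """
    return {
        'type': 'scraper_screenshot',
        'queue_id': queue_id,
        'platform': platform,
        'action': action,
        'screenshot': base64.b64encode(jpeg_bytes).decode('ascii'),
        'timestamp': datetime.now().isoformat(),
    }


# JPEG files start with the bytes FF D8 FF, which base64-encode to "/9j/"
msg = build_screenshot_message(42, 'instagram', 'Waiting for results', b'\xff\xd8\xff')
print(msg['screenshot'])  # /9j/
```

The frontend can then render the field directly as `data:image/jpeg;base64,<screenshot>`.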

Update Downloader Modules:

# modules/fastdl_module.py
import logging

from playwright.async_api import async_playwright

logger = logging.getLogger(__name__)

class FastDLDownloader:
    def __init__(self, ..., screenshot_callback=None):
        self.screenshot_callback = screenshot_callback

    async def _run_download(self):
        """Download with screenshot streaming.

        Uses Playwright's async API: the screenshot callback is awaited,
        which cannot be mixed with sync_playwright()."""
        async with async_playwright() as p:
            browser = await p.firefox.launch(headless=self.headless)
            page = await browser.new_page()

            # Take screenshot at key points
            await self._capture_screenshot(page, "Navigating to Instagram")

            await page.goto("https://fastdl.app/en/instagram-download")

            await self._capture_screenshot(page, "Filling username field")

            # input_box: locator for the username field (setup elided)
            await input_box.fill(self.username)

            await self._capture_screenshot(page, "Waiting for results")

            # During scroll and download (download_cards collected earlier, elided)
            for i, card in enumerate(download_cards):
                if i % 3 == 0:  # Screenshot every 3rd item
                    await self._capture_screenshot(
                        page,
                        f"Downloading item {i+1}/{len(download_cards)}"
                    )

                # Download logic...

            await browser.close()

    async def _capture_screenshot(self, page, action: str):
        """Capture and stream screenshot"""
        if self.screenshot_callback:
            try:
                await self.screenshot_callback(page, action)
            except Exception as e:
                logger.debug(f"Screenshot callback error: {e}")
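You can check the error isolation in `_capture_screenshot` without Playwright. This sketch assumes only that the page object has an async `screenshot()` method. `FakePage`, `FlakyPage` and `ScreenshotCapturer` are hypothetical stand-ins that mirror the method above.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


class FakePage:
    """Stand-in for a Playwright page: async screenshot() returning bytes."""
    async def screenshot(self, **kwargs) -> bytes:
        return b'jpeg-bytes'


class FlakyPage(FakePage):
    """A page whose screenshot always fails (e.g. the page was closed)."""
    async def screenshot(self, **kwargs) -> bytes:
        raise RuntimeError("page closed")


class ScreenshotCapturer:
    """Mirrors FastDLDownloader._capture_screenshot."""
    def __init__(self, screenshot_callback=None):
        self.screenshot_callback = screenshot_callback

    async def _capture_screenshot(self, page, action: str) -> None:
        if self.screenshot_callback:
            try:
                await self.screenshot_callback(page, action)
            except Exception as e:
                # A failed screenshot is logged and ignored; scraping continues
                logger.debug(f"Screenshot callback error: {e}")


async def main() -> list:
    received = []

    async def callback(page, action):
        received.append((action, await page.screenshot(type='jpeg', quality=60)))

    capturer = ScreenshotCapturer(screenshot_callback=callback)
    await capturer._capture_screenshot(FakePage(), "Navigating")
    await capturer._capture_screenshot(FlakyPage(), "Fails quietly")  # no exception
    await ScreenshotCapturer()._capture_screenshot(FakePage(), "No callback")  # no-op
    return received


received = asyncio.run(main())
print(received)  # [('Navigating', b'jpeg-bytes')]
```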

2. WebSocket Manager Enhancement

Add Screenshot Broadcasting:

# backend/core/websocket_manager.py
from typing import Dict, List

from fastapi import WebSocket

class ConnectionManager:
    def __init__(self):
        self.active_connections: List[WebSocket] = []
        self.screenshot_subscribers: Dict[int, List[WebSocket]] = {}

    # ... existing connect() / disconnect() methods ...

    async def subscribe_screenshots(self, websocket: WebSocket, queue_id: int):
        """Subscribe to screenshots for specific queue item"""
        self.screenshot_subscribers.setdefault(queue_id, []).append(websocket)

    async def unsubscribe_screenshots(self, websocket: WebSocket, queue_id: int):
        """Unsubscribe from screenshots"""
        subscribers = self.screenshot_subscribers.get(queue_id, [])
        if websocket in subscribers:
            subscribers.remove(websocket)
        if not subscribers:
            # Drop empty entries so the dict does not grow without bound
            self.screenshot_subscribers.pop(queue_id, None)

    async def broadcast_screenshot(self, message: dict):
        """Broadcast screenshot to subscribed clients only"""
        queue_id = message.get('queue_id')
        # Iterate over a copy: the list may change while send_json() is awaited
        subscribers = list(self.screenshot_subscribers.get(queue_id, []))
        disconnected = []
        for connection in subscribers:
            try:
                await connection.send_json(message)
            except Exception:
                disconnected.append(connection)

        # Clean up connections that failed to receive
        for conn in disconnected:
            await self.unsubscribe_screenshots(conn, queue_id)

# Global manager instance and module-level helper
manager = ConnectionManager()

async def broadcast_screenshot(message: dict):
    await manager.broadcast_screenshot(message)
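The per-queue fan-out and dead-connection cleanup can also be tested without FastAPI. This sketch assumes only that a connection exposes an async `send_json()`, as Starlette's `WebSocket` does. `FakeSocket` and `ScreenshotHub` are hypothetical names.

```python
import asyncio
from typing import Dict, List


class FakeSocket:
    """Hypothetical stand-in exposing WebSocket.send_json()."""
    def __init__(self, alive: bool = True):
        self.alive = alive
        self.sent: List[dict] = []

    async def send_json(self, message: dict) -> None:
        if not self.alive:
            raise ConnectionError("socket closed")
        self.sent.append(message)


class ScreenshotHub:
    """Per-queue-item subscriber lists, as in ConnectionManager."""
    def __init__(self):
        self.subscribers: Dict[int, list] = {}

    def subscribe(self, ws, queue_id: int) -> None:
        self.subscribers.setdefault(queue_id, []).append(ws)

    def unsubscribe(self, ws, queue_id: int) -> None:
        subs = self.subscribers.get(queue_id, [])
        if ws in subs:
            subs.remove(ws)
        if not subs:
            self.subscribers.pop(queue_id, None)

    async def broadcast(self, message: dict) -> None:
        queue_id = message.get('queue_id')
        # Copy the list so unsubscribing inside the loop is safe
        for ws in list(self.subscribers.get(queue_id, [])):
            try:
                await ws.send_json(message)
            except Exception:
                self.unsubscribe(ws, queue_id)  # drop dead connections


async def demo():
    hub = ScreenshotHub()
    good, dead, other = FakeSocket(), FakeSocket(alive=False), FakeSocket()
    hub.subscribe(good, 7)
    hub.subscribe(dead, 7)
    hub.subscribe(other, 8)  # watching a different queue item
    await hub.broadcast({'type': 'scraper_screenshot', 'queue_id': 7})
    return hub, good, dead, other


hub, good, dead, other = asyncio.run(demo())
print(len(good.sent), len(other.sent), hub.subscribers[7] == [good])  # 1 0 True
```

Only subscribers of queue item 7 receive the frame, and the failed socket is removed during the broadcast itself.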

3. API Endpoint for Screenshot Control

Add Screenshot Subscription:

# backend/api/routes/websocket.py
from fastapi import Depends, WebSocket, WebSocketDisconnect

# router, manager and get_current_user_ws are defined elsewhere in the app

@router.websocket("/ws/screenshots/{queue_id}")
async def websocket_screenshots(
    websocket: WebSocket,
    queue_id: int,
    user_id: int = Depends(get_current_user_ws)
):
    """WebSocket endpoint for live screenshot streaming"""
    await manager.connect(websocket, user_id)
    await manager.subscribe_screenshots(websocket, queue_id)

    try:
        while True:
            # Keep connection alive and listen for control messages
            data = await websocket.receive_text()

            if data == "ping":
                await websocket.send_text("pong")
            elif data == "stop":
                # Client wants to stop receiving screenshots
                break
    except WebSocketDisconnect:
        pass  # Client went away; cleanup happens below
    finally:
        # Always clean up, whether the client sent "stop" or disconnected
        await manager.unsubscribe_screenshots(websocket, queue_id)
        manager.disconnect(websocket, user_id)
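The control messages accepted by this endpoint form a tiny text protocol. Written as a pure function, that logic can be unit-tested separately from FastAPI. This is a sketch: `handle_control_message` is a hypothetical name, and, as in the endpoint, unknown messages are assumed to be ignored.

```python
from typing import Optional, Tuple


def handle_control_message(text: str) -> Tuple[Optional[str], bool]:
    """Return (reply_to_send, keep_listening) for a client text frame.

    "ping" -> reply "pong" and keep the connection open
    "stop" -> no reply; stop listening (the caller then unsubscribes)
    other  -> ignored
    """
    if text == "ping":
        return "pong", True
    if text == "stop":
        return None, False
    return None, True


for frame in ["ping", "hello", "stop"]:
    print(frame, handle_control_message(frame))
# ping ('pong', True)
# hello (None, True)
# stop (None, False)
```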

4. Frontend Implementation

Screenshot Viewer Component:

<!-- frontend/src/components/LiveScreenshotViewer.vue -->
<template>
  <div class="screenshot-viewer">
    <v-card>
      <v-card-title>
        Live Scraper View - {{ platform }}
        <v-spacer></v-spacer>
        <v-chip :color="isLive ? 'success' : 'grey'" small>
          <v-icon small left>{{ isLive ? 'mdi-circle' : 'mdi-circle-outline' }}</v-icon>
          {{ isLive ? 'LIVE' : 'Offline' }}
        </v-chip>
      </v-card-title>

      <v-card-text>
        <!-- Screenshot Display -->
        <div class="screenshot-container" v-if="screenshot">
          <img
            :src="`data:image/jpeg;base64,${screenshot}`"
            alt="Live scraper screenshot"
            class="screenshot-image"
          />

          <!-- Action Overlay -->
          <div class="action-overlay">
            <v-chip color="primary" dark>
              {{ currentAction }}
            </v-chip>
          </div>

          <!-- Timestamp -->
          <div class="timestamp-overlay">
            Updated {{ timeSince }} ago
          </div>
        </div>

        <!-- Placeholder when no screenshot -->
        <div v-else class="screenshot-placeholder">
          <v-icon size="64" color="grey lighten-2">mdi-camera-off</v-icon>
          <div class="mt-4">Waiting for scraper to start...</div>
        </div>
      </v-card-text>

      <v-card-actions>
        <v-btn
          :color="enabled ? 'error' : 'success'"
          @click="toggleScreenshots"
          outlined
          small
        >
          <v-icon left small>
            {{ enabled ? 'mdi-pause' : 'mdi-play' }}
          </v-icon>
          {{ enabled ? 'Pause Screenshots' : 'Resume Screenshots' }}
        </v-btn>

        <v-btn
          color="primary"
          @click="downloadScreenshot"
          :disabled="!screenshot"
          outlined
          small
        >
          <v-icon left small>mdi-download</v-icon>
          Save Screenshot
        </v-btn>

        <v-spacer></v-spacer>

        <v-chip small outlined>
          FPS: {{ fps }}
        </v-chip>
      </v-card-actions>
    </v-card>
  </div>
</template>

<script>
import { ref, computed, onMounted, onUnmounted } from 'vue';
import websocketService from '@/services/websocket';

export default {
  name: 'LiveScreenshotViewer',
  props: {
    queueId: {
      type: Number,
      required: true
    },
    platform: {
      type: String,
      required: true
    }
  },
  setup(props) {
    const screenshot = ref(null);
    const currentAction = ref('Initializing...');
    const lastUpdate = ref(null);
    const enabled = ref(true);
    const isLive = ref(false);
    const fps = ref(0);

    let wsConnection = null;
    let frameCount = 0;
    let fpsInterval = null;

    // Date.now() is not reactive, so keep a ticking `now` ref; otherwise
    // "Updated ... ago" would only refresh when a new frame arrives
    const now = ref(Date.now());

    const timeSince = computed(() => {
      if (!lastUpdate.value) return 'never';
      const seconds = Math.max(0, Math.floor((now.value - lastUpdate.value) / 1000));
      if (seconds < 60) return `${seconds}s`;
      return `${Math.floor(seconds / 60)}m`;
    });

    const connectWebSocket = () => {
      wsConnection = websocketService.connectScreenshots(props.queueId);

      wsConnection.on('scraper_screenshot', (data) => {
        if (enabled.value) {
          screenshot.value = data.screenshot;
          currentAction.value = data.action;
          lastUpdate.value = Date.now();
          isLive.value = true;
          frameCount++;
        }
      });

      wsConnection.on('download_completed', () => {
        isLive.value = false;
        currentAction.value = 'Download completed';
      });

      wsConnection.on('download_failed', () => {
        isLive.value = false;
        currentAction.value = 'Download failed';
      });
    };

    const toggleScreenshots = () => {
      enabled.value = !enabled.value;
      if (!enabled.value) {
        isLive.value = false;
      }
    };

    const downloadScreenshot = () => {
      if (!screenshot.value) return;

      const link = document.createElement('a');
      link.href = `data:image/jpeg;base64,${screenshot.value}`;
      link.download = `screenshot_${props.queueId}_${Date.now()}.jpg`;
      link.click();
    };

    onMounted(() => {
      connectWebSocket();

      // Once per second: compute FPS and refresh the "time since" clock
      fpsInterval = setInterval(() => {
        fps.value = frameCount;
        frameCount = 0;
        now.value = Date.now();
      }, 1000);
    });

    onUnmounted(() => {
      if (wsConnection) {
        wsConnection.send('stop');
        wsConnection.disconnect();
      }
      clearInterval(fpsInterval);
    });

    return {
      screenshot,
      currentAction,
      timeSince,
      enabled,
      isLive,
      fps,
      toggleScreenshots,
      downloadScreenshot
    };
  }
};
</script>

<style scoped>
.screenshot-viewer {
  margin: 16px 0;
}

.screenshot-container {
  position: relative;
  width: 100%;
  background: #000;
  border-radius: 4px;
  overflow: hidden;
}

.screenshot-image {
  width: 100%;
  height: auto;
  display: block;
}

.action-overlay {
  position: absolute;
  top: 16px;
  left: 16px;
  z-index: 10;
}

.timestamp-overlay {
  position: absolute;
  bottom: 16px;
  right: 16px;
  background: rgba(0, 0, 0, 0.7);
  color: white;
  padding: 4px 8px;
  border-radius: 4px;
  font-size: 12px;
  z-index: 10;
}

.screenshot-placeholder {
  display: flex;
  flex-direction: column;
  align-items: center;
  justify-content: center;
  min-height: 400px;
  background: #f5f5f5;
  border-radius: 4px;
  color: #999;
}
</style>

WebSocket Service Enhancement:

// frontend/src/services/websocket.js
class WebSocketClient {
  // ... existing code ...

  connectScreenshots(queueId) {
    const token = localStorage.getItem('access_token');
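    // Dev URL; in production derive the host and ws/wss scheme from window.location
    const ws = new WebSocket(
      `ws://localhost:8000/ws/screenshots/${queueId}?token=${encodeURIComponent(token)}`
    );

    const listeners = new Map();

    ws.onmessage = (event) => {
      let message;
      try {
        message = JSON.parse(event.data);
      } catch {
        // Non-JSON frames (such as the "pong" keep-alive reply) are ignored
        return;
      }
      this.notifyListeners(listeners, message);
    };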

    return {
      on: (type, callback) => {
        if (!listeners.has(type)) {
          listeners.set(type, []);
        }
        listeners.get(type).push(callback);
      },
      send: (message) => {
        if (ws.readyState === WebSocket.OPEN) {
          ws.send(message);
        }
      },
      disconnect: () => {
        ws.close();
      }
    };
  }

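  notifyListeners(listeners, message) {
    // Screenshot messages are flat ({ type, action, screenshot, ... }) with no
    // `data` wrapper, so fall back to the whole message in that case
    const { type, data } = message;
    const payload = data !== undefined ? data : message;
    if (listeners.has(type)) {
      listeners.get(type).forEach(callback => callback(payload));
    }
  }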
}

Usage in Queue Manager:

<!-- frontend/src/views/QueueManager.vue -->
<template>
  <v-container>
    <v-row>
      <!-- Queue List -->
      <v-col cols="12" md="6">
        <v-card>
          <v-card-title>Download Queue</v-card-title>
          <v-list>
            <v-list-item
              v-for="item in queueItems"
              :key="item.id"
              @click="selectedQueueId = item.id"
              :class="{ 'selected': selectedQueueId === item.id }"
            >
              <!-- Queue item details -->
            </v-list-item>
          </v-list>
        </v-card>
      </v-col>

      <!-- Live Screenshot Viewer -->
      <v-col cols="12" md="6">
        <LiveScreenshotViewer
          v-if="selectedItem"
          :key="selectedQueueId"
          :queue-id="selectedQueueId"
          :platform="selectedItem.platform"
        />
      </v-col>
    </v-row>
  </v-container>
</template>

<script>
import LiveScreenshotViewer from '@/components/LiveScreenshotViewer.vue';

export default {
  components: {
    LiveScreenshotViewer
  },
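  // ... rest of component (queueItems, selectedQueueId, and a selectedItem
  // computed that returns the queue item matching selectedQueueId)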
};
</script>

Performance Optimizations

1. Screenshot Quality & Size Control

# Adjustable quality based on bandwidth (async Playwright API, so awaited)
screenshot_bytes = await page.screenshot(
    type='jpeg',
    quality=60,  # JPEG quality 0-100; lower = smaller files
    full_page=False  # Only the visible viewport (Playwright's default)
)
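The comment above mentions adjusting quality to bandwidth. One simple approach is to measure each frame's size, step the JPEG quality down when frames exceed a per-frame byte budget, and step it back up when they are well under it. In this sketch, `next_quality`, the 100 KB budget, and the step and bounds are hypothetical values, not tuned numbers.

```python
def next_quality(current: int, last_frame_bytes: int,
                 budget_bytes: int = 100_000,
                 step: int = 10, lowest: int = 30, highest: int = 80) -> int:
    """Pick the JPEG quality for the next screenshot (hypothetical policy).

    Frames over budget lower the quality; frames under half the budget
    raise it. The result is clamped to [lowest, highest].
    """
    if last_frame_bytes > budget_bytes:
        current -= step
    elif last_frame_bytes < budget_bytes // 2:
        current += step
    return max(lowest, min(highest, current))


q = 60
for size in [150_000, 140_000, 30_000, 70_000]:
    q = next_quality(q, size)
    print(size, q)
# 150000 50
# 140000 40
# 30000 50
# 70000 50
```

The result would be passed to the next capture as `page.screenshot(type='jpeg', quality=q)`. The gap between half-budget and full budget keeps the quality from oscillating on every frame.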

2. Frame Rate Limiting

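import time

# Only send a screenshot every 2-3 seconds, not on every action.
# The state lives on the downloader instance, so set it up in __init__.
class FastDLDownloader:
    def __init__(self, ..., screenshot_interval: float = 2.0):
        self.screenshot_interval = screenshot_interval  # seconds
        self.last_screenshot_time = 0.0

    async def _capture_screenshot_throttled(self, page, action: str):
        current_time = time.monotonic()  # unaffected by system clock changes
        if current_time - self.last_screenshot_time >= self.screenshot_interval:
            # Record the time before awaiting so overlapping calls don't both fire
            self.last_screenshot_time = current_time
            await self._capture_screenshot(page, action)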
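The throttle can also be factored into a small reusable class and tested with a fake clock. In this sketch, `Throttle` and its injectable `clock` parameter are hypothetical additions. The default clock is `time.monotonic`, which is unaffected by wall-clock adjustments.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Allow an action at most once per `interval` seconds."""

    def __init__(self, interval: float = 2.0,
                 clock: Callable[[], float] = time.monotonic):
        self.interval = interval
        self.clock = clock
        self._last: Optional[float] = None  # None = nothing sent yet

    def ready(self) -> bool:
        """Return True (and record the time) if enough time has passed."""
        now = self.clock()
        if self._last is None or now - self._last >= self.interval:
            self._last = now
            return True
        return False


# Simulated timestamps instead of real waiting
fake_times = iter([0.0, 0.5, 1.9, 2.0, 2.1, 4.5])
throttle = Throttle(interval=2.0, clock=lambda: next(fake_times))
print([throttle.ready() for _ in range(6)])
# [True, False, False, True, False, True]
```

In the downloader this would read `if self.throttle.ready(): await self._capture_screenshot(page, action)`.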

3. Client-Side Caching

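// Only update the DOM if the screenshot actually changed
// (simpleHash and lastScreenshotHash are helpers not shown here; comparing
// the base64 strings directly would also work)
const screenshotHash = simpleHash(data.screenshot);
if (screenshotHash !== lastScreenshotHash.value) {
  screenshot.value = data.screenshot;
  lastScreenshotHash.value = screenshotHash;
}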
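Vue 3 already skips triggering updates when a `ref` is assigned a value equal to its current one. Much of the saving therefore comes from not sending unchanged frames at all. The sketch below is a swapped-in, server-side variant of the same idea. It assumes frames are compared per queue item by SHA-256 digest, and `FrameDeduper` is a hypothetical name. It only catches byte-identical frames, so a page with any animation (e.g. a spinner) will still produce new frames.

```python
import hashlib
from typing import Dict


class FrameDeduper:
    """Skip broadcasting a frame identical to the previous one for the same queue item."""

    def __init__(self):
        self._last_digest: Dict[int, str] = {}

    def is_new(self, queue_id: int, jpeg_bytes: bytes) -> bool:
        digest = hashlib.sha256(jpeg_bytes).hexdigest()
        if self._last_digest.get(queue_id) == digest:
            return False  # unchanged page: nothing worth sending
        self._last_digest[queue_id] = digest
        return True

    def forget(self, queue_id: int) -> None:
        """Drop state once a download finishes."""
        self._last_digest.pop(queue_id, None)


dedupe = FrameDeduper()
print(dedupe.is_new(1, b'frame-a'), dedupe.is_new(1, b'frame-a'),
      dedupe.is_new(1, b'frame-b'), dedupe.is_new(2, b'frame-a'))
# True False True True
```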

4. Opt-in Feature

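# Only capture screenshots if at least one client is subscribed.
# Assumes the downloader can see subscription state: the ConnectionManager
# lives in the API process, so a Celery worker would need a shared store.
if manager.screenshot_subscribers.get(queue_id):
    await self._capture_screenshot(page, action)
# Otherwise skip the capture to save CPU and bandwidth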

User Settings

Add to Settings Page:

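{
  "live_screenshots": {
    "enabled": true,
    "quality": 60,
    "frame_rate": 0.5,
    "auto_enable": false
  }
}

  • enabled: turns the feature on or off
  • quality: JPEG quality (0-100) for captured screenshots
  • frame_rate: screenshots per second (0.5 = one every 2 seconds)
  • auto_enable: enable live screenshots by default for new downloads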
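The settings can be loaded with validation, assuming the JSON shape shown above. In this sketch, `LiveScreenshotSettings`, the clamping ranges and the defaults for missing keys are hypothetical. It also derives the capture interval from `frame_rate`: 0.5 screenshots per second means one every 2 seconds.

```python
import json
from dataclasses import dataclass


@dataclass
class LiveScreenshotSettings:
    enabled: bool = True
    quality: int = 60          # JPEG quality, 0-100
    frame_rate: float = 0.5    # screenshots per second
    auto_enable: bool = False

    @classmethod
    def from_json(cls, text: str) -> "LiveScreenshotSettings":
        raw = json.loads(text).get("live_screenshots", {})
        settings = cls(
            enabled=bool(raw.get("enabled", True)),
            quality=int(raw.get("quality", 60)),
            frame_rate=float(raw.get("frame_rate", 0.5)),
            auto_enable=bool(raw.get("auto_enable", False)),
        )
        # Clamp to sane ranges instead of failing on bad input
        settings.quality = max(0, min(100, settings.quality))
        settings.frame_rate = max(0.1, min(5.0, settings.frame_rate))
        return settings

    @property
    def interval_seconds(self) -> float:
        """Seconds between screenshots, e.g. 0.5 fps -> 2.0 s."""
        return 1.0 / self.frame_rate


s = LiveScreenshotSettings.from_json(
    '{"live_screenshots": {"enabled": true, "quality": 60,'
    ' "frame_rate": 0.5, "auto_enable": false}}'
)
print(s.quality, s.interval_seconds)  # 60 2.0
```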

Benefits

  1. Visual Debugging - See exactly what's happening during scraping
  2. Confidence - Know the scraper is working correctly
  3. Entertainment - Watch downloads happen in real-time
  4. Troubleshooting - Immediately spot issues (CAPTCHA, layout changes)
  5. Learning - Understand how scrapers navigate sites

Bandwidth Considerations

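Typical Screenshot:

  • Size: 50-150 KB (JPEG, quality 60)
  • Frequency: 0.5 FPS (1 screenshot every 2 seconds)
  • Bandwidth: ~25-75 KB/s per active download for the raw JPEG data
  • Encoding overhead: base64 inside JSON adds about a third, so ~33-100 KB/s per download on the wire

With 4 concurrent downloads:

  • Total: ~100-300 KB/s raw (0.8-2.4 Mbps), or ~133-400 KB/s (~1.1-3.2 Mbps) after base64 encoding

This fits comfortably within a LAN or typical broadband connection. With the opt-in check above, the cost applies only while someone is watching a given download.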
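The bandwidth arithmetic above can be reproduced with a short helper. This sketch uses decimal units (1 KB = 1000 bytes, 1 Mbps = 1,000,000 bits/s). It models base64 as a 4/3 size increase and ignores the small JSON envelope.

```python
def stream_bandwidth(frame_kb: float, fps: float, downloads: int = 1,
                     base64_encoded: bool = True) -> tuple:
    """Return (KB/s, Mbps) for `downloads` concurrent screenshot streams."""
    size_kb = frame_kb * 4 / 3 if base64_encoded else frame_kb
    kb_per_s = size_kb * fps * downloads
    mbps = kb_per_s * 8 / 1000  # KB/s -> kilobits/s -> megabits/s
    return round(kb_per_s, 1), round(mbps, 2)


print(stream_bandwidth(50, 0.5, 4, base64_encoded=False))   # (100.0, 0.8)
print(stream_bandwidth(150, 0.5, 4, base64_encoded=False))  # (300.0, 2.4)
print(stream_bandwidth(150, 0.5, 4))                        # (400.0, 3.2)
```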


Advanced Features (Future)

1. Element Highlighting

# Highlight the element being scraped
await page.evaluate("""
    (selector) => {
        const element = document.querySelector(selector);
        if (element) {
            element.style.outline = '3px solid red';
        }
    }
""", current_selector)

# Then take screenshot
screenshot = await page.screenshot()

2. Recording Mode

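# Option to save all screenshots as video (frames saved as screenshot_0001.jpg, ...).
# -pix_fmt yuv420p keeps the MP4 playable in common players; -r 30 sets the output frame rate.
ffmpeg -framerate 0.5 -i screenshot_%04d.jpg -c:v libx264 -pix_fmt yuv420p -r 30 scraping_video.mp4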
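For the ffmpeg pattern `screenshot_%04d.jpg` to match, frames must be saved with sequential, zero-padded numbers. Here is a sketch of a recorder that does this. `FrameRecorder` and the per-queue output directory are hypothetical. In practice the JPEG bytes would come from the screenshot callback.

```python
import tempfile
from pathlib import Path


class FrameRecorder:
    """Write frames as screenshot_0001.jpg, screenshot_0002.jpg, ..."""

    def __init__(self, directory: Path):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.count = 0

    def save(self, jpeg_bytes: bytes) -> Path:
        self.count += 1
        path = self.directory / f"screenshot_{self.count:04d}.jpg"  # matches %04d
        path.write_bytes(jpeg_bytes)
        return path


with tempfile.TemporaryDirectory() as tmp:
    recorder = FrameRecorder(Path(tmp) / "queue_42")
    names = [recorder.save(b"\xff\xd8\xff").name for _ in range(3)]
    print(names)
# ['screenshot_0001.jpg', 'screenshot_0002.jpg', 'screenshot_0003.jpg']
```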

3. Comparison Mode

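<!-- Show before/after for quality upgrade -->
<div class="comparison">
  <figure><img src="fastdl_screenshot" alt="FastDL screenshot" /><figcaption>FastDL (640x640)</figcaption></figure>
  <figure><img src="toolzu_screenshot" alt="Toolzu screenshot" /><figcaption>Toolzu (1920x1440)</figcaption></figure>
</div>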

Implementation Priority

This feature belongs in Phase 4 (Advanced Features): it is not required for core functionality, but it noticeably improves the user experience.

Estimated Development Time: 3-4 days

  • Backend: 1 day
  • Frontend component: 1 day
  • WebSocket integration: 1 day
  • Testing & optimization: 1 day