Don't Melt the Laptop: backpressure for local AI media pipelines
A cloud worker can ask for a bigger box. A local AI app is sharing one laptop with everything else the user is doing. The job system has to say “not all at once,” keep progress visible, and recover honestly when work stalls.
MediaFind's heaviest paths are not polite little request handlers. Indexing can decode media with ffmpeg, transcribe audio, embed transcript chunks, sample video frames, run OCR, detect faces, classify scenes, and rebuild a vector index. Model downloads can pull multi-gigabyte weights. Rendering and export can keep ffmpeg busy for minutes. If every button click spawned its own thread, with no limit on how many run at once, the app would not feel powerful. It would feel like a space heater with a UI.
The answer is backpressure: make the system admit how much work it can actually do, queue the rest, and surface the wait as part of the product instead of pretending everything is instant. MediaFind's implementation is intentionally small: no Redis, no Celery, no external broker. The core lives in mediafind/jobs.py, backed by SQLite rows and a fixed worker pool.
The pipeline pressure points
Local media AI creates several kinds of pressure at once:
- CPU pressure from decoding, OCR, embeddings, clustering, and Python glue code.
- Accelerator pressure when transcription, local LLMs, image models, or vision models compete for the same GPU/Neural Engine budget.
- Memory pressure because model weights and decoded media buffers are large, and two “reasonable” jobs can become unreasonable together.
- SQLite write pressure from progress updates, transcript rows, embeddings, facets, thumbnails, and job state.
- Attention pressure in the UI: if the user cannot tell what is running or cancel it, background work feels broken even when it is technically progressing.
That mix is why MediaFind treats “start a long operation” as an explicit job lifecycle, not a fire-and-forget helper thread.
A brokerless queue with durable job state
Every long operation is submitted as a job: index, download_batch, model_download, ann_rebuild, people_reindex, render jobs, export jobs, and more. Submission writes a row to a local SQLite jobs table with pending status, progress 0.0, a JSON payload, timestamps, and eventual error/result fields. The HTTP endpoint returns a job_id immediately, and the UI polls that id instead of keeping the request open.
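As a rough illustration of the submission step, here is a minimal sketch using only the standard library. The column names, the `submit_job` helper, and the use of a UUID for the id are assumptions made for the example, not MediaFind's actual schema.

```python
import json
import sqlite3
import time
import uuid

# Hypothetical schema: the article names the fields, not the exact columns.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id         TEXT PRIMARY KEY,
    kind       TEXT NOT NULL,
    status     TEXT NOT NULL,
    progress   REAL NOT NULL DEFAULT 0.0,
    message    TEXT,
    payload    TEXT NOT NULL,
    error      TEXT,
    result     TEXT,
    created_at REAL NOT NULL,
    updated_at REAL NOT NULL
)
"""


def submit_job(conn: sqlite3.Connection, kind: str, payload: dict) -> str:
    """Record a pending job and return its id without doing any work."""
    job_id = uuid.uuid4().hex
    now = time.time()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO jobs (id, kind, status, progress, payload, "
            "created_at, updated_at) VALUES (?, ?, 'pending', 0.0, ?, ?, ?)",
            (job_id, kind, json.dumps(payload), now, now),
        )
    return job_id


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
job_id = submit_job(conn, "index", {"paths": ["/videos"]})
row = conn.execute(
    "SELECT status, progress FROM jobs WHERE id = ?", (job_id,)
).fetchone()
print(job_id, row)
```

The HTTP handler would return `job_id` right after the insert; the actual work starts later, when a worker picks the id up.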
SQLite is not pretending to be a distributed queue here. It is the durable job ledger: what was submitted, what is running, what finished, what failed, and what message the UI should show. Dispatch itself is an in-memory queue.Queue, which is fine because this is a single-process desktop app. On startup, any job left in running state from a process crash is marked failed rather than magically resumed from an unknown point.
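The startup sweep described above could look something like the following sketch. The table layout, the `fail_orphaned_jobs` name, and the error text are illustrative assumptions.

```python
import sqlite3
import time


def fail_orphaned_jobs(conn: sqlite3.Connection) -> int:
    """Mark jobs left in 'running' by a previous process as failed.

    Returns the number of rows that were changed.
    """
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET status = 'failed', error = ?, updated_at = ? "
            "WHERE status = 'running'",
            ("interrupted by app restart", time.time()),
        )
    return cur.rowcount


# Demo against a throwaway in-memory table (hypothetical columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id TEXT PRIMARY KEY, status TEXT, error TEXT, updated_at REAL)"
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, NULL, 0.0)",
    [("a", "running"), ("b", "pending"), ("c", "done")],
)
conn.commit()

recovered = fail_orphaned_jobs(conn)
statuses = dict(conn.execute("SELECT id, status FROM jobs"))
print(recovered, statuses)
```

Pending and finished rows are left alone; only rows that claimed to be running when the process died are touched.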
The first backpressure valve: a fixed worker pool
The bluntest, most important control is MAX_CONCURRENT_JOBS. It defaults to 2 and can be overridden with MEDIAFIND_MAX_JOBS. That number sets the size of the fixed worker pool, and therefore the number of handlers that can run at once. Extra submissions stay pending in the dispatch queue; they do not each get a parked daemon thread of their own.
This is less glamorous than an adaptive scheduler, but it solves the failure mode that matters most in a local app: bursts. A user can kick off indexing, a model download, a cleanup scan, and a render, but the app should not eagerly run all of them together just because the clicks arrived close together.
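A fixed pool fed by a `queue.Queue` can be sketched in a few lines. Everything here apart from `MAX_CONCURRENT_JOBS` and `MEDIAFIND_MAX_JOBS` is a stand-in: the `handler` just sleeps, and the `peak` counter exists only to show that concurrency never exceeds the cap.

```python
import os
import queue
import threading
import time

# The default of 2 and the env override come from the article.
MAX_CONCURRENT_JOBS = max(1, int(os.environ.get("MEDIAFIND_MAX_JOBS", "2")))

dispatch: "queue.Queue[str | None]" = queue.Queue()
lock = threading.Lock()
running = 0
peak = 0  # highest number of handlers observed running together


def handler(job_id: str) -> None:
    time.sleep(0.05)  # placeholder for real work


def worker() -> None:
    global running, peak
    while True:
        job_id = dispatch.get()
        if job_id is None:  # sentinel: shut this worker down
            dispatch.task_done()
            return
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            handler(job_id)
        finally:
            with lock:
                running -= 1
            dispatch.task_done()


workers = [
    threading.Thread(target=worker, daemon=True)
    for _ in range(MAX_CONCURRENT_JOBS)
]
for w in workers:
    w.start()

# A burst of six submissions: only MAX_CONCURRENT_JOBS run at once,
# the rest wait in the queue without getting a thread of their own.
for i in range(6):
    dispatch.put(f"job-{i}")
dispatch.join()

for _ in workers:
    dispatch.put(None)
for w in workers:
    w.join()
print("peak concurrency:", peak)
```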
Progress is also the cancellation checkpoint
Every handler receives a callback shaped like progress(pct, msg). It writes a bounded percentage and a human-readable status message into the job row. It also checks the queue's cancellation set. If the user cancelled that job, the callback raises InterruptedError; the worker catches it, marks the row cancelled, and runs any registered cleanup.
That design makes cancellation cooperative. It is not a fantasy “kill this thread right now” button. It stops at the next safe checkpoint: between files during batch indexing, between progress lines from ffmpeg, or between measurable download polls. For model downloads, MediaFind is especially honest: the job is registered as non-cancellable because the underlying model fetch may be inside code that does not expose a safe cancellation hook.
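The checkpoint pattern can be sketched as follows. A plain dict stands in for the SQLite job row, and `make_progress`, `run_job`, and `index_files` are hypothetical names; only the `progress(pct, msg)` shape and the use of `InterruptedError` come from the description above.

```python
import threading

cancelled_ids: set[str] = set()
cancel_lock = threading.Lock()
job_rows: dict[str, dict] = {}  # stand-in for the SQLite jobs table


def make_progress(job_id: str):
    """Build the progress(pct, msg) callback handed to a handler."""

    def progress(pct: float, msg: str) -> None:
        # Cancellation checkpoint: stop before recording more progress.
        with cancel_lock:
            if job_id in cancelled_ids:
                raise InterruptedError(f"job {job_id} cancelled")
        # Clamp the percentage into [0.0, 1.0] before storing it.
        job_rows[job_id].update(progress=min(max(pct, 0.0), 1.0), message=msg)

    return progress


def run_job(job_id: str, handler, cleanup=None) -> None:
    job_rows[job_id] = {"status": "running", "progress": 0.0, "message": ""}
    try:
        handler(make_progress(job_id))
        job_rows[job_id]["status"] = "done"
    except InterruptedError:
        job_rows[job_id]["status"] = "cancelled"
        if cleanup is not None:
            cleanup()


def index_files(progress) -> None:
    files = ["a.mp4", "b.mp4", "c.mp4"]
    for i, name in enumerate(files):
        progress(i / len(files), f"Indexing {name}")
        if i == 0:
            # Simulate the user clicking cancel while the first file runs.
            with cancel_lock:
                cancelled_ids.add("job-1")
    progress(1.0, "Done")


cleaned: list[str] = []
run_job("job-1", index_files, cleanup=lambda: cleaned.append("job-1"))
print(job_rows["job-1"], cleaned)
```

The handler never checks a flag itself. It only reports progress, and the next report after a cancel request is where the job stops.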
The same callback drives better progress math. Indexing enumerates files up front, then each per-file stage reports its fraction within the file. The global percentage is computed as:
(file_index + stage_fraction) / total_files
The percentage is capped below 1.0 until final whole-library work finishes, such as rebuilding the fast search index. That small discipline avoids the classic “100% but still working” lie.
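To make the arithmetic concrete, here is a small sketch of that formula with the cap applied. The function name, the `finalizing_done` flag, and the specific cap value of 0.99 are assumptions; the article only says the value stays below 1.0 until the final step finishes.

```python
def overall_progress(
    file_index: int,
    stage_fraction: float,
    total_files: int,
    finalizing_done: bool = False,
    cap: float = 0.99,  # assumed cap; anything below 1.0 works
) -> float:
    """Global progress for a batch, held below 1.0 until final work is done."""
    if finalizing_done:
        return 1.0
    if total_files <= 0:
        return 0.0
    raw = (file_index + stage_fraction) / total_files
    return min(max(raw, 0.0), cap)


# Third file (index 2) of 40, halfway through its stages.
print(overall_progress(2, 0.5, 40))                          # 0.0625
# Last file fully processed, index rebuild still running.
print(overall_progress(39, 1.0, 40))                         # 0.99
# Whole-library work finished.
print(overall_progress(39, 1.0, 40, finalizing_done=True))   # 1.0
```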
SQLite concurrency: enough, not infinite
Each operation opens its own short-lived SQLite connection with check_same_thread=False, PRAGMA journal_mode=WAL, and a busy_timeout. WAL lets the UI keep reading job status while worker threads write progress. The busy timeout turns transient write contention into waiting instead of “database is locked” failures.
But this is still SQLite. There is one writer at a time. The point of the worker cap is partly to protect the database from the app's own enthusiasm. Backpressure at the worker layer reduces write contention before the storage layer has to complain.
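The connection settings described in this section can be wrapped in a small context manager. This is a sketch under assumptions: the `job_db` name and the 5000 ms timeout are invented for the example, and the article does not say what timeout MediaFind uses.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager


@contextmanager
def job_db(path: str, busy_timeout_ms: int = 5000):
    """Open a short-lived connection with WAL and a busy timeout."""
    conn = sqlite3.connect(path, check_same_thread=False)
    try:
        # WAL lets readers proceed while a single writer appends.
        conn.execute("PRAGMA journal_mode=WAL")
        # Wait for a competing writer instead of failing immediately.
        conn.execute(f"PRAGMA busy_timeout={int(busy_timeout_ms)}")
        yield conn
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "jobs.db")

with job_db(db_path) as conn:
    conn.execute("CREATE TABLE IF NOT EXISTS jobs (id TEXT PRIMARY KEY, progress REAL)")
    conn.execute("INSERT INTO jobs VALUES ('job-1', 0.25)")

with job_db(db_path) as conn:
    mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    timeout = conn.execute("PRAGMA busy_timeout").fetchone()[0]
    progress = conn.execute("SELECT progress FROM jobs WHERE id = 'job-1'").fetchone()[0]
print(mode, timeout, progress)
```

WAL mode is stored in the database file, so later connections see it without setting it again. The busy timeout is per connection and has to be set each time.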
The UI backs off too
The activity dock is the user-visible half of the contract. It owns one client-side job store, one renderer, and one poller. When a job starts, the page polls /api/jobs?limit=20 every 1.5 seconds. Watched jobs that fall outside that newest-20 window are fetched directly by id, so a busy job list cannot starve the completion callback for a job the current page started.
When everything is done, the poller stops after two idle ticks. That is frontend backpressure: no WebSocket server, no always-hot timer, no background page burning cycles to rediscover that nothing is happening. The dock remains visible as an idle pill with recent activity, and the next tracked job kicks the poller back on.
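The stop-after-two-idle-ticks rule is a tiny state machine. The dock itself runs in the page, so the sketch below is only a Python model of the logic, with a hypothetical `DockPoller` class and a fake fetch function standing in for the `/api/jobs` request.

```python
ACTIVE_STATUSES = {"pending", "running"}


class DockPoller:
    """Polls while jobs are active, stops itself after N idle ticks."""

    def __init__(self, fetch_jobs, idle_ticks_before_stop: int = 2):
        self.fetch_jobs = fetch_jobs
        self.idle_limit = idle_ticks_before_stop
        self.idle_ticks = 0
        self.running = False

    def track(self) -> None:
        """Called when the page starts a job: (re)start polling."""
        self.idle_ticks = 0
        self.running = True

    def tick(self) -> None:
        """One timer tick; a stopped poller makes no request at all."""
        if not self.running:
            return
        jobs = self.fetch_jobs()
        if any(job["status"] in ACTIVE_STATUSES for job in jobs):
            self.idle_ticks = 0
            return
        self.idle_ticks += 1
        if self.idle_ticks >= self.idle_limit:
            self.running = False


# Fake backend: the job runs for one tick, then reports done.
responses = [[{"status": "running"}], [{"status": "done"}], [{"status": "done"}]]
calls = []


def fake_fetch():
    calls.append(1)
    return responses[min(len(calls) - 1, len(responses) - 1)]


poller = DockPoller(fake_fetch)
poller.track()
for _ in range(5):
    poller.tick()
print(len(calls), poller.running)
```

Five timer ticks produce only three requests: one while the job runs and two idle confirmations before the poller switches itself off.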
The dock also turns backend progress messages into a useful mental model. For indexing, it parses messages like Embedding transcript for clip.mp4 (3/40), shows the current file, estimates remaining time, and lights up stage chips such as Transcript, Speakers, Visual, Scene text, Faces, and Summaries. The backend still sends plain strings; the UI gives them structure.
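Turning a plain progress string into structure is mostly a regular expression plus some arithmetic. The pattern and both function names below are assumptions based on the single example message quoted above; the real dock may parse more formats and estimate time differently.

```python
import re

# Matches messages shaped like "<stage> for <file> (<current>/<total>)".
PROGRESS_RE = re.compile(
    r"^(?P<stage>.+?) for (?P<file>.+) \((?P<current>\d+)/(?P<total>\d+)\)$"
)


def parse_progress(message: str):
    """Return stage, file, and counters, or None for unstructured messages."""
    match = PROGRESS_RE.match(message)
    if match is None:
        return None
    return {
        "stage": match["stage"],
        "file": match["file"],
        "current": int(match["current"]),
        "total": int(match["total"]),
    }


def estimate_remaining_s(elapsed_s: float, current: int, total: int):
    """Naive estimate: average time per finished file times files left."""
    completed = current - 1  # the current file is still in progress
    if completed <= 0:
        return None  # nothing finished yet, so there is no rate to use
    return elapsed_s / completed * (total - completed)


parsed = parse_progress("Embedding transcript for clip.mp4 (3/40)")
print(parsed)
print(estimate_remaining_s(60.0, parsed["current"], parsed["total"]))
```

Messages that do not match simply come back as `None`, so the dock can fall back to showing the raw string.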
Stuck work is failed, not hidden
A watchdog thread periodically looks for jobs still marked running whose updated_at timestamp has gone stale. The timeout is configurable with MEDIAFIND_JOB_STUCK_AFTER_S. When a job crosses that threshold, the queue marks it failed with a WatchdogTimeout reason and runs registered cancel cleanup.
This is deliberately modest. The watchdog does not terminate an arbitrary OS thread stuck inside native code. Python cannot safely do that. What it can do is stop lying to the UI. A stalled job becomes visible as failed instead of sitting forever at 42%, and the user can decide whether to retry or restart.
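One watchdog pass might look like the sketch below. The table columns, the `fail_stale_jobs` name, the `cleanups` registry, and the 600-second threshold used in the demo are assumptions; the article gives only the `MEDIAFIND_JOB_STUCK_AFTER_S` setting and the `WatchdogTimeout` reason.

```python
import sqlite3


def fail_stale_jobs(conn, now: float, stuck_after_s: float, cleanups=None):
    """Fail running jobs whose updated_at is older than the threshold.

    Returns the ids that were marked failed in this pass.
    """
    cleanups = cleanups or {}
    cutoff = now - stuck_after_s
    stale_ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM jobs WHERE status = 'running' AND updated_at < ?",
            (cutoff,),
        )
    ]
    failed = []
    with conn:
        for job_id in stale_ids:
            # The status guard keeps a job that just finished or was
            # cancelled from being overwritten by the watchdog.
            cur = conn.execute(
                "UPDATE jobs SET status = 'failed', error = 'WatchdogTimeout', "
                "updated_at = ? WHERE id = ? AND status = 'running'",
                (now, job_id),
            )
            if cur.rowcount:
                failed.append(job_id)
    for job_id in failed:
        if job_id in cleanups:
            cleanups[job_id]()  # registered cancel cleanup, if any
    return failed


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id TEXT PRIMARY KEY, status TEXT, error TEXT, updated_at REAL)"
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, NULL, ?)",
    [("stale", "running", 0.0), ("fresh", "running", 990.0), ("old-done", "done", 0.0)],
)
conn.commit()

cleaned = []
failed = fail_stale_jobs(
    conn, now=1000.0, stuck_after_s=600.0,
    cleanups={"stale": lambda: cleaned.append("stale")},
)
statuses = dict(conn.execute("SELECT id, status FROM jobs"))
print(failed, statuses, cleaned)
```

In a real app this function would run on a timer in its own thread. Nothing here touches the wedged worker; it only corrects what the job row says.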
The tradeoffs, in one table
| Pressure | MediaFind's control | Honest caveat |
|---|---|---|
| Too many heavy jobs | Fixed worker pool, default 2 | Not adaptive to thermals or model memory yet |
| Thread explosion | Pending ids wait in queue.Queue | Dispatch queue is in-memory, not crash-durable |
| Lost progress on reload | Job state stored in SQLite | Crash recovery marks running jobs failed; it does not resume mid-file |
| Unsafe cancellation | Cooperative checkpoint in progress() | C-level work stops only when control returns |
| SQLite lock contention | WAL, busy timeout, short connections, concurrency cap | SQLite still has a single writer |
| Polling overhead | One shared poller, self-throttles when idle | It is polling, not push |
| Stuck jobs | Watchdog marks stale running jobs failed | It does not kill a wedged native thread |
The design principle
Backpressure is not one trick. It is a chain of small refusals: do not run every job immediately, do not spawn a thread per click, do not hold the HTTP request open, do not let a cancelled row be overwritten by a late success, do not poll forever when idle, and do not pretend stuck work is still healthy.
That is the local AI difference. On a server, overload becomes an SRE problem. On a laptop, overload becomes the user's fan, battery, and frozen UI. Respecting that machine is part of the product.
Run the pipeline without handing it your whole machine
Transcribe, index, search, and export locally with progress you can see and work you can control.