Don't Melt the Laptop: backpressure for local AI media pipelines

A cloud worker can ask for a bigger box. A local AI app is sharing one laptop with everything else the user is doing. The job system has to say “not all at once,” keep progress visible, and recover honestly when work stalls.

MediaFind's heaviest paths are not polite little request handlers. Indexing can decode media with ffmpeg, transcribe audio, embed transcript chunks, sample video frames, run OCR, detect faces, classify scenes, and rebuild a vector index. Model downloads can pull multi-gigabyte weights. Rendering and export can keep ffmpeg busy for minutes. If every button click spawned its own thread, with no cap on the total, the app would not feel powerful. It would feel like a space heater with a UI.

The answer is backpressure: make the system admit how much work it can actually do, queue the rest, and surface the wait as part of the product instead of pretending everything is instant. MediaFind's implementation is intentionally small: no Redis, no Celery, no external broker. The core lives in mediafind/jobs.py, backed by SQLite rows and a fixed worker pool.

The pipeline pressure points

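Local media AI creates several kinds of pressure at once: compute-heavy decoding and model inference, multi-gigabyte model downloads, minutes-long ffmpeg renders and exports, and progress writes that all land in one SQLite file.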

That mix is why MediaFind treats “start a long operation” as an explicit job lifecycle, not a fire-and-forget helper thread.

A brokerless queue with durable job state

Every long operation is submitted as a job: index, download_batch, model_download, ann_rebuild, people_reindex, render jobs, export jobs, and more. Submission writes a row to a local SQLite jobs table with pending status, progress 0.0, a JSON payload, timestamps, and eventual error/result fields. The HTTP endpoint returns a job_id immediately, and the UI polls that id instead of keeping the request open.
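As a rough illustration of the submission step, here is a minimal sketch using Python's standard sqlite3 module. The column names, the `submit_job` helper, and the use of a UUID as the job id are assumptions made for the example; the article only states that a row holds a pending status, progress 0.0, a JSON payload, timestamps, and error/result fields.

```python
import json
import sqlite3
import time
import uuid

# Hypothetical schema: column names are assumptions, not MediaFind's actual table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id TEXT PRIMARY KEY,
    kind TEXT NOT NULL,
    status TEXT NOT NULL,
    progress REAL NOT NULL,
    message TEXT,
    payload TEXT NOT NULL,
    result TEXT,
    error TEXT,
    created_at REAL NOT NULL,
    updated_at REAL NOT NULL
)
"""


def submit_job(conn, kind, payload):
    """Insert a pending job row and return its id without doing any work."""
    job_id = uuid.uuid4().hex
    now = time.time()
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO jobs (id, kind, status, progress, payload, created_at, updated_at) "
            "VALUES (?, ?, 'pending', 0.0, ?, ?, ?)",
            (job_id, kind, json.dumps(payload), now, now),
        )
    return job_id


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
job_id = submit_job(conn, "index", {"path": "/videos"})
row = conn.execute(
    "SELECT status, progress, payload FROM jobs WHERE id = ?", (job_id,)
).fetchone()
```

An HTTP handler built on this would return `job_id` to the client right after the insert, leaving the actual work to a worker.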

SQLite is not pretending to be a distributed queue here. It is the durable job ledger: what was submitted, what is running, what finished, what failed, and what message the UI should show. Dispatch itself is an in-memory queue.Queue, which is fine because this is a single-process desktop app. On startup, any job left in running state from a process crash is marked failed rather than magically resumed from an unknown point.
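The startup rule can be sketched in a few lines. The function name, the table layout, and the error text below are hypothetical; the only behavior taken from the article is that rows left in running state are marked failed on startup. What happens to rows that were still pending is not described, so this sketch leaves them alone.

```python
import sqlite3
import time


def fail_interrupted_jobs(conn):
    """Mark rows left 'running' by a previous process as failed; return the count."""
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET status = 'failed', error = ?, updated_at = ? "
            "WHERE status = 'running'",
            ("interrupted by app restart", time.time()),  # error text is an assumption
        )
    return cur.rowcount


# Simulate a database left behind by a crashed process.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id TEXT PRIMARY KEY, status TEXT, error TEXT, updated_at REAL)"
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, NULL, 0)",
    [("a", "running"), ("b", "pending"), ("c", "done")],
)
recovered = fail_interrupted_jobs(conn)
statuses = dict(conn.execute("SELECT id, status FROM jobs"))
```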

[Figure: a POST action returns a job_id. The SQLite jobs row (pending, progress, result; WAL + busy timeout) is the durable record. Job ids wait in the dispatch queue for worker 1 and worker 2. progress() serves as heartbeat and cancel check. The activity dock polls /api/jobs. The row is durable; the dispatch queue is intentionally local and in-memory.]
MediaFind uses SQLite as a durable job ledger and a fixed in-process worker pool as the execution boundary. That keeps the desktop app simple while still making long work observable.

The first backpressure valve: a fixed worker pool

The bluntest, most important control is MAX_CONCURRENT_JOBS. It defaults to 2 and can be overridden with MEDIAFIND_MAX_JOBS. That number caps the fixed worker pool and the number of handlers running at once. Extra submissions stay pending in the dispatch queue; they do not become one parked daemon thread per job.
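A fixed pool fed by a queue.Queue might look like the sketch below. The `WorkerPool` class and its method names are invented for illustration; only `MAX_CONCURRENT_JOBS`, the `MEDIAFIND_MAX_JOBS` override, the default of 2, and the use of queue.Queue come from the article.

```python
import os
import queue
import threading
import time

MAX_CONCURRENT_JOBS = int(os.environ.get("MEDIAFIND_MAX_JOBS", "2"))


class WorkerPool:
    """Hypothetical fixed pool: `size` threads drain one shared queue of job ids."""

    def __init__(self, handler, size=MAX_CONCURRENT_JOBS):
        self._handler = handler
        self._queue = queue.Queue()
        for _ in range(size):
            threading.Thread(target=self._run, daemon=True).start()

    def submit(self, job_id):
        # Extra submissions wait here; no new thread is created per job.
        self._queue.put(job_id)

    def _run(self):
        while True:
            job_id = self._queue.get()
            try:
                self._handler(job_id)
            except Exception:
                pass  # a real worker would record the failure on the job row
            finally:
                self._queue.task_done()

    def wait_idle(self):
        self._queue.join()


# Demo: six submissions, but never more than two handlers running at once.
stats = {"active": 0, "peak": 0}
done = []
lock = threading.Lock()


def handler(job_id):
    with lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.02)  # stand-in for heavy work
    with lock:
        stats["active"] -= 1
        done.append(job_id)


pool = WorkerPool(handler, size=2)
for n in range(6):
    pool.submit(f"job-{n}")
pool.wait_idle()
```

The cap is enforced by the number of threads that exist, not by a counter that callers must remember to check.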

This is less glamorous than an adaptive scheduler, but it solves the failure mode that matters most in a local app: bursts. A user can kick off indexing, a model download, a cleanup scan, and a render, but the app should not eagerly run all of them together just because the clicks arrived close together.

The deadlock rule: a job handler must not submit another job and block waiting for it. With a fixed pool, nested blocking jobs can consume every worker and leave the sub-jobs stranded behind them. The queue stays simple because handlers are leaf work.

Progress is also the cancellation checkpoint

Every handler receives a callback shaped like progress(pct, msg). It writes a bounded percentage and a human-readable status message into the job row. It also checks the queue's cancellation set. If the user cancelled that job, the callback raises InterruptedError; the worker catches it, marks the row cancelled, and runs any registered cleanup.
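The callback-as-checkpoint idea can be sketched as follows. A plain dict stands in for the SQLite job rows, and `make_progress`, `run_job`, and `cancelled_ids` are hypothetical names; the parts taken from the article are the `progress(pct, msg)` shape, the bounded percentage, the cancellation set, and the InterruptedError that the worker turns into a cancelled row plus cleanup.

```python
import threading

cancelled_ids = set()          # ids the user has asked to cancel
cancel_lock = threading.Lock()
rows = {}                      # stand-in for job rows stored in SQLite


def make_progress(job_id):
    def progress(pct, msg):
        # Cancellation is checked first, so a cancelled job stops at this checkpoint.
        with cancel_lock:
            if job_id in cancelled_ids:
                raise InterruptedError(f"job {job_id} cancelled")
        rows[job_id].update(progress=max(0.0, min(1.0, pct)), message=msg)
    return progress


def run_job(job_id, handler, cleanup=None):
    rows[job_id] = {"status": "running", "progress": 0.0, "message": ""}
    try:
        handler(make_progress(job_id))
        rows[job_id]["status"] = "done"
    except InterruptedError:
        rows[job_id]["status"] = "cancelled"
        if cleanup is not None:
            cleanup()


cleaned = []


def index_files(progress):
    files = ["a.mp4", "b.mp4", "c.mp4", "d.mp4"]
    for i, name in enumerate(files):
        progress(i / len(files), f"Indexing {name} ({i + 1}/{len(files)})")
        if i == 1:
            cancelled_ids.add("job-1")  # simulate the user clicking cancel mid-run


run_job("job-1", index_files, cleanup=lambda: cleaned.append(True))
```

In this run the handler finishes the second file, and the cancel takes effect on the next call to `progress`, before the third file starts.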

That design makes cancellation cooperative. It is not a fantasy “kill this thread right now” button. It stops at the next safe checkpoint: between files during batch indexing, between progress lines from ffmpeg, or between measurable download polls. For model downloads, MediaFind is especially honest: the job is registered as non-cancellable because the underlying model fetch may be inside code that does not expose a safe cancellation hook.

The same callback drives better progress math. Indexing enumerates files up front, then each per-file stage reports its fraction within the file. The global percentage is computed as:

(file_index + stage_fraction) / total_files

The percentage is capped below 1.0 until final whole-library work finishes, such as rebuilding the fast search index. That small discipline avoids the classic “100% but still working” lie.
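A small worked version of that formula, including the cap, is sketched below. The cap value of 0.99 and the `finalized` flag are assumptions; the article only says the percentage stays below 1.0 until the whole-library work is finished.

```python
FINALIZE_CAP = 0.99  # assumed value; the source only says "below 1.0"


def overall_progress(file_index, stage_fraction, total_files, finalized=False):
    """Global progress for per-file work, held under 1.0 until finalization is done."""
    if finalized:
        return 1.0
    if total_files <= 0:
        return 0.0
    raw = (file_index + stage_fraction) / total_files
    return min(max(raw, 0.0), FINALIZE_CAP)


halfway_through_third = overall_progress(2, 0.5, 40)   # (2 + 0.5) / 40 = 0.0625
last_file_finished = overall_progress(39, 1.0, 40)     # raw value 1.0, capped to 0.99
index_rebuilt = overall_progress(39, 1.0, 40, finalized=True)
```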

SQLite concurrency: enough, not infinite

Each operation opens its own short-lived SQLite connection with check_same_thread=False, PRAGMA journal_mode=WAL, and a busy_timeout. WAL lets the UI keep reading job status while worker threads write progress. The busy timeout turns transient write contention into waiting instead of “database is locked” failures.
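A connection helper along these lines could look like the sketch below. The `job_db` name and the 5000 ms timeout are assumptions; the `check_same_thread=False` argument and the two PRAGMA statements are the ones the article names. A temporary file is used because WAL mode does not apply to in-memory databases.

```python
import os
import sqlite3
import tempfile
from contextlib import contextmanager


@contextmanager
def job_db(path, busy_timeout_ms=5000):  # timeout value is an assumption
    """Open a short-lived connection, commit on success, always close."""
    conn = sqlite3.connect(path, check_same_thread=False)
    try:
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute(f"PRAGMA busy_timeout={int(busy_timeout_ms)}")
        yield conn
        conn.commit()
    finally:
        conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "jobs.db")

with job_db(db_path) as conn:
    conn.execute("CREATE TABLE jobs (id TEXT PRIMARY KEY, progress REAL)")
    conn.execute("INSERT INTO jobs VALUES ('job-1', 0.5)")
    mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
    timeout = conn.execute("PRAGMA busy_timeout").fetchone()[0]

# A second, independent connection sees the committed row.
with job_db(db_path) as conn:
    stored = conn.execute("SELECT progress FROM jobs WHERE id = 'job-1'").fetchone()[0]
```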

But this is still SQLite. There is one writer at a time. The point of the worker cap is partly to protect the database from the app's own enthusiasm. Backpressure at the worker layer reduces write contention before the storage layer has to complain.

The UI backs off too

The activity dock is the user-visible half of the contract. It owns one client-side job store, one renderer, and one poller. When a job starts, the page polls /api/jobs?limit=20 every 1.5 seconds. Watched jobs that fall outside that newest-20 window are fetched directly by id, so a busy job list cannot starve the completion callback for a job the current page started.

When everything is done, the poller stops after two idle ticks. That is frontend backpressure: no WebSocket server, no always-hot timer, no background page burning cycles to rediscover that nothing is happening. The dock remains visible as an idle pill with recent activity, and the next tracked job kicks the poller back on.
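The idle-stop rule is simple enough to model as a tiny state machine. The real dock presumably runs in the browser; the logic is written in Python here only to keep the examples in one language, and `JobPoller`, `track`, and `tick` are hypothetical names. The two-idle-tick threshold is the one stated above.

```python
class JobPoller:
    """Model of a poller that switches itself off after two idle ticks."""

    IDLE_TICKS_BEFORE_STOP = 2

    def __init__(self, fetch_jobs):
        self._fetch_jobs = fetch_jobs  # callable returning a list of job dicts
        self._idle_ticks = 0
        self.running = False

    def track(self):
        """Called when the page starts a job: (re)start polling."""
        self._idle_ticks = 0
        self.running = True

    def tick(self):
        """One poll cycle; a timer would call this every 1.5 s while running."""
        if not self.running:
            return  # stopped pollers make no requests at all
        jobs = self._fetch_jobs()
        active = any(j["status"] in ("pending", "running") for j in jobs)
        self._idle_ticks = 0 if active else self._idle_ticks + 1
        if self._idle_ticks >= self.IDLE_TICKS_BEFORE_STOP:
            self.running = False


# Simulated server responses: one busy poll, then two idle ones.
responses = iter([
    [{"id": "job-1", "status": "running"}],
    [{"id": "job-1", "status": "done"}],
    [{"id": "job-1", "status": "done"}],
])
poller = JobPoller(lambda: next(responses))
poller.track()

states = []
for _ in range(4):
    poller.tick()
    states.append(poller.running)
```

The fourth tick never touches the network because the poller has already stopped; calling `track()` again would restart it.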

The dock also turns backend progress messages into a useful mental model. For indexing, it parses messages like Embedding transcript for clip.mp4 (3/40), shows the current file, estimates remaining time, and lights up stage chips such as Transcript, Speakers, Visual, Scene text, Faces, and Summaries. The backend still sends plain strings; the UI gives them structure.
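Giving structure to a plain progress string can be done with one regular expression. The pattern and the `parse_progress` helper below are assumptions built around the single example message quoted above; the real dock may parse differently, and it likely does so in browser code rather than Python.

```python
import re

# Assumed message shape: "<stage> for <file> (<index>/<total>)"
PROGRESS_RE = re.compile(
    r"^(?P<stage>.+?) for (?P<file>.+) \((?P<index>\d+)/(?P<total>\d+)\)$"
)


def parse_progress(message):
    """Return stage, file, index, and total from a progress string, or None."""
    match = PROGRESS_RE.match(message)
    if match is None:
        return None
    return {
        "stage": match.group("stage"),
        "file": match.group("file"),
        "index": int(match.group("index")),
        "total": int(match.group("total")),
    }


parsed = parse_progress("Embedding transcript for clip.mp4 (3/40)")
unparsed = parse_progress("Rebuilding search index")
```

Messages that do not fit the pattern return None, so the UI can fall back to showing the raw string.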

Stuck work is failed, not hidden

A watchdog thread periodically looks for jobs still marked running whose updated_at timestamp has gone stale. The timeout is configurable with MEDIAFIND_JOB_STUCK_AFTER_S. When a job crosses that threshold, the queue marks it failed with a WatchdogTimeout reason and runs registered cancel cleanup.
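One sweep of such a watchdog might look like this sketch. The function name, the table layout, and the 600-second default are assumptions; the `MEDIAFIND_JOB_STUCK_AFTER_S` variable, the stale `updated_at` test, and the WatchdogTimeout reason come from the article. Running the registered cancel cleanup is left out for brevity.

```python
import os
import sqlite3
import time

# The default of 600 seconds is an assumption; the source only names the variable.
STUCK_AFTER_S = float(os.environ.get("MEDIAFIND_JOB_STUCK_AFTER_S", "600"))


def fail_stale_jobs(conn, now=None, stuck_after_s=STUCK_AFTER_S):
    """Mark running jobs with a stale heartbeat as failed; return their ids."""
    now = time.time() if now is None else now
    cutoff = now - stuck_after_s
    stale = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM jobs WHERE status = 'running' AND updated_at < ?",
            (cutoff,),
        )
    ]
    with conn:
        for job_id in stale:
            conn.execute(
                "UPDATE jobs SET status = 'failed', error = 'WatchdogTimeout', "
                "updated_at = ? WHERE id = ? AND status = 'running'",
                (now, job_id),
            )
    return stale


now = 1_000_000.0
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id TEXT PRIMARY KEY, status TEXT, error TEXT, updated_at REAL)"
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, NULL, ?)",
    [("fresh", "running", now - 10), ("stale", "running", now - 3600),
     ("old-done", "done", now - 3600)],
)
timed_out = fail_stale_jobs(conn, now=now, stuck_after_s=600)
after = {r[0]: (r[1], r[2]) for r in conn.execute("SELECT id, status, error FROM jobs")}
```

A watchdog thread would call this on a timer; because every `progress()` call refreshes `updated_at`, a healthy job never crosses the cutoff.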

This is deliberately modest. The watchdog does not terminate an arbitrary OS thread stuck inside native code. Python cannot safely do that. What it can do is stop lying to the UI. A stalled job becomes visible as failed instead of sitting forever at 42%, and the user can decide whether to retry or restart.

The tradeoffs, in one table

| Pressure | MediaFind's control | Honest caveat |
| --- | --- | --- |
| Too many heavy jobs | Fixed worker pool, default 2 | Not adaptive to thermals or model memory yet |
| Thread explosion | Pending ids wait in queue.Queue | Dispatch queue is in-memory, not crash-durable |
| Lost progress on reload | Job state stored in SQLite | Crash recovery marks running jobs failed; it does not resume mid-file |
| Unsafe cancellation | Cooperative checkpoint in progress() | C-level work stops only when control returns |
| SQLite lock contention | WAL, busy timeout, short connections, concurrency cap | SQLite still has a single writer |
| Polling overhead | One shared poller, self-throttles when idle | It is polling, not push |
| Stuck jobs | Watchdog marks stale running jobs failed | It does not kill a wedged native thread |

The design principle

Backpressure is not one trick. It is a chain of small refusals: do not run every job immediately, do not spawn a thread per click, do not hold the HTTP request open, do not let a cancelled row be overwritten by a late success, do not poll forever when idle, and do not pretend stuck work is still healthy.
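One of those refusals, keeping a late success from overwriting a cancelled row, can be expressed as a guarded UPDATE. The article does not say how MediaFind enforces it, so the sketch below shows one possible way, with a hypothetical `finish_job` helper and table layout.

```python
import sqlite3


def finish_job(conn, job_id, result):
    """Mark a job done only if it is still running; return True if the write applied."""
    with conn:
        cur = conn.execute(
            "UPDATE jobs SET status = 'done', progress = 1.0, result = ? "
            "WHERE id = ? AND status = 'running'",  # the guard: terminal rows are left alone
            (result, job_id),
        )
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobs (id TEXT PRIMARY KEY, status TEXT, progress REAL, result TEXT)"
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?, ?, NULL)",
    [("live", "running", 0.9), ("gone", "cancelled", 0.4)],
)

applied_live = finish_job(conn, "live", "ok")
applied_gone = finish_job(conn, "gone", "ok")  # handler finished after the cancel
final = dict(conn.execute("SELECT id, status FROM jobs"))
```

Putting the status check inside the WHERE clause makes the test and the write a single statement, so there is no window between reading the status and updating it.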

That is the local AI difference. On a server, overload becomes an SRE problem. On a laptop, overload becomes the user's fan, battery, and frozen UI. Respecting that machine is part of the product.
