Don't Melt the Laptop: backpressure for local AI media pipelines

MediaFind's heaviest paths are not polite little request handlers. Indexing can decode media with ffmpeg, transcribe audio, embed transcript chunks, sample video frames, run OCR, detect faces, classify scenes, and rebuild a vector index. Model downloads can pull multi-gigabyte weights. Rendering and export can keep ffmpeg busy for minutes. If every button click spawned a new thread, with no limit on how many could run at once, the app would not feel powerful. It would feel like a space heater with a UI.

The answer is backpressure: make the system admit how much work it can actually do, queue the rest, and surface the wait as part of the product instead of pretending everything is instant. MediaFind's implementation is intentionally small: no Redis, no Celery, no external broker. The core lives in mediafind/jobs.py, backed by SQLite rows and a fixed worker pool.

The pipeline pressure points

Local media AI creates several kinds of pressure at once:

CPU pressure from decoding, OCR, embeddings, clustering, and Python glue code.
Accelerator pressure when transcription, local LLMs, image models, or vision models compete for the same GPU/Neural Engine budget.
Memory pressure because model weights and decoded media buffers are large, and two “reasonable” jobs can become unreasonable together.
SQLite write pressure from progress updates, transcript rows, embeddings, facets, thumbnails, and job state.
Attention pressure in the UI: if the user cannot tell what is running or cancel it, background work feels broken even when it is technically progressing.

That mix is why MediaFind treats “start a long operation” as an explicit job lifecycle, not a fire-and-forget helper thread.

A brokerless queue with durable job state

Every long operation is submitted as a job: index, download_batch, model_download, ann_rebuild, people_reindex, render jobs, export jobs, and more. Submission writes a row to a local SQLite jobs table with status pending, progress 0.0, a JSON payload, timestamps, and error/result fields that stay empty until the job finishes. The HTTP endpoint returns a job_id immediately, and the UI polls that id instead of keeping the request open.
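A minimal sketch of that submission path. The `jobs` column names (`kind`, `payload`, `message`, `created_at`, and so on) and the `submit_job()` helper are hypothetical, not MediaFind's actual schema or API. The point it shows: submitting only records intent, and nothing heavy runs on the request thread.

```python
import json
import sqlite3
import time
import uuid

# Hypothetical schema: column names are illustrative, not MediaFind's real DDL.
JOBS_SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id         TEXT PRIMARY KEY,
    kind       TEXT NOT NULL,
    status     TEXT NOT NULL DEFAULT 'pending',
    progress   REAL NOT NULL DEFAULT 0.0,
    message    TEXT,
    payload    TEXT NOT NULL,
    result     TEXT,
    error      TEXT,
    created_at REAL NOT NULL,
    updated_at REAL NOT NULL
)
"""


def submit_job(conn: sqlite3.Connection, kind: str, payload: dict) -> str:
    """Record a pending job and return its id; a worker picks it up later."""
    job_id = uuid.uuid4().hex
    now = time.time()
    conn.execute(
        "INSERT INTO jobs (id, kind, status, progress, payload, created_at, updated_at) "
        "VALUES (?, ?, 'pending', 0.0, ?, ?, ?)",
        (job_id, kind, json.dumps(payload), now, now),
    )
    conn.commit()
    # The HTTP handler returns this id at once; the UI polls it afterwards.
    return job_id
```

At startup the app would run `conn.execute(JOBS_SCHEMA)` once; an endpoint then calls `submit_job(conn, "index", {"path": "/path/to/media"})` and hands the returned id to the dispatch queue.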

SQLite is not pretending to be a distributed queue here. It is the durable job ledger: what was submitted, what is running, what finished, what failed, and what message the UI should show. Dispatch itself is an in-memory queue.Queue, which is fine because this is a single-process desktop app. On startup, any job left in running state from a process crash is marked failed rather than magically resumed from an unknown point.
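The startup rule fits in a single `UPDATE`. This sketch assumes a `jobs` table with `status`, `error`, and `updated_at` columns and a hypothetical `fail_orphaned_jobs()` that runs once before any worker starts; the error text is made up.

```python
import sqlite3
import time


def fail_orphaned_jobs(conn: sqlite3.Connection) -> int:
    """Mark jobs left 'running' by a crashed process as failed; return how many.

    Call once at startup, before any worker thread exists, so no live job can
    be caught by this update.
    """
    cur = conn.execute(
        "UPDATE jobs SET status = 'failed', error = ?, updated_at = ? "
        "WHERE status = 'running'",
        ("Interrupted: the app exited while this job was running", time.time()),
    )
    conn.commit()
    return cur.rowcount
```

Failing instead of resuming is the conservative default: a handler that died halfway through a file left no record of which of its side effects actually completed.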

MediaFind uses SQLite as a durable job ledger and a fixed in-process worker pool as the execution boundary. That keeps the desktop app simple while still making long work observable.

The first backpressure valve: a fixed worker pool

The bluntest, most important control is MAX_CONCURRENT_JOBS. It defaults to 2 and can be overridden with MEDIAFIND_MAX_JOBS. That number sets the size of the fixed worker pool, and therefore how many handlers run at once. Extra submissions stay pending in the dispatch queue; they do not become one parked daemon thread per job.
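A minimal sketch of that valve. Only `MEDIAFIND_MAX_JOBS` and its default of 2 come from MediaFind; the `WorkerPool` class, the handler signature, and the kind-to-handler mapping are hypothetical.

```python
import os
import queue
import threading
from typing import Callable, Dict, Tuple

# Env var and default come from the article; everything else is a sketch.
MAX_CONCURRENT_JOBS = int(os.environ.get("MEDIAFIND_MAX_JOBS", "2"))

Handler = Callable[[str, dict], None]  # (job_id, payload) -> None


class WorkerPool:
    """A fixed set of long-lived workers draining one in-memory queue."""

    def __init__(self, handlers: Dict[str, Handler], size: int = MAX_CONCURRENT_JOBS) -> None:
        self._handlers = handlers
        self._queue: "queue.Queue[Tuple[str, str, dict]]" = queue.Queue()
        self._threads = [
            threading.Thread(target=self._worker, name=f"job-worker-{i}", daemon=True)
            for i in range(max(1, size))
        ]
        for thread in self._threads:
            thread.start()

    def enqueue(self, job_id: str, kind: str, payload: dict) -> None:
        # No new thread here: a burst of submissions just waits in the queue.
        self._queue.put((job_id, kind, payload))

    def _worker(self) -> None:
        while True:
            job_id, kind, payload = self._queue.get()
            try:
                self._handlers[kind](job_id, payload)
            except Exception:
                # A real pool would write the error onto the job's SQLite row.
                pass
            finally:
                self._queue.task_done()

    def wait_idle(self) -> None:
        """Block until every queued job has been processed (handy in tests)."""
        self._queue.join()
```

Six quick submissions against a pool of two still produce at most two running handlers; the other four simply wait their turn.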

This is less glamorous than an adaptive scheduler, but it solves the failure mode that matters most in a local app: bursts. A user can kick off indexing, a model download, a cleanup scan, and a render, but the app should not run all of them at once just because the clicks arrived close together.

The deadlock rule: a job handler must not submit another job and block waiting for it. With a fixed pool, nested blocking jobs can occupy every worker while the sub-jobs they are waiting on sit in the queue with no free worker to run them. The queue stays simple because handlers are leaf work.
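One way to make the rule enforceable rather than a convention is a thread-local flag that pool workers set, checked by any helper that blocks on another job. The article only states the rule; `mark_worker_thread()`, `wait_for_job()`, and the use of a `threading.Event` as the completion signal are all hypothetical.

```python
import threading
from typing import Optional

_worker_ctx = threading.local()


def mark_worker_thread() -> None:
    """Call once at the start of each pool worker thread."""
    _worker_ctx.is_worker = True


def wait_for_job(job_id: str, finished: threading.Event, timeout: Optional[float] = None) -> bool:
    """Block until a job signals completion, but never from inside a worker."""
    if getattr(_worker_ctx, "is_worker", False):
        # With N workers, N handlers each waiting on a queued sub-job means no
        # worker is left to run any sub-job: a deadlock.
        raise RuntimeError(
            f"job handlers must not block on job {job_id}; do the work inline instead"
        )
    return finished.wait(timeout)
```

Because the flag lives in `threading.local()`, UI or request threads can still wait on jobs; only worker threads trip the guard.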

Progress is also the cancellation checkpoint

Every handler receives a callback shaped like progress(pct, msg). It writes a bounded percentage and a human-readable status message into the job row. It also checks the queue's cancellation set. If the user cancelled that job, the callback raises InterruptedError; the worker catches it, marks the row cancelled, and runs any registered cleanup.
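A sketch of that contract. Only the `progress(pct, msg)` shape and the use of `InterruptedError` come from the description above; `CancelRegistry`, `make_progress()`, `run_job()`, and the `write_row`/`set_status` callbacks are hypothetical stand-ins for the real SQLite updates.

```python
import threading
from typing import Callable, Optional, Set

Progress = Callable[[float, str], None]


class CancelRegistry:
    """Thread-safe set of job ids the user asked to cancel."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._ids: Set[str] = set()

    def cancel(self, job_id: str) -> None:
        with self._lock:
            self._ids.add(job_id)

    def is_cancelled(self, job_id: str) -> bool:
        with self._lock:
            return job_id in self._ids


def make_progress(job_id: str, registry: CancelRegistry,
                  write_row: Callable[[str, float, str], None]) -> Progress:
    def progress(pct: float, msg: str) -> None:
        write_row(job_id, min(max(pct, 0.0), 1.0), msg)  # clamp to [0, 1]
        if registry.is_cancelled(job_id):                 # the checkpoint
            raise InterruptedError(f"job {job_id} cancelled")
    return progress


def run_job(job_id: str, handler: Callable[[Progress], None], registry: CancelRegistry,
            write_row: Callable[[str, float, str], None],
            set_status: Callable[[str, str], None],
            cleanup: Optional[Callable[[], None]] = None) -> None:
    try:
        handler(make_progress(job_id, registry, write_row))
        set_status(job_id, "done")
    except InterruptedError:
        set_status(job_id, "cancelled")
        if cleanup is not None:
            cleanup()  # e.g. delete partial output files
    except Exception:
        set_status(job_id, "failed")
```

A handler never checks for cancellation explicitly; it just reports progress, and the report is where it gets stopped.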

That design makes cancellation cooperative. It is not a fantasy “kill this thread right now” button. It stops at the next safe checkpoint: between files during batch indexing, between progress lines from ffmpeg, or between download polls that report measurable progress. For model downloads, MediaFind is especially honest: the job is registered as non-cancellable because the underlying model fetch may run inside code that does not expose a safe cancellation hook.
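Here is what "between progress lines from ffmpeg" can look like. This sketch assumes ffmpeg is launched with `-progress pipe:1`, which makes it print `key=value` lines such as `out_time_us=...` to stdout (check which fields your ffmpeg build emits), and a `progress` callback like the one described above. `run_ffmpeg_with_checkpoints()` is hypothetical. While ffmpeg is busy between lines, the Python side cannot react, which is exactly the cooperative limit.

```python
import subprocess
from typing import Callable, List

Progress = Callable[[float, str], None]


def run_ffmpeg_with_checkpoints(argv: List[str], duration_s: float, progress: Progress) -> int:
    """Run a child process and hit the cancel checkpoint once per progress line.

    Assumes argv includes `-progress pipe:1`, so ffmpeg writes key=value lines
    such as `out_time_us=1500000` to stdout.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    with subprocess.Popen(argv, stdout=subprocess.PIPE, text=True) as proc:
        assert proc.stdout is not None
        try:
            for line in proc.stdout:
                key, _, value = line.strip().partition("=")
                if key == "out_time_us" and value.isdigit():  # skips "N/A"
                    done_s = int(value) / 1_000_000
                    # progress() raises InterruptedError if the job was cancelled.
                    progress(min(done_s / duration_s, 1.0), "Rendering")
        except InterruptedError:
            proc.terminate()  # stop the child; Popen.__exit__ then reaps it
            raise
        return proc.wait()
```

The media duration would come from the caller (for example from an earlier probe of the file); the sketch only converts elapsed output time into a fraction.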

The same callback drives better progress math. Indexing enumerates files up front, then each per-file stage reports its fraction within the file. The global percentage is computed as:

(file_index + stage_fraction) / total_files

The percentage is capped below 1.0 until final whole-library work finishes, such as rebuilding the fast search index. That small discipline avoids the classic “100% but still working” lie.
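The same formula as a function. The `cap` of 0.99 is a hypothetical stand-in for whatever ceiling MediaFind actually uses. For example, halfway through the third of 40 files (`file_index = 2`), progress is (2 + 0.5) / 40 = 0.0625.

```python
def overall_progress(file_index: int, stage_fraction: float, total_files: int,
                     cap: float = 0.99) -> float:
    """Library-wide progress for per-file stages, held below 1.0.

    file_index is 0-based; stage_fraction is how far the current file's stage
    has got. The cap leaves room for final whole-library work such as an index
    rebuild; 1.0 is written only after that work finishes.
    """
    if total_files <= 0:
        return 0.0
    stage_fraction = min(max(stage_fraction, 0.0), 1.0)
    return min((file_index + stage_fraction) / total_files, cap)
```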

SQLite concurrency: enough, not infinite

Each operation opens its own short-lived SQLite connection with check_same_thread=False, PRAGMA journal_mode=WAL, and a busy_timeout. WAL lets the UI keep reading job status while worker threads write progress. The busy timeout turns transient write contention into waiting instead of “database is locked” failures.
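A sketch of that per-operation connection as a hypothetical `short_connection()` context manager. The 5000 ms timeout is an assumed value, not MediaFind's setting. `journal_mode=WAL` persists in the database file, so re-issuing it per connection is cheap but redundant after the first time; `busy_timeout` is per connection and must be set each time.

```python
import sqlite3
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def short_connection(path: str, busy_timeout_ms: int = 5000) -> Iterator[sqlite3.Connection]:
    """Open a connection for one operation: WAL, busy timeout, then close."""
    conn = sqlite3.connect(path, check_same_thread=False)
    try:
        conn.execute("PRAGMA journal_mode=WAL")  # readers don't block the writer
        conn.execute(f"PRAGMA busy_timeout={int(busy_timeout_ms)}")  # wait, don't fail
        yield conn
        conn.commit()  # only reached if the body did not raise
    finally:
        conn.close()  # uncommitted work is rolled back on close
```

A progress write then becomes `with short_connection(db_path) as conn: conn.execute("UPDATE jobs SET ...", ...)`, holding the write lock only for that one statement.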

But this is still SQLite. There is one writer at a time. The point of the worker cap is partly to protect the database from the app's own enthusiasm. Backpressure at the worker layer reduces write contention before the storage layer has to complain.

The UI backs off too

The activity dock is the user-visible half of the contract. It owns one client-side job store, one renderer, and one poller. When a job starts, the page polls /api/jobs?limit=20 every 1.5 seconds. Watched jobs that fall outside that newest-20 window are fetched directly by id, so a busy job list cannot starve the completion callback for a job the current page started.

When everything is done, the poller stops after two idle ticks. That is frontend backpressure: no WebSocket server, no always-hot timer, no background page burning cycles to rediscover that nothing is happening. The dock remains visible as an idle pill with recent activity, and the next tracked job kicks the poller back on.
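The real dock runs in the browser; to keep one language across these sketches, here is its polling logic modeled in Python. `DockPoller`, `list_recent`, `get_job`, and the status names are hypothetical. The 1.5 s period, the limit of 20, the by-id fallback for watched jobs, and the stop after two idle ticks come from the description above. A timer (not shown) would call `tick()` every `INTERVAL_S` seconds while `running` is true.

```python
from typing import Callable, Dict, List, Optional

Job = Dict[str, str]  # e.g. {"id": "job-1", "status": "running"}
TERMINAL = {"done", "failed", "cancelled"}


class DockPoller:
    """Python model of the dock's polling rules; a timer calls tick()."""

    INTERVAL_S = 1.5        # poll period from the article
    IDLE_TICKS_TO_STOP = 2  # idle ticks before the poller goes quiet
    RECENT_LIMIT = 20       # mirrors /api/jobs?limit=20

    def __init__(self, list_recent: Callable[[int], List[Job]],
                 get_job: Callable[[str], Optional[Job]]) -> None:
        self._list_recent = list_recent
        self._get_job = get_job
        self._watched: Dict[str, Callable[[Job], None]] = {}
        self._idle_ticks = 0
        self.running = False

    def watch(self, job_id: str, on_finished: Callable[[Job], None]) -> None:
        self._watched[job_id] = on_finished
        self._idle_ticks = 0
        self.running = True  # any newly tracked job wakes the poller

    def tick(self) -> None:
        if not self.running:
            return
        recent = {job["id"]: job for job in self._list_recent(self.RECENT_LIMIT)}
        for job_id in list(self._watched):
            # Watched jobs outside the newest-20 window are fetched by id.
            job = recent.get(job_id) or self._get_job(job_id)
            if job is not None and job["status"] in TERMINAL:
                self._watched.pop(job_id)(job)  # fire the completion callback
        busy = bool(self._watched) or any(
            job["status"] in ("pending", "running") for job in recent.values())
        if busy:
            self._idle_ticks = 0
        else:
            self._idle_ticks += 1
            if self._idle_ticks >= self.IDLE_TICKS_TO_STOP:
                self.running = False
```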

The dock also turns backend progress messages into a useful mental model. For indexing, it parses messages like Embedding transcript for clip.mp4 (3/40), shows the current file, estimates remaining time, and lights up stage chips such as Transcript, Speakers, Visual, Scene text, Faces, and Summaries. The backend still sends plain strings; the UI gives them structure.
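A sketch of that parsing step, again in Python for consistency even though the dock itself is frontend code. The regex follows the one example message above; the keyword-to-chip table, `parse_status()`, and the linear `eta_seconds()` estimate are assumptions, not MediaFind's actual rules.

```python
import re
from typing import NamedTuple, Optional

# Matches messages shaped like "Embedding transcript for clip.mp4 (3/40)".
STATUS_RE = re.compile(r"^(?P<action>.+?) for (?P<file>.+) \((?P<index>\d+)/(?P<total>\d+)\)$")

# Hypothetical keyword-to-chip mapping; checked in insertion order.
STAGE_CHIPS = {
    "transcri": "Transcript",
    "speaker": "Speakers",
    "frame": "Visual",
    "ocr": "Scene text",
    "face": "Faces",
    "summar": "Summaries",
}


class IndexStatus(NamedTuple):
    action: str
    file: str
    index: int
    total: int
    chip: Optional[str]


def parse_status(msg: str) -> Optional[IndexStatus]:
    """Turn a plain backend message into structure, or None if it doesn't match."""
    m = STATUS_RE.match(msg.strip())
    if m is None:
        return None
    action = m.group("action")
    chip = next((name for key, name in STAGE_CHIPS.items() if key in action.lower()), None)
    return IndexStatus(action, m.group("file"), int(m.group("index")), int(m.group("total")), chip)


def eta_seconds(elapsed_s: float, pct: float) -> Optional[float]:
    """Linear extrapolation: remaining = elapsed * (1 - pct) / pct."""
    if not 0.0 < pct < 1.0:
        return None
    return elapsed_s * (1.0 - pct) / pct
```

Messages that do not match, such as a final index rebuild, fall back to being shown as plain text.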

Stuck work is failed, not hidden

A watchdog thread periodically looks for jobs still marked running whose updated_at timestamp has gone stale. The timeout is configurable with MEDIAFIND_JOB_STUCK_AFTER_S. When a job crosses that threshold, the queue marks it failed with a WatchdogTimeout reason and runs registered cancel cleanup.
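A sketch of the watchdog, assuming a `jobs` table with `id`, `status`, `error`, and `updated_at` columns and hypothetical `reap_stuck_jobs()`/`start_watchdog()` helpers. `MEDIAFIND_JOB_STUCK_AFTER_S` and the `WatchdogTimeout` reason come from the article; the 600-second fallback and the 30-second scan interval are placeholders.

```python
import os
import sqlite3
import threading
import time
from typing import Callable, List, Optional

# Env var from the article; the 600 s fallback is an assumed default.
STUCK_AFTER_S = float(os.environ.get("MEDIAFIND_JOB_STUCK_AFTER_S", "600"))


def reap_stuck_jobs(conn: sqlite3.Connection, on_cancel: Callable[[str], None],
                    stuck_after_s: float = STUCK_AFTER_S,
                    now: Optional[float] = None) -> List[str]:
    """Fail running jobs whose updated_at is older than the threshold."""
    now = time.time() if now is None else now
    cutoff = now - stuck_after_s
    stale = conn.execute(
        "SELECT id FROM jobs WHERE status = 'running' AND updated_at < ?", (cutoff,)
    ).fetchall()
    reaped = []
    for (job_id,) in stale:
        # Re-check in the UPDATE so a job that just reported progress survives.
        cur = conn.execute(
            "UPDATE jobs SET status = 'failed', error = ?, updated_at = ? "
            "WHERE id = ? AND status = 'running' AND updated_at < ?",
            (f"WatchdogTimeout: no progress for {stuck_after_s:.0f}s", now, job_id, cutoff),
        )
        if cur.rowcount == 1:
            reaped.append(job_id)
    conn.commit()
    for job_id in reaped:
        on_cancel(job_id)  # registered cleanup; it cannot kill the thread itself
    return reaped


def start_watchdog(open_conn: Callable[[], sqlite3.Connection],
                   on_cancel: Callable[[str], None],
                   interval_s: float = 30.0) -> threading.Event:
    """Run reap_stuck_jobs periodically; set the returned event to stop."""
    stop = threading.Event()

    def loop() -> None:
        while not stop.wait(interval_s):
            conn = open_conn()
            try:
                reap_stuck_jobs(conn, on_cancel)
            finally:
                conn.close()

    threading.Thread(target=loop, name="job-watchdog", daemon=True).start()
    return stop
```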

This is deliberately modest. The watchdog does not terminate an arbitrary OS thread stuck inside native code. Python cannot safely do that. What it can do is stop lying to the UI. A stalled job becomes visible as failed instead of sitting forever at 42%, and the user can decide whether to retry or restart.

The tradeoffs, in one table

| Pressure | MediaFind's control | Honest caveat |
|---|---|---|
| Too many heavy jobs | Fixed worker pool, default `2` | Not adaptive to thermals or model memory yet |
| Thread explosion | Pending ids wait in `queue.Queue` | Dispatch queue is in-memory, not crash-durable |
| Lost progress on reload | Job state stored in SQLite | Crash recovery marks running jobs failed; it does not resume mid-file |
| Unsafe cancellation | Cooperative checkpoint in `progress()` | C-level work stops only when control returns |
| SQLite lock contention | WAL, busy timeout, short connections, concurrency cap | SQLite still has a single writer |
| Polling overhead | One shared poller, self-throttles when idle | It is polling, not push |
| Stuck jobs | Watchdog marks stale running jobs failed | It does not kill a wedged native thread |

The design principle

Backpressure is not one trick. It is a chain of small refusals: do not run every job immediately, do not spawn a thread per click, do not hold the HTTP request open, do not let a cancelled row be overwritten by a late success, do not poll forever when idle, and do not pretend stuck work is still healthy.
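Most of those refusals were covered above; the "late success" one deserves a concrete shape. It is a compare-and-set on the job row: the final write applies only if the row is still in the state the worker expects. This sketch assumes a `jobs` table with `status`, `progress`, `result`, and `updated_at` columns and a hypothetical `finish_job()` helper.

```python
import sqlite3
import time


def finish_job(conn: sqlite3.Connection, job_id: str, result_json: str) -> bool:
    """Record success only if the job is still running.

    If the user cancelled (or the watchdog failed) the job first, the row is
    no longer 'running', the UPDATE matches nothing, and the late success is
    dropped instead of overwriting the state the user already saw.
    """
    cur = conn.execute(
        "UPDATE jobs SET status = 'done', progress = 1.0, result = ?, updated_at = ? "
        "WHERE id = ? AND status = 'running'",
        (result_json, time.time(), job_id),
    )
    conn.commit()
    return cur.rowcount == 1
```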

That is the local AI difference. On a server, overload becomes an SRE problem. On a laptop, overload becomes the user's fan, battery, and frozen UI. Respecting that machine is part of the product.
