Staying fast at 10,000 clips

When your backend is the user's laptop, you can't paper over a slow algorithm with a bigger instance. Every inefficiency is felt directly, on a machine that's also running their browser, their music, and forty other tabs. So MediaFind's performance work is mostly about complexity, not horsepower: making sure the cost of an operation grows slowly as the library grows. A handful of patterns do most of the work.

1 · Search that doesn't read the whole library: ANN

Done exactly, semantic search compares your query vector against every clip's vector. That's a linear scan: fine at 500 segments, sluggish at 50,000. So beyond a size threshold MediaFind switches to an approximate nearest-neighbor (ANN) index built with hnswlib, which implements an HNSW graph. Instead of checking every vector, it walks a small-world graph and visits a small fraction of them, trading a sliver of recall for sublinear query time. In practice the top results are almost always the same; the wall-clock time isn't.
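As a rough illustration of that switch, here is a minimal Python sketch. The names `ANN_THRESHOLD`, `exact_search`, `search` and the `ann_query` callable are hypothetical, and the cutoff value is an assumption; this post doesn't show MediaFind's actual threshold or code. The exact path is a plain cosine-similarity scan, and the ANN path is whatever index lookup (for example one backed by hnswlib) gets passed in.

```python
import heapq
import math
from typing import Callable, List, Optional, Sequence

Vector = Sequence[float]

# Hypothetical cutoff; the real threshold is not stated in the post.
ANN_THRESHOLD = 5_000


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_search(query: Vector, vectors: Sequence[Vector], k: int) -> List[int]:
    """Exact top-k: scores every vector, so cost is linear in len(vectors)."""
    scored = ((cosine(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nlargest(k, scored)]


def search(
    query: Vector,
    vectors: Sequence[Vector],
    k: int,
    ann_query: Optional[Callable[[Vector, int], List[int]]] = None,
) -> List[int]:
    """Use the ANN lookup once the library is large enough, else scan exactly."""
    if ann_query is not None and len(vectors) >= ANN_THRESHOLD:
        return ann_query(query, k)
    return exact_search(query, vectors, k)
```

Keeping the exact scan for small libraries avoids paying index build time and memory where a linear pass is already fast enough.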

2 · Rebuilding the index without ever serving a half-built one

Incremental updates don't keep an ANN index in perfect shape, so after enough churn it's worth rebuilding. The danger is the rebuild window: if search reads the index while it's being repopulated, you get missing or wrong results. The fix is an atomic swap. Build the new index off to the side, fully, then flip a single pointer. Readers see either the entire old index or the entire new one, never a partial state.

The rebuild happens beside the live index, not inside it. A single pointer flip makes the new index visible at once, so a search is never served mid-rebuild.
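A minimal sketch of the build-aside-then-swap pattern in Python, assuming a hypothetical `IndexHolder` class and using a plain dict as a stand-in for the real ANN index. In CPython, rebinding an attribute is a single reference assignment, so a reader picks up either the old object or the new one.

```python
import threading
from typing import Callable, Dict, List

Index = Dict[str, List[int]]  # stand-in for a real ANN index


class IndexHolder:
    """Holds the live index and swaps in a rebuilt one in a single step."""

    def __init__(self, index: Index) -> None:
        self._current = index
        self._rebuild_lock = threading.Lock()  # allow one rebuild at a time

    def search(self, term: str) -> List[int]:
        index = self._current  # take one reference and use it throughout
        return index.get(term, [])

    def rebuild(self, build: Callable[[], Index]) -> None:
        with self._rebuild_lock:
            fresh = build()        # built fully, off to the side
            self._current = fresh  # the single pointer flip
```

The important detail is that `search` reads `self._current` once. A search that started before the swap finishes against the old index, which stays alive until the last reader drops its reference.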

3 · Killing the N+1: one query, not a thousand

The home screen shows status for every item in the library. The naïve version asks the database once per item: a thousand items, a thousand round-trips, and a screen that hitches as the library grows. This is the classic N+1 query problem. The fix is unglamorous and enormously effective: fetch everything in a single batched query and stitch it together in memory. The page load goes from one round-trip per item to one round-trip in total, so it stays close to flat as the library grows.
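To make the difference concrete, here is a small sketch using Python's built-in `sqlite3`. The `items` and `item_status` tables and both function names are hypothetical; the post doesn't describe MediaFind's actual schema or database.

```python
import sqlite3
from typing import Dict, Optional


def make_demo_db() -> sqlite3.Connection:
    """Tiny in-memory library: three items, one of them with no status row."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE items (id INTEGER PRIMARY KEY, path TEXT);
        CREATE TABLE item_status (item_id INTEGER PRIMARY KEY, state TEXT);
        INSERT INTO items VALUES (1, 'a.mp4'), (2, 'b.mp4'), (3, 'c.mp4');
        INSERT INTO item_status VALUES (1, 'indexed'), (2, 'pending');
        """
    )
    return conn


def statuses_n_plus_one(conn: sqlite3.Connection) -> Dict[int, Optional[str]]:
    """Naive version: one query for the ids, then one more query per item."""
    ids = [row[0] for row in conn.execute("SELECT id FROM items")]
    out: Dict[int, Optional[str]] = {}
    for item_id in ids:
        row = conn.execute(
            "SELECT state FROM item_status WHERE item_id = ?", (item_id,)
        ).fetchone()
        out[item_id] = row[0] if row else None
    return out


def statuses_batched(conn: sqlite3.Connection) -> Dict[int, Optional[str]]:
    """Batched version: a single LEFT JOIN, stitched into a dict in memory."""
    rows = conn.execute(
        "SELECT i.id, s.state FROM items AS i "
        "LEFT JOIN item_status AS s ON s.item_id = i.id"
    )
    return {item_id: state for item_id, state in rows}
```

Both functions return the same mapping; only the number of queries differs (N+1 versus 1).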

4 · Cache the expensive sidebar: facets

The category and people facets (the counts next to each filter) are aggregations over the whole library. Recomputing them on every page view is wasteful, because they barely change between indexing runs. So they're cached and invalidated only when the underlying data actually changes. The common case (you're browsing, not indexing) pays only for a cache lookup.
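One simple way to express this is a cache that recomputes lazily and is cleared explicitly whenever indexing writes new data. The sketch below is an assumption about the shape of such a cache, not MediaFind's code; `FacetCache` and its `load_categories` callable are hypothetical names.

```python
from collections import Counter
from typing import Callable, Iterable, Optional


class FacetCache:
    """Caches facet counts until someone reports that the data changed."""

    def __init__(self, load_categories: Callable[[], Iterable[str]]) -> None:
        self._load = load_categories
        self._counts: Optional[Counter] = None
        self.recomputes = 0  # exposed only to make the behavior observable

    def counts(self) -> Counter:
        if self._counts is None:  # cold or invalidated: aggregate once
            self._counts = Counter(self._load())
            self.recomputes += 1
        return self._counts

    def invalidate(self) -> None:
        """Call this after an indexing run changes the library."""
        self._counts = None
```

Browsing calls `counts()` as often as it likes; the aggregation only runs again after `invalidate()`.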

5 · Incremental, not quadratic: assigning new faces

Grouping faces into people is clustering, and clustering is tempting to redo from scratch each time. But re-clustering every face whenever one new video arrives is O(n²) in the total number of faces: it gets quadratically slower as your people library grows, which is exactly backwards. So new faces are assigned to existing clusters incrementally (nearest existing person, or a new one if nothing's close) instead of reshuffling the whole set. Adding a video costs work proportional to the faces in that video, each compared against the known people, not to every pair of faces in your history.
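The assignment rule in parentheses can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: each person is represented by one centroid vector, distance is Euclidean, the `max_distance` cutoff is arbitrary, and centroids are not updated after assignment. The function name `assign_faces` is hypothetical.

```python
import math
from typing import List, Sequence


def assign_faces(
    centroids: List[List[float]],
    new_faces: Sequence[Sequence[float]],
    max_distance: float = 0.6,  # assumed cutoff for "close enough"
) -> List[int]:
    """Assign each new face to the nearest existing person, or create one.

    Cost is len(new_faces) * len(centroids), independent of how many faces
    were clustered in the past. New people are appended to `centroids`.
    """
    labels: List[int] = []
    for face in new_faces:
        best_id, best_dist = None, max_distance
        for person_id, centroid in enumerate(centroids):
            dist = math.dist(face, centroid)
            if dist <= best_dist:
                best_id, best_dist = person_id, dist
        if best_id is None:  # nothing close: this face starts a new person
            centroids.append(list(face))
            best_id = len(centroids) - 1
        labels.append(best_id)
    return labels
```

A face that starts a new person is immediately available as a match for later faces in the same batch, so two shots of the same stranger end up together.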

A scaling bug that only bites when frozen: the ANN library has to be bundled into the packaged app, or search silently falls back to the linear path and chokes past ~10k segments — looking like a mysterious slowdown rather than a missing dependency. It's the same “silently degraded in the frozen app” trap that haunts the ML stack; that whole bug class gets its own post.
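One way to keep that fallback from being silent is to check for the dependency at startup and log loudly when it's missing. The sketch below is an assumption about how such a check could look, not what MediaFind ships; `pick_search_backend` is a hypothetical name, and the check relies only on the standard library's `importlib.util.find_spec`.

```python
import importlib.util
import logging

logger = logging.getLogger("search")


def pick_search_backend(module_name: str = "hnswlib") -> str:
    """Return "ann" if the ANN library is importable, else "linear".

    The linear fallback still works, but it is reported as a warning so a
    missing bundle shows up in logs instead of as an unexplained slowdown.
    """
    if importlib.util.find_spec(module_name) is None:
        logger.warning(
            "%s is not available; falling back to linear search, "
            "which will be slow on large libraries",
            module_name,
        )
        return "linear"
    return "ann"
```

Running a check like this against the packaged build, not just the development environment, is what catches the frozen-app case.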

6 · Get heavy work off the request path

Some operations are just expensive — re-categorizing the whole library, rebuilding an index. Those don't belong in the milliseconds between a click and a repaint, so they run as background jobs with progress, leaving the UI responsive. (The job queue itself is covered in the architecture deep dive.)
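A minimal sketch of a background job with pollable progress, using only Python's standard library. It is not MediaFind's job queue; `JobProgress`, `recategorize` and `start_recategorize` are hypothetical names, and the per-item work is a trivial stand-in.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List, Tuple


class JobProgress:
    """Thread-safe counter the UI can poll while the job runs."""

    def __init__(self, total: int) -> None:
        self.total = total
        self._done = 0
        self._lock = threading.Lock()

    def advance(self) -> None:
        with self._lock:
            self._done += 1

    @property
    def fraction(self) -> float:
        with self._lock:
            return self._done / self.total if self.total else 1.0


def recategorize(items: List[str], progress: JobProgress) -> List[str]:
    results = []
    for item in items:
        results.append(item.upper())  # stand-in for the expensive step
        progress.advance()
    return results


_executor = ThreadPoolExecutor(max_workers=1)  # single background worker


def start_recategorize(items: List[str]) -> Tuple[JobProgress, Future]:
    """Return immediately; the work runs on the background worker."""
    progress = JobProgress(len(items))
    future = _executor.submit(recategorize, items, progress)
    return progress, future
```

The request handler only calls `start_recategorize` and returns; the UI then reads `progress.fraction` on a timer and collects `future.result()` once the job is done.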

The playbook, at a glance

Operation	Naïve cost	What MediaFind does
Semantic search	O(n) scan	ANN graph — sublinear
Index rebuild	Readers see partial state	Build aside, atomic swap
Home status	N+1 queries	One batched query
Facet counts	Recompute every view	Cache + invalidate
Add faces	O(n²) re-cluster	Incremental assignment
Re-categorize	Blocks the UI	Background job

The throughline: respect the one machine

There's no autoscaler coming to save a local app. That constraint is freeing, in a way — it forces honest algorithms. Pick data structures whose cost grows slowly, never serve a half-built result, and push anything heavy into the background. Do that, and a ten-thousand-clip library feels the same as a ten-clip one: instant.

Point it at a big folder and feel it stay fast

Thousands of clips, fully local, still instant to search.

Keep reading

How a folder becomes a searchable index — and stays fresh · Architecture
It works on my Mac: shipping ML in a frozen app without silent failures · Packaging
Search by meaning: embeddings, CLIP and a local vector index · Search
