How a folder becomes a searchable index — and stays fresh
Transcription, embeddings, and faces are the glamorous parts. The unglamorous part — a job queue, a local store, and a size-and-modified-time fingerprint check — is what lets you point MediaFind at a folder of 3,000 files and trust the results an hour later.
The individual models in MediaFind are the easy headline. The hard, boring engineering is everything around them: how thousands of files get processed without melting your laptop, where the results live, and — the part most tools get wrong — how the index stays correct when you add, edit, or delete files. This post is about that machinery.
The shape of the system
At a high level, MediaFind is a pipeline with three layers: an intake that discovers work, a per-file pipeline that does the expensive ML, and a local index that everything queries. A small web UI and the CLI sit on top, talking to the same index.
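Here is a minimal sketch of the three layers as plain Python. It uses a hypothetical extension list and a placeholder where the real ML would go. `intake`, `pipeline`, `Index`, and `build_index` are illustrative names, not MediaFind's API. The sketch only shows the direction of flow: intake yields paths, the pipeline turns each path into a record, and the index stores records for the UI and CLI to query.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Iterator

# Hypothetical extension list; the set MediaFind actually accepts isn't given here.
MEDIA_EXTENSIONS = {".mp4", ".mov", ".mkv", ".mp3", ".m4a", ".wav", ".jpg", ".jpeg", ".png"}

def intake(root: Path) -> Iterator[Path]:
    """Layer 1: discover work by walking the folder tree."""
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in MEDIA_EXTENSIONS:
            yield path

def pipeline(path: Path) -> Dict[str, object]:
    """Layer 2: per-file processing. Placeholder for transcription, embeddings, faces."""
    return {"path": str(path), "size": path.stat().st_size}

@dataclass
class Index:
    """Layer 3: the store that the UI and CLI both query."""
    records: Dict[str, Dict[str, object]] = field(default_factory=dict)

    def put(self, record: Dict[str, object]) -> None:
        self.records[str(record["path"])] = record

def build_index(root: Path) -> Index:
    index = Index()
    for path in intake(root):
        index.put(pipeline(path))
    return index
```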
The job queue: parallelism without the meltdown
Indexing is embarrassingly parallel across files but expensive per file. So intake turns each discovered file into a job, and a pool of workers drains the queue — bounded so we saturate your cores without swapping the machine to its knees. Jobs run asynchronously: the UI stays responsive and shows live progress while a backlog churns in the background.
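The bounded pool can be sketched with the standard library. This illustrates the idea under assumptions and is not MediaFind's scheduler. `run_jobs`, `workers`, and `max_pending` are hypothetical names, and threads stand in for whatever execution model the real app uses. For pure-Python CPU-bound work, processes would parallelize better, though native ML libraries often release the GIL. Two limits do the work:

- `workers` caps concurrency at roughly one per core.
- `max_pending` caps the queue, so intake blocks instead of piling jobs up in memory.

The progress callback is the kind of hook a UI could use to show live progress.

```python
import os
import queue
import threading
from typing import Callable, Dict, Iterable, Optional

def run_jobs(
    jobs: Iterable[str],
    work: Callable[[str], object],
    on_progress: Optional[Callable[[int], None]] = None,
    workers: Optional[int] = None,
    max_pending: int = 64,
) -> Dict[str, object]:
    """Drain jobs through a fixed-size worker pool with a bounded queue."""
    workers = workers or os.cpu_count() or 1   # default: one worker per core
    q: queue.Queue = queue.Queue(maxsize=max_pending)
    results: Dict[str, object] = {}
    lock = threading.Lock()
    done = 0

    def worker() -> None:
        nonlocal done
        while True:
            job = q.get()
            if job is None:                 # sentinel: no more work for this worker
                return
            try:
                result = work(job)
            except Exception as exc:        # record the failure, keep the worker alive
                result = exc
            with lock:
                results[job] = result
                done += 1
                if on_progress is not None:
                    on_progress(done)       # e.g. report "done so far" to the UI

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(workers)]
    for t in threads:
        t.start()
    for job in jobs:
        q.put(job)                          # blocks while the queue is full: backpressure
    for _ in threads:
        q.put(None)                         # one sentinel per worker
    for t in threads:
        t.join()
    return results
```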
The storage layer: metadata, vectors, and frames
The index isn't one thing; it's a few stores that play to their strengths:
- Structured metadata — files, segments, speakers, chapters, categories — lives in SQLite. It's transactional, file-based, and needs no server.
- Vectors — segment, CLIP, and face embeddings — are stored as compact float32 blobs in that same SQLite index. For a typical library a brute-force scan over them is plenty; past roughly ten thousand segments an optional on-disk ANN index (hnswlib) kicks in to keep nearest-neighbor lookups sublinear.
- Keyframes & thumbnails — extracted images — are cached on disk so the UI is instant and re-tagging never re-decodes video.
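Storing vectors as float32 blobs next to the metadata takes very little code. This sketch assumes a hypothetical two-table schema, since the post doesn't publish MediaFind's. `array('f')` packs each embedding at 4 bytes per component. `search` is the brute-force scan: it scores every row by cosine similarity and keeps the top k. Past the threshold mentioned above, an ANN index such as hnswlib would replace that full scan. That swap isn't shown here.

```python
import array
import math
import sqlite3
from typing import List, Sequence, Tuple

def to_blob(vec: Sequence[float]) -> bytes:
    return array.array("f", vec).tobytes()   # float32, 4 bytes per component

def from_blob(blob: bytes) -> array.array:
    a = array.array("f")
    a.frombytes(blob)
    return a

def open_index(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    db.execute("PRAGMA foreign_keys = ON")   # needed for ON DELETE CASCADE
    db.executescript("""
        CREATE TABLE IF NOT EXISTS files (
            id INTEGER PRIMARY KEY, path TEXT UNIQUE NOT NULL,
            size INTEGER NOT NULL, mtime_ns INTEGER NOT NULL);
        CREATE TABLE IF NOT EXISTS segments (
            id INTEGER PRIMARY KEY,
            file_id INTEGER NOT NULL REFERENCES files(id) ON DELETE CASCADE,
            start_s REAL, end_s REAL, text TEXT,
            embedding BLOB NOT NULL);
    """)
    return db

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def search(db: sqlite3.Connection, query: Sequence[float], k: int = 5) -> List[Tuple[float, int, str]]:
    """Brute-force scan: score every stored embedding, keep the top k."""
    rows = db.execute("SELECT id, text, embedding FROM segments").fetchall()
    scored = [(cosine(query, from_blob(blob)), seg_id, text) for seg_id, text, blob in rows]
    scored.sort(reverse=True)
    return scored[:k]
```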
It all lives in one application-data directory you own. There is no cloud database and no embeddings API — a property you can verify, not just take on faith.
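Because everything is local, the on-disk layout comes down to path arithmetic. The `MediaFind` folder name and the `keyframes/` layout in this sketch are hypothetical. The per-platform locations follow common conventions rather than anything the post specifies: Application Support on macOS, `%APPDATA%` on Windows, and `XDG_DATA_HOME` elsewhere. The keyframe cache key folds in size and modified time. An edited file therefore gets fresh frames, while an untouched one keeps hitting the cache, so re-tagging never has to decode the video again.

```python
import hashlib
import os
import sys
from pathlib import Path

APP_NAME = "MediaFind"  # hypothetical directory name

def app_data_dir() -> Path:
    """Per-user application-data directory, following common platform conventions."""
    home = Path.home()
    if sys.platform == "darwin":
        return home / "Library" / "Application Support" / APP_NAME
    if sys.platform.startswith("win"):
        return Path(os.environ.get("APPDATA", home / "AppData" / "Roaming")) / APP_NAME
    return Path(os.environ.get("XDG_DATA_HOME", home / ".local" / "share")) / APP_NAME

def keyframe_path(root: Path, media: str, size: int, mtime_ns: int, t_ms: int) -> Path:
    """Cache location for one extracted frame (hypothetical layout).

    The key includes size and mtime, so editing the source file changes the key
    and old frames are simply never looked up again."""
    key = hashlib.sha256(f"{media}|{size}|{mtime_ns}".encode()).hexdigest()[:16]
    return root / "keyframes" / key[:2] / key / f"{t_ms:010d}.jpg"
```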
Staying fresh: the part most tools skip
A library is never static. You drop in new footage, re-export an edit, rename a folder. A naïve indexer either re-processes everything (wasteful) or trusts file paths (wrong the moment you edit a file in place). MediaFind keys on a fast fingerprint of each file's size and modified time: if both are unchanged, the work is already done; if either has changed, only that file is reprocessed.
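The fingerprint check reduces to comparing two dictionaries. This minimal sketch assumes records keyed by path. `plan_refresh` and `Plan` are hypothetical names. `os.stat` reads only filesystem metadata, so scanning thousands of files never opens any of them. In this simplified version, a renamed folder shows up as deletions plus new files. The post doesn't describe how MediaFind handles renames.

```python
import os
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Fingerprint = Tuple[int, int]  # (size in bytes, modified time in nanoseconds)

def fingerprint(path: str) -> Fingerprint:
    """Metadata only: os.stat never reads the file's contents."""
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)

@dataclass
class Plan:
    new: List[str] = field(default_factory=list)
    changed: List[str] = field(default_factory=list)
    unchanged: List[str] = field(default_factory=list)
    deleted: List[str] = field(default_factory=list)

def plan_refresh(stored: Dict[str, Fingerprint], current: Dict[str, Fingerprint]) -> Plan:
    """Compare fingerprints saved in the index against a fresh scan of the folder."""
    plan = Plan()
    for path in sorted(current):
        if path not in stored:
            plan.new.append(path)          # never indexed: process it
        elif stored[path] != current[path]:
            plan.changed.append(path)      # size or mtime differs: reprocess this file only
        else:
            plan.unchanged.append(path)    # work already done: skip
    plan.deleted = sorted(p for p in stored if p not in current)  # drop from the index
    return plan
```

The trade-off is deliberate. Hashing file contents would also catch an edit that preserves both size and modified time, but it would mean reading every byte of every file on each scan.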
This is also where a stale state comes from: if embeddings were built by an older model, or a file changed while a worker was processing it, the index flags those entries and offers a one-click refresh rather than silently serving outdated results. Honest about what it knows, explicit about what needs redoing.
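Both kinds of staleness can be detected cheaply. This sketch assumes each entry records the model that produced it and the fingerprint it was computed from. `CURRENT_MODEL`, `Entry`, `index_file`, and `needs_refresh` are all hypothetical. Re-statting the file after the expensive step catches edits that land mid-job. The entry also keeps the fingerprint taken before the work started, so a later scan sees the mismatch too.

```python
import os
from dataclasses import dataclass
from typing import Callable, Tuple

CURRENT_MODEL = "embed-v2"  # hypothetical identifier for the current embedding model

@dataclass
class Entry:
    path: str
    fingerprint: Tuple[int, int]  # (size, mtime_ns) the embeddings were computed from
    model: str
    stale: bool = False

def stat_fp(path: str) -> Tuple[int, int]:
    st = os.stat(path)
    return (st.st_size, st.st_mtime_ns)

def index_file(path: str, embed: Callable[[str], object]) -> Entry:
    before = stat_fp(path)
    embed(path)                      # the expensive work; storing its output is omitted
    after = stat_fp(path)
    # If the file changed while the worker was busy, the result describes an older
    # version of it: keep the entry, but flag it instead of trusting it.
    return Entry(path, before, CURRENT_MODEL, stale=(before != after))

def needs_refresh(entry: Entry, current_fp: Tuple[int, int]) -> bool:
    """Stale if flagged mid-job, built by an older model, or the file changed since."""
    return entry.stale or entry.model != CURRENT_MODEL or entry.fingerprint != current_fp
```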
Why build it this way
Every architectural choice here bends toward the same two goals: scale on a laptop and never phone home. A bounded job queue keeps a big library tractable on consumer hardware. A file-based index keeps the whole thing portable and serverless. A cheap size-and-modified-time fingerprint keeps it correct over months of edits. None of it requires — or permits — a backend.
With a fresh, queryable index in place, MediaFind can do more than find things — it can organize them. Next up: zero-shot categories and a knowledge map that turns your library into a browsable graph.
Index a real library and see
Point it at a folder and watch it work — locally, with live progress. Free trial.
Download for macOS