The MediaFind blog

Building private, on-device media search

How we turn a folder of audio and video into a searchable library — using best-in-class open models that run entirely on your Mac. No cloud, no API keys, no telemetry.

01

System

The engineering of a local-first app — how media gets in, how a folder becomes a searchable index, how it stays fast at scale, and how it ships as a Mac app where everything actually works.

🗂️
Architecture

How a folder becomes a searchable index — and stays fresh

The job queue, the local store, and the content-hash check that keep a 3,000-file library correct without a backend.

Read the deep dive →
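As a rough illustration of the content-hash check mentioned in this teaser, here is a minimal sketch. The `files` table, the function names, and the choice of SHA-256 are assumptions made for the example, not MediaFind's actual schema or code. The idea is simply that a file is re-indexed only when its bytes differ from what the local store last recorded.

```python
import hashlib
import sqlite3


def content_hash(path, chunk_size=1 << 20):
    """SHA-256 of a file's bytes, read in chunks so large videos never sit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def open_store(db_path=":memory:"):
    """Open a local SQLite store with a hypothetical path -> hash table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, sha256 TEXT NOT NULL)"
    )
    return conn


def check_and_record(conn, path):
    """Return True if the file is new or changed, and record its current hash."""
    digest = content_hash(path)
    row = conn.execute(
        "SELECT sha256 FROM files WHERE path = ?", (str(path),)
    ).fetchone()
    if row is not None and row[0] == digest:
        return False  # same bytes as last time: nothing to re-index
    conn.execute(
        "INSERT OR REPLACE INTO files (path, sha256) VALUES (?, ?)",
        (str(path), digest),
    )
    conn.commit()
    return True
```

A real pipeline would probably record the hash only after indexing succeeds, so that a crash mid-job leaves the file marked as stale.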
Performance

Staying fast at 10,000 clips: the local-performance playbook

ANN search with atomic index swaps, killing N+1 queries, cached facets, and incremental — not quadratic — face assignment.

Read the deep dive →
🧯
Infrastructure

Don't Melt the Laptop: backpressure for local AI media pipelines

Bounded workers, SQLite-backed job state, cooperative cancellation, stuck-job detection, and a self-throttling activity UI.

Read the deep dive →
🖥️
Platform

Native on every platform: how MediaFind ships for Mac, Windows, and Linux

Three OS-specific bugs, a Tauri native shell, and the release pipeline that delivers real installers on all three platforms.

Read the deep dive →
📦
Packaging

It works on my Mac: shipping ML in a frozen app without silent failures

When lazy imports meet PyInstaller's tree-shaking, features quietly return empty. The bug class — and the CI test that ends it.

Read the deep dive →
📊
Quality engineering

Did this commit make search worse? A per-commit eval harness with a regression gate

IR metrics, bootstrap confidence intervals and a permutation test, latency percentiles — and a CI gate that fails the build on a regression.

Read the deep dive →
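To make the statistics in this teaser concrete, here is a small sketch of a paired regression check using only the standard library. It assumes each commit produces one score per evaluation query (for example, a per-query IR metric). The function names, thresholds, and the rule "fail only when the whole interval is below zero and the permutation test agrees" are illustrative assumptions, not the harness's actual policy.

```python
import random
from statistics import mean


def bootstrap_ci(diffs, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean per-query difference."""
    rng = random.Random(seed)
    means = sorted(
        mean(rng.choices(diffs, k=len(diffs))) for _ in range(n_resamples)
    )
    low = means[int((alpha / 2) * n_resamples)]
    high = means[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


def permutation_p(diffs, n_perm=2000, seed=0):
    """Two-sided paired permutation test: randomly flip the sign of each difference."""
    rng = random.Random(seed)
    observed = abs(mean(diffs))
    extreme = 0
    for _ in range(n_perm):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


def regression_gate(baseline, candidate, alpha=0.05):
    """Compare per-query scores of two commits (same queries, same order)."""
    diffs = [c - b for b, c in zip(baseline, candidate)]
    low, high = bootstrap_ci(diffs, alpha=alpha)
    p_value = permutation_p(diffs)
    # Hypothetical rule: the interval lies entirely below zero and the
    # difference is unlikely under random sign flips.
    regressed = high < 0 and p_value < alpha
    return {"mean_diff": mean(diffs), "ci": (low, high), "p": p_value, "regressed": regressed}
```

In CI, a script along these lines could exit non-zero when `regressed` is true, which is what fails the build.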
🔁
Quality engineering

Continuous batch QA: rolling loops, not nightly runs

Parallel inspector roles, a risk-triggered escalation table, and a tight discovery-to-fix loop that surfaces this commit's bugs in under 30 minutes.

Read the deep dive →
02

Understanding your media

The on-device models that turn raw audio and video into something you can search — speech, people, and objects.

🎙️
Transcription

How MediaFind transcribes your media entirely on-device with Whisper

From ffmpeg decode to word-level timestamps — the speech-to-text pipeline that never sends a byte to the cloud.

Read the deep dive →
⚙️
Transcription

Pick your Whisper: model tiers and a CoreML engine for Apple Silicon

Two dials most apps hide — model size and the engine that runs it — surfaced as a one-time picker, with a Metal + Neural-Engine path.

Read the deep dive →
🎚️
Guide

Which Whisper model should you pick? An ASR model-selection guide

tiny…large-v3 as a plain-language decision: the accuracy, speed, RAM and disk trade-offs, and how to match one to your audio, language and Mac.

Read the guide →
🗣️
People & privacy

Who said it, who's in it — diarization & face recognition, privately

Speaker diarization and an opt-in face library that label your media without anything ever leaving the machine.

Read the deep dive →
🏷️
Recognition

Recognizing logos, actions & famous faces with zero training data

One zero-shot trick — CLIP plus a confidence gate and a bundled face gallery — finds brands, activities, and public figures you never tagged.

Read the deep dive →
🎵
Audio understanding

Finding the music: on-device song detection & same-track clustering

Keyless DSP that labels the musical stretches of your clips and clusters the ones sharing a track — no fingerprint API, nothing uploaded.

Read the deep dive →
03

Search, ask & organize

Finding the exact moment you need — by meaning, by question, or by browsing an auto-organized library.

🔤
Search

OCR search: find text on screen across your entire video library

Lower-thirds, slide titles, product labels, street signs — MediaFind reads every frame at index time so on-screen text is searchable in seconds, on your Mac.

Read →
🔍
Search

Search by meaning: embeddings, CLIP and a local vector index

Why “a rocket blasting off” finds the right clip even when nobody said those words — semantic text, visual, and OCR search combined.

Read the deep dive →
🪪
Search

Find every mention of a name: keyless entity search, then open-vocab NER

Semantic search blurs exact names. A literal entity index nails them — keyless gazetteer first, then an optional NER model and Wikidata linking.

Read the deep dive →
🛰️
Search

One search bar, fourteen channels: how MediaFind finds anything

Said, shown, written, who's in it, what's happening, where, which brand, what sound: a tour of every search channel, which ones run by default, and how they fuse into one ranked list.

Read the deep dive →
🧲
Search

One moment, five channels, one result: deduping with rank fusion

The same frame found by visual, OCR and logo shouldn't be three results. How a fusion key spots duplicates, why merging goes by rank not score, and how every channel's annotations survive.

Read the deep dive →
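A minimal sketch of the idea in this teaser, using reciprocal rank fusion (RRF) as a stand-in: the page does not say which rank-based formula MediaFind uses, so RRF, the constant `k=60`, the one-second bucket, and the hit fields (`file`, `t`, `note`) are all assumptions for illustration. Hits that land in the same file and time bucket share a fusion key, their rank contributions add up, and each channel's annotation is kept.

```python
from collections import defaultdict


def fuse(channels, k=60, bucket_s=1.0):
    """Merge ranked hit lists from several channels into one deduplicated list.

    channels: {channel_name: [hit, ...]}, each list ordered best-first.
    hit: {"file": str, "t": float seconds, "note": str}
    """
    scores = defaultdict(float)
    notes = defaultdict(dict)
    for name, hits in channels.items():
        seen = set()
        for rank, hit in enumerate(hits, start=1):
            # Fusion key: same file and same time bucket means the same moment.
            key = (hit["file"], int(hit["t"] // bucket_s))
            if key in seen:
                continue  # a channel votes at most once per moment
            seen.add(key)
            # Rank, not raw score: channel scores are not on a comparable scale.
            scores[key] += 1.0 / (k + rank)
            notes[key][name] = hit["note"]
    ordered = sorted(scores, key=lambda key: (-scores[key], key))
    return [
        {"file": f, "bucket": b, "score": scores[(f, b)], "channels": notes[(f, b)]}
        for f, b in ordered
    ]
```

With a visual, an OCR, and a logo hit all around second 12 of the same file, this returns a single result carrying three annotations. Fixed buckets can split a moment that straddles a boundary, so a production fusion key would likely need something more forgiving.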
💬
Ask

Ask your library: local RAG over your own media, with citations

Retrieve the few segments that matter, ground a local model on them, and answer with a link to the exact second — no upload.

Read the deep dive →
🧠
Local AI

A local LLM, downloaded once: fluent Ask & summaries that stay keyless

An opt-in on-device model tier rephrases grounded answers into prose — GGUF weights, run with llama.cpp, with a keyless fallback if it's off.

Read the deep dive →
⚖️
Local AI

Which local LLM should you pick? A plain-language guide to the trade-offs

What download size, RAM, speed and quality really cost — and a one-line rule for choosing the right model tier for your Mac.

Read the guide →
🎞️
Workflow

From search to timeline: exporting to Premiere, Resolve & Final Cut

Turn found moments into a frame-accurate FCPXML or EDL sequence — drop-frame timecode, markers, and a portable proxy bundle.

Read the deep dive →
🕸️
Organization

Auto-organizing a messy library: zero-shot categories & a knowledge map

Tagging with no training data, a confidence gate that kills false positives, and a knowledge graph of your library.

Read the deep dive →
🔗
Organization

Two ways to relate files: shared signals vs. semantic relatedness

The knowledge map links files two ways — by what they literally share, and by what their summaries mean. When each wins, and why both stay on-device.

Read the deep dive →
🎬
Editing

From transcript to highlight reel: chapters, key moments & query reels

Grouping segments into skimmable chapters and stitching a one-take reel — cut on word boundaries, never uploaded.

Read the deep dive →
📏
Evaluation

Is the search any good? Measuring quality — and guarding it

Scoring search & Ask on every commit with the standard IR metrics, confidence intervals, a quality-vs-latency dashboard, and a regression gate that fails the build.

Read the deep dive →
04

Privacy & security

The promises behind “on-device,” made concrete — what stays home, what you can verify, and how the one outbound path is hardened.

🔒
Privacy

Private by default — and a command that proves it

No accounts, no API keys, no telemetry — plus an audit that confirms the core path opens zero external sockets. A check, not a claim.

Read the deep dive →
🗄️
Data control

Your library lives on your disk — and erases on your command

A tour of the local data folder — SQLite index, thumbnails, prefs — and the delete paths that let you wipe the sensitive parts yourself.

Read the deep dive →
🛡️
Security

Pulling video off the web, safely: SSRF guards & a loopback-only server

The one path that touches the internet, hardened: a connect-time IP guard, hostname-verified TLS, and a server that listens only on loopback.

Read the deep dive →
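For readers curious what a connect-time IP guard can look like, here is a hedged sketch built on the standard library. The function names and the exact set of blocked ranges are assumptions for the example, not MediaFind's implementation. The key point is that the check runs on the resolved address the socket will actually connect to, rather than on the hostname string.

```python
import ipaddress
import socket


def is_public_ip(addr):
    """True only for addresses outside private, loopback, link-local and similar ranges."""
    ip = ipaddress.ip_address(addr.split("%")[0])  # drop any IPv6 scope id
    mapped = getattr(ip, "ipv4_mapped", None)
    if mapped is not None:
        ip = mapped  # treat ::ffff:a.b.c.d like the IPv4 address it wraps
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def guarded_connect(host, port, timeout=10.0):
    """Resolve host, refuse non-public addresses, and connect to the checked IP."""
    last_error = None
    for family, socktype, proto, _name, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        if not is_public_ip(sockaddr[0]):
            raise PermissionError(f"refusing non-public address {sockaddr[0]}")
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)
        try:
            sock.connect(sockaddr)  # same address that was just checked
            return sock
        except OSError as exc:
            sock.close()
            last_error = exc
    raise last_error or OSError(f"no usable address for {host}")
```

Hostname-verified TLS could then be layered on top by wrapping the returned socket with `ssl.create_default_context().wrap_socket(sock, server_hostname=host)`.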
05

Workflow

How real editors and creators use MediaFind — cutting the time between shoot and cut.

⬇️
Import

Bringing the web into your library: the on-device video downloader

Paste a link, get a searchable clip — yt-dlp for known sites, an HTML-scrape fallback for the rest, then the same private pipeline.

Read the deep dive →
🎬
Editing

5 ways editors waste hours finding B-roll (and how to stop)

Scrubbing by eye, vague filenames, no face or voice search — the five habits that quietly eat your day, and a faster approach for each one.

Read →