A local LLM, downloaded once: fluent Ask & summaries that stay keyless
MediaFind answers questions about your library with no API key: a small bundled model works only from the segments that matter. But sometimes you want more fluent prose. So there are opt-in larger on-device tiers: pick one, download it once, and get more fluent answers that are still grounded in your own media and still never leave your Mac.
MediaFind's Ask feature is keyless by default, and that's a real design stance, not a limitation we're apologizing for. It retrieves the handful of segments that actually answer your question, sends only that local context to the bundled Mini model, and shows you the exact source moments behind the answer.
That's the right default. Mini is small enough to ship in the app, and it is also intentionally modest. When you want smoother meeting summaries or better synthesis across several clips, MediaFind offers larger local tiers that run after retrieval and stay grounded in the same cited segments.
One choice, not a model zoo
The hardest thing about local LLMs isn't running them — it's choosing one. There are thousands of checkpoints, a dozen quantizations each, and quality-versus-RAM trade-offs that mean nothing to most people. MediaFind collapses all of that into a single decision: pick a tier.
Each tier is a coarse, honest trade-off (download size and RAM on one side, answer fluency on the other), labelled in plain language so a first-run picker can show a one-line description instead of a spec sheet.
Open weights, downloaded once, then offline
The posture here matches the rest of MediaFind: open weights running entirely on your machine. Mini is bundled in the app; larger tiers are fetched once into a local cache and then run with no further network access. The models are quantized GGUF files and they run through llama.cpp — the same battle-tested, CPU-and-Metal engine the local-LLM community standardized on. The download itself is resumable and integrity-checked, so a dropped connection doesn't corrupt anything.
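As a rough illustration of what "resumable and integrity-checked" can mean in practice, here is a minimal Python sketch. It is not MediaFind's downloader: the function names, the `.part` suffix, and the choice of SHA-256 are assumptions made for the example. The idea is to ask the server only for the bytes still missing, then move the file into the cache only when its checksum matches the expected value.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so a multi-gigabyte model never sits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def resume_headers(partial_path):
    """Build an HTTP Range header that requests only the bytes still missing."""
    have = os.path.getsize(partial_path) if os.path.exists(partial_path) else 0
    return {"Range": f"bytes={have}-"} if have else {}


def finalize(partial_path, final_path, expected_sha256):
    """Promote the partial file into the cache only if its checksum matches."""
    if sha256_of(partial_path) != expected_sha256:
        os.remove(partial_path)  # discard bad data; the next attempt starts clean
        return False
    os.replace(partial_path, final_path)  # atomic rename on the same filesystem
    return True
```

Because the file only takes its final name after the checksum passes, a dropped connection leaves behind a partial file to resume from and never a half-written model that looks complete.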
Crucially, the model is a rephraser sitting after retrieval, not an oracle answering from its own memory. The pipeline still does the keyless work first — find the segments that answer the question — and only then hands the model that grounded material to phrase nicely. That ordering is what keeps the fluent answer tethered to what's actually in your media instead of to whatever the model happened to absorb in pretraining.
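The ordering described above can be sketched in a few lines of Python. This is a toy under stated assumptions, not MediaFind's code: `Segment`, `retrieve`, `build_prompt`, and `ask` are hypothetical names, the word-overlap ranking stands in for whatever retrieval the app really uses, and `model` is any callable that turns a prompt into text.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    file: str       # media file the segment comes from
    start_s: float  # offset of the segment in seconds
    text: str       # transcript text of the segment


def retrieve(question, segments, k=3):
    """Toy retrieval: rank segments by how many question words they share."""
    words = set(question.lower().split())
    scored = [(len(words & set(s.text.lower().split())), s) for s in segments]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in scored[:k]]


def build_prompt(question, hits):
    """Give the model only the retrieved segments, numbered for citation."""
    sources = [
        f"[{i + 1}] {s.file} @ {s.start_s:.0f}s: {s.text}"
        for i, s in enumerate(hits)
    ]
    return (
        "Answer using only the numbered sources below and cite them.\n"
        + "\n".join(sources)
        + f"\nQuestion: {question}"
    )


def ask(question, segments, model):
    """Retrieve first, then let the model phrase what was found."""
    hits = retrieve(question, segments)
    if not hits:
        return "No matching segments found.", []
    return model(build_prompt(question, hits)), hits
```

The model never sees the question without the retrieved segments, and `ask` returns those segments alongside the answer so the interface can link each claim back to its source moment.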
It flips Ask and summaries together
Selecting a tier does more than unlock a toggle. It re-points the backends that both Ask and the per-file summaries use, so the same local model quietly powers both surfaces. Ask shows the selected tier, and the summaries view shows a badge telling you which engine wrote what. (A legacy Ollama path remains as a fallback for people who already run a local server, but the GGUF tier is the primary, zero-setup route.)
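One way to picture "one selection, two surfaces" is a shared router that both features consult. The Python sketch below is purely illustrative: `EngineRouter`, the `"larger"` tier name, and the stand-in engines are assumptions, not MediaFind's actual types or tier labels.

```python
class EngineRouter:
    """Holds the selected tier; every feature resolves its engine through it."""

    def __init__(self, engines, default):
        self._engines = engines  # tier name -> callable(prompt) -> text
        self.selected = default

    def select(self, tier):
        if tier not in self._engines:
            raise KeyError(f"unknown tier: {tier}")
        self.selected = tier

    def engine(self):
        return self._engines[self.selected]


def ask(router, prompt):
    """Ask uses whichever engine is currently selected."""
    return router.engine()(prompt)


def summarize(router, transcript):
    """Summaries use the same engine and record which tier wrote them."""
    text = router.engine()("Summarize: " + transcript)
    return {"summary": text, "badge": router.selected}


# Stand-in engines for the example; real ones would wrap a local model.
router = EngineRouter(
    {"mini": lambda p: "mini:" + p, "larger": lambda p: "larger:" + p},
    default="mini",
)
```

Calling `router.select("larger")` once changes what both `ask` and `summarize` use, and the `badge` field is how a summaries view could show which engine wrote what.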
Why keep it optional
Plenty of apps would hide the model details and call it "AI." MediaFind keeps the ladder visible for two reasons. First, honesty: generated prose is grounded only when you can inspect the source moments, so citations stay attached. Second, respect for your machine: Mini is small enough to bundle, while larger models mean real disk, RAM and speed trade-offs. Upgrade when you want more synthesis; stay on Mini when you want the smallest keyless footprint.
A local model makes answers read better; the citations underneath them are what let you verify those answers. Next: how Ask retrieves the right segments and grounds every claim in a link to the exact second.
Ask your library — your way
Keyless on-device answers out of the box, larger local models when you want more synthesis. Everything runs on your Mac. Free trial.