A local LLM for Ask and summaries

MediaFind's Ask feature is keyless by default, and that's a real design stance, not a limitation we're apologizing for. It retrieves the handful of segments that actually answer your question, sends only that local context to the bundled Mini model, and shows you the exact source moments behind the answer.

That's the right default. Mini is small enough to ship in the app, and it is also intentionally modest. When you want smoother meeting summaries or better synthesis across several clips, MediaFind offers larger local tiers that run after retrieval and stay grounded in the same cited segments.

One choice, not a model zoo

The hardest thing about local LLMs isn't running them — it's choosing one. There are thousands of checkpoints, a dozen quantizations each, and quality-versus-RAM trade-offs that mean nothing to most people. MediaFind collapses all of that into a single decision: pick a tier.

Each tier is a coarse, honest trade-off (download size and RAM on one side, answer fluency on the other), labelled in plain language so a first-run picker can show a one-line description instead of a spec sheet.

The picker offers a ladder, not a catalogue. Mini is bundled and keyless; Small is the one-tap quality upgrade. Higher tiers trade more disk and RAM for better synthesis. The coarseness is the point: one honest line per choice beats scrolling a model hub.
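As a rough illustration of what "a ladder, not a catalogue" could look like in data, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Tier` fields, the sizes and RAM figures, the one-line descriptions, and the "Large" tier name are hypothetical, not MediaFind's actual catalogue. Only Mini (bundled) and Small (the first upgrade) come from the text above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str           # label shown in the picker
    bundled: bool       # True if the weights ship inside the app
    download_gb: float  # hypothetical download size
    min_ram_gb: float   # hypothetical RAM needed to run comfortably
    blurb: str          # the one-line description the picker shows


# Ordered from smallest to largest. All numbers are illustrative placeholders.
LADDER = [
    Tier("Mini", True, 0.0, 2.0, "Bundled and keyless; works immediately."),
    Tier("Small", False, 1.0, 4.0, "One-tap upgrade for smoother answers."),
    Tier("Large", False, 5.0, 12.0, "Best synthesis; needs more disk and RAM."),
]


def picker_lines() -> list[str]:
    """One honest line per choice, in ladder order."""
    return [f"{t.name}: {t.blurb}" for t in LADDER]


def best_tier_for(ram_gb: float, free_disk_gb: float) -> Tier:
    """Highest rung that fits the machine; fall back to the bundled tier."""
    fitting = [
        t for t in LADDER
        if t.min_ram_gb <= ram_gb and t.download_gb <= free_disk_gb
    ]
    # LADDER is sorted by size, so the last fitting entry is the largest.
    return fitting[-1] if fitting else LADDER[0]
```

A first-run picker could print `picker_lines()` and preselect `best_tier_for(...)` for the current machine, while leaving the final choice to the user.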

Open weights, downloaded once, then offline

The posture here matches the rest of MediaFind: open weights running entirely on your machine. Mini is bundled in the app; larger tiers are fetched once into a local cache and then run with no further network access. The models are quantized GGUF files and they run through llama.cpp — the same battle-tested, CPU-and-Metal engine the local-LLM community standardized on. The download itself is resumable and integrity-checked, so a dropped connection doesn't corrupt anything.
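One way a resumable, integrity-checked download can work is sketched below. This is an assumption-laden illustration, not MediaFind's downloader: `fetch_model`, the `.part` suffix, and the `fetch_from(offset)` callback are hypothetical names. In a real client, `fetch_from` would typically issue an HTTP request with a `Range: bytes=<offset>-` header; here it is injected so the logic stays network-free.

```python
import hashlib
import os
from typing import Callable, Iterable


def sha256_of(path: str) -> str:
    """Hash a file in 1 MiB chunks so large model files never sit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def fetch_model(
    dest: str,
    fetch_from: Callable[[int], Iterable[bytes]],  # hypothetical: yields bytes from offset
    expected_sha256: str,
) -> str:
    if os.path.exists(dest):
        return dest  # already cached: no network access needed

    part = dest + ".part"
    # Resume from however many bytes a previous attempt already wrote.
    offset = os.path.getsize(part) if os.path.exists(part) else 0
    with open(part, "ab") as out:
        for chunk in fetch_from(offset):
            out.write(chunk)

    # Verify the complete file before it becomes visible under its final name.
    if sha256_of(part) != expected_sha256:
        os.remove(part)
        raise ValueError("checksum mismatch; partial file discarded")
    os.replace(part, dest)  # atomic rename on the same filesystem
    return dest
```

If the connection drops mid-transfer, the exception propagates and the `.part` file keeps what was already received, so the next call picks up from that offset. The final path only ever holds a file whose hash matched.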

Crucially, the model is a rephraser sitting after retrieval, not an oracle answering from its own memory. The pipeline still does the keyless retrieval work first, finding the segments that answer the question, and only then hands that grounded material to the model to phrase fluently. That ordering is what keeps the fluent answer tethered to what's actually in your media instead of to whatever the model happened to absorb in pretraining.
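The retrieve-then-rephrase ordering can be sketched as a prompt builder that refuses to run without retrieved segments. This is a minimal illustration under stated assumptions: the `Segment` fields, the prompt wording, and the `[n]` citation format are hypothetical, not MediaFind's actual prompt.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    file: str       # media file the segment came from
    start_s: float  # offset of the source moment, in seconds
    text: str       # transcript text of the segment


def build_prompt(question: str, segments: list[Segment]) -> str:
    """Build a prompt containing only retrieved, numbered sources."""
    if not segments:
        # No retrieval hits means nothing to ground an answer in.
        raise ValueError("no retrieved segments; refusing to prompt the model")
    sources = "\n".join(
        f"[{i}] ({s.file} @ {s.start_s:.0f}s) {s.text}"
        for i, s in enumerate(segments, 1)
    )
    return (
        "Answer using ONLY the numbered sources below. "
        "Cite each claim as [n]. "
        "If the sources do not answer the question, say so.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

Because each source keeps its file name and timestamp, the `[n]` markers in the model's answer can be mapped back to the exact source moments shown to the user.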

Everything stays keyless. Mini ships inside the app, so packaged builds can answer and summarize immediately without a download or account. If a dev build has no model available, the API returns a neutral model-required response instead of inventing an answer.
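The "neutral model-required response" might look something like the following sketch. The field names (`status`, `answer`, `message`), the message text, and the `ask` signature are assumptions made for illustration; the source does not specify the API's actual shape.

```python
from typing import Callable, Optional

# Hypothetical response shape for a build with no model available.
MODEL_REQUIRED = {
    "status": "model_required",
    "answer": None,
    "message": "No local model is available. Select a tier to enable answers.",
}


def ask(
    question: str,
    context: str,
    generate: Optional[Callable[[str], str]],  # None when no model is loaded
) -> dict:
    if generate is None:
        # Say plainly that a model is needed rather than inventing an answer.
        return dict(MODEL_REQUIRED)
    prompt = f"{context}\n\nQuestion: {question}\nAnswer:"
    return {"status": "ok", "answer": generate(prompt), "message": ""}
```

The point of the design is that the caller always gets a well-formed response and can tell "no model" apart from "no answer" without parsing prose.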

It flips Ask and summaries together

Selecting a tier does more than unlock a toggle. It re-points the backends that both Ask and the per-file summaries use, so the same local model quietly powers both surfaces. Ask shows the selected tier, and the summaries view shows a badge telling you which engine wrote what. (A legacy Ollama path remains as a fallback for people who already run a local server, but the GGUF tier is the primary, zero-setup route.)
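To make "re-points the backends" concrete, here is a small sketch in which Ask and summaries read the same active engine, so a single selection changes both. `Engine`, `ModelRouter`, and the returned dictionary keys are hypothetical names for the example, and the legacy Ollama fallback is left out for brevity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Engine:
    label: str                      # shown as the badge, e.g. "Mini"
    generate: Callable[[str], str]  # prompt in, generated text out


class ModelRouter:
    """Single source of truth for which local model is active."""

    def __init__(self, default: Engine) -> None:
        self.active = default

    def select(self, engine: Engine) -> None:
        # One assignment re-points every surface that reads self.active.
        self.active = engine

    def ask(self, prompt: str) -> dict:
        return {"engine": self.active.label, "text": self.active.generate(prompt)}

    def summarize(self, transcript: str) -> dict:
        prompt = "Summarize:\n" + transcript
        return {"engine": self.active.label, "text": self.active.generate(prompt)}
```

Returning the engine label alongside the text is what would let a summaries view show which engine wrote each summary.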

Why keep it optional

Plenty of apps would hide the model details and call it "AI." MediaFind keeps the ladder visible for two reasons. First, honesty: generated prose is grounded only when you can inspect the source moments, so citations stay attached. Second, respect for your machine: Mini is small enough to bundle, while larger models mean real disk, RAM and speed trade-offs. Upgrade when you want more synthesis; stay on Mini when you want the smallest keyless footprint.

A local model makes answers read better; the citations underneath them are what make answers true. Next: how Ask retrieves the right segments and grounds every claim in a link to the exact second.

Ask your library — your way

Keyless on-device answers out of the box, larger local models when you want more synthesis. Everything runs on your Mac. Free trial.

Download for macOS

