Which local LLM should you pick? A plain-language guide to the trade-offs

MediaFind lets you run an on-device model to make Ask and summaries read more fluently — but it asks you to pick a size first. Bigger looks better on paper and isn't always the right call. Here's what each tier actually costs, what it buys, and a one-line rule for choosing.

When you turn on MediaFind's optional on-device LLM, you're asked to choose a tier — from Mini up to Ultra. Most "which model?" advice online is written for people building AI systems, full of benchmark tables and quantization jargon. This isn't that. The only question that matters here is practical: what's the largest model that will run comfortably on your Mac — and do you even need it?

The honest answer is that the recommended middle option is right for most people, and the biggest model is right for fewer people than you'd think. Let's see why.

The four dials behind one choice

Every tier is a single point on four sliders that all move together: download size, RAM, speed and answer quality. Understand these and the whole decision collapses into common sense.

Quality has a ceiling here — and that's by design. In MediaFind the model is a rephraser sitting after retrieval, not an oracle answering from memory. The keyless pipeline finds the segments that answer your question first; the LLM only makes them read nicely. So a bigger model gives you smoother prose and better cross-clip synthesis — but it can't invent facts your media doesn't contain, and it won't make a small library "smarter." The grounding is what keeps answers true; the model size only changes the polish.
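If it helps to see that ordering as code, here is a minimal sketch of "retrieve first, rephrase second". It is an illustration only, not MediaFind's implementation: the Segment type, the keyword-overlap retrieval and the rephrase callable are all hypothetical stand-ins. The point it shows is that the model only ever sees segments that retrieval already found, and is never called when nothing matches.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Segment:
    """Hypothetical stand-in for one indexed piece of a media file."""
    file: str
    start_seconds: int
    text: str


def words(text: str) -> set[str]:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, library: list[Segment], top_k: int = 3) -> list[Segment]:
    """Toy retrieval (an assumption): rank segments by words shared with the question."""
    q = words(question)
    scored = [(len(q & words(s.text)), s) for s in library]
    matches = [pair for pair in scored if pair[0] > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [s for _, s in matches[:top_k]]


def build_prompt(question: str, segments: list[Segment]) -> str:
    """Give the model only the retrieved segments, each tagged with file and second."""
    sources = "\n".join(f"[{s.file} @ {s.start_seconds}s] {s.text}" for s in segments)
    return (
        "Answer using only the sources below.\n"
        f"Sources:\n{sources}\n"
        f"Question: {question}"
    )


def answer(question: str, library: list[Segment], rephrase: Callable[[str], str]) -> str:
    """Retrieval decides what is true; the model (rephrase) only polishes it."""
    segments = retrieve(question, library)
    if not segments:
        # Nothing was found, so the model is never asked to fill the gap.
        return "Nothing in your library matches that question."
    return rephrase(build_prompt(question, segments))
```

Swapping a bigger model in for rephrase changes how the answer reads, not which segments it is built from.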

The ladder, with real numbers

MediaFind's tiers all come from the same well-regarded open model family (Qwen2.5-Instruct), just at different parameter counts, each quantized to a compact Q4_K_M GGUF file and run with llama.cpp. Here's the actual trade-off:

[Figure: the tier ladder, from Mini (the bundled, keyless default) up to Ultra. Each step up costs more download and RAM and buys more fluent answers and better synthesis. Mini is bundled. Larger weights download once, then run fully offline. Switch anytime.]
A ladder, not a catalogue. Mini is bundled and keyless; Small is the recommended upgrade. Higher tiers buy fluency and synthesis at a real cost in disk, RAM and speed.
Tier | Size / download | RAM to keep free | Best for
Mini | 0.5B · ~0.4 GB (bundled) | ~1 GB | Old or low-RAM Macs. Works out of the box; fine for short, single-clip answers.
Small | 1.5B · ~1 GB | ~2 GB | Most people. Fluent, faithful answers; fast on nearly any machine. Start here.
Medium | 3B · ~2 GB | ~3 GB | 8 GB+ Macs that ask questions spanning several clips and want better synthesis.
Large | 7B · ~4.7 GB | ~6 GB | 16 GB Apple Silicon. Nuanced answers; noticeably slow on a plain CPU.
Ultra | 14B · ~9 GB | 11 GB+ | 16–32 GB Macs with Metal. The most demanding questions, at the highest cost.
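For readers who prefer to see the table as data, here is a small sketch that answers "what's the largest tier my free RAM can comfortably hold?". The figures are copied from the table above; the Tier type and the largest_comfortable_tier function are hypothetical names for illustration, not part of MediaFind, and Ultra's "11 GB+" is treated as a minimum of 11 GB.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    name: str
    params: str
    download_gb: float
    ram_to_keep_free_gb: float


# Values taken from the tier table; ordered smallest to largest.
TIERS = [
    Tier("mini", "0.5B", 0.4, 1),
    Tier("small", "1.5B", 1.0, 2),
    Tier("medium", "3B", 2.0, 3),
    Tier("large", "7B", 4.7, 6),
    Tier("ultra", "14B", 9.0, 11),  # table says "11 GB+"; 11 is used as the floor
]


def largest_comfortable_tier(free_ram_gb: float) -> Tier:
    """Return the biggest tier whose RAM hint fits in the RAM you can spare.

    Falls back to Mini, the bundled tier, when nothing else fits.
    """
    fitting = [t for t in TIERS if t.ram_to_keep_free_gb <= free_ram_gb]
    return fitting[-1] if fitting else TIERS[0]


if __name__ == "__main__":
    for free in (0.5, 2.5, 8, 16):
        print(f"{free} GB free -> {largest_comfortable_tier(free).name}")
```

This only gives an upper bound. The largest tier that fits is not automatically the one you need.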

A rule that fits on one line

If you don't want to think about it: start on Small. It's the recommended tier because it's the point where answers become genuinely fluent without asking much of your machine. If answers feel thin when you're asking questions that span many clips, step up one tier. If your Mac feels sluggish while answering, step down one. You can change your mind at any time — switching tiers just downloads (or reuses) a different file.
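Written out as a tiny sketch, the rule looks like this. The names are hypothetical, and one detail is an assumption the text doesn't settle: when answers feel thin and the Mac feels sluggish at the same time, this version steps down, on the grounds that a model your machine struggles with won't get better by growing.

```python
# Tier names in ladder order, smallest first.
TIER_ORDER = ["mini", "small", "medium", "large", "ultra"]

RECOMMENDED_START = "small"


def next_tier(current: str, answers_feel_thin: bool, mac_feels_sluggish: bool) -> str:
    """Apply the one-line rule: step down if sluggish, step up if answers feel thin.

    Assumption: sluggishness wins when both are true. The result never
    moves past either end of the ladder.
    """
    i = TIER_ORDER.index(current)
    if mac_feels_sluggish:
        i = max(i - 1, 0)
    elif answers_feel_thin:
        i = min(i + 1, len(TIER_ORDER) - 1)
    return TIER_ORDER[i]


if __name__ == "__main__":
    tier = RECOMMENDED_START
    tier = next_tier(tier, answers_feel_thin=True, mac_feels_sluggish=False)
    print(tier)
```

Each adjustment moves one tier at a time, which matches the "step up one, step down one" advice rather than jumping straight to the largest model.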

Two longer rules of thumb behind that:

What you're not trading away

Whichever tier you choose, three things never change: it runs entirely on your Mac, it needs no API key or account, and downloaded tiers fetch once and then work offline. Mini remains the bundled fallback when you want the smallest footprint; larger tiers only change answer quality and speed.

Where to choose: open Settings → Ask & summaries model in MediaFind. Each tier shows its size, RAM hint and a one-line description right next to the button, and marks which ones you've already downloaded — so you're picking against real numbers, not guessing. Selecting a tier re-points both Ask and per-file summaries to it.

Picking the model is half the story; the other half is what it's grounded on. Next: how Ask retrieves the right few segments and ties every claim to a link at the exact second — the part that makes a fluent answer trustworthy.

Try it — pick a tier

Keyless on-device answers out of the box, larger local models when you want more synthesis. Everything runs on your Mac. Free trial.
