Find every name with keyless entity search

MediaFind's flagship trick is semantic search: type a meaning, get the moments that match it, even if they never use your exact words. That fuzziness is the whole point — “the part about budget cuts” should find a clip that says “we're trimming spend,” no keyword overlap required.

But the same fuzziness that makes meaning-search great makes name-search bad. Embeddings place words near other words that mean similar things, and proper nouns don't have meanings; they have referents. “Acme Corp,” “Acme Inc,” and “a generic widget company” all land near each other in vector space, so a cosine-similarity query for one happily returns the others. When you want every clip that mentions exactly Acme Corp, and nothing else, that blur is the enemy.
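To make the blur concrete, here is a small sketch comparing a cosine-similarity lookup with a literal lookup. The 3-dimensional vectors are invented toy values, not output from any real embedding model; they are placed so the three company names sit close together, which is the situation described above. The 0.95 threshold is likewise an arbitrary choice for the illustration.

```python
import math
from typing import Dict, List

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Invented 3-d vectors for illustration only; real embeddings have hundreds of dimensions.
TOY_VECTORS: Dict[str, List[float]] = {
    "Acme Corp": [0.90, 0.40, 0.10],
    "Acme Inc": [0.88, 0.43, 0.12],
    "a generic widget company": [0.80, 0.50, 0.20],
    "budget cuts": [0.10, 0.20, 0.95],
}

def semantic_hits(query: str, threshold: float = 0.95) -> List[str]:
    """Everything whose vector is close enough to the query's vector."""
    q = TOY_VECTORS[query]
    return [name for name, v in TOY_VECTORS.items() if cosine(q, v) >= threshold]

def literal_hits(query: str) -> List[str]:
    """Only the exact name, nothing merely similar."""
    return [name for name in TOY_VECTORS if name == query]
```

With these toy values, `semantic_hits("Acme Corp")` returns all three company strings (each cosine is about 0.98 or higher), while `literal_hits("Acme Corp")` returns only `"Acme Corp"`.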

It comes down to the right tool for the job. Exact-name recall is precisely what embedding search is weakest at and what a literal index is best at, so entities don't ride the semantic channel; they get their own.

A separate channel in the registry

MediaFind's search is a channel registry: transcript, visual/CLIP, OCR, faces, logos, actions, audio events — each a self-contained retriever the dispatcher can scope and blend. Adding entities meant adding one more channel (plus a graph channel for the “who appears with whom” view). The interesting design isn't the plumbing — it's what fills the channel, and we did it in two phases, keyless-first.
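A channel registry can be sketched as a dictionary of named retrievers plus a dispatcher that runs only the channels in scope and merges their rankings. Everything here is hypothetical: the `register`/`dispatch` names, the stub channels, and especially the blending rule. Because the post does not say how MediaFind blends channels, this sketch swaps in reciprocal rank fusion (each channel adds 1/(k + rank) to a clip's score, with k = 60).

```python
from typing import Callable, Dict, Iterable, List, Optional

Retriever = Callable[[str], List[str]]  # query -> clip ids, best first

# Hypothetical registry: channel name -> self-contained retriever.
REGISTRY: Dict[str, Retriever] = {}

def register(name: str) -> Callable[[Retriever], Retriever]:
    """Decorator that adds a retriever to the registry under `name`."""
    def decorator(fn: Retriever) -> Retriever:
        REGISTRY[name] = fn
        return fn
    return decorator

def dispatch(query: str, scope: Optional[Iterable[str]] = None, k: int = 60) -> List[str]:
    """Run only the scoped channels, then blend with reciprocal rank fusion (an assumed rule)."""
    names = list(scope) if scope is not None else list(REGISTRY)
    scores: Dict[str, float] = {}
    for name in names:
        for rank, clip_id in enumerate(REGISTRY[name](query), start=1):
            scores[clip_id] = scores.get(clip_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; clip id breaks ties deterministically.
    return sorted(scores, key=lambda c: (-scores[c], c))

@register("transcript")
def transcript_channel(query: str) -> List[str]:
    return ["clip-1", "clip-2"]  # stub: pretend semantic ranking

@register("entities")
def entity_channel(query: str) -> List[str]:
    return ["clip-2"] if query.lower() == "acme corp" else []  # stub literal lookup
```

Adding the entity channel is then one more `register` call; scoping a query to `["entities"]` skips every other retriever.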

Phase 1 matches text against gazetteers MediaFind already trusts — keyless, high-precision. Phase 2 swaps a real NER model in behind the same interface, with optional Wikidata linking. Everything downstream is unchanged.
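The two-phase plan only works if downstream code depends on a single contract. A minimal sketch of that seam, assuming a Python `Protocol` with one `extract` method; the `Entity`, `EntityExtractor`, `index_rows`, `ListBackend`, and `NoneBackend` names are illustrative, not MediaFind's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol, Tuple

@dataclass(frozen=True)
class Entity:
    text: str   # surface form as it appears in the text
    label: str  # e.g. "PERSON", "ORG", "LOC"
    start: int  # character offsets into the source text
    end: int

class EntityExtractor(Protocol):
    """The seam: 'give me the entities in this text'."""
    def extract(self, text: str) -> List[Entity]: ...

def index_rows(extractor: EntityExtractor, clip_id: str, text: str) -> List[Tuple[str, str, str]]:
    """Downstream code sees only the protocol, never which backend produced the entities."""
    return [(clip_id, e.label, e.text) for e in extractor.extract(text)]

class ListBackend:
    """Phase 1 stand-in: exact, case-sensitive lookup of known names (toy matcher)."""
    def __init__(self, names: Dict[str, str]) -> None:
        self.names = names  # name -> label

    def extract(self, text: str) -> List[Entity]:
        found = []
        for name, label in self.names.items():
            i = text.find(name)
            if i != -1:  # first occurrence only; a real matcher would find all
                found.append(Entity(name, label, i, i + len(name)))
        return found

class NoneBackend:
    """What an unavailable model resolves to: no entities, and no error."""
    def extract(self, text: str) -> List[Entity]:
        return []
```

Swapping `ListBackend` for a model-backed class, or for `NoneBackend` when the model is unavailable, changes nothing in `index_rows`.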

Phase 1: a keyless gazetteer of names you already know

The dependency-free way to extract entities is not to run a model — it's to match text against lists of names MediaFind already trusts. We have four such lists lying around:

the bundled celebrity gallery (~1,000 notable public figures) — the source that makes the facet useful on a fresh library, before you've named anyone;
named people — clustered faces you've named, or that matched the gallery — each carrying a people.id;
named speakers from diarization, linked by speaker label;
detected logo brands, surfaced as ORG entities.

Match transcript and OCR text against those gazetteers and you get a literal entity index that nails exact-name recall. It trades coverage for two things that matter more here: precision (only real, known names — no half-guessed proper nouns) and free entity linking. Because every extracted entity is a name we already resolved, an ORG hit for a brand links to the logo, and a PERSON hit links straight back to a face cluster.
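Here is one way the four lists could be merged and matched, sketched under assumptions: the input shapes (a list of gallery names plus dicts keyed by `people.id`, speaker label, and logo id), the precedence order on name collisions, and the matching rule (case-insensitive, whole-word, longest name first) are illustrative choices, not MediaFind's actual `build_gazetteer`/`extract` code.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List, Mapping, Optional

@dataclass(frozen=True)
class GazetteerEntry:
    name: str
    label: str                  # "PERSON" or "ORG"
    source: str                 # "people", "speaker", "logo", or "celebrity"
    link: Optional[str] = None  # people.id, speaker label, or logo id when known

@dataclass(frozen=True)
class Match:
    entry: GazetteerEntry
    start: int
    end: int

def build_gazetteer(celebrities: Iterable[str], people: Mapping[str, str],
                    speakers: Mapping[str, str], logos: Mapping[str, str]) -> Dict[str, GazetteerEntry]:
    """Merge the trusted lists, keyed by lowercased name.
    Earlier, more specific sources win on collisions (an assumed precedence)."""
    gaz: Dict[str, GazetteerEntry] = {}
    for people_id, name in people.items():
        gaz.setdefault(name.lower(), GazetteerEntry(name, "PERSON", "people", people_id))
    for speaker_label, name in speakers.items():
        gaz.setdefault(name.lower(), GazetteerEntry(name, "PERSON", "speaker", speaker_label))
    for logo_id, brand in logos.items():
        gaz.setdefault(brand.lower(), GazetteerEntry(brand, "ORG", "logo", logo_id))
    for name in celebrities:
        gaz.setdefault(name.lower(), GazetteerEntry(name, "PERSON", "celebrity"))
    return gaz

def extract(text: str, gaz: Dict[str, GazetteerEntry]) -> List[Match]:
    """Case-insensitive, whole-word matches; longer names are tried first."""
    if not gaz:
        return []
    alternation = "|".join(re.escape(n) for n in sorted(gaz, key=len, reverse=True))
    # Lookarounds instead of \b so names ending in punctuation still match.
    pattern = re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)
    matches = []
    for m in pattern.finditer(text):
        entry = gaz.get(m.group(0).lower())
        if entry is not None:
            matches.append(Match(entry, m.start(), m.end()))
    return matches
```

Because each entry keeps its source and link, a hit for "ACME CORP" comes back already tied to its logo id, while "acme corporation" does not match at all: the whole-word check rejects a name that runs into more letters.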

Search “Acme Corp” → the entity channel returns every clip whose transcript, captions, or on-screen text names it → each result carries the brand's logo and a link to the face of the CEO who said it.
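The literal index behind that lookup can be as simple as a map from a normalized name to postings. A sketch, assuming each posting records the clip, the text field it came from, and a timestamp; the `EntityIndex` and `Posting` names and the normalization (lowercase, collapsed whitespace) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import DefaultDict, List, Set

@dataclass(frozen=True)
class Posting:
    clip_id: str
    source: str     # "transcript", "captions", or "ocr"
    start_s: float  # timestamp (seconds) of the segment that named the entity

class EntityIndex:
    """Literal index: normalized entity name -> set of postings."""

    def __init__(self) -> None:
        self._postings: DefaultDict[str, Set[Posting]] = defaultdict(set)

    @staticmethod
    def _key(name: str) -> str:
        # Case- and whitespace-insensitive, but otherwise exact.
        return " ".join(name.lower().split())

    def add(self, name: str, posting: Posting) -> None:
        self._postings[self._key(name)].add(posting)

    def search(self, name: str) -> List[Posting]:
        # .get avoids creating empty entries for names never indexed.
        hits = self._postings.get(self._key(name), set())
        return sorted(hits, key=lambda p: (p.clip_id, p.start_s, p.source))
```

The exact-key lookup is what keeps "Acme Inc" out of an "Acme Corp" query, the opposite trade-off from embedding search.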

It also avoids a failure mode we've been bitten by repeatedly: a heavy, lazily imported model that silently dies in the frozen app and returns nothing, with no error. Phase 1 has no model to die.

Phase 2: open-vocabulary NER, behind the same seam

The gazetteer's limit is obvious — it can only find names it was told about. The vendor in your invoice video, the town in a travel clip, the startup nobody's named yet: all invisible to Phase 1. So Phase 2 adds the open-vocabulary half: a transformers token-classification model (dslim/bert-base-NER) that flags any person, organization, or location it reads — including LOC, a type the gazetteers don't produce at all.

Crucially, it lives behind the same build_gazetteer / extract seam. The table, the search channel, the graph — none of them know or care whether an entity came from a list or a model. And its design mirrors the embedder exactly, because we've learned what makes on-device ML survive packaging:

| Property | How it behaves |
|---|---|
| Lazy, locked, single-shot load | Model loads on first use under a lock, then memoizes, so later calls are free |
| Graceful fallback, never an error | If transformers/torch can't be imported or the weights can't be fetched, the backend resolves to `"none"` and extraction simply returns nothing |
| Reuses the bundled stack | It's a BERT model on the same transformers/torch already shipped for the embedder: no new heavy native dependency, only (optionally) the weights |
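The loading contract in the table can be sketched as a small wrapper class. Assumptions: the class and attribute names are hypothetical, and the loader is injectable so the fallback can be exercised without downloading weights. The default loader uses the real transformers `pipeline("token-classification", ..., aggregation_strategy="simple")` call with `dslim/bert-base-NER`. Mapping that model's `PER`/`ORG`/`LOC` groups to `PERSON`/`ORG`/`LOC` and dropping its `MISC` group is our choice, not something the post specifies.

```python
import threading
from typing import Any, Callable, Dict, List, Optional

# dslim/bert-base-NER emits PER/ORG/LOC/MISC groups; this sketch keeps three (our choice).
LABELS = {"PER": "PERSON", "ORG": "ORG", "LOC": "LOC"}

def _default_loader() -> Callable[[str], List[Dict[str, Any]]]:
    # Imported lazily, so the app starts and keeps working without transformers/torch.
    from transformers import pipeline
    return pipeline("token-classification", model="dslim/bert-base-NER",
                    aggregation_strategy="simple")

class LazyNER:
    """Load on first use under a lock, memoize, and resolve to backend 'none' on failure."""

    def __init__(self, loader: Callable[[], Any] = _default_loader) -> None:
        self._loader = loader
        self._lock = threading.Lock()
        self._model: Optional[Any] = None
        self._tried = False
        self.backend = "unloaded"

    def _ensure_loaded(self) -> None:
        if self._tried:  # fast path: after the first attempt, no lock and no work
            return
        with self._lock:
            if self._tried:  # another thread finished loading while we waited
                return
            try:
                self._model = self._loader()
                self.backend = "transformers"
            except Exception:  # ImportError, missing weights, no network, ...
                self._model = None
                self.backend = "none"
            self._tried = True  # single shot: never retried in this process

    def extract(self, text: str) -> List[Dict[str, Any]]:
        self._ensure_loaded()
        if self._model is None:
            return []  # absence degrades quality, not function
        return [
            {"text": r["word"], "label": LABELS[r["entity_group"]],
             "start": r["start"], "end": r["end"], "score": float(r["score"])}
            for r in self._model(text)
            if r["entity_group"] in LABELS
        ]
```

The double-checked `_tried` flag keeps the lock off the hot path once loading has been attempted, whether it succeeded or not.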

NER is additive, so its absence degrades quality — not function. Turn the model off, or run a frozen build that didn't bundle it, and you fall straight back to keyless gazetteer behavior. The feature gets less complete; it never breaks. That's the same default-to-keyless contract every modality in MediaFind honors.
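Because NER is additive, merging can let gazetteer hits always win and let model hits fill only the gaps. A minimal sketch, assuming entities carry character offsets and that overlapping spans mean "same mention"; the `Ent` and `merge` names are illustrative.

```python
from typing import List, NamedTuple, Optional

class Ent(NamedTuple):
    text: str
    label: str
    start: int
    end: int
    link: Optional[str] = None  # only gazetteer hits carry a link

def merge(gazetteer_hits: List[Ent], ner_hits: List[Ent]) -> List[Ent]:
    """Keep every gazetteer hit; add NER hits only where no gazetteer hit overlaps."""
    def overlaps(a: Ent, b: Ent) -> bool:
        return a.start < b.end and b.start < a.end

    extra = [n for n in ner_hits if not any(overlaps(n, g) for g in gazetteer_hits)]
    return sorted(gazetteer_hits + extra, key=lambda e: e.start)
```

With the model off, `ner_hits` is empty and `merge` returns exactly the gazetteer result, which is the keyless fallback described above.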

Optional Wikidata linking: cautious by design

“Barack Obama” and “President Obama” are the same person; “Acme Corp” and “ACME” the same company. To recognize an entity across spelling variants, Phase 2 can resolve a surface name to a stable Wikidata QID — but this is the one place that touches the network, so it's wrapped in caution:

Off by default, network-gated. Controlled by ENTITY_LINKING_ENABLED. When off, linking is a no-op returning None — extraction, search, and the graph all work unchanged. Linking only ever adds a QID.
Keyless and stdlib-only. The public Wikidata wbsearchentities API over urllib — no token, no new dependency. A short timeout and a blanket except mean a slow, absent, or rate-limited Wikidata never blocks or breaks indexing; it just yields None.
Disk-cached, hits and misses alike. Every lookup is memoized to a JSON file in the data dir, so a re-index or a second mention costs nothing and we never ask Wikidata the same question twice. The cache is the rate-limit shield.
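The three properties above fit in one small class. This is a sketch, not MediaFind's code: reading `ENTITY_LINKING_ENABLED` from an environment variable, the cache file layout, the 3-second timeout, the `User-Agent` string, and the injectable `fetch` hook are all assumptions. The request uses the public `wbsearchentities` action with `limit=1` and takes the first result's `id` as the QID. One judgment call to flag: network failures return `None` without being cached, so a temporary outage is not remembered as a permanent miss.

```python
import json
import os
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Callable, Dict, Optional

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def _fetch_json(url: str, timeout: float) -> dict:
    req = urllib.request.Request(url, headers={"User-Agent": "MediaFind-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)

class WikidataLinker:
    def __init__(self, cache_path: Path, enabled: Optional[bool] = None,
                 fetch: Callable[[str, float], dict] = _fetch_json,
                 timeout: float = 3.0) -> None:
        if enabled is None:  # off unless explicitly switched on
            enabled = os.environ.get("ENTITY_LINKING_ENABLED", "").lower() in {"1", "true", "yes"}
        self.enabled = enabled
        self.cache_path = cache_path
        self.fetch = fetch
        self.timeout = timeout
        self.cache: Dict[str, Optional[str]] = {}
        try:
            self.cache = json.loads(cache_path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            pass  # no cache file yet, or unreadable: start empty

    def link(self, name: str, lang: str = "en") -> Optional[str]:
        """Return a Wikidata QID for a surface name, or None. Never raises."""
        if not self.enabled:
            return None  # no-op: nothing downstream requires a QID
        normalized = " ".join(name.lower().split())
        key = f"{lang}:{normalized}"
        if key in self.cache:
            return self.cache[key]  # cached hit, or cached miss (None)
        query = urllib.parse.urlencode({"action": "wbsearchentities", "search": name,
                                        "language": lang, "format": "json", "limit": 1})
        try:
            results = self.fetch(f"{WIKIDATA_API}?{query}", self.timeout).get("search", [])
            qid = results[0]["id"] if results else None
        except Exception:
            return None  # slow, absent, or rate-limited Wikidata: just no QID
        self.cache[key] = qid
        try:
            self.cache_path.write_text(json.dumps(self.cache), encoding="utf-8")
        except OSError:
            pass  # a failed cache write must not break indexing either
        return qid
```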

This is metadata enrichment, not a retrieval dependency. Search works the same whether or not a QID was ever found — linking just lets the same real-world entity dedup across variants and, later, carry a description and type.
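Once QIDs exist, deduplicating variants is a grouping step. A sketch, assuming mentions arrive as `(surface_name, qid)` pairs and that unlinked names fall back to a normalized-text key; the function name is hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Tuple

def group_by_entity(mentions: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Group surface names by QID; unlinked names key on their own normalized text."""
    groups: Dict[str, List[str]] = {}
    for name, qid in mentions:
        key = qid if qid else "name:" + " ".join(name.lower().split())
        variants = groups.setdefault(key, [])
        if name not in variants:  # keep each distinct spelling once, in first-seen order
            variants.append(name)
    return groups
```

Linked variants collapse under one QID; unlinked names still group with their own spellings, so search behaves the same whether or not linking ran.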

The seam is the point

If there's one idea to take from this, it's the seam. Phase 1 shipped a useful, keyless entity facet on day one. Phase 2 made it open-vocabulary without rewriting anything downstream, because the contract — “give me the entities in this text” — never changed. That's how MediaFind ships fast and stays private at the same time: a strong keyless default, and a heavier model that slots in behind the same interface for anyone who opts in.

Names are the exact-recall end of search; meaning is the fuzzy end. Run both as separate channels and blend them, and you get a library you can interrogate either way — “the part about budget cuts” and “every clip that names Acme.”

Search your library by name and by meaning

Free trial. No account, no API keys, nothing uploaded.


