
Find every mention of a name: keyless entity search, then open-vocab NER

“Every clip that mentions Acme Corp.” It sounds like the easiest query in the product — and it's the one semantic search is worst at. Here's why exact names blur under embeddings, and the two-phase entity index MediaFind built to nail them.

MediaFind's flagship trick is semantic search: type a meaning, get the moments that match it, even if they never use your exact words. That fuzziness is the whole point — “the part about budget cuts” should find a clip that says “we're trimming spend,” no keyword overlap required.

But the same fuzziness that makes meaning-search great makes name-search bad. Embeddings place words near other words that mean similar things, and proper nouns don't have meanings; they have referents. “Acme Corp,” “Acme Inc,” and “a generic widget company” all land near each other in vector space, so a cosine-similarity query for one happily returns the others. When you want every clip that mentions exactly Acme Corp, and nothing else, blur is the enemy.

The right tool for the job. Exact-name recall is precisely what embedding search is weakest at and what a literal index is best at. So entities don't ride the semantic channel — they get their own.

A separate channel in the registry

MediaFind's search is a channel registry: transcript, visual/CLIP, OCR, faces, logos, actions, audio events — each a self-contained retriever the dispatcher can scope and blend. Adding entities meant adding one more channel (plus a graph channel for the “who appears with whom” view). The interesting design isn't the plumbing — it's what fills the channel, and we did it in two phases, keyless-first.
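
To make the registry idea concrete, here is a minimal sketch of a dispatcher that can scope a query to some channels and blend the rest. It is an illustration under stated assumptions, not MediaFind's code: the `Hit` and `ChannelRegistry` names, the weighted-sum blend, and the two toy channels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence


@dataclass(frozen=True)
class Hit:
    clip_id: str
    score: float  # channel-local relevance, assumed to be in [0, 1]
    channel: str


# Assumption: a channel is any callable mapping a query string to hits.
Channel = Callable[[str], List[Hit]]


class ChannelRegistry:
    """Hypothetical registry: each retriever is registered under a name."""

    def __init__(self) -> None:
        self._channels: Dict[str, Channel] = {}

    def register(self, name: str, channel: Channel) -> None:
        self._channels[name] = channel

    def search(self, query: str, scope: Optional[Sequence[str]] = None,
               weights: Optional[Dict[str, float]] = None) -> List[Hit]:
        # Scope: query only the named channels, or all of them by default.
        names = list(scope) if scope is not None else list(self._channels)
        weights = weights or {}
        blended: Dict[str, float] = {}
        for name in names:
            weight = weights.get(name, 1.0)
            for hit in self._channels[name](query):
                # Blend: weighted sum of per-channel scores for each clip.
                blended[hit.clip_id] = blended.get(hit.clip_id, 0.0) + weight * hit.score
        ranked = sorted(blended.items(), key=lambda kv: (-kv[1], kv[0]))
        return [Hit(clip_id, score, "blended") for clip_id, score in ranked]


# Toy channels with canned results, standing in for real retrievers.
def transcript_channel(query: str) -> List[Hit]:
    return [Hit("clip-1", 0.6, "transcript"), Hit("clip-2", 0.5, "transcript")]


def entity_channel(query: str) -> List[Hit]:
    return [Hit("clip-2", 1.0, "entities")]


registry = ChannelRegistry()
registry.register("transcript", transcript_channel)
registry.register("entities", entity_channel)

everything = registry.search("Acme Corp")
names_only = registry.search("Acme Corp", scope=["entities"])
```

With this shape, adding the entity channel is one `register` call, and a name query can be scoped to `entities` alone when exact recall matters more than fuzzy matches.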

[Diagram: transcript + OCR text feeds two phases behind the same seam. Phase 1, gazetteer match: known names, keyless, exact recall, drawn from the celebrity gallery, named people, named speakers, and logo brands. Phase 2, NER: open-vocab model, any PERSON/ORG/LOC, Wikidata QID link, additive.]
Phase 1 matches text against gazetteers MediaFind already trusts — keyless, high-precision. Phase 2 swaps a real NER model in behind the same interface, with optional Wikidata linking. Everything downstream is unchanged.

Phase 1: a keyless gazetteer of names you already know

The dependency-free way to extract entities is not to run a model but to match text against lists of names MediaFind already trusts. We have four such lists lying around: the celebrity gallery, named people, named speakers, and logo brands.

Match transcript and OCR text against those gazetteers and you get a literal entity index that nails exact-name recall. It trades coverage for two things that matter more here: precision (only real, known names — no half-guessed proper nouns) and free entity linking. Because every extracted entity is a name we already resolved, an ORG hit for a brand links to the logo, and a PERSON hit links straight back to a face cluster.
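
A minimal sketch of that matching step follows. The function names `build_gazetteer` and `extract` come from the seam described in this post, but their signatures here are assumptions, as are the `Entity` fields and the `logo:` / `face:` link ids. Matching is case-insensitive, tolerant of extra whitespace, and anchored on word boundaries so that “Acme Corp” does not fire inside “Acme Corporation.”

```python
import re
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Entity(NamedTuple):
    name: str   # canonical name from the gazetteer
    label: str  # e.g. "PERSON" or "ORG"
    start: int  # character offsets of the match in the text
    end: int
    link: str   # hypothetical id of the face cluster or logo already resolved


Gazetteer = Tuple[Optional[re.Pattern], Dict[str, Tuple[str, str, str]]]


def _normalize(text: str) -> str:
    # Collapse runs of whitespace and lowercase, so lookups ignore both.
    return " ".join(text.split()).lower()


def build_gazetteer(entries: Iterable[Tuple[str, str, str]]) -> Gazetteer:
    """entries are (name, label, link) triples from lists already trusted."""
    lookup = {_normalize(n): (n, label, link)
              for n, label, link in entries if n.strip()}
    if not lookup:
        return None, lookup
    # Longest names first, so a longer name wins over a prefix of it.
    keys = sorted(lookup, key=len, reverse=True)
    alternatives = [r"\s+".join(re.escape(tok) for tok in key.split())
                    for key in keys]
    pattern = re.compile(r"(?<!\w)(?:" + "|".join(alternatives) + r")(?!\w)",
                         re.IGNORECASE)
    return pattern, lookup


def extract(text: str, gazetteer: Gazetteer) -> List[Entity]:
    pattern, lookup = gazetteer
    if pattern is None:
        return []
    found = []
    for match in pattern.finditer(text):
        entry = lookup.get(_normalize(match.group(0)))
        if entry is not None:
            name, label, link = entry
            found.append(Entity(name, label, match.start(), match.end(), link))
    return found


gazetteer = build_gazetteer([
    ("Acme Corp", "ORG", "logo:acme"),    # hypothetical logo brand
    ("Jane Doe", "PERSON", "face:17"),    # hypothetical named person
])
text = "Jane  Doe says ACME CORP is trimming spend. Acme Corporation is unrelated."
entities = extract(text, gazetteer)
```

Because every hit is looked up in a list that already carries a link, the entity arrives pre-linked; no separate resolution step is needed.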

Search “Acme Corp” → the entity channel returns every clip whose transcript, captions, or on-screen text names it → and the result carries the brand's logo and a link to the face of the CEO who said it.

It also avoids a failure mode we've been bitten by repeatedly: a heavy, lazily-imported model that silently dies in the frozen app and returns nothing with no error. Phase 1 has no model to die.

Phase 2: open-vocabulary NER, behind the same seam

The gazetteer's limit is obvious — it can only find names it was told about. The vendor in your invoice video, the town in a travel clip, the startup nobody's named yet: all invisible to Phase 1. So Phase 2 adds the open-vocabulary half: a transformers token-classification model (dslim/bert-base-NER) that flags any person, organization, or location it reads — including LOC, a type the gazetteers don't produce at all.

Crucially, it lives behind the same build_gazetteer / extract seam. The table, the search channel, the graph — none of them know or care whether an entity came from a list or a model. And its design mirrors the embedder exactly, because we've learned what makes on-device ML survive packaging:

Property | How it behaves
Lazy, locked, single-shot load | Model loads on first use under a lock, then memoizes; later calls are free
Graceful fallback, never an error | If transformers/torch can't import or weights can't fetch, the backend resolves to "none" and extraction simply returns nothing
Reuses the bundled stack | It's a BERT model on the same transformers/torch already shipped for the embedder: no new heavy native dep, only (optionally) the weights

NER is additive, so its absence degrades quality — not function. Turn the model off, or run a frozen build that didn't bundle it, and you fall straight back to keyless gazetteer behavior. The feature gets less complete; it never breaks. That's the same default-to-keyless contract every modality in MediaFind honors.
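
The load-once-or-fall-back behavior can be sketched as follows. This is an illustration rather than MediaFind's implementation: the `LazyNer` class, the injectable `loader` argument, and the `PER` to `PERSON` label mapping are assumptions. The default loader uses the `transformers` `pipeline` API with the `dslim/bert-base-NER` model named above, and it is only invoked on first use.

```python
import threading
from typing import Any, Callable, List, Tuple


def _default_loader() -> Callable[[str], List[dict]]:
    # Lazy import: a missing transformers/torch only fails here, on first use.
    from transformers import pipeline
    return pipeline("token-classification", model="dslim/bert-base-NER",
                    aggregation_strategy="simple")


# Assumed mapping from the model's label names to the index's type names.
_LABELS = {"PER": "PERSON", "ORG": "ORG", "LOC": "LOC"}


class LazyNer:
    def __init__(self, loader: Callable[[], Any] = _default_loader) -> None:
        self._loader = loader
        self._lock = threading.Lock()
        self._resolved = False
        self._pipe = None
        self.backend = "unresolved"  # becomes "transformers" or "none"

    def _ensure_loaded(self) -> None:
        if self._resolved:
            return
        with self._lock:
            if self._resolved:  # another thread finished while we waited
                return
            try:
                self._pipe = self._loader()
                self.backend = "transformers"
            except Exception:
                # Import or weight-fetch failure: resolve to "none", once.
                self._pipe = None
                self.backend = "none"
            self._resolved = True

    def extract(self, text: str) -> List[Tuple[str, str, int, int]]:
        self._ensure_loaded()
        if self._pipe is None:
            return []
        try:
            spans = self._pipe(text)
        except Exception:
            return []
        return [(s["word"], _LABELS[s["entity_group"]], s["start"], s["end"])
                for s in spans if s["entity_group"] in _LABELS]
```

Keeping the resolved backend name on the object means the fallback is quiet for the caller but still inspectable: a build that reports `"none"` is running gazetteer-only, by design rather than by accident.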

Optional Wikidata linking: cautious by design

“Barack Obama” and “President Obama” are the same person; “Acme Corp” and “ACME” the same company. To recognize an entity across spelling variants, Phase 2 can resolve a surface name to a stable Wikidata QID. This is the one place that touches the network, so it's wrapped in caution.

This is metadata enrichment, not a retrieval dependency. Search works the same whether or not a QID was ever found — linking just lets the same real-world entity dedup across variants and, later, carry a description and type.
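
One way such a cautious lookup might look is sketched below. The specific safeguards here (off by default, a short timeout, an in-memory cache, and any failure resolving to `None`) are assumptions chosen for illustration, not a list of MediaFind's actual safeguards. `QidLinker` and its `fetch` argument are hypothetical names. The default fetcher calls Wikidata's public `wbsearchentities` API action.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Optional

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def _http_search(name: str, timeout: float = 2.0) -> dict:
    # Ask Wikidata for the single best entity match for a surface name.
    params = urllib.parse.urlencode({
        "action": "wbsearchentities", "search": name,
        "language": "en", "format": "json", "limit": 1,
    })
    request = urllib.request.Request(
        f"{WIKIDATA_API}?{params}",
        headers={"User-Agent": "entity-link-sketch/0.1"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


class QidLinker:
    def __init__(self, enabled: bool = False,
                 fetch: Callable[[str], dict] = _http_search) -> None:
        self.enabled = enabled  # assumption: linking is strictly opt-in
        self._fetch = fetch
        self._cache: Dict[str, Optional[str]] = {}

    def link(self, name: str) -> Optional[str]:
        if not self.enabled:
            return None
        key = " ".join(name.split()).lower()
        if key in self._cache:
            return self._cache[key]
        try:
            results = self._fetch(name).get("search", [])
            qid = results[0]["id"] if results else None
        except Exception:
            # Offline, timeout, bad payload: no QID, and search is unaffected.
            return None
        self._cache[key] = qid
        return qid
```

Entities that resolve to the same QID can then be merged, while an entity with no QID is indexed exactly as before.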

The seam is the point

If there's one idea to take from this, it's the seam. Phase 1 shipped a useful, keyless entity facet on day one. Phase 2 made it open-vocabulary without rewriting anything downstream, because the contract — “give me the entities in this text” — never changed. That's how MediaFind ships fast and stays private at the same time: a strong keyless default, and a heavier model that slots in behind the same interface for anyone who opts in.
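
As a rough sketch of what a seam like this buys you, the code below defines one contract and two interchangeable backends. All class and function names are hypothetical, and the model backend is a canned stand-in rather than a real NER model. The point is only that `index_clip` never learns which backend produced an entity.

```python
from typing import Dict, List, Protocol, Tuple

Entity = Tuple[str, str]  # (name, label)


class Extractor(Protocol):
    """The contract: give me the entities in this text."""

    def extract(self, text: str) -> List[Entity]: ...


class ListExtractor:
    """Phase 1 stand-in: literal match against names already known."""

    def __init__(self, names: Dict[str, str]) -> None:
        self._names = names

    def extract(self, text: str) -> List[Entity]:
        lowered = text.lower()
        return [(name, label) for name, label in self._names.items()
                if name.lower() in lowered]


class FakeModelExtractor:
    """Phase 2 stand-in: canned output where a real NER model would run."""

    def extract(self, text: str) -> List[Entity]:
        return [("Lisbon", "LOC")] if "Lisbon" in text else []


class Additive:
    """Union of several backends, deduplicated, first backend wins."""

    def __init__(self, *backends: Extractor) -> None:
        self._backends = backends

    def extract(self, text: str) -> List[Entity]:
        seen, merged = set(), []
        for backend in self._backends:
            for name, label in backend.extract(text):
                key = (name.lower(), label)
                if key not in seen:
                    seen.add(key)
                    merged.append((name, label))
        return merged


def index_clip(clip_id: str, text: str, extractor: Extractor) -> List[Tuple[str, str, str]]:
    # Downstream code depends only on the contract, not on the backend.
    return [(clip_id, name, label) for name, label in extractor.extract(text)]


phase1 = ListExtractor({"Acme Corp": "ORG"})
text = "Acme Corp opened an office in Lisbon."
keyless_rows = index_clip("clip-7", text, phase1)
full_rows = index_clip("clip-7", text, Additive(phase1, FakeModelExtractor()))
```

Dropping the model backend from `Additive` yields the keyless rows again, which is the "less complete, never broken" behavior described above.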


Names are the exact-recall end of search; meaning is the fuzzy end. Run both as separate channels and blend them, and you get a library you can interrogate either way — “the part about budget cuts” and “every clip that names Acme.”
