Find every mention of a name: keyless entity search, then open-vocab NER
“Every clip that mentions Acme Corp.” It sounds like the easiest query in the product — and it's the one semantic search is worst at. Here's why exact names blur under embeddings, and the two-phase entity index MediaFind built to nail them.
MediaFind's flagship trick is semantic search: type a meaning, get the moments that match it, even if they never use your exact words. That fuzziness is the whole point — “the part about budget cuts” should find a clip that says “we're trimming spend,” no keyword overlap required.
But the same fuzziness that makes meaning-search great makes name-search bad. Embeddings place words near other words that mean similar things, and proper nouns don't have meanings; they have referents. “Acme Corp,” “Acme Inc,” and “a generic widget company” all land near each other in vector space, so a cosine-similarity query for one happily returns the others. When you want every clip that mentions exactly Acme Corp, and nothing else, blur is the enemy.
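To make the blur concrete, here is a toy comparison. `TOY_VECTORS` is invented for illustration (real embeddings are learned and have hundreds of dimensions), but the geometry is the same idea: a cosine-similarity threshold that keeps “Acme Corp” also keeps its lookalikes, while a literal, case-insensitive match keeps only the exact name.

```python
import math
from typing import Dict, List

# Invented 3-d vectors standing in for learned embeddings (illustration only).
TOY_VECTORS: Dict[str, List[float]] = {
    "Acme Corp": [0.90, 0.40, 0.10],
    "Acme Inc": [0.88, 0.42, 0.12],
    "a generic widget company": [0.80, 0.50, 0.20],
    "budget cuts": [0.10, 0.20, 0.95],
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_hits(query: str, threshold: float = 0.95) -> List[str]:
    """Every mention whose vector is close enough to the query's vector."""
    q = TOY_VECTORS[query]
    return [m for m, v in TOY_VECTORS.items() if cosine(q, v) >= threshold]


def literal_hits(query: str) -> List[str]:
    """Every mention that is exactly the query name, ignoring case."""
    return [m for m in TOY_VECTORS if m.casefold() == query.casefold()]


print(semantic_hits("Acme Corp"))  # ['Acme Corp', 'Acme Inc', 'a generic widget company']
print(literal_hits("Acme Corp"))   # ['Acme Corp']
```

In this toy space “Acme Inc” scores about 0.999 against “Acme Corp”, so only a near-identity cutoff would drop it, and that cutoff would also throw away the paraphrases the semantic channel exists to find.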
A separate channel in the registry
MediaFind's search is a channel registry: transcript, visual/CLIP, OCR, faces, logos, actions, audio events — each a self-contained retriever the dispatcher can scope and blend. Adding entities meant adding one more channel (plus a graph channel for the “who appears with whom” view). The interesting design isn't the plumbing — it's what fills the channel, and we did it in two phases, keyless-first.
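A registry of this shape fits in a few lines. The post doesn't say how MediaFind blends channels, so this sketch swaps in reciprocal rank fusion (RRF, with its commonly used constant `k=60`) as an assumed blending rule. `ChannelRegistry`, `register`, and `search` are hypothetical names, and each retriever is assumed to return clip ids best-first.

```python
from typing import Callable, Dict, Iterable, List, Optional

# A retriever maps a query to clip ids, best match first (assumed contract).
Retriever = Callable[[str], List[str]]


class ChannelRegistry:
    """Hypothetical registry: named retrievers the dispatcher can scope and blend."""

    def __init__(self, k: int = 60) -> None:
        self.k = k  # RRF damping constant; 60 is the value commonly used
        self.channels: Dict[str, Retriever] = {}

    def register(self, name: str, retriever: Retriever) -> None:
        self.channels[name] = retriever

    def search(self, query: str, scope: Optional[Iterable[str]] = None) -> List[str]:
        # Scope: run only the named channels (all of them if scope is None).
        names = list(self.channels) if scope is None else [n for n in scope if n in self.channels]
        scores: Dict[str, float] = {}
        for name in names:
            # Blend: each channel adds 1 / (k + rank) for every clip it returns.
            for rank, clip_id in enumerate(self.channels[name](query), start=1):
                scores[clip_id] = scores.get(clip_id, 0.0) + 1.0 / (self.k + rank)
        # Highest fused score first; ties keep first-seen order (sorted is stable).
        return sorted(scores, key=lambda cid: -scores[cid])


registry = ChannelRegistry()
registry.register("transcript", lambda q: ["clip-2", "clip-1"])
registry.register("entities", lambda q: ["clip-1"])
print(registry.search("Acme Corp"))                        # ['clip-1', 'clip-2']
print(registry.search("Acme Corp", scope=["transcript"]))  # ['clip-2', 'clip-1']
```

Adding the entity channel is just one more `register` call; a clip that both channels agree on rises to the top without either channel knowing about the other.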
Phase 1: a keyless gazetteer of names you already know
The dependency-free way to extract entities is not to run a model — it's to match text against lists of names MediaFind already trusts. We have four such lists lying around:
- the bundled celebrity gallery (~1,000 notable public figures), the source that makes the facet useful on a fresh library, before you've named anyone;
- named people: clustered faces you've named, or that matched the gallery, each carrying a `people.id`;
- named speakers from diarization, linked by speaker label;
- detected logo brands, surfaced as `ORG` entities.
Match transcript and OCR text against those gazetteers and you get a literal entity index that nails exact-name recall. It trades coverage for two things that matter more here: precision (only real, known names — no half-guessed proper nouns) and free entity linking. Because every extracted entity is a name we already resolved, an ORG hit for a brand links to the logo, and a PERSON hit links straight back to a face cluster.
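A minimal sketch of the gazetteer phase, assuming each trusted list can be flattened to `(name, type, link)` triples. `build_gazetteer` and `extract` borrow the names the post gives its seam, but their signatures, the `Entity` record, and the link strings are hypothetical. Matching is one case-insensitive regex with word-boundary lookarounds, longest names first, so a hit can only ever be a name that was on a list.

```python
import re
from typing import Dict, List, NamedTuple, Optional, Tuple


class Entity(NamedTuple):
    text: str            # surface form exactly as it appears in the text
    kind: str            # "PERSON" or "ORG" from the gazetteer
    source: str          # which trusted list the name came from
    link: Optional[str]  # hypothetical link, e.g. a people.id or logo id
    start: int
    end: int


Entry = Tuple[str, str, str, Optional[str]]  # (name, kind, source, link)
Gazetteer = Tuple[Optional[re.Pattern], Dict[str, Entry]]


def build_gazetteer(lists: Dict[str, List[Tuple[str, str, Optional[str]]]]) -> Gazetteer:
    """lists maps a source name to (name, kind, link) triples."""
    entries: Dict[str, Entry] = {}
    for source, items in lists.items():
        for name, kind, link in items:
            entries.setdefault(name.lower(), (name, kind, source, link))  # first list wins
    if not entries:
        return None, entries
    # Longest names first: regex alternation takes the first branch that matches.
    alternation = "|".join(re.escape(n) for n in sorted(entries, key=len, reverse=True))
    # (?<!\w) and (?!\w) act as word boundaries that also work for names ending in "."
    pattern = re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)
    return pattern, entries


def extract(text: str, gazetteer: Gazetteer) -> List[Entity]:
    pattern, entries = gazetteer
    if pattern is None:
        return []
    found = []
    for m in pattern.finditer(text):
        entry = entries.get(m.group(0).lower())
        if entry is None:  # rare Unicode case-folding mismatch; skip rather than guess
            continue
        _, kind, source, link = entry
        found.append(Entity(m.group(0), kind, source, link, m.start(), m.end()))
    return found


gaz = build_gazetteer({
    "people": [("Jane Doe", "PERSON", "people:42")],
    "logos": [("Acme Corp", "ORG", "logo:acme")],
})
for e in extract("Jane Doe, CEO of ACME Corp, said Acme Corporation is unrelated.", gaz):
    print(e.text, e.kind, e.link)
# Jane Doe PERSON people:42
# ACME Corp ORG logo:acme
```

“Acme Corporation” is not matched, because the name must end at a word boundary. And because every match comes from a list, the link travels with it: the `ORG` hit carries the logo id and the `PERSON` hit carries the face cluster's id, with no separate resolution step.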
Search “Acme Corp” → the entity channel returns every clip whose transcript, captions, or on-screen text names it → and the result carries the brand's logo and a link to the face of the CEO who said it.
It also avoids a failure mode we've been bitten by repeatedly: a heavy, lazily-imported model that silently dies in the frozen app and returns nothing with no error. Phase 1 has no model to die.
Phase 2: open-vocabulary NER, behind the same seam
The gazetteer's limit is obvious — it can only find names it was told about. The vendor in your invoice video, the town in a travel clip, the startup nobody's named yet: all invisible to Phase 1. So Phase 2 adds the open-vocabulary half: a transformers token-classification model (dslim/bert-base-NER) that flags any person, organization, or location it reads — including LOC, a type the gazetteers don't produce at all.
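With Hugging Face `transformers`, a model like this is typically run as `pipeline("token-classification", model="dslim/bert-base-NER", aggregation_strategy="simple")`, which returns one dict per entity with `entity_group`, `score`, `word`, `start`, and `end` keys. The sketch below only normalizes that output, so it runs without the model. Mapping `PER` to the gazetteer's `PERSON`, dropping the model's `MISC` label, and the 0.8 score cutoff are assumptions, not MediaFind's documented behavior.

```python
from typing import Dict, List, Tuple

# Assumed mapping from the model's CoNLL-style labels to the gazetteer's types.
# MISC is dropped (absent from the map); LOC has no gazetteer equivalent.
LABEL_MAP: Dict[str, str] = {"PER": "PERSON", "ORG": "ORG", "LOC": "LOC"}


def normalize_ner(spans: List[dict], text: str, min_score: float = 0.8) -> List[Tuple[str, str, int, int]]:
    """Turn aggregated token-classification output into (type, text, start, end)."""
    out = []
    for span in spans:
        kind = LABEL_MAP.get(span["entity_group"])
        if kind is None or span["score"] < min_score:
            continue
        start, end = span["start"], span["end"]
        # Slice the original text rather than trusting span["word"], which is
        # rebuilt from subword tokens and may not match the source exactly.
        out.append((kind, text[start:end], start, end))
    return out


# In production the spans would come from the model, e.g.:
#   from transformers import pipeline
#   ner = pipeline("token-classification", model="dslim/bert-base-NER",
#                  aggregation_strategy="simple")
#   spans = ner(text)
# Here they are hand-written stand-ins, not real model output.
text = "Maria met Globex in Lisbon."
spans = [
    {"entity_group": "PER", "score": 0.99, "word": "Maria", "start": 0, "end": 5},
    {"entity_group": "ORG", "score": 0.97, "word": "Globex", "start": 10, "end": 16},
    {"entity_group": "LOC", "score": 0.98, "word": "Lisbon", "start": 20, "end": 26},
    {"entity_group": "MISC", "score": 0.95, "word": "Maria", "start": 0, "end": 5},
]
print(normalize_ner(spans, text))
# [('PERSON', 'Maria', 0, 5), ('ORG', 'Globex', 10, 16), ('LOC', 'Lisbon', 20, 26)]
```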
Crucially, it lives behind the same `build_gazetteer` / `extract` seam. The table, the search channel, the graph: none of them know or care whether an entity came from a list or a model. And its design mirrors the embedder exactly, because we've learned what makes on-device ML survive packaging:
| Property | How it behaves |
|---|---|
| Lazy, locked, single-shot load | Model loads on first use under a lock, then memoizes — later calls are free |
| Graceful fallback, never an error | If transformers/torch can't import or weights can't fetch, the backend resolves to "none" and extraction simply returns nothing |
| Reuses the bundled stack | It's a BERT model on the same transformers/torch already shipped for the embedder — no new heavy native dep, only (optionally) the weights |
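The three properties in the table can be sketched as one small class. `LazyNER` and its `loader` parameter are hypothetical: the default loader builds the real `transformers` pipeline, but any callable works, which keeps the fallback path testable without torch installed. Double-checked locking makes the load single-shot even when several indexing threads ask at once; catching inference errors too is this sketch's own choice.

```python
import threading
from typing import Callable, List, Optional


def _load_transformers_ner() -> Callable:
    # Imported here, not at module top, so a missing or broken
    # transformers/torch only surfaces on first use, never at app start.
    from transformers import pipeline
    return pipeline("token-classification", model="dslim/bert-base-NER",
                    aggregation_strategy="simple")


class LazyNER:
    """Lazy, locked, single-shot model load with a silent "none" fallback."""

    def __init__(self, loader: Callable[[], Callable] = _load_transformers_ner) -> None:
        self._loader = loader
        self._lock = threading.Lock()
        self._model: Optional[Callable] = None
        self._loaded = False
        self.backend = "unloaded"

    def _ensure_loaded(self) -> Optional[Callable]:
        if self._loaded:                  # fast path: memoized, no lock
            return self._model
        with self._lock:
            if not self._loaded:          # re-check: another thread may have won
                try:
                    self._model = self._loader()
                    self.backend = "transformers"
                except Exception:         # ImportError, failed weight download, ...
                    self._model = None
                    self.backend = "none"
                self._loaded = True       # set last, after _model is final
        return self._model

    def extract(self, text: str) -> List[dict]:
        model = self._ensure_loaded()
        if model is None:
            return []                     # fallback: nothing, never an error
        try:
            return list(model(text))
        except Exception:
            return []


# Usage with a loader that fails the way a frozen app without torch would:
def _no_torch():
    raise ImportError("No module named 'torch'")


ner = LazyNER(loader=_no_torch)
print(ner.extract("Maria met Globex in Lisbon."), ner.backend)  # [] none
```

The failed load is memoized like a successful one, so a missing backend costs one import attempt per process rather than one per clip.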
Optional Wikidata linking: cautious by design
“Barack Obama” and “President Obama” are the same person; “Acme Corp” and “ACME” the same company. To recognize an entity across spelling variants, Phase 2 can resolve a surface name to a stable Wikidata QID — but this is the one place that touches the network, so it's wrapped in caution:
- Off by default, network-gated. Controlled by `ENTITY_LINKING_ENABLED`. When off, linking is a no-op returning `None`, so extraction, search, and the graph all work unchanged. Linking only ever adds a QID.
- Keyless and stdlib-only. It calls the public Wikidata `wbsearchentities` API over `urllib`, with no token and no new dependency. A short timeout and a blanket `except` mean a slow, absent, or rate-limited Wikidata never blocks or breaks indexing; it just yields `None`.
- Disk-cached, hits and misses alike. Every lookup is memoized to a JSON file in the data dir, so a re-index or a second mention costs nothing and we never ask Wikidata the same question twice. The cache is the rate-limit shield.
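A stdlib-only sketch of a linker under those three rules. The `wbsearchentities` request and its `search[0].id` response field follow the public Wikidata API; `WikidataLinker`, the cache file name, and the User-Agent string are hypothetical, and the `enabled` flag stands in for `ENTITY_LINKING_ENABLED` (the post doesn't say whether that's an environment variable or a config key). One judgment call in this sketch: genuine misses are cached as `None`, but network errors are not, so a transient outage never becomes a permanent miss.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Callable, Dict, Optional

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def fetch_qid(name: str, timeout: float = 3.0) -> Optional[str]:
    """Ask wbsearchentities for the best match; None when there is no hit."""
    query = urllib.parse.urlencode({
        "action": "wbsearchentities", "search": name,
        "language": "en", "format": "json", "limit": 1,
    })
    # Wikimedia asks clients to send a descriptive User-Agent (value is a placeholder).
    req = urllib.request.Request(f"{WIKIDATA_API}?{query}",
                                 headers={"User-Agent": "entity-linking-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        hits = json.load(resp).get("search") or []
    return hits[0]["id"] if hits else None


class WikidataLinker:
    def __init__(self, cache_path: str, enabled: bool,
                 fetch: Callable[[str], Optional[str]] = fetch_qid) -> None:
        self.cache_path = cache_path
        self.enabled = enabled  # stands in for ENTITY_LINKING_ENABLED
        self.fetch = fetch
        self.cache: Dict[str, Optional[str]] = {}
        try:
            with open(cache_path, encoding="utf-8") as f:
                self.cache = json.load(f)
        except (OSError, ValueError):  # no cache yet, or unreadable: start empty
            self.cache = {}

    def link(self, name: str) -> Optional[str]:
        if not self.enabled:
            return None  # off: a no-op, nothing downstream changes
        key = name.strip().lower()
        if key in self.cache:
            return self.cache[key]  # cached hit, or cached miss (None)
        try:
            qid = self.fetch(name)
        except Exception:  # slow, absent, or rate-limited: yield None, cache nothing
            return None
        self.cache[key] = qid  # hits and genuine misses are both remembered
        self._save()
        return qid

    def _save(self) -> None:
        tmp = self.cache_path + ".tmp"
        try:
            with open(tmp, "w", encoding="utf-8") as f:
                json.dump(self.cache, f)
            os.replace(tmp, self.cache_path)  # atomic swap, no half-written cache
        except OSError:
            pass  # a cache write failure must not break indexing either
```

Injecting `fetch` is what makes the offline paths (disabled, cached, network down) easy to exercise without touching Wikidata.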
This is metadata enrichment, not a retrieval dependency. Search works the same whether or not a QID was ever found — linking just lets the same real-world entity dedup across variants and, later, carry a description and type.
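Dedup by QID then reduces to grouping. In this hypothetical helper, mentions that resolved to a QID merge under it, and unlinked mentions fall back to their lowercased surface form, so they still group with exact case variants but not with aliases. The input pairs are hand-written; `Q76` is Wikidata's identifier for Barack Obama.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def group_mentions(mentions: Iterable[Tuple[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Group surface names by QID; unlinked names group by lowercased text."""
    groups: Dict[str, List[str]] = {}
    for surface, qid in mentions:
        key = qid if qid else "name:" + surface.strip().lower()
        variants = groups.setdefault(key, [])
        if surface not in variants:
            variants.append(surface)
    return groups


mentions = [
    ("Barack Obama", "Q76"),
    ("President Obama", "Q76"),
    ("Acme Corp", None),
    ("ACME CORP", None),
    ("ACME", None),
]
print(group_mentions(mentions))
# {'Q76': ['Barack Obama', 'President Obama'], 'name:acme corp': ['Acme Corp', 'ACME CORP'], 'name:acme': ['ACME']}
```

Without a QID, “ACME” stays apart from “Acme Corp”; that is exactly the gap linking closes, and search results are the same either way.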
The seam is the point
If there's one idea to take from this, it's the seam. Phase 1 shipped a useful, keyless entity facet on day one. Phase 2 made it open-vocabulary without rewriting anything downstream, because the contract — “give me the entities in this text” — never changed. That's how MediaFind ships fast and stays private at the same time: a strong keyless default, and a heavier model that slots in behind the same interface for anyone who opts in.
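One way to express that contract in code, as a sketch: a `Protocol` with a single `extract(text)` method, and a composite that runs the gazetteer first and the model second behind that same method. The span tuple, `CompositeExtractor`, and the merge policy (earlier backends win on overlapping spans, so linked gazetteer hits take priority over model guesses) are assumptions for illustration.

```python
from typing import List, Protocol, Tuple

Span = Tuple[str, str, int, int]  # (type, text, start, end), an assumed shape


class EntityExtractor(Protocol):
    """The whole contract: give me the entities in this text."""

    def extract(self, text: str) -> List[Span]: ...


class CompositeExtractor:
    """Runs backends in order behind the same extract() and merges their spans."""

    def __init__(self, *backends: EntityExtractor) -> None:
        self.backends = backends

    def extract(self, text: str) -> List[Span]:
        kept: List[Span] = []
        for backend in self.backends:
            for span in backend.extract(text):
                start, end = span[2], span[3]
                # Keep a span only if it overlaps nothing already kept,
                # so earlier backends (the gazetteer) win over later ones.
                if all(end <= s or start >= e for _, _, s, e in kept):
                    kept.append(span)
        return sorted(kept, key=lambda sp: sp[2])


class FixedSpans:
    """Stand-in backend that returns canned spans, for demonstration."""

    def __init__(self, spans: List[Span]) -> None:
        self.spans = spans

    def extract(self, text: str) -> List[Span]:
        return list(self.spans)


text = "Jane Doe visited Lisbon."
gazetteer = FixedSpans([("PERSON", "Jane Doe", 0, 8)])
model = FixedSpans([("PERSON", "Jane", 0, 4), ("LOC", "Lisbon", 17, 23)])
print(CompositeExtractor(gazetteer).extract(text))
# [('PERSON', 'Jane Doe', 0, 8)]
print(CompositeExtractor(gazetteer, model).extract(text))
# [('PERSON', 'Jane Doe', 0, 8), ('LOC', 'Lisbon', 17, 23)]
```

Callers see the same method and the same span shape with one backend or two, which is the property that let Phase 2 land without touching anything downstream.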
Names are the exact-recall end of search; meaning is the fuzzy end. Run both as separate channels and blend them, and you get a library you can interrogate either way — “the part about budget cuts” and “every clip that names Acme.”
Search your library by name and by meaning
Free trial. No account, no API keys, nothing uploaded.