Two ways to relate files: shared signals vs. semantic relatedness
MediaFind's knowledge map has a toggle most people miss. It can link two files because they literally share something — a speaker, a brand, a topic word — or because their summaries mean the same thing. Those are two different questions, and the answers rarely match. Here's what each one does, when it wins, and why both run entirely on your Mac.
Open the knowledge map and you get a graph of your library: each recording is a node, with lines between the ones that go together. But “go together” isn't one thing. A toggle in the corner of the map switches between two definitions of related, shared signals and semantic relatedness, and they're built from completely different machinery. Knowing which is which turns the map from a pretty picture into a tool.
Shared signals: “these files literally have something in common”
The default mode is lexical. Two files are linked when they share a concrete, already-extracted attribute — and the edge can tell you exactly which one. MediaFind looks at five channels:
- Category — both files fall under the same high-level category.
- Keyword — both surface the same salient word from their summaries (with generic, everywhere-words pruned so “the” doesn't link your whole library).
- Speaker — the same diarized voice appears in both.
- Person — the same clustered face appears in both (opt-in).
- Entity — both mention the same named person, place, or brand in the transcript or on-screen text.
Each shared item adds to the edge weight, and the channels aren't equal: a shared entity counts most, a shared speaker or person next, a shared category less, and a shared keyword least. Two files that share a named brand and a speaker draw a heavier line than two that merely land in the same category. Because every edge carries the items that created it, the map can say why: “linked by Acme, Berlin, and Alice.” It's set overlap — exact, auditable, and impossible to hallucinate.
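To make the weighting concrete, here is a minimal sketch of set-overlap edges in Python. It is an illustration, not MediaFind's actual code: the numeric weights, the function name `shared_signal_edges`, and the sample file names are all assumptions. Only the ordering of the channels (entity, then speaker and person, then category, then keyword) comes from the description above.

```python
from itertools import combinations

# Hypothetical weights. Only their ordering (entity > speaker = person >
# category > keyword) is taken from the text; the numbers are made up.
CHANNEL_WEIGHTS = {
    "entity": 4.0,
    "speaker": 3.0,
    "person": 3.0,
    "category": 2.0,
    "keyword": 1.0,
}


def shared_signal_edges(files):
    """Return {(id_a, id_b): (weight, reasons)} for every pair with overlap.

    `files` maps a file id to {channel: set of already-extracted items}.
    """
    edges = {}
    for a, b in combinations(sorted(files), 2):
        weight = 0.0
        reasons = []
        for channel, channel_weight in CHANNEL_WEIGHTS.items():
            shared = files[a].get(channel, set()) & files[b].get(channel, set())
            weight += channel_weight * len(shared)  # each shared item adds weight
            reasons.extend((channel, item) for item in sorted(shared))
        if reasons:  # no overlap in any channel means no edge
            edges[(a, b)] = (weight, reasons)
    return edges


# Made-up sample library for illustration.
library = {
    "kickoff.mov": {"entity": {"Acme", "Berlin"}, "speaker": {"Alice"}, "category": {"Meetings"}},
    "followup.m4a": {"entity": {"Acme", "Berlin"}, "speaker": {"Alice"}, "category": {"Interviews"}},
    "standup.mov": {"category": {"Meetings"}},
}
edges = shared_signal_edges(library)
```

With this sample data, the pair that shares two entities and a speaker gets a much heavier edge than the pair that only shares a category, and each edge keeps the list of items that produced it, which is what lets the map explain itself.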
Semantic relatedness: “these files mean the same thing”
Flip the toggle and the rules change entirely. Now an edge appears when two files' summary embeddings are close in vector space — cosine similarity above a threshold. This reuses the per-file summary vectors MediaFind already stored during indexing; it embeds nothing new and downloads nothing at graph-build time. Run connected-components over the resulting graph and you get rough topic clusters for free.
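The same idea can be sketched in a few lines of Python. Treat it as a rough illustration under stated assumptions: the 0.8 threshold, the three-dimensional toy vectors, and the name `semantic_clusters` are hypothetical, and real summary embeddings would have far more dimensions. The connected-components step uses a small union-find.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def semantic_clusters(vectors, threshold=0.8):
    """Link files whose similarity exceeds the threshold, then group them
    into connected components. The threshold value is an assumption."""
    parent = {name: name for name in vectors}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    edges = []
    for a, b in combinations(sorted(vectors), 2):
        if cosine(vectors[a], vectors[b]) > threshold:
            edges.append((a, b))
            parent[find(a)] = find(b)  # merge the two components

    clusters = {}
    for name in sorted(vectors):
        clusters.setdefault(find(name), []).append(name)
    return edges, sorted(clusters.values())


# Toy stand-ins for stored summary embeddings.
stored = {
    "earnings_call.mov": [0.9, 0.1, 0.0],
    "q3_numbers.m4a": [0.8, 0.2, 0.1],
    "hiking_trip.mov": [0.0, 0.1, 0.9],
}
edges, clusters = semantic_clusters(stored)
```

Here the two finance clips end up linked and in one cluster while the hiking clip stays on its own. Nothing is embedded in this step: the function only compares vectors that already exist.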
The payoff is the thing lexical overlap can't do: it links files that are about the same subject even when they share no words. A clip that talks about “quarterly earnings” and one about “the Q3 numbers” can have no keyword, category, speaker, person, or entity in common, in which case the lexical map leaves them unconnected. In embedding space they are likely to sit close together, so the semantic map draws the line.
Which one is right?
Neither — they answer different questions, and the gap between them is the point.
- Shared signals is precise and explainable. An edge means a fact you can name and verify. It will never invent a connection. But it's blind to paraphrase: say it a different way and the link disappears.
- Semantic relatedness bridges vocabulary. It connects topics across different words and surfaces clusters you didn't know were there. The cost: it can only tell you “the summaries are similar,” not exactly why — and occasionally it calls two things close that you'd consider unrelated.
A useful rule of thumb: reach for shared signals when you want to trace concrete threads — every recording a person was in, everything that mentions a brand. Reach for semantic relatedness when you're exploring — “what else is about this?” — and don't yet know the right keyword to ask for.
Both, on your Mac, from data you already have
The thing the two modes do share is the part that matters most: neither leaves the device, and neither needs new indexing work. Lexical edges read the categories, keywords, speakers, faces, and entities MediaFind already extracted. Semantic edges read the summary embeddings it already stored. No model is downloaded, no new embedding is computed, and not one byte is uploaded when you build either map. It's two questions asked of data you already own, and the answers stay yours.
So next time the knowledge map looks a little sparse, or a little surprising, check the toggle. You might just be asking it the other question.
See your library both ways
The knowledge map and “more like this” come from the same on-device indexing. Free trial.