
Two ways to relate files: shared signals vs. semantic relatedness

MediaFind's knowledge map has a toggle most people miss. It can link two files because they literally share something — a speaker, a brand, a topic word — or because their summaries mean the same thing. Those are two different questions, and the answers rarely match. Here's what each one does, when it wins, and why both run entirely on your Mac.

Open the knowledge map and you get a graph of your library: each recording is a node, with lines between the ones that go together. But “go together” isn't one thing. A toggle in the corner of the map switches between two definitions of “related” (shared signals and semantic relatedness), and the two are built from completely different machinery. Knowing which is which turns the map from a pretty picture into a tool.

Shared signals: “these files literally have something in common”

The default mode is lexical. Two files are linked when they share a concrete, already-extracted attribute, and the edge can tell you exactly which one. MediaFind looks at five channels: category, keyword, speaker, person, and entity.

Each shared item adds to the edge weight, and the channels aren't equal: a shared entity counts most, a shared speaker or person next, a shared category less, and a shared keyword least. Two files that share a named brand and a speaker draw a heavier line than two that merely land in the same category. Because every edge carries the items that created it, the map can say why: “linked by Acme, Berlin, and Alice.” It's set overlap — exact, auditable, and impossible to hallucinate.
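Here is a minimal sketch of weighted set overlap in Python. It assumes each file's extracted attributes are available as one set of strings per channel. The names (`FileSignals`, `lexical_edge`, `explain`) and the numeric weights are hypothetical. The post only fixes the ordering of the channels, not their values.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from itertools import combinations

# Hypothetical weights: the post fixes only the ordering
# (entity > speaker = person > category > keyword), not these numbers.
CHANNEL_WEIGHTS = {
    "entity": 3.0,
    "speaker": 2.0,
    "person": 2.0,
    "category": 1.0,
    "keyword": 0.5,
}


@dataclass
class FileSignals:
    """Attributes already extracted at indexing time, one set per channel."""
    name: str
    signals: dict[str, set[str]] = field(default_factory=dict)


@dataclass
class LexicalEdge:
    a: str
    b: str
    weight: float
    reasons: list[tuple[str, str]]  # (channel, shared item) pairs behind the edge


def lexical_edge(fa: FileSignals, fb: FileSignals) -> LexicalEdge | None:
    """Exact set overlap per channel; returns None when nothing is shared."""
    weight = 0.0
    reasons: list[tuple[str, str]] = []
    for channel, w in CHANNEL_WEIGHTS.items():  # heaviest channels first
        shared = fa.signals.get(channel, set()) & fb.signals.get(channel, set())
        for item in sorted(shared):
            weight += w  # each shared item adds its channel's weight
            reasons.append((channel, item))
    return LexicalEdge(fa.name, fb.name, weight, reasons) if reasons else None


def build_lexical_graph(files: list[FileSignals]) -> list[LexicalEdge]:
    """Compare every pair of files; keep only pairs with at least one shared item."""
    return [e for fa, fb in combinations(files, 2)
            if (e := lexical_edge(fa, fb)) is not None]


def explain(edge: LexicalEdge) -> str:
    """Render the items that created the edge, e.g. 'linked by A, B, and C'."""
    items = [item for _, item in edge.reasons]
    if len(items) == 1:
        return f"linked by {items[0]}"
    sep = ", and " if len(items) > 2 else " and "
    return "linked by " + ", ".join(items[:-1]) + sep + items[-1]


standup = FileSignals("standup.m4a", {"entity": {"Acme", "Berlin"},
                                      "speaker": {"Alice"}, "keyword": {"roadmap"}})
offsite = FileSignals("offsite.mov", {"entity": {"Acme", "Berlin"},
                                      "speaker": {"Alice"}, "category": {"Meetings"}})
edge = lexical_edge(standup, offsite)
print(edge.weight, explain(edge))  # 8.0 linked by Acme, Berlin, and Alice
```

Keeping the (channel, item) pairs on the edge is what makes the explanation cheap. The reasons are the edge's own inputs, not a separate guess made afterward.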

Semantic relatedness: “these files mean the same thing”

Flip the toggle and the rules change entirely. Now an edge appears when the summary embeddings of two files are close in vector space, meaning their cosine similarity clears a threshold. This reuses the per-file summary vectors MediaFind already stored during indexing; it embeds nothing new and downloads nothing at graph-build time. Run a connected-components pass over the resulting graph and you get rough topic clusters for free.
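Here is a minimal sketch of the semantic mode. It assumes the stored summary vectors are available as plain lists of floats keyed by file name. The sketch normalizes each vector once, keeps the pairs whose cosine reaches the cutoff, and then uses union-find to get the connected components. The function names and toy vectors are hypothetical. The 0.35 threshold comes from the example figure in this post and may not be the value MediaFind actually uses.

```python
from __future__ import annotations

import math
from itertools import combinations

# Illustrative cutoff borrowed from this post's figure; the shipping value may differ.
SIMILARITY_THRESHOLD = 0.35


def normalize(v):
    """Scale to unit length so a plain dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def semantic_edges(vectors, threshold=SIMILARITY_THRESHOLD):
    """vectors maps file name -> its stored summary embedding; nothing new is embedded."""
    unit = {name: normalize(v) for name, v in vectors.items()}
    edges = []
    for a, b in combinations(sorted(unit), 2):
        sim = sum(x * y for x, y in zip(unit[a], unit[b]))
        if sim >= threshold:
            edges.append((a, b, sim))
    return edges


def connected_components(nodes, edges):
    """Union-find over the edge list; each component is a rough topic cluster."""
    nodes = list(nodes)
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving keeps trees shallow
            n = parent[n]
        return n

    for a, b, _ in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), []).append(n)
    return sorted(sorted(c) for c in clusters.values())


# Toy 3-dimensional "summary embeddings" (hypothetical values).
vectors = {
    "earnings.m4a": [0.9, 0.1, 0.0],
    "q3.mov": [0.8, 0.3, 0.1],
    "hiking.mp4": [0.0, 0.2, 0.95],
}
edges = semantic_edges(vectors)
print(connected_components(vectors, edges))  # [['earnings.m4a', 'q3.mov'], ['hiking.mp4']]
```

The sketch compares every pair of files. That is simple to follow, but the cost grows with the square of the library size.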

The payoff is the thing lexical overlap can't do: it links files that are about the same subject even when they share no words. A clip that talks about “quarterly earnings” and one about “the Q3 numbers” have no keyword, category, speaker, or entity in common — the lexical map leaves them unconnected. In embedding space they sit right next to each other, so the semantic map draws the line.

Figure: file A (“our quarterly earnings…”) and file B (“the Q3 numbers came in…”). Shared signals, exact overlap across category, keyword, speaker, person, and entity: no item in common, no edge. Semantic relatedness, cosine of summaries: cos(vec A, vec B) = 0.71 ≥ 0.35, edge.
The same two files, two verdicts. Shared signals finds no overlapping category, keyword, speaker, or entity — so no line. Semantic relatedness sees their summary vectors sit close together and draws the edge. Different words, same meaning.

Which one is right?

Neither — they answer different questions, and the gap between them is the point.

A useful rule of thumb: reach for shared signals when you want to trace concrete threads — every recording a person was in, everything that mentions a brand. Reach for semantic relatedness when you're exploring — “what else is about this?” — and don't yet know the right keyword to ask for.

The same cosine, reused. Semantic relatedness isn't a one-off. The exact normalization, dimension-grouping, and similarity threshold that draw the map's semantic edges also power the “more like this” button — seed it with one file and get its closest neighbors. They share code on purpose, so “related” on the map and “related” in search agree edge-for-edge.
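The sketch below makes “agree edge-for-edge” concrete. The map's edge test and a “more like this” query both call one shared scoring function. All names (`similarity`, `is_map_edge`, `more_like_this`) and the `k=5` default are hypothetical, and the 0.35 cutoff is borrowed from the figure. This post doesn't describe the dimension-grouping step, so the sketch leaves it out.

```python
import heapq
import math

# Illustrative cutoff, the same one the map sketch uses; the real value may differ.
SIMILARITY_THRESHOLD = 0.35


def normalize(v):
    """Unit-length copy of v (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def similarity(u, v):
    """The one scoring function both features call: cosine of normalized vectors."""
    return sum(x * y for x, y in zip(normalize(u), normalize(v)))


def is_map_edge(u, v, threshold=SIMILARITY_THRESHOLD):
    """Knowledge-map side: draw a semantic edge when the score clears the cutoff."""
    return similarity(u, v) >= threshold


def more_like_this(seed, vectors, k=5, threshold=SIMILARITY_THRESHOLD):
    """Search side: the k closest files to one seed, filtered by the same cutoff."""
    seed_vec = vectors[seed]
    scored = [
        (similarity(seed_vec, vec), name)
        for name, vec in vectors.items()
        if name != seed
    ]
    hits = [(s, n) for s, n in scored if s >= threshold]
    return [(n, s) for s, n in heapq.nlargest(k, hits)]


# Toy summary embeddings (hypothetical values).
vectors = {
    "earnings.m4a": [0.9, 0.1, 0.0],
    "q3.mov": [0.8, 0.3, 0.1],
    "hiking.mp4": [0.0, 0.2, 0.95],
}
print([name for name, _ in more_like_this("earnings.m4a", vectors)])  # ['q3.mov']
```

Both paths go through `similarity()` and the same threshold, so every file the button returns is also a neighbor on the semantic map. Changing the cutoff in one place changes both features.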

Both, on your Mac, from data you already have

The thing the two modes do share is the part that matters most: neither leaves the device, and neither needs new model work. Lexical edges read the categories, keywords, speakers, faces, and entities MediaFind already extracted. Semantic edges read the summary embeddings it already stored. No model is downloaded, no new embedding is computed, and not one byte is uploaded when you build either map. It's two questions asked of data you already own, and the answers stay yours.


So next time the knowledge map looks a little sparse, or a little surprising, check the toggle. You might just be asking it the other question.

See your library both ways

The knowledge map and “more like this” come from the same on-device indexing. Free trial.
