
One moment, five channels, one result: deduping search with rank fusion

Type one query and MediaFind asks it of a dozen different search channels at once. That's the point — but it means the same instant in the same file can come back from several of them: the visual channel sees the frame, OCR reads the caption burned into it, the logo channel spots a brand in the corner. Show all three and you've got one moment masquerading as three results. Here's how MediaFind collapses them into one — and why it merges by rank, not score.

MediaFind's search bar fans a single query out across many channels: transcript, semantic, visual/CLIP, OCR, logos, actions, scenes, faces, entities, and more. Each one returns its own ranked list of hits. Run them all and you get broad recall: whatever modality your memory of a moment lives in, some channel finds it. (We wrote about the channels themselves in “One search bar, fourteen channels.”)

The catch is overlap. A single video frame is fair game for the visual channel, the OCR channel, the logo channel, the action channel and the scene channel simultaneously. A single spoken sentence can surface from both the transcript channel and the phonetic channel. Concatenate the lists and the same moment appears two, three, five times — pushing genuinely different results off the first page. So before anything reaches your screen, the lists are fused: duplicates of the same moment merge into one result. Two questions decide how.

1. When are two hits “the same moment”?

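Before you can merge duplicates you need a definition of duplicate, meaning an identity that's stable no matter which channel produced the hit. MediaFind computes a fusion key per hit, and it takes one of three shapes depending on what the hit points at: a frame key such as ("frame", file, 142) when the hit is anchored to a single frame of a file; a clip key when it is anchored to a span of time in a file, such as a spoken sentence; and, when the hit has no physical anchor at all, a key unique to that one hit.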

The first two are the whole game: they let a frame found five ways, or a clip found two ways, resolve to one key. The third is a deliberate safety floor — when there's no physical anchor to agree on, hits stay separate rather than risk a wrong merge.
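A minimal sketch of the three key shapes, in Python. The `Hit` fields (`file`, `frame`, `start`, `end`) are hypothetical rather than MediaFind's actual schema. The clip key assumes that channels reporting the same sentence, such as transcript and phonetic, agree on its start and end times.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Hit:
    """Hypothetical hit shape, for illustration only."""
    channel: str                   # e.g. "visual", "ocr", "transcript"
    rank: int                      # 0-based position in its channel's list
    file: Optional[str] = None
    frame: Optional[int] = None    # set when the hit points at one frame
    start: Optional[float] = None  # set when the hit points at a time span (s)
    end: Optional[float] = None


def fusion_key(hit: Hit) -> Tuple:
    """Identity for 'the same moment', independent of which channel found it."""
    if hit.file is not None and hit.frame is not None:
        # A specific frame: every channel that looked at it agrees on this key.
        return ("frame", hit.file, hit.frame)
    if hit.file is not None and hit.start is not None and hit.end is not None:
        # A time span, rounded to the millisecond so tiny float differences
        # don't split one clip into two keys.
        return ("clip", hit.file, round(hit.start, 3), round(hit.end, 3))
    # No physical anchor: a key unique to this object, so it never merges.
    return ("hit", id(hit))
```

The fallback deliberately errs toward under-merging. Two unanchored hits that happen to look alike still get different keys.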

2. How do you rank the merged list?

Here's the trap. Each channel scores its hits on its own scale — a CLIP cosine similarity, a BM25-style transcript score, a logo-margin number — and those scales are not comparable. You cannot just take the max score across channels and sort by it; you'd be comparing a temperature to a shoe size.

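So MediaFind sorts by rank, not score, using Reciprocal Rank Fusion (RRF). Every hit contributes 1 / (K + rank) from each list it appears in, and those contributions sum. Two consequences fall out of that one line. First, raw scores never have to be compared or normalized across channels, because only a hit's position in its own list enters the formula. Second, a moment that several channels return collects several contributions, so corroboration lifts it above a moment only one channel found.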

When two or more hits share a fusion key, the one that ranked best in its own channel becomes the representative: the card you actually see, and the modality badge it wears. The other hits' RRF contributions still count toward the merged score; they just don't drive the display.
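Putting the score and the representative together, a sketch of the whole fusion step might look like this. The `Hit` and `FusedResult` types are illustrative, and K = 60 is again an assumption. Breaking rank ties in favor of the first hit seen is a choice made here for the sketch, not something the post specifies.

```python
from collections import defaultdict
from collections.abc import Hashable
from dataclasses import dataclass
from typing import Dict, List

K = 60  # ASSUMPTION: common RRF default; the post doesn't state its K


@dataclass
class Hit:
    key: Hashable  # fusion key, already computed for this hit
    channel: str   # channel that produced the hit, e.g. "visual"
    rank: int      # 0-based position in that channel's own list


@dataclass
class FusedResult:
    representative: Hit  # the hit whose card and badge are displayed
    score: float         # summed RRF contributions of every hit in the group


def fuse(hits: List[Hit], k: int = K) -> List[FusedResult]:
    # Group hits that describe the same moment.
    groups: Dict[Hashable, List[Hit]] = defaultdict(list)
    for hit in hits:
        groups[hit.key].append(hit)

    results = []
    for group in groups.values():
        # Every hit in the group counts toward the score...
        score = sum(1.0 / (k + h.rank) for h in group)
        # ...but only the best-ranked one is shown (ties: first seen wins).
        representative = min(group, key=lambda h: h.rank)
        results.append(FusedResult(representative, score))

    # Highest fused score first.
    results.sort(key=lambda r: r.score, reverse=True)
    return results
```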

[Diagram: visual (rank 0), OCR (rank 3) and logo (rank 1) each return frame 142 of the same file, so all three share the key ("frame", file, 142). Score = 1/(K+0) + 1/(K+1) + 1/(K+3). Representative = visual (best rank), carrying ocr_text and brand; matched in 3 channels.]
One frame, three channels, one result. The hits share a fusion key, so their RRF contributions sum (corroboration lifts the moment up the list). The best-ranked hit — visual — becomes the card you see, while the merged result still carries OCR's on-screen text and the logo channel's brand, and wears a badge showing all three agreed.

Don't throw away the other channels' work

Picking one representative would be lossy if it stopped there. The visual channel knows the frame is visually relevant, but it's the OCR channel that read the caption and the logo channel that named the brand. Drop their contributions and the result card would get thinner the more channels agreed, which is exactly backwards.

So merging is additive. As hits fold together, the representative inherits any supplementary field it's missing from the others — on-screen text, detected brand, action and scene labels, object lists, thumbnails, the person's name, and so on. The rule is simple: a value already present on the representative always wins; the merge only ever fills gaps, never overwrites. The result also records which channels matched it, so the UI can show a “matched in N channels” badge — but only when more than one actually agreed, so a plain single-channel hit stays plain.
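A sketch of the fill-gaps-only merge and the badge rule follows. It assumes each hit carries its supplementary fields in a plain dictionary. The field names in the example, such as `ocr_text` and `brand`, are illustrative, as are the `Result`, `merge_group` and `badge` names. Hits fold in best-rank-first, so when two non-representative hits both have a value for the same field, the better-ranked one fills the gap.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Result:
    channel: str
    rank: int  # 0-based rank within its own channel
    # Supplementary fields a channel may or may not provide (names hypothetical).
    extras: Dict[str, object] = field(default_factory=dict)
    matched_channels: List[str] = field(default_factory=list)


def merge_group(group: List[Result]) -> Result:
    """Fold non-empty hits sharing one fusion key into their best-ranked one."""
    ordered = sorted(group, key=lambda r: r.rank)
    rep = ordered[0]
    # Copy so the original representative hit is not mutated.
    merged = Result(rep.channel, rep.rank, dict(rep.extras))
    for other in ordered:
        for name, value in other.extras.items():
            # Fill gaps only: a value already present always wins.
            if value is not None and merged.extras.get(name) is None:
                merged.extras[name] = value
        if other.channel not in merged.matched_channels:
            merged.matched_channels.append(other.channel)
    return merged


def badge(result: Result) -> Optional[str]:
    """Show the badge only when more than one channel actually agreed."""
    n = len(result.matched_channels)
    return f"matched in {n} channels" if n > 1 else None
```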

Why rank beats score, in brief. Channel scores measure different things in different units, so they can't be compared directly, but “came first in its list” means the same thing everywhere. Reciprocal rank fusion is what lets a dozen incomparable rankings vote on one merged order.
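To see why a per-channel rank is the safer currency, compare a naive max-score merge with RRF when one channel's scores are simply rescaled. The score values below are invented for illustration, with cosine-like numbers for visual and BM25-like ones for transcript. K = 60 is assumed.

```python
from typing import Dict, List

K = 60  # ASSUMPTION: common RRF default; the post doesn't state its K


def order_by_max_score(channels: Dict[str, Dict[str, float]]) -> List[str]:
    """Naive merge: each moment keeps its best raw score from any channel."""
    best: Dict[str, float] = {}
    for scores in channels.values():
        for key, s in scores.items():
            best[key] = max(best.get(key, float("-inf")), s)
    return sorted(best, key=best.get, reverse=True)


def order_by_rrf(channels: Dict[str, Dict[str, float]], k: int = K) -> List[str]:
    """RRF: rank each channel by its own scores, then sum 1 / (k + rank)."""
    fused: Dict[str, float] = {}
    for scores in channels.values():
        ranked = sorted(scores, key=scores.get, reverse=True)
        for rank, key in enumerate(ranked):
            fused[key] = fused.get(key, 0.0) + 1.0 / (k + rank)
    # sorted() is stable, so equal fused scores keep first-seen order.
    return sorted(fused, key=fused.get, reverse=True)


channels = {
    "visual": {"a": 0.31, "b": 0.29},      # cosine-like scale
    "transcript": {"c": 12.4, "d": 9.8},   # BM25-like scale
}
# Same visual ranking, just multiplied by 100.
scaled = {
    "visual": {key: s * 100 for key, s in channels["visual"].items()},
    "transcript": dict(channels["transcript"]),
}
```

Multiplying the visual scores by 100 flips the max-score order, putting both visual hits above the transcript hits. The RRF order doesn't move, because rescaling a channel never changes who came first in it.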

All of it, on your Mac

None of this is a re-ranking service or a cloud call. The channels run locally over the index MediaFind already built; the fusion is a few dozen lines of arithmetic over their ranked lists. Your query, the per-channel scores, and the merged results never leave the device — fusion is just bookkeeping on data you already own.


So when a single result quietly tells you it “matched in 3 channels,” that's the fusion step showing its work: several independent views of your library agreed on the same moment, and you got it once.
