One moment, five channels, one result: dedup by rank fusion

MediaFind's search bar fans a single query out across many channels: transcript, semantic, visual/CLIP, OCR, logos, actions, scenes, faces, entities, and more. Each one returns its own ranked list of hits. Run them all and you get broad recall: whatever modality your memory of a moment lives in, some channel finds it. (We wrote about the channels themselves in "One search bar, fourteen channels".)

The catch is overlap. A single video frame is fair game for the visual channel, the OCR channel, the logo channel, the action channel and the scene channel simultaneously. A single spoken sentence can surface from both the transcript channel and the phonetic channel. Concatenate the lists and the same moment appears two, three, five times — pushing genuinely different results off the first page. So before anything reaches your screen, the lists are fused: duplicates of the same moment merge into one result. Two questions decide how.

1. When are two hits “the same moment”?

Before you can merge duplicates you need a definition of duplicate — an identity that's stable no matter which channel produced the hit. MediaFind computes a fusion key per hit, and it takes one of three shapes depending on what the hit points at:

A frame. The visual, OCR, logo, action, scene, object, and color channels all point at a specific video frame, so they key on (media_path, frame_index). Same file, same frame number → same moment, regardless of which channel got there.
A transcript clip. The transcript, emotion, and phonetic channels point at a span of speech, so they key on (media_path, start, end) with the timestamps rounded to two decimals — close enough that the same sentence from two channels lands on the same key.
Everything else. File-level summaries, plus face and transcript-sourced entity appearances that only carry a synthesized id, keep their own per-channel (modality, id) identity — so unrelated hits never accidentally collapse together.

The first two are the whole game: they let a frame found five ways, or a clip found two ways, resolve to one key. The third is a deliberate safety floor — when there's no physical anchor to agree on, hits stay separate rather than risk a wrong merge.
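To make the three shapes concrete, here is a minimal sketch of a fusion-key function. It is an illustration, not MediaFind's actual code: the channel name strings, the dictionary layout of a hit, and the "frame" / "clip" / "other" tags are all assumptions made for the example.

```python
from typing import Any, Dict, Tuple

# Channel groupings follow the text above; the exact strings are assumed.
FRAME_CHANNELS = {"visual", "ocr", "logo", "action", "scene", "object", "color"}
CLIP_CHANNELS = {"transcript", "emotion", "phonetic"}


def fusion_key(hit: Dict[str, Any]) -> Tuple:
    """Return an identity for a hit that does not depend on its channel."""
    modality = hit["modality"]
    if modality in FRAME_CHANNELS:
        # Same file + same frame number -> same moment.
        return ("frame", hit["media_path"], hit["frame_index"])
    if modality in CLIP_CHANNELS:
        # Round timestamps to two decimals so near-identical spans agree.
        return ("clip", hit["media_path"],
                round(hit["start"], 2), round(hit["end"], 2))
    # No physical anchor: keep a per-channel identity so nothing merges by accident.
    return ("other", modality, hit["id"])
```

A visual hit and an OCR hit on frame 120 of the same file produce the same key, while a face hit and an entity hit that happen to share an id do not. One caveat of plain rounding: two timestamps that straddle a rounding boundary (say 12.344 and 12.346) would still land on different keys.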

2. How do you rank the merged list?

Here's the trap. Each channel scores its hits on its own scale — a CLIP cosine similarity, a BM25-style transcript score, a logo-margin number — and those scales are not comparable. You cannot just take the max score across channels and sort by it; you'd be comparing a temperature to a shoe size.

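So MediaFind sorts by rank, not score, using Reciprocal Rank Fusion (RRF). Every hit contributes 1 / (K + rank) from each list it appears in, and those contributions sum. Here rank is the hit's position in that channel's list, and K is a fixed constant that keeps a single first-place finish from dominating the sum. Two consequences fall out of that one formula: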

Corroboration is rewarded. A frame that ranked well in three channels accumulates three contributions and floats to the top — exactly the moment most likely to be what you meant. Agreement across channels is evidence, and RRF spends it.
Scales never collide. Only the position in each list matters, so a channel with tiny scores and one with huge scores combine cleanly. No normalization, no per-channel tuning.
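The arithmetic is small enough to sketch. The example below assumes K = 60 (a value commonly used for RRF; this article does not say what MediaFind uses), assumes ranks start at 1, and takes each channel's list as fusion keys already sorted best-first. The function and variable names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable, List

K = 60  # assumed constant; MediaFind's actual value is not stated


def rrf_scores(ranked_lists: Dict[str, List[Hashable]],
               k: int = K) -> Dict[Hashable, float]:
    """Sum 1 / (k + rank) over every list in which a fusion key appears."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for keys in ranked_lists.values():
        # Ranks are 1-based: the top hit of a list has rank 1.
        for rank, key in enumerate(keys, start=1):
            scores[key] += 1.0 / (k + rank)
    return dict(scores)


lists = {
    "visual": ["A", "B"],
    "ocr": ["C", "A"],
    "logo": ["A"],
}
scores = rrf_scores(lists)
order = sorted(scores, key=scores.get, reverse=True)
print(order)  # ['A', 'C', 'B']
```

Key A never sees a raw channel score, yet it wins because three lists vouch for it: 1/61 + 1/62 + 1/61. C and B each appear once, and C ranks higher only because it came first in its list.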

When two or more hits share a fusion key, the one that ranked best in its own channel becomes the representative: the card you actually see, and the modality badge it wears. The other hits' RRF contributions still count toward the merged score; they just don't drive the display.

One frame, three channels, one result. The hits share a fusion key, so their RRF contributions sum (corroboration lifts the moment up the list). The best-ranked hit — visual — becomes the card you see, while the merged result still carries OCR's on-screen text and the logo channel's brand, and wears a badge showing all three agreed.

Don't throw away the other channels' work

Picking one representative would be lossy if it stopped there. The visual channel knows the frame is visually relevant, but it's the OCR channel that read the caption and the logo channel that named the brand. Drop those and the result card gets thinner the more channels agreed — backwards.

So merging is additive. As hits fold together, the representative inherits any supplementary field it's missing from the others — on-screen text, detected brand, action and scene labels, object lists, thumbnails, the person's name, and so on. The rule is simple: a value already present on the representative always wins; the merge only ever fills gaps, never overwrites. The result also records which channels matched it, so the UI can show a “matched in N channels” badge — but only when more than one actually agreed, so a plain single-channel hit stays plain.
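Here is a rough sketch of that merge for one group of hits sharing a fusion key. The hit layout (a flat dictionary with "modality", "rank", and optional supplementary fields), the field names, and the badge wording are assumptions for illustration only.

```python
from typing import Any, Dict, List, Optional

CORE_FIELDS = {"modality", "rank"}  # never copied between hits


def merge_group(hits: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold hits that share a fusion key into one result card."""
    # The hit that ranked best in its own channel is the representative.
    ordered = sorted(hits, key=lambda h: h["rank"])
    merged = dict(ordered[0])
    for other in ordered[1:]:
        for field, value in other.items():
            if field in CORE_FIELDS or value is None:
                continue
            # Fill gaps only: an existing value is never overwritten.
            if merged.get(field) is None:
                merged[field] = value
    channels: List[str] = []
    for hit in ordered:
        if hit["modality"] not in channels:
            channels.append(hit["modality"])
    merged["matched_channels"] = channels
    # The badge only appears when more than one channel agreed.
    badge: Optional[str] = None
    if len(channels) > 1:
        badge = f"matched in {len(channels)} channels"
    merged["badge"] = badge
    return merged
```

Given a visual hit at rank 1 with its own thumbnail, a logo hit at rank 2 carrying a brand, and an OCR hit at rank 3 carrying on-screen text and a different thumbnail, the merged card stays visual, keeps the visual thumbnail, and picks up the brand and the text.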

Why rank beats score, in one sentence. Channel scores measure different things in different units, so they can't be compared directly — but “came first in its list” means the same thing everywhere. Reciprocal rank fusion is what lets a dozen incomparable rankings vote on one merged order.

All of it, on your Mac

None of this is a re-ranking service or a cloud call. The channels run locally over the index MediaFind already built; the fusion is a few dozen lines of arithmetic over their ranked lists. Your query, the per-channel scores, and the merged results never leave the device — fusion is just bookkeeping on data you already own.

So when a single result quietly tells you it “matched in 3 channels,” that's the fusion step showing its work: several independent views of your library agreed on the same moment, and you got it once.

Search your library every way at once

A dozen channels, one ranked list, all on-device. Free trial.

Download for macOS

Keep reading

One search bar, fourteen channels: how MediaFind finds anything
Search by meaning: embeddings, CLIP and a local vector index
Is the search any good? Measuring quality — and guarding it