
5 ways editors waste hours finding B-roll (and how to stop)

Finding the clip takes longer than cutting it. Here are the five habits that quietly eat your day — and a faster approach for each one.

Ask any editor where their time actually goes and they'll hesitate before answering. Not in the cut. Not in colour. In the hunt — the 20-minute scrub through a folder of 200 clips looking for the one shot of the product on the table, or the moment the interviewee said exactly what the voiceover needs.

Finding footage is the unglamorous tax on every project. The bigger your library, the heavier the tax. These are the five habits that make it worse than it needs to be — and what to do about each one.

Where project time actually goes (estimated per hour of finished video):

Finding clips: 38 min (scrubbing, hunting, guessing) ← this is the fixable one
Assembly cut: 22 min
Color grade: 18 min
Sound mix: 13 min
Export / deliver: 9 min

Illustrative estimates for a typical short-form project. The exact numbers vary — but the shape is consistent: finding footage dominates, and it's the only column that doesn't require craft.

1. Scrubbing by eye with no transcript

The most common approach is also the most expensive: open the file, drag the playhead, watch. It's slow because it's linear. You have to play footage back in roughly real time to find a moment, and if your memory of the clip is wrong, you watch the whole thing before moving on.

The fix is keyword search over transcripts. If your footage has speech, MediaFind transcribes it on your Mac at index time using Whisper. After that, searching for a phrase — "we decided to pivot", "the camera cuts off", "this was in March" — takes a second and returns a timestamp. You jump directly to the line. No scrubbing.
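
To make the idea concrete, here is a minimal Python sketch of phrase search over timestamped transcript segments. It is an illustration only, not MediaFind's implementation: the `Segment` type, the `find_phrase` helper, the file names, and the timestamps are all hypothetical, and it assumes the transcriber has already produced segments with start and end times.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    file: str
    start: float  # seconds from the start of the file
    end: float
    text: str


def normalise(text: str) -> str:
    """Lower-case, turn punctuation into spaces, collapse whitespace."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return " ".join(cleaned.split())


def find_phrase(segments, phrase):
    """Return (file, start) for each segment containing the phrase as whole words."""
    words = normalise(phrase)
    if not words:
        return []
    needle = f" {words} "
    return [(s.file, s.start) for s in segments if needle in f" {normalise(s.text)} "]


# Hypothetical transcript segments, as a transcriber might emit them.
segments = [
    Segment("interview_01.mov", 312.4, 318.9, "So in the end, we decided to pivot."),
    Segment("interview_01.mov", 318.9, 325.0, "This was in March, I think."),
    Segment("bts_02.mov", 41.0, 44.2, "And then the camera cuts off."),
]

hits = find_phrase(segments, "we decided to pivot")
print(hits)
```

Each hit carries the file and the segment's start time, which is all a player needs to jump straight to the line. A phrase that straddles two segments would be missed by this sketch; a real index would need to handle that.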

The transcript is also searchable semantically, not just literally. "The moment she explains why the deal fell through" finds the right segment even if those exact words were never said, because the model matches on meaning rather than on exact strings.

2. Relying on filenames

IMG_4847.MOV. clip_final_v3_REAL.mp4. 20240312_143022.mov. These names tell you nothing about what's inside the file. So you open them one by one and hope.

The underlying problem is that filenames are written at capture time, when you don't yet know which clip will matter or why. Metadata applied during editing — keywords, bins, markers — helps, but it requires discipline you rarely have mid-shoot.

MediaFind's auto-categorisation handles this retroactively. It analyses every file's content at index time and assigns scene tags and categories: outdoor, indoor, interview, product shot, screen recording, music, and dozens more. You didn't have to name anything correctly. The categories emerge from what's actually in the clip.

Browse by category and your library reorganises itself around content rather than capture date. The footage you shot three months ago is as findable as the footage you shot yesterday.

3. Hunting for a person by face or voice

"The clip where Sarah explains the pricing model." "All the shots with the CEO." "Every time the client's name comes up."

This kind of search is invisible to most tools. You know the person but not the file or the timestamp, so you're back to scrubbing — this time watching faces instead of listening for words.

MediaFind clusters faces across your library into a People panel. Name a person once and every clip they appear in is tagged. Tap their thumbnail and get every moment, across every file in your library. The same logic applies to voices: the speaker diarisation engine labels who speaks when, so searching by speaker name surfaces every segment where that person talks.

Neither of these requires sending footage to a cloud API. Face recognition, clustering, and diarisation all run locally, using your Mac's hardware, with no account required.
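
Why does naming someone once work everywhere? One plausible way to picture it, sketched below in Python, is that clustering assigns an anonymous ID to each voice (or face) across the whole library, and a name is just a label attached to that ID. Everything here is hypothetical: the `turns` data, the `spk_0` style IDs, and the helper names are assumptions for illustration, not MediaFind's internals.

```python
# Hypothetical diarisation output: (file, start_s, end_s, cluster_id).
# The same cluster_id is assumed to identify one speaker across files.
turns = [
    ("interview_a.mov", 0.0, 12.5, "spk_0"),
    ("interview_a.mov", 12.5, 30.0, "spk_1"),
    ("pricing_call.mov", 4.0, 48.0, "spk_0"),
]

names = {}  # cluster_id -> display name chosen by the user


def name_cluster(cluster_id, name):
    """Attach a human-readable name to an anonymous cluster."""
    names[cluster_id] = name


def segments_for(name):
    """Return (file, start, end) for every turn spoken by the named person."""
    ids = {cid for cid, label in names.items() if label == name}
    return [(f, start, end) for f, start, end, cid in turns if cid in ids]


name_cluster("spk_0", "Sarah")  # done once
sarah_segments = segments_for("Sarah")
print(sarah_segments)
```

Because the name hangs off the cluster rather than off individual clips, labelling `spk_0` once is enough to surface Sarah in both files.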

4. Can't remember what was on screen

Some of the most useful clips are the ones where something important was written on screen rather than spoken. A slide title. A product label. A lower-third naming a guest. A URL. An error message in a screen recording. A street sign in B-roll from a location shoot.

None of that is in the audio transcript. CLIP visual search can surface it if the image is distinctive enough, but you'd need to describe the visual rather than the text. The right tool for this is OCR.

MediaFind runs OCR over sampled frames at index time — using Apple Vision, locally, without uploading anything. The extracted text goes into the same search index as everything else. Type "Acme Corp" and you'll find both the clip where someone said it and the clip where it appeared on a slide. The result includes a timestamp pointing to the exact frame.
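
The "same search index" idea can be sketched in a few lines of Python. This is a toy inverted index under stated assumptions, not MediaFind's actual index: the `add` and `search` helpers, the `speech` and `ocr` source labels, and the sample entries are hypothetical. It matches entries that contain all query words, which is looser than exact phrase matching.

```python
import re
from collections import defaultdict

# token -> list of (file, seconds, source); one index for every text channel
index = defaultdict(list)


def tokens(text):
    """Split text into lower-case alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def add(file, seconds, source, text):
    """Index a piece of text, whether it was spoken or read off a frame."""
    for tok in set(tokens(text)):
        index[tok].append((file, seconds, source))


def search(query):
    """Return entries containing every query token, sorted by file and time."""
    toks = tokens(query)
    if not toks:
        return []
    hits = set(index.get(toks[0], []))
    for tok in toks[1:]:
        hits &= set(index.get(tok, []))
    return sorted(hits)


# Hypothetical entries: one from a transcript, two from OCR on sampled frames.
add("keynote.mov", 95.0, "speech", "Acme Corp signed last quarter")
add("demo.mov", 12.0, "ocr", "ACME CORP Q3 roadmap")
add("demo.mov", 40.0, "ocr", "Pricing tiers")

results = search("Acme Corp")
print(results)
```

Because spoken and on-screen text share one index, a single query returns both the moment the name was said and the frame where it appeared, each with its own timestamp.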

5. Downloading reference clips and losing track of them

Editors keep reference footage. Stock clips you're cutting around. Inspiration pulls from YouTube or Vimeo. Screen recordings of a client's existing product. Tutorial videos you're responding to. These live in a different folder — sometimes several folders — and are always slightly out of reach when you need them.

The problem is that downloaded clips aren't indexed alongside your shot footage, so they exist outside the searchable library. You end up with two workflows: search MediaFind for your originals, then manually hunt through Downloads for the references.

MediaFind's built-in downloader (for YouTube, Vimeo, and other sources) pulls clips straight into your library. Once they're in, they're indexed and searchable alongside everything else — by transcript, visuals, faces, on-screen text, whatever. One library, one search box.

Putting it together

All five of these problems come from the same place: your tools treat footage as opaque files. Open it, watch it, remember it, rename it: that's your job. The search tools you have were built for documents and photos, not hours of video across dozens of shoots.

MediaFind indexes each file's content — spoken words, visual content, faces, voices, on-screen text, scene type, logos, songs — and makes all of it searchable from a single box. You describe the clip you remember. It finds the timestamp.

The results go straight to your NLE. MediaFind exports to Final Cut Pro XML, Premiere EDL, and DaVinci Resolve, so the clips you find land directly in a timeline without a manual import step.
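
For a feel of what an EDL handoff involves, here is a small Python sketch that turns found clips (file, in point, out point, in seconds) into a simplified CMX3600-style edit decision list. Treat it as an assumption-laden illustration: whether MediaFind writes this exact layout is not stated here, the `build_edl` and `frames_to_timecode` helpers are hypothetical, and the sketch ignores drop-frame timecode and each source file's own starting timecode.

```python
def frames_to_timecode(frames, fps):
    """Convert a frame count to non-drop-frame HH:MM:SS:FF timecode."""
    ff = frames % fps
    total_seconds = frames // fps
    ss = total_seconds % 60
    mm = (total_seconds // 60) % 60
    hh = total_seconds // 3600
    return f"{hh:02d}:{mm:02d}:{ss:02d}:{ff:02d}"


def build_edl(title, clips, fps=25):
    """Lay clips end to end on one video track as cuts.

    clips: list of (clip_name, source_in_seconds, source_out_seconds).
    """
    lines = [f"TITLE: {title}", "FCM: NON-DROP FRAME", ""]
    record = 0  # running position on the timeline, in frames
    for number, (name, src_in, src_out) in enumerate(clips, start=1):
        in_f = round(src_in * fps)
        out_f = round(src_out * fps)
        duration = out_f - in_f
        timecodes = " ".join(
            frames_to_timecode(f, fps)
            for f in (in_f, out_f, record, record + duration)
        )
        # "AX" is a common placeholder reel name for file-based sources.
        lines.append(f"{number:03d}  AX       V     C        {timecodes}")
        lines.append(f"* FROM CLIP NAME: {name}")
        record += duration
    return "\n".join(lines) + "\n"


# Hypothetical search results: clip name, in point, out point (seconds).
edl_text = build_edl(
    "Found clips",
    [("interview_01.mov", 312.4, 318.4), ("bts_02.mov", 41.0, 44.2)],
)
print(edl_text)
```

Working in whole frames rather than floating-point seconds keeps the record timecodes from drifting as clips accumulate on the timeline.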

Index your library once. After that, finding the clip takes seconds — not the 20 minutes you were budgeting for it.

The problem → MediaFind fix

😩 Scrubbing by eye (drag the playhead, watch, repeat) → Transcript keyword search: type the phrase, jump to the timestamp
🗂 Opaque filenames (IMG_4847.MOV tells you nothing) → Auto-categories: interview / outdoor / product shot, at index time
👤 Hunting for a face or voice (open every clip looking for Sarah) → People panel + speaker search: name once, find everywhere, locally
🔤 Text was on screen, not spoken (slide title, lower-third, price tag) → OCR search: frames read at index time, search is instant
📂 Reference clips in a separate folder (outside the searchable library) → Built-in downloader: web clips land in the same index

Five independent problems, one library. Each fix compounds — after indexing, every search channel is available simultaneously.

Searching by what's in your footage is one thing. Getting results into your edit is another. See how MediaFind exports directly to Final Cut Pro, Premiere, and DaVinci Resolve.

Stop scrubbing. Start searching.

Point MediaFind at your footage folder and find anything in seconds. Free trial, no account required.

Download for macOS