Bringing the web into your library: the on-device video downloader
Not everything you want to search lives in a folder. A conference talk, a competitor's keynote, a tutorial you keep re-scrubbing — paste the link and MediaFind fetches it, then runs the same private pipeline on it. Here's how “find videos on this page” actually works.
Most of MediaFind operates on a folder you already have. But the most-requested addition was the obvious one: “I don't have the file — I have a URL.” So the importer takes a webpage, finds the video on it, downloads it, and hands it straight to the indexing pipeline. From there it's just another item in your library: transcribed, embedded, searchable, askable.
The hard part isn't the download — it's the discovery. The video on a page is rarely a tidy .mp4 in an <a> tag. It's a streaming manifest, a player config, a blob assembled by JavaScript, or a CDN URL signed five layers deep. So the importer runs a two-tier strategy.
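The control flow of a two-tier strategy is a plain fallback, sketched below as a minimal example. Both tiers are passed in as functions so the fallback logic stands on its own. ImportResult, resolve_video, try_extractor and scrape_page are hypothetical names for illustration, not MediaFind's code.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class ImportResult:
    tier: Optional[str]  # "extractor", "scrape", or None if nothing was found
    candidates: list[str] = field(default_factory=list)
    message: str = ""


def resolve_video(
    url: str,
    try_extractor: Callable[[str], Optional[str]],
    scrape_page: Callable[[str], list[str]],
) -> ImportResult:
    """Try a site-specific extractor first, then fall back to scraping the page."""
    # Tier 1: returns a media URL when a known extractor handles this site, else None.
    media = try_extractor(url)
    if media is not None:
        return ImportResult(tier="extractor", candidates=[media])
    # Tier 2: look for video candidates in the page's own HTML.
    found = scrape_page(url)
    if found:
        return ImportResult(tier="scrape", candidates=found)
    # No video on the page: say so explicitly rather than failing silently.
    return ImportResult(tier=None, message=f"No video found on {url}")
```

In a real importer, try_extractor would wrap yt-dlp and scrape_page would fetch and parse HTML. Injecting them keeps the fallback testable without a network.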
Tier 1: yt-dlp, for the sites someone already solved
For the long tail of well-known video hosts, we lean on yt-dlp, a battle-tested open-source downloader (usable as a Python library) with extractors for more than a thousand sites. An extractor encodes the messy, site-specific knowledge of where a given platform hides its real media URLs and how to assemble the best-quality stream. When a URL matches one, we get a clean download and rich metadata (title, duration, thumbnail) for free.
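Tier 1 might look roughly like this when written against yt-dlp's Python API (YoutubeDL, extract_info, prepare_filename, and gen_extractor_classes from the extractor registry). The option values and the helper names ytdlp_options, dedicated_extractor and download_with_ytdlp are our assumptions, not MediaFind's configuration. yt-dlp is imported inside the functions, so the options helper works without it installed.

```python
from typing import Optional


def ytdlp_options(out_dir: str) -> dict:
    """Options for yt_dlp.YoutubeDL; illustrative choices, not MediaFind's."""
    return {
        "format": "bv*+ba/b",           # best video plus best audio, else best single file
        "merge_output_format": "mp4",   # merging separate streams requires ffmpeg
        "outtmpl": f"{out_dir}/%(title)s [%(id)s].%(ext)s",
        "noplaylist": True,             # a pasted URL means one video, not a whole playlist
        "quiet": True,
    }


def dedicated_extractor(url: str) -> Optional[str]:
    """Return the key of a site-specific yt-dlp extractor for url, or None."""
    from yt_dlp.extractor import gen_extractor_classes  # lazy import
    for ie in gen_extractor_classes():
        # The Generic extractor accepts any URL, so it does not count as a site match.
        if ie.ie_key() != "Generic" and ie.suitable(url):
            return ie.ie_key()
    return None


def download_with_ytdlp(url: str, out_dir: str) -> dict:
    """Download url and return the output path plus the metadata yt-dlp reports."""
    from yt_dlp import YoutubeDL  # lazy import
    with YoutubeDL(ytdlp_options(out_dir)) as ydl:
        info = ydl.extract_info(url, download=True)
        return {
            "path": ydl.prepare_filename(info),
            "title": info.get("title"),
            "duration": info.get("duration"),
            "thumbnail": info.get("thumbnail"),
        }
```

Skipping the Generic extractor is what keeps the tiers distinct. Counting it as a match would mean Tier 2 never runs.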
Tier 2: scrape the page when no extractor exists
For the blog post, the corporate site, the random page with an embedded player, there's no extractor. So we fetch the HTML ourselves and look for the video the boring, robust way:
- <video> and <source> tags with direct file URLs
- <meta property="og:video"> and twitter:player:stream video tags
- streaming manifests, .m3u8 (HLS) and .mpd (DASH), referenced in the markup or player config
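A stdlib-only sketch of that Tier 2 scan follows. It assumes the page HTML has already been fetched. VideoFinder and find_video_candidates are hypothetical names, and the meta-key set and manifest regex are illustrative. Static HTML cannot reveal a URL that a player assembles in JavaScript at runtime, so this covers only the cases in the list above.

```python
import re
from html import unescape
from html.parser import HTMLParser
from urllib.parse import urljoin

# Open Graph and Twitter Card meta keys that commonly name a playable video.
VIDEO_META = {"og:video", "og:video:url", "og:video:secure_url", "twitter:player:stream"}
# Absolute HLS (.m3u8) or DASH (.mpd) URLs, with an optional query string.
MANIFEST_RE = re.compile(r"""https?://[^\s"'<>\\]+?\.(?:m3u8|mpd)\b(?:\?[^\s"'<>\\]*)?""", re.IGNORECASE)


class VideoFinder(HTMLParser):
    """Collects video URLs from <video>, <source> and video <meta> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.found: list[str] = []

    def add(self, url):
        if url:
            absolute = urljoin(self.base_url, url)  # resolve relative src="clip.mp4"
            if absolute not in self.found:          # keep first-seen order, no duplicates
                self.found.append(absolute)

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # HTMLParser lowercases tag and attribute names
        if tag in ("video", "source"):
            self.add(a.get("src"))
        elif tag == "meta":
            key = (a.get("property") or a.get("name") or "").lower()
            if key in VIDEO_META:
                self.add(a.get("content"))


def find_video_candidates(page: str, base_url: str) -> list[str]:
    finder = VideoFinder(base_url)
    finder.feed(page)
    finder.close()
    # Manifests often sit in inline player config as JSON, where "/" may be escaped as "\/".
    text = unescape(page).replace("\\/", "/")
    for match in MANIFEST_RE.finditer(text):
        finder.add(match.group(0))
    return finder.found
```

An empty list is the signal for the "this page has no video" message, rather than an exception.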
The fetch goes through a hardened HTTPS handler — we validate the hostname and follow redirects carefully, because “go download whatever this page points at” is exactly the kind of feature that needs guard rails. If we find one or more candidates, we present them; if the page genuinely has no video, we say so plainly rather than failing silently.
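One plausible reading of "validate the hostname and follow redirects carefully" is sketched below with Python's urllib. It allows only HTTPS and refuses hosts that resolve to loopback, private or link-local addresses. It also re-checks every redirect target rather than only the pasted URL. BlockedURL, check_url, CheckedRedirectHandler and fetch_page are hypothetical names, and the limits are assumptions.

```python
import ipaddress
import socket
import urllib.request
from urllib.parse import urlsplit


class BlockedURL(ValueError):
    """Raised when a URL falls outside what the importer is willing to fetch."""


def check_url(url: str) -> None:
    """Raise BlockedURL unless url is HTTPS and its host resolves only to public addresses."""
    parts = urlsplit(url)
    if parts.scheme != "https" or not parts.hostname:
        raise BlockedURL(f"refusing non-HTTPS or host-less URL: {url}")
    try:
        infos = socket.getaddrinfo(parts.hostname, parts.port or 443, proto=socket.IPPROTO_TCP)
    except (socket.gaierror, ValueError) as exc:  # unresolvable host or malformed port
        raise BlockedURL(f"cannot resolve {parts.hostname}") from exc
    for *_, sockaddr in infos:
        # Strip an IPv6 zone suffix such as "%en0" before parsing.
        addr = ipaddress.ip_address(sockaddr[0].split("%")[0])
        if not addr.is_global:  # loopback, private, link-local, reserved
            raise BlockedURL(f"{parts.hostname} resolves to non-public address {addr}")


class CheckedRedirectHandler(urllib.request.HTTPRedirectHandler):
    max_redirections = 5  # fewer hops than urllib's default of 10

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        check_url(newurl)  # re-validate every hop, not only the URL that was pasted
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def fetch_page(url: str, timeout: float = 15.0, max_bytes: int = 5_000_000) -> bytes:
    check_url(url)
    opener = urllib.request.build_opener(CheckedRedirectHandler())
    with opener.open(url, timeout=timeout) as resp:
        return resp.read(max_bytes)  # cap how much HTML we are willing to read
```

This sketch leaves one gap. The address is checked, then urllib resolves the name again when it connects, so a hostile DNS server could answer differently the second time. Pinning the connection to the checked address would close it.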
The download is the easy part — indexing is the point
Once a file is on disk, the importer's job is done and the normal pipeline takes over. The downloaded clip is treated exactly like a file you dragged in: decode → transcribe → embed text and frames → diarize → tag. Within a minute or two a keynote you'd never seen before is fully searchable, askable, and cross-linked with the rest of your library.
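To make "treated exactly like a file you dragged in" concrete, this sketch gives downloaded and dragged-in files one shared entry point and one ordered list of stages. The stages are placeholders that only record their names. index_file, placeholder_stage and PIPELINE are hypothetical.

```python
from pathlib import Path
from typing import Callable

Record = dict
Stage = Callable[[Record], Record]


def placeholder_stage(name: str) -> Stage:
    """Stand-in for a real stage; it only records that the stage ran."""
    def run(record: Record) -> Record:
        return {**record, "stages": [*record["stages"], name]}
    return run


# Same order as the post: decode, transcribe, embed text and frames, diarize, tag.
PIPELINE: list[Stage] = [
    placeholder_stage(n) for n in ("decode", "transcribe", "embed", "diarize", "tag")
]


def index_file(path: Path, origin: str) -> Record:
    """Single entry point for dragged-in and downloaded files alike."""
    record: Record = {"path": str(path), "origin": origin, "stages": []}
    for stage in PIPELINE:
        record = stage(record)
    return record
```

The origin is kept only as metadata. Nothing downstream branches on it.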
Paste a talk's URL → it downloads → it transcribes → you search “the part about retention” and land on the exact second — across a video that was on the open web ten minutes ago.
Where the network boundary sits
This is the one feature that, by definition, touches the internet: you asked it to fetch a URL. What keeps it honest is the scope. The downloader reaches exactly the host you pasted (and the CDNs that page references) to retrieve the media, and nothing else. Once the file is local, every byte of analysis (transcription, embeddings, faces, tags) runs on-device with no further network access, the same as any file you already own. There's no MediaFind server in the loop, no account, and no telemetry about what you fetched.
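That scope can be expressed as a per-import host allowlist. This minimal sketch assumes the allowlist is built from the pasted URL plus the candidate media URLs found on the page. allowed_hosts and may_contact are hypothetical helpers, not MediaFind's code.

```python
from urllib.parse import urlsplit


def allowed_hosts(page_url: str, media_urls: list[str]) -> frozenset[str]:
    """Hosts one import may contact: the pasted page's host plus the hosts its media URLs name."""
    hosts = {urlsplit(u).hostname for u in [page_url, *media_urls]}
    hosts.discard(None)  # ignore anything without a host component
    return frozenset(hosts)  # urlsplit().hostname is already lowercased


def may_contact(url: str, scope: frozenset[str]) -> bool:
    """True only if url's host is inside the scope computed for this import."""
    host = urlsplit(url).hostname
    return host is not None and host in scope
```

This toy version has one limitation. An HLS or DASH manifest can point at segment hosts the page never mentions, so a real scope would likely need to grow as manifests are parsed.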
The downloader is a front door onto everything else we've written about: it just gets a file onto disk, and then the transcription, search, Ask, and recognition pipelines do what they always do — locally.
Turn a link into a searchable clip
Free trial. No account, no API keys, nothing uploaded.
Download for macOS