Mediabot just got a little smarter about URLs

in Mediabot · started by TeuK · 2mo ago

TeuK · 2mo ago

A new cleanup pass has just landed in mediabot_v3, this time focused on URL title handling.

The goal was not to add flashy features. The goal was to make link detection and title extraction more reliable, more consistent, and far less annoying to debug.

What changed

This round of work improved the way Mediabot handles several kinds of URLs:

YouTube links remain handled through their dedicated path
Instagram links now have a stronger fallback path when simple HTML parsing is not enough
Spotify links are better filtered when the page only returns a useless generic title
Apple Music links now have a more resilient fallback strategy
Generic URLs still work through the usual title extraction path

In short, the bot now does a better job deciding which handler should process which URL, while still respecting the channel settings already in place.
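
To make the routing idea concrete, here is a minimal sketch of host-based handler selection. It is written in Python purely for illustration and is not taken from the Mediabot source: the handler names, the `HANDLERS` table, and the `url_titles` channel setting are all assumptions.

```python
from urllib.parse import urlparse

# Hypothetical host-to-handler table; the real bot's names may differ.
HANDLERS = {
    "youtube.com": "youtube",
    "youtu.be": "youtube",
    "instagram.com": "instagram",
    "open.spotify.com": "spotify",
    "music.apple.com": "apple_music",
}


def pick_handler(url, channel_settings):
    """Return a handler name for url, or None if the channel disabled titles."""
    # "url_titles" is an assumed per-channel switch, enabled by default here.
    if not channel_settings.get("url_titles", True):
        return None
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    for domain, handler in HANDLERS.items():
        # Match the domain itself or any subdomain of it, never a mere suffix.
        if host == domain or host.endswith("." + domain):
            return handler
    return "generic"
```

Matching on the parsed hostname rather than on a substring of the whole URL keeps a link such as `https://notyoutube.com/` on the generic path.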

Why this mattered

Some platforms no longer expose useful metadata in a simple, static way.

So a plain HTTP fetch is sometimes enough, but at other times it returns only:

a generic shell page
a login wall
a placeholder title
or metadata that is technically valid but completely useless

So the real work here was not just “parse HTML better”. It was teaching the bot when to stop trusting a weak result and when to try a more capable fallback.
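
One way to picture "stop trusting a weak result" is a small predicate that flags empty, login-wall, and bare brand-name titles. The sketch below is an illustration in Python, not Mediabot's actual check, and the patterns in `JUNK_PATTERNS` are examples I made up for the cases listed above.

```python
import re

# Illustrative patterns only; this is not the bot's real list.
JUNK_PATTERNS = [
    re.compile(r"^$"),                                    # empty after cleanup
    re.compile(r"^(log ?in|sign ?in|sign up)\b", re.I),   # login walls
    re.compile(r"^spotify\W+web player", re.I),           # generic player shell
    re.compile(r"^(instagram|apple music)$", re.I),       # bare brand name
]


def is_weak_title(title):
    """Return True when a title is missing or carries no useful information."""
    if title is None:
        return True
    text = " ".join(title.split())  # collapse whitespace and trim both ends
    return any(p.search(text) for p in JUNK_PATTERNS)
```

For example, `is_weak_title("Login • Instagram")` is true, while `is_weak_title("Some Song - Some Artist")` is false.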

The big step forward

For the difficult cases, Mediabot can now fall back on headless Chromium to render the page and extract a usable title when the lightweight path fails.

That made a real difference, especially for platforms that heavily depend on client-side rendering.

At the same time, the logic was tightened so the bot does not blindly accept junk titles such as generic web player pages or login prompts.
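
A rough sketch of that two-stage flow, again in Python for illustration only: try a cheap static fetch first, and render with headless Chromium only when the first title is rejected. Every function name here is hypothetical, the `chromium` binary name varies between systems, and the `is_weak` callback stands in for whatever junk-title check the bot really uses.

```python
import html
import re
import subprocess
import urllib.request

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.I | re.S)


def extract_title(page):
    """Pull the first <title> out of an HTML string, or return None."""
    m = TITLE_RE.search(page)
    if not m:
        return None
    return " ".join(html.unescape(m.group(1)).split()) or None


def fetch_static(url, timeout=10):
    """Lightweight path: one plain HTTP GET, no JavaScript."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_rendered(url, binary="chromium", timeout=30):
    """Heavy path: let headless Chromium render the page and dump the DOM."""
    out = subprocess.run(
        [binary, "--headless", "--disable-gpu", "--dump-dom", url],
        capture_output=True, timeout=timeout, check=True,
    )
    return out.stdout.decode("utf-8", errors="replace")


def title_with_fallback(url, is_weak, static=fetch_static, rendered=fetch_rendered):
    """Return a trustworthy title, or None if both paths give junk."""
    try:
        title = extract_title(static(url))
    except OSError:  # covers urllib's URLError and socket timeouts
        title = None
    if not is_weak(title):
        return title
    try:
        title = extract_title(rendered(url))
    except (OSError, subprocess.SubprocessError):
        return None
    return None if is_weak(title) else title
```

Passing the fetchers in as parameters keeps the expensive browser launch out of the common case and makes the decision logic testable without a network or a Chromium install.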

Other useful improvements

A few related fixes also came along for the ride:

better UTF-8 handling for fetched content
clearer debug output when URL parsing fails
more predictable behavior per handler
less guesswork when tracking down bad titles

That may sound minor, but in practice it saves a lot of time when debugging real-world links.
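
As an example of what more careful UTF-8 handling can look like, the sketch below decodes a fetched body using the charset declared in the `Content-Type` header when there is one, then tries UTF-8, and only substitutes replacement characters as a last resort. It is a generic Python illustration, not the fix that went into Mediabot, and `decode_body` is a hypothetical helper.

```python
import re

CHARSET_RE = re.compile(r"charset=[\"']?([\w.:-]+)", re.I)


def decode_body(body, content_type=""):
    """Decode fetched bytes, preferring the declared charset, then UTF-8."""
    candidates = []
    m = CHARSET_RE.search(content_type or "")
    if m:
        candidates.append(m.group(1))
    candidates.append("utf-8")
    for enc in candidates:
        try:
            return body.decode(enc)
        except (LookupError, UnicodeDecodeError):
            # Unknown charset name or bytes that do not fit it: try the next one.
            continue
    # Last resort: keep what is readable and mark the rest with U+FFFD.
    return body.decode("utf-8", errors="replace")
```

Decoding strictly first means a wrong or missing charset declaration is noticed, instead of silently producing mangled titles.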

Result

The URL-title system is now in a much better place:

cleaner routing
safer fallbacks
better parsing
fewer misleading titles
easier troubleshooting

There is still room for future refinement, of course, but the base is now much healthier than before.

Sometimes the best kind of progress is not a new spell.

Sometimes it is just finally removing the curses from the old ones.
