A new cleanup pass has just landed in mediabot_v3, this time focused on URL title handling.
The goal was not to add flashy features. The goal was to make link detection and title extraction more reliable, more consistent, and far less annoying to debug.
This round of work improved the way Mediabot handles several kinds of URLs:
In short, the bot now does a better job deciding which handler should process which URL, while still respecting the channel settings already in place.
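To make the dispatch idea concrete, here is a minimal sketch of choosing a handler by hostname and then checking the channel's settings. It is not Mediabot's actual code: the `HANDLERS` table, the `.example` hostnames, the setting names, and the rule that a disabled specialised handler falls through to the generic one are all assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical handler table: hostname -> handler name.
# The hostnames are placeholders, not the platforms Mediabot supports.
HANDLERS = {
    "video.example": "video",
    "music.example": "music",
}


def pick_handler(url, channel_settings):
    """Return the name of the handler that should process url, or None."""
    host = (urlsplit(url).hostname or "").lower()
    for domain, name in HANDLERS.items():
        # Match the domain itself or any of its subdomains.
        if host == domain or host.endswith("." + domain):
            # A specialised handler only runs if the channel enabled it.
            if channel_settings.get(name, False):
                return name
            break
    # Assumed policy: otherwise use the generic title handler if allowed.
    return "generic" if channel_settings.get("generic", False) else None
```

With these assumptions, `pick_handler("https://www.video.example/watch", {"video": True})` selects the `video` handler, while a channel with every option switched off gets `None` and the bot stays quiet.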
Some platforms no longer expose useful metadata in a simple, static way.
That means a plain HTTP fetch is sometimes enough, but sometimes it only returns things like:
So the real work here was not just “parse HTML better”. It was teaching the bot when to stop trusting a weak result and when to try a more capable fallback.
For the difficult cases, Mediabot can now lean on a headless Chromium fallback that renders the page and extracts a usable title when the lightweight path fails.
That made a real difference, especially for platforms that depend heavily on client-side rendering.
At the same time, the logic was tightened so the bot does not blindly accept junk titles such as generic web player pages or login prompts.
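The "try the cheap path, distrust weak results, then fall back" flow can be sketched roughly as follows. This is an illustration under stated assumptions, not the bot's real implementation: `JUNK_PATTERNS`, `is_weak_title`, and `resolve_title` are hypothetical names, the patterns are deliberately crude, and the two fetchers are passed in as plain callables (one standing in for the static HTTP fetch, the other for the headless Chromium render).

```python
import re

# Hypothetical patterns for titles that carry no real information.
JUNK_PATTERNS = [
    re.compile(r"^\s*$"),                                   # empty title
    re.compile(r"\b(log ?in|sign ?in)\b", re.IGNORECASE),  # login prompts
    re.compile(r"\bweb player\b", re.IGNORECASE),          # generic player page
]


def is_weak_title(title):
    """Return True if the title is missing or looks like junk."""
    if title is None:
        return True
    return any(p.search(title) for p in JUNK_PATTERNS)


def resolve_title(url, fetch_static, fetch_rendered):
    """Try the lightweight fetch first, then the rendered fallback.

    Both fetchers take a URL and return a title string or None.
    Returns a cleaned title, or None if neither result is usable.
    """
    title = fetch_static(url)
    if not is_weak_title(title):
        return title.strip()
    # Only pay for the expensive render when the cheap result is weak.
    title = fetch_rendered(url)
    if not is_weak_title(title):
        return title.strip()
    return None
```

Keeping the junk check in one function means both paths are held to the same standard: a rendered page that still shows a login prompt is rejected just like a static one.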
A few related fixes also came along for the ride:
That may sound minor, but in practice it saves a lot of time when debugging real-world links.
The URL-title system is now in a much better place:
There is still room for future refinement, of course, but the base is now much healthier than before.
Sometimes the best kind of progress is not a new spell.
Sometimes it is just finally removing the curses from the old ones.