🧙 X Links, YouTube Polish, and Old Spell Repairs

in Mediabot · started by TeuK · 1mo ago

TeuK · 1mo ago

This update is another focused reliability pass. It improves URL title handling, especially for X/Twitter links, and fixes a small but very real legacy YouTube bug.

The goal was to keep the bot practical on real IRC channels, where people paste messy links, old links, redirected links, and sometimes URLs that modern websites do not expose cleanly to simple HTTP clients.

X/Twitter URL titles

X/Twitter links are no longer ignored silently.

The bot now has a dedicated X/Twitter handler for URLs such as:

https://x.com
https://www.x.com/user
https://twitter.com/user/status/123456
https://twitter.com/user/statuses/123456
https://x.com/i/web/status/123456

The new flow is:

X/Twitter URL
→ normalize twitter.com / www.x.com / x.com to a canonical x.com form
→ fetch rendered DOM with Chromium
→ extract og:title / twitter:title / title
→ fall back to an honest URL-based label if needed

This means a blocked or login-shell page can still produce a useful label instead of nothing.

Examples of fallback labels:

[X] X
[X] X profile: @user
[X] X post by @user
[X] X post
[X] X list by @user: listname
[X] X community

The fallback is deliberately conservative. It does not pretend to know the content of a post when X does not expose it. It simply describes the URL shape.
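As a rough illustration of "describing the URL shape", here is a minimal sketch. It is not the bot's actual code: the sub name x_fallback_label, the username pattern, and the /i/communities/ and /user/lists/ path shapes are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the Mediabot source): build a label from
# the shape of an x.com URL, without claiming anything about its content.
sub x_fallback_label {
    my ($url) = @_;

    # Keep only the path: drop scheme, host, query string and fragment.
    my ($path) = $url =~ m{^https?://[^/?#]+([^?#]*)}i;
    $path = '' unless defined $path;
    $path =~ s{/+$}{};    # a trailing slash must not change the label

    return '[X] X' if $path eq '';

    # Internal /i/... status links carry no username.
    return '[X] X post' if $path =~ m{^/i/(?:web/)?status/\d+$};

    # Both /status/ and the older /statuses/ are treated as posts.
    return "[X] X post by \@$1"
        if $path =~ m{^/([A-Za-z0-9_]{1,15})/status(?:es)?/\d+$};

    # Assumed path shapes for communities and lists.
    return '[X] X community' if $path =~ m{^/i/communities/\d+};
    return "[X] X list by \@$1: $2"
        if $path =~ m{^/([A-Za-z0-9_]{1,15})/lists/([^/]+)$};

    # A single path segment is treated as a profile name.
    return "[X] X profile: \@$1" if $path =~ m{^/([A-Za-z0-9_]{1,15})$};

    return '[X] X';    # unknown shape: stay generic
}

print x_fallback_label('https://x.com/user/status/123456'), "\n";
```

Specific patterns are tested before generic ones, so an /i/... link is never mistaken for a profile named "i".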

Better X/Twitter normalization

The URL normalization now handles root and path forms consistently:

https://x.com
https://x.com/
https://www.x.com
https://twitter.com
https://www.twitter.com
http://twitter.com/user/status/123

These now normalize cleanly into the same x.com path format before title handling.

This prevents small differences such as a missing trailing slash or a legacy twitter.com domain from causing different behavior.
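A normalization step of this kind could look roughly like the sketch below. The sub name normalize_x_url and the exact canonical form (https, no www, root written as "/") are assumptions for illustration, not the bot's real implementation.

```perl
use strict;
use warnings;

# Hypothetical helper: map twitter.com / www.x.com / x.com URLs onto one
# https://x.com/... form. Returns nothing for any other host.
sub normalize_x_url {
    my ($url) = @_;

    my ($host, $rest) = $url =~ m{^https?://([^/?#]+)(.*)$}i
        or return;

    $host = lc $host;
    $host =~ s/^www\.//;
    return unless $host eq 'x.com' || $host eq 'twitter.com';

    # Root forms ("https://x.com") and query-only forms get an explicit "/".
    $rest = "/$rest" unless $rest =~ m{^/};

    return "https://x.com$rest";
}

print normalize_x_url('http://twitter.com/user/status/123'), "\n";
```

With this shape, every root variant listed above ends up as the same string, so later code only has one form to match against.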

X/Twitter status variants

The fallback logic now recognizes more real-world status URL variants:

/user/status/123
/user/statuses/123
/i/web/status/123

So even older Twitter-style links or app-generated internal URLs still get labelled as posts.

More robust metadata extraction

The title extraction for X/Twitter was improved.

Previously, the code expected metadata tags in a very strict order, such as:

<meta property="og:title" content="...">

But rendered DOM often includes extra attributes or puts attributes in a different order:

<meta data-rh="true" property="og:title" data-extra="x" content="Post title">
<meta content="Post title" data-extra="x" name="twitter:title">

The extractor now scans each <meta ...> tag flexibly, finds og:title or twitter:title, and then extracts content regardless of attribute order.

The same improvement was applied to Facebook title extraction for og:title, so both modern handlers are less brittle.
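To show what order-independent extraction can look like, here is a small regex-based sketch. The sub name extract_meta_title is hypothetical, and the sketch makes simplifying assumptions: attribute values are quoted, contain no ">" character, and HTML entities are left undecoded.

```perl
use strict;
use warnings;

# Hypothetical helper: return the content of the first og:title or
# twitter:title <meta> tag in $html, whatever the attribute order.
sub extract_meta_title {
    my ($html) = @_;

    while ($html =~ m{<meta\b([^>]*)>}gi) {
        my $attrs = $1;

        # Collect every key="value" or key='value' pair of this tag.
        my %attr;
        while ($attrs =~ m{([\w:-]+)\s*=\s*(["'])(.*?)\2}gs) {
            $attr{lc $1} = $3;
        }

        # The title key may live in either property= or name=.
        my $key = $attr{property} // $attr{name} // '';
        next unless $key =~ /^(?:og|twitter):title$/i;

        return $attr{content}
            if defined $attr{content} && length $attr{content};
    }
    return;    # no usable title tag found
}

print extract_meta_title(
    '<meta content="Post title" data-extra="x" name="twitter:title">'
), "\n";
```

Parsing the attributes into a hash first is what removes the ordering assumption: the check on property/name and the read of content no longer depend on where each attribute sits in the tag.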

YouTube search color consistency

The yt search command now uses the same visual style as automatic YouTube URL title rendering.

That means the output from:

m yt never gonna give you up

now visually matches the output when someone pastes a direct YouTube link.

The command also requests the channel title from the YouTube API so it can display the same kind of metadata:

[YouTube] title - duration - views N - by channel - URL

This keeps search results and direct link previews consistent on IRC.

YouTube query encoding fix

The YouTube search command was also corrected to use the encoding helper already imported by the module:

uri_escape_utf8($query_txt)

instead of calling an undefined helper.

That avoids a runtime failure when using the yt command.
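For context, uri_escape_utf8 comes from the URI::Escape module: it encodes the string as UTF-8 and then percent-escapes every character outside the unreserved set. The sketch below shows the effect on a query; the sub name yt_search_query and the parameter string around q= are assumptions for illustration, not the bot's actual request.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape_utf8);

# Hypothetical helper: build the query-string part of a search request.
sub yt_search_query {
    my ($query_txt) = @_;
    return 'part=snippet&type=video&q=' . uri_escape_utf8($query_txt);
}

# Spaces become %20; "\x{e9}" (e with acute accent) becomes its UTF-8
# bytes, percent-escaped.
print yt_search_query('never gonna give you up'), "\n";
print yt_search_query("caf\x{e9}"), "\n";
```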

Legacy YouTube title fix

A small but important bug was fixed in the older getYoutubeDetails() helper.

The function was reading the YouTube title into:

$sTitleItem

but later validating and displaying:

$sTitle

which was never assigned.

The fix is intentionally minimal:

$sTitle = $sTitleItem // "";

Nothing else was changed in that sensitive legacy path. The duration logic, colors, output structure, and existing behavior were preserved.
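The defined-or operator (//) used in that line only falls back when the left side is undef, which is why it is a safe choice here. A tiny sketch of the difference from ||, using a hypothetical wrapper sub title_or_empty:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the one-line fix, for demonstration only.
sub title_or_empty {
    my ($sTitleItem) = @_;
    my $sTitle = $sTitleItem // "";    # falls back only when undefined
    return $sTitle;
}

# A title of "0" is kept by //, whereas || would have replaced it.
for my $item ('Some video', '0', undef) {
    printf "[%s]\n", title_or_empty($item);
}
```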

This matters because the legacy helper is still used by the helper/radio paths, so a valid YouTube API response could previously lose its title there.

Regression tests added

This update added regression tests for:

X/Twitter Chromium handler dispatch;
X/Twitter URL root normalization;
X/Twitter status/statuses/i-web-status fallback labels;
flexible X/Twitter meta title extraction;
flexible Facebook meta title extraction;
YouTube search using the correct UTF-8 escaping helper;
YouTube search using the same color style as direct YouTube link previews;
legacy getYoutubeDetails() assigning the title it already extracted.

Why this matters

This is not a flashy feature update, but it improves what people actually notice on IRC.

Links behave better. X/Twitter links no longer disappear silently. Facebook and X metadata extraction is less fragile. YouTube search output looks consistent. A legacy YouTube helper no longer loses a title it already had.

That is the kind of polish that makes the bot feel maintained, reliable, and pleasant to use.

Suggested commit message

🧙 Accio Titulus: summon X previews and repair YouTube’s lost title charm

Full validation command

cd /home/mediabot/mediabot_v3 || exit 1

find Mediabot -name '*.pm' -print0 | xargs -0 -n1 runuser -u mediabot -- perl -I. -c

runuser -u mediabot -- perl -c mediabot.pl

runuser -u mediabot -- perl t/test_commands.pl

Suggested smoke tests

From IRC:

https://x.com
https://twitter.com/teuk/status/123456
https://twitter.com/teuk/statuses/123456
https://x.com/i/web/status/123456
https://facebook.com
m yt never gonna give you up

Expected behavior:

X/Twitter links are handled instead of ignored.
X/Twitter falls back to useful labels when no real title is exposed.
Facebook metadata extraction remains robust.
YouTube search uses the same visual style as YouTube link previews.
Legacy YouTube details no longer lose the extracted title.
