🪄 Cleaner links, safer helpers, and fewer Perl gremlins
This update is another maintenance and reliability pass focused on the parts of the bot that users actually feel every day: URL titles, TMDB searches, admin helpers, and command stability.
The main goal was simple: make the bot more predictable, less noisy, and more robust when the outside world behaves badly.
Facebook needed special treatment.
The generic URL title handler was not good enough for Facebook URLs. In some cases, the bot received HTTP errors or useless login-shell pages instead of a clean title.
A dedicated Facebook handler was added.
The new flow is:
Facebook URL
→ normalize facebook.com to www.facebook.com
→ try HTTP fetch
→ try Chromium rendered DOM if needed
→ extract a useful title if available
→ otherwise use an honest URL-based fallback label
That means links such as:
https://facebook.com
https://www.facebook.com/somepage
https://www.facebook.com/somepage/posts/123456
https://www.facebook.com/groups/testgroup/posts/123456
https://www.facebook.com/reel/123456
https://www.facebook.com/watch/?v=123456
now get handled more cleanly.
If Facebook exposes a real og:title or usable page title, the bot uses it. If Facebook only returns a login shell or blocked/generic content, the bot falls back to a simple label such as:
Facebook
Facebook: somepage
Facebook post by somepage
Facebook group post: testgroup
Facebook reel
Facebook video
The fallback is deliberately honest: it does not pretend to know private or blocked content.
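The URL-based fallback can be sketched roughly as below. This is only an illustration of the idea, not the bot's actual handler: the function names normalize_facebook_url and facebook_fallback_label are hypothetical, and the path patterns are inferred from the example links and labels listed above.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite a bare facebook.com host to www.facebook.com.
sub normalize_facebook_url {
    my ($url) = @_;
    $url =~ s{^(https?://)facebook\.com(?=[/?#]|$)}{${1}www.facebook.com}i;
    return $url;
}

# Hypothetical helper: derive a label from the URL path alone, without
# claiming anything about the page content.
sub facebook_fallback_label {
    my ($url) = @_;
    my $norm = normalize_facebook_url($url);
    return 'Facebook'
        unless $norm =~ m{^https?://www\.facebook\.com(/[^?#]*)?}i;
    my $path = $1 // '/';

    return "Facebook group post: $1" if $path =~ m{^/groups/([^/]+)/posts/};
    return 'Facebook reel'           if $path =~ m{^/reel/};
    return 'Facebook video'          if $path =~ m{^/watch/?$};
    return "Facebook post by $1"     if $path =~ m{^/([^/]+)/posts/};
    return "Facebook: $1"            if $path =~ m{^/([^/]+)/?$};
    return 'Facebook';
}

print facebook_fallback_label('https://facebook.com/somepage/posts/123456'), "\n";
```

The label is built only from the path segments, so it stays accurate even when Facebook serves a login shell.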
The Chromium --dump-dom fallback was improved.
Previously, if Chromium timed out, the cleanup path could send TERM and then block on a plain waitpid(). That is risky because a hung Chromium process could make the timeout cleanup hang too.
The cleanup now does:
TERM
→ non-blocking wait checks
→ short usleep delay
→ KILL if the process is still alive
This makes the fallback safer and prevents the timeout handler from becoming the new source of blocking.
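A minimal sketch of that cleanup sequence, assuming the Chromium child was started with fork and its PID is known. The helper names stop_child and poll_reaped, the number of polls, and the delay are illustrative assumptions, not the bot's real values.

```perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);
use Time::HiRes qw(usleep);

# Hypothetical helper: poll with WNOHANG so the caller never blocks.
# Returns 1 once the child has been reaped (or no longer exists).
sub poll_reaped {
    my ($pid, $checks, $delay_us) = @_;
    for (1 .. $checks) {
        return 1 if waitpid($pid, WNOHANG) != 0;
        usleep($delay_us);
    }
    return 0;
}

# Hypothetical helper: TERM first, then KILL if the child is still alive.
# Returns the name of the last signal that was sent.
sub stop_child {
    my ($pid, %opt) = @_;
    my $checks   = $opt{checks}   // 20;        # polls per phase (assumed)
    my $delay_us = $opt{delay_us} // 50_000;    # 50 ms between polls (assumed)

    kill 'TERM', $pid;
    return 'TERM' if poll_reaped($pid, $checks, $delay_us);

    kill 'KILL', $pid;
    poll_reaped($pid, $checks, $delay_us);
    return 'KILL';
}
```

Both phases use non-blocking waits, so the worst-case time spent in cleanup is bounded by the poll count and delay rather than by the child's behavior.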
The Owner-only exec command was hardened.
The command already used /usr/bin/timeout when available, but it had a dangerous fallback: if timeout was missing, it could run the shell command without a hard timeout guard.
That fallback was removed.
Now, if /usr/bin/timeout is missing, exec refuses to run and reports the problem clearly.
A duplicate timeout message was also removed, so the bot no longer repeats:
Command timed out after Ns.
Command timed out after Ns.
It now reports the timeout once.
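The refusal logic might look something like the following sketch. The helper name build_guarded_command, the error wording, and the --kill-after grace period are assumptions for illustration; only the /usr/bin/timeout requirement comes from the change itself.

```perl
use strict;
use warnings;

# Hypothetical helper: return (argv arrayref, undef) when the timeout binary
# is usable, or (undef, error message) when it is not. There is deliberately
# no unguarded fallback path.
sub build_guarded_command {
    my ($shell_cmd, $seconds, $timeout_bin) = @_;
    $timeout_bin //= '/usr/bin/timeout';

    return (undef, "exec refused: $timeout_bin is missing or not executable")
        unless -x $timeout_bin;

    # The wrapper is passed as a list; only the user command goes to sh -c.
    # --kill-after sends KILL if the command ignores the first signal.
    my @argv = ($timeout_bin, '--kill-after=2', "${seconds}s",
                'sh', '-c', $shell_cmd);
    return (\@argv, undef);
}

my ($argv, $err) = build_guarded_command('uptime', 10);
print defined $err ? "$err\n" : "would run: @$argv\n";
```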
The status command no longer spawns external uptime and uname commands.
Instead:
/proc/uptime
POSIX::uname()
Sys::Hostname
This avoids unnecessary subprocesses for a simple status display.
The visible IRC behavior remains the same in spirit: the bot still reports bot uptime, server info, and server uptime.
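As a rough sketch, the three sources can be read without any subprocess as shown here. The helper names and the output format are illustrative assumptions, and /proc/uptime is Linux-specific.

```perl
use strict;
use warnings;
use POSIX ();
use Sys::Hostname qw(hostname);

# Hypothetical helper: read whole seconds of uptime from /proc/uptime.
# Returns undef if the file is missing or unparseable.
sub server_uptime_seconds {
    my ($path) = @_;
    $path //= '/proc/uptime';
    open my $fh, '<', $path or return;
    my $line = <$fh>;
    close $fh;
    return unless defined $line && $line =~ /^(\d+(?:\.\d+)?)/;
    return int $1;
}

# Hypothetical helper: render seconds as days, hours, minutes.
sub format_duration {
    my ($seconds) = @_;
    my $days  = int($seconds / 86_400);
    my $hours = int(($seconds % 86_400) / 3_600);
    my $mins  = int(($seconds % 3_600) / 60);
    return sprintf '%dd %02dh %02dm', $days, $hours, $mins;
}

# Hypothetical helper: POSIX::uname() returns sysname, nodename, release,
# version, machine, in that order.
sub server_info {
    my ($sysname, undef, $release, undef, $machine) = POSIX::uname();
    return sprintf '%s %s %s on %s', $sysname, $release, $machine, hostname();
}

my $up = server_uptime_seconds();
print server_info(), "\n";
print 'up ', format_duration($up), "\n" if defined $up;
```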
The server configuration helper had an old shell-based sed call to update:
CONN_SERVER_NETWORK
That was replaced with pure Perl file handling.
This avoids fragile shell interpolation and sed escaping problems if a network name or config path contains awkward characters.
The log message was also corrected to report the actual network name being written.
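A sketch of what the pure Perl replacement could look like. It assumes a simple KEY=value line format, which may not match the real config layout; the helper name set_conf_value is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: set KEY=value in a config file without a shell.
# Assumes one "KEY=value" pair per line (an assumption, not the real format).
sub set_conf_value {
    my ($path, $key, $value) = @_;

    open my $in, '<', $path or die "Cannot read $path: $!";
    my @lines = <$in>;
    close $in;

    my $seen = 0;
    for my $line (@lines) {
        # \Q...\E quotes the key; the value is written literally and is never
        # treated as a pattern or passed through a shell.
        if ($line =~ /^\s*\Q$key\E\s*=/) {
            $line = "$key=$value\n";
            $seen = 1;
        }
    }
    push @lines, "$key=$value\n" unless $seen;

    # Write a temporary file and rename it into place, so a failed write
    # does not leave a truncated config behind.
    my $tmp = "$path.tmp.$$";
    open my $out, '>', $tmp or die "Cannot write $tmp: $!";
    print {$out} @lines;
    close $out or die "Cannot close $tmp: $!";
    rename $tmp, $path or die "Cannot rename $tmp to $path: $!";
    return $value;
}
```

Because the value never reaches a shell or a regex replacement string, characters such as slashes, quotes, ampersands, or dollar signs need no escaping.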
Several old external process calls were removed from helper code.
getVersion() no longer shells out to curl to fetch the remote VERSION file from GitHub. It now uses HTTP::Tiny.
whereis() also no longer spawns curl to query api.country.is. It now uses HTTP::Tiny with a short timeout and keeps the existing fallback behavior when the lookup fails.
This reduces dependencies on external binaries and makes the helper code easier to test.
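The pattern is roughly the one below. The helper name fetch_text, the 5-second default, and the injectable http option are assumptions made so the sketch is easy to test; they are not taken from the bot's code.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: fetch a small text resource with a short timeout.
# Returns the trimmed body, or undef on any failure so the caller can keep
# its existing fallback behavior.
sub fetch_text {
    my ($url, %opt) = @_;
    my $http = $opt{http}
        // HTTP::Tiny->new(timeout => $opt{timeout} // 5);

    my $res = $http->get($url);
    return undef unless $res->{success};

    my $body = $res->{content} // '';
    $body =~ s/^\s+|\s+$//g;
    return length($body) ? $body : undef;
}
```

Passing a stub object through the http option lets a test exercise both the success and failure paths without network access. Note that HTTP::Tiny needs IO::Socket::SSL and Net::SSLeay installed for https URLs.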
TMDB search received several practical fixes.
First, search queries now use UTF-8-safe URL encoding through uri_escape_utf8(). This matters for French titles and accented searches such as:
Léon
Amélie
Piège de cristal
L’été meurtrier
The TMDB language parameter is also validated and encoded before being placed in the URL.
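To show what the encoding step produces, here is a core-only sketch that mirrors the default behavior of uri_escape_utf8(): encode to UTF-8, then percent-escape every byte outside the unreserved set. The helper names, the language pattern, and the en-US default are assumptions for illustration; the real code uses URI::Escape.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in for uri_escape_utf8(): UTF-8 encode, then escape
# everything except A-Z a-z 0-9 - . _ ~
sub escape_utf8 {
    my ($text) = @_;
    my $bytes = encode('UTF-8', $text);
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf '%%%02X', ord $1/ge;
    return $bytes;
}

# Hypothetical validator: accept "fr" or "fr-FR" style codes only
# (assumed format), otherwise fall back to a default.
sub safe_language {
    my ($lang, $default) = @_;
    $default //= 'en-US';
    return $default
        unless defined $lang && $lang =~ /\A[a-z]{2}(?:-[A-Z]{2})?\z/;
    return $lang;
}

# Hypothetical helper: build the query string portion of a search URL.
sub tmdb_search_query {
    my ($query, $lang) = @_;
    return sprintf 'query=%s&language=%s',
        escape_utf8($query), escape_utf8(safe_language($lang));
}

print tmdb_search_query("L\x{e9}on", 'fr-FR'), "\n";
```

The input must already be a decoded character string; escaping raw UTF-8 bytes a second time would produce double-encoded output.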
Second, TMDB result iteration is now defensive. The bot no longer assumes every result entry is a valid hash with a defined media_type. This avoids warnings if the API returns partial or unexpected data.
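A defensive filter along these lines avoids the warnings. The helper name usable_results and the shape of the decoded response (a hash with a results array) are assumptions for this sketch.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only entries that are hash references with a
# defined, non-empty media_type. Anything else is skipped silently.
sub usable_results {
    my ($data) = @_;
    return ()
        unless ref($data) eq 'HASH' && ref($data->{results}) eq 'ARRAY';

    return grep {
        ref($_) eq 'HASH'
            && defined $_->{media_type}
            && length $_->{media_type}
    } @{ $data->{results} };
}

my $response = {
    results => [
        { media_type => 'movie', title => 'Die Hard' },
        undef,                       # partial data
        'unexpected string',         # wrong type
        { title => 'no media_type' },
    ],
};
printf "%d usable result(s)\n", scalar(() = usable_results($response));
```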
Third, a real mojibake repair layer was added for TMDB queries.
This fixes cases where the IRC path delivers text like:
piÃ¨ge de cristal
AmÃ©lie
Lâ€™Ã©tÃ© meurtrier
before sending the query to TMDB.
The repair is conservative: it only runs when suspicious mojibake markers are present, and it keeps the original text if the repair does not improve the string.
So a user typing or pasting broken text can still get a useful TMDB result.
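One conservative way to implement such a repair is sketched below. It assumes the damage comes from UTF-8 bytes that were read as CP1252, and that the input is already a decoded character string. The marker set and the helper names are illustrative, not the bot's actual code.

```perl
use strict;
use warnings;
use Encode qw(encode decode FB_CROAK LEAVE_SRC);

# Characters that commonly show up when UTF-8 was misread as CP1252:
# U+00C3, U+00C2 and U+00E2 (assumed marker set).
my $MARKERS = qr/[\x{C3}\x{C2}\x{E2}]/;

sub count_markers {
    my ($text) = @_;
    my $n = () = $text =~ /$MARKERS/g;
    return $n;
}

# Hypothetical helper: return a repaired string, or the original when no
# marker is present, the round trip fails, or nothing improved.
sub repair_mojibake {
    my ($text) = @_;
    return $text unless defined $text && $text =~ $MARKERS;

    my $fixed = eval {
        # Turn the characters back into the bytes they were misread from,
        # then decode those bytes strictly as UTF-8.
        my $bytes = encode('cp1252', $text, FB_CROAK | LEAVE_SRC);
        decode('UTF-8', $bytes, FB_CROAK);
    };
    return $text unless defined $fixed;

    return count_markers($fixed) < count_markers($text) ? $fixed : $text;
}
```

A correctly typed word that happens to contain one of the markers is left alone, because its CP1252 bytes do not form valid UTF-8 and the strict decode fails.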
New tests were added to protect the behavior around:
exec requiring /usr/bin/timeout
exec
uptime and uname
conf_servers.pl
getVersion() using HTTP::Tiny
whereis() using HTTP::Tiny
This is the best kind of cleanup: every fixed behavior now has a regression test standing guard.
This update does not add a shiny new command, but it makes existing behavior more dependable.
The bot now handles messy real-world URLs more gracefully, especially Facebook. It avoids unnecessary external processes. It is less likely to hang on a Chromium timeout. Admin commands are safer. TMDB search behaves better with French titles and broken encodings.
That is the difference between a bot that works in a clean test case and a bot that behaves properly on a real IRC channel.
Mischief Managed: tame Facebook phantoms and banish brittle helper gremlins
cd /home/mediabot/mediabot_v3 || exit 1
find Mediabot -name '*.pm' -print0 | xargs -0 -n1 runuser -u mediabot -- perl -I. -c
runuser -u mediabot -- perl -c mediabot.pl
runuser -u mediabot -- perl t/test_commands.pl
From IRC:
m tmdb piège de cristal
m tmdb L'Été meurtrier
https://facebook.com
https://www.facebook.com/testpage
https://www.facebook.com/testpage/posts/123456
https://www.facebook.com/groups/testgroup/posts/123456
https://www.facebook.com/reel/123456
Expected behavior: