Forum teuk.org

🪄 Cleaner Links, Safer Helpers, and Fewer Gremlins

in Mediabot · started by TeuK · 1w ago

TeuK · 1w ago

This update is another maintenance and reliability pass focused on the parts of the bot that users actually feel every day: URL titles, TMDB searches, admin helpers, and command stability.

The main goal was simple: make the bot more predictable, less noisy, and more robust when the outside world behaves badly.

Facebook URL handling

Facebook needed special treatment.

The generic URL title handler was not good enough for Facebook URLs. In some cases, the bot received HTTP errors or useless login-shell pages instead of a clean title.

A dedicated Facebook handler was added.

The new flow is:

Facebook URL
→ normalize facebook.com to www.facebook.com
→ try HTTP fetch
→ try Chromium rendered DOM if needed
→ extract a useful title if available
→ otherwise use an honest URL-based fallback label

That means links such as:

https://facebook.com
https://www.facebook.com/somepage
https://www.facebook.com/somepage/posts/123456
https://www.facebook.com/groups/testgroup/posts/123456
https://www.facebook.com/reel/123456
https://www.facebook.com/watch/?v=123456

now get handled more cleanly.

If Facebook exposes a real og:title or usable page title, the bot uses it. If Facebook only returns a login shell or blocked/generic content, the bot falls back to a simple label such as:

Facebook
Facebook: somepage
Facebook post by somepage
Facebook group post: testgroup
Facebook reel
Facebook video

The fallback is deliberately honest: it does not pretend to know private or blocked content.
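As a rough illustration of the URL-based fallback, here is a minimal sketch that derives the labels listed above from the path alone. The helper names normalize_facebook_url() and facebook_fallback_label() are hypothetical, and the bot's real handler may match more URL shapes or word its labels differently.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite a bare facebook.com host to www.facebook.com.
sub normalize_facebook_url {
    my ($url) = @_;
    $url =~ s{\A(https?://)facebook\.com(?=[/?#]|\z)}{${1}www.facebook.com}i;
    return $url;
}

# Hypothetical helper: build a label from the URL path only.
# Returns nothing for URLs that are not on www.facebook.com.
sub facebook_fallback_label {
    my ($url) = @_;
    my $norm = normalize_facebook_url($url);
    return unless $norm =~ m{\Ahttps?://www\.facebook\.com(?:/([^?#]*))?(?:[?#]|\z)}i;

    # Path segments without empty entries, e.g. ('groups', 'testgroup', 'posts', '123456').
    my @seg = grep { length } split m{/}, ($1 // '');

    return 'Facebook'       unless @seg;
    return 'Facebook reel'  if $seg[0] eq 'reel';
    return 'Facebook video' if $seg[0] eq 'watch';
    if ($seg[0] eq 'groups') {
        return "Facebook group post: $seg[1]"
            if @seg >= 3 && $seg[2] eq 'posts';
        return 'Facebook';   # other group URLs: stay generic rather than guess
    }
    return "Facebook post by $seg[0]" if @seg >= 2 && $seg[1] eq 'posts';
    return "Facebook: $seg[0]"        if @seg == 1;
    return 'Facebook';
}
```

The label never looks at page content, so it cannot claim anything the URL itself does not already say.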

Chromium timeout hardening

The timeout cleanup of the Chromium --dump-dom fallback was hardened.

Previously, if Chromium timed out, the cleanup path could send TERM and then block on a plain waitpid(). That is risky because a hung Chromium process could make the timeout cleanup hang too.

The cleanup now does:

TERM
→ non-blocking wait checks
→ short usleep delay
→ KILL if the process is still alive

This makes the fallback safer and prevents the timeout handler from becoming the new source of blocking.
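The cleanup sequence can be sketched roughly as follows. This is not the bot's actual code: stop_child(), poll_for_exit(), and the default poll count and delay are assumptions chosen for illustration. Only waitpid() with WNOHANG (from POSIX) and usleep() (from Time::HiRes) are standard.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);
use Time::HiRes qw(usleep);

# Poll for $pid without blocking. Returns true once the child has been
# reaped (or is no longer ours to reap), false if it is still running
# after $checks polls.
sub poll_for_exit {
    my ($pid, $checks, $delay_us) = @_;
    for (1 .. $checks) {
        my $r = waitpid($pid, WNOHANG);   # 0 means "still running"
        return 1 if $r == $pid || $r == -1;
        usleep($delay_us);
    }
    return 0;
}

# Hypothetical helper: stop a timed-out child without a blocking waitpid().
# Returns 'term', 'kill', or 'stuck'.
sub stop_child {
    my ($pid, %opt) = @_;
    my $checks   = $opt{checks}   // 20;       # assumed default
    my $delay_us = $opt{delay_us} // 50_000;   # assumed default: 50 ms per poll

    kill 'TERM', $pid;
    return 'term' if poll_for_exit($pid, $checks, $delay_us);

    kill 'KILL', $pid;
    return 'kill' if poll_for_exit($pid, $checks, $delay_us);

    return 'stuck';   # still not reaped; the caller decides what to do next
}
```

Every wait in this sketch is bounded, so the worst case is checks × delay_us per phase instead of an open-ended block.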

Admin command safety

The Owner-only exec command was hardened.

The command already used /usr/bin/timeout when available, but it had a dangerous fallback: if timeout was missing, it could run the shell command without a hard timeout guard.

That fallback was removed.

Now, if /usr/bin/timeout is missing, exec refuses to run and reports the problem clearly.
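A minimal sketch of that guard, under stated assumptions: build_exec_argv() is a hypothetical name, the error wording is invented, and the -k 2 grace period is an illustrative choice. The point is only that there is no code path that runs the command without the timeout wrapper.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ($argv, undef) on success or (undef, $error).
# The third argument exists only so the check can be exercised in tests.
sub build_exec_argv {
    my ($shell_cmd, $seconds, $timeout_bin) = @_;
    $timeout_bin //= '/usr/bin/timeout';

    # No fallback: without the timeout binary the command is never run.
    return (undef, "exec refused: $timeout_bin is missing or not executable")
        unless -f $timeout_bin && -x _;
    return (undef, 'exec refused: invalid timeout value')
        unless defined $seconds && $seconds =~ /\A[1-9][0-9]*\z/;

    # timeout sends TERM after $seconds, then KILL 2 seconds later (-k 2).
    return ([ $timeout_bin, '-k', '2', $seconds, '/bin/sh', '-c', $shell_cmd ], undef);
}
```

Returning an argument list rather than a single string lets the caller pass it to a list-form open or system call, so only the inner /bin/sh -c sees the user-supplied command.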

A duplicate timeout message was also removed, so the bot no longer repeats:

Command timed out after Ns.
Command timed out after Ns.

It now reports the timeout once.

Status command cleanup

The status command no longer spawns external uptime and uname commands.

Instead:

  • server uptime is read from /proc/uptime;
  • OS information is built with POSIX::uname();
  • hostname is obtained with Sys::Hostname.

This avoids unnecessary subprocesses for a simple status display.

The visible IRC behavior remains the same in spirit: the bot still reports bot uptime, server info, and server uptime.
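For illustration, the three sources can be combined without any subprocess roughly like this. The function names and the output format are assumptions, not the bot's actual ones. /proc/uptime is Linux-specific, so the reader returns nothing where the file is unavailable.

```perl
use strict;
use warnings;
use POSIX ();
use Sys::Hostname qw(hostname);

# Format a number of seconds as days, hours, and minutes.
sub format_duration {
    my ($secs) = @_;
    my $d = int($secs / 86_400);
    my $h = int(($secs % 86_400) / 3_600);
    my $m = int(($secs % 3_600) / 60);
    return sprintf '%dd %dh %dm', $d, $h, $m;
}

# Read whole seconds of uptime from /proc/uptime ("<uptime> <idle>").
# The path is a parameter only so the parser can be tested on a plain file.
sub server_uptime_seconds {
    my ($path) = @_;
    $path //= '/proc/uptime';
    open my $fh, '<', $path or return;
    my $line = <$fh>;
    close $fh;
    return unless defined $line && $line =~ /\A(\d+)(?:\.\d+)?\s/;
    return $1;
}

# Build an OS summary from POSIX::uname() and Sys::Hostname.
sub os_summary {
    my ($sysname, undef, $release, undef, $machine) = POSIX::uname();
    return "$sysname $release ($machine) on " . hostname();
}
```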

Safer installer/server configuration helper

The server configuration helper had an old shell-based sed call to update:

CONN_SERVER_NETWORK

That was replaced with pure Perl file handling.

This avoids fragile shell interpolation and sed escaping problems if a network name or config path contains awkward characters.

The log message was also corrected to report the actual network name being written.
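A shell-free update might look roughly like the sketch below. It assumes a simple KEY=value line format, which may not match the real configuration file, and set_config_value() is a hypothetical name. Note that the temporary file gets File::Temp's default restrictive permissions, which a real helper may want to adjust before the rename.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Temp qw(tempfile);

# Hypothetical helper: set KEY=value in a config file without a shell.
# Returns the number of lines replaced (0 means the key was appended).
sub set_config_value {
    my ($path, $key, $value) = @_;

    open my $in, '<', $path or die "Cannot read $path: $!\n";
    my @lines = <$in>;
    close $in;

    my $replaced = 0;
    for my $line (@lines) {
        next unless $line =~ /^\Q$key\E=/;   # \Q...\E: the key is matched literally
        $line = "$key=$value\n";             # the value is never interpreted
        $replaced++;
    }
    unless ($replaced) {
        $lines[-1] .= "\n" if @lines && $lines[-1] !~ /\n\z/;
        push @lines, "$key=$value\n";
    }

    # Write next to the target, then rename over it in one step.
    my ($out, $tmp) = tempfile(DIR => dirname($path));
    print {$out} @lines or die "Cannot write $tmp: $!\n";
    close $out          or die "Cannot close $tmp: $!\n";
    rename $tmp, $path  or die "Cannot replace $path: $!\n";
    return $replaced;
}
```

Because the value is written as plain data, characters such as &, quotes, slashes, or $ need no escaping at all.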

URL helper cleanup

Several old external process calls were removed from helper code.

getVersion() no longer shells out to curl to fetch the remote VERSION file from GitHub. It now uses HTTP::Tiny.

whereis() also no longer spawns curl to query api.country.is. It now uses HTTP::Tiny with a short timeout and keeps the existing fallback behavior when the lookup fails.

This reduces dependencies on external binaries and makes the helper code easier to test.
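As a sketch of why this is easier to test, the fetch can take the HTTP client as an optional argument. fetch_remote_version() and the 5-second timeout are assumptions for illustration. HTTP::Tiny->new(timeout => ...), its get() method, and the success and content keys of the response are part of the real HTTP::Tiny interface.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Hypothetical helper: fetch a remote VERSION file and return its first token.
# $http is optional; passing a stub object makes the function testable offline.
sub fetch_remote_version {
    my ($url, $http) = @_;
    $http //= HTTP::Tiny->new(timeout => 5);   # assumed timeout value

    my $res = $http->get($url);
    return unless $res->{success};             # covers HTTP errors and timeouts

    my ($version) = ($res->{content} // '') =~ /\A\s*(\S+)/;
    return $version;
}
```

A failed lookup simply returns nothing, which leaves the caller free to keep whatever fallback behavior it already had.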

TMDB improvements

TMDB search received several practical fixes.

First, search queries now use UTF-8-safe URL encoding through uri_escape_utf8(). This matters for French titles and accented searches such as:

Léon
Amélie
Piège de cristal
L’été meurtrier

The TMDB language parameter is also validated and encoded before being placed in the URL.
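A minimal sketch of both steps follows. uri_escape_utf8() comes from URI::Escape (part of the URI distribution, not core Perl). The helper names, the xx or xx-YY language pattern, and the en-US default are assumptions, not necessarily what the bot does.

```perl
use strict;
use warnings;
use URI::Escape qw(uri_escape_utf8);

# Hypothetical helper: accept only "xx" or "xx-YY", otherwise use a default.
sub safe_tmdb_language {
    my ($lang) = @_;
    return 'en-US'   # assumed default
        unless defined $lang && $lang =~ /\A[a-z]{2}(?:-[A-Z]{2})?\z/;
    return $lang;
}

# Hypothetical helper: build the query-string part of a TMDB search URL.
# $title must be a decoded character string, not raw bytes.
sub tmdb_query_string {
    my ($title, $lang) = @_;
    return sprintf 'query=%s&language=%s',
        uri_escape_utf8($title),
        uri_escape_utf8(safe_tmdb_language($lang));
}
```

With these helpers, a title containing é (U+00E9) is sent as %C3%A9, and a language value such as fr&x=1 is replaced by the default instead of leaking extra parameters into the URL.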

Second, TMDB result iteration is now defensive. The bot no longer assumes every result entry is a valid hash with a defined media_type. This avoids warnings if the API returns partial or unexpected data.
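The defensive iteration can be sketched as a filter that runs before any field is read. usable_tmdb_results() is a hypothetical name, and the structure (a results array of hashes with a media_type key) is assumed from the description above.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only entries that are hashes with a usable media_type.
# Anything else (undef, strings, arrays, hashes without media_type) is skipped.
sub usable_tmdb_results {
    my ($data) = @_;
    return () unless ref($data) eq 'HASH' && ref($data->{results}) eq 'ARRAY';
    return grep {
        ref($_) eq 'HASH' && defined $_->{media_type} && length $_->{media_type}
    } @{ $data->{results} };
}
```

Code that loops over the filtered list can then read media_type without triggering "uninitialized value" or "not a HASH reference" problems.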

Third, a real mojibake repair layer was added for TMDB queries.

This fixes cases where the IRC path delivers garbled text like:

piÃ¨ge de cristal
AmÃ©lie
Lâ€™Ã©tÃ© meurtrier

The repair runs before the query is sent to TMDB.

The repair is conservative: it only runs when suspicious mojibake markers are present, and it keeps the original text if the repair does not improve the string.
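One way such a conservative repair could work is sketched below. It assumes the garbling is UTF-8 that was misread as Windows-1252, and it treats "shorter, with no replacement character" as the test for improvement. Both are assumptions, and repair_mojibake() is a hypothetical name.

```perl
use strict;
use warnings;
use Encode qw(encode decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: undo "UTF-8 read as Windows-1252" damage.
# Returns the original text whenever the repair is not clearly better.
sub repair_mojibake {
    my ($text) = @_;

    # Run only when typical markers are present:
    # U+00C3 or U+00C2 (lead bytes shown as letters), or U+00E2 followed by U+20AC.
    return $text
        unless defined $text && $text =~ /[\x{C3}\x{C2}]|\x{E2}\x{20AC}/;

    my $fixed = eval {
        # Back to the original bytes, then decode them strictly as UTF-8.
        my $bytes = encode('cp1252', $text, FB_CROAK | LEAVE_SRC);
        decode('UTF-8', $bytes, FB_CROAK);
    };

    return $text unless defined $fixed;          # not valid UTF-8 after all
    return $text if $fixed =~ /\x{FFFD}/;        # repair produced garbage
    return length($fixed) < length($text) ? $fixed : $text;
}
```

Legitimate text that merely contains one of the marker characters fails the strict UTF-8 decode in most cases, so it is returned unchanged.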

So a user typing or pasting broken text can still get a useful TMDB result.

Regression coverage

New tests were added to protect the behavior around:

  • dedicated Facebook dispatch before the generic handler;
  • Facebook root fallback;
  • URL-based Facebook fallback labels;
  • Chromium timeout cleanup;
  • admin exec requiring /usr/bin/timeout;
  • single timeout reporting in exec;
  • status command avoiding external uptime and uname;
  • shell-free config update in conf_servers.pl;
  • getVersion() using HTTP::Tiny;
  • whereis() using HTTP::Tiny;
  • TMDB UTF-8 query encoding;
  • TMDB language safety;
  • defensive TMDB result iteration;
  • TMDB mojibake repair.

This is the best kind of cleanup: every fixed behavior now has a regression test standing guard.

Why this matters

This update does not add a shiny new command, but it makes existing behavior more reliable.

The bot now handles messy real-world URLs more gracefully, especially Facebook. It avoids unnecessary external processes. It is less likely to hang on a Chromium timeout. Admin commands are safer. TMDB search behaves better with French titles and broken encodings.

That is the difference between a bot that works in a clean test case and a bot that behaves properly on a real IRC channel.

Suggested commit message

Mischief Managed: tame Facebook phantoms and banish brittle helper gremlins

Full validation commands

cd /home/mediabot/mediabot_v3 || exit 1

find Mediabot -name '*.pm' -print0 | xargs -0 -n1 runuser -u mediabot -- perl -I. -c

runuser -u mediabot -- perl -c mediabot.pl

runuser -u mediabot -- perl t/test_commands.pl

Suggested smoke tests

From IRC:

m tmdb piège de cristal
m tmdb L'Été meurtrier
https://facebook.com
https://www.facebook.com/testpage
https://www.facebook.com/testpage/posts/123456
https://www.facebook.com/groups/testgroup/posts/123456
https://www.facebook.com/reel/123456

Expected behavior:

  • TMDB repairs common mojibake before searching.
  • Facebook links use a real title when possible.
  • Facebook links fall back to honest labels when the page is blocked or generic.
  • The bot does not spam duplicate timeout messages.
