🪄 Mediabot v3 Polish: Protego Regressions, Cleaner Logs, and Safer IRC Output

in Mediabot · started by TeuK · 3d ago

After the big radio integration work, Mediabot v3 went through another important polish pass: mb76.

This was not a database migration and not a schema-changing release.
The goal was simpler and more practical:

clean up subtle regressions, reduce noisy logs, improve IRC output safety, and make the bot nicer to operate before committing the current development state.

In proper Hogwarts terms: this was a Protego pass — not a flashy new spell, but the kind of shield charm that keeps the cauldron from exploding later.

🧙 Context

The previous work already brought a large set of improvements around radio playback, Liquidsoap, MP3 cache-first behavior, non-blocking yt-dlp, and safer operational defaults.

The mb76 polish pass continued in the same spirit:

improve existing behavior;
avoid noisy logs;
harden edge cases;
avoid silent regressions;
keep the database schema untouched;
prepare a cleaner commit.

No DB schema changes were made during this polish pass.

🧹 First sweep: useful improvements from mb76

A broad set of usability and operational improvements landed before the final polish pass.

Highlights include:

anti-repetition cache for URL title display;
private/internal URL guard before fetching titles;
larger karma ring buffer;
configurable Claude persona TTL;
long message splitting instead of hard truncation;
improved karma, reminders, polls, trivia, and cooldown display;
partyline status/metrics improvements;
X/Twitter cache;
colored heatmap output;
better placeholders for dynamic replies.

The spirit of the pass was simple: make the bot more useful without making it more fragile.

🧯 Fixing the discreet bombs

A few subtle issues were found during review and corrected before commit.

X/Twitter cache replay

The first version of the X/Twitter cache avoided relaunching Chromium, but cached only a boolean result.

That meant a cache hit could return success while producing no user-visible IRC output.

The fix: cache the formatted message itself and replay it on cache hit.

$tw_cache->{$tw_cache_key} = { ts => time(), msg => $msg };

This preserves the point of the cache — avoid heavy Chromium calls — without making repeated links mysteriously silent.
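To make the replay idea concrete, here is a minimal sketch built around the same `{ ts, msg }` entry shape. The helper names `tw_cache_get` / `tw_cache_put` and the 300-second TTL are assumptions for illustration only; the post does not state the real names or expiry.

```perl
use strict;
use warnings;

# Hypothetical TTL in seconds; the real value is not given in the post.
my $TW_CACHE_TTL = 300;
my $tw_cache     = {};

# Store the formatted IRC message itself, not just a success flag.
sub tw_cache_put {
    my ($key, $msg, $now) = @_;
    $tw_cache->{$key} = { ts => $now // time(), msg => $msg };
    return $msg;
}

# Return the cached message on a fresh hit, undef on a miss or expired entry.
sub tw_cache_get {
    my ($key, $now) = @_;
    $now //= time();
    my $entry = $tw_cache->{$key} or return;
    if ($now - $entry->{ts} > $TW_CACHE_TTL) {
        delete $tw_cache->{$key};
        return;
    }
    return $entry->{msg};
}
```

On a hit the caller sends the returned string to the channel as usual, so a repeated link still produces visible output without launching Chromium again.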

Reminder IDs

Some legacy reminder code still referred to id or r.id.

The modern REMINDERS schema uses:

id_reminder

The old references were corrected so list/show/cancel paths use the current column name consistently.

That prevents reminder commands from breaking on the current schema.

UTF-8-safe long message splitting

A previous improvement made botPrivmsg() split long messages instead of truncating them.

Good idea, but the first implementation could split after UTF-8 encoding, that is, at byte offsets, which risks cutting a multi-byte character in half.

That is exactly the sort of bug that later turns into broken accents, emojis, or IRC bar characters.

The fix: split while the string is still a Perl character string, then encode each chunk.

Result:

no silent truncation;
no broken UTF-8 characters;
long bot replies survive intact.
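A rough sketch of "split on characters, then encode" could look like the following. It is not the actual botPrivmsg() code: the function name `split_utf8_safe` and the byte limit passed in are assumptions, and only the core `Encode` module is used.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Split a Perl character string into chunks whose UTF-8 encoding fits in
# $max_bytes, then encode each chunk. Cuts only happen between characters.
sub split_utf8_safe {
    my ($text, $max_bytes) = @_;
    my @chunks;
    my ($current, $current_bytes) = ('', 0);
    for my $char (split //, $text) {
        my $len = length(encode('UTF-8', $char));
        if ($current_bytes + $len > $max_bytes && length($current)) {
            push @chunks, encode('UTF-8', $current);
            ($current, $current_bytes) = ('', 0);
        }
        $current       .= $char;
        $current_bytes += $len;
    }
    push @chunks, encode('UTF-8', $current) if length($current);
    return @chunks;
}

# Usage: each element is a byte string ready to go on the wire.
my @chunks = split_utf8_safe("caf\x{e9} \x{2728} ok", 4);
```

A single character wider than the limit still ends up alone in its own chunk; a real implementation would probably also prefer to break at spaces.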

🔇 Less noisy IRC debug logs

Several routine IRC numeric replies were too visible in logs.

Routine numerics and server replies such as:

001
002
003
004
005
MOTD lines
LIST replies
End of channel list
WHO replies
End of WHO list

are useful when debugging IRC connection state, but they are not worth seeing at normal DEBUG2 level.

They were moved down to DEBUG4 or DEBUG5 depending on usefulness.

This keeps DEBUG2 focused on things that actually help day-to-day debugging.

In short: the bot now stops shouting every time the IRC server says hello.
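One way to picture the demotion is a per-numeric level table. This is only a sketch: the exact DEBUG4/DEBUG5 assignment per numeric is not given above, so the values below are guesses, and `log_level_for_numeric` / `should_log` are hypothetical helper names. The numeric codes themselves are the standard IRC ones.

```perl
use strict;
use warnings;

# Assumed levels; the post only says "DEBUG4 or DEBUG5 depending on usefulness".
my %NUMERIC_LOG_LEVEL = (
    '001' => 4, '002' => 4, '003' => 4, '004' => 4,   # welcome burst
    '005' => 5,                                       # ISUPPORT tokens
    '372' => 5,                                       # RPL_MOTD
    '322' => 5,                                       # RPL_LIST
    '323' => 4,                                       # RPL_LISTEND
    '352' => 5,                                       # RPL_WHOREPLY
    '315' => 4,                                       # RPL_ENDOFWHO
);

# Anything not listed stays at DEBUG2 in this sketch.
sub log_level_for_numeric {
    my ($numeric) = @_;
    return $NUMERIC_LOG_LEVEL{$numeric} // 2;
}

# A line is logged when its level is at or below the configured debug level.
sub should_log {
    my ($numeric, $configured_level) = @_;
    return log_level_for_numeric($numeric) <= $configured_level ? 1 : 0;
}
```

With the debug level at 2, the welcome burst and WHO/LIST traffic disappear, while unlisted numerics are still shown.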

✉️ Safer outgoing NOTICE handling

botPrivmsg() had already been improved to split long messages.

botNotice() was still truncating long messages.

That was inconsistent, especially because a lot of help/status output goes through NOTICE.

The polish pass aligned botNotice() with botPrivmsg():

sanitize newlines;
split long messages;
avoid cutting UTF-8;
preserve full content;
keep metrics/logging behavior.

This is especially useful for command help, long diagnostics, and admin output.
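The "sanitize newlines" step matters because IRC is line-based: an embedded CR or LF would end the NOTICE early and could turn the rest of the text into a separate command. A minimal sketch of such a step, with `sanitize_irc_text` as a hypothetical helper name rather than the bot's real one:

```perl
use strict;
use warnings;

# Collapse CR/LF runs into one space so a single call cannot emit extra IRC lines,
# then trim leading and trailing whitespace.
sub sanitize_irc_text {
    my ($text) = @_;
    $text =~ s/[\r\n]+/ /g;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# Usage: the second "line" stays part of the same message text.
my $safe = sanitize_irc_text("line one\r\nPRIVMSG #x :injected\n");
```

Running this before splitting means both botNotice() and botPrivmsg() can share the same preparation path.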

🛡 Empty outgoing message guards

Another small guard was added to prevent noisy Perl warnings.

Some broken or edge-case handlers may accidentally call:

botPrivmsg(...)
botAction(...)

with an undefined or empty message.

Previously, these functions could begin formatting, badword checks, NoColors handling, or logging before discovering that the message was empty.

Now empty outgoing messages are rejected early.

That keeps the logs clean and prevents harmless upstream failures from becoming noisy warning spam.
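The early rejection can be sketched as a tiny predicate checked before any other work. The names `has_sendable_text` and `send_guarded` are made up for this example, and treating whitespace-only text as empty is an assumption, not something stated above.

```perl
use strict;
use warnings;

# True when there is something worth sending. Uses a pattern match rather than
# plain truthiness so that the string "0" still counts as a real message.
sub has_sendable_text {
    my ($msg) = @_;
    return 0 unless defined $msg;
    return $msg =~ /\S/ ? 1 : 0;
}

# Hypothetical wrapper: $send is a code ref standing in for the real sender.
# Returns 1 if the message was passed on, 0 if it was rejected early.
sub send_guarded {
    my ($send, $target, $msg) = @_;
    return 0 unless has_sendable_text($msg);
    $send->($target, $msg);
    return 1;
}
```

Because the check runs first, an undefined message never reaches code that would trigger "uninitialized value" warnings.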

🧪 Placeholder polishing

Dynamic command placeholders were also cleaned up.

The newer readable placeholders:

%nick%
%channel%
%date%
%time%

must be processed before legacy short placeholders like:

%n
%c

Otherwise %nick% can be partially eaten by %n, and %channel% can be partially eaten by %c.

The order was corrected.

The legacy %b true/false placeholder also had a faulty condition; it was fixed and now expands as intended.

Example dynamic responder:

Bonjour %nick% ! It is %time% on %channel%.

That now expands cleanly.
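The ordering problem is easy to reproduce: if %n is substituted first, "%nick%" becomes the nick followed by a stray "ick%". A minimal sketch of the corrected order, assuming a hypothetical `expand_placeholders` helper that takes its values as a hash (the real function and its arguments are not shown in this post):

```perl
use strict;
use warnings;

sub expand_placeholders {
    my ($template, %ctx) = @_;

    # Long, readable placeholders first...
    $template =~ s/%nick%/$ctx{nick}/g;
    $template =~ s/%channel%/$ctx{channel}/g;
    $template =~ s/%date%/$ctx{date}/g;
    $template =~ s/%time%/$ctx{time}/g;

    # ...then the legacy short forms, which are prefixes of the long ones.
    $template =~ s/%n/$ctx{nick}/g;
    $template =~ s/%c/$ctx{channel}/g;

    return $template;
}

my $line = expand_placeholders(
    'Bonjour %nick% ! It is %time% on %channel%.',
    nick => 'TeuK', channel => '#mediabot',
    date => '2024-01-01', time => '12:00',
);
# $line is now 'Bonjour TeuK ! It is 12:00 on #mediabot.'
```

A single-pass regex with the longer alternatives listed first would be another way to get the same guarantee.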

🌐 Private/internal URL guard polish

The URL title display guard was improved slightly.

It already blocked obvious private targets like:

localhost
127.x.x.x
10.x.x.x
172.16-31.x.x
192.168.x.x
[::1]

The polish pass extended this to cover more obvious literal internal targets such as:

0.0.0.0
169.254.x.x
IPv6 local/private/link-local forms

This is still intentionally lightweight.
It does not pretend to be a complete SSRF firewall: it does not resolve DNS, so a public hostname that points at an internal address still gets through.

But it is a useful guard against accidental local/internal fetches.
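For illustration, a literal-host check along these lines might look like the sketch below. The function name `is_internal_url` and the exact patterns are assumptions rather than the bot's actual code, and like the real guard it only inspects the host as written in the URL.

```perl
use strict;
use warnings;

# Literal-host check only: no DNS resolution, so not a full SSRF defense.
sub is_internal_url {
    my ($url) = @_;
    my ($host) = $url =~ m{^https?://(?:[^/\@]*\@)?(\[[^\]]+\]|[^/:?#]+)}i
        or return 0;
    $host = lc $host;

    # Bracketed IPv6 literal.
    if ($host =~ /^\[(.+)\]$/) {
        my $v6 = $1;
        return 1 if $v6 eq '::1' || $v6 eq '::';      # loopback / unspecified
        return 1 if $v6 =~ /^f[cd][0-9a-f]{2}:/;      # fc00::/7 unique local
        return 1 if $v6 =~ /^fe[89ab][0-9a-f]:/;      # fe80::/10 link-local
        return 0;
    }

    return 1 if $host eq 'localhost' || $host eq '0.0.0.0';
    return 1 if $host =~ /^(?:10|127)\.\d+\.\d+\.\d+$/;
    return 1 if $host =~ /^169\.254\.\d+\.\d+$/;      # link-local / cloud metadata
    return 1 if $host =~ /^192\.168\.\d+\.\d+$/;
    return 1 if $host =~ /^172\.(?:1[6-9]|2\d|3[01])\.\d+\.\d+$/;
    return 0;
}
```

Alternative spellings of the same addresses (decimal or hex IPv4, IPv4-mapped IPv6) are not handled here, which fits the "intentionally lightweight" scope.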

📚 Configuration sample cleanup

The sample configuration was updated to expose new runtime knobs added during mb76.

Examples:

PERSONA_TTL_HOURS=1

OUTPUT_GLOBAL_WINDOW=10
OUTPUT_GLOBAL_MAX_MSG=20
OUTPUT_GLOBAL_SILENCE=15

Defaults exist in code, but documenting them in the sample config matters.
A hidden setting is a spell nobody remembers how to cast.

🧙 No schema change

This is important enough to say clearly:

No database schema change was made during this polish pass.

The work focused on code behavior, output safety, logging, configuration discoverability, and regression cleanup.

That makes the commit safer and easier to review.

🧪 Suggested test checklist

Before committing, the dev instance should pass:

m check
m song
m listeners
m help commands radio
m commands

Reminder paths:

m remind list
m remind show
m remind cancel 999999

Dynamic placeholders:

%nick%
%channel%
%date%
%time%
%b

URL guard checks:

http://127.0.0.1/
http://169.254.169.254/
http://0.0.0.0/
http://[::1]/

Log noise check:

WHO reply
End of WHO list
001 / 002 / 003 / 004 / 005
MOTD
LIST

These should no longer pollute normal DEBUG2 output.

🏁 Final status

This was a proper cleanup pass.

Nothing spectacular on the surface, but a lot of small shields were added:

less noisy logs;
safer UTF-8 output;
better NOTICE behavior;
reminder ID consistency;
X/Twitter cache correctness;
fixed dynamic placeholders;
better empty-message guards;
improved sample config;
no DB schema changes.

In other words: fewer cursed edge cases, more reliable Mediabot.

Suggested commit spell:

🧹 Protego mb76: polish IRC output, caches, and debug noise
