🧿🕯️ Mediabot v3 and the Order of the Unbroken Event Loop

in Mediabot · started by TeuK · 1w ago

TeuK · 1w ago

The MB325–MB337 hardening campaign

There are two kinds of magic in a long-running IRC bot.

The first kind is visible: commands, colors, radio requests, DCC chats, trivia, calculations, and helpful replies in the right place.

The second kind lives behind the walls: byte limits, cache consistency, reference ownership, process cleanup, parser boundaries, routing context, and a test suite that must tell the truth.

The MB325–MB337 campaign was mostly about that second kind.

No database table was added. No column was changed. No migration was required. Instead, Mediabot v3 went through a broad defensive pass covering IRC framing, timer lifetime, privilege caches, calculator safety, DCC networking, radio cleanup, and the test harness itself.

The result is less dramatic than a dragon escaping Gringotts—but much better for uptime. 🐲

🧵 MB325 — Messages Learn to Count Bytes, Not Characters

IRC servers care about bytes.

UTF-8 users care about characters.

Those two ideas are not interchangeable.

A long message containing accents or emojis could look short enough when counted as Perl characters while exceeding the practical IRC line budget once encoded. A server-side truncation could then cut directly through a multibyte UTF-8 sequence.

MB325 introduced a shared byte-aware splitting path for normal outbound messages and notices.

The splitter:

measures encoded byte length;
avoids cutting a UTF-8 character in half;
prefers clean word boundaries;
keeps each emitted IRC line inside the configured budget;
preserves short messages as a single line.

This became the foundation for the rest of the outbound-message family.

The owl post now counts the weight of the parchment, not merely the number of words. 🦉📜

🎭 MB327 — Long `/me` Actions Keep Their CTCP Robes

The ACTION path was the forgotten sibling.

While normal PRIVMSG and NOTICE output had become byte-safe, botAction() still wrapped the entire message in one CTCP frame:

\1ACTION ...\1

A long accented or emoji-heavy action could be truncated by the IRC server. Worse, truncation could remove the final CTCP delimiter and turn a /me action into malformed raw text.

MB327 reused the shared byte-aware splitter with a smaller budget that accounts for the ACTION wrapper itself.

Each chunk is independently wrapped:

\1ACTION chunk one\1
\1ACTION chunk two\1

The result is several valid actions instead of one broken spell.

With MB325 and MB327, PRIVMSG, NOTICE, and ACTION now follow the same byte-safe discipline. 🎬

🕸️ MB326 — The First Timer Ghost Hunt

IO::Async::Timer::Countdown objects appeared in several parts of Mediabot:

channel join scheduling;
WHO requests;
help pacing;
Claude response pacing;
quote games;
radio download polling.

The common pattern looked innocent:

my $timer;
$timer = IO::Async::Timer::Countdown->new(
    on_expire => sub {
        ...
        $timer ...
    },
);

But the timer owned the callback, and the callback captured the timer.

That formed a reference cycle:

timer → callback → lexical timer → timer

Removing the timer from the loop was not enough by itself. The cycle could keep the object alive forever.

MB326 broke the dominant terminal paths, removed fired timers from the loop where needed, and ensured stale quote-game callbacks were cleaned up without changing game behavior.

It was the first serious sweep through the castle’s invisible cobwebs. 🕸️🧹

🔐 MB328 — Channel Privileges Stop Caring About Letter Case

IRC channel names are case-insensitive.

The channel-level privilege cache was not.

A level cached under:

#Foo

could survive an invalidation performed under:

#foo

for up to the cache TTL.

That meant a freshly removed or downgraded user could briefly retain stale access when the next permission check used a different channel capitalization.

MB328 normalized the cache key with lc() everywhere it is populated or invalidated.

The SQL behavior did not change. Only the in-memory key became consistent with IRC semantics.

No more privilege gained by wearing a different capitalization cloak. 🥸

☠️ MB329 — The Calculator’s Emergency Containment Charm

The internal calculator still evaluated user input with Perl string eval.

A character whitelist and keyword blacklist stood in front of it, but blacklists are poor guards for source-code evaluation.

Expressions such as function calls, process operations, sleeps, forks, file operations, or the Perl repetition operator could reach the evaluator if they used allowed characters and an unlisted identifier.

The immediate risks included:

killing processes owned by the bot account;
freezing the IRC event loop;
spawning children;
deleting files;
allocating enormous strings.

MB329 added a default-deny identifier allowlist so only recognized mathematical functions and numeric syntax could reach the legacy evaluator.

It closed the immediate hole.

But the review made the correct long-term conclusion:

The safest string eval is the one that no longer exists.

That became MB330. 🧪

🧮 MB330 — `SafeCalc` Replaces the Forbidden `eval`

MB330 introduced:

Mediabot::SafeCalc

The calculator now uses a dedicated recursive-descent parser.

User input is tokenized and interpreted as a restricted arithmetic language. It is never converted back into Perl source.

Supported features include:

decimal, scientific, and hexadecimal literals;
constants pi, tau, and e;
parentheses;
arithmetic operators;
exponentiation with both ** and ^;
approved mathematical functions;
right-associative powers;
explicit domain checks.

The parser also enforces resource limits:

maximum input length;
maximum parser steps;
maximum exponent magnitude;
maximum result magnitude.

Unknown identifiers, malformed syntax, invalid domains, division by zero, non-finite results, and abusive exponents are rejected cleanly.

The calculator went from “guarded code execution” to “an actual calculator.”

A small difference in wording. A very large difference in safety. 🧠✨

🧾 MB331 — Calculator Polish: Clean Numbers and Correct Destinations

The new parser exposed three smaller issues.

First, direct use of Perl’s hex() on very large user-provided literals could emit overflow and portability warnings into the bot journal.

MB331 added controlled hexadecimal and decimal literal conversion so invalid or oversized numbers fail cleanly without log pollution.

Second, calculator replies still used a channel-oriented send path in places. In a private conversation, the channel may be undefined, causing a valid result to disappear.

Responses now use Mediabot::Context, so public commands reply publicly and private commands reply to the requesting nick.

Third, calclast [n] had been documented but the argument was ignored.

It now genuinely supports:

calclast
calclast 1
calclast 2
calclast 3

with explicit validation for invalid values.

The arithmetic was already safe. MB331 made it civilised. 🧾

🛰️ MB332 — DCC CHAT Can No Longer Knock on Internal Doors

Active DCC CHAT requests contain an IPv4 address encoded as a 32-bit integer and a destination port.

Mediabot authenticated the user and validated the port, then attempted the outbound TCP connection.

The destination address itself was not restricted.

A privileged IRC user could therefore ask the server to connect toward:

loopback;
RFC1918 private networks;
link-local addresses;
carrier-grade NAT space;
documentation networks;
multicast;
reserved ranges.

Even with Partyline authentication still required afterward, the outbound connection was useful as a server-side network probe.

MB332 introduced a central active-target validator.

It checks:

exact IPv4 integer range;
canonical conversion;
user-port range;
public routability;
reserved and special networks.

The validation runs twice:

when the IRC DCC request is received;
immediately before the Partyline connection is opened.

Passive DCC with its opaque token remains unchanged.

The Floo Network now has an approved destination list. 🧱🔥

🧪 MB333–MB335 — The Test Suite Is Forced to Tell the Truth

This was the most painful chapter.

The historical test directory contains several different file contracts:

files returning closures to the custom runner;
hybrid files that return a closure when loaded but can run standalone;
plain TAP programs using Test::More;
older standalone scripts with explicit exit;
closures containing embedded Perl programs inside quoted strings.

The runner loaded most tests in one Perl process using do $file.

That allowed a Test::More file calling done_testing() to finalize the global Test::Builder, contaminating later tests with errors such as:

done_testing() was already called
You tried to plan twice

The first attempt isolated files by searching their text for the word exit.

That was too naive.

It missed standalone Test::More files without explicit exit, and it falsely classified closure tests whose quoted helper programs contained exit 0.

The final MB335 runner classifies tests by their actual contract:

native closure;
hybrid closure;
standalone TAP program.

Standalone TAP runs in a separate Perl process and is parsed with TAP::Parser.

This isolates:

plans;
done_testing();
TODO and SKIP behavior;
exit statuses;
parser failures;
Test::Builder state.

The YouTube regression test was also updated to recognize the real MB323 production-restoration marker rather than only the earlier MB322 wording.

One final trap remained in the installer: it demanded the exact total 112/112, while the real server correctly produced 115/115.

The installer interpreted success as failure and rolled back the valid runner.

That guard was replaced with the only rule that matters:

PASSED : n/n

where both numbers are equal and the process exits successfully.

The lesson was expensive but permanent:

A test harness must validate behavior, not freeze incidental assertion totals.

The Marauder’s Map is useful only when it shows who is really in the castle. 🗺️

📻 MB336 — Radio Cancellation Releases Timers That Never Fire

MB326 cleaned timer cycles on callback terminal paths.

But an object removed before its callback runs never reaches the cleanup code inside that callback.

Radio cancellation exposed exactly that case.

Three timers could be removed early:

the yt-dlp polling timer;
the cancellation reaper timer;
the TERM-to-KILL escalation timer.

Their callbacks still captured strong lexical references to the timer objects, so early removal could leave cycles behind.

MB336 keeps the real owners strong—the job state and IO::Async loop—and makes only the callback’s lexical timer reference weak with weaken().

The operational sequence remains unchanged:

TERM → non-blocking wait → KILL if needed → reap → cleanup

The difference is that an early-cancelled timer can now actually die. 📻👻

📞 MB337 — DCC Offers Release Their Listener and Timeout

The same lifetime problem existed in the two temporary DCC listener flows:

offer_dcc_chat()
accept_dcc_chat_passive()

Each offer created a listener and a 60-second timeout.

Their callbacks captured one another indirectly:

listener → callbacks → timeout → callback → listener

Stopping and removing both objects from the loop did not break those captured strong references.

MB337 weakens the lexical listener and timeout references only after the loop and DCC offer registry have become their strong owners.

It also explicitly removes the expired timeout from the loop.

The correction covers:

successful connection;
listener failure;
normal timeout;
classic CTCP DCC CHAT;
passive DCC CHAT tokens.

DCC offers now vanish when they are finished instead of haunting the Partyline corridors. 📞🫥

🛡️ Validation Discipline After the YouTube and Runner Incidents

This campaign reinforced a stricter release process.

A correction is not considered safe merely because a source-scanning test passes.

The expected sequence is now:

apply patch
→ compile candidate
→ compile installed files
→ run focused tests
→ restart dev when runtime code changed
→ execute the real IRC command
→ inspect logs
→ take a fresh snapshot

Before commit:

run the complete test suite
→ confirm a final PASSED summary
→ confirm the runner’s real exit status
→ review Git status and untracked files
→ commit

Patch installers also avoid hard-coded assertion totals. They trust the runner’s exit status and a consistent PASSED : n/n summary.

And long fragile shell regex commands are no longer handed to the operator for manual copying. Verification belongs in a script when quoting becomes non-trivial.

That is not glamorous magic.

It is the magic that prevents another six-minute test run from using the wrong runner. 🧯

🗄️ Database Impact

None.

0 new tables
0 changed columns
0 migrations
0 SQL schema changes
0 stored-data conversions

The entire campaign stayed above the database layer.

🏁 Final Result

Across MB325–MB337, Mediabot v3 gained:

UTF-8 byte-safe PRIVMSG, NOTICE, and ACTION output;
cleaner timer ownership and fewer long-running memory leaks;
case-correct privilege-cache invalidation;
a real arithmetic parser with no user-controlled string eval;
correct calculator behavior in public and private contexts;
functional calclast [1-3];
DCC protection against internal and reserved network targets;
isolated TAP execution with reliable result parsing;
safe dynamic assertion-count validation;
complete cleanup of radio cancellation timers;
complete cleanup of DCC listener/timeout pairs.

The visible commands remain familiar.

The machinery beneath them is now less trusting, less leaky, and much harder to trick.

Mediabot did not receive a new wand.

It learned not to point the old one at its own foot. 🪄🦶

You must be logged in to reply.