Forum teuk.org

🕰️🪄 Mediabot v3 and the Chamber of Blocking Calls

in Mediabot · started by TeuK · 1w ago

The MB315–MB324 hardening campaign

Some releases arrive with fireworks over the Great Hall.

This one arrived with a process table, a stopwatch, a stack of IRC logs, and a very uncomfortable question:

“What happens when an external service takes too long, a child process refuses to leave, or a test says green while the real command is broken?”

The answer led Mediabot v3 through ten rounds of hardening, from MB315 to MB324.

This campaign did not change the database schema. It did not add tables, columns, or migrations. Instead, it strengthened existing features, removed dead dependencies, moved risky network work away from the IRC event loop, repaired race conditions, and—most importantly—turned one painful YouTube regression into a much stricter validation discipline.

The central rule became:

A feature is not validated because its source looks correct.
A feature is validated when it compiles, passes targeted tests,
restarts on the real server, answers on IRC, and leaves clean logs.

Welcome to the Chamber of Blocking Calls. Mind the pipes. 🕯️


🧹 MB315 — Sweeping Dead Imports and a Forgotten Pipe

The first round consolidated work that had initially been numbered incorrectly.

The project already used mb299 for an unrelated test, so the review was cleanly renumbered under the next available round: MB315.

Several pieces of old scaffolding were removed from Mediabot.pm:

use diagnostics;
use Switch;
use Moose;

None of them were needed:

  • diagnostics was expensive and inappropriate for normal production runtime;
  • Switch was loaded even though the module contained no switch or case;
  • Moose was imported even though the object was created manually with bless and used no Moose feature.

Removing them simplified startup and reduced unnecessary dependency weight.

ScriptRunner.pm also received a cleanup on its timeout path. Pipes still registered in the selector are now closed immediately instead of remaining open during child reaping.

This was not an inter-call descriptor leak—the lexical handles would eventually be destroyed—but it made all exit paths consistent and stopped relying on garbage-collection order.

Finally, the YouTube duration formatter was unified.

A live stream or premiere may report:

PT0S

The old inline formatter displayed 0s, while the shared helper correctly treated that value as “no meaningful duration”.

displayYoutubeDetails() now uses the shared helper and omits the duration block entirely when it is empty.
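
As a rough illustration of that behavior, here is a minimal sketch. The names format_yt_duration and build_details, the output format, and the " - " separator are assumptions made for this example, not the actual Mediabot helpers.

```perl
use strict;
use warnings;

# Hypothetical helper: turn an ISO 8601 duration such as "PT1H2M3S" into a
# short display string. Returns '' when there is no meaningful duration.
sub format_yt_duration {
    my ($iso) = @_;
    return '' unless defined $iso;
    return '' unless $iso =~ /\A P (?:(\d+)D)? (?:T (?:(\d+)H)? (?:(\d+)M)? (?:(\d+)S)? )? \z/x;
    my ($d, $h, $m, $s) = map { $_ // 0 } ($1, $2, $3, $4);
    $h += 24 * $d;
    return '' unless $h || $m || $s;    # PT0S: live stream or premiere
    return $h ? sprintf('%dh %02dm %02ds', $h, $m, $s)
         : $m ? sprintf('%dm %02ds', $m, $s)
         :      sprintf('%ds', $s);
}

# Join only the non-empty parts, so an empty duration leaves no
# dangling separator in the detail line.
sub build_details {
    my ($title, $iso_duration, $views) = @_;
    my @parts = grep { defined($_) && length($_) }
                ($title, format_yt_duration($iso_duration), $views);
    return join ' - ', @parts;
}
```

With this shape, the caller never has to special-case live streams: the empty string simply drops out of the joined line.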

No more lonely separators around an imaginary zero-second live stream. 📺


🗺️ MB316 — whereis Leaves the Main Event Loop

MB314 had made whereis replies safe and predictable, but the actual lookup still happened synchronously inside the IRC WHOIS callback.

That path performed both:

gethostbyname(...)
HTTP::Tiny->get(...)

A slow resolver or delayed country.is request could therefore pause the entire bot.

MB316 introduced an asynchronous wrapper:

whereis_async($hostname, $callback)

The blocking DNS and HTTP work now runs in a child process.

The parent:

  • reads the result through IO::Async::Stream;
  • applies a bounded timeout;
  • escalates from TERM to KILL;
  • reaps only with waitpid(..., WNOHANG);
  • limits child output;
  • normalizes failure to N/A;
  • invokes the IRC callback exactly once.

The caller, requested nick, and reply target are captured before the asynchronous work begins, preventing a later WHOIS command from stealing the response context.
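
The parent-side rules listed above can be sketched with core modules only. This is not the Mediabot implementation: it swaps IO::Async::Stream for a plain IO::Select poll so the example stays self-contained, and start_job, finish_job, poll_job, the 512-byte cap, and the half-second grace period are all assumptions.

```perl
use strict;
use warnings;
use IO::Select;
use POSIX qw(WNOHANG);
use Time::HiRes qw(time);

my $MAX_BYTES = 512;    # assumed cap on child output
my $GRACE     = 0.5;    # assumed seconds between TERM and KILL

# Fork a child for the blocking work; the parent only keeps a job record.
sub start_job {
    my ($work, $timeout, $callback) = @_;
    pipe(my $rd, my $wr) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {                        # child: slow DNS/HTTP would go here
        close $rd;
        my $out = eval { $work->() };
        print {$wr} $out if defined $out;
        close $wr;
        POSIX::_exit(0);                    # skip END blocks and destructors
    }
    close $wr;
    return { pid => $pid, rd => $rd, buf => '', cb => $callback,
             deadline => time() + $timeout, done => 0, reaped => 0 };
}

# Deliver the result exactly once; any failure is normalized to N/A.
sub finish_job {
    my ($job, $result, $failed) = @_;
    return if $job->{done}++;
    close $job->{rd};
    kill 'TERM', $job->{pid} if $failed;
    $job->{kill_at} = time() + $GRACE;
    my $ok = !$failed && defined($result) && length($result);
    $job->{cb}->($ok ? $result : 'N/A');
}

# Call from every event-loop tick. Never blocks; true once fully cleaned up.
sub poll_job {
    my ($job) = @_;
    if (!$job->{done}) {
        my @ready = IO::Select->new($job->{rd})->can_read(0);
        if (@ready) {
            my $n = sysread($job->{rd}, my $chunk, 4096);
            if    (!defined $n) { finish_job($job, undef, 1) }        # read error
            elsif ($n == 0)     { finish_job($job, $job->{buf}, 0) }  # EOF
            else {
                $job->{buf} .= $chunk;
                finish_job($job, undef, 1) if length($job->{buf}) > $MAX_BYTES;
            }
        }
        elsif (time() >= $job->{deadline}) { finish_job($job, undef, 1) }
    }
    return 0 unless $job->{done};
    return 1 if $job->{reaped};
    my $r = waitpid($job->{pid}, WNOHANG);  # never a blocking wait
    if ($r == $job->{pid} || $r == -1) { $job->{reaped} = 1 }
    elsif (time() >= $job->{kill_at})  { kill 'KILL', $job->{pid} }
    return $job->{reaped};
}
```

The done counter is what guarantees a single callback: a timeout, an oversized reply, and a late EOF can all race, but only the first one reaches the caller.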

The owl may take the scenic route. The castle keeps moving. 🦉


🔮 MB317 — Runtime Version Checks Stop Freezing IRC

The live version command called the synchronous GitHub version checker directly.

That helper is perfectly acceptable during startup, before the normal IRC event loop is active.

It is not acceptable in a live command when DNS, TLS, or GitHub may take several seconds to respond.

MB317 added:

getVersion_async($callback)

The runtime command now performs external work in a child process and returns a validated structured result to the parent.

The existing startup behavior remains unchanged.

Users still see:

Mediabot version: <local>

and, when appropriate:

(update available: <remote>)

The difference is invisible but important: IRC no longer has to hold its breath while GitHub answers. 🌌


📚 MB318 — define Becomes Asynchronous and Deterministic

The Wiktionary command had two weaknesses.

First, its HTTP request ran directly inside the live command callback.

Second, when multiple languages were returned, the selected language depended on Perl hash iteration order:

my $first_lang = (values %$data)[0];

That meant identical input could produce different results after a restart.

MB318 moved the network work into a child process and introduced deterministic language selection:

  1. prefer the configured language;
  2. validate the returned structure;
  3. fall back in stable alphabetical order;
  4. display the language that actually supplied the definition.

Malformed API data is rejected cleanly, and private replies use the same safe context path as public replies.
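
A minimal sketch of that selection order follows. The function name pick_language and the assumed response shape (a hash of language code to a non-empty array of entries) are illustrative guesses, not the real Wiktionary handling in Mediabot.

```perl
use strict;
use warnings;

# Pick a language from a decoded API response assumed to look like
#   { en => [ {...}, ... ], fr => [ {...} ] }
# Returns (language, entries), or an empty list when nothing is usable.
sub pick_language {
    my ($data, $preferred) = @_;
    return unless ref($data) eq 'HASH';          # reject malformed data

    # Validate each language, then sort so the fallback order is stable.
    my @usable = grep {
        ref($data->{$_}) eq 'ARRAY' && @{ $data->{$_} }
    } sort keys %$data;
    return unless @usable;

    # Prefer the configured language, else the first one alphabetically.
    my ($lang) = grep { defined($preferred) && $_ eq $preferred } @usable;
    $lang //= $usable[0];

    # Returning the language lets the caller display which one was used.
    return ($lang, $data->{$lang});
}
```

Unlike (values %$data)[0], the result here depends only on the input and the configured language, so a restart cannot change it.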

The dictionary no longer rolls dice behind the librarian’s desk. 📖


🧠 MB319 — Trivia Learns Patience, Ownership, and Fair Scoring

The trivia command contacted Open Trivia DB synchronously.

While the API was answering, the entire IRC event loop could pause.

But the more interesting bugs lived in the game state.

Two users could launch a trivia request almost simultaneously because the channel was not marked busy until the HTTP response had arrived.

A late callback could overwrite a newer question.

Failed API requests consumed rounds of a multi-round game even though no playable question had been delivered.

trivia start could also reset an active game before the active-question guard rejected the command.

MB319 introduced:

  • asynchronous API fetching;
  • one pending request per channel;
  • unique request ownership tokens;
  • stale callback rejection;
  • clean cancellation through triviastop;
  • round increments only after a valid question arrives;
  • correct display of the configured trivia timeout;
  • explicit diagnostics for unknown categories.
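
The ownership-token idea can be sketched in a few lines. Everything named here (begin_request, cancel_request, on_question, current_round, and the two state hashes) is hypothetical and only illustrates the pattern.

```perl
use strict;
use warnings;

my %pending;           # channel => token of the single request in flight
my %game;              # channel => { round => N, question => ... }
my $next_token = 0;

# Mark the channel busy *before* the fetch starts; refuse a second request.
sub begin_request {
    my ($chan) = @_;
    return if exists $pending{$chan};
    return $pending{$chan} = ++$next_token;
}

# triviastop: dropping the token turns any late callback into a no-op.
sub cancel_request {
    my ($chan) = @_;
    delete $pending{$chan};
    return;
}

# Callback from the fetch. Returns true only when the question was accepted.
sub on_question {
    my ($chan, $token, $question) = @_;
    return 0 unless defined($pending{$chan}) && $pending{$chan} == $token;  # stale
    delete $pending{$chan};
    return 0 unless defined $question;    # failed fetch: no round is consumed
    $game{$chan}{round}++;
    $game{$chan}{question} = $question;
    return 1;
}

sub current_round { my ($chan) = @_; return $game{$chan}{round} // 0 }
```

Because the token is taken before any network work begins, two near-simultaneous requests cannot both proceed, and a callback that arrives after triviastop no longer owns anything it could overwrite.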

The scoreboard now waits until a real question exists before moving the enchanted hourglass. ⏳


📡 MB320 — The Ambitious YouTube Rewrite

The historical YouTube search command performed two sequential Google API requests:

  1. search for video IDs;
  2. fetch metadata for those IDs.

The rewrite aimed to move those calls away from the event loop while preserving three-result output, shared colors, duration formatting, private replies, payload limits, deduplication, and original result order.

The new design introduced hardened parsers and an asynchronous child worker.

On paper, it looked strong.

In the test harness, it looked green.

On the real teuk.org runtime, it failed.

The command returned:

YouTube: service unavailable (search).

This was the moment the portraits stopped talking and everyone looked at the test suite. 🖼️


🧵 MB321 — The Process Watcher Attempt

The first diagnosis focused on child-process ownership.

IO::Async already manages process collection, while the worker also used manual waitpid() logic.

MB321 switched the worker toward watch_process() and added better diagnostics.

The targeted tests passed.

The production command still failed.

This mattered because it exposed a dangerous truth:

A test can prove that a pattern exists without proving that the real server, real dependencies, real event loop, and real command path work together.

The spell was formally correct.

The dragon remained unimpressed. 🐉


🧯 MB322 — Restore the Known-Good Transport

At this point, reliability took priority over architectural elegance.

The user-facing YouTube command had worked for years through the synchronous in-process HTTP path.

MB322 restored that proven production transport while keeping the valuable hardening added during MB320:

  • strict JSON validation;
  • response-size limits;
  • ID validation and deduplication;
  • original ordering;
  • shared duration formatting;
  • three-result output;
  • Context-based public and private routing;
  • metadata fallback;
  • detailed logging.
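
For the ID validation, deduplication, and ordering points, a small sketch may help. The function name clean_video_ids is hypothetical, and the ID pattern (11 characters from letters, digits, underscore, and hyphen) is an assumption about what YouTube video IDs look like, not a rule quoted from Mediabot.

```perl
use strict;
use warnings;

# Keep only well-formed IDs, drop duplicates, and preserve first-seen order.
# Assumption: a video ID is 11 characters from [A-Za-z0-9_-].
sub clean_video_ids {
    my ($max, @ids) = @_;
    my (%seen, @out);
    for my $id (@ids) {
        next unless defined($id) && !ref($id);          # scalars only
        next unless $id =~ /\A[A-Za-z0-9_-]{11}\z/;     # strict shape check
        next if $seen{$id}++;                           # deduplicate
        push @out, $id;
        last if @out >= $max;                           # e.g. three results
    }
    return @out;
}
```

Filtering in a single ordered pass keeps the search ranking intact, which a hash-based deduplication on its own would not.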

The experimental async worker remained available for future redesign, but the live command no longer depended on it.

This was a deliberate trade-off:

A working synchronous command is better than a beautifully asynchronous command that does not work.

Sometimes the bravest spell is git restore with better guards. 🛟


💥 MB323 — The Generator Breaks the Spellbook

The MB322 patch generator made a second mistake.

It restored the synchronous command body but left behind the closing fragment of the old asynchronous callback:

    },
    );
}

The result was a syntax error near "},". Because YouTube.pm is loaded through the main application stack, Mediabot could not start.

This was not a subtle runtime race.

This was a broken spellbook sitting open on the table. 📕⚡

MB323 repaired the complete youtubeSearch_ctx() region atomically.

More importantly, it changed the patching discipline:

  1. create a candidate file;
  2. compile the candidate with perl -c;
  3. replace the live module only if the candidate compiles;
  4. compile the installed module again.
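
The four steps above could look roughly like this. It is a sketch under assumptions: compiles_ok and install_if_compiles are hypothetical names, the status strings are invented for the example, and a real tool would also need the right library paths for perl -c.

```perl
use strict;
use warnings;
use File::Copy qw(move);

# True when `perl -c` accepts the file. Note that perl -c still runs
# BEGIN blocks and `use` lines, and reports its verdict on STDERR.
sub compiles_ok {
    my ($path) = @_;
    return system($^X, '-c', $path) == 0;
}

# Install the candidate over the live module only if it compiles,
# then compile the installed file once more.
sub install_if_compiles {
    my ($candidate, $live) = @_;
    return 'candidate_failed' unless compiles_ok($candidate);
    move($candidate, $live) or die "install failed: $!";
    return compiles_ok($live) ? 'ok' : 'installed_failed';
}
```

The important property is that a broken candidate never touches the live path: the orphaned callback tail would have stopped at the first check.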

The regression test now checks for the exact orphaned callback tail that caused the outage.

The repair was validated against the exact broken snapshot.

That incident permanently changed the release process.


🔵 MB324 — YouTube Links Become Blue and Underlined

Once the command was functional again, the visible output received its final polish.

The three YouTube URLs are now formatted with:

  • blue foreground;
  • underline;
  • transparent background;
  • a full reset after the URL.

The helper is intentionally simple:

return "\x0302\x1F" . $url . "\x0F";

The result is readable on both light and dark IRC themes and does not leak badge colors into the link.

The command now returns three clean, consistent result lines with links that look like links. 🔗


🧪 The Most Important Feature Added Was Not a Command

The biggest improvement in this campaign is the new validation protocol.

Static checks remain valuable, but they are no longer allowed to stand alone.

Every correction now follows this order:

patch
→ compile
→ targeted tests
→ restart mediabot@dev
→ run the exact IRC command
→ run m check immediately afterward
→ inspect the journal
→ take the next snapshot

Before commit:

full test suite
→ runtime smoke pack
→ private-message routing test
→ journal review
→ Git safety review
→ commit

The automatic validation gate now checks:

  • changed and untracked files;
  • SQL or schema artifacts;
  • obvious secret files;
  • Git whitespace;
  • entry-point syntax;
  • syntax of all changed Perl files;
  • automatically selected targeted tests;
  • the complete static suite in full mode.

But even a perfect static run does not replace the final IRC command.

The YouTube regression earned that rule the hard way. 🛡️


🎮 The Runtime Smoke Pack

The following features now have explicit real-world checks before commit:

m check
m status
m yt this is how you remind me
m version
m resolve teuk.org
m resolve 8.8.8.8
m whereis <connected nick>
m define test
m trivia categories
m trivia science easy
m triviastop
m poll ...
m pollresult
m pollstop

External or asynchronous commands are immediately followed by:

m check

If the control reply waits for the external request, the event loop is still blocked.

Public and private command routing are both tested whenever Mediabot::Context or reply-target logic changes.

The journal is searched for:

syntax error
Compilation failed
Undefined subroutine
worker_failed
timed out
service unavailable
fatal
exception

A known and deliberately tested failure may be acceptable.

An unexplained one blocks the commit.
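
A journal review along those lines can be sketched as a plain filter. The function scan_journal, the case-insensitive matching, and the idea of passing in "allowed" patterns for known, deliberately tested failures are assumptions made for this example.

```perl
use strict;
use warnings;

# Patterns taken from the list above; matched case-insensitively here.
my @SUSPECT = (
    'syntax error', 'Compilation failed', 'Undefined subroutine',
    'worker_failed', 'timed out', 'service unavailable', 'fatal', 'exception',
);
my $SUSPECT_RE = do {
    my $alt = join '|', map { quotemeta($_) } @SUSPECT;
    qr/($alt)/i;
};

# Return [line_number, pattern, text] for every suspicious line that is
# not covered by one of the explicitly allowed regexes.
sub scan_journal {
    my ($lines, @allowed) = @_;
    my (@hits, $n);
    for my $line (@$lines) {
        $n++;
        next unless $line =~ $SUSPECT_RE;
        my $what = lc $1;                         # save before other matches
        next if grep { $line =~ $_ } @allowed;    # known, tested failure
        push @hits, [ $n, $what, $line ];
    }
    return @hits;
}
```

An empty result means the journal is clean for this purpose. Anything returned is, by the rule above, a reason to stop before committing.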


🗄️ Database Impact

None.

0 new tables
0 changed columns
0 migrations
0 stored-data conversions
0 schema files added to the commit

The database remained safely outside the chamber. 🏰


🧭 What Changed for Users?

Users keep the same familiar commands.

What improves is everything around them:

  • live YouTube videos no longer display a fake 0s duration;
  • whereis, version, define, and trivia no longer perform blocking external work in the IRC callback;
  • trivia races and lost rounds are prevented;
  • YouTube search returns three ordered, validated results;
  • private replies use the correct destination;
  • YouTube links are blue and underlined;
  • failures produce clearer diagnostics;
  • production regressions face a real runtime gate before commit.

The commands did not become more complicated.

The machinery behind them became much harder to fool.


🏁 Final Result

MB315–MB324 was not a perfectly smooth flight.

It included several successful hardening rounds, an ambitious rewrite, a production failure, an incomplete repair, an emergency restoration, a generated syntax break, and finally a better engineering discipline.

That honesty matters.

Mediabot v3 is stronger not because nothing went wrong, but because the failures were investigated, documented, repaired, and converted into permanent safeguards.

The castle now has:

  • fewer blocking calls;
  • stricter child-process boundaries;
  • safer external-service handling;
  • deterministic results;
  • stronger parser validation;
  • clearer IRC output;
  • candidate compilation before installation;
  • mandatory real-command testing before commit.

No new broom was purchased.

Instead, the existing one received a compass, a brake system, a crash recorder, and a very suspicious-looking checklist. 🧹🧭🛡️

The event loop must keep moving—even when the outside world does not.
