Forum teuk.org

🧙‍♂️ Mediabot v3 and the Prisoner of `waitpid(0)`

in Mediabot · started by TeuK · 1w ago

TeuK · 1w ago

✨ The Great Non-Blocking Escape: MB304 to MB314

Some updates bring a shiny new command.

Others descend into the dungeons, wake the sleeping child processes, question suspicious timers under Veritaserum, and politely ask Chromium why it is still holding the entire castle hostage. 🕸️

This Mediabot v3 hardening pass did not reinvent the bot. It did something more valuable:

It made the existing features faster, safer, more honest, and far harder to freeze.

The rule of the day was simple:

No download, DNS lookup, child process, timeout,
poll, Partyline command, or AI reply
may hold the whole IRC bot hostage.

And one sacred promise was kept from beginning to end:

No database schema change.
No new table.
No new column.
No migration.
No stored-data conversion.

So grab your wand, keep an eye on the process table, and mind the zombies. 👻


🕯️ MB304: Authentication Errors Leave the Invisibility Cloak

Mediabot::Auth used:

$level ||= 3;

In Perl, numeric zero is false.

That meant an authentication failure intentionally logged at level 0 could be silently promoted to DEBUG3 and disappear from normal production logs.

A critical error had effectively discovered an invisibility cloak. 🫥

The fix now applies the default only when the level is undefined:

$level = 3 unless defined $level;

Level-zero authentication failures remain visible exactly as intended.
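Here is a standalone illustration of the difference. The helper names are hypothetical and this is not the actual Mediabot::Auth code. Both subs receive the same inputs, and only the `defined` version keeps an explicit level 0. On Perl 5.10 and later, `$level //= 3;` is an equivalent shorthand.

```perl
use strict;
use warnings;

# Old behaviour: ||= replaces any false value, so an explicit 0 becomes 3.
sub level_with_or_default {
    my ($level) = @_;
    $level ||= 3;
    return $level;
}

# Fixed behaviour: only a missing (undef) level receives the default.
sub level_with_defined_default {
    my ($level) = @_;
    $level = 3 unless defined $level;     # same effect as: $level //= 3;
    return $level;
}

for my $input (0, 2, undef) {
    printf "%-5s  ||= gives %d, defined gives %d\n",
        $input // 'undef', level_with_or_default($input), level_with_defined_default($input);
}
```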


🔭 MB305: Metrics Errors Learn to Shout Properly

The Metrics logger translated symbolic ERROR messages to numeric level 1.

But in Mediabot, level 0 is the level that remains visible even with minimal debugging enabled.

A Prometheus listener bind failure or radio-status provider error could therefore vanish into the Restricted Section without leaving a note.

Severity mapping is now explicit and consistent:

INFO    → 0
ERROR   → 0
WARN    → 1
DEBUG   → 2

Alternate logger objects also receive the correct named method instead of a vague approximation.
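Below is a minimal sketch of that mapping. It assumes three things: a hypothetical `metrics_log` helper, a numeric `log($level, $message)` method on the default logger, and lower-case method names (`error`, `warn`, ...) on alternate loggers. None of these names come from Mediabot's real Metrics code. The two toy logger classes exist only to show which path each object takes.

```perl
use strict;
use warnings;

# Symbolic severity -> numeric Mediabot level, as in the table above.
my %NUMERIC_LEVEL = (INFO => 0, ERROR => 0, WARN => 1, DEBUG => 2);

# Hypothetical helper: alternate loggers get the method named after the
# severity (error, warn, info, debug); otherwise use a numeric log() call.
sub metrics_log {
    my ($logger, $severity, $message) = @_;
    $severity = uc($severity // 'INFO');
    die "unknown severity '$severity'\n" unless exists $NUMERIC_LEVEL{$severity};

    my $method = lc $severity;
    return $logger->$method($message) if $logger->can($method);
    return $logger->log($NUMERIC_LEVEL{$severity}, $message);
}

# Two toy loggers that only record what they receive.
package NumericLogger {
    sub new { return bless { seen => [] }, shift }
    sub log { my ($self, $level, $msg) = @_; push @{ $self->{seen} }, [$level, $msg]; return 1 }
}
package NamedLogger {
    sub new   { return bless { seen => [] }, shift }
    sub error { my ($self, $msg) = @_; push @{ $self->{seen} }, ['error', $msg]; return 1 }
}

package main;
my $numeric = NumericLogger->new;
metrics_log($numeric, 'ERROR', 'Prometheus listener bind failed');
my $named = NamedLogger->new;
metrics_log($named, 'ERROR', 'radio status provider error');
print "numeric logger got level $numeric->{seen}[0][0]\n";
print "named logger called $named->{seen}[0][0]\n";
```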


🗳️ MB306: The Glorious Reign of “Winner: 0” Is Over

When pollstop closed a poll, the winner could be announced as:

Winner: 0

Technically, that was the internal zero-based option index.

Humanly, it was about as useful as a Marauder’s Map with no names on it. 🗺️

Weighted polls had another inconsistency:

  • pollresult respected weights;
  • pollstop ignored them.

The same poll could therefore produce two different winners.

The poll system now:

  • announces the real option label;
  • applies weights consistently;
  • reports ties explicitly;
  • keeps tied results deterministic;
  • updates the existing poll-duration metric.

Example:

Pizza x3 : 2 votes → score 6
Sushi x1 : 3 votes → score 3

The winner is now Pizza.

Not option 0.

Not Sushi just because it had more voters.

Justice has returned to the Great Hall. 🍕
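The scoring rule can be sketched as follows. Each option is assumed to be a hash with hypothetical `label`, `weight` and `votes` keys, and an unweighted poll simply uses weight 1. A tie is detected by collecting every option that reaches the best score. Keeping the original option order among the tied labels makes the announcement deterministic.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Score = weight * votes. All options sharing the best score are returned,
# in their original order, so tie announcements are stable.
sub poll_result {
    my (@options) = @_;
    my @scored = map { +{ %$_, score => $_->{weight} * $_->{votes} } } @options;
    my $best = max map { $_->{score} } @scored;
    return { winners => [], score => 0, is_tie => 0 } unless defined $best;

    my @winners = map { $_->{label} } grep { $_->{score} == $best } @scored;
    return { winners => \@winners, score => $best, is_tie => @winners > 1 ? 1 : 0 };
}

my $r = poll_result(
    { label => 'Pizza', weight => 3, votes => 2 },
    { label => 'Sushi', weight => 1, votes => 3 },
);
my $prefix = $r->{is_tie} ? 'Tie' : 'Winner';
print "$prefix: ", join(', ', @{ $r->{winners} }), " (score $r->{score})\n";
```

With the Pizza/Sushi example above, this prints `Winner: Pizza (score 6)`.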


📻 MB307: A Radio Timeout Is No Longer Disguised as Success

The yt-dlp watcher decoded child status with:

my $exit = $? >> 8;

That works for a normal process exit.

But when a process is killed by TERM or KILL, the signal number lives in the low seven bits of the raw wait status. Shifting right by eight discards those bits and yields 0, which looks exactly like a successful exit.

A real timeout could therefore end with this misleading message:

download finished, but no readable MP3 file was produced

The new logic distinguishes:

  • normal success;
  • normal failure;
  • signal termination;
  • timeout;
  • waitpid() failure.

Timeouts use conventional exit code 124, and the proper timeout message finally reaches the user.

No more Polyjuice Potion for failed downloads. 🧪
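Decoding the raw status in that order might look like this. The bit layout (`$? >> 8` for the exit code, `$? & 127` for the signal) is the one documented for `$?` in perlvar. The following are assumptions for illustration: the `$timed_out` flag, the category names, the use of -1 for a waitpid() failure, and the shell-style `128 + signal` code. The value 124 follows the timeout convention mentioned above.

```perl
use strict;
use warnings;

# Classify a raw wait status using the layout documented for $? in perlvar:
#   $? & 127 -> signal that killed the child (0 if none)
#   $? >> 8  -> exit code, meaningful only when no signal was involved
# $timed_out is a hypothetical flag set by the caller's own deadline logic,
# and a raw value of -1 is assumed to mean that waitpid() itself failed.
sub describe_child_status {
    my ($raw, $timed_out) = @_;
    return { kind => 'waitpid_failed', exit => -1 } if $raw == -1;

    my $signal = $raw & 127;
    return { kind => 'timeout', exit => 124, signal => $signal } if $timed_out;
    return { kind => 'signal', exit => 128 + $signal, signal => $signal } if $signal;  # shell-style code

    my $exit = $raw >> 8;
    return { kind => $exit == 0 ? 'success' : 'failure', exit => $exit };
}

for my $case ([0, 0], [256, 0], [15, 0], [9, 1]) {
    my ($raw, $timed_out) = @$case;
    my $d = describe_child_status($raw, $timed_out);
    printf "raw=%-3d  old exit=%d  new: %s (exit %d)\n", $raw, $raw >> 8, $d->{kind}, $d->{exit};
}
```

For a child killed by TERM (raw status 15), the old `$? >> 8` gives 0, while this version reports a signal death.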


🕸️ MB308: Chromium Can No Longer Petrify the Bot

The Chromium fallback already limited how long it read stdout and stderr.

But after both pipes closed, it still performed:

waitpid($pid, 0);

Pipe EOF does not prove that the child process has exited.

A broken Chromium process could close both pipes, remain alive, and freeze Mediabot forever.

The new child-reaping sequence is bounded:

  1. poll with WNOHANG;
  2. send TERM;
  3. allow a short grace period;
  4. escalate to KILL;
  5. reject signal-terminated or partial output.

One cursed web page can no longer turn the whole bot into stone. 🪨
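A sketch of that sequence follows. It assumes the caller has already drained both pipes. It polls with WNOHANG in short steps, so it is bounded rather than fully asynchronous. The helper names and timings are hypothetical, not Mediabot's actual values. Step 5 is left to the caller through `output_is_trustworthy`.

```perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);        # WNOHANG
use Time::HiRes qw(time sleep);

# One non-blocking reap attempt: (1, $status) if reaped, (1, undef) if there
# is nothing left to reap, (0, undef) while the child is still running.
sub try_reap {
    my ($pid) = @_;
    my $r = waitpid($pid, WNOHANG);
    return (1, $?)    if $r == $pid;
    return (1, undef) if $r == -1;
    return (0, undef);
}

# Bounded reaping after both pipes have closed: wait briefly, then TERM,
# then KILL. The default timings are illustrative only.
sub reap_bounded {
    my ($pid, %opt) = @_;
    my @phases = (
        [ undef,  $opt{wait}  // 1.0 ],   # 1. poll with WNOHANG
        [ 'TERM', $opt{grace} // 0.5 ],   # 2-3. TERM, then a short grace period
        [ 'KILL', $opt{grace} // 0.5 ],   # 4. escalate to KILL
    );
    for my $phase (@phases) {
        my ($signal, $budget) = @$phase;
        kill $signal, $pid if defined $signal;
        my $deadline = time() + $budget;
        while (1) {
            my ($done, $status) = try_reap($pid);
            return $status if $done;
            last if time() >= $deadline;
            sleep 0.02;
        }
    }
    return undef;                     # still alive after KILL: report it, never hang
}

# 5. Only a clean exit counts; a signal death means the output may be partial.
sub output_is_trustworthy {
    my ($status) = @_;
    return defined $status && ($status & 127) == 0 && ($status >> 8) == 0;
}

my $pid = fork();
die "fork: $!" unless defined $pid;
if ($pid == 0) { sleep 30; exit 0 }   # stand-in for a child that closed its pipes but hangs
my $status = reap_bounded($pid, wait => 0.2, grace => 0.5);
printf "signal=%d trustworthy=%s\n", $status & 127, output_is_trustworthy($status) ? 'yes' : 'no';
```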


🧪 MB309: Partyline .eval Loses Its Blocking Hourglass

The Owner-only .eval watchdog paused the main event loop with:

usleep(500_000);

For half a second, everything stopped:

  • IRC traffic;
  • timers;
  • radio;
  • Partyline;
  • asynchronous callbacks.

Worse, an unconditional waitpid($pid, 0) could freeze the bot indefinitely if evaluated code closed its output and kept running.

The watchdog is now fully asynchronous:

  • TERM and KILL escalation use timers;
  • child collection uses WNOHANG;
  • timeout output appears exactly once;
  • a final line without a trailing newline is preserved.

The dangerous spell remains Owner-only, but its containment wards are much stronger. 🪄
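The same steps can be arranged so that nothing ever waits. The watchdog below is a small state machine whose `tick` method is assumed to be called by a periodic event-loop timer. The class name, the callback and the timings are hypothetical. The `until` loop at the end only stands in for the event loop so that the sketch can run on its own.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical non-blocking watchdog for a forked .eval child. tick() never
# sleeps and never calls waitpid($pid, 0); a loop timer is assumed to call it.
package EvalWatchdog {
    use POSIX qw(:sys_wait_h);

    sub new {
        my ($class, %arg) = @_;
        return bless {
            pid     => $arg{pid},
            term_at => $arg{started} + $arg{timeout},
            kill_at => $arg{started} + $arg{timeout} + $arg{grace},
            notify  => $arg{notify},        # sends one line back to the Partyline user
            phase   => 'running',           # running -> term -> kill -> done
        }, $class;
    }

    # Returns true once the child has been collected.
    sub tick {
        my ($self, $now) = @_;
        return 1 if $self->{phase} eq 'done';

        my $r = waitpid($self->{pid}, WNOHANG);
        if ($r != 0) {                      # reaped (pid) or nothing to reap (-1)
            $self->{status} = $r == -1 ? undef : $?;
            $self->{phase}  = 'done';
            return 1;
        }
        if ($self->{phase} eq 'running' && $now >= $self->{term_at}) {
            kill 'TERM', $self->{pid};
            $self->{phase} = 'term';
            $self->{notify}->('.eval timed out');    # only emitted on this transition
        }
        elsif ($self->{phase} eq 'term' && $now >= $self->{kill_at}) {
            kill 'KILL', $self->{pid};
            $self->{phase} = 'kill';
        }
        return 0;
    }
}

package main;
my @told;
my $pid = fork();
die "fork: $!" unless defined $pid;
if ($pid == 0) { close STDOUT; sleep 30; exit 0 }   # closes its output, keeps running
my $dog = EvalWatchdog->new(pid => $pid, started => time(), timeout => 0.3,
                            grace => 0.3, notify => sub { push @told, $_[0] });
until ($dog->tick(time())) { sleep 0.05 }           # stand-in for the event loop timer
print "messages=", scalar(@told), " signal=", $dog->{status} & 127, "\n";
```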


🛑 MB310: radiodlcancel Now Means “Actually Cancelled”

The old cancellation path sent KILL, called:

waitpid($pid, WNOHANG);

once, then immediately announced success.

But WNOHANG may return zero while the child is still alive.

Mediabot could therefore:

  • announce cancellation too early;
  • clear runtime state before the child was reaped;
  • leave an old yt-dlp process lingering;
  • allow a new download to start beside the previous one.

Cancellation now has real states:

active
cancelling (cancel_phase=term)
cancelling (cancel_phase=kill)

Cleanup occurs only after confirmed child reaping.

Repeated cancel commands no longer spawn duplicate timer chains.

The process must leave the castle before the gates are declared closed. 🚪
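The cancellation states can be modelled as a small state machine. In this sketch the side effects (`send_signal`, `try_reap`, `schedule`, `say`) are hypothetical injected callbacks, so the logic runs without a real yt-dlp process. In the bot they would map onto kill(), waitpid($pid, WNOHANG), an event-loop timer and an IRC reply. A repeated command gets an answer but never arms a second timer, and the state is reset only after a confirmed reap.

```perl
use strict;
use warnings;

# Hypothetical cancellation state for one radio download; all side effects
# are injected callbacks so the logic can run without a real child process.
package RadioCancel {
    sub new {
        my ($class, %io) = @_;
        return bless { %io, state => 'active', cancel_phase => undef }, $class;
    }

    # Command handler: repeated calls never start a second timer chain.
    sub cancel {
        my ($self) = @_;
        return $self->{say}->('cancellation already in progress')
            if $self->{state} eq 'cancelling';
        @$self{qw(state cancel_phase)} = ('cancelling', 'term');
        $self->{send_signal}->('TERM');
        $self->{say}->('cancelling download...');
        $self->{schedule}->(sub { $self->check });
        return;
    }

    # Timer callback: runtime state is cleared only after a confirmed reap.
    sub check {
        my ($self) = @_;
        if ($self->{try_reap}->()) {
            @$self{qw(state cancel_phase)} = ('idle', undef);
            $self->{say}->('download cancelled');
            return;
        }
        if ($self->{cancel_phase} eq 'term') {
            $self->{cancel_phase} = 'kill';
            $self->{send_signal}->('KILL');
        }
        $self->{schedule}->(sub { $self->check });   # re-arm the same chain
        return;
    }
}

package main;
my (@signals, @said, @timers);
my $checks = 0;
my $job = RadioCancel->new(
    send_signal => sub { push @signals, $_[0] },
    try_reap    => sub { ++$checks >= 2 },            # pretend the child is gone on the 2nd check
    schedule    => sub { push @timers, $_[0] },
    say         => sub { push @said, $_[0] },
);
$job->cancel;
$job->cancel;                                         # the user repeats the command
my $pending = @timers;
while (my $cb = shift @timers) { $cb->() }            # stand-in for the event loop
print "pending=$pending signals=@signals final=$said[-1] state=$job->{state}\n";
```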


🌐 MB311 β€” resolve Finally Becomes Truly Asynchronous

Reverse DNS used:

gethostbyaddr(...)

directly inside the main Mediabot process.

A slow resolver could pause the entire bot.

Forward lookup already used a child process, but Mediabot still waited a fixed three seconds before reading the result.

Even a lookup completed in milliseconds had to perform its full dramatic entrance. 🎭

Forward and reverse lookup now share one asynchronous pipeline:

  • resolver work runs in a child;
  • the result pipe is consumed immediately;
  • three seconds is a real timeout, not a mandatory delay;
  • escalation from TERM to KILL is asynchronous;
  • child collection uses only WNOHANG;
  • output is bounded and addresses are deduplicated.

Fast DNS is now fast.

Slow DNS no longer drags the whole bot into the Forbidden Forest. 🌲
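Here is a self-contained sketch of such a pipeline. It assumes a hypothetical `start_lookup`/`poll_lookup` pair, a 4096-byte output cap and a one-second TERM-to-KILL grace period. The resolver is passed in as a coderef, so the example runs without network access; in the bot it would wrap the forward or reverse lookup. `poll_lookup` is meant to be called from an event-loop timer. It reads the pipe without blocking, returns as soon as the child is done, and signals the child only after the deadline.

```perl
use strict;
use warnings;
use IO::Handle;
use POSIX qw(:sys_wait_h);
use Time::HiRes qw(time sleep);

use constant MAX_BYTES => 4096;        # assumed cap on resolver output

# Start a lookup in a child; $resolve returns hostnames or addresses.
sub start_lookup {
    my ($resolve, $timeout, @args) = @_;
    pipe(my $r, my $w) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {                    # child: resolve, print one result per line
        close $r;
        print {$w} "$_\n" for $resolve->(@args);
        close $w;
        POSIX::_exit(0);                # skip END blocks inherited from the parent
    }
    close $w;
    $r->blocking(0);                    # the parent never blocks on this pipe
    return { pid => $pid, fh => $r, buf => '', signalled => '',
             deadline => time() + $timeout };
}

# Call from an event-loop timer; returns undef while the lookup is pending.
sub poll_lookup {
    my ($job) = @_;
    my $reaped = waitpid($job->{pid}, WNOHANG) != 0;   # pid or -1: nothing left to wait for

    # Read what is available; once the child is reaped this drains to EOF.
    while (1) {
        my $n = sysread($job->{fh}, my $chunk, 512);
        last unless $n;                                  # 0 = EOF, undef = EAGAIN or error
        $job->{buf} .= $chunk if length($job->{buf}) < MAX_BYTES;
    }

    if ($reaped) {
        close $job->{fh};
        return { ok => 0, error => 'timeout' } if $job->{signalled};
        my %seen;
        my @results = grep { length($_) && !$seen{$_}++ }
                      split /\n/, substr($job->{buf}, 0, MAX_BYTES);
        return { ok => 1, results => \@results };
    }

    # Past the deadline: TERM once, then KILL one second later (assumed grace).
    my $late = time() - $job->{deadline};
    if ($late >= 0 && !$job->{signalled}) {
        kill 'TERM', $job->{pid};
        $job->{signalled} = 'TERM';
    }
    elsif ($late >= 1 && $job->{signalled} eq 'TERM') {
        kill 'KILL', $job->{pid};
        $job->{signalled} = 'KILL';
    }
    return undef;
}

my $t0  = time();
my $job = start_lookup(sub { ('192.0.2.1', '192.0.2.1', '2001:db8::1') }, 3);
my $res;
until ($res = poll_lookup($job)) { sleep 0.01 }       # stand-in for the loop timer
print join(' ', @{ $res->{results} }), "\n";
```

A fast answer is returned as soon as the child exits. The three-second value only matters when the resolver actually hangs.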


🤖 MB312: OpenAI and Claude Stop Putting Mediabot to Sleep

Long AI answers are split into IRC-sized chunks.

Previously, each chunk was separated by a blocking usleep().

With four chunks and the default pacing delay, Mediabot could pause for roughly three seconds.

The new system uses an asynchronous queue per IRC target:

  1. first chunk immediately;
  2. later chunks through countdown timers;
  3. no useless delay after the final chunk;
  4. consecutive replies stay serialized;
  5. chunks from separate answers no longer interleave.

The existing settings remain unchanged:

openai.SLEEP_US
anthropic.SLEEP_US

The AI still speaks politely.

It simply stops putting the entire castle to bed between sentences. 😴
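Below is a sketch of a per-target queue that uses a simulated clock. `ReplyPacer` and its methods are hypothetical. The event loop is assumed to call `tick` from a short timer, and the demo instead calls it with fixed timestamps. It also assumes that the `SLEEP_US` settings are in microseconds, as their name and the former usleep() suggest.

```perl
use strict;
use warnings;

# Hypothetical per-target pacing queue. Nothing here sleeps: enqueue() sends
# the first chunk at once, and tick() releases later chunks when they are due.
package ReplyPacer {
    sub new {
        my ($class, %arg) = @_;
        # delay is in seconds; assuming SLEEP_US is microseconds,
        # the caller would pass SLEEP_US / 1_000_000.
        return bless { delay => $arg{delay}, send => $arg{send}, queues => {} }, $class;
    }

    # Queue one whole answer; answers to the same target never interleave.
    sub enqueue {
        my ($self, $now, $target, @chunks) = @_;
        return unless @chunks;
        my $q = $self->{queues}{$target} //= { chunks => [], next_at => $now };
        push @{ $q->{chunks} }, @chunks;
        $self->_pump($target, $now);
        return;
    }

    sub tick {
        my ($self, $now) = @_;
        $self->_pump($_, $now) for sort keys %{ $self->{queues} };
        return;
    }

    # Send at most one chunk for $target if it is due.
    sub _pump {
        my ($self, $target, $now) = @_;
        my $q = $self->{queues}{$target} or return;
        return if $now < $q->{next_at};
        $self->{send}->($target, shift @{ $q->{chunks} });
        if (@{ $q->{chunks} }) { $q->{next_at} = $now + $self->{delay} }
        else                   { delete $self->{queues}{$target} }   # no delay after the last chunk
        return;
    }
}

package main;
my @sent;
my $pacer = ReplyPacer->new(delay => 1, send => sub { push @sent, $_[1] });
$pacer->enqueue(0, '#radio', 'A1', 'A2', 'A3');   # A1 goes out immediately
$pacer->enqueue(0, '#radio', 'B1');               # the second answer waits its turn
$pacer->tick($_) for 0.5, 1, 2, 3;                # simulated timer ticks
print "@sent\n";
```

The second answer (`B1`) is released only after the last chunk of the first one. Once a queue drains it is deleted, so the next answer to that target starts immediately.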


🦉 MB313: Partyline Reverse DNS Moves Away from the Front Door

Every telnet or DCC Partyline connection performed reverse DNS in the main process.

A slow lookup could delay the entire bot before the user had even entered.

The new behavior:

  • records the peer IP immediately;
  • resolves the hostname in a child process;
  • reads the result asynchronously;
  • enforces a bounded timeout;
  • prevents a stale DNS callback from modifying a later session that reused the same file descriptor;
  • falls back cleanly to the IP address.

The DNS owl may arrive later.

The door remains open. 🦉
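One common way to implement the stale-callback guard is a per-session serial number that the DNS callback captures. The sketch below assumes a hypothetical session table keyed by file descriptor and is not Mediabot's actual Partyline code. The hostname starts out as the peer IP, so a failed or empty lookup leaves the IP in place.

```perl
use strict;
use warnings;

# Hypothetical Partyline session table keyed by file descriptor. Every new
# session gets a fresh serial, so a late DNS answer for a closed session
# cannot modify a newer session that reused the same fd.
my %session_by_fd;
my $next_serial = 1;

sub open_session {
    my ($fd, $peer_ip) = @_;
    my $s = { fd => $fd, serial => $next_serial++, ip => $peer_ip, host => $peer_ip };
    $session_by_fd{$fd} = $s;             # the IP is usable immediately
    return $s;
}

# Build the callback the asynchronous resolver will invoke later.
sub hostname_callback {
    my ($s) = @_;
    my ($fd, $serial) = @{$s}{qw(fd serial)};
    return sub {
        my ($hostname) = @_;
        my $current = $session_by_fd{$fd};
        return 0 unless $current && $current->{serial} == $serial;   # stale: ignore
        $current->{host} = $hostname if defined $hostname && length $hostname;
        return 1;
    };
}

my $old      = open_session(7, '198.51.100.4');
my $late_dns = hostname_callback($old);
delete $session_by_fd{7};                  # the first user disconnects
my $new      = open_session(7, '203.0.113.9');   # fd 7 is reused
$late_dns->('old-user.example.net');       # stale result, ignored
print "$new->{host}\n";
```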


🗺️ MB314: whereis Finally Knows Where to Send the Reply

The whereis command could return undef when:

  • the API returned malformed JSON;
  • the country field was missing;
  • DNS failed;
  • the input was invalid.

The visible result could become:

Country :

Private requests had another flaw: the reply was always sent to a channel, even when no channel existed.

The corrected command now guarantees:

  • a defined printable result;
  • N/A for controlled failure cases;
  • proper IPv4 validation;
  • a stricter Undernet hidden-host pattern;
  • replies to the channel in public;
  • replies to the requesting nick in private.

No more empty owl.

No more answer thrown into a wall. 📜
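Three of those guarantees can be sketched as small helpers. The following are assumptions for illustration: the `country` JSON field, the helper names, the exact IPv4 rule (four decimal octets from 0 to 255, no leading zeros) and the rule that a public command carries a `#` or `&` channel. The Undernet hidden-host pattern is not reproduced because the post does not give its exact form. JSON::PP has shipped with Perl since 5.14.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Strict dotted-quad check: four ASCII decimal octets, 0-255, no leading zeros.
sub is_valid_ipv4 {
    my ($s) = @_;
    return 0 unless defined $s;
    my @octets = $s =~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/a or return 0;
    for my $o (@octets) { return 0 if $o > 255 || $o =~ /\A0\d/ }
    return 1;
}

# Always return printable text: malformed JSON or a missing field gives N/A.
sub country_from_json {
    my ($body) = @_;
    my $data = eval { decode_json($body // '') };
    return 'N/A' unless ref($data) eq 'HASH';
    my $country = $data->{country};
    return (defined $country && !ref $country && $country =~ /\S/) ? $country : 'N/A';
}

# Public command: answer in the channel. Private command: answer the nick.
sub reply_target {
    my ($channel, $nick) = @_;
    return (defined $channel && $channel =~ /\A[#&]/) ? $channel : $nick;
}

for my $ip ('192.0.2.10', '256.1.1.1', '01.2.3.4') {
    printf "%-10s %s\n", $ip, is_valid_ipv4($ip) ? 'valid' : 'invalid';
}
print country_from_json('{"country":"France"}'), ' / ', country_from_json('{oops'), "\n";
print reply_target(undef, 'alice'), "\n";
```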


⚙️ The Real Theme: Stop Blocking the Main Event Loop

This entire series follows the same engineering principle.

Before:

sleep
usleep
waitpid(..., 0)
fixed waiting
state cleared too early
signal interpreted as success
duplicate timers
silent failure

After:

asynchronous timers
bounded deadlines
waitpid(..., WNOHANG)
explicit runtime state
idempotent cleanup
single final reply
accurate failure reporting

This pass is not spectacular because a new button appeared.

It is spectacular because the old buttons are now much harder to break. 🛡️


🧪 218 Targeted Assertions Later…

Each correction received focused regression coverage.

Across MB304 to MB314, 218 targeted assertions exercised:

  • log severity preservation;
  • weighted polls and ties;
  • Unix signal decoding;
  • timeout propagation;
  • Chromium child cleanup;
  • radio cancellation;
  • Partyline safeguards;
  • forward and reverse DNS;
  • OpenAI and Claude pacing;
  • private reply routing;
  • malformed remote data.

The final full project suite remains the last gate before commit and push.

The pre-commit guard already proved useful in real conditions: it detected a local SQL artifact and blocked the commit before a schema-related file could be added accidentally. 🧯

Even the anti-mistake wards got their own practical exam.


🗄️ Database Impact

Still none.

0 new tables
0 modified columns
0 migrations
0 data conversions
0 SQL files included in the commit

The database watched the entire adventure from a comfortable armchair and was never disturbed. ☕


🏁 Final Result

Mediabot v3 is now:

  • more responsive;
  • more honest about failures;
  • safer with child processes;
  • more robust around external services;
  • cleaner with timeouts;
  • more consistent in IRC replies;
  • much harder to freeze.

It did not receive a new broom.

It received something better:

stronger wards, better brakes, and far fewer ghosts in the process table. 👻✨
