Forum teuk.org

🪄 Lumos Maxima! Netsplits Tamed, Guards Hardened (May 2026)

in Mediabot · started by TeuK · 1mo ago

TeuK · 1mo ago

Overview

The IRC wilderness is hostile. Netsplits crash in like Dementors, draining the bot’s strength. Antiflood guards misfire on edge cases. Cursors leak into the void. This release draws its wand and answers: five netsplit fixes, three bug corrections, three hardening improvements, and every guard double-checked.

“Lumos Maxima! The darkness of netsplits retreats. The bot stands firm.” 🏰⚡🌟


🌩️ NS1 + NS5: Netsplit QUIT Storm Suppression

During a netsplit, the server sends dozens of QUIT messages whose reason follows the pattern "server1.net server2.net". Each one previously triggered:

  • logBotAction → DB write
  • updateUserSeen → DB write
  • Q2 Claude history purge → memory wipe
  • auth->logout → session clear

On a 100-nick channel that meant 200+ synchronous DB queries in seconds, and valid Claude conversation histories were silently destroyed for users who hadn’t actually left.

# NS1: detect netsplit QUIT
my $is_netsplit = ($text =~ /^\S+\.\S+\s+\S+\.\S+$/);
if ($is_netsplit) {
    # Skip DB ops entirely; just remove from in-memory nicklist
    $mediabot->channelNicksRemove($sChannel, $sNick);
    return;
}

NS5: Claude history is now only purged on genuine QUITs, so users who survive the split keep their conversation context intact.

A mediabot_netsplit_quits_total Prometheus counter tracks how many netsplit QUITs are absorbed per session.
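
For anyone who wants to try the filter outside the bot, here is a minimal standalone sketch of the NS1 check. The helper name is_netsplit_quit and the sample QUIT reasons are made up for illustration; only the regex comes from the snippet above, and $absorbed merely stands in for the Prometheus counter.

```perl
use strict;
use warnings;

# Hypothetical helper: true when a QUIT reason looks like a netsplit,
# i.e. exactly two dotted server names separated by whitespace.
sub is_netsplit_quit {
    my ($text) = @_;
    return 0 unless defined $text;
    return $text =~ /^\S+\.\S+\s+\S+\.\S+$/ ? 1 : 0;
}

# Illustrative QUIT reasons (not taken from real logs)
my @reasons = (
    'irc.example.net hub.example.net',
    'Quit: bye',
    'Ping timeout: 240 seconds',
);

my $absorbed = 0;    # stand-in for mediabot_netsplit_quits_total
for my $reason (@reasons) {
    if (is_netsplit_quit($reason)) {
        $absorbed++;
        next;        # netsplit: skip DB writes, history purge and logout
    }
    # genuine QUIT: the full cleanup path would run here
}
print "netsplit QUITs absorbed: $absorbed\n";
```

With these three sample reasons, only the first one matches, so the script reports one absorbed QUIT.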


🔄 NS2: Antiflood State Cleared on Reconnect

reconnect() now explicitly wipes all in-memory flood state before rebuilding the IRC object:

$mediabot->{_af}           = {};  # AF1: output flood state
$mediabot->{_af_params}    = {};  # AF1: params cache
$mediabot->{_chan_flood}   = {};  # AF4: input flood state
$mediabot->{_nick_flood}   = {};  # AF3: per-nick sliding window
$mediabot->{_nick_mute}    = {};  # CC3: auto-mutes
$mediabot->{_cmd_cooldown} = {};  # CC1: command cooldowns

Without this, a channel silenced by AF4 during the split would remain silenced indefinitely after reconnect: the bot would ignore all commands on that channel until a manual .floodstatus inspection and restart.
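
The same reset can be sketched as a small helper on a plain hash, which makes it easy to check that unrelated fields are left alone. The helper name reset_flood_state and the silenced_until field are hypothetical; only the six state keys come from the excerpt above.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every flood-state hash with a fresh empty one.
sub reset_flood_state {
    my ($bot) = @_;
    # Keys taken from the reconnect() excerpt
    my @keys = qw(_af _af_params _chan_flood _nick_flood _nick_mute _cmd_cooldown);
    $bot->{$_} = {} for @keys;
    return $bot;
}

# Stand-in for the bot object: one channel silenced before the reconnect.
# 'silenced_until' is an assumed field name, for illustration only.
my $bot = {
    _chan_flood => { '#quebec' => { silenced_until => time() + 30 } },
    nick        => 'mediabot',
};

reset_flood_state($bot);
printf "silenced channels after reset: %d\n", scalar keys %{ $bot->{_chan_flood} };
```

Assigning fresh hashes rather than deleting the keys means later code can keep dereferencing them without an existence check.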


⏱️ NS3: Throttled JOINs After Reconnect

The old joinChannels() sent all JOINs in a tight synchronous loop. On large channel lists after a split, this could trigger server-side flood protection and get the bot disconnected immediately after reconnecting.

The new implementation staggers each JOIN 1.5 seconds apart using IO::Async::Timer::Countdown:

JOIN #boulets       t=0s
JOIN #testchan      t=1.5s
JOIN #quebec        t=3.0s
JOIN #epiknet       t=4.5s

The Partyline and DB listener remain alive throughout; only the IRC joins are deferred.
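
The stagger itself is just arithmetic on the channel index. A rough sketch of how such a schedule could be computed, with join_schedule being a hypothetical helper and not the actual joinChannels() code:

```perl
use strict;
use warnings;

# Hypothetical helper: pair each channel with the delay (in seconds) at which
# its JOIN should be sent, spacing consecutive JOINs $gap seconds apart.
sub join_schedule {
    my ($channels, $gap) = @_;
    $gap //= 1.5;
    my $i = 0;
    return map { [ $_, $gap * $i++ ] } @$channels;
}

my @channels = ('#boulets', '#testchan', '#quebec', '#epiknet');
for my $slot (join_schedule(\@channels)) {
    my ($chan, $delay) = @$slot;
    # In the bot, each delay would presumably drive one
    # IO::Async::Timer::Countdown whose on_expire sends the JOIN.
    printf "JOIN %-12s t=%.1fs\n", $chan, $delay;
}
```

Computing the delays up front keeps the timing logic testable without an event loop or a live IRC connection.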


👥 NS4: WHO After Every JOIN to Resync Nicklist

During a netsplit, nicks on the remote side of the split can disappear without the bot seeing a QUIT for them. The local nicklist accumulates ghosts until the next periodic refresh (default: 300s).

After each throttled JOIN, a WHO #channel is now automatically sent 3 seconds later:

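# NS4: schedule WHO after join to sync nicklist
my $who_t = IO::Async::Timer::Countdown->new(
    delay     => 3,
    on_expire => sub {
        # Skip silently if the connection dropped while the timer was pending
        eval { $irc->send_message('WHO', undef, $chan_name) }
            if $irc && $irc->is_connected;
    },
);
# A Countdown only fires once started and added to a loop;
# $loop is assumed here to be the bot's IO::Async::Loop.
$loop->add($who_t->start);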

This ensures the nicklist is accurate within seconds of rejoining, not minutes.


🖥️ Partyline .netsplit: Live Diagnostics

.netsplit
--- Netsplit state ---
  Netsplit QUITs since last reconnect: 47
  AF1 channels in state: 3
  AF4 channels in state: 0
--- Channel nicklist status ---
  #boulets      18 nicks
  #quebec       34 nicks
  #testchan     5 nicks

A single command to assess the damage after a split.


🐛 B-68-1: mp3_ctx Cursor Leak

mp3_ctx (search sub-command) uses two statement handles: $sth_count for counting results and $sth for fetching the first match. On the happy path (match found, results displayed, function returns), $sth was never closed:

# B-68-1/fix: ensure sth_first cursor is closed
$sth->finish if $sth;
return;

🎮 B-68-2: !triviastop Left Scores Behind

!triviastop (Master) stopped the active game but did not clear the scores hash or reset hint_given. Immediately starting a new game with !trivia start N would:

  • inherit the previous game’s scores (wrong leaderboard from round 1)
  • never show the mid-game hint (since hint_given was still 1)

# B-68-2/fix
delete $trivia->{scores};
$trivia->{hint_given} = 0;
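
To see the effect in isolation, here is a small sketch on a plain hash. The helper name stop_trivia, the active flag and the sample scores are assumptions for illustration; only the scores and hint_given fields appear in the fix above.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the fix: stopping a game also drops
# its scores and re-arms the mid-game hint.
sub stop_trivia {
    my ($trivia) = @_;
    $trivia->{active} = 0;       # assumed flag, not from the original code
    delete $trivia->{scores};    # B-68-2: no leaderboard carried over
    $trivia->{hint_given} = 0;   # B-68-2: next game can show its hint again
    return $trivia;
}

# State left behind by a game that was stopped mid-round (sample data)
my $trivia = {
    active     => 1,
    scores     => { alice => 3, bob => 1 },
    hint_given => 1,
};

stop_trivia($trivia);
printf "scores kept: %s, hint_given: %d\n",
    (exists $trivia->{scores} ? 'yes' : 'no'), $trivia->{hint_given};
```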

🔒 B-68-3: .floodset window=0 Accepted Without Clamp

.floodset #chan 0 4 30 stored window=0 in _chan_flood_conf. Because Perl’s // operator tests for definedness, not truthiness, the value 0 was not replaced by the default. It propagated into checkChanFlood, where a zero-width sliding window made the grep always return an empty list.

# B-68-3/fix: clamp CC2 overrides to sane minimums
my $window = do {
    my $v = $conf_ov->{window};
    defined($v) ? (int($v) >= 1 ? int($v) : 1)
                : _af_conf_int($self, 'CHANFLOOD_WINDOW', 10, 1, 3600)
};

Same clamp applied to max_cmds and silence. The Partyline .floodset command now also warns when values are below 1 and reports the effective clamped values.
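
A short standalone sketch of why 0 slipped through and what the clamp changes. The three helper names are hypothetical, and the comparison inside recent_hits is only an assumed shape for the sliding-window check in checkChanFlood.

```perl
use strict;
use warnings;

# Pre-fix behaviour: // only falls back on undef, so 0 passes through.
sub window_unclamped {
    my ($conf_ov, $default) = @_;
    return $conf_ov->{window} // $default;
}

# Post-fix behaviour, mirroring the clamp above: defined values are forced to >= 1.
sub window_clamped {
    my ($conf_ov, $default) = @_;
    my $v = $conf_ov->{window};
    return $default unless defined $v;
    return int($v) >= 1 ? int($v) : 1;
}

# Assumed shape of the sliding-window check: keep timestamps newer than $window.
sub recent_hits {
    my ($now, $window, @stamps) = @_;
    return grep { $now - $_ < $window } @stamps;
}

my $now    = 1000;
my @stamps = (998, 999, 1000);
for my $w (window_unclamped({ window => 0 }, 10), window_clamped({ window => 0 }, 10)) {
    my $hits = () = recent_hits($now, $w, @stamps);
    print "window=$w -> $hits recent hits\n";
}
```

With window=0 no timestamp can ever be strictly newer than the window, so the count stays at zero and the flood threshold is never reached; with the clamped value of 1 the most recent command is counted again.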


📈 A-68-3: checkCmdCooldown Prometheus Counter

mediabot_cmdcooldown_blocks_total is now incremented at the moment the cooldown is enforced. Previously it was declared in Metrics.pm but never incremented.


📦 Files Changed

File             Changes
mediabot.pl      NS1–NS5: netsplit QUIT filter, reconnect state reset, throttled JOINs, WHO resync
Helpers.pm       B-68-1: mp3_ctx cursor fix; B-68-3: checkChanFlood window clamp; A-68-3: cmdcooldown metric
UserCommands.pm  B-68-2: triviastop scores/hint reset
Partyline.pm     .netsplit command; A-68-1: .floodset clamp+warn; A-68-2: .cmdcooldown range comment
Metrics.pm       mediabot_netsplit_quits_total, mediabot_netsplit_rejoins_total

TeuK · 1mo ago

Small Lumos Maxima erratum: the live suite was already green, but the static guards exposed a few integration leftovers. This patch aligns the internal help table, removes duplicate dispatch/export noise, verifies the real throttled JOIN + WHO netsplit behavior, and keeps the full static + live test suite green.
