This final debugging pass did not uncover a collapsing staircase or a dragon beneath the database.
It found four much smaller things, precisely the kind that survive for years in mature software.
Claude completed MB373 through MB375. The final review added MB376 by following the complete AI output path one step beyond the new wrapper.
The standing rule remained untouched:
Improve the bot without changing its database schema, and make every number describe what actually reaches the wire.
The Hailo mention path used the current IRC nickname directly inside a regular expression:
$what =~ /$sCurrentNick/i
That is harmless for a simple alphanumeric nick.
IRC nicknames, however, may legally contain special characters, most of which are regex metacharacters:
[ ] \ ` ^ { } |
A nickname such as bot|x could match an unrelated x.
A nickname containing an unmatched [ could produce an invalid expression and break the message callback.
MB373 quotes the nickname literally in both places where it is used:
\Q$sCurrentNick\E
The removal step is now case-insensitive as well, matching the detection behaviour.
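The effect of the quoting can be sketched in a few lines of Perl. This is an illustration only: mentions_nick() and strip_nick() are hypothetical names invented for the example, not the functions in Mediabot's Hailo path.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration; not Mediabot's actual code.

# Detect a mention. \Q...\E quotes every regex metacharacter in the nick,
# so "bot|x" is matched as the literal five characters b-o-t-|-x.
sub mentions_nick {
    my ($text, $nick) = @_;
    return $text =~ /\Q$nick\E/i ? 1 : 0;
}

# Remove the mention, case-insensitively, with the same literal quoting.
sub strip_nick {
    my ($text, $nick) = @_;
    (my $out = $text) =~ s/\Q$nick\E//gi;
    return $out;
}

print mentions_nick('an unrelated x', 'bot|x'), "\n";  # 0 (unquoted, /bot|x/i would match the x)
print mentions_nick('hello BOT|X',    'bot|x'), "\n";  # 1
print strip_nick('Bot[: hi', 'bot['), "\n";            # ": hi" (unquoted, /bot[/ fails to compile)
```

Perl's built-in quotemeta() would do the same job when the pattern is assembled in a variable first.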
A wizard’s name may contain unusual symbols. It should not become an accidental incantation merely because Perl sees a vertical bar.
IRC limits are byte limits.
The old _chatgpt_wrap() counted characters.
For plain ASCII, those numbers usually agree. For French accents and emojis, they do not.
Three hundred é characters are six hundred bytes in UTF-8. A chunk described as “400 characters” could therefore be far too large for the intended IRC payload.
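The arithmetic is easy to check. The snippet below is a standalone demonstration, assuming only the core Encode module; it is not taken from the bot.

```perl
use strict;
use warnings;
use Encode qw(encode);

# 300 copies of U+00E9 (é), written as an escape so the source stays ASCII.
my $text  = "\x{e9}" x 300;

my $chars = length($text);                   # counts characters
my $bytes = length(encode('UTF-8', $text));  # counts what reaches the wire

print "$chars characters, $bytes bytes\n";   # prints: 300 characters, 600 bytes
```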
The lower-level botPrivmsg() already had a proven byte-safe splitter, so MB374 removed the duplicate character-based implementation and delegated to:
Mediabot::Helpers::_split_text_for_irc()
This improved more than transport safety.
MAX_PRIVMSG is supposed to limit how many lines the AI sends. When oversized chunks were split again downstream, the configured count no longer matched the number of messages actually placed on IRC.
After MB374, the initial AI chunks are prepared according to the same byte rules used by the final sender.
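A byte-aware splitter can be sketched as follows. This is a deliberately naive illustration under stated assumptions: split_by_bytes() is a hypothetical name, and the real Mediabot::Helpers::_split_text_for_irc() may well differ, for example by preferring word boundaries. The point is only that the budget is spent in encoded bytes and a chunk never ends inside a multi-byte character.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sketch; not the real _split_text_for_irc().
# Splits $text into chunks whose UTF-8 encoding is at most $max_bytes each.
sub split_by_bytes {
    my ($text, $max_bytes) = @_;
    my @chunks;
    my ($chunk, $used) = ('', 0);

    for my $char (split //, $text) {
        my $cost = length(encode('UTF-8', $char));   # 1 to 4 bytes per character

        # Close the current chunk before it would exceed the byte budget.
        if ($used + $cost > $max_bytes && length $chunk) {
            push @chunks, $chunk;
            ($chunk, $used) = ('', 0);
        }
        $chunk .= $char;
        $used  += $cost;
    }
    push @chunks, $chunk if length $chunk;
    return @chunks;
}

my @parts = split_by_bytes("\x{e9}" x 300, 400);
print scalar(@parts), " chunks\n";   # prints: 2 chunks
```

With a 400-byte budget, the 300 accented characters become one chunk of 200 characters (400 bytes) and one of 100 characters (200 bytes), so the number of prepared chunks already equals the number of lines a byte-safe sender would emit.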
The owl is weighed with its parcel, not by counting the feathers on the label.
Prometheus exposition includes metadata lines such as:
# HELP metric_name description
Metric label values were already escaped correctly.
The HELP text was not.
A future description containing a backslash or an embedded newline could produce malformed exposition. Because Prometheus parses the complete document, one broken HELP line could make the whole scrape fail.
MB375 added:
_escape_help_text()
It escapes backslashes first and newlines second, following the exposition format.
The # TYPE line also receives the defensive fallback:
untyped
when a type is absent.
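The two protections described above can be sketched like this. The function names and the metric name are hypothetical stand-ins chosen for the example; only the escaping order (backslash first, newline second) and the untyped fallback come from the description.

```perl
use strict;
use warnings;

# Hypothetical sketch of HELP escaping; the real _escape_help_text() may differ.
sub escape_help_text {
    my ($help) = @_;
    $help = '' unless defined $help;
    $help =~ s/\\/\\\\/g;   # backslash first: \  becomes \\
    $help =~ s/\n/\\n/g;    # newline second: a line feed becomes the two characters \n
    return $help;
}

# Build the two metadata lines, falling back to "untyped" when no type is given.
sub metadata_lines {
    my ($name, $help, $type) = @_;
    $type = 'untyped' unless defined $type && length $type;
    return (
        "# HELP $name " . escape_help_text($help),
        "# TYPE $name $type",
    );
}

# mediabot_example_total is an invented metric name.
print "$_\n" for metadata_lines('mediabot_example_total', "Path C:\\temp\nsecond line");
```

The order matters: escaping newlines first would introduce backslashes that the second pass would then double, turning each newline into a literal backslash followed by n after parsing.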
Current constant help strings remain visually unchanged. The protection is for the next contributor who innocently adds a Windows path, a regular expression or a multiline explanation.
The watchtower no longer goes dark because one annotation arrived with an adventurous backslash.
MB374 fixed the main wrapper, but one old piece of character arithmetic remained.
When an answer exceeded MAX_PRIVMSG, ChatGPT and Claude added a suffix to the final permitted chunk.
The historical code calculated the available space like this:
my $allow = $wrap_bytes - length($suffix);
It then used character-based substr() before appending the suffix.
The ChatGPT suffix itself contains UTF-8 characters:
[¯\_(ツ)_/¯ guess you can’t have everything…]
That meant the line could be byte-safe before the suffix, exceed the budget afterwards, and be split again by botPrivmsg().
The old mismatch returned through the back door:
MAX_PRIVMSG configured : 4
chunks prepared : 4
lines actually sent : potentially more than 4
MB376 introduced:
_irc_wire_bytes()
_irc_prefix_for_budget()
_fit_truncation_suffix()
The final helper calculates the real UTF-8 cost of the suffix first.
Only the remaining byte allowance is offered to the text prefix.
The final contract is simple:
bytes(prefix + suffix) <= WRAP_BYTES
Both OpenAI and Anthropic now use the same helper.
The visible suffixes remain unchanged. Only their accounting has become honest.
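A minimal sketch of that contract follows, assuming a suffix that is itself smaller than the budget. The names wire_bytes() and fit_truncation_suffix() are simplified stand-ins written for this example; they are not the bodies of _irc_wire_bytes() or _fit_truncation_suffix().

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical stand-in: UTF-8 byte cost of a string.
sub wire_bytes {
    my ($s) = @_;
    return length(encode('UTF-8', $s));
}

# Hypothetical stand-in: charge the suffix first, then give the text
# prefix only the bytes that remain.
sub fit_truncation_suffix {
    my ($text, $suffix, $wrap_bytes) = @_;
    my $allow = $wrap_bytes - wire_bytes($suffix);
    $allow = 0 if $allow < 0;

    my ($prefix, $used) = ('', 0);
    for my $char (split //, $text) {
        my $cost = wire_bytes($char);
        last if $used + $cost > $allow;   # never cut inside a character
        $prefix .= $char;
        $used   += $cost;
    }
    return $prefix . $suffix;
}

my $suffix = " [\x{2026}]";               # 4 characters, 6 bytes
my $line   = fit_truncation_suffix("\x{e9}" x 300, $suffix, 400);
print wire_bytes($line), " bytes\n";      # prints: 400 bytes
```

For comparison, the historical arithmetic on the same input would allow 400 - length($suffix) = 396 characters of prefix, which is 792 bytes before the suffix is even appended.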
The parchment is no longer measured, cut, and then given an unmeasured decorative border.
Claude’s three rounds arrived with complete green suites:
MB373 new test : 13/13
MB373 full suite : 7935/7935
MB374 new test : 13/13
MB374 ChatGPT tests : 26/26
MB374 full suite : 7948/7948
MB375 new test : 11/11
MB375 full suite : 7959/7959
The final review added:
MB376 new test : 17/17
AI/Prometheus regression : 67/67
MB373–MB376 regression : 54/54
MB361–MB376 regression : 405/405
The MB376 installer was tested on two clean copies of the MB375 snapshot and in an already-applied state.
These test groups overlap and should not be added into an artificial grand total.
None.
0 new tables
0 altered columns
0 migrations
0 schema changes
Mediabot now:
MAX_PRIVMSG aligned with the number of intended IRC messages.
The interesting part of this pass is not that the bugs were dramatic.
They were not.
They were boundary disagreements:
name versus regex
character versus byte
description versus exposition syntax
chunk budget versus final decorated line
Each component looked reasonable alone.
The defect appeared where one component handed its result to the next under a slightly different definition.
That is where the final hunt ended: not with a grand rewrite, but with four definitions made consistent.
The nickname stays a nickname.
The byte count reaches the wire unchanged.
The Prometheus scroll remains readable.
And the final suffix fits on the parchment it decorates.
— Teuk