🧭🪄 Mediabot v3: Seven Wards for Metrics, Liquidsoap, Partyline and the Living Configuration Spellbook

in Mediabot · started by TeuK · 11h ago

The latest Mediabot v3 expedition did not introduce a new database table, redesign the castle or replace machinery that was already working.

Instead, it followed a trail of smaller defects through some of the bot’s less visible corridors: an authentication gauge that had never been connected, a Liquidsoap command channel that trusted line breaks, an HTTP endpoint willing to read an endless header, Partyline sockets that could expose internal exceptions or accept unbounded input, and two configuration reload commands invoking spells that did not exist.

The work covers MB362 through MB368 and follows the same rule as the preceding rounds:

Improve the existing bot, preserve compatibility, keep failures contained, and do not change the database schema.

🧭 MB362 — The Auth hourglass is finally connected to Metrics

Mediabot::Auth already knew how to update:

mediabot_auth_sessions_total

The implementation looked complete, but the gauge never moved.

Auth was created before the Metrics object during startup, and the finished Metrics instance was never attached afterwards. The update calls therefore existed, but they had no object to talk to.

MB362 added a safe late-binding path:

$auth->set_metrics($metrics);

The connection is established immediately after Metrics becomes available.

The current session count is synchronised at once, including an explicit initial value of zero. Login, autologin, logout and session cleanup can now update a gauge that is genuinely alive.

The startup order remains unchanged, and invalid Metrics objects are rejected safely.
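The late-binding idea can be sketched as follows. This is a minimal illustration, not the Mediabot source: the packages Sketch::Metrics and Sketch::Auth, the set()/get() methods and the _publish_sessions helper are all hypothetical names; only set_metrics and the gauge name come from the post.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real Metrics object: it only stores gauge values.
package Sketch::Metrics;
sub new { my ($class) = @_; return bless { gauges => {} }, $class; }
sub set { my ($self, $name, $value) = @_; $self->{gauges}{$name} = $value; return; }
sub get { my ($self, $name) = @_; return $self->{gauges}{$name}; }

# Hypothetical stand-in for Mediabot::Auth, which is created before Metrics exists.
package Sketch::Auth;
use Scalar::Util qw(blessed);

sub new { my ($class) = @_; return bless { sessions => {}, metrics => undef }, $class; }

# Late binding: attach Metrics after construction and publish the current count.
sub set_metrics {
    my ($self, $metrics) = @_;
    # Reject anything that is not an object exposing set(); existing state is kept.
    return 0 unless blessed($metrics) && $metrics->can('set');
    $self->{metrics} = $metrics;
    $self->_publish_sessions;    # explicit initial value, even when it is zero
    return 1;
}

sub _publish_sessions {
    my ($self) = @_;
    my $metrics = $self->{metrics} or return;    # no-op until Metrics is bound
    $metrics->set('mediabot_auth_sessions_total', scalar keys %{ $self->{sessions} });
    return;
}

sub login  { my ($self, $nick) = @_; $self->{sessions}{$nick} = 1;    $self->_publish_sessions; return; }
sub logout { my ($self, $nick) = @_; delete $self->{sessions}{$nick}; $self->_publish_sessions; return; }

package main;

my $auth    = Sketch::Auth->new;       # Auth is built first, as at startup
my $metrics = Sketch::Metrics->new;    # Metrics becomes available later
$auth->set_metrics($metrics);          # gauge is published as 0
$auth->login('alice');                 # gauge moves to 1
```

Because every update goes through one helper that tolerates a missing Metrics object, the startup order does not have to change.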

The instrument was already hanging on the wall. MB362 finally attached the hands to the clockwork.

🐍 MB363 — Liquidsoap commands can no longer hide a second incantation after a line break

The Liquidsoap Telnet client built its payload directly from the supplied command:

my $payload = $command . "\nquit\n";

A carriage return, line feed or NUL character inside a queue identifier or MP3 path could therefore split the intended request and smuggle another command into the same Telnet session.

MB363 added one central guard before any network connection is opened:

CR, LF and NUL are not allowed

Normal commands remain untouched:

push
queue
skip
flush_and_skip

Unsafe input is rejected locally and never reaches Liquidsoap.
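A guard of this kind might look like the sketch below. The function name build_liquidsoap_payload and the example command strings are assumptions made for illustration; the only parts taken from the post are the three forbidden characters and the "\nquit\n" suffix.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the Telnet payload for a single-line command,
# or nothing at all when the command is unsafe, so the caller never connects.
sub build_liquidsoap_payload {
    my ($command) = @_;
    return unless defined $command && length $command;
    # CR, LF and NUL would let one request carry a second command.
    return if $command =~ /[\r\n\0]/;
    return $command . "\nquit\n";
}

# Illustrative inputs only; real queue identifiers and paths will differ.
my $ok  = build_liquidsoap_payload('queue.push /music/track.mp3');
my $bad = build_liquidsoap_payload("queue.push /music/a.mp3\nqueue.skip");

print defined $ok  ? "accepted\n" : "rejected\n";
print defined $bad ? "accepted\n" : "rejected\n";
```

Rejecting instead of stripping the characters keeps the behaviour predictable: a filename is either sent exactly as given or not sent at all.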

No Basilisk is allowed to crawl into the control socket by disguising itself as the second line of a filename.

🏰 MB364 — The Metrics gate no longer accepts an endless HTTP scroll

The embedded Metrics HTTP server accumulated request headers until it found the terminating blank line.

There was no maximum size.

A slow, broken or hostile client could therefore keep the connection open and make the request buffer grow without bound inside the main Mediabot process.

MB364 introduced a strict limit:

16 KiB

Requests above that limit now receive:

431 Request Header Fields Too Large

The rejection happens before Prometheus rendering and before the radio status provider is called.

The same round also ensured that:

only one response is generated per connection;
the request buffer is released after processing;
Content-Length for UTF-8 bodies is still measured in bytes, not characters;
/metrics and /api/radio/status keep their existing successful behaviour.
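The bounded header read can be sketched like this. The helper names feed_request_bytes and too_large_response are hypothetical, and counting the terminating blank line towards the 16 KiB limit is an assumption; the post only states the limit and the 431 status.

```perl
use strict;
use warnings;

use constant MAX_HEADER_BYTES => 16 * 1024;    # 16 KiB, the limit named above

# Hypothetical helper: append raw socket bytes to $$buffer_ref and report the
# state: 'complete' once the blank line arrives, 'too_large' when the limit is
# exceeded first, 'pending' while more data is needed.
sub feed_request_bytes {
    my ($buffer_ref, $chunk) = @_;
    $$buffer_ref .= $chunk;
    my $end = index($$buffer_ref, "\r\n\r\n");
    # Assumption: the terminating blank line counts towards the limit.
    my $header_bytes = $end >= 0 ? $end + 4 : length($$buffer_ref);
    if ($header_bytes > MAX_HEADER_BYTES) {
        $$buffer_ref = '';                     # release the oversized buffer
        return 'too_large';
    }
    return $end >= 0 ? 'complete' : 'pending';
}

# Minimal 431 response. The body is plain ASCII, so length() equals its byte count.
sub too_large_response {
    my $body = "Request Header Fields Too Large\n";
    return "HTTP/1.1 431 Request Header Fields Too Large\r\n"
         . "Content-Type: text/plain\r\n"
         . "Content-Length: " . length($body) . "\r\n"
         . "Connection: close\r\n\r\n"
         . $body;
}
```

The check runs on every chunk, so an oversized request is refused as soon as it crosses the limit and before any rendering work starts.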

The castle gate still welcomes legitimate owls. It simply refuses a parchment long enough to wrap the Astronomy Tower.

🕯️ MB365 — Raw Partyline exceptions stay inside the server log

The Partyline dispatcher executed commands under eval, but the Telnet path returned the complete exception to the remote client:

Internal error: $@

That text could contain:

full server paths;
module names and line numbers;
SQL fragments;
configuration details;
messages produced by external dependencies.

The DCC path was already safer and returned only:

Internal error.

MB365 unified both transports through a shared safe dispatcher.

From this round onward:

Telnet and DCC clients receive only a neutral error;
the complete diagnostic remains available in the server log;
multiline exceptions are normalised onto one log line;
a secondary logger or socket failure cannot trigger another uncaught exception.
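A shared dispatcher with these four properties could be sketched as below. The name safe_dispatch, the callback signature, the log prefix and the " | " separator are assumptions for illustration; the failing handler and its path are invented test data.

```perl
use strict;
use warnings;

# Hypothetical shared dispatcher: $handler runs the command, $send writes to
# the remote client and $log writes to the server log (all code references).
sub safe_dispatch {
    my ($handler, $send, $log) = @_;
    my $ok = eval { $handler->(); 1 };
    return 1 if $ok;

    my $error = (defined $@ && length $@) ? "$@" : 'unknown error';
    $error =~ s/\s+\z//;                  # drop the trailing newline
    $error =~ s/\s*[\r\n]+\s*/ | /g;      # fold multiline exceptions onto one line

    # Logger and socket failures are contained so they cannot escape in turn.
    eval { $log->("Partyline command failed: $error"); 1 };
    eval { $send->('Internal error.'); 1 };
    return 0;
}

# Illustrative use with in-memory callbacks standing in for socket and logger.
my (@sent, @logged);
safe_dispatch(
    sub { die "DBI connect failed\n  at /srv/mediabot/lib/Example.pm line 42\n" },
    sub { push @sent,   $_[0] },
    sub { push @logged, $_[0] },
);
```

After this call the client side holds only the neutral message, while the log side holds the full diagnostic on a single line.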

The guest sees a sealed door. The caretaker still receives the full incident report.

🧵 MB366 — Partyline scrolls are cut at 4 KiB, and each door closes only once

Telnet and DCC both waited for a newline without limiting the amount of pending input.

A connection did not need to authenticate first. It could simply send an endless unfinished line and keep the IO::Async buffer growing.

MB366 introduced a shared byte limit:

4 KiB per input line

The input parser now:

accepts normal LF and CRLF framing;
measures UTF-8 input in bytes;
accepts exactly 4,096 bytes;
rejects the first byte above the limit;
clears the oversized buffer;
returns only "Input line too long.";
closes the connection without logging the supplied content.
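The parsing rules above can be sketched as a pure function over a byte buffer. The name extract_lines and the event format are hypothetical, and tolerating one pending CR beyond the limit (so that a 4,096-byte line followed by CRLF is still accepted) is an assumption about how the boundary is handled.

```perl
use strict;
use warnings;

use constant MAX_LINE_BYTES => 4096;    # 4 KiB per input line

# Hypothetical parser: appends raw bytes to $$buffer_ref and returns a list of
# events, each either [ line => $text ] or [ 'too_long' ].
sub extract_lines {
    my ($buffer_ref, $chunk) = @_;
    $$buffer_ref .= $chunk;
    my @events;
    while ((my $pos = index($$buffer_ref, "\n")) >= 0) {
        my $line = substr($$buffer_ref, 0, $pos + 1, '');    # take line plus LF
        $line =~ s/\r?\n\z//;                                # accept LF and CRLF
        if (length($line) > MAX_LINE_BYTES) {
            $$buffer_ref = '';                               # clear oversized input
            return (@events, ['too_long']);
        }
        push @events, [ line => $line ];
    }
    # An unfinished line may not grow past the limit while waiting for LF.
    # Assumption: one extra byte is tolerated when it is the CR of a CRLF pair.
    my $pending = length($$buffer_ref);
    if ($pending > MAX_LINE_BYTES + 1
        || ($pending == MAX_LINE_BYTES + 1 && substr($$buffer_ref, -1) ne "\r")) {
        $$buffer_ref = '';
        return (@events, ['too_long']);
    }
    return @events;
}
```

On a too_long event the caller would send the fixed message and close the connection; the rejected bytes are never handed to the logger.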

One closing door, one metric update

Several paths may try to close the same Partyline session:

EOF
on_closed
.quit
.boot
authentication timeout
oversized input rejection

Previously, a second close callback could decrement:

mediabot_partyline_sessions_current

again.

MB366 made _close_session() idempotent. The first call performs the cleanup and updates the gauge. Later calls for the same descriptor become harmless no-ops.
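One common way to get this behaviour is to let the removal from the session table decide who performs the cleanup. The sketch below assumes a hash keyed by descriptor and uses a plain counter in place of the real gauge; close_session here is a simplified stand-in, not the actual _close_session().

```perl
use strict;
use warnings;

# Hypothetical session table keyed by descriptor, plus a counter standing in
# for the mediabot_partyline_sessions_current gauge.
my %sessions         = (7 => { nick => 'alice' });
my $sessions_current = 1;

sub close_session {
    my ($fd) = @_;
    # delete() returns the entry only once; later calls get undef and stop here.
    my $session = delete $sessions{$fd} or return 0;
    $sessions_current--;    # gauge is updated exactly once per session
    return 1;
}

close_session(7);    # EOF path: performs the cleanup, counter goes from 1 to 0
close_session(7);    # on_closed fires afterwards: harmless no-op
```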

Even the doors in Hogwarts should not slam twice and charge two departures for one wizard.

🗝️ MB367 — Every Partyline side chamber now seals its own exceptions

MB365 protected exceptions that reached the main dispatcher.

Seven command handlers, however, used their own local eval blocks and consumed $@ before the dispatcher could redact it:

.reloadconf
.status
.metrics
.reload
.ai summary
.ai
.kick

These paths still returned internal details directly to the Partyline client.

MB367 added one shared operation error reporter.

The server log keeps the complete exception, while the client receives a stable contextual message:

Configuration reload failed.
Status unavailable.
Metrics render error.
Reload failed.
AI request failed.
Kick failed.

The main dispatcher also uses the same reporting machinery while retaining its historical:

Internal error.

After MB367, no Partyline write path interpolates $@ directly into a remote response.
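A shared reporter of this kind might be sketched as follows. The function names, the operation keys and the pairing of each key with a message are assumptions (the pairing follows the order of the two lists above); the message texts themselves are the ones quoted in the post.

```perl
use strict;
use warnings;

# Hypothetical table of stable client messages, keyed by an operation name.
my %CLIENT_MESSAGE = (
    reloadconf => 'Configuration reload failed.',
    status     => 'Status unavailable.',
    metrics    => 'Metrics render error.',
    reload     => 'Reload failed.',
    ai         => 'AI request failed.',
    kick       => 'Kick failed.',
);

# Hypothetical reporter: full detail goes to $log, a fixed message to $send.
sub report_operation_error {
    my ($operation, $error, $send, $log) = @_;
    $error = 'unknown error' unless defined $error && length $error;
    $error =~ s/\s+\z//;
    $error =~ s/\s*[\r\n]+\s*/ | /g;    # keep the log entry on one line
    eval { $log->("Partyline $operation failed: $error"); 1 };
    eval { $send->($CLIENT_MESSAGE{$operation} // 'Internal error.'); 1 };
    return;
}

# A command-local eval hands $@ to the reporter instead of echoing it.
sub run_operation {
    my ($operation, $code, $send, $log) = @_;
    my $ok = eval { $code->(); 1 };
    report_operation_error($operation, $@, $send, $log) unless $ok;
    return $ok ? 1 : 0;
}

# Illustrative use with in-memory callbacks standing in for socket and logger.
my (@sent, @logged);
my $result = run_operation(
    'reload',
    sub { die qq{Can't locate object method "load" via package "Mediabot::Conf"\n} },
    sub { push @sent,   $_[0] },
    sub { push @logged, $_[0] },
);
```

Keeping the messages in one table makes it easy to audit that no handler builds a client response from $@.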

The main corridor had already been warded. This round locked the seven forgotten cupboard doors as well.

📖♻️ MB368 — The configuration spellbook can finally reload itself

The improved MB367 logging immediately proved useful.

A live .reload attempt returned the safe client message:

Reload failed.

The server log revealed the actual defect:

Can't locate object method "load" via package "Mediabot::Conf"

The command was invoking a method that did not exist.

A deeper audit found that both reload commands were broken differently:

.reload      called load(), which did not exist
.reloadconf  looked for reload(), which did not exist either

A real atomic reload API

MB368 added:

$conf->reload();

The operation is atomic:

the associated file is checked;
a fresh Config::Simple object is built;
the new values are loaded into temporary state;
the active configuration is replaced only after complete success.

If the file is missing, unreadable or invalid, the last valid configuration remains active.

The existing Mediabot::Conf object is updated in place rather than replaced. Components already holding a reference to it therefore see the refreshed values.

Integer-diagnostic deduplication is reset as well, allowing a newly introduced bad value to be reported once after a reload.
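The parse-then-swap pattern can be sketched without the real module. To stay self-contained, the example below swaps Config::Simple for a tiny key = value parser; Sketch::Conf, _parse_file and get() are hypothetical names, and the diagnostic deduplication reset is left out.

```perl
use strict;
use warnings;

package Sketch::Conf;

sub new {
    my ($class, $file) = @_;
    my $self = bless { file => $file, values => {} }, $class;
    $self->reload;
    return $self;
}

# Parse into a fresh hash and die on the first problem, so nothing is half-loaded.
sub _parse_file {
    my ($file) = @_;
    open my $fh, '<', $file or die "cannot read $file: $!\n";
    my %fresh;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*(?:[#;].*)?$/;    # blank line or comment
        $line =~ /^\s*([\w.]+)\s*=\s*(.*?)\s*$/
            or die "invalid line $. in $file\n";
        $fresh{$1} = $2;
    }
    close $fh;
    return \%fresh;
}

# Atomic reload: build temporary state first, replace only after full success.
sub reload {
    my ($self) = @_;
    my $fresh = _parse_file($self->{file});    # dies before any state changes
    %{ $self->{values} } = %$fresh;            # same object, updated in place
    return 1;
}

sub get { my ($self, $key) = @_; return $self->{values}{$key}; }

package main;
use File::Temp qw(tempfile);

my ($fh, $path) = tempfile(UNLINK => 1);
print {$fh} "nick = mediabot\n";
close $fh;
my $conf = Sketch::Conf->new($path);

# Overwrite the file with invalid content: reload fails, the old value survives.
open my $out, '>', $path or die "cannot write $path: $!\n";
print {$out} "this line has no equals sign\n";
close $out;
my $reloaded = eval { $conf->reload; 1 };
```

Because reload() writes into the existing object, any component that kept a reference to $conf sees new values after a successful reload and the previous values after a failed one.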

One working path for both commands

.reload and .reloadconf now use the same internal reload helper.

On success:

Configuration reloaded.

On failure, the MB367 protections remain active:

Reload failed.
Configuration reload failed.

The detailed parser error stays in the server journal.

`.reload` is not `.rehash`

The distinction remains intentional:

.reload and .reloadconf reread the configuration file into the existing Conf object;
.rehash also rebuilds logger state, debug settings and the channel cache;
structural changes may still require a service restart.

The spellbook now knows how to reread its pages. Rebuilding the whole library remains a different operation.

⚗️ Validation cauldrons

Each round received its own focused test and regression perimeter:

MB362 new test                    : 23/23
MB363 new test                    : 17/17
MB364 new test                    : 26/26
MB365 new test                    : 25/25
MB366 new test                    : 43/43
MB367 new test                    : 43/43
MB368 new test                    : 42/42

The latest targeted validations reached:

configuration + redaction        : 151/151
selected Partyline regression    : 319/319
MB356 through MB368 regression   : 337/337

The application installers were also tested repeatedly on clean copies and in already-applied states.

These suites overlap and should not be added together into an artificial total. Each result belongs to its own regression perimeter.

🧱 Database impact

None.

0 new tables
0 altered columns
0 migrations
0 schema changes

🪶 What changed across the seven rounds

Together, MB362–MB368 deliver:

a working authentication session gauge;
protected Liquidsoap Telnet commands;
bounded HTTP request headers on the Metrics endpoint;
redacted Partyline dispatcher exceptions;
bounded Telnet and DCC input lines;
idempotent Partyline session cleanup;
redaction for command-local Partyline failures;
a real atomic configuration reload API;
repaired .reload and .reloadconf commands.

None of these changes alter Mediabot’s database model or replace its established architecture.

They make the existing machinery more observable, less trusting of remote input and much better at failing without taking secrets or neighbouring features down with it.

🔮 Closing the Marauder’s Map

This pass began with a silent gauge and ended with a live production error exposing a reload spell that had never existed.

That progression matters.

Better metrics reveal what the bot is doing. Better logs reveal what went wrong. Better redaction ensures only the administrator sees the dangerous details. Better bounds stop remote input from growing forever. Atomic reloads ensure a bad configuration cannot erase the last good one.

Mediabot v3 has not become louder or more complicated.

It has simply learned where to place the wards, which doors must close once, and which spells should never have been trusted without checking the incantation first.

— Teuk
