Forum teuk.org

🗺️🧪 The Marauder’s Map Was Lying: How Mediabot v3 Recovered a True 404/404 Green Test Suite

in Mediabot · started by TeuK · 1w ago

TeuK · 1w ago

404 tests found. 404 tests passed. Nothing missing. Nothing hiding in the Forbidden Forest.

Mediabot v3 had reached an awkward point in its evolution.

The bot was running. The modern plugin bridge was working. The codebase had been split into cleaner modules. Security guards had been added around authentication, subprocesses, database access, timers, radio commands, URL handlers, and external APIs.

And yet, the full test suite looked like it had been hit by a badly aimed Reducto spell:

Selected : 403
Passed   : 346
Failed   : 57
Timeouts : 0

At first glance, that looked like 57 regressions.

It was not.

What we actually found was a much more interesting maintenance story: the architecture had moved forward, while a large part of the test suite was still reading an old map.

This article is the story of how we audited the suite, separated real defects from historical assumptions, repaired the testing infrastructure, and brought Mediabot v3 back to a fully trustworthy result:

Selected : 404
Passed   : 404
Failed   : 0
Timeouts : 0

🕯️ Chapter One — The False Green Light

The first surprise was not a failing test.

It was a successful exit code that could not be trusted.

The historical runner, t/test_commands.pl, loaded many test files directly into the same Perl process. That worked for older closure-based tests, but it became dangerous when newer standalone TAP tests appeared.

One of those tests called:

exit(0);

Because it was loaded inside the runner process, that innocent exit(0) stopped the entire suite.

The result was deeply misleading:

EXIT CODE: 0

The test suite had not completed. It had simply vanished through a trapdoor.

🪤 Lesson: a green exit code is meaningless if the test runner never reached the end.


⏳ Chapter Two — Building a Time-Limited Test Corridor

An initial attempt to use TAP::Harness inside the historical runner proved too invasive and could hang on this mixed test architecture.

So we changed approach completely.

Instead of forcing every generation of test into one process, we created an external isolated runner:

/home/mediabot/run_mediabot_tests_isolated.sh

Its job is simple and deliberately boring:

  • discover every t/cases/*.t file;
  • execute each test independently;
  • apply an exact filter;
  • enforce a strict timeout with GNU timeout;
  • continue after failures;
  • produce a full log and a compact summary in /tmp;
  • prevent one blocked test from freezing the complete campaign.

Example output:

[0140] 140_external_chromium_timeout_kill_escalation.t
       FAIL (rc=1)

[0141] 141_external_facebook_handler.t
       FAIL (rc=1)

[0142] 143_external_facebook_fallback_all_urls.t
       FAIL (rc=1)

And if a test hangs:

TIMEOUT after 90s

This gave us the first genuinely complete and trustworthy baseline:

Selected : 403
Passed   : 346
Failed   : 57
Timeouts : 0

No hidden early exit. No frozen terminal. No guessing.


🧭 Chapter Three — The Castle Had Moved

The biggest group of failures came from tests that were still inspecting:

Mediabot/External.pm

But the implementation had already been split into dedicated modules:

Mediabot/External/Claude.pm
Mediabot/External/YouTube.pm
Mediabot/External/URL.pm
Mediabot/External/Spotify.pm

The tests were looking for real functions — but in the wrong room.

That affected checks around:

  • ChatGPT and OpenAI configuration;
  • Claude request handling and history;
  • TMDB query validation and mojibake repair;
  • YouTube colors, search and metadata;
  • Fortnite response guards;
  • Chromium timeout cleanup;
  • Facebook and X/Twitter handlers;
  • Apple Music and generic URL badges;
  • Spotify metadata extraction.

In other words, the functionality had not disappeared.

The ownership had changed.

🗝️ We realigned 40 tests with the current module boundaries and added a dedicated contract:

t/cases/525_mb303_external_module_test_ownership.t

Its purpose is to make future architectural movement explicit. If code ownership changes again, the tests should fail for the right reason — not because they are silently reading an abandoned file.

After that pass, the suite improved dramatically:

Selected : 404
Passed   : 387
Failed   : 17
Timeouts : 0

Forty false alarms had disappeared.


🧹 Chapter Four — Scourgify the Historical Contracts

The final 17 failures were not concentrated in one module. They were old expectations scattered across years of development.

They included several different kinds of historical residue.

🪙 Karma history had grown

One test still expected the in-memory karma history to be capped at 20 entries.

The current implementation intentionally keeps:

500 entries

The test was updated to describe the current contract rather than forcing an obsolete capacity.

🧾 DB safety tests counted text instead of behavior

Some tests compared the raw number of:

prepare(...)
finish(...)

That looked clever, but it was fragile.

A single prepared statement may legitimately contain several finish calls on mutually exclusive paths:

  • execution failure;
  • successful fetch;
  • early return;
  • cleanup branch.

Counting tokens in source code does not prove DB safety.

The updated tests now check the meaningful contract:

  • preparation errors are handled;
  • execution errors are handled;
  • statements are finished on failure;
  • statements are finished on success;
  • uncontrolled dbh->do() calls are not introduced;
  • runtime cleanup eval blocks are not confused with database write protection.

⏱️ Timer tests depended on old comments

Several scheduler tests looked for exact historical markers or formatting.

The current atomic timer reload behavior was already safer than the old text suggested:

  • old timers remain active if the reload fails;
  • replacement timers are prepared before publication;
  • previous timers are stopped only at the commit point;
  • a timer becomes “started” only after successful startup.

The tests were rewritten around those guarantees.

🧙 ScriptDryRun had evolved

The multilingual bridge had accumulated several layers of protection:

  • optional command allow-lists;
  • route maps;
  • dry-run and apply modes;
  • explicit IRC output gate;
  • apply-scope guard;
  • path traversal protection;
  • symlink checks;
  • stdin JSON protocol;
  • action validation;
  • timeout handling;
  • failed-script rejection;
  • legacy response compatibility.

Some tests still required old marker names, old comments, or old internal variable shapes.

They now validate the actual safety behavior instead.

📻 Radio dispatch had been centralized

Older tests expected radio commands to point directly to individual handlers.

The current architecture correctly routes them through:

_dispatch_radio($ctx, $cmd)

The tests now verify both:

  1. the centralized dispatch entry point;
  2. the internal mapping from command name to real handler.

📜 Configuration documentation had moved

Some radio configuration expectations still lived in README-oriented tests.

The canonical source is now:

mediabot.sample.conf

The tests were updated to inspect the real sample configuration instead of expecting duplicated documentation.


🧯 Chapter Five — Making the Runner Fail Properly

During the campaign, we also fixed a weakness in the historical test runner itself.

Previously, a test closure throwing an exception could kill the entire process with a code such as:

255

The runner now:

  • catches the exception;
  • reports it as a failed assertion;
  • preserves a useful diagnostic;
  • returns a meaningful failure status;
  • avoids silently losing the rest of the result.

A dedicated regression test was added:

t/cases/523_mb301_test_runner_crash_guard.t

A deliberately crashing probe confirmed the new behavior:

ERREUR d'execution : intentional mb301 crash probe
FAILED : 1/1
exit code : 1

That is exactly what a test runner should do: report the broken test, not disappear with it.


🪶 Chapter Six — The Last Two Feathers on the Floor

Once all 404 tests passed, the log still showed two small hygiene problems.

The TAP plan was wrong

383_dispatch_integrity.t announced:

1..29

but produced:

34 assertions

The test passed under the historical runner, but the TAP plan was objectively incorrect.

It now announces:

1..34

Two helpers shared the same global name

Tests 426 and 428 both defined:

sub make_bot

That produced:

Subroutine make_bot redefined

The helpers are now lexical closures:

my $make_bot = sub {
    ...
};

No warning, no behavior change, no namespace pollution.


🧬 What Changed in the Repository

This campaign changed the test suite and testing contracts, not the application runtime.

The main work included:

  • fixing historical tests after module ownership changes;
  • adding test-runner crash protection;
  • creating a safe isolated test launcher outside the repository;
  • updating DB safety assertions;
  • realigning timer and scheduler tests;
  • updating ScriptDryRun contracts;
  • correcting radio dispatch expectations;
  • moving configuration assertions to the canonical sample file;
  • fixing one incorrect TAP plan;
  • removing helper redefinition warnings.

New regression contracts include:

t/cases/523_mb301_test_runner_crash_guard.t
t/cases/525_mb303_external_module_test_ownership.t

🛡️ What Did Not Change

This work deliberately did not modify:

  • Mediabot runtime behavior;
  • the production Undernet instance;
  • database contents or schema;
  • live IRC configuration;
  • radio playback behavior;
  • plugin execution behavior;
  • authentication rules;
  • commit.sh;
  • .gitignore;
  • the mp3/ cache;
  • production credentials.

No daemon restart was required.

That does not make the work cosmetic.

A test suite is part of the engineering system. If it lies, hangs, exits early, or checks abandoned architecture, it cannot protect the runtime effectively.

This cleanup restored that protection.


🧪 What the 404 Tests Now Cover

The final green suite validates a surprisingly broad part of Mediabot v3:

🔐 Identity and security

  • authentication and autologin;
  • password handling;
  • hostmask registration;
  • login throttling;
  • secret redaction;
  • private message logging protection.

🗃️ Database discipline

  • prepared statements;
  • success and failure cleanup paths;
  • schema drift parsing;
  • safe pagination;
  • query escaping;
  • password field exposure prevention.

🌐 External services

  • OpenAI and Claude;
  • TMDB;
  • YouTube;
  • Fortnite;
  • Spotify;
  • Facebook;
  • X/Twitter;
  • Instagram;
  • Apple Music;
  • Chromium fallback and timeout cleanup.

📡 IRC and commands

  • public command dispatch;
  • private command routing;
  • partyline commands;
  • help aliases;
  • channel scope;
  • anti-flood behavior;
  • reminders, polls, notes, quotes and karma.

📻 Radio

  • Icecast JSON handling;
  • timeout sanitizing;
  • centralized command dispatch;
  • public URL defaults;
  • safe HTTP guards.

🧩 Plugin bridge

  • plugin loading;
  • autoload gates;
  • command routes;
  • Perl, Python and Tcl execution;
  • stdin JSON payloads;
  • structured stdout responses;
  • action validation;
  • dry-run planning;
  • apply mode;
  • IRC output gates;
  • path traversal and symlink protection;
  • timeout and malformed JSON handling;
  • failed-script rejection;
  • legacy protocol compatibility.

🏰 Final Result

The final isolated run reports:

Selected : 404
Passed   : 404
Failed   : 0
Timeouts : 0

The preflight suite is green.

The test runner no longer hides a premature exit.

The old monolith no longer confuses ownership tests.

Historical contracts now describe current behavior.

And the final log is quiet.

The runtime did not need a spell. The map did.

Mediabot v3 now has a test suite that once again reflects the castle as it really exists.

You must be logged in to reply.