404 tests found. 404 tests passed. Nothing missing. Nothing hiding in the Forbidden Forest.
Mediabot v3 had reached an awkward point in its evolution.
The bot was running. The modern plugin bridge was working. The codebase had been split into cleaner modules. Security guards had been added around authentication, subprocesses, database access, timers, radio commands, URL handlers, and external APIs.
And yet, the full test suite looked like it had been hit by a badly aimed Reducto spell:
Selected : 403
Passed : 346
Failed : 57
Timeouts : 0
At first glance, that looked like 57 regressions.
It was not.
What we actually found was a much more interesting maintenance story: the architecture had moved forward, while a large part of the test suite was still reading an old map.
This article is the story of how we audited the suite, separated real defects from historical assumptions, repaired the testing infrastructure, and brought Mediabot v3 back to a fully trustworthy result:
Selected : 404
Passed : 404
Failed : 0
Timeouts : 0
The first surprise was not a failing test.
It was a successful exit code that could not be trusted.
The historical runner, t/test_commands.pl, loaded many test files directly into the same Perl process. That worked for older closure-based tests, but it became dangerous when newer standalone TAP tests appeared.
One of those tests called:
exit(0);
Because it was loaded inside the runner process, that innocent exit(0) stopped the entire suite.
The result was deeply misleading:
EXIT CODE: 0
The test suite had not completed. It had simply vanished through a trapdoor.
🪤 Lesson: a green exit code is meaningless if the test runner never reached the end.
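The trapdoor is easy to reproduce in miniature. The sketch below is not the historical runner; it assumes a two-line stand-in that loads a test with `do FILE`, and the helper names `write_file` and `early_exit_demo` are made up for the illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: write a small text file.
sub write_file {
    my ($path, $text) = @_;
    open(my $fh, '>', $path) or die "cannot write $path: $!";
    print {$fh} $text;
    close($fh) or die "cannot close $path: $!";
}

# Builds a tiny in-process runner plus a standalone test that calls exit(0),
# runs the runner, and returns (captured output, exit code).
sub early_exit_demo {
    my $dir = tempdir(CLEANUP => 1);
    write_file("$dir/standalone.t", qq{print "ok 1\\n";\nexit(0);\n});
    write_file("$dir/runner.pl", <<'RUNNER');
do $ARGV[0];                        # load the test into this very process
print "runner reached the end\n";   # never printed: exit(0) got there first
RUNNER
    open(my $out, '-|', $^X, "$dir/runner.pl", "$dir/standalone.t")
        or die "cannot start runner: $!";
    my $output = do { local $/; <$out> };
    close($out);
    return ($output, $? >> 8);
}

my ($output, $exit_code) = early_exit_demo();
print "output: $output";
print "exit code: $exit_code\n";
```

The runner's own final line never appears, yet the exit code is 0. From the outside, that is indistinguishable from a clean run.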
An initial attempt to use TAP::Harness inside the historical runner proved too invasive and could hang on this mixed test architecture.
So we changed approach completely.
Instead of forcing every generation of test into one process, we created an external isolated runner:
/home/mediabot/run_mediabot_tests_isolated.sh
Its job is simple and deliberately boring:
run each t/cases/*.t file in its own process;
apply a timeout;
use /tmp.
Example output:
[0140] 140_external_chromium_timeout_kill_escalation.t
FAIL (rc=1)
[0141] 141_external_facebook_handler.t
FAIL (rc=1)
[0142] 143_external_facebook_fallback_all_urls.t
FAIL (rc=1)
And if a test hangs:
TIMEOUT after 90s
This gave us the first genuinely complete and trustworthy baseline:
Selected : 403
Passed : 346
Failed : 57
Timeouts : 0
No hidden early exit. No frozen terminal. No guessing.
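The real runner is a shell script whose contents are not shown here. As a rough Perl sketch of the same idea, under stated assumptions: `run_isolated` and `run_suite` are hypothetical names, the `PASS` label is assumed, and a signal death is reported as 128 plus the signal number, which is a convention chosen for this sketch.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);
use Time::HiRes qw(time sleep);
use File::Basename qw(basename);

$| = 1;   # keep our own output unbuffered around the forks

# Run one test file in a separate perl process, bounded by $timeout seconds.
# Returns 'PASS', 'FAIL (rc=N)' or 'TIMEOUT after Ns'.
sub run_isolated {
    my ($file, $timeout, $log) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        # Child: send all output to the log, then become the test process.
        open(STDOUT, '>', $log)      or POSIX::_exit(126);
        open(STDERR, '>&', \*STDOUT) or POSIX::_exit(126);
        exec($^X, $file) or POSIX::_exit(127);
    }
    my $deadline = time + $timeout;
    while (1) {
        if (waitpid($pid, WNOHANG) == $pid) {
            my $status = $?;
            return 'PASS' if $status == 0;
            my $rc = ($status & 127) ? 128 + ($status & 127) : $status >> 8;
            return "FAIL (rc=$rc)";
        }
        if (time >= $deadline) {
            kill 'KILL', $pid;     # the test hung: stop it and reap it
            waitpid($pid, 0);
            return "TIMEOUT after ${timeout}s";
        }
        sleep(0.05);               # short pause between polls
    }
}

# Run every file in turn and return the four counters as a hash reference.
sub run_suite {
    my ($files, $timeout, $log_dir) = @_;
    my %count = (selected => 0, passed => 0, failed => 0, timeouts => 0);
    for my $file (@$files) {
        my $name   = basename($file);
        my $result = run_isolated($file, $timeout, "$log_dir/$name.log");
        printf "[%04d] %s\n%s\n", ++$count{selected}, $name, $result;
        if    ($result eq 'PASS')     { $count{passed}++ }
        elsif ($result =~ /^TIMEOUT/) { $count{timeouts}++ }
        else                          { $count{failed}++ }
    }
    return \%count;
}
```

A call such as `run_suite([glob 't/cases/*.t'], 90, '/tmp')` would mirror the setup described above. Because each file gets its own process, a stray `exit(0)` can only end that one test, never the loop that counts them.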
The biggest group of failures came from tests that were still inspecting:
Mediabot/External.pm
But the implementation had already been split into dedicated modules:
Mediabot/External/Claude.pm
Mediabot/External/YouTube.pm
Mediabot/External/URL.pm
Mediabot/External/Spotify.pm
The tests were looking for real functions — but in the wrong room.
That affected checks around the functions that now live in those dedicated modules.
In other words, the functionality had not disappeared.
The ownership had changed.
🗝️ We realigned 40 tests with the current module boundaries and added a dedicated contract:
t/cases/525_mb303_external_module_test_ownership.t
Its purpose is to make future architectural movement explicit. If code ownership changes again, the tests should fail for the right reason — not because they are silently reading an abandoned file.
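What such an ownership contract might look like is sketched below. This is not the content of test 525: `owners_of` and `example_lookup` are hypothetical names, and the sources are inlined as strings where a real contract would presumably read the module files from disk.

```perl
use strict;
use warnings;

# Return the sorted list of files whose source text defines `sub $name`.
sub owners_of {
    my ($name, $sources) = @_;    # $sources: { path => source text }
    my @owners = grep { $sources->{$_} =~ /^\s*sub\s+\Q$name\E\b/m }
                 sort keys %$sources;
    return @owners;
}

# Stand-in sources; example_lookup is an invented function name.
my %sources = (
    'Mediabot/External.pm'         => "package Mediabot::External;\n1;\n",
    'Mediabot/External/YouTube.pm' =>
        "package Mediabot::External::YouTube;\nsub example_lookup { return 1 }\n1;\n",
);

my @owners = owners_of('example_lookup', \%sources);
print "example_lookup is owned by: @owners\n";
```

A test built on this can assert both halves of the contract: the dedicated module defines the function, and the old monolith does not. If ownership moves again, the failure names the file that changed.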
After that pass, the suite improved dramatically:
Selected : 404
Passed : 387
Failed : 17
Timeouts : 0
Forty false alarms had disappeared.
The final 17 failures were not concentrated in one module. They were old expectations scattered across years of development.
They included several different kinds of historical residue.
One test still expected the in-memory karma history to be capped at 20 entries.
The current implementation intentionally keeps:
500 entries
The test was updated to describe the current contract rather than forcing an obsolete capacity.
Some tests compared the raw number of:
prepare(...)
finish(...)
That looked clever, but it was fragile.
A single prepared statement may legitimately contain several finish calls on mutually exclusive paths.
Counting tokens in source code does not prove DB safety.
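A small illustration of why the raw counts disagree even when the code is fine. The DBI fragment is only sample text to be scanned, so nothing here touches a database; the table name, the query and the `count_calls` helper are all invented for the example.

```perl
use strict;
use warnings;

# Sample source text only: it is scanned, never executed, so no DBI is needed.
my $sample = <<'SRC';
my $sth = $dbh->prepare('SELECT nick FROM example_users WHERE id = ?');
unless ($sth->execute($id)) {
    $sth->finish;    # error path
    return;
}
my ($nick) = $sth->fetchrow_array;
$sth->finish;        # normal path
return $nick;
SRC

# Count method-call tokens such as ->prepare or ->finish in source text.
sub count_calls {
    my ($source, $method) = @_;
    my $count = () = $source =~ /->\Q$method\E\b/g;
    return $count;
}

printf "prepare: %d, finish: %d\n",
    count_calls($sample, 'prepare'), count_calls($sample, 'finish');
```

One statement handle, two finish calls, and only one of them can run on any given execution. A test that demands equal counts would flag this as a defect.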
The updated tests now check the meaningful contract:
dbh->do() calls are not introduced;
eval blocks are not confused with database write protection.
Several scheduler tests looked for exact historical markers or formatting.
The current atomic timer reload behavior was already safer than the old text suggested.
The tests were rewritten around the guarantees it actually provides.
The multilingual bridge had accumulated several layers of protection.
Some tests still required old marker names, old comments, or old internal variable shapes.
They now validate the actual safety behavior instead.
Older tests expected radio commands to point directly to individual handlers.
The current architecture correctly routes them through:
_dispatch_radio($ctx, $cmd)
The tests now verify that routing instead of a direct command-to-handler link.
Some radio configuration expectations still lived in README-oriented tests.
The canonical source is now:
mediabot.sample.conf
The tests were updated to inspect the real sample configuration instead of expecting duplicated documentation.
During the campaign, we also fixed a weakness in the historical test runner itself.
Previously, a test closure throwing an exception could kill the entire process with a code such as:
255
The runner now reports the crashing test as a failure and ends with an ordinary failing exit code.
A dedicated regression test was added:
t/cases/523_mb301_test_runner_crash_guard.t
A deliberately crashing probe confirmed the new behavior:
ERREUR d'execution : intentional mb301 crash probe
FAILED : 1/1
exit code : 1
That is exactly what a test runner should do: report the broken test, not disappear with it.
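The general shape of such a guard is an `eval` around each closure. The following is a minimal sketch of that technique, not the code in t/test_commands.pl: `run_guarded` is a hypothetical name, and the convention that a closure passes by returning true is an assumption.

```perl
use strict;
use warnings;

# Run each [name, coderef] pair; a closure passes by returning a true value.
# Returns (exit_code, failed_count, \@messages).
sub run_guarded {
    my @tests = @_;
    my ($failed, @messages) = (0);
    for my $test (@tests) {
        my ($name, $code) = @$test;
        # eval yields undef if the closure dies, 1 or 0 otherwise.
        my $passed = eval { $code->() ? 1 : 0 };
        if (!defined $passed) {
            (my $error = "$@") =~ s/\s+\z//;
            push @messages, "$name: runtime error: $error";
            $failed++;
        }
        elsif (!$passed) {
            push @messages, "$name: failed";
            $failed++;
        }
    }
    return ($failed ? 1 : 0, $failed, \@messages);
}

my ($exit_code, $failed, $messages) = run_guarded(
    ['healthy test', sub { 1 }],
    ['crash probe',  sub { die "intentional crash probe\n" }],
);
print "$_\n" for @$messages;
print "FAILED : $failed/2\n";
print "exit code : $exit_code\n";
```

The crashing closure costs exactly one failure. The loop carries on, and the exit code is the runner's own decision instead of whatever the exception left behind.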
Once all 404 tests passed, the log still showed two small hygiene problems.
383_dispatch_integrity.t announced:
1..29
but produced:
34 assertions
The test passed under the historical runner, but the TAP plan was objectively incorrect.
It now announces:
1..34
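This kind of mismatch can be caught mechanically by comparing the announced plan with the assertions actually emitted. The helper below is a sketch with an invented name, `tap_plan_vs_seen`, and it only understands a leading `1..N` plan and plain `ok` / `not ok` lines.

```perl
use strict;
use warnings;

# Compare the announced TAP plan ("1..N") with the assertions actually emitted.
# Returns (planned, seen); planned is undef when no plan line is present.
sub tap_plan_vs_seen {
    my ($tap) = @_;
    my ($planned) = $tap =~ /^1\.\.(\d+)\s*$/m;
    my $seen = () = $tap =~ /^(?:not )?ok\b/mg;
    return ($planned, $seen);
}

my $tap = "1..2\nok 1 - first\nok 2 - second\nok 3 - one more than planned\n";
my ($planned, $seen) = tap_plan_vs_seen($tap);
print "plan announces $planned, output contains $seen\n" if $planned != $seen;
```

A strict TAP consumer treats a plan that disagrees with the output as a failure in its own right, so a wrong plan is worth fixing even when every assertion passes.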
Tests 426 and 428 both defined:
sub make_bot
That produced:
Subroutine make_bot redefined
The helpers are now lexical closures:
my $make_bot = sub {
...
};
No warning, no behavior change, no namespace pollution.
This campaign changed the test suite and testing contracts, not the application runtime.
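The main work included the isolated external runner, the realignment of 40 tests with the current module boundaries, the update of 17 historical expectations, the crash guard in the historical runner, and the two hygiene fixes in the final log.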
New regression contracts include:
t/cases/523_mb301_test_runner_crash_guard.t
t/cases/525_mb303_external_module_test_ownership.t
This work deliberately did not modify:
commit.sh;
.gitignore;
mp3/ cache.
No daemon restart was required.
That does not make the work cosmetic.
A test suite is part of the engineering system. If it lies, hangs, exits early, or checks abandoned architecture, it cannot protect the runtime effectively.
This cleanup restored that protection.
The final green suite validates a surprisingly broad part of Mediabot v3.
The final isolated run reports:
Selected : 404
Passed : 404
Failed : 0
Timeouts : 0
The preflight suite is green.
The test runner no longer hides a premature exit.
The old monolith no longer confuses ownership tests.
Historical contracts now describe current behavior.
And the final log is quiet.
The runtime did not need a spell. The map did.
Mediabot v3 now has a test suite that once again reflects the castle as it really exists.