Forum teuk.org

🏰🪄 Mediabot v3: The Hall of Refactoring — External.pm, Seen Spells, Quotes, and Wordcount Runes

in Mediabot · started by TeuK · 1h ago

Some Mediabot updates are about fixing one cursed candle in a corridor.

This one was different.

This was the moment where a very old wing of the castle finally got renovated.

For a while, External.pm had grown into a massive magical cabinet: Claude, ChatGPT, TMDB, Spotify, YouTube, URL titles, Facebook, Instagram, X/Twitter, Apple Music, weather, Fortnite, HTTP helpers, IRC formatting, cache handling… all living in the same ancient spellbook.

It worked.

But it was heavy.

So this pass was about taking that giant grimoire, opening it carefully on the table, and moving the spells into proper classrooms.

No database migration.
No schema change.
No reckless rewrite.

Just careful refactoring, plus a handful of real bugfixes found while walking through the moving staircases.

External.pm becomes a façade.
Spotify, Claude, YouTube and URL logic move into dedicated modules.
Runtime SQL mistakes are fixed.
Partyline commands become more consistent.
Wordcount gets smarter.
Quotes become nicer.

In short:

The castle is still the same castle.
But the corridors now have signs.


🧳 The big one: External.pm became a façade

Before this pass, Mediabot/External.pm was huge.

It contained several independent domains:

  • Claude / Anthropic;
  • ChatGPT / OpenAI;
  • TMDB;
  • Spotify;
  • YouTube;
  • generic URL titles;
  • Facebook;
  • Instagram;
  • X/Twitter;
  • Apple Music;
  • weather;
  • Fortnite;
  • shared HTTP helpers.

That is a lot for one file.

The refactoring plan was completed across this series.

Final shape:

Mediabot/External.pm
Mediabot/External/Spotify.pm
Mediabot/External/Claude.pm
Mediabot/External/YouTube.pm
Mediabot/External/URL.pm

External.pm now acts mostly as a façade: it keeps the public API stable, imports the specialized modules, and continues exporting the same public functions expected by the rest of Mediabot.

That is the key point:

No caller should need to know the castle was rearranged inside.

The doors still open.
The rooms are just cleaner behind them.
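
To make the façade idea concrete, here is a minimal sketch assuming Exporter-based modules. The package names My::External and My::External::Spotify and the function spotify_title are hypothetical stand-ins, not Mediabot's real export list; both packages are inlined in one file only so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical submodule standing in for Mediabot::External::Spotify.
package My::External::Spotify {
    use Exporter 'import';
    our @EXPORT_OK = qw(spotify_title);

    sub spotify_title {
        my ($url) = @_;
        return "[Spotify] $url";
    }
}

# Hypothetical facade: no logic of its own, only re-exports.
package My::External {
    use Exporter 'import';
    our @EXPORT_OK = qw(spotify_title);

    # Pull the implementation into this namespace so it can be re-exported.
    My::External::Spotify->import('spotify_title');
}

package main;

# Callers keep importing from the facade, exactly as before the split.
My::External->import('spotify_title');

my $line = spotify_title('https://open.spotify.com/track/example');
print "$line\n";
```

With separate files, the explicit import calls would normally be plain `use` lines. The point is the same either way: the caller only ever names the façade.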


🕸 Spotify moved first

The first classroom extracted from the old tower was Spotify.

The Spotify URL handler already had a lot of internal helpers for:

  • cleaning bad metadata;
  • decoding titles;
  • parsing JSON-ish blobs;
  • extracting <meta> tags;
  • formatting durations.

Those helpers are now private functions inside:

Mediabot::External::Spotify

This made the Spotify path more readable without changing the expected IRC behavior.

A small but important refactor: less tangled vine, same green badge.


🧠 Claude, ChatGPT and TMDB got their own room

Then came the AI and TMDB logic.

Mediabot::External::Claude now owns:

  • !ai;
  • Claude API calls;
  • Claude history;
  • Claude cache handling;
  • !ai summary;
  • ChatGPT compatibility;
  • TMDB movie/series lookup.

This is a much better home for that code.

The extraction also exposed real runtime issues that syntax checks cannot catch, especially SQL column mistakes in !ai summary.

More on that below.


🐉 YouTube and URL handling were finally split out

The largest remaining pieces were YouTube and URL handling.

They moved into:

Mediabot::External::YouTube
Mediabot::External::URL

YouTube now contains:

  • YouTube Data API lookup;
  • HTML fallback;
  • duration formatting;
  • YouTube search;
  • weather handler;
  • Fortnite handler.

URL now contains:

  • generic URL titles;
  • Chromium fallback;
  • Instagram;
  • Facebook;
  • X/Twitter;
  • Apple Music;
  • generic title cleanup.

That is a big structural improvement.

The old External.pm is no longer a dragon nest.


📉 Result: External.pm lost most of its weight

The final report shows the result clearly:

External.pm         : 215 lines
External/Spotify.pm : 305 lines
External/Claude.pm  : 1143 lines
External/YouTube.pm : 1253 lines
External/URL.pm     : 1392 lines

External.pm went from roughly 3979 lines to a small façade of 215 lines.

That is a massive maintenance win.

Not because fewer lines magically mean better code, but because each domain is now easier to inspect, test, and reason about.

A spellbook about dragons should not be mixed with a cookbook and a Quidditch rulebook.


🧯 The important lesson: Perl syntax checks are not SQL checks

During the extraction, one bug survived syntax validation:

ORDER BY cl.id

But the real primary key in CHANNEL_LOG is:

id_channel_log

The query lived inside a Perl string, so perl -c accepted it. It only failed at runtime, when the database rejected the unknown column.

The same investigation also caught a bad column name:

cl.text

where the real column is:

cl.publictext

This affected paths such as !ai summary, !wordcount, and Partyline .history.

The fix was straightforward:

cl.publictext AS text
ORDER BY cl.id_channel_log DESC

A recursive grep over Mediabot/ then confirmed no remaining occurrences of either bad pattern.

That is the kind of bug only the real castle map reveals.

perl -c checks the wand.
The database checks whether the spell hits the wall.
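
The same two patterns can be hunted from Perl as well. This is only a sketch: find_bad_sql is a hypothetical helper and the sample queries are made up for illustration. Note the \b after cl.id: an underscore is a word character, so the correct cl.id_channel_log is not flagged.

```perl
use strict;
use warnings;

# Patterns for the two mistakes described above.
my @BAD = (
    [ qr/\bcl\.text\b/,           'cl.text (real column: cl.publictext)' ],
    [ qr/ORDER\s+BY\s+cl\.id\b/i, 'ORDER BY cl.id (real key: cl.id_channel_log)' ],
);

# Hypothetical helper: returns one message per suspicious line.
sub find_bad_sql {
    my (@lines) = @_;
    my @hits;
    for my $i (0 .. $#lines) {
        for my $rule (@BAD) {
            my ($re, $why) = @$rule;
            push @hits, sprintf('line %d: %s', $i + 1, $why)
                if $lines[$i] =~ $re;
        }
    }
    return @hits;
}

# Illustrative input: two broken fragments, two fixed ones.
my @sample = (
    'SELECT cl.text FROM CHANNEL_LOG cl',
    'SELECT cl.publictext AS text FROM CHANNEL_LOG cl',
    'ORDER BY cl.id DESC',
    'ORDER BY cl.id_channel_log DESC',
);
print "$_\n" for find_bad_sql(@sample);
```

A text scan like this only catches known mistakes. Running the query against the real schema is still the only check that catches unknown ones.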


🧙 Claude private summary warning fixed

Another subtle issue appeared around !ai summary.

Some summary paths intentionally run with no IRC channel because the response is supposed to go to the requesting nick by NOTICE.

That meant $chan could be undefined.

One log line and one history-key path still used it directly, causing warnings like:

Use of uninitialized value $chan in concatenation

Now the code uses a stable private key:

__private__

for undefined-channel history contexts.

No behavior change.
Just cleaner logs and safer keys.
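
The fallback itself can be as small as this sketch. history_key is a hypothetical helper name; only the __private__ value comes from the actual fix.

```perl
use strict;
use warnings;

# Hypothetical helper: return a history key that is always defined.
sub history_key {
    my ($chan) = @_;
    return $chan // '__private__';   # undef channel means a private context
}

# Both calls interpolate safely, with no "uninitialized value" warning.
for my $chan ('#teuk', undef) {
    my $key = history_key($chan);
    print "history key: $key\n";
}
```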


🦉 Partyline .seen became smarter

Partyline .seen gained two important fixes.

First, wildcard support was added to match the IRC-side behavior:

.seen teu*

can now return several matching users instead of searching literally for teu*.

Second, exact-match lookup now normalizes the target nick to lowercase.

That matters because USER_SEEN stores nicks in lowercase.

So:

.seen Te[u]K

can now find:

te[u]k

instead of pretending the user was never seen.

Small fix, very practical.
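
Both fixes fit in one small normalization step. The sketch below is an assumption about how such a step could look, not the bot's actual code: seen_lookup is a hypothetical name, and the escaping assumes a database whose LIKE escape character is the backslash (the MySQL/MariaDB default).

```perl
use strict;
use warnings;

# Hypothetical helper: turn a .seen argument into a lookup spec.
# Returns ('like', $pattern) for wildcard input, ('exact', $nick) otherwise.
sub seen_lookup {
    my ($arg) = @_;
    my $nick = lc $arg;              # USER_SEEN stores lowercase nicks

    return ('exact', $nick) unless $nick =~ /\*/;

    $nick =~ s/([\\%_])/\\$1/g;      # escape LIKE metacharacters in the nick
    $nick =~ s/\*/%/g;               # IRC-style * becomes SQL %
    return ('like', $nick);
}

for my $arg ('Te[u]K', 'teu*') {
    my ($mode, $value) = seen_lookup($arg);
    print "$arg => $mode: $value\n";
}
```

Either result would then be passed to the query as a bind parameter, never interpolated into the SQL string.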


📜 Quotes got more civilized

Quotes also received attention.

Random quote ordering

!q random used LIMIT 1 OFFSET ? without a deterministic ORDER BY.

That is not safe SQL. Without an ORDER BY, the database may return rows in whatever order its execution plan chooses, so a given offset is not guaranteed to map to the same quote.

Now the random offset is applied against a deterministic order:

ORDER BY q.id_quotes

Much better.

Avoid immediate repeat

Random quotes also avoid repeating the same quote twice in a row when more than one quote exists.

A small in-memory ring remembers the last quote shown per channel. If the random pick matches it, the bot retries a few times.

No one likes a bard who tells the same story twice in a row.
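
Here is a rough sketch of both ideas together, with a sorted Perl array standing in for the ordered SQL result. pick_quote_id and %last_quote are hypothetical names, a plain hash replaces the ring, and the default of 3 tries is an assumption.

```perl
use strict;
use warnings;

my %last_quote;   # channel => id of the last quote shown there

# Hypothetical helper: pick a random id, avoiding an immediate repeat.
sub pick_quote_id {
    my ($chan, $ids, $max_tries) = @_;
    my @ordered = sort { $a <=> $b } @$ids;     # stands in for ORDER BY q.id_quotes
    return unless @ordered;

    my $pick;
    for (1 .. ($max_tries // 3)) {
        $pick = $ordered[ int rand @ordered ];  # stands in for LIMIT 1 OFFSET ?
        last if @ordered == 1;                  # nothing else to choose from
        last if !defined $last_quote{$chan} || $pick != $last_quote{$chan};
    }
    $last_quote{$chan} = $pick;
    return $pick;
}

print pick_quote_id('#teuk', [ 12, 3, 7 ]), "\n" for 1 .. 3;
```

Because the retries are bounded, a repeat is still possible, just unlikely. That trade keeps the lookup cheap and can never loop forever.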

Quote view truncation

Very long quotes are now truncated in !q view, consistent with quote search and random output.

IRC is not a parchment roll without end.

Quote search relevance

!q search now computes a simple relevance score and shows the best match, instead of blindly showing the latest quote.

So search results are now less like Divination class and more like actual search.
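
The post does not spell out the scoring formula, so the sketch below is purely an assumption: one point per occurrence of each search term, ties broken in favor of the most recent quote. The names relevance and best_match are hypothetical.

```perl
use strict;
use warnings;

# Assumed scoring: one point per case-insensitive occurrence of each term.
sub relevance {
    my ($text, @terms) = @_;
    my $score = 0;
    for my $term (@terms) {
        my $count = () = $text =~ /\Q$term\E/gi;
        $score += $count;
    }
    return $score;
}

# Return the best-scoring quote, or undef if nothing matches at all.
# Ties go to the highest id, i.e. the most recent quote.
sub best_match {
    my ($quotes, @terms) = @_;
    my @scored = grep { $_->[0] > 0 }
                 map  { [ relevance($_->{text}, @terms), $_ ] } @$quotes;
    my ($best) = sort { $b->[0] <=> $a->[0] or $b->[1]{id} <=> $a->[1]{id} } @scored;
    return $best ? $best->[1] : undef;
}

my @quotes = (
    { id => 1, text => 'TeuK loves trivia' },
    { id => 2, text => 'nothing to see here' },
    { id => 3, text => 'teuk again' },
);
my $hit = best_match(\@quotes, qw(teuk trivia));
print "best: #$hit->{id} $hit->{text}\n";
```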


🧭 Partyline reminders now validate targets

IRC !remind had already learned to reject unknown target nicks.

Partyline .remind now follows the same rule.

Before:

.remind ghostnick hello

could create an orphan reminder that would never be delivered.

Now it validates the target through:

  1. live nicklists;
  2. USER_SEEN;
  3. registered USER.

If the nick is unknown, the reminder is not created.

That keeps the reminder table cleaner.

No letters to ghosts.
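
The three-step lookup can be sketched as an ordered chain of checks that stops at the first hit. Everything here is illustrative: find_known_target is a hypothetical name, and the in-memory hashes stand in for the real nicklists and database queries.

```perl
use strict;
use warnings;

# Hypothetical helper: run the checks in order, stop at the first hit.
# Each check is [ label, coderef ]; the coderef returns true for a known nick.
sub find_known_target {
    my ($nick, @checks) = @_;
    for my $check (@checks) {
        my ($label, $is_known) = @$check;
        return $label if $is_known->($nick);
    }
    return;   # unknown nick: the caller should refuse to create the reminder
}

# Stand-in data; the real checks would query nicklists and the database.
my %on_channel = map { $_ => 1 } qw(teuk);
my %seen       = map { $_ => 1 } qw(teuk oldfriend);
my %registered = map { $_ => 1 } qw(admin);

my @checks = (
    [ 'live nicklist' => sub { $on_channel{ lc $_[0] } } ],
    [ 'USER_SEEN'     => sub { $seen{ lc $_[0] } } ],
    [ 'USER'          => sub { $registered{ lc $_[0] } } ],
);

for my $nick (qw(teuk oldfriend ghostnick)) {
    my $source = find_known_target($nick, @checks);
    print defined $source ? "$nick: ok ($source)\n" : "$nick: rejected\n";
}
```

Ordering the checks from cheapest to most expensive means most valid targets never reach the database.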


🧮 !wordcount learned named periods

!wordcount had already become safer in an earlier pass, thanks to a 50k row limit.

Now it also understands named periods, consistent with !active and !top:

!wordcount today
!wordcount yesterday
!wordcount week
!wordcount 7d
!wordcount teuk week

This makes word statistics more natural.

Instead of always asking for all available history, users can ask human questions:

What words did I use today?
What about this week?
What about the last 7 days?

One final polish was added before commit: period-only forms are parsed correctly.

So:

!wordcount today

means:

wordcount for myself, today

not:

wordcount for user named "today"

Which is good, because “today” is rarely a real IRC nick.
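
The parsing rule can be sketched like this: check whether the last argument is a period, and only then decide who the target is. parse_wordcount_args is a hypothetical name, and the period list is limited to the forms shown in the examples above; the real list may be longer.

```perl
use strict;
use warnings;

# Period tokens taken from the examples above.
my $PERIOD_RE = qr/^(?:today|yesterday|week|\d+d)$/i;

# Hypothetical parser: returns (target nick, period or undef).
sub parse_wordcount_args {
    my ($caller, @args) = @_;
    my $period;
    $period = lc(pop @args) if @args && $args[-1] =~ $PERIOD_RE;
    my $nick = @args ? $args[0] : $caller;   # no nick left: count for the caller
    return ($nick, $period);
}

for my $args ([], ['today'], ['7d'], [ 'teuk', 'week' ]) {
    my ($nick, $period) = parse_wordcount_args('somenick', @$args);
    printf "!wordcount %-10s => nick=%s period=%s\n",
        "@$args", $nick, $period // 'all';
}
```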


🏗 Scheduler info in !status

!status now includes Scheduler information when available.

That gives a quick view of running scheduled tasks:

Scheduler: 8 task(s) — 8 running, 0 stopped

with task names, intervals, tick counts, and last run age.

This is useful operator visibility.

Mediabot has a lot of background magic now.
It helps to know which enchanted brooms are still sweeping.


🪄 External.pm refactor: compatibility preserved

The important design rule was preserved:

External.pm keeps the public export surface.
Callers do not need to change.

That means callers such as Mediabot.pm, mediabot.pl, and the existing command dispatch paths should continue using the same functions as before.

The internals changed.

The public spell names stayed stable.

That is exactly how a big refactor should feel from the outside: boring.

Boring is good when the bot is alive on IRC.


🛡 No database schema change

Despite the size of this pass:

No SQL migration.
No new table.
No new column.

The changes are application-level:

  • refactoring;
  • bugfixes;
  • SQL column corrections;
  • command parsing;
  • output improvements;
  • target validation.

Deployment should not require a schema migration.

That matters.


🧪 Suggested regression tests

External submodules

m ai hello
m ai summary today
m ai summary last
m yt search never gonna give you up
m meteo Paris
m fortnite <player>

Then test real URLs:

YouTube URL
Spotify URL
Facebook URL
Instagram URL
X/Twitter URL
Apple Music URL
generic web page URL

Quotes

m q random
m q random
m q view 1
m q search teuk trivia
m q stats

Wordcount

m wordcount
m wordcount today
m wordcount yesterday
m wordcount week
m wordcount 7d
m wordcount teuk week

Partyline

.seen Te[u]K
.seen teu*
.history #teuk
.remind teuk test from partyline
.top
.top #teuk 5

Shell

perl -I. -c Mediabot/UserCommands.pm
perl -I. -c Mediabot/External.pm
perl -I. -c Mediabot/External/Spotify.pm
perl -I. -c Mediabot/External/Claude.pm
perl -I. -c Mediabot/External/YouTube.pm
perl -I. -c Mediabot/External/URL.pm
perl -I. -c Mediabot/Partyline.pm
perl -I. -c Mediabot/Quotes.pm
perl -I. -c Mediabot/AdminCommands.pm
perl -I. -c mediabot.pl

LC_ALL=C grep -P '\x00' -n Mediabot/*.pm Mediabot/*/*.pm || true
grep -rn "cl\.text\b\|ORDER BY cl\.id\b" Mediabot/ || true
git diff --check

🏁 Final status

This was the Hall of Refactoring pass.

Mediabot now has:

  • a lightweight External.pm façade;
  • dedicated modules for Spotify, Claude, YouTube and URL handling;
  • fixed SQL column references in summary/history paths;
  • cleaner Claude private summary keys;
  • smarter Partyline .seen;
  • safer Partyline .remind;
  • better quote random/search/view behavior;
  • !wordcount with named periods;
  • Scheduler visibility in !status;
  • no database schema change.

The castle is cleaner.
The corridors are labeled.
The portraits are still sarcastic, but at least they are in the right rooms now.
