Forum teuk.org

The Great JOIN Migration: 27 SQL Spells Rewritten

in Mediabot · started by TeuK · 3w ago

TeuK · 3w ago

Overview

This session tackled one of the oldest forms of technical debt in the codebase: implicit SQL JOINs. Since Mediabot’s origins in 2007, many queries were written in the traditional comma-separated FROM A, B WHERE A.id = B.id style. Modern SQL has preferred explicit JOIN ... ON syntax for decades — it’s clearer, safer, and easier to audit. 27 occurrences across 9 files were converted in a single pass.

Two other improvements came along for the ride: database connection resilience via ensure_connected(), and a log level cleanup that stops flooding the console with debug noise.


1. The Implicit JOIN Problem

Implicit JOINs are not wrong — they produce identical query plans in MariaDB. But they have real costs:

  • Readability: the join condition is buried in the WHERE clause alongside filter conditions. It’s not immediately obvious which conditions define relationships and which filter rows.
  • Auditability: implicit cross-joins are indistinguishable from accidental cartesian products until you read every condition carefully.
  • Maintenance risk: adding a new condition to a WHERE clause that already mixes join conditions and filters is a common source of bugs.
  • LEFT JOIN impossible: you can’t express an outer join with comma syntax — which matters for the RESPONDERS queries (see below).

The Pattern

Every conversion followed the same mechanical rule:

-- Before (implicit)
SELECT level FROM CHANNEL, USER_CHANNEL
WHERE CHANNEL.id_channel = USER_CHANNEL.id_channel
  AND CHANNEL.name = ?
  AND USER_CHANNEL.id_user = ?

-- After (explicit)
SELECT USER_CHANNEL.level
FROM CHANNEL
JOIN USER_CHANNEL ON USER_CHANNEL.id_channel = CHANNEL.id_channel
WHERE CHANNEL.name = ?
  AND USER_CHANNEL.id_user = ?

The WHERE clause now contains only filter conditions. The relationship is declared in the JOIN.


2. File-by-File Breakdown

Helpers.pm — 6 conversions

The most conversions in a single file. Affected queries: BADWORDS lookup (called on every public message), USER_CHANNEL level lookups, IGNORES check.

The IGNORES query is called on every incoming IRC message to check if the sender is on the ignore list — making it one of the hottest paths in the bot. The explicit JOIN makes the intent immediately clear:

-- Before
SELECT * FROM IGNORES, CHANNEL
WHERE IGNORES.id_channel = CHANNEL.id_channel
  AND CHANNEL.name LIKE ?

-- After
SELECT IGNORES.* FROM IGNORES
JOIN CHANNEL ON CHANNEL.id_channel = IGNORES.id_channel
WHERE CHANNEL.name LIKE ?

Quotes.pm — 7 conversions

Every single query in Quotes.pm used the implicit style. All 7 were converted, including the date range queries (minDate/maxDate) and the random quote selector.

UserCommands.pm — 5 conversions

The most structurally varied set. Two of them were inside heredoc strings (Perl’s <<SQL ... SQL syntax), requiring careful multi-line replacement. The CHANNEL_LOG query also got its ambiguous column references qualified:

-- Before
FROM CHANNEL, CHANNEL_LOG
WHERE (event_type = 'public' OR event_type = 'action')
  AND CHANNEL.id_channel = CHANNEL_LOG.id_channel

-- After
FROM CHANNEL
JOIN CHANNEL_LOG ON CHANNEL_LOG.id_channel = CHANNEL.id_channel
WHERE (CHANNEL_LOG.event_type = 'public' OR CHANNEL_LOG.event_type = 'action')

DBCommands.pm — 2 conversions (with a twist)

The RESPONDERS queries were the trickiest. They use a global scope mechanism: when id_channel = 0, a responder applies to all channels. The original query expressed this with an OR:

FROM RESPONDERS, CHANNEL
WHERE ((CHANNEL.id_channel = RESPONDERS.id_channel AND CHANNEL.name LIKE ?)
    OR (RESPONDERS.id_channel = 0))

A simple INNER JOIN would exclude global responders (id_channel = 0) because there’s no matching CHANNEL row. The fix uses LEFT JOIN:

FROM RESPONDERS
LEFT JOIN CHANNEL ON CHANNEL.id_channel = RESPONDERS.id_channel
WHERE ((CHANNEL.name LIKE ? AND CHANNEL.id_channel IS NOT NULL)
    OR RESPONDERS.id_channel = 0)

This preserves the global-scope logic while making the relationship explicit.

Remaining files

Hailo.pm (1), Mediabot.pm (1), ChannelCommands.pm (2), Radio.pm (1), mediabot.pl (2) — all straightforward two-table or three-table conversions.


3. getDbh()ensure_connected()

mediabot.pl had 5 remaining calls to $mediabot->getDbh->prepare(...) — the RadioPub timer, the RandomQuote timer, the JOIN handler, the WHOIS automode lookup, and the random quote responder. These were all converted to:

$mediabot->{db}->ensure_connected()->prepare(...)

ensure_connected() was added to DB.pm in a previous session. It pings the database handle and reconnects automatically if the connection has dropped. The five hottest long-running paths in the bot — the ones most likely to run after a multi-hour idle period — now all go through it.


4. Log Level Cleanup

WHO / WHOIS / WHOWAS handlers

Three handlers were logging at level 0 (always visible) for purely informational messages that have no operational significance:

# Before
$mediabot->{logger}->log(0, "on_message_WHO() $target_name");

# After
$mediabot->{logger}->log(3, "on_message_WHO() $target_name");

Level 3 is debug output. These messages are only useful when diagnosing a specific WHO/WHOIS interaction — not something you want in the production log by default.

External.pm — API configuration messages

Nine messages related to missing API keys and empty responses were at level 0. These indicate misconfiguration, not crashes — level 1 is appropriate:

  • YouTube API key not set (×3 functions)
  • Fortnite API key not set
  • TMDB API key not set
  • ChatGPT empty response from API

Helpers.pm — operational noise

Four messages downgraded from 0 to 1: Trying to join, checkAntiFlood() End of antiflood, could not find CHANNEL_FLOOD record, Mediabot is up to date.


Summary

Change Files Count
Implicit JOIN → explicit JOIN 9 27
getDbh()ensure_connected() 1 5
log(0)log(1) 2 9
log(0)log(3) 1 3

Zero functional changes. The bot behaves identically — it just does it more clearly, more safely, and with less noise in the logs.


“The spells we cast most often are the ones most worth casting correctly.”
— Not Hermione, but she would have approved of the explicit JOIN syntax

You must be logged in to reply.