Forum teuk.org

Nearly Headless Nick: Fixing a Live SQL Crash, a Broken Feature, and a Schema That Had Been Lying to Us

in Mediabot · started by TeuK · 2w ago

TeuK · 2w ago

The Incident

Every time a user joined a channel, this appeared in the logs:

[DEBUG1] userOnJoin() SQL Error: Unknown column 'id_user_level' in 'SELECT'
         Query: SELECT id_channel, notice, id_user_level FROM CHANNEL WHERE name = ?

id_user_level has never existed in the CHANNEL table. The column belongs to USER, not CHANNEL. The query was selecting a ghost column — it had probably crept in during a refactoring pass and gone unnoticed because the rest of userOnJoin() still worked well enough: the user was matched, auto-mode and greet were applied, the only thing silently failing was the channel notice lookup.

Fix: Remove id_user_level from the SELECT. The function only uses id_channel and notice from that row, so:

-- Before (crashing)
SELECT id_channel, notice, id_user_level FROM CHANNEL WHERE name = ?

-- After
SELECT id_channel, notice FROM CHANNEL WHERE name = ?

One column removed. Zero regressions. Bug gone.


The Broken Feature: !holdcmd and !showcmd

These two commands exist in the bot’s dispatch table and have proper require_level('Administrator') guards, full argument validation, and logging. They just don’t work.

mbDbHoldCommand_ctx() does:

SELECT id_public_commands, active FROM PUBLIC_COMMANDS WHERE command = ?
UPDATE PUBLIC_COMMANDS SET active = 0 WHERE id_public_commands = ?

mbDbShowCommand_ctx() does:

SELECT PC.hits, PC.id_user, PC.creation_date, PC.action, PC.active, ...
FROM PUBLIC_COMMANDS PC ...

The column active does not exist in PUBLIC_COMMANDS in the production database. Both commands fail immediately on SQL error. The show command has a fallback defined $ref->{active} ? $ref->{active} : 1 which is a nice touch — but it never gets reached because the query dies first.

The reference schema (mediabot.sql) already has the column correctly defined:

`active` TINYINT(1) NOT NULL DEFAULT 1,

The production database is simply behind. Apply this migration:

ALTER TABLE PUBLIC_COMMANDS
  ADD COLUMN active TINYINT(1) NOT NULL DEFAULT 1;

That’s it. No code changes needed — the feature was already fully implemented.


The Schema Audit: mediabot.sql vs Production

While investigating the active column, a systematic diff was run between mediabot.sql (the reference schema) and the actual production dump. The results were instructive.

RESPONDERS — completely wrong column names

The reference schema had:

`command`  VARCHAR(255) NOT NULL,
`response` VARCHAR(255) NOT NULL,

The production database has:

`responder` VARCHAR(255) DEFAULT NULL,
`answer`    TEXT DEFAULT NULL,
`chance`    BIGINT UNSIGNED NOT NULL DEFAULT 95,
`hits`      BIGINT UNSIGNED NOT NULL DEFAULT 0,

The Perl code uses responder, answer, chance, and hits throughout DBCommands.pm. The reference schema was simply wrong — renamed columns that were never applied to production, and two columns (chance, hits) dropped from the schema file entirely by mistake.

mediabot.sql has been corrected to match production and the code.

TIMEZONE — wrong column name

Reference schema had timezone. Production has tz. The code does SELECT tz FROM TIMEZONE. Reference schema was wrong. Fixed.

TIMERS — columns that don’t exist yet

The reference schema had id_channel and enabled on TIMERS. Neither exists in production, and no Perl code references them. They appear to be planned features. Rather than silently dropping them, they’ve been kept as SQL comments:

-- Future columns (not yet deployed to production):
-- `id_channel` BIGINT UNSIGNED DEFAULT NULL,
-- `enabled`    TINYINT(1) NOT NULL DEFAULT 1,

This way the intent is preserved without misrepresenting the actual schema.

USER.hostmasks_legacy

The production database still has a hostmasks_legacy column in USER — the old CSV hostmask field from before the USER_HOSTMASK table migration. The bot already detects this on startup and logs a warning. Nothing to do in code; this is a cleanup task for a future migration script run.


Summary

Item Status Action
userOnJoin() SQL crash Fixed in code Deploy UserCommands.pm
!holdcmd / !showcmd broken Fixed — needs DB migration Run ALTER TABLE below
RESPONDERS schema wrong Fixed in mediabot.sql Commit schema file
TIMEZONE.tz schema wrong Fixed in mediabot.sql Commit schema file
TIMERS future columns Clarified in mediabot.sql Commit schema file
USER.hostmasks_legacy Pre-existing, tracked Future cleanup

Required migration on production DB

ALTER TABLE PUBLIC_COMMANDS
  ADD COLUMN active TINYINT(1) NOT NULL DEFAULT 1;

Run this once on your production database. All existing commands will default to active = 1 (no behavioural change). After this, !holdcmd and !showcmd will work.

You must be logged in to reply.