Forum teuk.org

🛡️ Protego Maxima: Hardening Quotes, Seen, Auth Metrics and Partyline Safety in Mediabot v3

in Mediabot · started by TeuK · 1w ago

TeuK · 1w ago

🛡️ Protego Maxima: Hardening Quotes, Seen, Auth Metrics and Partyline Safety in Mediabot v3

A new hardening milestone has been reached for Mediabot v3.

This update is not about one flashy feature. It is about making several important internal systems safer, faster, and more reliable:

!quote search
!quote random
!seen / USER_SEEN
logging rotation
Partyline .ban safety
nick flood protection
auth session metrics
DCC parser regression coverage
database bootstrap notes

The goal was simple: keep Mediabot running like a serious IRC bot should — stable, predictable, and resistant to edge cases.


Quote search: filtering moved into SQL

The quote search command was improved so it no longer fetches all quotes into memory before filtering them in Perl.

Before, a quote search could effectively do this:

load all quotes for the channel
loop through them in Perl
run a user-controlled regex match

That was inefficient and risky.

Now the filtering is done directly in SQL with LIKE placeholders.

This avoids loading the whole quote table into memory for every search.


Quote search: safer user input

The old search path could pass user input into a Perl regex.

That is dangerous because a malicious or accidental regex such as:

(.*)+

can cause catastrophic backtracking and hurt performance.

The new implementation no longer uses direct user regex matching for quote search.

Instead, it uses SQL placeholders and escapes SQL LIKE wildcard characters:

%  becomes \%
_  becomes \_
\  becomes \\

This means user input is treated more literally.


Quote search: multi-word search

The quote search now supports multi-word queries more naturally.

For example:

!q search hello world

can match quotes containing both words, even if they are not one exact contiguous string.

Each word becomes its own SQL LIKE condition joined with AND.

That gives a better user experience while keeping the search logic clear.


Quote random: removed random SQL sorting

The random quote command no longer relies on random SQL sorting.

The old pattern:

random SQL sort + LIMIT 1

is expensive because the database may need to evaluate and sort many rows just to return one quote.

The new approach uses a count plus random offset strategy.

That is much better for larger quote tables.


USER_SEEN: lowercase nick normalization

The USER_SEEN handling was hardened by normalizing nicknames to lowercase.

IRC nicknames are effectively case-insensitive from a user perspective, so:

Teuk
teuk
TEUK

should not become separate seen entries.

The update ensures that updateUserSeen() stores nicks consistently, and the seen lookup path also normalizes the requested nick.

This makes !seen more reliable.


USER_SEEN: safer event hooks

Several IRC event handlers now protect updateUserSeen() calls with eval.

This matters because IRC bots should not crash just because the database is temporarily unavailable.

The protected paths include events such as:

PRIVMSG
JOIN
QUIT
PART
NICK

If the database has a temporary issue, the failure is logged instead of bringing down the IRC handler.

That is exactly the kind of resilience a long-running bot needs.


Database note: USER_SEEN table

For existing installations, the USER_SEEN table may need to be created manually before using the !seen feature.

On the development server, the table was created with the following SQL.

Recommended manual import command used during this milestone:

cd /home/mediabot/mediabot_v3

mysql -u root --default-character-set=utf8mb4 mediabotv3 << 'SQL'
CREATE TABLE IF NOT EXISTS `USER_SEEN` (
  `nick`        VARCHAR(64)   CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL,
  `channel`     VARCHAR(64)   CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  `userhost`    VARCHAR(128)  CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NOT NULL DEFAULT '',
  `event_type`  ENUM('message','join','part','quit','nick') NOT NULL DEFAULT 'message',
  `last_msg`    VARCHAR(512)  CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `new_nick`    VARCHAR(64)   CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `seen_at`     DATETIME      NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`nick`),
  KEY `idx_user_seen_seen_at` (`seen_at`),
  KEY `idx_user_seen_channel` (`channel`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
SQL

After creating the table, the relevant syntax checks were run:

perl -I. -c Mediabot/Helpers.pm && \
perl -I. -c Mediabot/UserCommands.pm && \
perl -I. -c mediabot.pl && echo ALL OK

For a future clean install, this should eventually become part of the installer or a dedicated migration file so that a fresh git clone does not require manual database archaeology.

A good future migration name would be:

install/migrations/YYYYMMDD_user_seen.sql

Logging: cheaper rotation checks

Mediabot::Log was improved so it no longer checks the log file size on every single log line.

The old behavior could cause a filesystem stat() call for every write.

On a busy bot, especially with high debug levels, that can be wasteful.

The new logic throttles size checks using a write counter and resets the counter after rotation/reopen.

This keeps log rotation working while reducing unnecessary system calls.


Partyline .ban: real WHOIS token guard

The Partyline .ban flow uses WHOIS data to resolve a target before applying a ban.

The risky case was concurrent .ban commands from multiple Partyline sessions.

Because the WHOIS context is asynchronous, one .ban could overwrite the shared WHOIS state used by another.

A real token guard was added:

Partyline .ban stores a one-shot token on the session
WHOIS_VARS also stores the same token
the WHOIS callback compares both before applying the ban
mismatch means the ban is refused and the user is asked to retry

This is important because a comment saying “token guard” is not enough. The callback now actually verifies the token before applying the action.


Nick flood protection

A per-nick flood protection layer was added.

The goal is to prevent a single nick from hammering the bot with too many commands in a short period.

The protection now:

normalizes nicknames to lowercase
uses a short sliding window
cleans stale nick flood states periodically
logs flood events

This is independent from channel anti-flood logic and helps protect both public and private command paths.


Auth metrics: session gauge fixed

The auth session metric was improved.

The gauge:

mediabot_auth_sessions_total

is now updated after session changes, not before.

This avoids off-by-one behavior after autologin.

Logout paths also update the metric, including numeric user-id based logout.

This gives Prometheus/Grafana a more accurate view of authenticated sessions.


DCC parser regression fix

A small but important DCC parser regression was also corrected.

The parser now distinguishes correctly between:

raw CTCP DCC payload
stripped DCC payload

For example:

\x01DCC CHAT chat 1383695523 1024\x01

is marked as CTCP-wrapped, while:

CHAT chat 1383695523 1024

is marked as already stripped.

That distinction is now covered by tests.


New regression test file

A new test file was added:

t/cases/12_hotfix_after_claude.t

It verifies the important hardening markers:

quote search uses SQL LIKE with ESCAPE
quote search does not use direct user regex
quote random no longer uses random SQL sorting
nick flood normalizes lowercase
auth session metrics are updated through a helper
Partyline .ban has a real WHOIS token guard
Log.pm has write counter throttling
updateUserSeen failures are protected
DCC parser CTCP flag behavior is correct

This is useful because it protects the intent of the hardening pass, not just the syntax.


Validation

The full test suite was run successfully after the fixes.

The final result was:

PASSED

with the full suite covering the existing command framework, auth, metrics, Partyline, DCC parser, channel bans, and the new hotfix regression checks.


Files involved

Main files touched by this hardening pass:

Mediabot/Quotes.pm
Mediabot/Helpers.pm
Mediabot/Auth.pm
Mediabot/Log.pm
Mediabot/Partyline.pm
Mediabot/DCC.pm
mediabot.pl
t/cases/12_hotfix_after_claude.t

Depending on what was already staged or committed from the DCC work, these test files may also be part of the broader validation context:

t/cases/10_dcc_ctcp_regression.t
t/cases/11_dcc_parser.t

Suggested commit

🛡️ Protego Maxima: harden quotes, seen, auth metrics and Partyline safety

This update makes Mediabot v3 more robust in several places that matter for a real IRC bot.

It improves performance, removes risky parsing behavior, protects asynchronous Partyline operations, keeps metrics more accurate, and adds regression tests so these fixes do not silently disappear later.

Mediabot continues to move in the right direction: still personal, still Perl, still IRC-native, but increasingly structured like a serious long-running bot.

You must be logged in to reply.