Minimize Spam with Bayesian Filtering on Red Hat and Ubuntu Servers

Spam is more than an annoyance: it is a security risk and a drain on server resources. Basic spam filters catch obvious junk, while Bayesian filtering adds self-learning protection that improves as it is trained. This guide walks you through implementing Bayesian filtering with SpamAssassin on both Red Hat (RHEL/CentOS/AlmaLinux/Rocky) and Ubuntu servers.

What is Bayesian Filtering?

Bayesian filtering uses probability theory to determine if an email is spam. It analyzes the words and patterns in emails, learning from what you mark as spam or ham (legitimate mail). The more you train it, the smarter it gets.
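
To make the idea concrete, here is a simplified single-word calculation using Bayes' theorem. The numbers are invented for illustration: assume a word w appears in 40 of 200 spam messages (0.20) and in 2 of 200 ham messages (0.01), and that spam and ham are equally likely beforehand. SpamAssassin itself combines the evidence from many tokens with a more elaborate method, so treat this as a sketch of the principle rather than its exact formula.

```latex
P(\text{spam} \mid w)
  = \frac{P(w \mid \text{spam})\,P(\text{spam})}
         {P(w \mid \text{spam})\,P(\text{spam}) + P(w \mid \text{ham})\,P(\text{ham})}
  = \frac{0.20 \times 0.5}{0.20 \times 0.5 + 0.01 \times 0.5}
  = \frac{0.100}{0.105} \approx 0.95
```

Under these assumptions, a message containing that word is about 95% likely to be spam on that evidence alone. More training data makes the per-word estimates more reliable.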

Prerequisites

Root or sudo access to your server
Postfix or other MTA already configured
Basic understanding of email flow

Part 1: Installing SpamAssassin

On Ubuntu/Debian

# Update package list
apt update

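# Install SpamAssassin and the spamc client (sa-learn ships with the spamassassin package)
apt install spamassassin spamc -y

# Enable the service to start on boot
# (some newer Debian/Ubuntu releases package the daemon separately as "spamd"; if so, install and enable spamd instead)
systemctl enable spamassassin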

On Red Hat/CentOS/AlmaLinux/Rocky

# Install EPEL repository (if not already enabled)
dnf install epel-release -y

# Install SpamAssassin (the package includes spamc and sa-learn)
dnf install spamassassin -y

# Enable the service
systemctl enable spamassassin

Part 2: Initial Configuration

2.1 Configure SpamAssassin

Edit the main configuration file:

nano /etc/mail/spamassassin/local.cf

Add these basic settings:

# Required score to mark as spam (lower = more aggressive)
required_score 5.0

# Rewrite subject line for spam
rewrite_header subject *****SPAM*****

# Use Bayesian filtering
use_bayes 1
bayes_auto_learn 1
bayes_auto_learn_threshold_spam 7.0
bayes_auto_learn_threshold_nonspam 0.5

# Enable network tests
skip_rbl_checks 0
use_razor2 1
use_dcc 1
use_pyzor 1

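# DNS blocklists (used whenever skip_rbl_checks is 0 and DNS is available)
dns_available yes
# To re-weight a single blocklist rule, score it by name, for example:
# score RCVD_IN_XBL 3.0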

# Whitelist and blacklist
# whitelist_from *@yourdomain.com
# blacklist_from *@known-spam-domain.com

# Additional rules
ok_languages all
ok_locales all
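
When a setting appears more than once in local.cf, the later line takes effect, which is easy to overlook after several rounds of edits. The helper below is a minimal sketch for checking which value is actually in force; the function name effective_setting is hypothetical and not part of SpamAssassin.

```shell
#!/bin/bash
# effective_setting FILE KEY
# Print the value from the last uncommented "KEY value" line in FILE.
# Returns a non-zero status if KEY is not set anywhere in FILE.
effective_setting() {
    local file="$1" key="$2"
    awk -v key="$key" '
        $1 == key {
            $1 = ""                  # drop the key itself
            sub(/^[ \t]+/, "")       # trim the gap it leaves behind
            value = $0
            found = 1
        }
        END { if (found) print value; else exit 1 }
    ' "$file"
}

# Example:
#   effective_setting /etc/mail/spamassassin/local.cf required_score
```

Running spamassassin --lint afterwards is still the authoritative check that the file parses cleanly.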

2.2 Configure SpamAssassin to Work with Postfix

Edit Postfix master configuration:

nano /etc/postfix/master.cf

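Add or uncomment these lines. The account named in user= (spamd here) must exist on the system; Debian-based packages typically create debian-spamd instead, so adjust the name to match: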

smtp      inet  n       -       y       -       -       smtpd
  -o content_filter=spamassassin

spamassassin unix -     n       n       -       -       pipe
  flags=Rq user=spamd argv=/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ${sender} ${recipient}
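
A common reason this filter fails is that the account named in user= does not exist. The sketch below pulls that account name out of master.cf so it can be checked with id; the function name pipe_user is hypothetical, and the parsing assumes the usual layout where a service entry starts in the first column and its continuation lines are indented.

```shell
#!/bin/bash
# pipe_user FILE SERVICE
# Print the account from the user= attribute of SERVICE's entry in a
# Postfix master.cf file (any ":group" suffix is removed).
pipe_user() {
    awk -v svc="$2" '
        # A line starting in column 1 (and not a comment) begins a new entry.
        /^[^ \t#]/ { in_entry = ($1 == svc) }
        in_entry {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^user=/) {
                    sub(/^user=/, "", $i)
                    sub(/:.*/, "", $i)
                    print $i
                    exit
                }
            }
        }
    ' "$1"
}

# Example: confirm the account exists before restarting Postfix
#   id "$(pipe_user /etc/postfix/master.cf spamassassin)"
```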

2.3 Start SpamAssassin

# Start the service
systemctl start spamassassin

# Restart Postfix to apply changes
systemctl restart postfix

# Check status
systemctl status spamassassin

Part 3: Initializing the Bayesian Database

3.1 Create the Bayes Directory

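# sa-learn uses the home directory of the user that runs it (root here).
# If mail is scanned as a different user, run sa-learn as that user instead.
mkdir -p /root/.spamassassin
chmod 750 /root/.spamassassin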

3.2 Initialize with a Test Message

echo "This is a normal test email from my server. It contains regular text that should be considered ham." | sa-learn --ham

3.3 Verify Database Creation

# Check database files
ls -la /root/.spamassassin/

# View database statistics
sa-learn --dump magic

Expected output:

0.000          0          3          0  non-token data: bayes db version
0.000          0          0          0  non-token data: nspam
0.000          0          1          0  non-token data: nham
0.000          0         21          0  non-token data: ntokens

Part 4: Training the Filter

4.1 Create Training Script

Create a script to automate training:

nano /usr/local/bin/sa-learn-weekly.sh

Paste this content (adjust paths for your mail setup):

#!/bin/bash
# Bayesian filter training script

LOG_FILE="/var/log/sa-learn.log"
DATE=$(date "+%Y-%m-%d %H:%M:%S")

echo "[$DATE] Starting Bayes training..." >> "$LOG_FILE"

# Count the messages (regular files) in a Maildir folder
count_messages() {
    find "$1" -maxdepth 1 -type f | wc -l
}

if [ -d "/var/qmail/mailnames" ]; then
    # Plesk/Qmail structure: train spam from all users' Spam folders
    find /var/qmail/mailnames -path "*/.Spam/cur" -type d 2>/dev/null | while IFS= read -r spamdir; do
        echo "  Training spam from $spamdir ($(count_messages "$spamdir") messages)" >> "$LOG_FILE"
        sa-learn --spam "$spamdir" >> "$LOG_FILE" 2>&1
    done

    # Train ham from inboxes (excluding Spam folders)
    find /var/qmail/mailnames -path "*/cur" -type d 2>/dev/null | grep -v "/\\.Spam/" | while IFS= read -r inbox; do
        echo "  Training ham from $inbox ($(count_messages "$inbox") messages)" >> "$LOG_FILE"
        sa-learn --ham "$inbox" >> "$LOG_FILE" 2>&1
    done
elif [ -d "/var/mail" ]; then
    # Maildir folders stored under /var/mail/<user>
    for userdir in /var/mail/*/; do
        if [ -d "${userdir}.Spam/cur" ]; then
            sa-learn --spam "${userdir}.Spam/cur" >> "$LOG_FILE" 2>&1
        fi
        if [ -d "${userdir}cur" ]; then
            sa-learn --ham "${userdir}cur" >> "$LOG_FILE" 2>&1
        fi
    done
fi

# Log results
echo "  Training complete. Database stats:" >> "$LOG_FILE"
sa-learn --dump magic >> "$LOG_FILE"
echo "" >> "$LOG_FILE"

Make it executable:

chmod +x /usr/local/bin/sa-learn-weekly.sh
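
Before letting the script train on real mail, it can help to preview which folders it would treat as spam and which as ham, since a mislabeled folder skews the database. The function below is a dry-run sketch that only lists folders and message counts without calling sa-learn; the name preview_training is hypothetical, and it assumes the same Maildir layout as the script above (Spam folders named .Spam).

```shell
#!/bin/bash
# preview_training BASE
# For every Maildir "cur" directory under BASE, print one line:
#   <spam|ham> <message count> <directory>
# Nothing is trained; this only reports what a training run would see.
preview_training() {
    local base="$1" dir kind count
    find "$base" -type d -name cur 2>/dev/null | sort | while IFS= read -r dir; do
        case "$dir" in
            */.Spam/cur) kind=spam ;;
            *)           kind=ham ;;
        esac
        count=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
        printf '%s %s %s\n' "$kind" "$count" "$dir"
    done
}

# Example:
#   preview_training /var/qmail/mailnames
```

Folders such as Trash or Drafts show up as ham here, just as they would in the training script, so this is a good moment to decide whether to exclude them.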

4.2 Schedule Automatic Training

Add to crontab (runs every Sunday at 3 AM):

# Edit crontab
crontab -e

# Add this line:
0 3 * * 0 /usr/local/bin/sa-learn-weekly.sh

4.3 Train Existing Mail (Optional)

If you have existing mail folders, train them immediately:

# For Plesk servers
find /var/qmail/mailnames -path "*/.Spam/cur" -type d 2>/dev/null | while IFS= read -r spamdir; do
    echo "Training spam from: $spamdir"
    sa-learn --spam "$spamdir" --progress
done

find /var/qmail/mailnames -path "*/cur" -type d 2>/dev/null | grep -v "/\\.Spam/" | while IFS= read -r inbox; do
    echo "Training ham from: $inbox"
    sa-learn --ham "$inbox" --progress
done

Part 5: Advanced Configuration

5.1 Adjust Spam Sensitivity

Edit `/etc/mail/spamassassin/local.cf` and keep only one required_score line:

# More aggressive (lower score)
required_score 3.5

# Less aggressive (higher score), use instead of the line above
# required_score 7.5

# Bayesian auto-learn thresholds
bayes_auto_learn_threshold_spam 6.0
bayes_auto_learn_threshold_nonspam 0.1

5.2 Add Custom Blocklists

# Add to local.cf
blacklist_from *@*.xyz
blacklist_from *@*.top
blacklist_from *@*.bid
blacklist_from *@*.work
blacklist_from *@*.date
blacklist_from *@*.win

5.3 Create User-Specific Whitelists

# For a specific user
mkdir -p /home/user/.spamassassin
echo "whitelist_from trusted@domain.com" >> /home/user/.spamassassin/user_prefs

Part 6: Monitoring and Maintenance

6.1 Check Bayesian Database Status

# View current statistics
sa-learn --dump magic

Example output from a trained database (your counts will differ):

0.000          0       3103          0  non-token data: nham
0.000          0        456          0  non-token data: nspam
0.000          0     207642          0  non-token data: ntokens

6.2 Monitor Spam in Real-Time

# Watch mail logs (/var/log/mail.log on Ubuntu/Debian, /var/log/maillog on Red Hat)
tail -f /var/log/mail.log | grep -E "spamd: result"

# Check spam scores
grep "spamd: result" /var/log/mail.log | tail -20
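
The same log lines can show how a different required_score would have treated recent mail before you change the setting. The sketch below counts the logged results at or above a candidate threshold; the function name count_at_threshold is hypothetical, and it assumes the log format "spamd: result: Y 12 - RULES ...", where the number after the Y or . flag is the logged (rounded) score.

```shell
#!/bin/bash
# count_at_threshold THRESHOLD
# Read mail log lines on stdin and report how many "spamd: result" entries
# have a logged score at or above THRESHOLD.
count_at_threshold() {
    awk -v t="$1" '
        {
            for (i = 1; i < NF; i++) {
                if ($i == "result:") {
                    total++
                    # $(i + 1) is the Y or . flag, $(i + 2) is the score
                    if ($(i + 2) + 0 >= t) flagged++
                    break
                }
            }
        }
        END { printf "%d of %d messages score >= %s\n", flagged, total, t }
    '
}

# Example:
#   grep "spamd: result" /var/log/mail.log | count_at_threshold 3.5
```

Comparing the counts for 3.5, 5.0 and 7.5 gives a rough sense of how much each setting would change what gets flagged.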

6.3 Database Maintenance

Periodically expire old tokens:

# Manually expire old data
sa-learn --force-expire

# Check after expiry
sa-learn --dump magic

6.4 Create Monitoring Script

nano /usr/local/bin/check-bayes.sh

#!/bin/bash
# Check Bayes health

STATS=$(sa-learn --dump magic)
# The counts are in the third column of the dump output
SPAM=$(echo "$STATS" | grep nspam | awk '{print $3}')
HAM=$(echo "$STATS" | grep nham | awk '{print $3}')
TOKENS=$(echo "$STATS" | grep ntokens | awk '{print $3}')

echo "Bayes Status:"
echo "  Ham: $HAM messages"
echo "  Spam: $SPAM messages"
echo "  Tokens: $TOKENS"

if [ "$SPAM" -lt 200 ]; then
    echo "⚠️  Warning: Only $SPAM spam messages learned (need 200+)"
fi

if [ "$HAM" -lt 200 ]; then
    echo "⚠️  Warning: Only $HAM ham messages learned (need 200+)"
fi

Make it executable:

chmod +x /usr/local/bin/check-bayes.sh

Part 7: Testing Your Setup

7.1 Test Spam Detection

# Create a test spam message
echo "XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X" | spamc

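The X-Spam-Status header in the output should report a very high score, because the GTUBE test string is scored at 1000 points by default.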

7.2 Test Ham Detection

# Create a test ham message
echo "Dear colleague, please find the quarterly report attached." | spamc

The output should show a low score, well under your required_score.
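
Reading the score out of the full spamc output by eye gets tedious when testing several messages. The helper below is a small sketch that extracts just the number; the function name spam_score is hypothetical, and it assumes the scanned message carries a header of the form "X-Spam-Status: Yes, score=1000.0 required=5.0 ...".

```shell
#!/bin/bash
# spam_score
# Read a scanned message on stdin and print the score found in its
# X-Spam-Status header (first match only). Prints nothing if absent.
spam_score() {
    sed -n 's/^X-Spam-Status:.*score=\(-\{0,1\}[0-9.]*\).*/\1/p' | head -n 1
}

# Example:
#   echo "Dear colleague, please find the quarterly report attached." | spamc | spam_score
```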

Troubleshooting

Common Issues and Solutions

Issue: Bayes database not created

# Create manually
mkdir -p /root/.spamassassin
sa-learn --sync

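Issue: Permission denied

# The database must be readable by the user that scans mail.
# /root is closed to other accounts, so changing ownership there does not help.
# Train as the scanning user instead (spamd here, assuming it has a home directory):
sudo -H -u spamd sa-learn --sync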

Issue: SpamAssassin not starting

# Check logs
journalctl -u spamassassin -f

# Test configuration
spamassassin --lint

Issue: No spam being detected

# Check required score
grep required_score /etc/mail/spamassassin/local.cf

# Lower it if too high
sed -i 's/required_score .*/required_score 3.5/' /etc/mail/spamassassin/local.cf
systemctl restart spamassassin

Best Practices Summary

Practice	Recommendation
Initial training	Start with at least 200 ham and 200 spam
Ongoing training	Weekly automatic training
Sensitivity	Start with 5.0, adjust based on results
Monitoring	Check stats monthly
Backup	Backup `/root/.spamassassin` regularly
Updates	Keep SpamAssassin updated
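
For the backup practice above, sa-learn can export the database in a portable text form with --backup and load it again with --restore, which is usually safer than copying the database files while they are in use. The wrapper below is a minimal sketch; the function name backup_bayes, the file naming, and the SA_LEARN override variable are assumptions made for illustration.

```shell
#!/bin/bash
# Command used to export the database; override SA_LEARN to test without SpamAssassin.
SA_LEARN="${SA_LEARN:-sa-learn}"

# backup_bayes DEST_DIR
# Write a dated export of the Bayes database into DEST_DIR and print its path.
# The export goes to a temporary file first so a failed run leaves no partial backup.
backup_bayes() {
    local dest="$1" file
    file="$dest/bayes-$(date +%Y%m%d).backup"
    mkdir -p "$dest" || return 1
    if "$SA_LEARN" --backup > "$file.tmp"; then
        mv "$file.tmp" "$file" && echo "$file"
    else
        rm -f "$file.tmp"
        return 1
    fi
}

# Example:
#   backup_bayes /var/backups/spamassassin
# To restore later:
#   sa-learn --restore /var/backups/spamassassin/bayes-YYYYMMDD.backup
```

Run it as the same user that owns the Bayes database so the export reflects the data actually used for scanning.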

Conclusion

Bayesian filtering is one of the most effective ways to combat spam. Unlike static rules, it adapts to your specific email patterns and improves over time. With this setup, your server will:

✅ Learn from confidently scored mail and from the folders you train it on
✅ Improve accuracy over time
✅ Reduce false positives
✅ Catch more spam with fewer resources
✅ Require minimal maintenance

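SpamAssassin does not apply Bayes scores until the database holds at least 200 spam and 200 ham messages, so expect the initial training period to take a few weeks. Accuracy keeps improving as the counts grow, and once each passes roughly 1,000 messages you should see strong results. Combined with DNS blocklists and regular updates, this gives your servers solid, layered spam protection.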

Remember: The key to effective Bayesian filtering is consistent training. Set up the cron job, let it run, and watch your spam detection improve month after month.
