Spam emails are not just annoying; they’re a security risk and a drain on server resources. While basic spam filters catch obvious junk, Bayesian filtering provides self-learning protection that improves as you train it. This guide walks you through implementing Bayesian filtering with SpamAssassin on both Red Hat (RHEL/CentOS/AlmaLinux/Rocky) and Ubuntu servers.
What is Bayesian Filtering?
Bayesian filtering uses probability theory to determine if an email is spam. It analyzes the words and patterns in emails, learning from what you mark as spam or ham (legitimate mail). The more you train it, the smarter it gets.
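To make the idea concrete, here is a minimal sketch of how per-word spam probabilities can be combined into a single verdict. It uses the plain naive Bayes formula as an illustrative assumption; SpamAssassin's own combining method is more elaborate. The function name combine_probs and the sample probabilities are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: combine per-token spam probabilities with the
# naive Bayes formula  P = prod(p) / (prod(p) + prod(1 - p)).
# Illustration only; not SpamAssassin's actual combining method.
combine_probs() {
    printf '%s\n' "$@" | awk '
        BEGIN { spam = 1; ham = 1 }
        { spam *= $1; ham *= (1 - $1) }
        END { printf "%.3f\n", spam / (spam + ham) }'
}

# Three made-up token probabilities: two spammy words and one that leans ham.
combine_probs 0.95 0.80 0.30   # prints 0.970
```

Two strongly spammy words outweigh one mildly innocent word, which is why training on both spam and ham matters: the per-word probabilities come from what the filter has seen.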
Prerequisites
- Root or sudo access to your server
- Postfix or other MTA already configured
- Basic understanding of email flow
Part 1: Installing SpamAssassin
On Ubuntu/Debian
# Update package list
apt update
# Install SpamAssassin and the spamc client (sa-learn ships with the spamassassin package)
apt install spamassassin spamc -y
# Enable the service to start on boot (on some newer releases the unit is named spamd)
systemctl enable spamassassin
On Red Hat/CentOS/AlmaLinux/Rocky
# Install EPEL repository (optional; SpamAssassin itself is in the standard repositories)
dnf install epel-release -y
# Install SpamAssassin (the package includes spamc and sa-learn)
dnf install spamassassin -y
# Enable the service
systemctl enable spamassassin
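Before moving on, it can help to confirm that the commands used in the rest of this guide are actually on the PATH. The sketch below is one way to do that; check_tools is a hypothetical helper name, not something SpamAssassin provides.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are missing from PATH.
# Returns 0 when all are present, 1 otherwise.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The three commands this guide relies on.
check_tools spamassassin sa-learn spamc || echo "Install the missing tools before continuing."
```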
Part 2: Initial Configuration
2.1 Configure SpamAssassin
Edit the main configuration file:
nano /etc/mail/spamassassin/local.cf
Add these basic settings:
# Required score to mark as spam (lower = more aggressive)
required_score 5.0
# Rewrite subject line for spam
rewrite_header subject *****SPAM*****
# Use Bayesian filtering
use_bayes 1
bayes_auto_learn 1
bayes_auto_learn_threshold_spam 7.0
bayes_auto_learn_threshold_nonspam 0.5
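# Enable network tests (Razor2, DCC, and Pyzor also need their plugins loaded and clients installed)
skip_rbl_checks 0
use_razor2 1
use_dcc 1
use_pyzor 1
# DNS blocklists (active while skip_rbl_checks is 0)
dns_available yes
# To weight a specific blocklist rule more heavily, score it by its rule name, for example:
# score RCVD_IN_SBL 3.0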
# Whitelist and blacklist
# whitelist_from *@yourdomain.com
# blacklist_from *@known-spam-domain.com
# Additional rules
ok_languages all
ok_locales all
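The two auto-learn thresholds above are easier to reason about with a small example. The sketch below is a simplified model of the decision, assuming the values from this configuration (7.0 for spam, 0.5 for ham) as defaults. The real auto-learn logic applies extra conditions, and autolearn_decision is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: decide what auto-learning would do with a score.
# Usage: autolearn_decision SCORE [SPAM_THRESHOLD] [HAM_THRESHOLD]
# Simplified model; the real check applies additional conditions.
autolearn_decision() {
    awk -v s="$1" -v spam_t="${2:-7.0}" -v ham_t="${3:-0.5}" 'BEGIN {
        if (s + 0 >= spam_t + 0)     print "learn as spam"
        else if (s + 0 < ham_t + 0)  print "learn as ham"
        else                         print "no auto-learning"
    }'
}

autolearn_decision 9.2    # learn as spam
autolearn_decision 3.1    # no auto-learning
autolearn_decision -0.4   # learn as ham
```

Messages that score between the two thresholds are left alone, so a wide gap keeps uncertain mail out of the database.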
2.2 Configure SpamAssassin to Work with Postfix
Edit Postfix master configuration:
nano /etc/postfix/master.cf
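Add or uncomment these lines. Continuation lines must start with whitespace, and the account named in user= must exist on your system (check with id spamd):
smtp      inet  n       -       y       -       -       smtpd
  -o content_filter=spamassassin
spamassassin unix -     n       n       -       -       pipe
  flags=Rq user=spamd argv=/usr/bin/spamc -f -e /usr/sbin/sendmail -oi -f ${sender} ${recipient}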
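Because Postfix only treats a line as a continuation when it begins with whitespace, a lost indent in master.cf is an easy mistake to make. The following sketch flags option lines that start in column 0. The function name check_master_cf and the list of prefixes it looks for are assumptions chosen for this example.

```shell
#!/bin/sh
# Hypothetical check: flag master.cf option lines that start in column 0.
# Postfix treats a line as a continuation only if it begins with whitespace.
check_master_cf() {
    awk '/^(-o|flags=|user=|argv=)/ {
             printf "line %d is not indented: %s\n", NR, $0
             bad = 1
         }
         END { exit bad + 0 }' "$1"
}

if [ -f /etc/postfix/master.cf ]; then
    check_master_cf /etc/postfix/master.cf && echo "continuation lines look indented"
fi
```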
2.3 Start SpamAssassin
# Start the service
systemctl start spamassassin
# Restart Postfix to apply changes
systemctl restart postfix
# Check status
systemctl status spamassassin
Part 3: Initializing the Bayesian Database
3.1 Create the Bayes Directory
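# sa-learn run as root stores its database in /root/.spamassassin.
# spamd must read the same database when scanning; if it scans as another
# user (such as the spamd user in master.cf), point both at a shared
# location with bayes_path in local.cf.
mkdir -p /root/.spamassassin
chmod 750 /root/.spamassassin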
3.2 Initialize with a Test Message
echo "This is a normal test email from my server. It contains regular text that should be considered ham." | sa-learn --ham
3.3 Verify Database Creation
# Check database files
ls -la /root/.spamassassin/
# View database statistics
sa-learn --dump magic
Expected output:
0.000          0          3          0  non-token data: bayes db version
0.000          0          0          0  non-token data: nspam
0.000          0          1          0  non-token data: nham
0.000          0         21          0  non-token data: ntokens
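In this output the counter value sits in the third column and the counter name is the last word on the line. A small helper can pull out a single value for use in scripts; bayes_counter is a hypothetical name, and the sample input is the expected output shown above.

```shell
#!/bin/sh
# Hypothetical helper: print one counter (nspam, nham, ntokens) from
# `sa-learn --dump magic` output supplied on stdin. The value is column 3.
bayes_counter() {
    awk -v name="$1" '$NF == name { print $3 }'
}

# Sample input copied from the expected output above.
bayes_counter nham <<'EOF'
0.000 0 3 0 non-token data: bayes db version
0.000 0 0 0 non-token data: nspam
0.000 0 1 0 non-token data: nham
0.000 0 21 0 non-token data: ntokens
EOF
```

On a live system the same function would be used as `sa-learn --dump magic | bayes_counter nspam`.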
Part 4: Training the Filter
4.1 Create Training Script
Create a script to automate training:
nano /usr/local/bin/sa-learn-weekly.sh
Paste this content (adjust paths for your mail setup):
#!/bin/bash
# Bayesian filter training script
LOG_FILE="/var/log/sa-learn.log"
DATE=$(date "+%Y-%m-%d %H:%M:%S")
echo "[$DATE] Starting Bayes training..." >> "$LOG_FILE"

if [ -d "/var/qmail/mailnames" ]; then
    # Plesk/Qmail structure: train spam from all users' Spam folders
    find /var/qmail/mailnames -path "*/.Spam/cur" -type d 2>/dev/null | while IFS= read -r spamdir; do
        count=$(find "$spamdir" -type f | wc -l)
        echo "  Training spam from $spamdir ($count messages)" >> "$LOG_FILE"
        sa-learn --spam "$spamdir" >> "$LOG_FILE" 2>&1
    done
    # Train ham from inboxes (excluding Spam folders)
    find /var/qmail/mailnames -path "*/cur" -type d ! -path "*/.Spam/*" 2>/dev/null | while IFS= read -r inbox; do
        count=$(find "$inbox" -type f | wc -l)
        echo "  Training ham from $inbox ($count messages)" >> "$LOG_FILE"
        sa-learn --ham "$inbox" >> "$LOG_FILE" 2>&1
    done
elif [ -d "/var/mail" ]; then
    # Standard mail directory structure (assumes per-user Maildir folders under /var/mail)
    for userdir in /var/mail/*/; do
        if [ -d "${userdir}.Spam" ]; then
            sa-learn --spam "${userdir}.Spam" >> "$LOG_FILE" 2>&1
        fi
        if [ -d "${userdir}cur" ]; then
            sa-learn --ham "${userdir}cur" >> "$LOG_FILE" 2>&1
        fi
    done
fi

# Log results
echo "  Training complete. Database stats:" >> "$LOG_FILE"
sa-learn --dump magic >> "$LOG_FILE"
echo "" >> "$LOG_FILE"
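Before letting the script train on real folders, a dry run that only counts messages can show whether the paths match your mail layout. The sketch below counts files in the cur and new subfolders of a Maildir without calling sa-learn. count_maildir and the example path are hypothetical.

```shell
#!/bin/sh
# Hypothetical dry run: count the message files a Maildir folder would
# contribute, without calling sa-learn. Looks in cur/ and new/ only.
count_maildir() {
    total=0
    for sub in cur new; do
        if [ -d "$1/$sub" ]; then
            n=$(find "$1/$sub" -type f | wc -l)
            total=$((total + n))
        fi
    done
    echo "$total"
}

# Example path in the Plesk layout used by the script; adjust to your setup.
count_maildir "/var/qmail/mailnames/example.com/user/Maildir/.Spam"
```

A result of 0 for a folder you expected to be full usually means the path pattern in the training script needs adjusting.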
Make it executable:
chmod +x /usr/local/bin/sa-learn-weekly.sh
4.2 Schedule Automatic Training
Add to crontab (runs every Sunday at 3 AM):
# Edit crontab
crontab -e
# Add this line:
0 3 * * 0 /usr/local/bin/sa-learn-weekly.sh
4.3 Train Existing Mail (Optional)
If you have existing mail folders, train them immediately:
# For Plesk servers
find /var/qmail/mailnames -path "*/.Spam/cur" -type d 2>/dev/null | while IFS= read -r spamdir; do
    echo "Training spam from: $spamdir"
    sa-learn --spam "$spamdir" --progress
done
find /var/qmail/mailnames -path "*/cur" -type d ! -path "*/.Spam/*" 2>/dev/null | while IFS= read -r inbox; do
    echo "Training ham from: $inbox"
    sa-learn --ham "$inbox" --progress
done
Part 5: Advanced Configuration
5.1 Adjust Spam Sensitivity
Edit /etc/mail/spamassassin/local.cf and set one required_score value:
# More aggressive (lower score)
required_score 3.5
# Less aggressive (higher score); use instead of the line above
# required_score 7.5
# Bayesian auto-learn thresholds
bayes_auto_learn_threshold_spam 6.0
bayes_auto_learn_threshold_nonspam 0.1
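If required_score appears more than once in the same file, the later line overrides the earlier one, so it is worth checking which value actually takes effect. The sketch below reports that value for a single file; effective_score is a hypothetical helper, and it does not account for other config files or per-user preferences.

```shell
#!/bin/sh
# Hypothetical helper: show the required_score that takes effect in one config
# file. Commented lines are ignored; when set more than once, the last wins.
effective_score() {
    awk '$1 == "required_score" { score = $2 }
         END { if (score != "") print score }' "$1"
}

if [ -f /etc/mail/spamassassin/local.cf ]; then
    effective_score /etc/mail/spamassassin/local.cf
fi
```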
5.2 Add Custom Blocklists
# Add to local.cf
blacklist_from *@*.xyz
blacklist_from *@*.top
blacklist_from *@*.bid
blacklist_from *@*.work
blacklist_from *@*.date
blacklist_from *@*.win
5.3 Create User-Specific Whitelists
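# For a specific user (replace "user" with the account name)
mkdir -p /home/user/.spamassassin
echo "whitelist_from trusted@domain.com" >> /home/user/.spamassassin/user_prefs
# Files created as root must belong to the user
chown -R user: /home/user/.spamassassin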
Part 6: Monitoring and Maintenance
6.1 Check Bayesian Database Status
# View current statistics
sa-learn --dump magic
Expected output (your counts will differ):
0.000          0        456          0  non-token data: nspam
0.000          0       3103          0  non-token data: nham
0.000          0     207642          0  non-token data: ntokens
6.2 Monitor Spam in Real-Time
# Watch mail logs (Debian/Ubuntu: /var/log/mail.log; Red Hat family: /var/log/maillog)
tail -f /var/log/mail.log | grep -E "spamd: result"
# Check spam scores
grep "spamd: result" /var/log/mail.log | tail -20
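For a quick tally instead of raw log lines, the verdicts can be counted. This sketch assumes the spamd result format in which "Y" marks spam and "." marks ham right after "spamd: result:". The function name summarize_results and the two sample lines are illustrative, not real log output.

```shell
#!/bin/sh
# Hypothetical helper: count spam and ham verdicts in spamd log lines on stdin.
# Assumes lines of the form "... spamd: result: Y <score> - ..." for spam
# and "... spamd: result: . <score> - ..." for ham.
summarize_results() {
    awk '/spamd: result: / {
             sub(/.*spamd: result: /, "")
             if ($1 == "Y") spam++; else ham++
         }
         END { printf "spam=%d ham=%d\n", spam, ham }'
}

# Two illustrative log lines (made up for this example).
summarize_results <<'EOF'
Jan 01 03:00:01 mail spamd[1234]: spamd: result: Y 15 - BAYES_99,GTUBE scantime=0.4,size=512
Jan 01 03:00:05 mail spamd[1234]: spamd: result: . -1 - BAYES_00 scantime=0.3,size=2048
EOF
```

On a live server the input would come from the log, for example `grep "spamd: result" /var/log/mail.log | summarize_results`.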
6.3 Database Maintenance
Periodically expire old tokens:
# Manually expire old data
sa-learn --force-expire
# Check after expiry
sa-learn --dump magic
6.4 Create Monitoring Script
nano /usr/local/bin/check-bayes.sh
#!/bin/bash
# Check Bayes health
STATS=$(sa-learn --dump magic)
# The counter value is the third column of each "non-token data" line
SPAM=$(echo "$STATS" | awk '$NF == "nspam" {print $3}')
HAM=$(echo "$STATS" | awk '$NF == "nham" {print $3}')
TOKENS=$(echo "$STATS" | awk '$NF == "ntokens" {print $3}')
echo "Bayes Status:"
echo "  Ham: $HAM messages"
echo "  Spam: $SPAM messages"
echo "  Tokens: $TOKENS"
if [ "${SPAM:-0}" -lt 200 ]; then
    echo "⚠️ Warning: Only ${SPAM:-0} spam messages learned (need 200+)"
fi
if [ "${HAM:-0}" -lt 200 ]; then
    echo "⚠️ Warning: Only ${HAM:-0} ham messages learned (need 200+)"
fi
Make it executable:
chmod +x /usr/local/bin/check-bayes.sh
Part 7: Testing Your Setup
7.1 Test Spam Detection
# Create a test spam message
echo "XJS*C4JDBQADN1.NSBN3*2IDNEN*GTUBE-STANDARD-ANTI-UBE-TEST-EMAIL*C.34X" | spamc
spamc prints the processed message; you should see a high spam score in its X-Spam-Status header.
7.2 Test Ham Detection
# Create a test ham message
echo "Dear colleague, please find the quarterly report attached." | spamc
This should return a low score.
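Since spamc returns the whole message, picking the verdict out of the headers by eye gets tedious. The sketch below extracts the verdict and score from the X-Spam-Status header, assuming the "Yes, score=N required=M" layout. spam_status is a hypothetical helper and message.eml is a placeholder file name.

```shell
#!/bin/sh
# Hypothetical helper: pull the verdict and score out of the X-Spam-Status
# header in a message read from stdin.
# Assumed header layout: "X-Spam-Status: Yes, score=N required=M ..."
spam_status() {
    sed -n 's/^X-Spam-Status: \([A-Za-z]*\), score=\([-0-9.]*\) .*/\1 \2/p' | head -n 1
}

# With a running spamd:  spamc < message.eml | spam_status
# Stand-alone demonstration with a hand-written header:
printf 'X-Spam-Status: No, score=-0.2 required=5.0 tests=NONE\n\nbody\n' | spam_status
```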
Troubleshooting
Common Issues and Solutions
Issue: Bayes database not created
# Create manually
mkdir -p /root/.spamassassin
sa-learn --sync
Issue: Permission denied
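# Fix permissions. The directory must be readable and writable by the account
# that runs sa-learn and spamd. Note that /root itself is normally closed to
# other users, so a database shared with spamd is better placed elsewhere
# (see bayes_path in local.cf).
chown -R spamd:spamd /root/.spamassassin
chmod 750 /root/.spamassassin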
Issue: SpamAssassin not starting
# Check logs
journalctl -u spamassassin -f
# Test configuration
spamassassin --lint
Issue: No spam being detected
# Check required score
grep required_score /etc/mail/spamassassin/local.cf
# Lower it if too high
sed -i 's/^required_score .*/required_score 3.5/' /etc/mail/spamassassin/local.cf
systemctl restart spamassassin
Best Practices Summary
| Practice | Recommendation |
|---|---|
| Initial training | Start with at least 200 ham and 200 spam |
| Ongoing training | Weekly automatic training |
| Sensitivity | Start with 5.0, adjust based on results |
| Monitoring | Check stats monthly |
| Backup | Back up /root/.spamassassin regularly |
| Updates | Keep SpamAssassin updated |
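For the backup recommendation in the table, one option besides copying the directory is sa-learn's own --backup mode, which writes the Bayes data to standard output. The sketch below wraps it in a dated file name. The function name backup_bayes, the SA_LEARN override variable, and the destination directory are assumptions for this example.

```shell
#!/bin/sh
# Sketch: dump the Bayes database to a dated text file with `sa-learn --backup`.
# SA_LEARN (command override) and the destination directory are assumptions.
backup_bayes() {
    dest_dir=$1
    out="$dest_dir/bayes-$(date +%Y%m%d).txt"
    mkdir -p "$dest_dir" || return 1
    "${SA_LEARN:-sa-learn}" --backup > "$out" || return 1
    echo "$out"
}

# Run only where sa-learn is installed.
# A dump can be loaded again later with: sa-learn --restore <file>
if command -v sa-learn >/dev/null 2>&1; then
    backup_bayes /var/backups/spamassassin
fi
```

A text dump like this is also a convenient way to move a trained database to another server.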
Conclusion
Bayesian filtering is one of the most effective ways to combat spam. Unlike static rules, it adapts to your specific email patterns and improves over time. With this setup, your server will:
✅ Learn from the mail you train it on and from messages it scores with confidence
✅ Improve accuracy over time
✅ Reduce false positives
✅ Catch more spam with fewer resources
✅ Require minimal maintenance
The initial training period takes a few weeks, but once your database reaches 1,000+ spam and ham messages, you’ll see excellent results. Combined with DNS blocklists and regular updates, this provides enterprise-grade spam protection for your servers.
Remember: The key to effective Bayesian filtering is consistent training. Set up the cron job, let it run, and watch your spam detection improve month after month.