Troubleshooting · January 8, 2025 · 14 min read

The Email Engineer's Debugging Playbook

Systematic approaches to diagnosing bounce codes, blacklist issues, slow delivery, and reputation drops. Tools, logs, and monitoring strategies that work.

Debugging · Monitoring · Best Practices

When email delivery breaks, panic is not a strategy. Here is the systematic playbook I use to diagnose and fix issues in production.

Step 1: Read the Bounce Code

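SMTP bounce codes usually tell you where to start. The enhanced status code (RFC 3463) has the form class.subject.detail: a leading 2 means success, 4 a transient failure the sending server will retry, and 5 a permanent failure. 5.1.1 means the mailbox does not exist. 5.7.1 usually means a policy rejection, so check SPF, DKIM, and DMARC. 4.4.7 means the message expired in the queue after repeated failed attempts, often because a firewall blocked the connection or the receiver kept greylisting the sender. Keep a reference sheet of common codes handy.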
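A reference sheet can live in code, too. The sketch below pulls the first RFC 3463 enhanced status code out of a bounce message and looks it up in a small table. The `REFERENCE` entries, the `Diagnosis` type, and the `diagnose` helper are hypothetical and only cover the three codes mentioned above; extend the table with your own notes.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical starter reference sheet; descriptions paraphrase RFC 3463.
REFERENCE = {
    "5.1.1": "Bad destination mailbox address (mailbox does not exist)",
    "5.7.1": "Delivery not authorized, message refused (policy rejection)",
    "4.4.7": "Delivery time expired (message aged out of the queue)",
}

# The first digit of an enhanced status code is its class.
CLASSES = {"2": "success", "4": "transient failure", "5": "permanent failure"}

# class.subject.detail, e.g. "5.1.1"; subject and detail are 1-3 digits.
STATUS_RE = re.compile(r"\b([245])\.(\d{1,3})\.(\d{1,3})\b")


class Diagnosis(NamedTuple):
    code: str
    severity: str
    meaning: str


def diagnose(bounce_text: str) -> Optional[Diagnosis]:
    """Find the first enhanced status code in a bounce and explain it."""
    match = STATUS_RE.search(bounce_text)
    if match is None:
        return None  # no enhanced status code present (e.g. a bare "250 OK")
    code = match.group(0)
    severity = CLASSES[match.group(1)]
    meaning = REFERENCE.get(code, "not in local reference sheet")
    return Diagnosis(code, severity, meaning)
```

For example, `diagnose("550 5.1.1 <nobody@example.com>: Recipient address rejected")` reports a permanent failure for a nonexistent mailbox. The basic SMTP reply code (550) is skipped on purpose, because the enhanced code carries more detail.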

Step 2: Check Your IP Reputation

Use MXToolbox, Sender Score, and Google Postmaster Tools (the last shows your reputation as Gmail sees it). If your IP is on a blacklist such as Spamhaus, Barracuda, or SORBS, identify and fix the cause before requesting delisting. Most blacklists require proof of remediation, not just an apology.
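Most blacklists are DNS-based (DNSBLs): you reverse the IPv4 octets, append the list's zone, and resolve the name. An answer in 127.0.0.0/8 means the IP is listed; NXDOMAIN means it is not. The sketch below assumes `zen.spamhaus.org` as the example zone; check each list's documentation for its zone name and usage terms. The handling of `127.255.255.x` as a query error rather than a listing reflects our understanding of Spamhaus's conventions and is labeled as an assumption.

```python
import ipaddress
import socket
from typing import Optional


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the IPv4 octets and append the DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    return ".".join(reversed(str(addr).split("."))) + "." + zone


def check_dnsbl(ip: str, zone: str) -> Optional[str]:
    """Return the listing answer (an address in 127.0.0.0/8), or None."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # NXDOMAIN means "not listed". Note that this also swallows other
        # resolution failures; a production check should tell them apart.
        return None
    if answer.startswith("127.255.255."):
        # Assumption: Spamhaus uses this range to signal a refused or
        # malformed query (e.g. via a public resolver), not a listing.
        raise RuntimeError(f"{zone} rejected the query ({answer}); check your resolver")
    return answer
```

Usage: `check_dnsbl("192.0.2.99", "zen.spamhaus.org")` returns `None` for a clean IP. Run it from a resolver you control, since some lists refuse queries that arrive through large public resolvers.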

Step 3: Inspect the Headers

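Use a header analyzer such as Microsoft's Message Header Analyzer to trace the email's path. Check the `Authentication-Results` header for the SPF, DKIM, and DMARC verdicts, confirm DMARC alignment (the `From:` domain must align with the domain that passed SPF or DKIM), and read the `Received` headers for unexpected hops. A missing DKIM signature or a failed SPF check shows up here.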
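When you triage many messages at once, the same checks can be scripted with Python's standard `email` package. This is a minimal sketch: the regex only extracts `spf=`, `dkim=`, and `dmarc=` verdicts and does not implement full `Authentication-Results` parsing. The helper names `auth_results` and `received_hops` are hypothetical.

```python
import re
from email import message_from_string

# method=result pairs inside Authentication-Results, e.g. "dkim=fail".
RESULT_RE = re.compile(r"\b(spf|dkim|dmarc)=([a-z]+)", re.IGNORECASE)


def auth_results(raw_message: str) -> dict[str, list[str]]:
    """Collect SPF/DKIM/DMARC verdicts from every Authentication-Results header."""
    msg = message_from_string(raw_message)
    results: dict[str, list[str]] = {}
    for header in msg.get_all("Authentication-Results", []):
        for method, result in RESULT_RE.findall(header):
            results.setdefault(method.lower(), []).append(result.lower())
    return results


def received_hops(raw_message: str) -> list[str]:
    """Return Received headers, most recent hop first, with folding collapsed."""
    msg = message_from_string(raw_message)
    return [" ".join(h.split()) for h in msg.get_all("Received", [])]
```

A result like `{"spf": ["pass"], "dkim": ["fail"], "dmarc": ["fail"]}` points straight at the signing setup rather than at the sending IP.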

Step 4: Review Logs

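Your mail server logs are the source of truth. In Postfix on Debian or Ubuntu, check `/var/log/mail.log` (RHEL-family systems use `/var/log/maillog`). Look for `status=bounced`, `status=deferred`, and connection timeouts, and follow a single message across log lines by its queue ID. Correlate timestamps with your application logs to find the root cause.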
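A quick way to see whether a spike is bounces, deferrals, or something else is to tally the `status=` and `dsn=` fields across the log. The sketch below assumes the common Postfix delivery-line layout (`queue_id: to=<...>, ..., dsn=X.Y.Z, status=...`); custom log formats or other MTAs will need a different pattern. The `summarize` helper is hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches Postfix delivery lines such as:
# ... postfix/smtp[1234]: 3F2A1C0123: to=<a@example.com>, relay=..., dsn=5.1.1, status=bounced (...)
DELIVERY_RE = re.compile(
    r"postfix/\S+\[\d+\]: (?P<queue_id>[0-9A-Za-z]+): "
    r"to=<(?P<rcpt>[^>]*)>.*?"
    r"dsn=(?P<dsn>\d\.\d{1,3}\.\d{1,3}), "
    r"status=(?P<status>\w+)"
)


def summarize(lines: Iterable[str]) -> tuple[Counter, Counter]:
    """Count delivery outcomes by status and by DSN code; other lines are ignored."""
    by_status: Counter = Counter()
    by_dsn: Counter = Counter()
    for line in lines:
        m = DELIVERY_RE.search(line)
        if m:
            by_status[m["status"]] += 1
            by_dsn[m["dsn"]] += 1
    return by_status, by_dsn
```

Usage: `with open("/var/log/mail.log") as f: status, dsn = summarize(f)`. A jump in `4.4.1` deferrals points at connectivity, while a jump in `5.7.1` bounces points back at authentication or reputation.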

Step 5: Test with External Tools

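Before blaming the ISP, test your own configuration. Use Mail Tester for a quick score, GlockApps for inbox placement testing, and Telnet to verify the SMTP handshake by hand. If a Telnet connection to port 25 times out, you have a network-level issue: a firewall or an outbound port-25 block, which many cloud and residential networks apply by default.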
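The Telnet check can also be scripted with Python's standard `smtplib`, which is handy from a CI job or a jump host. This sketch connects, sends `EHLO`, reports whether `STARTTLS` is offered, and maps common failures to the layer they most likely point at. The `probe` and `classify` names, the 10-second timeout, and the wording of the diagnoses are hypothetical choices.

```python
import smtplib
import socket


def classify(exc: BaseException) -> str:
    """Map a connection error to the layer it most likely points at."""
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout: likely a network-level block (firewall or outbound port-25 filter)"
    if isinstance(exc, ConnectionRefusedError):
        return "refused: host reachable, but nothing is accepting on that port"
    if isinstance(exc, socket.gaierror):
        return "dns: hostname did not resolve"
    if isinstance(exc, smtplib.SMTPException):
        return f"smtp: server answered but the session failed ({exc})"
    return f"other: {exc!r}"


def probe(host: str, port: int = 25, timeout: float = 10.0) -> str:
    """Open an SMTP session, send EHLO, and report the outcome."""
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as smtp:
            code, _response = smtp.ehlo()
            tls = "offered" if smtp.has_extn("starttls") else "not offered"
            return f"ok: EHLO {code}, STARTTLS {tls}"
    except (OSError, smtplib.SMTPException) as exc:
        return classify(exc)
```

Usage: `probe("mx.example.com")` from the sending host. Running the same probe from a second network helps separate "our egress is blocked" from "their server is down".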

Step 6: Monitor Continuously

Set up alerts for queue depth, bounce rate, and complaint rate. We use Prometheus + Alertmanager to page the on-call engineer when bounce rates exceed 3% or queue depth stays above 10,000 for more than 5 minutes.
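In production these conditions live in Prometheus alerting rules, but the logic is easy to get subtly wrong, especially the "stays above for 5 minutes" part. The Python below only illustrates the two thresholds from the text and a timer that resets whenever the queue dips back under the limit, similar in spirit to an alerting rule's `for:` clause. `SustainedCondition` and `should_page` are hypothetical names, and the window over which `bounced` and `attempted` are counted is left as an assumption.

```python
from dataclasses import dataclass
from typing import Optional

BOUNCE_RATE_THRESHOLD = 0.03    # page when more than 3% of attempts bounce
QUEUE_DEPTH_THRESHOLD = 10_000  # messages waiting in the queue
QUEUE_HOLD_SECONDS = 300        # queue must stay high for 5 minutes


def bounce_rate(bounced: int, attempted: int) -> float:
    """Fraction of delivery attempts that bounced; 0.0 when nothing was sent."""
    return bounced / attempted if attempted else 0.0


@dataclass
class SustainedCondition:
    """True only once a condition has held continuously for hold_seconds."""
    hold_seconds: float
    since: Optional[float] = None

    def update(self, now: float, condition: bool) -> bool:
        if not condition:
            self.since = None  # any dip below the threshold resets the timer
            return False
        if self.since is None:
            self.since = now
        return now - self.since >= self.hold_seconds


def should_page(now: float, bounced: int, attempted: int,
                queue_depth: int, queue_rule: SustainedCondition) -> list[str]:
    """Return the reasons to page the on-call engineer (empty list: stay quiet)."""
    reasons = []
    rate = bounce_rate(bounced, attempted)
    if rate > BOUNCE_RATE_THRESHOLD:
        reasons.append(f"bounce rate {rate:.1%} > 3%")
    if queue_rule.update(now, queue_depth > QUEUE_DEPTH_THRESHOLD):
        reasons.append(f"queue depth {queue_depth} above {QUEUE_DEPTH_THRESHOLD} for 5+ minutes")
    return reasons
```

Feeding it one sample per minute shows the design choice: a single high queue reading never pages, and a brief dip restarts the five-minute clock, which keeps short bursts from waking anyone up.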

Email debugging is forensic work. The answer is almost always in the logs — you just need to know where to look.


Written by Irfan Naseem

Senior Software Engineer at Netcode. Building email infrastructure and scalable systems.