DNS Troubleshooting: The One Tool I Use More Than Anything Else

DNS problems are everywhere and they are almost never the first thing you suspect. The website is slow, so it must be the server. The email is not sending, so it is probably the SMTP relay. The VPN keeps dropping, so it must be the firewall. And then you spend an hour chasing the wrong problem before someone runs dig and finds the real issue: a DNS record that was changed, deleted, or left to go stale, and nobody noticed.

I have been doing this long enough that DNS is the first thing I check now, not the last. Here is my approach.

The tool you need

Learn dig, not nslookup. Nslookup is fine for basic queries, but dig gives you more information in a format that is actually useful for troubleshooting. It ships with macOS and is either preinstalled or one package away on most Linux distributions (dnsutils on Debian and Ubuntu, bind-utils on the Red Hat family). Windows users can install it through WSL or use a Windows build of the BIND tools.

The most common command I run: dig +trace example.com. That traces the full resolution path from the root nameservers down to the authoritative server for the domain. If there is a problem somewhere in the chain, +trace will show you exactly where it is.
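As a rough illustration of reading that output, the sketch below pulls the per-hop footer lines out of a trace so you can see which server answered at each step. It assumes dig is on the PATH and that the footer has the usual ";; Received N bytes from IP#PORT(NAME) in N ms" shape; the function names trace_hops and run_trace are made up for this example.

```python
import re
import subprocess

# Matches dig's per-hop footer, for example:
# ";; Received 525 bytes from 198.41.0.4#53(a.root-servers.net) in 24 ms"
HOP_RE = re.compile(
    r"^;; Received \d+ bytes from ([^#\s]+)#(\d+)\(([^)]+)\) in (\d+) ms"
)


def trace_hops(trace_output):
    """Return (server_name, ip, millis) for each hop found in the text."""
    hops = []
    for line in trace_output.splitlines():
        match = HOP_RE.match(line)
        if match:
            ip, _port, name, millis = match.groups()
            hops.append((name, ip, int(millis)))
    return hops


def run_trace(domain):
    """Run `dig +trace DOMAIN` and return its raw text (needs dig installed)."""
    result = subprocess.run(
        ["dig", "+trace", domain],
        capture_output=True,
        text=True,
        timeout=60,
    )
    return result.stdout
```

Calling trace_hops(run_trace("example.com")) gives the chain in order. If the last hop is a root or TLD server rather than the domain's own nameserver, that is a hint about where the walk stalled.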

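If a domain is not resolving, the question is always: is the problem at the root, at the TLD, at the authoritative server, or at the recursive resolver? The +trace flag answers most of that in one shot. It walks the root, TLD, and authoritative servers itself and bypasses your recursive resolver, so if the trace succeeds but a normal lookup fails, the resolver is the suspect.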
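That decision can be written down as a tiny function. This is only a sketch of the reasoning: locate_fault and its return strings are hypothetical, and it assumes you have already collected one set of answers by walking the delegation chain and another from your normal resolver.

```python
def locate_fault(direct_answers, resolver_answers):
    """Guess where a DNS fault sits.

    direct_answers:   answers obtained by walking the delegation chain,
                      for example the final answer of `dig +trace`.
    resolver_answers: answers from the normal recursive resolver.
    Both are lists of strings; an empty list means "no answer".
    """
    direct, cached = set(direct_answers), set(resolver_answers)
    if not direct and not cached:
        return "delegation or authoritative server"
    if direct and not cached:
        return "recursive resolver"
    if cached and not direct:
        return "authoritative chain, resolver is still serving cache"
    if direct != cached:
        # Round-robin or geo-aware DNS can also produce different answers,
        # so treat this as a hint rather than proof.
        return "resolver has different data, possibly a stale cache"
    return "dns consistent"
```

For example, locate_fault(["192.0.2.10"], []) points at the recursive resolver, because the authoritative path produced an answer and the resolver did not.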

What to look for

Common issues I see all the time: TTLs set too low, causing excessive queries. TTLs set too high, so a change can take up to the full TTL to reach resolvers that already cached the old answer. Missing glue records for custom nameservers. SOA serial numbers that did not get incremented after a zone change, so secondary servers never pick up the new data. DNSSEC misconfiguration.
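Two of those checks are easy to automate from dig's answer lines. The sketch below assumes the text comes from something like dig +noall +answer run against the authoritative server (a recursive resolver reports the remaining cache time, not the configured TTL). The 60-second and one-day thresholds are arbitrary illustrations, not recommendations, and the helper names are invented here.

```python
from collections import namedtuple

Record = namedtuple("Record", "name ttl rclass rtype rdata")


def parse_answer(text):
    """Parse 'name ttl class type rdata' lines, skipping comments and blanks."""
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(";"):
            continue
        parts = line.split(None, 4)
        if len(parts) == 5 and parts[1].isdigit():
            name, ttl, rclass, rtype, rdata = parts
            records.append(Record(name, int(ttl), rclass, rtype, rdata))
    return records


def ttl_warnings(records, low=60, high=86400):
    """Flag TTLs outside [low, high] seconds. Thresholds are assumptions."""
    warnings = []
    for rec in records:
        if rec.ttl < low:
            warnings.append(f"{rec.name} {rec.rtype}: TTL {rec.ttl}s is below {low}s")
        elif rec.ttl > high:
            warnings.append(f"{rec.name} {rec.rtype}: TTL {rec.ttl}s is above {high}s")
    return warnings


def soa_serial(record):
    """Return the serial, which is the third field of SOA rdata."""
    return int(record.rdata.split()[2])
```

Running soa_serial on the SOA answer from each nameserver listed for the zone and comparing the numbers is a quick way to spot a secondary that never picked up a change.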

The DNSSEC one is becoming more common as more domains enable it. If you have DNSSEC enabled and your RRSIG signatures expire, the domain will stop resolving for users behind validating resolvers, which return a generic SERVFAIL. Nothing in that response tells you expired signatures are the cause, and non-validating resolvers keep answering as usual. The result is a domain that works for some people and not others.
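Expiry is something you can watch for ahead of time. The sketch below reads RRSIG lines as dig +dnssec prints them by default (single line, expiration as YYYYMMDDHHMMSS in UTC) and reports signatures that expire within a chosen window. The function names and the seven-day default are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def rrsig_expirations(dig_output):
    """Return (name, type_covered, expiration) for each RRSIG line.

    Expected field order:
    name ttl class RRSIG covered alg labels orig_ttl expiration inception
    key_tag signer signature
    """
    results = []
    for line in dig_output.splitlines():
        fields = line.split()
        if len(fields) >= 12 and fields[3] == "RRSIG":
            expires = datetime.strptime(fields[8], "%Y%m%d%H%M%S")
            expires = expires.replace(tzinfo=timezone.utc)
            results.append((fields[0], fields[4], expires))
    return results


def expiring_soon(dig_output, now, days=7):
    """List (name, type_covered) whose signature expires within `days`.

    Signatures that have already expired are included as well.
    """
    window = timedelta(days=days)
    return [
        (name, covered)
        for name, covered, expires in rrsig_expirations(dig_output)
        if expires - now < window
    ]
```

To confirm that validation is what is failing right now, compare a normal dig query with the same query plus +cd (checking disabled). If the plain query returns SERVFAIL and the +cd query returns data, DNSSEC is the likely culprit.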

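I also see a lot of CNAME misconfiguration. A CNAME record cannot coexist with other record types at the same name. That is a hard rule, with DNSSEC records such as RRSIG and NSEC as the only exception. I have lost count of how many times I have seen a CNAME for www.example.com alongside an MX record for the same name. That is not valid, and what happens next depends on the software: many authoritative servers refuse to load the zone, and a resolver that does receive both may ignore one of them.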
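The rule is simple enough to check mechanically. Here is a minimal sketch that takes (name, type) pairs, however you obtained them, and reports names where a CNAME sits next to something it should not. The function name and the input shape are assumptions for this example.

```python
from collections import defaultdict

# DNSSEC records are allowed to sit next to a CNAME.
ALLOWED_WITH_CNAME = {"CNAME", "RRSIG", "NSEC"}


def cname_conflicts(records):
    """Return {name: [conflicting types]} for names that break the CNAME rule.

    records: iterable of (owner_name, record_type) pairs.
    Names are compared case-insensitively, ignoring the trailing dot.
    """
    types_by_name = defaultdict(set)
    for name, rtype in records:
        types_by_name[name.lower().rstrip(".")].add(rtype.upper())

    conflicts = {}
    for name, types in types_by_name.items():
        if "CNAME" in types:
            extra = types - ALLOWED_WITH_CNAME
            if extra:
                conflicts[name] = sorted(extra)
    return conflicts
```

Fed a zone with a CNAME and an MX at www.example.com, it returns {"www.example.com": ["MX"]} and leaves names with only an A and an MX alone.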

Quick checklist

When a user says “the website is down,” before you restart anything, check DNS. Run a dig trace. Check propagation using multiple resolvers. Verify the TTL matches expectations. Check for DNSSEC issues. Nine times out of ten, the problem is not the server. It is the system that tells everyone where the server is.
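The "multiple resolvers" step of that checklist can be sketched as follows. It assumes dig is on the PATH; the three public resolver addresses are just a convenient sample, and query and group_by_answer are names invented for this example.

```python
import subprocess

# Illustrative choice of public resolvers; substitute the ones your users hit.
RESOLVERS = ["1.1.1.1", "8.8.8.8", "9.9.9.9"]


def query(resolver, name, rtype="A"):
    """Ask one resolver via `dig @resolver name type +short`."""
    result = subprocess.run(
        ["dig", f"@{resolver}", name, rtype, "+short"],
        capture_output=True,
        text=True,
        timeout=15,
    )
    lines = (line.strip() for line in result.stdout.splitlines())
    return frozenset(line for line in lines if line)


def group_by_answer(answers_by_resolver):
    """Group resolvers by the answer set they returned.

    More than one group means the resolvers disagree, which can point at
    a change still working through caches (or at geo-aware DNS).
    """
    groups = {}
    for resolver, answers in answers_by_resolver.items():
        groups.setdefault(frozenset(answers), []).append(resolver)
    return groups
```

A typical use would be group_by_answer({r: query(r, "example.com") for r in RESOLVERS}). One group means everyone agrees; two or more tells you which resolvers still hold the old answer.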
