Sunday, July 17, 2005

Greylisting, Some Good, Some Bad

Greylisting has had much said about it over the past year or two, and it seems to be a valid approach to reducing spam. This is my experience implementing it using Qmail, SPP, Debian and Ubuntu. The idea is that since most spammers do not use proper mail queues to handle their SMTP transfers, they will never retry a delivery if you refuse it at first. A greylisting daemon therefore examines the originating IP address, sender and recipient, and for any combination it has not seen before it temporarily declines the first delivery attempt, requiring the sender to retry.
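To make the mechanism concrete, here is a minimal Python sketch of the decision a greylisting check makes. It is only an illustration, not how greylisting-spp is written: the `Greylist` class, the in-memory dictionary and the 300-second `min_delay` are all assumptions of mine, and a real implementation would also persist the table and expire old entries.

```python
import time


class Greylist:
    """Toy in-memory greylist keyed on (client IP, sender, recipient)."""

    def __init__(self, min_delay=300, clock=time.time):
        # min_delay: seconds a retry must wait before it is accepted
        # (300 is an illustrative value, not a greylisting-spp default).
        self.min_delay = min_delay
        self.clock = clock
        self.first_seen = {}  # triplet -> time of the first attempt

    def check(self, ip, sender, recipient):
        """Return "defer" (temporary failure) or "accept" for one attempt."""
        key = (ip, sender.lower(), recipient.lower())
        now = self.clock()
        first = self.first_seen.get(key)
        if first is None:
            # Never seen this combination: remember it and ask for a retry.
            self.first_seen[key] = now
            return "defer"
        if now - first < self.min_delay:
            # Retried too quickly to count as a real queue retry.
            return "defer"
        return "accept"
```

A sender with a real mail queue comes back after the delay and gets through; a fire-and-forget spammer never sees anything but the first "defer".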

A year ago I passed on implementing greylisting because, as a qmail user, I would have had to recompile my server with SPP support. However, when a friend I trust recommended it as having reduced his spam load, I felt it was worth the time to build a new qmail package and manually apply all the patch sets I use, including the qmail-spp patch (which, given my over-patched qmail install, meant dealing with multiple failed hunks).

The mail server cluster I manage handles 89,000 emails each week, which break down into roughly three categories: 75,000 completely bogus spam messages sent to email addresses that never existed, 7,500 spam messages and 7,500 legitimate messages. The 75,000 bogus messages are all sent to the same collection of 3,000 addresses, which never existed but for some reason appear in some spammer's database. My guess is that a hard-up spammer simply generated bogus addresses to pad one of those email address collections sold to other spammers. Since this list of 3,000 addresses never changes and was discovered by looking at my log of undeliverable bounces, I long ago established a list of invalid RCPT-TOs and have for about 8 months repelled those 75K weekly emails.



Once I configured and installed Peter Conrad's greylisting-spp, I saw a significant reduction in my bad RCPT-TOs; the majority of them were never retried. Before greylisting, a reliable 79% of all incoming SMTP connections were attempts to deliver email to that list of bogus addresses; with greylisting, that number has dropped to 20%. I suspect, however, that this number is still inflated by the conservative failure mode of greylisting-spp, i.e. if anything goes wrong, let the email through. When using the FILE database, it does not retry or block when acquiring the file lock, and therefore if two emails arrive simultaneously, one of them isn't subject to greylisting restrictions.

In my deployment, this failure mode allowed 3,749 emails to bypass the greylist check in a 24-hour period. Over the same period, 17,452 first-time delivery attempts were initially rejected. This was in a mature environment where most of the 'regular' correspondents had already been seeded into the greylist database, so those 17K rejections represent almost entirely new IP/FROM/TO combos. I therefore modified my copy of greylisting-spp to do a blocking wait on the lock.
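The difference between the two locking behaviours can be sketched in a few lines of Python. This is an illustration of the general technique only: I am assuming an `flock`-style advisory lock here, the helper names are made up, and the actual greylisting-spp code is C and may use a different locking call. It is also Unix-only.

```python
import fcntl
import os


def try_lock(fd):
    """Non-blocking attempt: return False at once if the lock is held.

    A caller that treats False as "skip the check and let the mail
    through" reproduces the fail-open behaviour described above.
    """
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except BlockingIOError:
        return False


def lock_blocking(fd):
    """Blocking attempt: wait until the current holder releases the lock."""
    fcntl.flock(fd, fcntl.LOCK_EX)


def unlock(fd):
    """Release the lock so the next waiting process can proceed."""
    fcntl.flock(fd, fcntl.LOCK_UN)
```

With the blocking variant, two simultaneous deliveries simply take turns at the database instead of one of them bypassing the greylist. The cost is that a stuck lock holder would stall every other delivery, which is presumably why the original code fails open.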

The vast majority of the emails that greylisting protected me from were those that the bad RCPT-TO list was already blocking; however, it also blocked about 90% of the spam that my ordinary spam filters had to deal with. Unfortunately, there was no reduction in the amount of spam that got through my filters. This means that the spammers using broken mail queues are the ones whose spam is already entirely and accurately identified by existing spam filtering solutions. The good news, however, is that with 90% less spam in my 'spam' folder, I am more likely to go poking around in there for miscategorized legitimate mail. That's a net plus.

All in all, I'm very happy with greylisting as a solution to reduce the illegitimate mail volume on my servers while having a very limited impact on legitimate traffic, namely a delay of a few minutes for first-time correspondents. Unless you're like me and have your mail client set to check mail every minute, you're unlikely to even notice the delay in practice. In fact, the only plausible inconvenience anyone has suggested is when someone you've never corresponded with tells you over the phone that they are sending you something and you want it right away. Certainly a boundary condition.

I did end up making a few more minor modifications to the greylisting-spp code, which I've made available in the form of a patch against the 0.2 source code. These changes allow me to tune the behavior of greylisting-spp more finely with additional environment variables. My changes are as follows:

  • Modified the default window for confirmed deliveries to 14 days instead of 3

  • Added environment parameter GL_IGNORE_SENDER which when set to 1 causes the FROM field to not be considered on retries

  • Added environment parameter GL_IGNORE_RECIPIENT which when set to 1 causes the recipient to be ignored when considering retries

  • Added environment parameter GL_IGNORE_REMOTEIP which... yeah you get it

  • Added environment parameter GL_MASK_IP which you can set to 1 to mask off the last byte of the IP address of the remote host; this makes retries from unknown sending clusters (several hosts in the same /24) less likely to be treated as new senders and delayed again.

  • Caused an X-Greylist-Accepted: header to be added to the email containing the tuple used to approve the message.

  • Added logging of rejections

  • Modified the FILE database interface to block while waiting for the file lock

  • Stripped the full path from the program name in log messages (and all output for that matter)
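To show how the new environment variables could shape the tuple that gets looked up, here is a small Python sketch. The variable names are the ones from my patch, but the `greylist_key` function, the use of `None` for an ignored field, and zeroing the last octet (rather than dropping it) are illustrative assumptions, not a transcription of the C code. It handles IPv4 addresses only.

```python
def greylist_key(ip, sender, recipient, env):
    """Build the lookup tuple from the GL_* settings in env (a dict)."""

    def enabled(name):
        # A setting counts as "on" only when it is exactly "1".
        return env.get(name) == "1"

    if enabled("GL_MASK_IP"):
        # Mask off the last byte so every host in the same /24 matches.
        octets = ip.split(".")
        octets[-1] = "0"
        ip = ".".join(octets)

    return (
        None if enabled("GL_IGNORE_REMOTEIP") else ip,
        None if enabled("GL_IGNORE_SENDER") else sender,
        None if enabled("GL_IGNORE_RECIPIENT") else recipient,
    )
```

For example, with GL_MASK_IP set to 1, a first attempt from 192.0.2.10 and a retry from 192.0.2.77 produce the same key, so the retry is recognized even though a different machine in the cluster sent it.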

1 comment:

  1. Thanks for the patch, especially the file block thing. Was looking through the source my self and found the problem but as im no c hacker I didnt even try to fix it, instead just googled for a solution and found yours..

    Hope you sent the patch to Peter Conrad, the greylisting-spp author?
