In my capacity as an administrator, I’m often asked, and often want to know myself, how much of the mail that comes in is spam and how much is legitimate. Except usually the question isn’t phrased like that (the usual version is something like, “I receive literally HUNDREDS of SPAM messages daily! Can’t you do something about all this SPAM?”), and whenever I look at an individual user’s account, the claim of a spam overload invariably turns out to be greatly overstated.
We use a handy log summarizer, but the way Postfix is set up, it double-counts every delivered message, so it isn’t quite as helpful as I’d like. So today I modified it to produce more useful data, including a count of actual incoming mail (determined by what passes through the spam filter). In the graph, the purple line is rejected spam; the blue line is the total of all deliveries (and thus the sum of the lines below it); the green line is mail determined not to be spam; the yellow line is spam (marked [SPAM] before it reaches users, and delivered to a spam mailbox); and the red line (hovering just above zero) is mail so large that the spam filter skipped it. Some of that oversized mail is probably spam, but some of it is legitimate (scanned documents get emailed back and forth quite a bit).
As this graph shows, the volume of attempted (and rejected) spam delivery on any given day dwarfs the volume of legitimate inbound mail (even counting as legitimate the stuff that gets tagged as [SPAM] in a later step). According to the unmodified script, messages rejected outright based on RBLs or server misconfiguration (on the other end) accounted for 56% of all mail, but that total included outbound mail from the server and double-counted messages, which inflates the denominator. The modified script reveals the real figure to be 83% of all attempted deliveries to the server (averaged over ten weeks of log data). Yesterday’s number is 92%, reflecting the drop in legitimate email on Sunday. Spam drops on Sundays too, but not as much as real mail does.
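The percentage here is just rejected attempts divided by all attempted deliveries, where (as I read it) an attempt is either rejected outright or lands in one of the three categories that pass through the spam filter. A minimal Python sketch of that arithmetic follows; the function name and the counts are hypothetical, picked only to make the math easy to follow, and are not taken from the actual logs or from the summarizer script.

```python
def rejected_share(rejected, clean, tagged_spam, oversized):
    """Return rejected attempts as a percentage of all attempted deliveries.

    Assumption: every inbound attempt is either rejected outright or
    falls into exactly one of the three filtered categories.
    """
    attempts = rejected + clean + tagged_spam + oversized
    if attempts == 0:
        return 0.0  # no traffic at all; avoid dividing by zero
    return 100.0 * rejected / attempts


# Hypothetical one-day counts, for illustration only (not real log data).
example = {"rejected": 8300, "clean": 1200, "tagged_spam": 450, "oversized": 50}
share = rejected_share(**example)
print(f"{share:.1f}% of attempted deliveries were rejected")
```

Counting outbound mail or counting each delivered message twice makes the denominator bigger without adding any rejections, which is why the unmodified script reports a lower percentage.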
An interesting tidbit visible in the chart: on weekends, while the total number of delivery attempts drops, the volume of delivered spam stays roughly constant. This suggests that a few spam messages squeak through per user every day of the week, regardless of how much other mail arrives, which is somewhat at odds with the other trends visible in the chart (note how the green and yellow lines roughly intersect over several weekends, then diverge during the week).
