E-mail Disintegrating

Tagged with Regular Expressions

We need to talk about e-mail. E-mail isn't the topic we planned for this month, and its administration isn't what we generally regard as a "Regular Expressions" subject.

Essentially all the applications we develop include an e-mail component, though, and there are vital aspects of e-mail management no one else seems to cover; we finally realized this means it's a conversation we need to start.

Bad, and Getting Worse--But Ahead of Everything Else

E-mail is old-fashioned, it's stodgy, it's probably lost value in the last decade, and there are few innovations that seem poised to improve it significantly--yet e-mail started with such an enormous lead as the original "killer app" that it remains, in 2009, indispensable. Essentially all of us count on it. Even with competition from SMS, IM, "Web 2.0 messaging," and more esoteric channels, e-mail does the heavy hauling for all sorts of "mission-critical" situations.

This remains true despite severe liabilities. It's well known, for example, that at least nine out of every ten messages are spam.

As annoying and disgusting as the noise of spam is, though, there are at least two other technical aspects of e-mail that we find even thornier. The noise of spam admits palliatives, if not cures: scores of vendors offer products and services that reduce spam. It's the countermeasures and consequences of spam that have grabbed our attention lately.

Belligerence in Administration

We now work as hard fighting blacklists as we do fighting the spam they're supposed to combat. In principle, central experts identify and report on--"blacklist"--spam sources, so that all of us can cheaply learn which mail servers to avoid. Everyone recognizes there'll be occasional "false positives," with innocent servers mistakenly characterized as spam sources. All blacklists have procedures for reporting and resolving such mistakes.

As plausible as this arrangement sounds, we're finding that its reality is nearly unlivable. Essentially all our servers are on some blacklist, always. Fresh servers we bring on-line for the first time are occasionally blacklisted, because they're re-using an IP address that belonged to a spammer in the past, sometimes many years ago. Others are born blacklisted, because they're co-located at a data center where other customers have let hosts be taken over.

Even when we escape these hazards, we find ourselves blacklisted shortly after we first send out e-mail. No one thinks our e-mail is particularly commercial, or unsolicited, and certainly not both at the same time; however, as best we can tell, once one of our return addresses shows up in the address book of a correspondent whose machine was hijacked in the past, our server is flagged for exclusion.
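For readers who haven't looked at the mechanics: the blacklists described above are commonly published as DNS zones (DNSBLs). A checker reverses the octets of the IP address in question, appends the list's zone name, and looks up the result; an answer in 127.0.0.0/8 conventionally means "listed," and a failed lookup means "not listed." What follows is a minimal sketch of that convention, not the code of any particular vendor. The zone name `dnsbl.example` and the function names are our own placeholders, and the sketch handles IPv4 only.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query: reversed IPv4 octets plus the list's zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """Return True if the list answers with an address in 127.0.0.0/8.

    `resolve` is injectable so the logic can be exercised without a network;
    by default it uses the system resolver.
    """
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except OSError:  # socket.gaierror (name not found) is a subclass of OSError
        return False
    return answer.startswith("127.")
```

Calling `is_listed("192.0.2.1", "dnsbl.example")` would query the name `1.2.0.192.dnsbl.example`. Running a check like this against each list we care about, for each of our servers, is part of the regular intervention described in this column.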

Confusion and Ignorance

We explain this with an excess of conditionals--"might be," "we think"--because it's so hard to be certain. Quite a bit of the security implemented around spam involves obscurity: vendors don't and won't document their precise algorithms for judgment, because they reasonably fear the bad guys will abuse the information. Even when we manage to reach a human at one of the blacklist maintainers, they often seem sincerely unable to explain how we got on, or how to stay off, their lists.

It's surely not because we're abusing e-mail. We manage a few mass mailings every year for customers, but our mass mailings have almost nothing in common with the spammers': we send out information on specific computing languages, certain scientific research opportunities, or similarly obscure and "unmonetized" subjects, to subscribers who've asked for the information. In the abstract, it's hard to confuse any of our activities with spamming.

We certainly show up on blacklists, though. And, from what we can tell, the situation is getting worse. Major ISPs, including Comcast, refuse e-mail from some of our servers without explanation or recourse. As best we can tell, they seem to have decided that it's simply not worth their bother to blacklist accurately, and they appear to have made permanent exclusion a policy for small operators contaminated even once by spamming suspicion.

The rules aren't the same for the major players. Industry heavyweights like Gmail, AT&T, Comcast, MSN, and so on, appear to have worked things out so that they accept e-mail from each other, even though we can document spam originating within each of their services. It's easy to anticipate a near future in which e-mail must pass through one of the major players to have a chance of successful reception.

We suspect that part of the attraction of Facebook and similar services is this preferential treatment of their transmissions. Internet service providers make a point of "clearing the tracks" for mass-market messages to and from Facebook, while they throttle e-mail traffic from small domains like the ones we manage.

The result: it's simply impossible to manage e-mail for a single organization or interest group as we would have a decade ago. It's not enough to configure a server correctly, and leave it running for years at a time; now, it takes weekly intervention to minimize blacklisting.

The market shift described above has a second consequence: other small domains have become harder to trust. We've seen that e-mail gating now is often on a basis of market share rather than any traditional technical criterion: e-mail from AT&T gets through, and ours doesn't, even though we follow the pertinent standards at least as well as AT&T.

With this pattern, we're finding that other small administrators are giving up on standards conformance. Forward-confirmed reverse DNS (FCrDNS) look-ups provide an example: we typically configure our mail transfer agents (MTAs) to confirm FCrDNS as part of spam detection for incoming e-mail traffic. The check is that the reverse-DNS name of the connecting IP address resolves forward to that same address. This amounts to a well-documented and reasonable requirement that a server appear to be located within the organization it claims to represent. If an MTA in Upper Spamola tries to send one of our customers e-mail claiming to be from Bank of America, our default action is to block the e-mail. While this is a weak form of authentication, it significantly lightens the load on our MTAs.
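As a rough illustration of the FCrDNS test itself, here is a minimal sketch using only Python's standard `socket` module. It is not our production MTA configuration; the function names are our own, it covers IPv4 only, and it checks just the forward-confirmation step (whether the name then matches the claimed sending domain is a separate policy decision).

```python
import socket
from typing import Callable, List


def reverse_lookup(ip: str) -> str:
    """Return the primary PTR name for an IP address via the system resolver."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(name: str) -> List[str]:
    """Return the IPv4 addresses that a host name resolves to."""
    return socket.gethostbyname_ex(name)[2]


def passes_fcrdns(
    ip: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """True if ip -> name -> addresses round-trips back to the same ip.

    The resolvers are injectable so the logic can be tested offline.
    """
    try:
        name = reverse(ip)            # step 1: reverse (PTR) look-up
        return ip in forward(name)    # step 2: forward look-up must include ip
    except OSError:                   # herror/gaierror: missing or broken DNS
        return False
```

Treating any look-up failure as a failed check is the strict choice; it is exactly the behavior that penalizes the misconfigured peers discussed next.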

It's not as safe a check as we'd like, though. FCrDNS and related techniques depend on correct configuration of peer MTAs. When we find cases where DNS for a university or small business is misconfigured, our policy is to be a good 'Net citizen and let them know how to correct the error. What we've observed in recent years is administrators indifferent to Internet standards. Their position: as long as e-mail gets through to SBC, AOL, and so on, there's nothing to change. Published standards don't matter apart from their enforcement by the market leaders.

Summary

The e-mail landscape is discouraging. Spam is a drag on all our productivity, and, apart from the distraction of spam, we see e-mail practices increasingly determined by market dominance rather than engineering standards and virtues. E-mail remains indispensable to us, and we'll continue to manage at least some of our own servers for the benefits of internal automation, and integration with the vertical-market applications we develop. If you have ideas about how to deal with the problems this column outlines, though, or just your own problems and surprises in managing e-mail, let us know.

The next installment of "Regular Expressions" will return to our usual focus on working code in high-level languages that you can use in your own programs.

Kathryn and Cameron run their own consultancy, Phaseit, Inc., specializing in high-reliability and high-performance applications managed by high-level languages. They write about scripting languages and related topics in their "Regular Expressions" columns. Their own experience with e-mail goes back thirty years.

 

combat spam
Submitted by dhayes501 on Fri, 02/27/2009 - 22:09.

Ok, everybody knows that SPAM is a problem, but it can be stopped. It'll take a slight modification of how email works, but this could be done fairly easily while keeping backwards compatibility. Here's how.

Whenever an email is sent, the smtp server that sends the email first sends a request to the pop3 server that receives mail for the sending address (this will require login credentials). The pop3 server of the sender generates a unique email verification code and returns that to the smtp server to include in the email that is sent out. The sender's pop3 server saves this code for later verification. This requires the sender of an email to have ownership of the "from address" that the email is coming from. When the receiving pop3 email server receives the email, it sends the email verification code to the pop3 server of the sending email address for verification. If it passes, that guarantees the email came from who it says it's from.

This will require that a pop3 account is set up to send email OR that the pop3 server is aware of addresses that are authorized to send email and can create and save verification codes for those emails. This sounds relatively simple to me.

This would also work on mailing lists b/c the pop3 server could authenticate the first X number of emails, etc.

Optionally, on failure, the message can be returned to sender with a request for them to update their email software to support the new authentication system. This way the system would also be able to retain backwards compatibility by ignoring a failed test until the world was mostly updated.

The only real cost is the extra packets sent to verify the email and the added complexity on the pop3 server to store authentication IDs and process requests. But this should be much less than the increased network traffic and system resources used up with spam.
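To make the idea concrete, here is a toy, in-memory sketch of the flow described above. It does no real SMTP or POP3; the class and function names are made up for illustration, and details such as the code format and how long codes are kept are assumptions, not part of any existing standard.

```python
import secrets


class SenderMailServer:
    """Stands in for the sender's mail server (hypothetical name)."""

    def __init__(self, authorized_addresses):
        self.authorized = set(authorized_addresses)
        self.issued = {}  # verification code -> from address

    def issue_code(self, from_addr: str) -> str:
        """Create and remember a code for an address allowed to send mail."""
        if from_addr not in self.authorized:
            raise PermissionError(f"{from_addr} may not send from this server")
        code = secrets.token_hex(16)  # assumed format: 32 random hex characters
        self.issued[code] = from_addr
        return code

    def verify(self, from_addr: str, code) -> bool:
        """Answer a receiving server's question: did we issue this code?"""
        return self.issued.get(code) == from_addr


def accept_message(message: dict, sender_server: SenderMailServer) -> bool:
    """Receiving side: check the code with the server for the from address."""
    return sender_server.verify(message["from"], message.get("code"))
```

A message carrying a code returned by `issue_code` would pass `accept_message`, while one with a forged or missing code would not. In a real deployment the receiver would have to find the sender's server through DNS, which this sketch skips.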

Thoughts?