There have been a lot of unsolicited emails over the years, but I don't need to tell anybody that the number has gone through the roof over the past 12 months or so. Like so many other people, I find over half of my email is now unwanted and something has to be done. Fortunately, there is light at the end of the tunnel.
I have been playing with some interesting software called PopFile for about the last six months. It installs on my computer as an email proxy, intercepts all my incoming mail sessions and processes the messages, marking those it considers to be spam. My email client puts the suspects in a separate folder so I can check how well it's all working.
In fact, the software works much better than I imagined, sorting my emails with an accuracy of about 99.6 percent, although this does mean that about one in 250 emails is put in the wrong category. PopFile learns the characteristics of spam and wanted email by looking at each word and keeping a record of how often that word appears in each category. It then calculates a probability that a particular message is spam by combining the separate probabilities of the words, using Bayes' Rule, a method that now seems to be known as Bayesian filtering.
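For readers who like to see the idea in code, here is a minimal sketch of word-count filtering with Bayes' Rule. It is not PopFile's actual code: the class name `WordCountFilter`, the category labels, the simple tokeniser and the add-one smoothing are all my own assumptions, chosen to keep the example short.

```python
import math
import re
from collections import Counter


class WordCountFilter:
    """Toy Bayesian filter (hypothetical, for illustration only)."""

    def __init__(self):
        # How often each word has been seen in each category.
        self.word_counts = {"spam": Counter(), "wanted": Counter()}
        # How many training messages each category has received.
        self.message_counts = {"spam": 0, "wanted": 0}

    @staticmethod
    def words(text):
        # Assumed tokeniser: lower-case runs of letters, digits and apostrophes.
        return re.findall(r"[a-z0-9']+", text.lower())

    def train(self, text, category):
        self.word_counts[category].update(self.words(text))
        self.message_counts[category] += 1

    def spam_probability(self, text):
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["wanted"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for category, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Prior: how common the category is (add-one smoothed).
            score = math.log((self.message_counts[category] + 1) / (total_messages + 2))
            for word in self.words(text):
                # Per-word probability, add-one smoothed so unseen words
                # never produce a zero. Logs avoid underflow on long messages.
                score += math.log((counts[word] + 1) / (total_words + len(vocabulary) + 1))
            log_scores[category] = score
        # Bayes' Rule: normalise the two scores into a probability of spam.
        top = max(log_scores.values())
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        return weights["spam"] / sum(weights.values())
```

After training on a handful of messages in each category with `train(text, "spam")` or `train(text, "wanted")`, calling `spam_probability(text)` returns a number between 0 and 1. Correcting a wrongly filed message amounts to another `train` call with the right category, which is why the filter keeps improving as it is used.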
For the first few weeks, you have to pick out those messages that are wrongly categorised but, after that, the system makes so few errors that many people just don't bother to deal with them.
The beauty of all this is that the system gets better with time and adapts very quickly to new types of spam. I find that it is very good at picking out emails with viruses even if they appear to be from people I know.
Of course, you might actually want some types of spam. The system handles this by adapting to each user's preferences.
As I've said, a wanted message is occasionally wrongly categorised as spam. That message might be very important, so you do have to check through the spam before you delete it. This doesn't take long if everything is in the same place. Some of the programs can be adjusted so that they are much more likely to let some spam through than to file a wanted message wrongly.
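One simple way such an adjustment could work is a cut-off on the calculated spam probability. The sketch below is an assumption on my part, not a description of any particular program's setting: the function name `file_message`, the folder names and the example values are all hypothetical.

```python
def file_message(spam_probability, threshold=0.5):
    """Choose a folder from an estimated spam probability (hypothetical helper).

    Raising the threshold means a message must look more spam-like before it
    is filed as spam, so fewer wanted messages are lost at the cost of more
    spam reaching the inbox.
    """
    return "spam" if spam_probability >= threshold else "inbox"
```

With the default cut-off, a message scored at 0.8 goes to the spam folder. With `threshold=0.95`, the same message stays in the inbox, trading a little extra spam for a smaller chance of losing something important.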
Having been thoroughly fascinated by this technology, I decided to look into it further and found that this type of email classifier has been around for about five years. The original work was done separately by IBM and Microsoft but others have refined the techniques since and the performance has improved steadily. There are now quite a lot of filtering programs that work this way, and many, like PopFile, are open source.
Microsoft is now so confident about the technique that it will be included in the next version of Outlook, Microsoft's email client.
Is this the beginning of the end for spammers? Well, not yet. The only way to defeat spammers is to make spamming uneconomic, which means reducing the response rates to extremely small levels. Spammers will fight back, however, and try to devise ways of getting their message through, probably by making spam look very similar to normal email. That would cause the spam to lose much of its appeal. There is hope.











