
Foreign language spam - it's a problem

New milestone in my Internet life -- I have just received my very first piece of spam in Spanish. After the initial chuckle and lame jokes shared with various buddies, I realized the real problem.

SpamAssassin completely missed it.

X-Spam-Status: No, hits=0.0 required=5.0

Nada! Not even half a point!

Why?

It's a plain-text email without obviously forged headers, and SpamAssassin relies heavily on commonly used phrases.
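
To make that concrete, here is a minimal sketch of phrase-based scoring. The phrases and weights are made up for illustration and are not SpamAssassin's real rules; only the 5.0 threshold comes from the header above. The point is just that a message with none of the known phrases scores nothing at all.

```python
# Hypothetical phrase rules and weights, invented for illustration only;
# these are NOT SpamAssassin's real rules or scores.
PHRASE_RULES = {
    "click here": 1.5,
    "free money": 2.5,
    "act now": 1.0,
    "limited time offer": 2.0,
}
REQUIRED = 5.0  # same threshold as in the X-Spam-Status header above


def score(body: str) -> float:
    """Sum the weights of every known phrase found in the message body."""
    text = body.lower()
    return sum(
        (weight for phrase, weight in PHRASE_RULES.items() if phrase in text),
        0.0,
    )


english = "Act now! Click here for free money. Limited time offer!"
spanish = "¡Actúe ahora! Haga clic aquí para ganar dinero gratis."

for label, body in (("english", english), ("spanish", spanish)):
    hits = score(body)
    print(f"{label}: hits={hits} required={REQUIRED} spam={hits >= REQUIRED}")
```

The English sample trips every rule, while the Spanish one says roughly the same thing and matches none of them.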

This is going to be a nightmare if this first e-mail means I've landed on some foreign-based spam lists. Unless, of course, foreign-language support is built into SpamAssassin. Once again, we rely too heavily on the whole world speaking English.


Oy

Comments

Score one for Bayesian filtering: I get a bit of Portuguese spam, and a ton in Chinese/Japanese/Korean, and since I never get non-spam in any of those languages it only takes one to teach POPFile that they are all spam.
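
A rough sketch of why a single message can be enough, assuming a toy word-level naive Bayes with add-one smoothing. This is not POPFile's actual code; the class name, method names, and sample messages are all invented for illustration.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy word-level naive Bayes with add-one smoothing (not POPFile's code)."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        return re.findall(r"\w+", text.lower())

    def train(self, label, text):
        self.words[label].update(self.tokens(text))
        self.messages[label] += 1

    def spam_probability(self, text):
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        total = sum(self.messages.values())
        log_odds = 0.0
        for label, sign in (("spam", 1), ("ham", -1)):
            # Smoothed class prior.
            log_odds += sign * math.log((self.messages[label] + 1) / (total + 2))
            n = sum(self.words[label].values())
            for tok in self.tokens(text):
                # Smoothed per-word likelihood.
                log_odds += sign * math.log((self.words[label][tok] + 1) / (n + vocab))
        return 1 / (1 + math.exp(-log_odds))


f = TinyBayes()
f.train("ham", "lunch meeting moved to noon tomorrow")
f.train("ham", "here are the notes from the meeting")
f.train("spam", "gane dinero rapido desde su casa")  # the one foreign spam

print(f.spam_probability("oferta especial: gane dinero desde casa") > 0.5)
print(f.spam_probability("notes from the lunch meeting") > 0.5)
```

Because the Spanish words have only ever been seen in spam, a new Spanish message leans heavily that way after one training example, while ordinary English mail is unaffected.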

Lo siento mucho

Yup, agree with Phil on this. bogofilter has been filtering out non-English spam for a while now. I have a nice little procmail rule to drop unreadable (read: Chinese/Japanese/Korean) spam. Not sure if it's useful to you at all, but here it is anyway:

## Silently drop all completely unreadable mail
:0:
* 1^0 ^\/Subject:.*=\?(.*big5|iso-2022-jp|ISO-2022-KR|euc-kr|gb2312|ks_c_5601-1987|windows-1251|windows-1256)\?
* 1^0 ^\/Content-Type:.*charset="(.*big5|iso-2022-jp|ISO-2022-KR|euc-kr|gb2312|ks_c_5601-1987|windows-1251|windows-1256)
spam
####
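
For anyone not running procmail, the same idea can be sketched in Python with the standard `email` module. This is only an approximation of the recipe above, not a drop-in replacement: the charset list is copied from the recipe, and the function name `looks_unreadable` is made up.

```python
import re
from email import message_from_string

# Same charset list as the procmail recipe above.
UNREADABLE = re.compile(
    r"big5|iso-2022-jp|iso-2022-kr|euc-kr|gb2312|ks_c_5601-1987"
    r"|windows-1251|windows-1256",
    re.IGNORECASE,
)


def looks_unreadable(raw_message: str) -> bool:
    """True if the Subject or Content-Type names one of the listed charsets."""
    msg = message_from_string(raw_message)

    # RFC 2047 encoded words in headers look like =?charset?B?...?=
    subject = str(msg.get("Subject", ""))
    charsets = re.findall(r"=\?([^?]+)\?[bq]\?", subject, re.IGNORECASE)

    # Charset parameter of the Content-Type header, if there is one.
    body_charset = msg.get_content_charset()
    if body_charset:
        charsets.append(body_charset)

    return any(UNREADABLE.search(cs) for cs in charsets)


sample = (
    "Subject: =?euc-kr?B?seSwoQ==?=\n"
    'Content-Type: text/plain; charset="euc-kr"\n'
    "\n"
    "body\n"
)
print(looks_unreadable(sample))
```

As with the recipe, this only makes sense if you never get legitimate mail in any of those character sets.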

Agh, used to get so much Korean spam it was crazy. 10-30/day. Awful.

For me the problem is the reverse: most of the spam I get is in a foreign language (usually English), but I also receive legitimate English mail, which is sometimes classified as spam.

I'm using the Bayesian filter of the Mail program in Mac OS X, which usually works rather well but can have problems with spam-like legitimate messages in English. Fortunately, there is virtually no spam at all in Finnish, which makes it easy to avoid wrong classifications in my native language.

For me, the DNSBL at blackholes.easynet.nl (run by the Netherlands branch of Easynet, who also happen to be my ISP here in the UK) has blocked a huge amount of spam, and I have yet to hear of any false hits.

As a very large European ISP, their DNSBL has info on the latest spam mails so very few spam messages slip through.

Kasia -- that's what Bayes is for ;) Train it as spam with "sa-learn", and pretty soon any mail in Spanish will be marked as spam.

I get lots of spam in Turkish and that works great for me...

Would anyone be kind enough to translate the technical language above? (I speak English only.) My problem is the same as many of yours -- I get up to 300 Spanish spams per day (some of them very naughty). I have spamkiller on my forwarding service and on earthlink, but it still comes in. What can I do as a non-technical person? I can't write code. Thanks!

You can list the languages you receive legitimate email in using these two SpamAssassin options:

ok_locales en
ok_languages en

This did the trick with the Korean / Thai / Arabic spam I was receiving.

> Foreign language spam - it's a problem

every %¤#¤& spam i get is in a foreign language - english! anyone got a procmail filter?

barx from norway