July 09, 2003
Foreign language spam - it's a problem

New milestone in my Internet life -- I have just received my very first piece of spam in Spanish. After the initial chuckle and lame jokes shared with various buddies, I realized the real problem.

SpamAssassin completely missed it.

X-Spam-Status: No, hits=0.0 required=5.0

Nada! Not even half a point!

Why?

It's a plain-text email without obviously forged headers, and SpamAssassin relies heavily on commonly used phrases.

This is going to be a nightmare if this first e-mail indicates that I've gotten on some foreign-based spam lists... unless, of course, foreign-language support is built into SpamAssassin. Once again, we rely too heavily on the whole world speaking English.


Oy

Posted July 09, 2003 10:54 AM in Spam sucks
Comments
On July 9, 2003 11:44 AM Phil Ringnalda added:

Score one for Bayesian filtering: I get a bit of Portuguese spam, and a ton in Chinese/Japanese/Korean, and since I never get non-spam in any of those languages it only takes one to teach POPFile that they are all spam.
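
To make the idea concrete, here is a minimal sketch of a token-based Bayesian filter in Python. It is not how POPFile, bogofilter, or SpamAssassin actually implement it; the class name `TinyBayes`, the tokenizer, and the add-one smoothing are all assumptions chosen for illustration. It only shows why a single spam message in a language you never receive legitimate mail in can be enough to tip the score.

```python
import math
import re
from collections import Counter

# Runs of letters in any script; digits and punctuation are ignored.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    return [t.lower() for t in TOKEN_RE.findall(text)]


class TinyBayes:
    """Hypothetical two-class naive Bayes filter with add-one smoothing."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, label, text):
        self.messages[label] += 1
        self.counts[label].update(tokenize(text))

    def spam_probability(self, text):
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"])) or 1
        spam_total = sum(self.counts["spam"].values())
        ham_total = sum(self.counts["ham"].values())
        # Start from the smoothed prior, then add one log-ratio per token.
        log_odds = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for tok in tokenize(text):
            p_spam = (self.counts["spam"][tok] + 1) / (spam_total + vocab)
            p_ham = (self.counts["ham"][tok] + 1) / (ham_total + vocab)
            log_odds += math.log(p_spam / p_ham)
        # Clamp so math.exp cannot overflow on long messages.
        log_odds = max(-50.0, min(50.0, log_odds))
        return 1.0 / (1.0 + math.exp(-log_odds))


f = TinyBayes()
f.train("ham", "Meeting notes for the project are attached")
f.train("ham", "Lunch tomorrow with the team")
f.train("spam", "Gane dinero rapido desde su casa, oferta especial")
print(f.spam_probability("Oferta especial: gane dinero hoy") > 0.5)
print(f.spam_probability("Project meeting tomorrow") < 0.5)
```

The Spanish tokens have only ever been seen in spam, so each one pushes the odds the same way. The flip side is that a filter trained like this will also flag legitimate mail in that language until it is taught otherwise.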

#
On July 9, 2003 11:48 AM Steve Friedl added:

Lo siento mucho

#
On July 9, 2003 01:09 PM Arcterex added:

Yup, agree with Phil on this... bogofilter has been filtering out non-English spam for a while now. I have a nice little procmail rule to drop unreadable mail (read: Chinese/Japanese/Korean spam). Not sure if it's useful to you at all, but here it is anyway:

## File all completely unreadable mail into the "spam" mailbox
:0:
* 1^0 ^\/Subject:.*=\?(.*big5|iso-2022-jp|ISO-2022-KR|euc-kr|gb2312|ks_c_5601-1987|windows-1251|windows-1256)\?
* 1^0 ^\/Content-Type:.*charset="(.*big5|iso-2022-jp|ISO-2022-KR|euc-kr|gb2312|ks_c_5601-1987|windows-1251|windows-1256)
spam
####
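
For anyone not running procmail, the same check can be sketched in Python with the standard `email` module. This is an illustration under assumptions, not a drop-in equivalent: the function names `declared_charsets` and `looks_unreadable` are made up here, and unlike the regexes above it compares charset names exactly instead of matching anything that ends in `big5`.

```python
import re
from email import message_from_string

# Same charset list as the procmail recipe, lowercased.
UNREADABLE_CHARSETS = {
    "big5", "iso-2022-jp", "iso-2022-kr", "euc-kr", "gb2312",
    "ks_c_5601-1987", "windows-1251", "windows-1256",
}

# RFC 2047 encoded-word prefix, e.g. "=?euc-kr?B?"
ENCODED_WORD_RE = re.compile(r"=\?([^?]+)\?[bBqQ]\?")


def declared_charsets(raw_message):
    """Collect charsets named in the Subject and in Content-Type headers."""
    msg = message_from_string(raw_message)
    found = set()
    subject = str(msg.get("Subject", ""))
    for match in ENCODED_WORD_RE.finditer(subject):
        found.add(match.group(1).lower())
    for part in msg.walk():
        charset = part.get_content_charset()  # lowercased, or None
        if charset:
            found.add(charset)
    return found


def looks_unreadable(raw_message):
    return bool(declared_charsets(raw_message) & UNREADABLE_CHARSETS)


sample = 'Subject: Hello\nContent-Type: text/plain; charset="GB2312"\n\nbody\n'
print(looks_unreadable(sample))
```

Note that this only looks at what the message declares about itself. A Spanish spam sent as plain ISO-8859-1 or US-ASCII, like the one in the post, sails straight through it.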

#
On July 9, 2003 02:33 PM Jeremy C. Wright added:

Agh, used to get so much Korean spam it was crazy. 10-30/day. Awful.

#
On July 10, 2003 03:29 PM Juha Haataja added:

For me the problem is reversed: most of the spam is in a foreign language for me (usually English), but I also receive legitimate English mail, which is sometimes classified as spam.

I'm using the Bayesian filter of the Mail program in Mac OS X, which usually works rather well, but can have problems with spam-like legitimate messages in English. Fortunately, in Finnish there is virtually no spam at all, which makes it easy to avoid wrong classifications in my native language.

#
On July 11, 2003 12:43 PM Baba added:

For me, the DNSBL at blackholes.easynet.nl (run by the Netherlands branch of Easynet, who also happen to be my ISP here in the UK) has blocked a huge amount of spam, and I have yet to hear of any false hits.

Because they are a very large European ISP, their DNSBL has info on the latest spam mails, so very few spam messages slip through.
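
For the curious, a DNSBL lookup is just a DNS query: reverse the octets of the sending IP, append the list's zone, and see whether the name resolves. Here is a rough Python sketch of that convention for IPv4. The function names are made up for illustration, the zone is simply the one mentioned above, and nothing here checks whether that zone is still in service.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up, e.g. 192.0.2.1 -> 1.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """True if the query name resolves; a failed lookup means not listed.

    `resolve` is injectable so the logic can be tried without network access.
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return True


print(dnsbl_query_name("192.0.2.1", "blackholes.easynet.nl"))
```

In practice the mail server does this at SMTP time and rejects the connection, so the spam never reaches a content filter at all, whatever language it is in.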

#
On July 11, 2003 02:41 PM Justin added:

Kasia -- that's what Bayes is for ;) Train it as spam with "sa-learn", and pretty soon any mail in Spanish will be marked as spam.

I get lots of spam in Turkish and that works great for me...

#
On July 16, 2003 07:17 AM Sue added:

Would anyone be kind enough to translate (I speak English only) the technical language above? My problem is the same as many of yours -- I get up to 300 Spanish spams per day (some of them very naughty). I have spamkiller on my forwarding service and on earthlink, but it still comes in. What can I do as a non-technical person? I can't write code. Thanks!

#
On July 20, 2003 12:06 AM Chris Adams added:

You can list the languages you receive legitimate email in using these two SpamAssassin options:

ok_locales en
ok_languages en

This did the trick with the Korean / Thai / Arabic spam I was receiving.

#
On October 15, 2003 03:48 AM barx added:

> Foreign language spam - it's a problem

every %¤#¤& spam i get is in a foreign language - english! anyone got a procmail filter?

barx from norway

#