Foreign language spam - it's a problem
New milestone in my Internet life -- I have just received my very first piece of spam in Spanish. After the initial chuckle and some lame jokes shared with various buddies, I realized the real problem.
SpamAssassin completely missed it.
X-Spam-Status: No, hits=0.0 required=5.0
Nada! Not even half a point!
Why?
It's a plain-text email without obviously forged headers, and SpamAssassin relies heavily on commonly used phrases.
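To make the miss concrete, here is a minimal Python sketch of phrase-based scoring. The rules and point values are made up for illustration (they are not SpamAssassin's real rules or weights); only the 5.0 threshold comes from the header above. English phrase rules simply never fire on Spanish text, so the score stays at zero.

```python
import re

# Hypothetical phrase rules and scores, for illustration only;
# these are NOT SpamAssassin's real rules or weights.
PHRASE_RULES = [
    (re.compile(r"\bclick here\b", re.I), 1.5),
    (re.compile(r"\bmoney back guarantee\b", re.I), 2.0),
    (re.compile(r"\bact now\b", re.I), 2.0),
]
REQUIRED = 5.0  # threshold taken from the X-Spam-Status header


def score(body):
    """Sum the points of every phrase rule that matches the body."""
    return sum(points for pattern, points in PHRASE_RULES if pattern.search(body))


def is_spam(body):
    """A message is flagged once its score reaches the threshold."""
    return score(body) >= REQUIRED


english = "Act now! Click here for our money back guarantee."
spanish = "Actúe ahora. Haga clic aquí, garantía de devolución de dinero."

print(score(english), is_spam(english))
print(score(spanish), is_spam(spanish))
```

The Spanish message says much the same thing as the English one, but none of the English patterns match it, so nothing is added to its score.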
This is going to be a nightmare if this first e-mail indicates that I've gotten on some foreign-based spam lists... unless, of course, foreign-language support is built into SpamAssassin. Once again, we rely too heavily on the whole world speaking English.
Oy
Comments
Score one for Bayesian filtering: I get a bit of Portuguese spam, and a ton in Chinese/Japanese/Korean, and since I never get non-spam in any of those languages it only takes one to teach POPFile that they are all spam.
Posted by: Phil Ringnalda | July 9, 2003 11:44 AM
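Phil's point can be sketched with a toy word-frequency classifier in Python. This is an illustrative naive Bayes with add-one smoothing, not POPFile's actual implementation; the class name and the sample messages are invented. After training on a single Spanish spam, a different Spanish message shares enough common words to land in the spam bucket.

```python
import math
import re
from collections import Counter


class TinyBayes:
    """Toy naive Bayes classifier; a sketch, not POPFile's real code."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    @staticmethod
    def tokens(text):
        # Lowercased runs of word characters (accented letters included).
        return re.findall(r"\w+", text.lower())

    def train(self, label, text):
        self.words[label].update(self.tokens(text))
        self.messages[label] += 1

    def log_score(self, label, text):
        counts = self.words[label]
        total = sum(counts.values())
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        # Log prior, smoothed so an untrained class never gives log(0).
        result = math.log((self.messages[label] + 1) / (sum(self.messages.values()) + 2))
        for tok in self.tokens(text):
            # Add-one (Laplace) smoothing for words unseen in this class.
            result += math.log((counts[tok] + 1) / (total + vocab))
        return result

    def classify(self, text):
        return max(("spam", "ham"), key=lambda label: self.log_score(label, text))


bayes = TinyBayes()
bayes.train("ham", "Meeting notes attached, see you on Monday")
bayes.train("ham", "Can you review the patch before lunch")
bayes.train("spam", "Gane dinero desde su casa, oferta gratis para usted")

print(bayes.classify("Oferta especial para su negocio, llame gratis"))
print(bayes.classify("See you at lunch on Monday"))
```

The catch, as later comments note, is that this only works when you never receive legitimate mail in the language you trained as spam.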
Lo siento mucho
Posted by: Steve Friedl | July 9, 2003 11:48 AM
Yup, agree with Phil on this... bogofilter has been filtering out non-English spam for a while now. I have a nice little procmail rule to drop unreadable mail (read: Chinese/Japanese/Korean spam). Not sure if it's useful to you at all, but here it is anyway:
## Silently drop all completely unreadable mail (it is filed into the "spam" mailbox)
## Each "* 1^0" condition adds 1 to the score if it matches, so either one is enough
:0:
* 1^0 ^\/Subject:.*=\?(.*big5|iso-2022-jp|ISO-2022-KR|euc-kr|gb2312|ks_c_5601-1987|windows-1251|windows-1256)\?
* 1^0 ^\/Content-Type:.*charset="(.*big5|iso-2022-jp|ISO-2022-KR|euc-kr|gb2312|ks_c_5601-1987|windows-1251|windows-1256)
spam
####
Posted by: Arcterex | July 9, 2003 01:09 PM
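For readers who don't use procmail, here is a rough Python sketch of the same idea as the recipe above: look for one of the listed charsets in the Subject or Content-Type header. The function name is invented, and the matching is slightly different from the recipe (it does not keep the recipe's `.*` prefix before `big5`, and it accepts an unquoted charset), so treat it as an approximation.

```python
import re
from email import message_from_string

# Charset names copied from the procmail recipe above.
BLOCKED = ("big5", "iso-2022-jp", "iso-2022-kr", "euc-kr", "gb2312",
           "ks_c_5601-1987", "windows-1251", "windows-1256")
CHARSETS = "|".join(re.escape(name) for name in BLOCKED)

# An RFC 2047 encoded word in a Subject looks like =?charset?B?...?=
SUBJECT_RE = re.compile(rf"=\?(?:{CHARSETS})\?", re.I)
CONTENT_TYPE_RE = re.compile(rf"charset=\"?(?:{CHARSETS})", re.I)


def is_unreadable(raw_message):
    """Return True if either header names one of the blocked charsets."""
    msg = message_from_string(raw_message)
    subject = str(msg.get("Subject", ""))
    content_type = str(msg.get("Content-Type", ""))
    return bool(SUBJECT_RE.search(subject) or CONTENT_TYPE_RE.search(content_type))


korean = ('Subject: =?euc-kr?B?xde9usau?=\n'
          'Content-Type: text/plain; charset="euc-kr"\n\nbody\n')
english = ('Subject: Lunch on Monday?\n'
           'Content-Type: text/plain; charset="us-ascii"\n\nSee you then.\n')

print(is_unreadable(korean))
print(is_unreadable(english))
```

Like the recipe, this judges a message purely by its declared charset, so it is only safe if you never expect legitimate mail in those encodings.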
Agh, used to get so much Korean spam it was crazy. 10-30/day. Awful.
Posted by: Jeremy C. Wright | July 9, 2003 02:33 PM
For me the problem is reversed: most of the spam I get is in a foreign language (usually English), but I also receive legitimate English mail, which is sometimes classified as spam.
I'm using the Bayesian filter of the Mail program in Mac OS X, which usually works rather well but can have problems with spam-like legitimate messages in English. Fortunately, there is virtually no spam in Finnish at all, which makes it easy to avoid wrong classifications in my native language.
Posted by: Juha Haataja | July 10, 2003 03:29 PM
For me, the DNSBL at blackholes.easynet.nl (run by the Netherlands branch of Easynet, who also happen to be my ISP here in the UK) has blocked a huge amount of spam, and I have yet to hear of any false hits.
As a very large European ISP, they have info on the latest spam mails in their DNSBL, so very few spam messages slip through.
Posted by: Baba | July 11, 2003 12:43 PM
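For anyone wondering what a DNSBL check actually does: the mail server reverses the octets of the sending IP address, appends the list's zone, and looks that name up in DNS. A minimal Python sketch follows; the function names are invented, the IP is a documentation address, and the zone is simply the one named in the comment above (the sketch does not assume it still answers queries).

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the name a DNSBL lookup asks for: reversed IPv4 octets + zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the zone answers with an address for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # No answer means "not listed"; this sketch also treats lookup
        # failures (timeouts, retired zones) as "not listed".
        return False


# Only the name is built here; no DNS query is sent.
print(dnsbl_query_name("192.0.2.99", "blackholes.easynet.nl"))
# prints 99.2.0.192.blackholes.easynet.nl
```

Because the check looks only at the sender's address, it works the same whatever language the spam is written in.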
Kasia -- that's what Bayes is for ;) Train it as spam with "sa-learn", and pretty soon any mail in Spanish will be marked as spam.
I get lots of spam in Turkish and that works great for me...
Posted by: Justin | July 11, 2003 02:41 PM
Would anyone be kind enough to translate (I speak English only) the technical language above? My problem is the same as many of you--I get up to 300 Spanish spams per day (some of them very naughty). I have SpamKiller on my forwarding service and on EarthLink, but it still comes in. What can I do as a non-technical person? I can't write code. Thanks!
Posted by: Sue | July 16, 2003 07:17 AM
You can list the languages you receive legitimate email in using these two options:
ok_locales en
ok_languages en
This did the trick with the Korean / Thai / Arabic spam I was receiving.
Posted by: Chris Adams | July 20, 2003 12:06 AM
> Foreign language spam - it's a problem
every %¤#¤& spam i get is in a foreign language - english! anyone got a procmail filter?
barx from norway
Posted by: barx | October 15, 2003 03:48 AM