There Is No Cat

The alternative to flowers!

Sunday, May 14, 2006

Fighting comment spam

Comment spammers have been attacking There Is No Cat for the past three weeks or so. It's been kind of fun doing battle with them, although I have to say, I'm getting kind of tired of it.

There Is No Cat runs a content management system of my own creation. One of the benefits of this is that it's relatively immune to comment spam. I would occasionally get some manual drive-by spams, but nothing too bad. Almost nobody is going to bother to take the time to custom code a spam system to hit a single system run by a host with only fair-to-middling Google whuffie. Almost nobody.

The first run at my server three weeks ago was clearly a test run. Over about three hours on a weekend, I received 110 comments containing nonsensical text and no links. It was clear someone was preparing for something, and that was what made me think some custom coding had been done. I caught the spam a few hours after the initial attack ended. With so many spam comments, it was just easier for me to go into MySQL and manually nuke all the comments at once. Before I did that, I saved a copy of the database and loaded it on my computer at home so I could analyze the attack and where it came from at my leisure.

A few days later, the real comment spam started showing up. With each attack, I would block the class C network from which it came, which slowed things down. But I also started noting characteristics of the spam, such as a particular misspelled word, or a method of trying to include URLs. One advantage of having written my own system is that it was relatively simple for me to go into the code and add some filters for these characteristics.

The attacks grew more frequent over the following days, and brought some new characteristics with them. I added some more filtering, along with a logging capability that recorded the IP address and which filter triggered the spam-blocking code. At this point, I was catching about 98% of the spam. I could have caught 100%, but one of the phrases I would have had to filter on seemed too likely to filter out legitimate comments.
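The CMS code itself isn't shown here, so what follows is only a minimal sketch in Python of how this kind of characteristic-based filtering with logging might be structured. The filter names, the patterns, and the `handle_comment` function are all hypothetical, not the rules actually used on this site.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical filters: each maps a name to a test on the comment text.
# The real characteristics (the misspelled word, the URL method) are not
# given in the post, so these are placeholders.
FILTERS: Dict[str, Callable[[str], bool]] = {
    "bbcode_url": lambda t: "[url=" in t.lower(),
    "many_links": lambda t: len(re.findall(r"https?://", t, re.IGNORECASE)) >= 3,
    "misspelling": lambda t: re.search(r"\bexampel\b", t, re.IGNORECASE) is not None,
}

def check_comment(text: str) -> Optional[str]:
    """Return the name of the first filter that matches, or None."""
    for name, matches in FILTERS.items():
        if matches(text):
            return name
    return None

def handle_comment(ip: str, text: str, log: List[Tuple[str, str]]) -> bool:
    """Return True to accept the comment; otherwise log (ip, filter) and reject."""
    triggered = check_comment(text)
    if triggered is not None:
        log.append((ip, triggered))
        return False
    return True
```

Recording which filter fired, and not just that something was blocked, is what makes it possible to see later whether a given rule is catching spam or legitimate comments.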

At this point, I looked at my server logs to see if I could discern any patterns over the previous few weeks. Invariably, just before an attack on a particular page, that page would be fetched with a GET request from the IP address 72.232.92.142, which resolves to 142.92.232.72.reversedns.resolve.ru. I found one instance of this IP address being mentioned on a Polish bulletin board as a source of spam. Okay, so it looked like I was dealing with a Russian spammer. Searching the ARIN Whois database, I discovered that the net block for this IP address belongs to a company in St. Petersburg:

CustName: Internet Technologies Ltd
Address: Rustavele 48/1 of. 42
Address: IP Management Department
City: Saint Petersburg
StateProv: Saint Petersburg
PostalCode: 199000
Country: RU

In fact, said company owns more than one segment of IP addresses.

I added the following lines to my .htaccess file to prevent them from accessing my site from their spam seeding host at any of their possible IP addresses:

# Russian spammer
Deny from 72.232.92
Deny from 72.232.93
Deny from 72.36.222
Deny from 72.36.223
Deny from 72.36.244
Deny from 72.36.245

This is actually a little broader than it needs to be; not all of the subnets this company owns are full Class C networks. But I didn't feel like being charitable.
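To make the "broader than it needs to be" point concrete: a three-octet partial address in a `Deny from` line covers the whole 256-address /24. Here is a small sketch using Python's standard `ipaddress` module to expand those prefixes and test an address against them. The helper names `prefix_to_network` and `is_denied` are made up for illustration.

```python
import ipaddress

def prefix_to_network(partial: str) -> ipaddress.IPv4Network:
    """Turn a partial address like '72.232.92' into the network it covers."""
    octets = partial.split(".")
    if not 1 <= len(octets) <= 3:
        raise ValueError("expected one to three octets")
    # Pad the missing octets with zeros; each given octet is 8 prefix bits.
    padded = octets + ["0"] * (4 - len(octets))
    return ipaddress.ip_network(".".join(padded) + "/" + str(8 * len(octets)))

# The six prefixes from the .htaccess lines above.
DENIED = [prefix_to_network(p) for p in (
    "72.232.92", "72.232.93",
    "72.36.222", "72.36.223",
    "72.36.244", "72.36.245",
)]

def is_denied(ip: str) -> bool:
    """True if the address falls inside any denied network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DENIED)
```

With this, `is_denied("72.232.92.142")` is true for the seeding host. Blocking exactly what the company owns would mean using the precise ranges from the Whois record instead of whole /24s.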

This stopped the spam seeding accesses, but the actual spam attacks still came (although my countermeasures were still catching 98% of them).

After three weeks of logging the attacks, I had about 800 accesses documented. I wasn't sure if the spammer was spoofing IP addresses, in which case the IP addresses attacking me would likely be completely random, or if he was operating a bot net of compromised hosts, in which case the same limited number of IP addresses would likely show up over and over.

Well, they weren't all that limited, but it appeared that most of the IP addresses were used multiple times. There were a few with only single accesses, but most had between three and ten attack instances logged by my filters. And it was clear that in most cases there were only one or two machines on a subnet attacking the site. Probably a bot net, then. In any case, a limited set of IP addresses was being used. So I picked out single lines for each machine and wound up adding 249 individual hosts to my .htaccess list of hosts denied access to the site. I did that about 24 hours ago. Since then, fingers crossed, no spam, and no additions to my log file of blocked attempts. You're welcome to look at the list of hosts; if this same scumbag is attacking you, maybe you'll find it useful.
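The counting described above can be sketched roughly as follows. This assumes the block log has already been parsed into `(ip, filter_name)` pairs; that format, and the function names, are assumptions for illustration, not the actual log layout.

```python
import ipaddress
from collections import Counter, defaultdict

def summarize(entries):
    """Count blocked attempts per IP and group attacking hosts by /24 subnet.

    entries: iterable of (ip, filter_name) tuples from the block log.
    """
    hits = Counter(ip for ip, _ in entries)
    by_subnet = defaultdict(set)
    for ip in hits:
        # strict=False lets us pass a host address and get its enclosing /24.
        net = ipaddress.ip_network(ip + "/24", strict=False)
        by_subnet[net].add(ip)
    return hits, by_subnet

def deny_lines(hits):
    """One 'Deny from' line per distinct host, in numeric address order."""
    ordered = sorted(hits, key=ipaddress.ip_address)
    return ["Deny from " + ip for ip in ordered]
```

If the addresses were spoofed at random, nearly every entry in `hits` would have a count of one. Repeated counts per host, with only a host or two per subnet, is the pattern that points to a bot net.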

I hope this is the end of this. Why someone would bother to attack a system with one host is beyond me. It would seem to me to be more worthwhile from the perspective of the spammer to attack systems like WordPress or Movable Type. Of course, it's possible they're using a system that just parses any random comment form and attacks that way, without any special knowledge of how the system is set up by default, in which case my use of a unique CMS wouldn't afford me any extra protection. But the test run made me think that maybe that wasn't the case here. Fortunately, after almost 20 years of using computers online, I not only know my way around networks, but also have an in-house network forensics expert to bounce ideas off of....


Posted at 10:25 PM

Comments

Note: I’m tired of clearing the spam from my comments, so comments are no longer accepted.

My blog isn't of my own making, but it is open source. It is a PHP blog called Simplog. It is not the best or worst, and I estimate maybe 20-100 people use it at most. Maybe your spammer is guessing your CMS has some other users somewhere.

I have tweaked Simplog in my own unique way to block spammers. I offer people making comments a spot to put in their e-mail address, but if they do, they get a message telling them not to, and their comment is shipped to /dev/null. This also forces anonymity, but that is the way I run my blog anyway. Spambots don't get this and keep trying to fill in an e-mail address, but never get through. It is a very simple and effective modification, until they learn better. By simply going against expectations, and the dominant web paradigm, I have cut out all spambots, as far as I can tell.

I also had a problem with spam trackbacks. I had thousands of trackbacks on my old posts on my blog. Since no one I know ever used trackbacks, I just disabled them and deleted the links from my MySQL database. Trackbacks have never really seemed to catch on in a big way.

Posted by lilbro at 8:17 AM, May 15, 2006 [Link]

I didn't get too many trackback spams; I turned it off for older posts. But I noticed about twenty this morning for the first time in a while, and my tolerance for spam is particularly low right now. I never got many trackbacks, but probably more than you. In any case, not enough to make keeping it around worthwhile. I've disabled it. One less potential problem to worry about.

Goddamned spammers.

Posted by ralph at 6:58 AM, May 24, 2006 [Link]


This site is copyright © 2002-2024, Ralph Brandi.

What do you mean there is no cat?

"You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles. Do you understand this? And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat."

- Albert Einstein, explaining radio


There used to be a cat

[ photo of Mischief, a black and white cat ]

Mischief, 1988 - December 20, 2003

[ photo of Sylvester, a black and white cat ]

Sylvester (the Dorito Fiend), who died at Thanksgiving, 2000.


