A scenario I encountered recently has me stumped..
- httpd is at its max, showing many backed-up requests
- Incoming traffic is far below normal and pages load incredibly slowly (2-3 minutes per request). The low traffic is understandable, since most clients connecting are seeing timeouts and slow page loads. I'm measuring traffic at the eth level, so it's not just Apache traffic but all traffic to the machine.
- outgoing traffic is also far below normal
- machine is near idle, far below normal on cpu and memory usage
- Plain text pages load just as slowly as mod_perl pages
- Restarting httpd doesn't help.. server goes right up to maxing out on requests
- Cycling the box doesn't help.
- HTTP servers downstream are zippy and mostly idle, receiving only a few requests from the main server (the problematic one)
- Everything else on the machine is fine.. no other issues that can be found.. log files are at their typical size, hard drive has plenty of room
- Firewall is logging absolutely nothing unusual
- No errors in any log files (systems and httpd)
- Scan of access_log shows nothing out of ordinary (other than fewer requests than normal due to slowness)
- The same slowness occurs when loading pages locally on the machine itself.
- Nameserver isn't the issue either.. nslookup queries against the two configured servers were zippy.
Now, after a couple of hours of tearing my hair out, I shut the site down, at which point it was serving only a very small text file with a "technical difficulties" msg for any request. This file was taking just as long to load as the pages did when the site was active. After 15 minutes or so of shutdown, the problem went away. I turned the site back on.. it's been on for hours, and the problem has not returned.
I'm suspecting some sort of httpd attack that threw Apache into some internal loop.. but then again, the CPUs were nearly idle, so it wasn't processing hard.. just spawning processes at the max and not returning pages.
Anyone have any ideas?
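One way to put numbers on the "small text file takes minutes" symptom is to split a request into phases. The following is only a sketch: the function name `time_http_phases` and its parameters are made up for illustration, and it assumes plain HTTP/1.0 so the server closes the connection when it is done. If the connect phase is quick but the first byte takes minutes, that would be consistent with the connection sitting in the listen queue while every httpd child is tied up, though it does not say what the children are waiting on.

```python
import socket
import time


def time_http_phases(host, port=80, path="/", timeout=300.0):
    """Time TCP connect, wait for first response byte, and body transfer.

    Hypothetical helper for illustration; returns seconds per phase.
    """
    t0 = time.monotonic()
    sock = socket.create_connection((host, port), timeout=timeout)
    t_connect = time.monotonic()
    try:
        # HTTP/1.0 request, so the server closes the socket after replying.
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")
        sock.sendall(request)
        sock.recv(1)  # blocks until the server starts answering (or closes)
        t_first = time.monotonic()
        while sock.recv(65536):  # drain the rest until the server closes
            pass
        t_done = time.monotonic()
    finally:
        sock.close()
    return {
        "connect": t_connect - t0,
        "first_byte": t_first - t_connect,
        "transfer": t_done - t_first,
    }
```

Running it against the local interface and against a downstream server, e.g. `time_http_phases("127.0.0.1")`, would show whether the delay sits in front of the request being handled or inside it.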
Total stab in the dark: is your MaxKeepAliveRequests set too high?
#An attack where you open up a bunch of connections (lots of 'em) and make no requests, or where you do an HTTP version of a smurf attack, can do this. It would have been interesting to see netstat -an output and a minute or so of strace output from one of your httpd children.
#That's what I didn't do.. strace.. I checked netstat and it wasn't unusual.
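For anyone who wants to eyeball `netstat -an` output for the "lots of connections, no requests" pattern, here is a rough sketch that tallies TCP states and connections per remote address on the web port. It assumes the Linux column layout (proto, recv-q, send-q, local, foreign, state) with `address:port` notation; the function name and the sample rows are invented for illustration and are not output from the machine in question.

```python
from collections import Counter


def tally_tcp_states(netstat_output, port=80):
    """Count TCP states and per-remote connections for one local port.

    Assumes Linux-style `netstat -an` rows; hypothetical helper.
    """
    states = Counter()
    per_remote = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UDP rows, and unix-socket rows.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        states[state] += 1
        if state != "LISTEN":
            per_remote[remote.rsplit(":", 1)[0]] += 1
    return states, per_remote


# Made-up sample rows using documentation address ranges.
SAMPLE = """\
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN
tcp        0      0 192.0.2.10:80           198.51.100.7:40112      ESTABLISHED
tcp        0      0 192.0.2.10:80           198.51.100.7:40113      ESTABLISHED
tcp        0      0 192.0.2.10:80           203.0.113.5:51000       SYN_RECV
tcp        0      0 192.0.2.10:22           203.0.113.5:51001       ESTABLISHED
"""

states, per_remote = tally_tcp_states(SAMPLE)
for state, count in states.most_common():
    print(state, count)
for remote, count in per_remote.most_common(5):
    print(remote, count)
```

A pile of ESTABLISHED connections from a handful of addresses, or a large SYN_RECV count, would point toward the kind of attack described; an ordinary spread would not rule out other causes.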
#lsof may have given you a better clue as to what apache was doing...
It sounds as though Apache was behaving exactly as sshd or an SMTP daemon would if reverse DNS was severely broken or slow... do you have Apache set to resolve host names from IPs? If so, that may have been the problem, especially if the nameservers from your resolv.conf were fubar'd (not just down... but FUBARd)
Definitely odd, though...
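A quick way to test the reverse-DNS theory is to time reverse lookups for a few client IPs pulled from access_log. This is a sketch only: `time_reverse_lookup` is a made-up helper, and the `resolver` parameter exists purely so the function can be exercised without touching the network. It uses the system resolver through `socket.gethostbyaddr`, so it goes through the same resolv.conf that httpd would. In Apache the relevant directive is `HostnameLookups`.

```python
import socket
import time


def time_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return (hostname or None, seconds taken) for a reverse lookup.

    Hypothetical helper; `resolver` defaults to the system resolver.
    """
    start = time.monotonic()
    try:
        name = resolver(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are subclasses
        name = None
    return name, time.monotonic() - start


if __name__ == "__main__":
    # Documentation-range addresses as placeholders; substitute real
    # client IPs from access_log.
    for ip in ["192.0.2.1", "198.51.100.7"]:
        name, elapsed = time_reverse_lookup(ip)
        print(f"{ip} -> {name} ({elapsed:.2f}s)")
```

Lookups that take many seconds before failing would support this explanation; fast answers or fast failures would argue against it.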
#DNS was not an issue.. checked that. I did run lsof at one point but didn't save the output, hm, should have..
#A failing hard drive can cause those effects.
#A failing network card, too.
#I'll blame this one on Apache, pure and simple.
I have seen Apache do crazy things of this sort way too many times in my lifetime. My favourite (happened three times in a period of about 4-5 months) was when Apache just went on a malloc binge, sucking up everything it could until swap was exhausted (the box has 1GB, swap is 2GB). The 2nd and 3rd times I saw this, I was live on the machine and managed to stop it -- all I did was stop/start the daemon again. Yeah, uh, what the hell. I don't even use mod_perl (the only modules, besides stock, are mod_watch and mod_php4). ktrace, fstat, and lsof showed absolutely nothing out of the ordinary, other than Apache calling clock-related functions way more often than it should.
I stopped filing bug reports because I still have a security-related Apache bug report that remains open (and unresolved!) since _1997_. I have absolutely no respect for the current Apache team other than Ken Coar. I hope Marc Slemko dies a slow and painful death someday.
There were some infinite loop bugs fixed in 1.3.28 and 2.0.x recently; I have no idea if they're analogous to what you experienced, though. :-) My point is that Apache does crazy shit sometimes, and well, you may have just witnessed it.