Yahoo Slurp Spider Drives Forum Server Load Through the Roof!
Not long ago, I changed my hosting setup at Pair Networks, adding a second server solely to handle our databases and leaving our primary server responsible for page requests only. I did this because our server load was consistently high. High server load is bad because it means your server is bogging down, and a bogged-down server means slow websites. Website visitors are very intolerant of slow websites, and a recent survey showed that 75% of visitors to a slow e-commerce site will never shop on that site again. If they act like that on an e-commerce site, more than likely they’ll do the same on any other site as well.
The mystery is that my server load was still relatively high even after this change. Yes, there was an improvement, but the load was still not in the range I would have expected. Yesterday morning, I went over to the PCMech Forums, as I do every morning, to check things out. I noticed that there were over 900 guests on my forum! A couple of weeks ago, I made a post here about the robots.txt file and how you can use it to control spiders across your site. I put two and two together and decided to check the listing of everyone on the forums at that time. Sure enough, the Yahoo Slurp Spider was VERY active on my forums, so much so that there were probably at least 700-800 instances of Yahoo on my forums at the same time!
That is TOTALLY ridiculous. Google drives FAR more traffic to PCMech than does Yahoo, and we’re talking many orders of magnitude. And Google doesn’t send 800 instances of their spider to my site and drag my forums to a crawl. So, Yahoo is inundating my server all to send me a relative pittance in traffic. Come on, Yahoo! I like your company. I like your search engines. But, throttle back your damn spiders!
The solution? I changed my robots.txt file to include the following lines:
User-agent: Slurp
Crawl-delay: 10
This asks Yahoo Slurp to wait at least 10 seconds between visits to my site. You can even up the value if you want; however, I do not think it is a good idea to make it TOO hard for Yahoo to spider your site. After all, they are the next major search engine after Google. The result? Whereas this time yesterday morning my server load was over 1.0, it is now between 0.10 and 0.20. As I type this, PCMech has 236 guests on the forums. That’s a far cry from 900+, and the server load has decreased considerably.
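If you want to double-check that a rule like this parses the way you expect before you rely on it, here is a minimal Python sketch using the standard library's urllib.robotparser. The helper names (crawl_delay_for, max_requests_per_day) are my own for illustration, and the "requests per day" figure is just arithmetic that assumes the spider honors the delay exactly, which is up to the spider.

```python
from urllib.robotparser import RobotFileParser

# The same two lines shown above, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10
"""


def crawl_delay_for(agent, robots_txt=ROBOTS_TXT):
    """Return the Crawl-delay (in seconds) that applies to `agent`, or None."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(agent)


def max_requests_per_day(delay_seconds):
    """Upper bound on daily requests from one spider that obeys the delay."""
    return 86400 // delay_seconds  # 86400 seconds in a day


if __name__ == "__main__":
    delay = crawl_delay_for("Slurp")
    print("Slurp crawl delay:", delay)
    print("Googlebot crawl delay:", crawl_delay_for("Googlebot"))
    print("Max Slurp requests per day:", max_requests_per_day(delay))
```

Since the rule names only Slurp, other spiders get no delay from it; a larger value lowers the daily ceiling proportionally.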
I think Yahoo needs to address this situation. I know they are trying to out-spider Google and compete with them, but in the process, it seems they are over-working the servers of those of us who provide the content they are trying so hard to spider.
I'm David Risley. I've been making my living as a blogger for over a decade. Blogging is my business and how I support my family. With this blog, I'm just gettin' REAL and telling you how this business works.








