Robot using too much bandwidth

  • Hyarion
  • Born
  • Posts: 3

Post 3+ Months Ago

The company I work for has designed a website for a radio station in our city. Everything has been going well, but our internet provider has now contacted us to say that the site is exceeding its bandwidth allowance. I checked the stats (reportmagic), and they show a huge increase:

July-October: about 60Mb/month
November: 1.1Gb
December: 4.4Gb
January (1st-6th): 700Mb
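To compare the partial January figure with the full months, a rough linear projection helps. This is a minimal sketch assuming traffic carries on at the same average daily rate for the rest of January, which the stats don't actually establish; `project_month` is a hypothetical helper name.

```python
def project_month(partial_mb: float, days_elapsed: int, days_in_month: int) -> float:
    """Extrapolate a partial-month total to a full month, ASSUMING the
    average daily rate so far stays constant for the remaining days."""
    daily_rate = partial_mb / days_elapsed
    return daily_rate * days_in_month


# January 1st-6th: 700Mb over 6 days; January has 31 days.
january_estimate = project_month(700, 6, 31)
print(f"Projected January total: {january_estimate:.0f}Mb")
```

At that rate January would land at roughly 3.6Gb, in the same range as December's 4.4Gb rather than back near the 60Mb/month baseline.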

Looking at the domains, 68% of all traffic is coming from the .jp domain, specifically proxy3a.nagaokaut.ac.jp. The awstats stats show that in the last 6 days alone, an "Unknown robot (identified by 'crawl')" has requested over 10,000 pages. I'm assuming this robot is coming from that Japanese site (a university in Japan).
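The stats packages only summarise the raw access log, so totalling the log directly is one way to confirm which host and user agent the 'crawl' requests come from and how many bytes they cost. A rough sketch, assuming the server writes Apache's "combined" log format (the .htaccess setup suggests Apache, but the log format is an assumption). The sample lines are made up: the host is taken from the stats above, while the `crawl/1.0` user agent and the other visitor are hypothetical.

```python
import re
from collections import Counter

# Apache "combined" format (ASSUMED):
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} (?P<bytes>\d+|-) '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

def top_consumers(lines, n=5):
    """Sum response bytes per (host, user agent) pair; return the n largest."""
    totals = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip lines that aren't in the expected format
        size = 0 if m["bytes"] == "-" else int(m["bytes"])
        totals[(m["host"], m["agent"])] += size
    return totals.most_common(n)

# HYPOTHETICAL sample lines, for illustration only.
SAMPLE_LOG = [
    'proxy3a.nagaokaut.ac.jp - - [01/Jan/2000:00:00:00 +0000] "GET /index.php HTTP/1.0" 200 15000 "-" "crawl/1.0"',
    'proxy3a.nagaokaut.ac.jp - - [01/Jan/2000:00:00:05 +0000] "GET /modules.php HTTP/1.0" 200 12000 "-" "crawl/1.0"',
    '192.0.2.10 - - [01/Jan/2000:00:01:00 +0000] "GET /index.php HTTP/1.1" 200 15000 "-" "Mozilla/5.0"',
    '192.0.2.10 - - [01/Jan/2000:00:01:02 +0000] "GET /missing HTTP/1.1" 404 - "-" "Mozilla/5.0"',
]

for (host, agent), total in top_consumers(SAMPLE_LOG):
    print(f"{total:>8} bytes  {host}  {agent}")
```

Lines with escaped quotes inside the request are skipped rather than mis-parsed, which is good enough for spotting the heaviest consumer.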

Is there a way, maybe using robots.txt, to block this unknown robot (assuming it obeys robots.txt), or do you have any other suggestions?

- I should add that the site runs PHP-Nuke.

  • GimmeMyDomains
  • Student
  • Posts: 75

Post 3+ Months Ago

Wouldn't something like this work in your robots.txt file?

User-agent: crawl
Disallow: /

I'm guessing the robot's user agent is called "crawl"; if not, you'd have to enter the robot's actual user-agent name.
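Python's standard `urllib.robotparser` can show how a robots.txt-compliant crawler would read the two lines suggested above. In CPython's parser, the `User-agent` value is matched case-insensitively as a substring of the crawler's name, so anything with "crawl" in its name is shut out. Individual crawlers may apply their own matching rules, and a robot that ignores robots.txt is not affected at all. The user-agent strings below are hypothetical examples.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt suggested in this post.
ROBOTS_TXT = """\
User-agent: crawl
Disallow: /
"""

def allowed(user_agent: str, url: str) -> bool:
    """What a crawler that honours robots.txt would decide for this URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

# HYPOTHETICAL user agents: the first two contain "crawl", the third doesn't.
for agent in ("crawl/1.0", "ExampleCrawler/2.1", "Googlebot/2.1"):
    print(agent, allowed(agent, "http://www.example.com/modules.php"))
```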

I believe you can also block specific IPs.
  • meman
  • Web Master
  • Loc: London Town, Apples and pears and all that crap

Post 3+ Months Ago

I would just ban the hostname.
You won't gain anything by letting them spider your site.

When you use robots.txt to restrict spider bots, you have to assume that they conform to the rules in robots.txt, which they probably don't.
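Banning by hostname doesn't rely on the robot cooperating, because the server checks the client itself. Apache's documentation for host-based `deny from` rules describes matching whole name components (so `apache.org` matches `foo.apache.org` but not `fooapache.org`) and doing a double reverse DNS lookup on the client's IP. The Python below is only a rough model of that logic, not Apache's code; the function names are hypothetical.

```python
import socket
from typing import Optional

def host_matches(hostname: str, domain: str) -> bool:
    """True if hostname is `domain` itself or ends in "." + domain, so only
    whole name components match ("nagaokaut.ac.jp" does not match
    "foonagaokaut.ac.jp")."""
    hostname = hostname.lower().rstrip(".")
    domain = domain.lower().strip(".")
    return hostname == domain or hostname.endswith("." + domain)

def verified_hostname(ip: str) -> Optional[str]:
    """Double reverse DNS: IP -> name, then confirm the name resolves back
    to the same IP. IPv4 only, since gethostbyname_ex() returns IPv4 only.
    Returns None if a lookup fails or the addresses don't match."""
    try:
        name, _aliases, _addrs = socket.gethostbyaddr(ip)
        _, _, forward_ips = socket.gethostbyname_ex(name)
    except OSError:  # covers socket.herror and socket.gaierror
        return None
    return name if ip in forward_ips else None

def should_deny(ip: str, domain: str) -> bool:
    """Deny only when the client's verified hostname falls under `domain`."""
    name = verified_hostname(ip)
    return name is not None and host_matches(name, domain)
```

In this sketch a client with no usable reverse DNS is simply not matched, which is one reason IP-range rules are sometimes used instead of hostnames.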
  • Axe
  • Genius
  • Posts: 5739
  • Loc: Sub-level 28

Post 3+ Months Ago

Yeah, .htaccess would be a better way to ban them.

Robots.txt is voluntary. It's a file that says "Hey, Google, please don't steal all my /images directory" and Google complies. :)

With some random search engine in .jp they're just as likely to say "heh, yeah, whatever" and eat all your bandwidth anyway.

Blocking them through an .htaccess file ensures they can't get any pages from your site at all, because the server refuses their requests. If your site doesn't actually cater to people in Japan, you might as well just ban the whole country.
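Banning a whole country in .htaccess comes down to denying the IP address ranges allocated to it; Apache's `deny from` accepts network/CIDR ranges as well as hostnames. The Python sketch below shows the membership test such a rule performs. The block list is HYPOTHETICAL: reserved documentation ranges stand in for real allocations, which would come from a regional registry or GeoIP database and change over time.

```python
import ipaddress

# HYPOTHETICAL block list: documentation ranges standing in for the
# ranges a registry or GeoIP database would list for a country.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def is_blocked(client_ip: str) -> bool:
    """True if client_ip lies inside any network on the block list."""
    addr = ipaddress.ip_address(client_ip)
    # Membership is simply False when IP versions differ, so IPv6 clients
    # are only caught by IPv6 entries.
    return any(addr in net for net in BLOCKED_NETWORKS)
```

For example, `is_blocked("192.0.2.55")` returns `True` and `is_blocked("203.0.113.7")` returns `False`.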

I've got several Asian countries banned from accessing some of my sites because most of the visitors from them were just stealing content from the sites, republishing it, etc. So, ban Japan, ban Malaysia, ban Indonesia. You know what? The content theft stopped :)
  • Hyarion
  • Born
  • Posts: 3

Post 3+ Months Ago

Thank you, guys. I've added an .htaccess file that blocks the domain sending the requests, as well as bad bots. I found this while looking up how to block domains:

http://www.javascriptkit.com/howto/htaccess13.shtml

I'll post an update tomorrow on whether it's worked. :)
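Bad-bot blocking of this kind generally works by matching the request's User-Agent header against a list of known patterns and refusing matches with 403 Forbidden. A minimal Python sketch of that matching, assuming a HYPOTHETICAL pattern list (the linked tutorial's actual list isn't reproduced here). Note that a bot can slip past it just by sending a different User-Agent.

```python
import re

# HYPOTHETICAL patterns; a published bad-bot list has its own, longer set.
# Case-insensitive, in the spirit of Apache's SetEnvIfNoCase / RewriteCond [NC].
BAD_AGENT_PATTERNS = [r"^crawl", r"ripper", r"downloader"]
BAD_AGENTS = re.compile("|".join(BAD_AGENT_PATTERNS), re.IGNORECASE)

def status_for(user_agent: str) -> int:
    """403 Forbidden for a blacklisted user agent, otherwise 200 OK."""
    return 403 if BAD_AGENTS.search(user_agent) else 200
```

The `^` anchor applies only to the first alternative, so `crawl/1.0` is refused while `MyCrawler` is not; the other patterns match anywhere in the string.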
  • Hyarion
  • Born
  • Posts: 3

Post 3+ Months Ago

I've had to change the .htaccess file to

deny from [domainname]

and it's reduced the bandwidth from 199Mb to 38Mb!
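For scale, going from 199Mb to 38Mb is roughly an 81% reduction. The post doesn't say what period the two figures cover, so this sketch only compares them as given; `percent_reduction` is a hypothetical helper.

```python
def percent_reduction(before: float, after: float) -> float:
    """Percentage drop going from `before` to `after` (same units)."""
    return (before - after) / before * 100

print(f"{percent_reduction(199, 38):.1f}%")  # prints 80.9%
```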
  • edawg
  • Graduate
  • Posts: 105

Post 3+ Months Ago

How much bandwidth is too much for a robot to use? I have a smaller site, and MSN seems to take about 7 megs per visit. Is that normal?
  • Axe
  • Genius
  • Posts: 5739
  • Loc: Sub-level 28

Post 3+ Months Ago

How much is really going to be determined by the size of your site and the amount of bandwidth available to you.

Search engines use about 10-12Gig of bandwidth a month on one of my sites. Regular visitors are about another 60 or so Gig/mo.

If search engines were using that much and I were only using about 5Gig on human visitor traffic, then 10-12Gig would be too much as far as I'm concerned :)
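One way to read that rule of thumb is to look at robot traffic as a share of total traffic. A sketch using the figures from this post (10-12Gig of robots against about 60Gig of visitors, and the hypothetical case of about 5Gig of visitors); where to draw the "too much" line is still a judgment call.

```python
def bot_share(bot_gb: float, human_gb: float) -> float:
    """Fraction of total monthly traffic consumed by search-engine robots."""
    return bot_gb / (bot_gb + human_gb)

# Figures from the post: the real site, then the hypothetical 5Gig case.
for bots, humans in [(10, 60), (12, 60), (10, 5), (12, 5)]:
    print(f"{bots}Gig robots / {humans}Gig visitors -> {bot_share(bots, humans):.0%} robots")
```

On the real site robots take roughly 14-17% of the traffic; in the hypothetical case they would take roughly two thirds or more.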

As far as what's normal, again, it really depends on the size of your site. The more pages you have, the more often it's updated, the more search engines will visit your pages and the more bandwidth they'll use.

Post Information

  • Total Posts in this topic: 8 posts

© 1998-2014. Ozzu® is a registered trademark of Unmelted, LLC.