Some Bad Bots ignore htaccess?

  • mico
  • Beginner
  • Beginner
  • User avatar
  • Posts: 54
  • Loc: Neo Universe

Post 3+ Months Ago

I've been trying to block some bad bots in my .htaccess, but why do the bots still keep coming and crawling my site?
Did I write it wrong? Here it is (with those keep-coming-back bots):
Code:
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider [OR]   <-- or Baiduspider+ ?
RewriteCond %{HTTP_USER_AGENT} ^Googlebot [OR]     <-- i hate it much!
RewriteCond %{HTTP_USER_AGENT} ^msnbot [OR]        <-- directory index forbidden
RewriteCond %{HTTP_USER_AGENT} ^Twiceler [OR]      <-- the worst! worst! worst! directory index forbidden
RewriteCond %{HTTP_USER_AGENT} ^spbot
RewriteRule ^.* - [F,L]


Any mistake?

And when I add "Yahoo! Slurp" to the list, my website returns a 500 error when it's accessed.
What's the proper way to write Yahoo! Slurp in the list above?

Thanks in advance.

  • joebert
  • Fart Bubbles
  • Genius
  • User avatar
  • Posts: 13503
  • Loc: Florida

Post 3+ Months Ago

There's nothing you can do to prevent them from coming back; you can only control what they see. If you look in the log files for their requests, are you seeing HTTP 200 OK status codes, or 403 Forbidden?

When you add Yahoo, use "Slurp" instead of "Yahoo!".
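
For what it's worth, here is a minimal sketch of how the Slurp line could slot into the existing block. The 500 most likely comes from the unquoted space in "Yahoo! Slurp": mod_rewrite reads whatever follows the space as an extra argument. Matching on "Slurp" alone, without the ^ anchor (Slurp appears mid-string in Yahoo's user agent), avoids that:

Code:
# Sketch: same block as above, with a Slurp condition added
RewriteCond %{HTTP_USER_AGENT} ^Baiduspider [OR]
RewriteCond %{HTTP_USER_AGENT} ^Googlebot [OR]
RewriteCond %{HTTP_USER_AGENT} ^msnbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^Twiceler [OR]
RewriteCond %{HTTP_USER_AGENT} Slurp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^spbot
RewriteRule ^.* - [F,L]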
  • mico
  • Beginner
  • Beginner
  • User avatar
  • Posts: 54
  • Loc: Neo Universe

Post 3+ Months Ago

Alright, I get it.
They all get sent to 403.

But this Twiceler keeps trying to crawl even the forbidden/protected areas AND any broken links I've since fixed (maybe from a cache of its earlier crawls?). It still ends up with 403s and 404s (that one creeps me out!) anyway.

Thanks for helping me again, master joebert.
  • webmaster[+-]
  • Beginner
  • Beginner
  • User avatar
  • Posts: 44
  • Loc: UK

Post 3+ Months Ago

Perhaps you forgot to turn RewriteEngine on?

This should be before your "badbot" lines...

Code:
Options +FollowSymlinks
RewriteEngine on


Use exact bot names and redirect them somewhere else... this will stop them crawling your site for good.
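
For example, a rough sketch of what that redirect could look like (the bot name is taken from this thread and the target URL is just a placeholder, not a recommendation of where to send them):

Code:
# Hypothetical sketch: send a matched bot a redirect instead of serving the page
RewriteCond %{HTTP_USER_AGENT} Twiceler [NC]
RewriteRule ^.* http://example.com/ [R=301,L]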


But I really <3 Googlebot... :roll:
  • mico
  • Beginner
  • Beginner
  • User avatar
  • Posts: 54
  • Loc: Neo Universe

Post 3+ Months Ago

Thanks for the reply, webmaster[+-].
I've turned it on, but somehow this damn Twiceler seems out of control. It even frequently tries to access forbidden areas. Hardly amuses me.

Actually I need Googlebot too ^^
But I don't like the way it lists all my page links when they're supposed to be viewed the way I want (like in frames; I mean I don't want anybody to learn the link paths directly from Google). Ugly!

@ clcheapshoes520
How to know what?
  • tastysite
  • Proficient
  • Proficient
  • User avatar
  • Posts: 349
  • Loc: Brighouse, West Yorkshire, England

Post 3+ Months Ago

There are some robots that do ignore your htaccess file; there's not much you can do about it but hope they go away. However, I googled Twiceler and it seems legitimate, so I'm not sure. Are you SURE you know what it responds to? I have never heard of it, and I tend to block all but MSN (now Bing), Google (I know you have blocked it, but I think the fact that it is the most-used search engine makes up for the bandwidth), Yahoo, and any other ones I find that are good.

Plus, and this may be true for most search sites, there are two Google robots: Googlebot for pages and Googlebot-Image for getting the pictures off your site. Blocking one does not block the other.
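
If it helps, here is a hedged sketch of blocking only the image crawler while leaving regular Googlebot alone (assuming the image crawler identifies itself with "Googlebot-Image" in its user agent, which plain Googlebot does not):

Code:
# Sketch: deny only Google's image crawler; regular Googlebot is unaffected
RewriteCond %{HTTP_USER_AGENT} Googlebot-Image [NC]
RewriteRule ^.* - [F,L]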
  • mico
  • Beginner
  • Beginner
  • User avatar
  • Posts: 54
  • Loc: Neo Universe

Post 3+ Months Ago

@tastysite
"There are some robots that do ignore htaccess"
WOW! That's surprising! I thought htaccess was the almighty that nothing could escape from its commands. *pale*

I don't care if Twiceler is legitimate or anything, knowing that it keeps trying to access forbidden directories on my website.
In the end, I blocked all of the Twiceler bots' IPs.
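
(For anyone curious, blocking IPs in .htaccess usually looks something like this sketch; the addresses below are placeholders, not Twiceler's real ranges:)

Code:
# Hypothetical sketch: deny a few crawler IPs outright (placeholder addresses)
Order Allow,Deny
Allow from all
Deny from 192.0.2.10
Deny from 198.51.100.0/24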

I've put grandpa Google's bot under my surveillance. But I hate its image bot :p

Thank you for your help, tastysite!
  • mico
  • Beginner
  • Beginner
  • User avatar
  • Posts: 54
  • Loc: Neo Universe

Post 3+ Months Ago

By the way, I have one more devil that keeps trying to grab my data via libwww-perl.
I've added it to my .htaccess in a few ways, but it keeps coming. Does anyone know how to kick libwww-perl out of my site?
It keeps coming all the time, every day :evil:

Here's my htaccess for it:
Code:
RewriteCond %{HTTP_USER_AGENT} ^libwww-perl [OR]
RewriteCond %{HTTP_USER_AGENT} ^libwwwperl [OR]
RewriteCond %{HTTP_USER_AGENT} ^(libwww-perl|widow|wisenutbot|wwwoffle|xaldon|xenu|zeus|zyborg|anonymouse) [NC]
RewriteRule ^.* - [F,L]
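
(If it helps, the same block is sometimes written with mod_setenvif instead of mod_rewrite; here's a rough sketch, assuming mod_setenvif is available, and keeping in mind that no user-agent rule helps if the script simply fakes its user agent string:)

Code:
# Alternative sketch: flag libwww-perl requests with an env variable, then deny them
SetEnvIfNoCase User-Agent "libwww-perl" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot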
