SearchMe.com ... How to stop its madness!

  • kbergmann
  • Expert
  • Posts: 659
  • Loc: USA

Post 3+ Months Ago

First off, I am not sure where this question's home should be, but I saw it fitting to put it in with website design, as it could affect a lot of people.

I manage a website for work where each URL ends with a unique ID. SearchMe's spider was built by people who do not know what the hell they are doing: it views the home page multiple times because it treats the whole web address as the identity of the page, never realizing it has been there before.

It is screwing up the traffic logs, hammering the server, and killing our bandwidth.

I know the IP range they use, their spider's name, and some of their aliases.

To solve this issue I have a few options: block them in the Apache vhosts file, or with .htaccess. I have tried up and down to make the vhosts approach work and it won't block them, and they don't respect the robots.txt file. Changing the website is not an option.
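
For reference, the sort of .htaccess blocking I have in mind looks roughly like this (their spider is named Charlotte; the IP range below is only a placeholder, not their actual netblock):

Code: [ Select ]
# Flag requests whose User-Agent matches the spider's name
SetEnvIfNoCase User-Agent "Charlotte" bad_bot

# Refuse flagged requests and the spider's address range (placeholder CIDR)
Order Allow,Deny
Allow from all
Deny from env=bad_bot
Deny from 192.0.2.0/24
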

How would you go about this?

I have pondered using PHP: if it's them, send them back to their own site so they hammer themselves as hard as they hammer us.
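
Something along these lines, roughly (again, Charlotte is their spider's name, and the IP prefix below is just a placeholder):

Code: [ Select ]
<?php
// Rough sketch: if the request looks like their spider, bounce it back to
// searchme.com instead of serving the page. The IP prefix is a placeholder,
// not their real range.
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$ip    = isset($_SERVER['REMOTE_ADDR'])     ? $_SERVER['REMOTE_ADDR']     : '';

if (stripos($agent, 'Charlotte') !== false || strpos($ip, '192.0.2.') === 0) {
    header('Location: http://www.searchme.com/', true, 302);
    exit;
}
?>
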

Any help would be appreciated; their application flat-out sucks and hammers websites.

Thanks in advance.

  • digioz
  • Newbie
  • Posts: 6
  • Loc: Chicago, IL

Post 3+ Months Ago

Hello kbergmann,

Have you considered using Apache mod_rewrite to change your URLs into something more user-friendly? For example, you could change this:

Code: [ Select ]
http://www.example.com/viewcatalog.php? ... hats&id=53


To something like this:

Code: [ Select ]
http://www.example.com/catalog/hats/53/


You can find more information on the Apache website:

http://httpd.apache.org/docs/2.0/misc/rewriteguide.html
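
A rule for the example above might look something like this (I am guessing at the "cat" parameter name, since your real query string will differ):

Code: [ Select ]
# Maps /catalog/hats/53/ onto the underlying script ("cat" is a guessed name)
RewriteEngine On
RewriteRule ^catalog/([^/]+)/([0-9]+)/?$ /viewcatalog.php?cat=$1&id=$2 [L,QSA]
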

Pete
  • kbergmann
  • Expert
  • Posts: 659
  • Loc: USA

Post 3+ Months Ago

That is an option, but we try not to get too exotic with Apache modules here at work.

In your example, where you have `id` we have `c`, which does not act like an ID: it is unique to each visitor and I believe it is related to sessions. SearchMe hits each page in succession rather than just following links on the pages, so every time it comes back in for a new page it gets a new session ID. It is not a very smart spider ...
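
(If changing the site were an option, and if `c` really is a PHP-style session token passed in the URL, which is only my guess, the usual fix would be to keep the token in a cookie so crawlers never see a unique per-visit address. Roughly:)

Code: [ Select ]
<?php
// Keep the session token in a cookie instead of the URL, so a crawler that
// ignores cookies never sees a unique per-visit address. This assumes `c`
// is PHP's URL-based session propagation, which is only a guess -- and as
// noted above, changing the website is not an option for us.
ini_set('session.use_only_cookies', '1');
ini_set('session.use_trans_sid', '0');
session_start();
?>
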

I appreciate the help though, thank you.
  • joebert
  • Fart Bubbles
  • Genius
  • Posts: 13502
  • Loc: Florida

Post 3+ Months Ago

Have you tried http://www.searchme.com/support/spider/#pos_02 ?
  • kbergmann
  • Expert
  • Posts: 659
  • Loc: USA

Post 3+ Months Ago

Yes, and emailing them got the spider off the site. That is only temporary, though; the next time a search is done, it is back to hammering the server.

It's a cool idea; they just need someone who knows what they are doing at the helm, as they COULD be successful with this idea if it worked properly.
  • Bogey
  • Genius
  • Posts: 8388
  • Loc: USA

Post 3+ Months Ago

Quote:
User-Agent: Charlotte
Disallow: /

Have you tried that in your robots.txt?
  • kbergmann
  • Expert
  • Posts: 659
  • Loc: USA

Post 3+ Months Ago

Yes, they read it several hundred times but apparently missed those two lines.
  • Bogey
  • Genius
  • Posts: 8388
  • Loc: USA

Post 3+ Months Ago

kbergmann wrote:
Yes, they read it several hundred times but apparently missed those two lines.

lol ok

Post Information

  • Total Posts in this topic: 8 posts