robots.txt - Do I need one?

  • Tim Otool
  • Beginner
  • Joined: Oct 18, 2004
  • Posts: 54
  • Status: Offline

Post November 3rd, 2004, 12:42 pm

Is a robots.txt file recommended for my web space?

If so, can somebody post the most common robots.txt file?
(They are short, right?)

Or is it something I don't need to have on my server?

Thank you.
  • madmonk
  • Mastermind
  • Joined: May 04, 2004
  • Posts: 2115
  • Loc: australia
  • Status: Offline

Post November 3rd, 2004, 12:51 pm

Yes, it is recommended on your server, and it is really short.

If you use the search function on Ozzu, or search Google for robots.txt, you can find a decent explanation and example. :-)
  • bb99
  • Graduate
  • Joined: Sep 15, 2004
  • Posts: 106
  • Loc: Singapore
  • Status: Offline

Post November 3rd, 2004, 12:54 pm

Hi, you can also visit this site:

http://www.robotstxt.org/wc/meta-user.html
  • Jess
  • Guru
  • Joined: Sep 10, 2004
  • Posts: 1153
  • Loc: USA
  • Status: Offline

Post November 3rd, 2004, 1:09 pm

It's worth having a robots.txt, even if it's just to tell the spiders they can index all the pages of your site.

Of course, it's more important to use one if you have files you don't want search engines to index.
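
To make the two cases above concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The `/private/` directory, the `AnyBot` agent name, and the `parse_rules` helper are made up for illustration; they are not from anyone's actual site.

```python
from urllib.robotparser import RobotFileParser

def parse_rules(text):
    """Parse robots.txt text and return a parser that can be queried."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

# An empty Disallow value means nothing is off limits.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

# Hypothetical directory name, used only for illustration.
BLOCK_PRIVATE = """\
User-agent: *
Disallow: /private/
"""

allow_all = parse_rules(ALLOW_ALL)
block_private = parse_rules(BLOCK_PRIVATE)

print(allow_all.can_fetch("AnyBot", "/private/report.pdf"))      # True
print(block_private.can_fetch("AnyBot", "/private/report.pdf"))  # False
print(block_private.can_fetch("AnyBot", "/index.html"))          # True
```

Only well-behaved robots read and follow these rules, so this is a request to crawlers rather than access control.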
  • madmonk
  • Mastermind
  • Joined: May 04, 2004
  • Posts: 2115
  • Loc: australia
  • Status: Offline

Post November 3rd, 2004, 5:38 pm

Quote:
Of course, it's more important to use one if you have files you don't want search engines to index.


Yeah, that's very important. For example, if you ripped a PDF file off somewhere else online, you may not want the search engines to index it.

So, like me, you will want to put a line in robots.txt to tell search engine bots not to index it... :oops:
  • GSlinger
  • Proficient
  • Joined: Oct 21, 2004
  • Posts: 384
  • Loc: Ohio
  • Status: Offline

Post November 3rd, 2004, 7:01 pm

Private pages are another good use for these files. So is a server that is having a hard time because a lot of bots and spiders are eating its bandwidth.

Another reason is a "pay per hit" setup. Bots and spiders count as hits, so you are paying for an inanimate user hitting your site.
  • Jess
  • Guru
  • Joined: Sep 10, 2004
  • Posts: 1153
  • Loc: USA
  • Status: Offline

Post November 4th, 2004, 5:27 am

madmonk wrote:
Quote:
Of course, it's more important to use one if you have files you don't want search engines to index.


Yeah, that's very important. For example, if you ripped a PDF file off somewhere else online, you may not want the search engines to index it.

So, like me, you will want to put a line in robots.txt to tell search engine bots not to index it... :oops:


LOL naughty naughty :P
  • madmonk
  • Mastermind
  • Joined: May 04, 2004
  • Posts: 2115
  • Loc: australia
  • Status: Offline

Post November 4th, 2004, 12:18 pm

I thought everybody did that :shock:
Only me? Oops...
  • eitemiller
  • Banned
  • Joined: Oct 06, 2004
  • Posts: 11
  • Loc: Arizona
  • Status: Offline

Post November 4th, 2004, 6:38 pm

For obvious reasons, I will not post the names of the directories that I ban the bots from accessing. However, there are many bots that I ban from my site entirely. Most are bandwidth stealers, email collectors, and other crap.

User-Agent: almaden
Disallow: /
User-Agent: ASPSeek
Disallow: /
User-Agent: Axmo
Disallow: /
User-Agent: BaiduSpider
Disallow: /
User-Agent: booch
Disallow: /
User-Agent: DTS Agent
Disallow: /
User-Agent: Downloader
Disallow: /
User-Agent: EmailCollector
Disallow: /
User-Agent: EmailSiphon
Disallow: /
User-Agent: EmailWolf
Disallow: /
User-Agent: Expired Domain Sleuth
Disallow: /
User-Agent: Franklin Locator
Disallow: /
User-Agent: Gaisbot
Disallow: /
User-Agent: grub
Disallow: /
User-Agent: HughCrawler
Disallow: /
User-Agent: iaea.org
Disallow: /
User-Agent: lcabotAccept
Disallow: /
User-Agent: IconSurf
Disallow: /
User-Agent: Iltrovatore-Setaccio
Disallow: /
User-Agent: Indy Library
Disallow: /
User-Agent: IUPUI
Disallow: /
User-Agent: Kittiecentral
Disallow: /
User-Agent: larbin
Disallow: /
User-Agent: lwp-trivial
Disallow: /
User-Agent: MetaTagRobot
Disallow: /
User-Agent: Missigua Locator
Disallow: /
User-Agent: NetResearchServer
Disallow: /
User-Agent: NextGenSearch
Disallow: /
User-Agent: NPbot
Disallow: /
User-Agent: Nutch
Disallow: /
User-Agent: ObjectsSearch
Disallow: /
User-Agent: Oracle Ultra Search
Disallow: /
User-Agent: PEERbot
Disallow: /
User-Agent: PictureOfInternet
Disallow: /
User-Agent: PlantyNet
Disallow: /
User-Agent: QuepasaCreep
Disallow: /
User-Agent: ScSpider
Disallow: /
User-Agent: SOFT411
Disallow: /
User-Agent: spider.acont.de
Disallow: /
User-Agent: Sqworm
Disallow: /
User-Agent: SSM Agent
Disallow: /
User-Agent: TAMU
Disallow: /
User-Agent: TheUsefulbot
Disallow: /
User-Agent: TurnitinBot
Disallow: /
User-Agent: Tutorial Crawler
Disallow: /
User-Agent: TutorGig
Disallow: /
User-Agent: WebCopier
Disallow: /
User-Agent: WebZIP
Disallow: /
User-Agent: ZipppBot
Disallow: /
User-Agent: Xenu
Disallow: /
User-Agent: Wotbox
Disallow: /
User-Agent: Wget
Disallow: /
User-Agent: NaverBot
Disallow: /
User-Agent: mozDex
Disallow: /
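
A list this long is easier to maintain if it is generated from plain bot names. The sketch below is one way to do that, not how the list above was actually produced; `build_blocklist` is a made-up helper. It skips repeated names and puts a blank line between records, which is the conventional record separator in robots.txt.

```python
def build_blocklist(bot_names):
    """Return robots.txt text that bans each named bot from the whole site."""
    records = []
    seen = set()
    for name in bot_names:
        key = name.lower()
        if key in seen:  # skip names that were already listed
            continue
        seen.add(key)
        records.append(f"User-agent: {name}\nDisallow: /")
    # Records are conventionally separated by a blank line.
    return "\n\n".join(records) + "\n"

# A few names taken from the list above, with one deliberate repeat.
bots = ["EmailCollector", "EmailSiphon", "WebZIP", "EmailSiphon"]
print(build_blocklist(bots))
```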
  • zcorpan
  • Student
  • Joined: Nov 10, 2004
  • Posts: 77
  • Status: Offline

Post November 21st, 2004, 10:12 am

eitemiller: how about...
User-agent: *
Disallow: /
That will stop all user agents from indexing the site.
  • djtheropy
  • Graduate
  • Joined: Nov 05, 2004
  • Posts: 111
  • Status: Offline

Post November 21st, 2004, 1:21 pm

No, because that would block every search engine from indexing your site. The list above (I believe) just blocks spam bots.
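
The difference can be checked with Python's standard `urllib.robotparser`. This is only a sketch: `is_allowed` is a made-up helper, the listed file is cut down to two of the names from the list above, and "Googlebot" stands in for any search engine crawler that is not on the list.

```python
from urllib.robotparser import RobotFileParser

def is_allowed(robots_text, user_agent, path="/"):
    """Return True if robots_text lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, path)

# The wildcard record applies to every robot.
BLOCK_EVERYONE = "User-agent: *\nDisallow: /\n"

# Named records apply only to the robots they name.
BLOCK_LISTED = (
    "User-agent: EmailCollector\nDisallow: /\n\n"
    "User-agent: WebZIP\nDisallow: /\n"
)

for agent in ("Googlebot", "EmailCollector"):
    print(agent, is_allowed(BLOCK_EVERYONE, agent), is_allowed(BLOCK_LISTED, agent))
# Googlebot False True
# EmailCollector False False
```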
  • dprichard
  • Beginner
  • Joined: Jun 21, 2004
  • Posts: 61
  • Loc: Clearwater Florida
  • Status: Offline

Post November 24th, 2004, 11:54 am

Thanks for the list eitemiller! Good idea
  • stoner3221
  • Novice
  • Joined: May 25, 2004
  • Posts: 26
  • Loc: United States, New York
  • Status: Offline

Post November 24th, 2004, 2:15 pm

I recently had to implement a rather extensive robots.txt file because of the number of spiders that were on the site at the same time. They were bringing down the whole server. I had never really given it much consideration before.
  • webton
  • Newbie
  • Joined: Sep 02, 2006
  • Posts: 5
  • Status: Offline

Post September 2nd, 2006, 1:42 pm

Although not necessary, it is important to have the file, robots.txt, in the root of your website (http://www.yourdomain.com/robots.txt). The basic use of the robots.txt is to note which files and directories the robots should not index. The absence of the file will show 404 errors in your website logs, making it more difficult to parse the stats logs for useful information. I have found a great article about this topic at: http://www.planetarywebsites.com/Articl ... _site.html
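
Since robots look for the file at the root of the site, the robots.txt address can be worked out from any page URL. A small sketch with Python's standard `urllib.parse`; `robots_url` is a made-up helper and the page path is invented for the example.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url):
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://www.yourdomain.com/articles/page.html?id=3"))
# http://www.yourdomain.com/robots.txt
```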
  • remaxactionfirst
  • Graduate
  • Joined: Sep 15, 2006
  • Posts: 105
  • Status: Offline

Post September 26th, 2006, 11:37 pm

Here is the general format of a robots.txt file (this particular example blocks every robot from the whole site):

User-agent: *
Disallow: /
© 2011 Unmelted, LLC. Ozzu® is a registered trademark of Unmelted, LLC.