robots.txt - Do I need it?

  • Tim Otool
  • Beginner
  • Posts: 54

Post 3+ Months Ago

Is a robots.txt file recommended to have on my web space?

If so, can somebody post the most common robots.txt script?
(They are short, right?)

Or is it something I don't need to have on my server?

Thank you.

  • madmonk
  • Mastermind
  • Posts: 2115
  • Loc: Australia

Post 3+ Months Ago

Yes, it's recommended to have one on your server. It's really short..

If you use the search function on Ozzu or a Google search for robots.txt, you can find a decent explanation/example of one.. :-)
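
For reference (this exact snippet is not in the thread, but it is the most common form): a robots.txt that lets every crawler index the whole site is just two lines. An empty Disallow value means nothing is off-limits:

Code:
User-agent: *
Disallow: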
  • bb99
  • Graduate
  • Posts: 106
  • Loc: Singapore

Post 3+ Months Ago

Hi, you can also visit this site:

http://www.robotstxt.org/wc/meta-user.html
  • Jess
  • Guru
  • Posts: 1153
  • Loc: USA

Post 3+ Months Ago

It's worth having a robots.txt - even if it's just to tell the spiders they can index all the pages of your site.

Of course, it's more important to use if you have files you don't want search engines to index.
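
As an illustration of that second case (the directory names here are hypothetical, not from the thread), a group like this hides those paths from every compliant crawler; Disallow values are matched as path prefixes, so /private/ covers everything under that directory:

Code:
User-agent: *
Disallow: /private/
Disallow: /drafts/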
  • madmonk
  • Mastermind
  • Posts: 2115
  • Loc: Australia

Post 3+ Months Ago

Quote:
Of course, it's more important to use if you have files you don't want search engines to index.

Yeah, that's very important. For example, if you ripped a PDF file off somewhere else online... you may not want the search engines to index it.

So, like me, you will want to put a line in robots.txt to tell the search engine bots not to index it... :oops:
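
A single file works the same way; assuming a hypothetical path for that PDF:

Code:
User-agent: *
Disallow: /downloads/borrowed-file.pdf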
  • GSlinger
  • Proficient
  • Posts: 384
  • Loc: Ohio

Post 3+ Months Ago

Private pages are another good reason to use these rules, or if your server is struggling and you see that a lot of bots and spiders are eating bandwidth.

Another reason is if you have a "pay per hit" setup. These bots and spiders count as hits, so you are paying for an inanimate user hitting your site.
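
One way to handle the bandwidth case is the nonstandard Crawl-delay directive, which some crawlers honor and others (Google included) ignore; the bot name below is only a placeholder:

Code:
User-agent: SomeHungryBot
Crawl-delay: 10

The value is the number of seconds a supporting crawler should wait between requests.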
  • Jess
  • Guru
  • Posts: 1153
  • Loc: USA

Post 3+ Months Ago

madmonk wrote:
Quote:
Of course, it's more important to use if you have files you don't want search engines to index.

Yeah, that's very important. For example, if you ripped a PDF file off somewhere else online... you may not want the search engines to index it.

So, like me, you will want to put a line in robots.txt to tell the search engine bots not to index it... :oops:


LOL naughty naughty :P
  • madmonk
  • Mastermind
  • Posts: 2115
  • Loc: Australia

Post 3+ Months Ago

I thought everybody did that :shock:
Only me? Oops...
  • eitemiller
  • Banned
  • Posts: 11
  • Loc: Arizona

Post 3+ Months Ago

For obvious reasons, I will not post the names of the directories that I block the bots' access to. However, there are many bots that I ban from my site. Most are bandwidth stealers, email collectors, and other crap.

User-Agent: almaden
Disallow: /
User-Agent: ASPSeek
Disallow: /
User-Agent: Axmo
Disallow: /
User-Agent: BaiduSpider
Disallow: /
User-Agent: booch
Disallow: /
User-Agent: DTS Agent
Disallow: /
User-Agent: Downloader
Disallow: /
User-Agent: EmailCollector
Disallow: /
User-Agent: EmailSiphon
Disallow: /
User-Agent: EmailWolf
Disallow: /
User-Agent: Expired Domain Sleuth
Disallow: /
User-Agent: Franklin Locator
Disallow: /
User-Agent: Gaisbot
Disallow: /
User-Agent: grub
Disallow: /
User-Agent: HughCrawler
Disallow: /
User-Agent: iaea.org
Disallow: /
User-Agent: lcabotAccept
Disallow: /
User-Agent: IconSurf
Disallow: /
User-Agent: Iltrovatore-Setaccio
Disallow: /
User-Agent: Indy Library
Disallow: /
User-Agent: IUPUI
Disallow: /
User-Agent: Kittiecentral
Disallow: /
User-Agent: larbin
Disallow: /
User-Agent: lwp-trivial
Disallow: /
User-Agent: MetaTagRobot
Disallow: /
User-Agent: Missigua Locator
Disallow: /
User-Agent: NetResearchServer
Disallow: /
User-Agent: NextGenSearch
Disallow: /
User-Agent: NPbot
Disallow: /
User-Agent: Nutch
Disallow: /
User-Agent: ObjectsSearch
Disallow: /
User-Agent: Oracle Ultra Search
Disallow: /
User-Agent: PEERbot
Disallow: /
User-Agent: PictureOfInternet
Disallow: /
User-Agent: PlantyNet
Disallow: /
User-Agent: QuepasaCreep
Disallow: /
User-Agent: ScSpider
Disallow: /
User-Agent: SOFT411
Disallow: /
User-Agent: spider.acont.de
Disallow: /
User-Agent: Sqworm
Disallow: /
User-Agent: SSM Agent
Disallow: /
User-Agent: TAMU
Disallow: /
User-Agent: TheUsefulbot
Disallow: /
User-Agent: TurnitinBot
Disallow: /
User-Agent: Tutorial Crawler
Disallow: /
User-Agent: TutorGig
Disallow: /
User-Agent: WebCopier
Disallow: /
User-Agent: WebZIP
Disallow: /
User-Agent: ZipppBot
Disallow: /
User-Agent: Xenu
Disallow: /
User-Agent: Wotbox
Disallow: /
User-Agent: Wget
Disallow: /
User-Agent: NaverBot
Disallow: /
User-Agent: mozDex
Disallow: /
  • zcorpan
  • Student
  • Posts: 77

Post 3+ Months Ago

eitemiller: how about...
Code:
User-agent: *
Disallow: /
That will disallow all user agents from indexing the site.
  • djtheropy
  • Graduate
  • Posts: 111

Post 3+ Months Ago

No, because that would block every search engine from indexing your site. The list above (I believe) just blocks spam bots.
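
To make that concrete: the pattern in eitemiller's file bans named bots while leaving everyone else alone. A minimal sketch of that shape, using one bot from the list:

Code:
User-agent: EmailCollector
Disallow: /

User-agent: *
Disallow:

Compliant crawlers follow the most specific User-agent group that matches them, so EmailCollector is shut out while ordinary search engines can still index the site.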
  • dprichard
  • Beginner
  • Posts: 61
  • Loc: Clearwater, Florida

Post 3+ Months Ago

Thanks for the list, eitemiller! Good idea.
  • stoner3221
  • Novice
  • Posts: 26
  • Loc: United States, New York

Post 3+ Months Ago

I recently had to implement a rather extensive robots.txt due to the number of spiders that were hitting the site at the same time. They were bringing down the whole server. I never really gave it much consideration previously.
  • webton
  • Newbie
  • Posts: 5

Post 3+ Months Ago

Although not strictly necessary, it is important to have a robots.txt file in the root of your website (http://www.yourdomain.com/robots.txt). Its basic use is to note which files and directories robots should not index. If the file is absent, every crawler request for it will show up as a 404 error in your website logs, making it harder to parse the stats for useful information. I found a great article about this topic at: http://www.planetarywebsites.com/Articl ... _site.html
  • remaxactionfirst
  • Graduate
  • Posts: 103

Post 3+ Months Ago

Here is the general format of a robots.txt file. Note that this particular example disallows everything, so as written it blocks all robots from the entire site:

User-agent: *
Disallow: /
