Importance Of Using Robots.txt File

  • coolslko
  • Proficient
  • Posts: 288
  • Loc: India

Post 3+ Months Ago

If you have a well-designed, well-optimized website with keyword-rich content to attract visitors and search engines, that is great, but you may still be missing something very important. Do you know what that is? It is the robots.txt file.

The robots.txt file matters because it tells spiders or crawlers which pages of a website they may or may not crawl. Sometimes people have confidential data on their website, and with a robots.txt file they can ask crawlers not to crawl or index a particular page, so it does not turn up in search results. (Note that only well-behaved crawlers honor these rules, so robots.txt should not be treated as a security measure.)

Before crawling a website or a webpage, search engine spiders look for this special file, because the robots.txt file tells them which pages of the site to crawl or index and which pages to ignore.

Robots.txt is a simple text file that must be placed in the root directory of a website, so its URL looks like this:

http://www.abc.com/robots.txt

Creating Robots.txt File:

As mentioned above, robots.txt is a simple text file, and you can create it in a plain text editor such as Notepad. Each command in the robots.txt file is called a "record".

A record holds the instructions for a particular search engine and has two fields: a User-agent line, where you name the robot or spider, and one or more Disallow lines, where you list the pages or files to be ignored. For example:

User-agent: googlebot

Disallow: /cgi-bin/

In the above example, the robots.txt file allows "googlebot", the spider of the major search engine Google, to crawl every page of the website except the files in the "cgi-bin" directory. In other words, googlebot must ignore all files in the "cgi-bin" directory.

And if you enter the following:

User-agent: googlebot

Disallow: /support

Googlebot will not crawl any file whose path begins with /support, because the robots.txt file instructs it to stay out of the support directory.

If you leave the Disallow field blank, you indicate to googlebot that it may crawl all files on the website. Either way, you must have at least one Disallow line for every user agent.
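For instance, a record that explicitly allows googlebot to crawl everything is just a User-agent line followed by a blank Disallow line:

User-agent: googlebot

Disallow: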

All of the above examples were for googlebot only, but if you want to give the same instructions to every search engine's spider, use an asterisk (*) instead of googlebot in the User-agent field. For example:

User-agent: *

Disallow: /cgi-bin/

In the above example, * represents all search engine spiders, so this robots.txt file allows every spider to crawl each page of the website except the files in the "cgi-bin" directory. In other words, spiders from all search engines must ignore the files in the "cgi-bin" directory.

If you want to know the user-agent names of other search engines, you can find them in your log files by checking for requests to robots.txt. Most often, all search engine spiders should be given the same rules; in that case, use User-agent: * as shown above.
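If you want to sanity-check rules like the ones above, Python's standard urllib.robotparser module interprets a robots.txt file the same way a well-behaved crawler would. A minimal sketch (the rules string and URLs simply mirror the examples above):

```python
from urllib.robotparser import RobotFileParser

# Rules mirroring the example above: block every spider from /cgi-bin/.
rules = """\
User-agent: *
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A compliant crawler asks can_fetch() before requesting each URL.
print(parser.can_fetch("googlebot", "http://www.abc.com/cgi-bin/form.cgi"))  # False
print(parser.can_fetch("googlebot", "http://www.abc.com/index.html"))        # True
```

The same parser can also load a live file with set_url() and read(), which is handy for checking a site you do not control.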
  • Don2007
  • Web Master
  • Posts: 4924
  • Loc: NY

Post 3+ Months Ago

Doesn't a robots.txt file tell people where you are hiding your confidential information?
It may stop the bots from crawling there but does it stop everyone? Any confidential information shouldn't be anywhere near the web space.
http://www.spongefish.com/creations/657 ... -2/steps/1

http://johnny.ihackstuff.com/ghdb.php?f ... ail&id=468
  • skysoldier
  • Graduate
  • Posts: 133
  • Loc: Philippines

Post 3+ Months Ago

Yup, but the main purpose of robots.txt is to direct web crawlers!
  • webmindz24
  • Novice
  • Posts: 30

Post 3+ Months Ago

Nice article. robots.txt shows web crawlers exactly how to crawl your website.
  • Steven D
  • Proficient
  • Posts: 263

Post 3+ Months Ago

Don2007 wrote:
Doesn't a robots.txt file tell people where you are hiding your confidential information?
It may stop the bots from crawling there but does it stop everyone? Any confidential information shouldn't be anywhere near the web space.
http://www.spongefish.com/creations/657 ... -2/steps/1

http://johnny.ihackstuff.com/ghdb.php?f ... ail&id=468


I was thinking exactly that. Really, if you are disallowing a directory you should also have directory listing turned off; however, people can still take a guess at your password.

I don't use a robots.txt file. I place my admin and confidential info into a separate directory whose name is 24 characters long; it only allows access from my IP address and has an administrator-only privilege set. Some would call me anal; I would agree.
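That kind of lockdown can be sketched with an .htaccess file in the hidden directory (a hypothetical example, assuming Apache with 2.2-style access directives; the IP address is a placeholder for your own):

```apache
# .htaccess in the hidden admin directory (directory name is illustrative)
Options -Indexes            # turn directory listing off

Order deny,allow            # Apache 2.2-style host access control
Deny from all
Allow from 203.0.113.7      # replace with your own IP address
```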
  • Don2007
  • Web Master
  • Posts: 4924
  • Loc: NY

Post 3+ Months Ago

Anal, my a**. You're not anal. That's the way to do it.
  • Steven D
  • Proficient
  • Posts: 263

Post 3+ Months Ago

Don2007 wrote:
Anal, my a**. You're not anal. That's the way to do it.


Thanks, that makes me feel better, because I always wonder if I'm going too far.

I am sure I wasn't, because when I check my log files there are so many people trying to do things they shouldn't that it can be very worrisome.
  • karmadir
  • Graduate
  • Posts: 103
  • Loc: India

Post 3+ Months Ago

If you have more than 100 links on your site, it is a good idea to use robots.txt, since uncontrolled crawling can affect your ranking.

Post Information

  • Total Posts in this topic: 8 posts
© 1998-2014. Ozzu® is a registered trademark of Unmelted, LLC.