Google and robots.txt

  • ccb056
  • Graduate
  • Joined: Mar 20, 2004
  • Posts: 189
  • Status: Offline

Post March 20th, 2004, 10:30 pm

Will a page that is disallowed in robots.txt still have a Google PageRank after a database update?

Does robots.txt prevent the page from being accessed or cached by Google?

How does Google treat '*' in robots.txt files?
  • CazpianXI
  • Proficient
  • Joined: Dec 22, 2003
  • Posts: 285
  • Status: Offline

Post March 22nd, 2004, 7:38 pm

Yes, here's more info about that (from the big men themselves...)

http://www.google.com/bot.html

FYI: ever wondered what Google's robots.txt was? http://www.google.com/robots.txt
  • cyberax
  • Graduate
  • Joined: Apr 07, 2004
  • Posts: 169
  • Loc: INDIA
  • Status: Offline

Post April 11th, 2004, 10:12 pm

Robots.txt is a standard document that can tell search engine bots not to download some or all information from your web server. For information on how to create a robots.txt file, see [url=http://www.robotstxt.org/wc/norobots.html]The Robot Exclusion Standard[/url].
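To make the '*' and Disallow behaviour concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the example.com URLs and the `is_allowed` helper are made up for illustration. In a `User-agent` line, '*' means "any robot"; this parser does plain prefix matching on paths, so it does not expand '*' inside a Disallow path. Note that it only answers "may this bot fetch the URL?", which is not necessarily the same as "will the URL show up in results?".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler ("*") is kept out of /folder1/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /folder1/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs, one outside and one inside the disallowed folder.
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com/folder1/login.html",
    ):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

Because there is no Googlebot-specific section in this sample file, the `User-agent: *` rules are the ones that apply to it.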

==============

A robots.txt disallow prevents Googlebot from accessing the page, so caching is prevented as well.

Google automatically takes a "snapshot" of each page it crawls and caches it. This enables us to show the search terms highlighted on text heavy pages so users can find relevant information quickly, and to retrieve pages for users if the site's server temporarily fails. Users can access the cached version by choosing the "Cached" link on the search results page. If you do not want your content to be accessible through Google's cache, you can use the NOARCHIVE meta-tag. Place this in the <HEAD> section of your documents:
<META NAME="ROBOTS" CONTENT="NOARCHIVE">

This tag will tell robots not to archive the page. Google will continue to index and follow links from the page, but will not present cached material to users.

If you want to allow other robots to archive your content, but prevent Google's robots from caching, you can use the following tag:

<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">

Note that the change will occur the next time Google crawls the page containing the NOARCHIVE tag (typically at least once per month). To control whether the page is indexed, use the NOINDEX tag; to control whether links are followed, use the NOFOLLOW tag. See the Robots Exclusion page for more information.

http://www.google.co.in/webmasters/3.html

=========
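If you want to check which of these tags a page actually carries, something like the sketch below works with Python's built-in `html.parser`. The class and function names are my own. Combining the generic ROBOTS directives with the bot-specific ones is an assumption about how a crawler merges them, not something stated in the Google text above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <META NAME="ROBOTS"> and <META NAME="GOOGLEBOT"> tags."""

    def __init__(self):
        super().__init__()
        # Maps the lowercased meta name to a set of lowercased directives.
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lowercase.
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        if name in ("robots", "googlebot"):
            values = {
                item.strip().lower()
                for item in attr.get("content", "").split(",")
                if item.strip()
            }
            self.directives.setdefault(name, set()).update(values)


def directives_for(html: str, bot: str) -> set:
    """Directives seen by `bot`: generic ROBOTS ones plus its own (assumed merge rule)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    generic = parser.directives.get("robots", set())
    specific = parser.directives.get(bot.lower(), set())
    return generic | specific


if __name__ == "__main__":
    page = '<html><head><META NAME="GOOGLEBOT" CONTENT="NOARCHIVE"></head></html>'
    print(directives_for(page, "googlebot"))
    print(directives_for(page, "otherbot"))
```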


Cheers
  • phaugh
  • Professor
  • Joined: Sep 30, 2003
  • Posts: 796
  • Status: Offline

Post June 4th, 2004, 9:55 pm

Say I have a page in folder1 that I don't want crawled, I put a Disallow statement in my robots.txt file to deny access to folder1, and then I link to this page from my site's index page. Will the spiders index the page? Or will the Disallow prevail even though I linked to the page from a spiderable page?
  • Shrek_Update
  • Student
  • Joined: Jun 02, 2004
  • Posts: 73
  • Status: Offline

Post June 4th, 2004, 10:37 pm

If there is a link to it from another site, it will get PageRank anyway.
  • phaugh
  • Professor
  • Joined: Sep 30, 2003
  • Posts: 796
  • Status: Offline

Post June 6th, 2004, 10:05 am

I'm not worried about PR. What I don't want to happen is for my login page to show up in search results. It's only for the site owner not the general public.
  • webmasterbrain
  • Beginner
  • Joined: May 04, 2004
  • Posts: 51
  • Status: Offline

Post June 7th, 2004, 7:30 am

phaugh wrote:
I'm not worried about PR. What I don't want to happen is for my login page to show up in search results. It's only for the site owner not the general public.


In that case you'd better cloak your robots.txt file, so that users (or, more specifically, hackers) can't see the URL of your login page, or put this between your head tags: <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
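The reason for the warning is that robots.txt is a public file: anyone can request it and read the paths you listed. A rough sketch of how little effort that takes is below. The `disallowed_paths` helper and the /admin/login.php path are hypothetical.

```python
def disallowed_paths(robots_txt: str) -> list:
    """Return every non-empty Disallow path found in robots.txt text."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split the "Field: value" pair.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


if __name__ == "__main__":
    # Hypothetical robots.txt that gives away the login page location.
    sample = "User-agent: *\nDisallow: /admin/login.php  # owner only\n"
    print(disallowed_paths(sample))
```

The meta tag approach avoids this, since the page location is never listed anywhere.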
  • Matthew
  • Proficient
  • Joined: Jul 07, 2005
  • Posts: 266
  • Loc: Canada
  • Status: Offline

Post July 11th, 2005, 8:34 am

If I have this code...

<META NAME="ROBOTS" CONTENT="NOARCHIVE, NOINDEX, NOFOLLOW">


...will the page that contains this code still get PageRank, and will the pages that this page links to still get PageRank?
Matthew Doucette, Xona Games
Award winning indie game studio.
  • rtchar
  • Expert
  • Joined: Mar 22, 2004
  • Posts: 606
  • Loc: Canada
  • Status: Offline

Post July 11th, 2005, 5:00 pm

There seems to be somewhat of a misunderstanding here ...

NOARCHIVE tells Google not to cache the page

NOINDEX tells Google not to include the page in search results

NOFOLLOW tells Google not to follow links on the page

If a page is NOINDEX'd, there is no way page rank can be assigned, since it is not included in the database. This also prevents the page from being included in search results.

A page that is NOARCHIVE'd can be found in the search results and is assigned page rank. However, no snapshot is kept on file.

With NOFOLLOW, the linked pages will not receive page rank; Google will not even follow the link path.
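The three definitions above can be written down as a small lookup, sketched here in Python. `RobotsPolicy` and `parse_policy` are names I made up; this only encodes the definitions as stated in this post and says nothing about how Google implements them internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RobotsPolicy:
    index: bool    # page may be included in search results
    archive: bool  # a cached snapshot may be kept
    follow: bool   # links on the page may be followed


def parse_policy(content: str) -> RobotsPolicy:
    """Translate a robots meta CONTENT value into three yes/no flags."""
    directives = {item.strip().upper() for item in content.split(",")}
    return RobotsPolicy(
        index="NOINDEX" not in directives,
        archive="NOARCHIVE" not in directives,
        follow="NOFOLLOW" not in directives,
    )


if __name__ == "__main__":
    print(parse_policy("NOARCHIVE, NOINDEX, NOFOLLOW"))
    print(parse_policy("NOARCHIVE"))
```

Anything not explicitly switched off is treated as allowed, which matches the way the tags are described above.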
  • Matthew
  • Proficient
  • Joined: Jul 07, 2005
  • Posts: 266
  • Loc: Canada
  • Status: Offline

Post July 11th, 2005, 5:27 pm

I actually knew what they meant, but I was unsure of how strictly they (Google, whoever...) followed those definitions. For example, I was unsure whether a NOFOLLOW page would pass along PageRank to a page Google knew of via other means, because Google would not have to actually follow the link to know about and index that page. I was thinking too technically, as that is obviously not the case. Thanks for the clarification.

© 2011 Unmelted, LLC. Ozzu® is a registered trademark of Unmelted, LLC.