Google and robots.txt

  • ccb056
  • Graduate
  • Posts: 189

Post 3+ Months Ago

Will a page that is disallowed in robots.txt still have a Google PageRank after a database update?

Does robots.txt prevent the page from being accessed or cached by Google?

How does Google treat '*' in robots.txt files?
  • CazpianXI
  • Proficient
  • Posts: 285

Post 3+ Months Ago

Yes, here's more info about that (from the big men themselves...)

http://www.google.com/bot.html

FYI: ever wondered what Google's robots.txt was? http://www.google.com/robots.txt
  • cyberax
  • Graduate
  • Posts: 169
  • Loc: INDIA

Post 3+ Months Ago

Robots.txt is a standard document that can tell search engine bots not to download some or all information from your web server. For information on how to create a robots.txt file, see [url=http://www.robotstxt.org/wc/norobots.html]The Robot Exclusion Standard[/url]
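
To get a feel for how a well-behaved bot reads such a file, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The rules, the `/folder1/` path and the example.com URLs are made up for illustration, and this only shows how that particular parser applies a `User-agent: *` record. It is not a statement about Google's own crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "User-agent: *" applies the record to every bot
# that has no more specific record of its own.
ROBOTS_TXT = """\
User-agent: *
Disallow: /folder1/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant bot asks before fetching each URL.
blocked = parser.can_fetch("Googlebot", "http://www.example.com/folder1/login.html")
allowed = parser.can_fetch("Googlebot", "http://www.example.com/index.html")

print(blocked)  # False: the path starts with /folder1/
print(allowed)  # True: no rule matches
```

Note that this parser treats a `Disallow` value as a plain path prefix, so the `*` here is only meaningful on the `User-agent` line.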

==============

The robots.txt file prevents Googlebot from accessing the page, so caching is prevented as well.

Google automatically takes a "snapshot" of each page it crawls and caches it. This enables us to show the search terms highlighted on text heavy pages so users can find relevant information quickly, and to retrieve pages for users if the site's server temporarily fails. Users can access the cached version by choosing the "Cached" link on the search results page. If you do not want your content to be accessible through Google's cache, you can use the NOARCHIVE meta-tag. Place this in the <HEAD> section of your documents:
<META NAME="ROBOTS" CONTENT="NOARCHIVE">

This tag will tell robots not to archive the page. Google will continue to index and follow links from the page, but will not present cached material to users.

If you want to allow other robots to archive your content, but prevent Google's robots from caching, you can use the following tag:

<META NAME="GOOGLEBOT" CONTENT="NOARCHIVE">

Note that the change will occur the next time Google crawls the page containing the NOARCHIVE tag (typically at least once per month). To control whether the page is indexed, use the NOINDEX tag; to control whether links are followed, use the NOFOLLOW tag. See the Robots Exclusion page for more information.

http://www.google.co.in/webmasters/3.html
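
As a rough illustration of the two tags quoted above, the sketch below pulls the robots directives out of a page with Python's standard `html.parser`. The class and function names (`RobotsMetaParser`, `directives_for`) and the "otherbot" name are hypothetical, and merging the generic ROBOTS tag with the bot-specific one is my assumption about how a bot would combine them, not documented Google behaviour.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects CONTENT directives from ROBOTS / GOOGLEBOT meta tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # lowercased meta name -> set of lowercased directives

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        if name in ("robots", "googlebot"):
            tokens = {t.strip().lower() for t in attr.get("content", "").split(",") if t.strip()}
            self.directives.setdefault(name, set()).update(tokens)


def directives_for(html, bot="googlebot"):
    """Return the directives that apply to `bot` (assumption: generic + bot-specific)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives.get("robots", set()) | parser.directives.get(bot.lower(), set())


page = '<html><head><META NAME="GOOGLEBOT" CONTENT="NOARCHIVE"></head><body></body></html>'
print(directives_for(page))              # {'noarchive'}
print(directives_for(page, "otherbot"))  # set()
```

With the GOOGLEBOT-only tag, a bot with a different name sees no restriction, which matches the "allow other robots to archive" case in the quoted text.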

=========


Cheers
  • phaugh
  • Professor
  • Posts: 796

Post 3+ Months Ago

If I have a page in folder1 that I don't want crawled, and I put a disallow statement in my robots.txt file to deny access to folder1, and then I link to this page from my site's index page, will the spiders index the page? Or will the disallow prevail even though I linked to the page from a spiderable page?
  • Shrek_Update
  • Student
  • Posts: 73

Post 3+ Months Ago

If there is a link to that page from another site, it will get PageRank anyway.
  • phaugh
  • Professor
  • Posts: 796

Post 3+ Months Ago

I'm not worried about PR. What I don't want to happen is for my login page to show up in search results. It's only for the site owner not the general public.
  • webmasterbrain
  • Beginner
  • Posts: 51

Post 3+ Months Ago

phaugh wrote:
I'm not worried about PR. What I don't want to happen is for my login page to show up in search results. It's only for the site owner not the general public.


In that case you had better cloak your robots.txt file, so users (or more specifically hackers) can't see the URL of your login page, or put this between your head tags: <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
  • Matthew
  • Proficient
  • Posts: 266
  • Loc: Canada

Post 3+ Months Ago

If I have this code...

<META NAME="ROBOTS" CONTENT="NOARCHIVE, NOINDEX, NOFOLLOW">


...will the page that contains this code still get PageRank, and will the pages that this page links to still get PageRank?
  • rtchar
  • Expert
  • Posts: 606
  • Loc: Canada

Post 3+ Months Ago

There seems to be somewhat of a misunderstanding here ...

NOARCHIVE tells Google not to cache the page

NOINDEX tells Google not to include the page in search results

NOFOLLOW tells Google to ignore links on the page

If a page is NOINDEX'd there is no way page rank can be assigned, since it is not included in the database. This also prevents the page from being included in search results.

A page that is NOARCHIVE'd can be found in the search results, and is assigned page rank. However no snapshot is kept on file.

With NOFOLLOW, the linked pages will not receive page rank; Google will not even follow the link path.
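
To make the three definitions concrete, here is a small sketch that turns a CONTENT string into the flags described above. The function name `describe` and the flag names are hypothetical, and it only restates the definitions in this post; it says nothing about how PageRank is actually computed.

```python
def describe(content):
    """Map a robots meta CONTENT string to the three behaviours listed above."""
    tokens = {t.strip().upper() for t in content.split(",")}
    return {
        "cached": "NOARCHIVE" not in tokens,         # snapshot kept on file?
        "indexed": "NOINDEX" not in tokens,          # shown in search results?
        "links_followed": "NOFOLLOW" not in tokens,  # links on the page followed?
    }


print(describe("NOARCHIVE, NOINDEX, NOFOLLOW"))
# {'cached': False, 'indexed': False, 'links_followed': False}
```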
  • Matthew
  • Proficient
  • Posts: 266
  • Loc: Canada

Post 3+ Months Ago

I actually knew what they meant, but I was unsure of how strictly they (Google, whoever...) followed those definitions. For example, I was confused if a NOFOLLOW page would pass along PageRank to a page Google knew of via other means, because Google would not have to actually follow the link to know about and index that page. I was thinking too technically, as that is obviously not the case. Thanks for the clarification.

Post Information

  • Total Posts in this topic: 10 posts

© 1998-2014. Ozzu® is a registered trademark of Unmelted, LLC.