fooling googlebot

  • gkboomus
  • Newbie
  • Posts: 8

Post 3+ Months Ago

what do you think about fooling Googlebot, so when
you detect it you output something else, well prepared for
Google's PageRank algorithm?

what are the drawbacks?
  • Bigwebmaster
  • Site Admin
  • Posts: 9099
  • Loc: Seattle, WA & Phoenix, AZ

Post 3+ Months Ago

That would be known as cloaking. There are definite drawbacks if you get caught. The main reason people do it is so that you can have a highly optimized page for the search engine, and show your visitors something else which might be more friendly for the user. I have heard rumors that Google has unmarked robots which visit sites to see if something like cloaking might be taking place. Use at your own risk. You can find a cloaking script here however:

http://www.unmelted.com/cloaking.html
  • Axe
  • Genius
  • Posts: 5739
  • Loc: Sub-level 28

Post 3+ Months Ago

Well, it seems like an obvious way to try and fool the search engines, but there is something in the article that ain't true, heh.

"They will have no idea that you even fed a completely different page to them."

Well, yes, they will. Google caches many of the pages it sees, especially the high-ranking ones. Simply looking at the cache will show a user that what he is seeing and what the search engine sees are two completely different things.

Then all one needs to do is telnet to the server, pretend to be Googlebot with fake headers, and they'll be able to see exactly what Google does (without having to view source on the Google cache and strip out all the extra bits that Google adds to cached pages).
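That header trick doesn't even need telnet. Here is a toy sketch of the same idea: `cloakingServer` stands in for a hypothetical cloaked site (not any real server), and the detector just requests the "page" twice with different User-Agent strings and compares the results:

```javascript
// A toy "server" that cloaks: it inspects the User-Agent header and
// returns different HTML to Googlebot than to everyone else.
function cloakingServer(headers) {
  const ua = headers["User-Agent"] || "";
  if (ua.indexOf("Googlebot") !== -1) {
    return "<html>keyword-stuffed page for the spider</html>";
  }
  return "<html>flashy page for human visitors</html>";
}

// The detection trick described above: fetch the page once as a bot
// and once as a human, then compare the two responses.
const botUA = "Googlebot/2.1 (+http://www.google.com/bot.html)";
const humanUA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";

function looksCloaked(server) {
  const asBot = server({ "User-Agent": botUA });
  const asHuman = server({ "User-Agent": humanUA });
  return asBot !== asHuman;
}

console.log(looksCloaked(cloakingServer)); // true: the responses differ
```

Against a real site you would make two HTTP requests instead of two function calls, but the comparison is the same.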

But, there could be legitimate reasons for having your site automatically serve several versions of a page - and that could be mistaken by Google for cloaking.

Let's say a page is set to detect the graphical capabilities of a browser (or at least the name of the browser - Explorer, Nutscrape, Opera, Konqueror, etc.) and serve a graphical version if it detects one of those. If it doesn't detect a known graphics-capable browser, it sends out a plain-text version (for text-based browsers such as Lynx, WAP phones, etc.). So Google could unintentionally get classed as a non-graphical browser, receive a different page, and it would appear to Google as if cloaking were intentionally being used to try and trick it.
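A minimal sketch of that kind of browser detection; the substring list is illustrative and era-appropriate, not a complete detection scheme. Note how a classic Googlebot User-Agent matches none of the graphical signatures and falls through to the text version, exactly as described:

```javascript
// Legitimate content negotiation: serve a full page to known graphical
// browsers and a plain-text page to everything else (Lynx, WAP, etc.).
function chooseVersion(userAgent) {
  const graphical = ["MSIE", "Mozilla", "Opera", "Konqueror"];
  for (let i = 0; i < graphical.length; i++) {
    if (userAgent.indexOf(graphical[i]) !== -1) return "graphical";
  }
  return "text-only";
}

console.log(chooseVersion("Mozilla/5.0 (Windows)")); // "graphical"
console.log(chooseVersion("Lynx/2.8.5"));            // "text-only"
// The classic spider UA falls through to the text version too:
console.log(chooseVersion("Googlebot/2.1 (+http://www.google.com/bot.html)")); // "text-only"
```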

Does anybody know how actively Google looks out for this "technique" of cloaking?
  • Bompa
  • Graduate
  • Posts: 229
  • Loc: Philippine Islands

Post 3+ Months Ago

gkboomus wrote:
what do you think about fooling Googlebot, so when
you detect it you output something else, well prepared for
Google's PageRank algorithm?

what are the drawbacks?



You have no idea what ominous forces you are up against!

:)
  • gkboomus
  • Newbie
  • Posts: 8

Post 3+ Months Ago

In my opinion, every page should have a dedicated version for search engines. One of the problems with the web today is its great heterogeneity and the difficulty of finding relevant information, so it is absolutely logical that a page should have versions for different agents. I don't see why the Google heads would battle this idea. In fact, they should promote it, considering the large influence they have on webmasters.

GOOGLE SHOULD PROMOTE a standard or a language for information on the web. If Google said they wanted pages to present information in some format, or based on at least some loose rules, the next day all sites would implement it, and the web would be more 'searchable'.
  • Axe
  • Genius
  • Posts: 5739
  • Loc: Sub-level 28

Post 3+ Months Ago

In theory, I agree gkboomus.

But what about those unscrupulous spammers & porn sites that will do anything to get the exposure they need to earn that $0.000001 per banner impression?

They'll be sending a high-ranking, good content page to Google, then when a human comes along, pages and pages of porn & popups.

So, while I think it is cool in theory to have a specific format for search engines to see, so that they could possibly get more accurate results, I can see how this would be abused VERY quickly, and by a LOT of people (not just spammers & porn sites, but anyone who simply wishes to make their site sound better to search engines than it actually is to people).
  • Bigwebmaster
  • Site Admin
  • Posts: 9099
  • Loc: Seattle, WA & Phoenix, AZ

Post 3+ Months Ago

I agree with gkboomus's theory as well, and it would probably work. However, there is one major problem with that theory and it is that numerous people are not honest and will do anything to get to the top of the engines as Axe has described.
  • pompei
  • Graduate
  • Posts: 117

Post 3+ Months Ago

The drawback is you'll be removed from the Google index forever. If you're not performing well in Google with a site, then I guess you have nothing to lose, so go ahead. But bear in mind that if you do start performing well, you have a good chance of losing it again. It's almost a catch-22, so I'd recommend against it.
  • I, Brian
  • Novice
  • Posts: 17
  • Loc: Yorkshire, UK

Post 3+ Months Ago

Cloaking apparently works - but requires a lot of work. And, from what I hear, cloaked sites should always be disposable, since the game is about "how long before I am caught and banned" rather than "avoid ever getting caught and banned".
  • eCommando
  • Graduate
  • Posts: 162
  • Loc: California, USA

Post 3+ Months Ago

What does Googlebot do when there's JavaScript? Does it just ignore it?

Anybody know?
  • vetofunk
  • A SEO GUY
  • Mastermind
  • Posts: 2245
  • Loc: Chicago

Post 3+ Months Ago

Plus, you have competitors out there that, if they find you cloaking, will report you. If you were my competitor, I would ;-)
  • twmspro
  • Expert
  • Posts: 591
  • Loc: Indiana

Post 3+ Months Ago

hahaha
"nutscrape"
  • dabomb_gent
  • Born
  • Posts: 3

Post 3+ Months Ago

is there anyone using it now?
  • vetofunk
  • A SEO GUY
  • Mastermind
  • Posts: 2245
  • Loc: Chicago

Post 3+ Months Ago

eCommando wrote:
What does Googlebot do when there's JavaScript? Does it just ignore it?

Anybody know?


Google is starting to follow JavaScript now; learning, anyway...
  • Tree
  • Born
  • Posts: 3

Post 3+ Months Ago

Do you have anything to verify that Google is learning to crawl JavaScript, or reading it?
I would find that very interesting and insightful information!
An example, like something found in a Google listing that's JS-loaded, or whatever else you have, would be welcome.
Thanks.
  • disgust
  • Graduate
  • Posts: 154

Post 3+ Months Ago

there's nothing wrong with ethical "cloaking" - even Google does it.

you won't get kicked off for changing minor things and optimizing them for Google. if you're abusing it, you very well could get kicked off.

IF you cloak, cloak by Google's IPs, NOT by the User-Agent version that's sent in
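A sketch of the IP-based check disgust means, as opposed to trusting the spoofable User-Agent header. The address prefix below is illustrative only; a real implementation would need Google's current, verified crawler address ranges:

```javascript
// Identify the crawler by source IP instead of by the User-Agent it
// sends, since headers are trivially faked. The prefix list here is a
// placeholder, not an authoritative set of Googlebot addresses.
const GOOGLEBOT_PREFIXES = ["66.249."];

function isGooglebotIp(ip) {
  return GOOGLEBOT_PREFIXES.some(function (prefix) {
    return ip.indexOf(prefix) === 0;
  });
}

console.log(isGooglebotIp("66.249.64.1"));  // true
console.log(isGooglebotIp("192.168.0.1"));  // false
```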
  • Axe
  • Genius
  • Posts: 5739
  • Loc: Sub-level 28

Post 3+ Months Ago

Tree wrote:
Do you have anything to verify that Google is learning to crawl JavaScript, or reading it?


Well, it is following URLs in JavaScript. On one of my sites (not sure which, off-hand), I noticed a few new backlinks turn up in the latest Google update, so I went nosing around to see who they were...

I couldn't see a link on the page at all, so I viewed the source and searched for the domain. There it was, right there in a JavaScript nav: the URL to my site, and that's the only place it appeared in the source.
  • vetofunk
  • A SEO GUY
  • Mastermind
  • Posts: 2245
  • Loc: Chicago

Post 3+ Months Ago

Axe, I sent him an article that explains a little about it. As I told him, it's nothing factual, just a lot of speculation... maybe a little fact ;-)
  • Tree
  • Born
  • Posts: 3

Post 3+ Months Ago

Yeah, that was very informative (thanks for the email). For those following this thread, he sent me this: http://www.webpronews.com/insiderreport ... eRank.html
Now, as for the above: if the link was followed and appeared ONLY in a JavaScript link on that page, then we have factual confirmation that Google does indeed follow JS links.
That is EXTREMELY important SEO information. It's of limited use in my area, since I have no cloaking needs, but the insight into the site-performance implications could easily be huge for other forum readers (hey, post that URL here instead of PM, if you would).

One case in point: on a dating site I'm building, if a person isn't logged in and tries to access a members-only page, they get an automatic redirect to the site's index page.
This is written in JavaScript, and Google takes it as a redirect (a potential spam issue with them!!!). So this insight is extremely beneficial: the fix is to place the JS in an external file and add a robots.txt rule disallowing the folder it lives in. That could repair some damage done not by trying to trick Google, but by Google not knowing that and penalizing our listings because of it.
As for cloaking, is there ever a legitimate reason for doing it? That too I would like to see an example of.
And I'm glad I've joined here - I'm learning a lot!
Thanks!
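The robots.txt fix described above would look something like this, assuming the redirect script lives in a hypothetical /js/ directory:

```
User-agent: *
Disallow: /js/
```

Spiders that honor the Robots Exclusion Protocol would then skip the folder entirely and never see the JS redirect.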
  • disgust
  • Graduate
  • Posts: 154

Post 3+ Months Ago

it doesn't need to UNDERSTAND JS to follow JS links, though.

you can have a JS dropdown for navigation, and it could just scan through the HTML and, if it sees "/blah/page.html", follow it, regardless of whether it's in a JS section or not.

it's certainly possible.
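That scan-without-executing idea is easy to sketch. The page fragment below is made up, and the regex only catches simple quoted URLs, but it shows how links can be harvested from script source with no JS execution at all:

```javascript
// Pull URL-shaped quoted strings out of page source with a regular
// expression - no JavaScript engine required.
function extractLinks(source) {
  // Match quoted absolute URLs, or quoted paths ending in .htm/.html.
  const pattern = /["'](https?:\/\/[^"']+|\/[^"']*\.html?)["']/g;
  const found = [];
  let m;
  while ((m = pattern.exec(source)) !== null) found.push(m[1]);
  return found;
}

// A made-up script block of the kind a spider might scan:
const snippet =
  'function nav(i){ window.location = ["/blah/page.html",' +
  ' "http://example.com/about.html"][i]; }';
console.log(extractLinks(snippet));
// [ "/blah/page.html", "http://example.com/about.html" ]
```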
  • rtm223
  • Mastermind
  • Posts: 1855
  • Loc: UK

Post 3+ Months Ago

I would imagine it doesn't just follow everything that looks like a URL in JavaScript, though. I doubt the people at Google want to follow links to every useless advertisement pop-up window. It's probably looking for onClick event handlers in conjunction with URLs...

I can just see this:

Code: [ Select ]
var myArrayOfLinksForGoogle = ["/page1.htm", "/page2.htm"];


becoming a handy way to add loads of links to every page on the site without having to show them to the general public. That looks far too open to abuse to me.

I also don't agree with the idea that every page should have one version for the search engines and one for the public. The search engines are all about delivering "relevant content". How on earth are they supposed to tell what content is relevant if they never get to see it!?
  • benoitb
  • Graduate
  • Posts: 114
  • Loc: Washington, DC

Post 3+ Months Ago

The internet is ruined.
  • justice
  • Beginner
  • Posts: 57

Post 3+ Months Ago

What about the other search engines? Are they following Google's lead as well?
  • phaugh
  • Professor
  • Posts: 796

Post 3+ Months Ago

I think we are giving Google more credit than it deserves. I have found competitors of mine using sneaky redirects triggered by the mouse-over event on the body of a document. The web user can't avoid triggering the event, and spiders never trigger it since they have no mouse. I have also seen key phrases stuffed inside a div tag with its visibility set to hidden. Google will never be able to combat this technique, since showing and hiding layers on a user event is a legitimate thing to do; Google can't tell which layers are real and which ones are not.

As far as reporting these people goes... Google replies that they don't address any site directly, but they will work on making it part of their algorithm...
  • BinaryMan
  • Novice
  • Posts: 15
  • Loc: CA, USA

Post 3+ Months Ago

Consider the following code:



<body onload="go()">

<script type="text/javascript">
// Build the destination URL from fragments so the full address never
// appears as a single string anywhere in the page source.
function go()
{
    var parts = ["http://www.", "mydomain", ".com/", "index.html"];
    window.location = parts.join("");
}
</script>

<a href="/googlebotentry.html" title="my keywords"><b>My Keywords</b></a>

</body>


This would redirect users to mydomain.com without revealing it to Googlebot (I think). It also exposes the link to a set of pages designed for the bot.
  • phaugh
  • Professor
  • Posts: 796

Post 3+ Months Ago

Wouldn't you still need to determine whether the visitor was a bot or a browser?

There was a company that got banned for promoting customers' sites using a mouse-over event and a JavaScript to send visitors one way and bots another. Bots don't have a mouse, so they would get the page and its links; browser users almost always have a mouse, and it usually passes over the page, so the redirect gets initiated... I'd be careful with this one.
  • BinaryMan
  • Novice
  • Posts: 15
  • Loc: CA, USA

Post 3+ Months Ago

There is no need to determine who is visiting (I think), because the bot doesn't execute JavaScript, right? For a user, the page is redirected automatically. For the bot, all it sees is the link to the inner page. I don't think it's smart enough to assemble the pieces of the destination URL unless it actually executes the script like a web browser would (as if they have the time for that)...

That said, there is always the possibility of someone tipping off a popular site if they notice a redirect. However, loading the page in a browser doesn't show anything except an immediate change of URL. You would have to download it manually and check the source (not from the browser itself) to actually see anything.
  • phaugh
  • Professor
  • Posts: 796

Post 3+ Months Ago

I think the onload will cause the bots to redirect as well... but I'm not sure... it sounds similar to the mouse-over redirect. You might want to test it on a disposable domain just in case... you also might want to keep bots from caching the page.
  • darksat
  • Proficient
  • Posts: 487
  • Loc: London (via the rest of the world)

Post 3+ Months Ago

Bots don't execute JavaScript, phaugh, sorry.
Google is scanning JavaScript and following links in it, though -
especially ones with full URLs.
It is even caching some external JS files.
  • BinaryMan
  • Novice
  • Posts: 15
  • Loc: CA, USA

Post 3+ Months Ago

That's what I thought... which means that perhaps "index.html" will get indexed, but not the full URL, and the above code should protect the full URL by splitting it.

I'm actually using the method to create a one-way path for users, but a two-way path for bots. What I've learned from studying the PR algorithm is that if your site doesn't have circular links, you will lose most of your possible PR.

By having that page linked by most other pages, it will take most of the PR for the site from the other pages in the site. However, it needs to feed back into the "loop" in order to preserve the PR vote circle. But I'd rather have the users go "forward" in the site; Googlebot, however, needs to follow the links to get all the pages indexed. Pages without an inbound AND an outbound link are not considered in the PR algorithm, by all estimates.

[Lots of Technical stuff, might be useful if you've explored the PR algorithm:]

Oddly, this means the assumed default PR value for pages cannot be 0.15 - they need at least one "vote" and need to cast at least 1 "vote" to be considered at all! The best setup for PR happens to be a "central page" approach where the main page links to content, and the content pages all link to the central page. This results in maximum PR of 1.0 per page (average), and concentrates ~45% of the PR on the main page.

You simply cannot get any higher than 50% of your site PR on a single page in your site. Use the PR Calculator to verify this. As you add pages in this setup (I tested 25 in the calculator) the PR for the main page approaches 45% of the total for the site (perhaps the limit is 40%-45% as the page amount increases).

Therefore, your maximum raw PR score for the main page is we'll say (# of pages on the site * 0.4) without external links. Obviously, you need a large megasite to achieve a high raw score without external links. If the logarithmic scale of the PR bar (0-10) represents base 5 increases in PR -> bar PR, then you need (25 / 0.4 = 63 pages) for bar PR2.
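Those percentages can be checked against the published PageRank formula, PR(p) = 0.15 + 0.85 · Σ PR(q)/C(q). A quick power-iteration sketch of the "central page" layout (the main page links to n content pages; each content page links only back to main) reproduces the ~45% figure claimed above:

```javascript
// Power-iterate PR(p) = 0.15 + 0.85 * sum(PR(q) / C(q)) on the
// "central page" layout. All n content pages are symmetric, so one
// variable suffices for them.
function centralPageRank(n, iterations) {
  let main = 1.0;
  let content = 1.0;
  for (let i = 0; i < iterations; i++) {
    // main gets n backlinks, each from a page with exactly 1 outlink;
    // each content page gets 1/n of main's vote (main has n outlinks).
    const newMain = 0.15 + 0.85 * n * content;
    const newContent = 0.15 + 0.85 * (main / n);
    main = newMain;
    content = newContent;
  }
  return { main: main, content: content, mainShare: main / (main + n * content) };
}

const pr = centralPageRank(25, 200);
console.log(pr.mainShare.toFixed(2)); // 0.46 -- close to the ~45% claimed above
```

Note that the total PR converges to n + 1, i.e. an average of 1.0 per page, which matches the claim in the post.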

Perhaps people with no external links to their site (self-contained, but submitted to google) and a bar PR > 0 can tell how many pages they have and their linking structure to approximate this?

Options for having "megasite-like" PR are either to have a large amount of content on one domain (risky - it's been postulated that too many pages, regardless of content, makes a site look bad to the bot), to have multiple domains feeding each other PR, or to use subdomains, since they are considered separate sites (and are usually easier to set up).

It is interesting to note that some sites when using the "link:site" search return 0 links. Due to the way the PR algorithm works, there must be inbound/outbound links on a page for it to count in the PR algorithm (else conservation of votes goes out the window). Consider the following example (using http://www.prsearch.net):

Search for "www.hyper.com". Its 3rd result (right now) is "www.hypercube.com" with 0 inbound links (supposedly). But plug that into a search of its own (it has PR3) and you will find quite a few single links to it (most with PR0). So perhaps Google won't give us an accurate count of inbound links?

Post Information

  • Total Posts in this topic: 30 posts

© 1998-2014. Ozzu® is a registered trademark of Unmelted, LLC.