Duplicate Content Filter

  • beefcakejcc
  • Graduate
  • Graduate
  • beefcakejcc
  • Posts: 103
  • Loc: Atlanta,GA

Post 3+ Months Ago

I've heard several times now that Google finds sites with duplicate content and erases the page that has been around the least amount of time. That seems like a fine idea but I don't understand how it's possible.

When Google caches a page, does it check that page against every other page it has cached and look for the exact same content? It seems like that would take forever to do for every page that Google finds. Does anyone know how this works?
  • panreach
  • Novice
  • Novice
  • User avatar
  • Posts: 28

Post 3+ Months Ago

Of course it's possible. Google has a lot of bright minds that think outside the box. They're even asking for your computer's idle time for some super-computing projects. Check it out at http://toolbar.google.com/dc/offerdc.html
  • nuclei
  • Graduate
  • Graduate
  • User avatar
  • Posts: 147
  • Loc: On a mountain

Post 3+ Months Ago

Google even took out a patent on it. They patented a method of "fingerprinting" what it considers keyphrases in pages, so it can check a page's fingerprint against all the other docs in the index. It really is not hard; you just have to realize that Google has 10,000 machines at its disposal to load balance things.
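
To make the fingerprint idea concrete, here is a rough sketch of why nothing has to be compared page by page. It is not Google's patented method, which this thread does not describe in detail. It assumes a generic approach: break the text into overlapping word runs ("shingles"), hash them, keep a few of the smallest hashes as the fingerprint, and look those up in a table. The names `shingles`, `fingerprint` and `FingerprintIndex`, and the values `k=5` and `keep=8`, are made up for illustration.

```python
import hashlib
import re
from collections import defaultdict


def shingles(text, k=5):
    """Return the set of k-word runs in the text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def fingerprint(text, k=5, keep=8):
    """Hash every shingle and keep the `keep` smallest hashes as the fingerprint."""
    hashes = sorted(
        int.from_bytes(hashlib.sha1(s.encode("utf-8")).digest()[:8], "big")
        for s in shingles(text, k)
    )
    return frozenset(hashes[:keep])


class FingerprintIndex:
    """Maps each fingerprint hash to the URLs whose fingerprint contains it."""

    def __init__(self):
        self.by_hash = defaultdict(set)

    def add(self, url, text):
        """Index a page; return {earlier_url: fraction of this fingerprint shared}."""
        fp = fingerprint(text)
        shared = defaultdict(int)
        for h in fp:
            for other in self.by_hash[h]:  # table lookup, not a scan of all pages
                shared[other] += 1
        for h in fp:
            self.by_hash[h].add(url)
        return {u: n / len(fp) for u, n in shared.items()}
```

Each new page costs a handful of hash lookups no matter how many pages are already indexed, and the table can be split across machines by hash value. The sketch only finds candidate duplicates; which copy gets kept (the oldest, the highest PR, or something else) is a separate decision that it does not model.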
  • phaugh
  • Professor
  • Professor
  • User avatar
  • Posts: 796

Post 3+ Months Ago

This technology is an offshoot of OCR and ICR (optical and intelligent character recognition): the ability to decipher machine-printed and hand-printed characters. That technology uses bitmapped images to determine what the character values are.

So if Google were to stream chunks of text into a formatter and make the font similar, it could then create bitmapped images to run through an OCR-type engine. This code is usually integrated into a piece of hardware, so its processing capabilities are amazing.

Then, if they figure out a way to incorporate the associated word dictionary, they could find similar text that uses different but similar words. Just a thought.
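
The word-dictionary part can be tried without any image processing. Below is a minimal sketch, assuming a tiny hand-made synonym table (`SYNONYMS` is hypothetical, as are the function names): map each word to a canonical form, then measure how many three-word runs two texts share (Jaccard overlap). Nothing here is claimed to be what Google actually does.

```python
import re

# Hypothetical synonym table: each word maps to one canonical form.
SYNONYMS = {
    "purchase": "buy",
    "cheap": "inexpensive",
    "automobile": "car",
}


def normalize(text):
    """Lowercase, split into words, and replace known synonyms."""
    words = re.findall(r"[a-z']+", text.lower())
    return [SYNONYMS.get(w, w) for w in words]


def grams(words, k=3):
    """Set of k-word runs; a text shorter than k yields one short run."""
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def similarity(text_a, text_b, k=3):
    """Jaccard overlap of the normalized k-word runs, from 0.0 to 1.0."""
    ga = grams(normalize(text_a), k)
    gb = grams(normalize(text_b), k)
    return len(ga & gb) / len(ga | gb)
```

With this table, "Buy a cheap automobile today" and "Purchase a cheap car today" normalize to the same word list, so `similarity` returns 1.0 even though the raw strings differ.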
  • Axe
  • Genius
  • Genius
  • User avatar
  • Posts: 5739
  • Loc: Sub-level 28

Post 3+ Months Ago

From what I gather, it's the page with the lowest PR that gets bumped off or to the bottom of the listings, not necessarily the site that's been around for the shortest time.
  • nuclei
  • Graduate
  • Graduate
  • User avatar
  • Posts: 147
  • Loc: On a mountain

Post 3+ Months Ago

So far, every instance we have seen of it happening has been to the newest pages. Grandfathered pages don't ever seem to be affected.
  • Axe
  • Genius
  • Genius
  • User avatar
  • Posts: 5739
  • Loc: Sub-level 28

Post 3+ Months Ago

Well, newer pages often have a lower PR than the original, so it could be pure coincidence.

A simple way to test: buy two domains.

Set one domain up, and let it get a PR3 or something and a good ranking for a little-used keyphrase.

Once it has PR, set up the other domain pointing to identical content, get MANY MANY more links pointing to it for a higher PR, and see which one disappears on the next update.
  • phaugh
  • Professor
  • Professor
  • User avatar
  • Posts: 796

Post 3+ Months Ago

If it were a matter of PR, then a high-PR site could wipe out an existing site by posting identical content.
  • nuclei
  • Graduate
  • Graduate
  • User avatar
  • Posts: 147
  • Loc: On a mountain

Post 3+ Months Ago

phaugh wrote:
If it were a matter of PR, then a high-PR site could wipe out an existing site by posting identical content.

Exactly.
  • Axe
  • Genius
  • Genius
  • User avatar
  • Posts: 5739
  • Loc: Sub-level 28

Post 3+ Months Ago

Well, a friend of mine had an article running on his site which was ranked at #1. He allowed me to publish the article on my site, which had a higher PR, and his listing no longer appears in the top 300 results, yet the copy of his article on my site is #1.

He posted the article on his own site at least 6 months before he allowed me to post it on mine.
  • nuclei
  • Graduate
  • Graduate
  • User avatar
  • Posts: 147
  • Loc: On a mountain

Post 3+ Months Ago

Your friend may have picked up a penalty for something unrelated. Myself, I have only ever seen the oldest cached version stay in the game when content is duplicated.
  • phaugh
  • Professor
  • Professor
  • User avatar
  • Posts: 796

Post 3+ Months Ago

I recently tried the http://www.mydomain.com/interior-page.htm?keyphrase thing, and it knocked my regular page http://www.mydomain.com/interior-page.htm out of the index. The new page is a PR2 and the old one was PR4, so it could be a fresh content thing. But like I said before, ranking on PR alone would be too easy to abuse, so there must be other factors as well as PR.
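
For anyone wanting to spot this kind of self-inflicted duplicate on their own site, here is a small sketch that groups URLs differing only by query string. It assumes the query string does not change what the page serves, which holds for the `?keyphrase` trick above but not for pages that are built from their parameters. `strip_query` and `group_duplicates` are illustrative names, and this says nothing about how Google itself treats such URLs.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def strip_query(url):
    """Drop the query string and fragment, and lowercase the host name."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, "", ""))


def group_duplicates(urls):
    """Return {canonical_url: [urls...]} for every group with more than one URL."""
    groups = defaultdict(list)
    for url in urls:
        groups[strip_query(url)].append(url)
    return {canon: found for canon, found in groups.items() if len(found) > 1}
```

Run over a list of the URLs you link to, any group it returns is a set of addresses a crawler could see as separate pages carrying identical content.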
  • Robbo
  • Born
  • Born
  • Robbo
  • Posts: 2
  • Loc: Thailand

Post 3+ Months Ago

Interesting posts. It could be the answer to what's happened to my newish site.
My home domain has been replaced in Google by the meta refresh URL of a directory that links to me. The cache of this directory URL shows my homepage.
Am I being penalised by Google for duplicate content, when it looks to me like they've wrongly indexed the directory as the authority site? I don't think there's any cloaking involved by the directory.
Any suggestions on how I can get out of this mess would be great. I e-mailed Google, but nothing seems to be happening at present.
Thanks Robbo
  • Robbo
  • Born
  • Born
  • Robbo
  • Posts: 2
  • Loc: Thailand

Post 3+ Months Ago

Amazing! Half an hour after posting my problem here, the directory site listing which had replaced my URL has gone.
I'm a convert to OZZU. Was it a lucky coincidence or a big helping hand from one of you guys? What do you think, Axe?
You've made my day!
Robbo

Post Information

  • Total Posts in this topic: 14 posts

© 1998-2014. Ozzu® is a registered trademark of Unmelted, LLC.