Feb 05 2007
Duplicate Content in Joomla and Why it Matters

A few weeks ago we mentioned that there was some good news for those people who have duplicate content on their site. A Google staff member mentioned that there was no longer going to be an active penalty for websites that committed this particular mistake.

Some people were happy. Some were dubious and believed the penalty still existed. Some simply said, "What the beep is duplicate content, and how does it affect my Joomla site?"

This post is for that last group of people.

What is duplicate content?

When a website has several pages, all with substantially the same content.

Why is it disliked by search engines?

Because spammers can use this to make their site appear much bigger to search engines than it really is. If you come across an apparently small site that has 50,000 pages indexed by Google, it's a fair bet they are using duplicate content to trick the search engines.

Do you think the penalty still exists?

No, but I believe duplicate content can still cause you a lot of problems and that it should be avoided. Aaron Wall, writer of the web's most popular SEO book, has recently mentioned that you can increase your search engine ranking simply by allowing fewer junk pages to be indexed. Duplicate content produces a lot of junk pages, and eventually Google's bots will get tired of visiting your many useless URLs.


Why is it a problem with Joomla?

Because Joomla has a tendency to produce many different URLs for just one page. We'll use this page as an example. The following six URLs can reach this page. Each URL has the same content and the same metadata. It's duplicate content hell:

  1. Regular, non-Search-Engine-Friendly URL
  2. Regular, non-Search-Engine-Friendly URL with a menu Itemid
  3. URL to make the page display as a PDF
  4. URL to make the page display in print view
  5. URL to make the page display in print view with a menu Itemid
  6. URL with a Search-Engine-Friendly URL component turned on

Adding more components can produce even more URLs.
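
To make the problem concrete, here is a minimal Python sketch of how such variants collapse onto one piece of content. The URLs and parameter names (`Itemid`, `pop`, `do_pdf`, `index2.php`) are illustrative assumptions in the style of Joomla 1.0, not the actual links to this page. The SEF variant is left out because it carries no query string and would need a lookup table to resolve.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL variants; the paths and parameter names are assumptions
# made for illustration, not taken from the post.
urls = [
    "http://example.com/index.php?option=com_content&task=view&id=42",
    "http://example.com/index.php?option=com_content&task=view&id=42&Itemid=7",
    "http://example.com/index2.php?option=com_content&do_pdf=1&id=42",
    "http://example.com/index2.php?option=com_content&task=view&id=42&pop=1",
    "http://example.com/index2.php?option=com_content&task=view&id=42&pop=1&Itemid=7",
    "http://example.com/index.php?option=com_content&task=view&id=43",
]


def content_key(url):
    """Return (component, id): the only parts that identify the content."""
    query = parse_qs(urlsplit(url).query)
    return (query.get("option", [""])[0], query.get("id", [""])[0])


def group_duplicates(urls):
    """Group URLs that point at the same component and content id."""
    groups = defaultdict(list)
    for url in urls:
        groups[content_key(url)].append(url)
    return dict(groups)


groups = group_duplicates(urls)
for key, members in groups.items():
    print(key, len(members))
```

Any group with more than one member is a set of URLs that a search engine could index as separate pages with identical content.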

How can you stop your Joomla site being penalised?

  1. Unpublish your PDF and Print buttons for all articles.
  2. Use JPromoter. Analyze your site and then go to "Optimize Your Site". Search by using "Group by Same Titles". Make sure you choose "No index" and "No follow" for all but one copy of each page. This means that Google should only index the pages you want indexed.
  3. Start your site right by choosing one SEF URL component and sticking with it as long as you possibly can. Different SEF components often render links in different ways.
  4. Instead of simply creating menu links to a component, create a URL link to the SEF URL for that component. For example, instead of having a menu link to "index.php?option=com_login&Itemid=65" you can have a menu link to "login". This makes sure that only the "login" URL is read by search engines.
  5. If you're a spammer .... stop!
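
The "group by same titles" idea in step 2 can be sketched in a few lines of Python. This is not JPromoter's implementation, only an illustration of the logic. The page list is hypothetical, and the rule "keep the shortest URL" is an assumption chosen for the example.

```python
from collections import defaultdict

# Hypothetical crawl results: (url, page title) pairs.
pages = [
    ("/login", "Login"),
    ("/index.php?option=com_login&Itemid=65", "Login"),
    ("/about", "About Us"),
]


def robots_directives(pages):
    """Map each URL to a robots meta value, keeping one URL per title indexable."""
    by_title = defaultdict(list)
    for url, title in pages:
        by_title[title].append(url)

    directives = {}
    for title, urls in by_title.items():
        # Assumption: the shortest URL is the copy worth keeping.
        keep = min(urls, key=len)
        for url in urls:
            directives[url] = "index, follow" if url == keep else "noindex, nofollow"
    return directives


directives = robots_directives(pages)
for url, value in directives.items():
    print(url, "->", value)
```

Each value would go into that page's robots meta tag, so only one copy per title stays indexable.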


Comments (15)
Good points
written by Zorro, February 05, 2007
I'm with you on item 1.

As for JPromoter, contrary to what they're claiming on their website, there is the Joomla SEO Patch from joomlatwork.com which, at a fraction of the price, also gives full control over meta tags on non-com_content pages, besides doing other great things. (No I'm not affiliated.)

I'm with you again completely on item 3, and there is a strong reason for using a SEF URL component that you didn't even mention: It can rectify Joomla's inherent ItemID issues and make sure that a certain page is always reached via *ONE* URL no matter what ItemID Joomla thinks it should have. That's what I'm using OpenSEF for on almost all my sites.

OpenSEF also rewrites all internal links so you can leave them the standard way. I haven't tried SEF Advance but would imagine it operates much the same.

Thanks for the article and kind regards.
...
written by steve, February 05, 2007
Hi Zorro

Thanks for the great comment. Open-SEF has become a great product - it really is driven by talented developers.

I normally find that my choice of SEF component is determined by which ones have sef_ext.php files for the other components we're using. For example, if we're using SOBI we go for Artio SEF, but if we're using Community Builder it's SEF Advanced.
Stick to one SEF component
written by ronn, July 03, 2007
Yeah, I agree that we must stick to one SEF component as long as we can. I just completely messed up my site simply by changing from OpenSEF to SH404SEF.

Even though I was happy with OpenSEF, several good reviews of SH404SEF made me feel the need to try it, which I can definitely say was a big mistake.

It messed up my content as well as my PageRank. After several hours of headaches I just installed OpenSEF again.

regards
...
written by David Towers, October 04, 2007
Is it really necessary to disable the print function? Does this not just make the browser load the page with a different CSS file linked?

In version 1.5 of Joomla, there is no need to disable PDFs, as they have the nofollow attribute built into the link. However, the print buttons do not have that attribute built in.

So what's your advice? Do you recommend disabling the print button on Joomla 1.5 sites, then?

From an accessibility and usability point of view, it's really nice having a print option!
...
written by Steve Burge, October 04, 2007
Hi David

You're right - the PDF problem is much worse than the print problem.

In fact the PDF nofollow came from an idea by XTraze.net and a post on this site:

http://www.alledia.com/blog/search-engine-optimisation-(seo)/get-out-of-joomla-pdf-hell/

One easy way to remove the print pages might just be to use robots.txt and insert this:

disallow: /*print*
PDF and printing
written by Zorro, October 10, 2007
In Joomla 1.0.x, the best way to avoid having duplicate content is to disallow index2.php in the robots file. Both the print and PDF functions are run through index2.php which (from a search engine point of view) generates additional pages with the same content.

Printing through a "print" CSS is obviously the best and cleanest way - but Joomla doesn't support that out of the box. You have to switch off the print icon and make your own "print" CSS for this method to work.

In Joomla 1.5, disallowing index2.php doesn't work any more since they stopped using the index2.php method and now run everything through index.php. Steve's tip will remove the print pages, and with an additional

disallow: /*pdf*

you can remove the PDF pages as well if you don't trust the nofollow method.

Kind regards,
Zorro
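
For anyone who wants to check what these wildcard rules would catch, here is a rough Python sketch. It makes several assumptions: the robots.txt is written out in full with a `User-agent:` line and one `Disallow:` line per rule, `*` is read as "any run of characters" the way the major search engines interpret it, and the sample paths (`print=1`, `format=pdf`) are illustrative rather than copied from a real site. The matcher is hand-rolled because Python's built-in `urllib.robotparser` does not expand wildcards.

```python
import re

# Assumed robots.txt, written out in full. Each rule needs its own
# "Disallow:" line, and the rules sit under a "User-agent:" line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*print*
Disallow: /*pdf*
"""


def disallow_patterns(robots_txt):
    """Collect Disallow values (User-agent grouping is ignored for brevity)."""
    patterns = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            patterns.append(value.strip())
    return patterns


def pattern_to_regex(pattern):
    """Translate a wildcard rule: '*' is any run of characters, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(path, patterns):
    """True if any rule matches the path, starting from its first character."""
    return any(pattern_to_regex(p).match(path) for p in patterns)


patterns = disallow_patterns(ROBOTS_TXT)
for path in [
    "/index.php?view=article&id=5",
    "/index.php?view=article&id=5&print=1",
    "/index.php?view=article&id=5&format=pdf",
    "/blog/printing-tips",
]:
    print(path, "blocked" if is_blocked(path, patterns) else "allowed")
```

Note the last path: a pattern as broad as `/*print*` also blocks any ordinary page whose URL happens to contain "print", so it is worth testing the rules against your real URLs first.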
...
written by Good Web Practices, October 18, 2007
It's great to be up to speed about this before it's too late. So basically, from what I've understood, on my Joomla 1.5 setup I can still leave the PDF and Print pages on (if I want to) and not have duplicate content problems by including the following two lines:

disallow: /*pdf*
/*print*

Is that right?
Disallow in robots.txt and leave print / PDF on?
written by Vacuum, November 21, 2007
If I understood Zorro correctly: if we disallow print and PDF in robots.txt, we can leave print and PDF switched on Joomla-wide and we won't have any print or PDF duplicate content?
...
written by Zorro, November 22, 2007
Vacuum: Yes, that's what I'm saying. For Joomla 1.5, that is.

Kind regards,
Zorro
...
written by brendan wilde, December 06, 2007
"Use JPromoter. Analyze your site and then go to "Optimize Your Site". Search by using "Group by Same Titles". Make sure you choose "No index" and "No follow" for all but one copy of each page. This means that Google should only index the pages you want indexed."


Can one set no index and no follow for all but one url without JPromoter... i.e. simply using Open SEF??
...
written by Steve Burge, December 06, 2007
Hi Brendan

You can. This advice is a little out-of-date. We'd probably recommend using the SEF Patch and sh404SEF to do the same task today.

OpenSEF is more or less extinct unfortunately.

Steve
...
written by brendan wilde, December 06, 2007
okay thanks...
One thing to consider
written by Micheas Herman, February 17, 2008
Google strongly hinted at joomla day west last year at the googleplex that if they detect a site is joomla site they drop into a special joomla site mode. This is why some pages that google says they will not index get indexed on joomla sites. There are a huge number of joomla 1.0.x and mambo 4.x sites and they all do things that google hates, but there are so many mambo and joomla sites that google has to cope.

Not that I don't use sh404sef for all of my joomla 1.0 sites, but I suspect that the penalty for not being google friendly is not that great at the moment. However, as sites slowly upgrade, the odds increase that all but a small number of pages will not be indexed by the major search engines.

Personally I suspect that it is a better use of resources to increase the organic links to your site than to optimize the site. Also, I find that sh404sef makes it much easier for people to link to your site, so the effect of sh404sef may be more that the urls are human friendly and thus get reposted more.

Remember, SEO is mostly just the methodical work of posting good content and posting links to your content around the web. Shortcuts don't really exist.

Just my musings on the subject.
...
written by Steve Burge, February 17, 2008
Hi Micheas

More than musings - some fascinating thoughts. I've mused on this topic before but never really been able to substantiate my ideas with much evidence:
http://www.alledia.com/blog/joomla-seo-ebook/do-search-engines-treat-joomla-sites-differently?/

With your first sentence: "if they detect a site is joomla site they drop into a special joomla site mode", could you tell us a little more about the hints they gave?
...
written by Rafi Michael - Toronto Weddings, March 16, 2008
That was great info. I will make sure to work on my site videobabylon.ca so I have no problems.

All blog articles are licensed under a Creative Commons Attribution 3.0 United States License.