05 Feb 2007

Alledia.com Blog

Duplicate Content in Joomla and Why it Matters

Search Engine Optimisation (SEO)
Written by Steve Burge   

A few weeks ago we mentioned that there was some good news for those people who have duplicate content on their site. A Google staff member mentioned that there was no longer going to be an active penalty for websites that committed this particular mistake.

 

Some people were happy. Some were dubious and believed the penalty still existed. Some simply said, "what the beep is duplicate content and how does it affect my Joomla site?"

 

This post is for that last group of people.

 

What is duplicate content?

When a website has several pages, all with substantially the same content.

 

Why is it disliked by search engines?

Because spammers can use this to make their site appear much bigger than it really is to search engines. If you come across an apparently small site that has 50,000 pages indexed by Google, it's a fair bet they are using duplicate content to trick the search engines.

 

Do you think the penalty still exists?

No, but I believe duplicate content can still cause you a lot of problems and that it should be avoided. Aaron Wall, writer of the web's most popular SEO book, has recently mentioned that you can increase your search engine ranking simply by having fewer junk pages indexed. Duplicate content produces a lot of junk pages, and eventually Google's bots will get tired of visiting your many useless URLs.


Why is it a problem with Joomla?

Because Joomla has a tendency to produce many different URLs for just one page. We'll use this page as an example. The following six URLs can reach this page. Each URL has the same content and the same metadata. It's duplicate content hell:

  1. Regular, non Search Engine-Friendly URL
  2. Regular, non Search Engine-Friendly URL with a menu Itemid
  3. URL to make the page display as a PDF
  4. URL to make the page display in print view
  5. URL to make the page display in Print view with a menu Itemid
  6. URL with Search-Engine-Friendly URL component turned on

Adding more components can produce even more URLs.
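
To see how several of these URLs collapse to one page, here is a minimal Python sketch that strips the parts of a URL that only change presentation. It is an illustration rather than anything Joomla itself does. The example URLs are hypothetical, and apart from Itemid the parameter names (pop, do_pdf) are assumptions about how print and PDF views are requested. The index2.php handling assumes a Joomla 1.0.x site.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Query parameters assumed to change only the menu context or presentation.
# "Itemid" is the menu parameter; "pop" and "do_pdf" are illustrative guesses.
PRESENTATION_PARAMS = {"Itemid", "pop", "do_pdf"}


def canonical_key(url):
    """Return a normalized URL that ignores presentation-only differences."""
    parts = urlsplit(url)
    kept = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key not in PRESENTATION_PARAMS
    )
    path = parts.path
    # Assumption: index2.php serves the same page as index.php in another view.
    if path.endswith("/index2.php"):
        path = path[: -len("index2.php")] + "index.php"
    return urlunsplit((parts.scheme, parts.netloc.lower(), path, urlencode(kept), ""))


# Hypothetical variants of one article.
example_urls = [
    "http://example.com/index.php?option=com_content&task=view&id=42",
    "http://example.com/index.php?option=com_content&task=view&id=42&Itemid=7",
    "http://example.com/index2.php?option=com_content&task=view&id=42&pop=1",
    "http://example.com/index2.php?option=com_content&task=view&id=42&pop=1&Itemid=7",
]

unique_pages = {canonical_key(url) for url in example_urls}
print(len(example_urls), "URLs,", len(unique_pages), "distinct page(s)")
```

A SEF URL cannot be matched this way because it shares no query string with the others, which is one reason the duplicates are easy to miss.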

 

How can you stop your Joomla site being penalised?

  1. Unpublish your PDF and Print buttons for all articles.
  2. Use JPromoter. Analyze your site and then go to "Optimize Your Site". Search by using "Group by Same Titles". Make sure you choose "No index" and "No follow" for all but one copy of each page. This means that Google should only index the pages you want indexed.
  3. Start your site right by choosing one SEF URL component and sticking with it as long as you possibly can. Different SEF components often render links in different ways.
  4. Instead of simply creating menu links to a component, create a URL link to the SEF URL for that component. For example, instead of having a menu link to "index.php?option=com_login&Itemid=65" you can have a menu link to "login". This makes sure that only the "login" URL is read by search engines.
  5. If you're a spammer .... stop!
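
The "Group by Same Titles" idea in step 2 can be sketched in a few lines of Python. This is not JPromoter's code, only a rough illustration of the logic: group URLs that share a title, keep one copy indexable, and mark the rest "noindex, nofollow". The rule for choosing which copy to keep (the shortest URL) and the function names are assumptions made for the example.

```python
from collections import defaultdict


def robots_meta_by_url(pages):
    """pages: iterable of (url, title) pairs. Returns {url: robots meta content}."""
    groups = defaultdict(list)
    for url, title in pages:
        groups[title.strip().lower()].append(url)

    decisions = {}
    for urls in groups.values():
        # Assumption: the shortest URL is the copy we want search engines to index.
        keep = min(urls, key=len)
        for url in urls:
            decisions[url] = "index, follow" if url == keep else "noindex, nofollow"
    return decisions


def meta_tag(content):
    """Render the robots meta tag for a page's <head>."""
    return '<meta name="robots" content="%s" />' % content


# Hypothetical crawl results: two URLs share the title "Login".
pages = [
    ("http://example.com/login", "Login"),
    ("http://example.com/index.php?option=com_login&Itemid=65", "Login"),
    ("http://example.com/about", "About"),
]

for url, content in robots_meta_by_url(pages).items():
    print(url, meta_tag(content))
```

Grouping by title only works if each real page has a unique title, so treat the result as a list of candidates to review rather than a final decision.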


Comments

Zorro
February 05, 2007

I'm with you on item 1.

As for JPromoter, contrary to what they're claiming on their website, there is the Joomla SEO Patch from joomlatwork.com which, at a fraction of the price, also gives full control over meta tags on non-com_content pages, besides doing other great things. (No I'm not affiliated.)

I'm with you again completely on item 3, and there is a strong reason for using a SEF URL component that you didn't even mention: It can rectify Joomla's inherent ItemID issues and make sure that a certain page is always reached via *ONE* URL no matter what ItemID Joomla thinks it should have. That's what I'm using OpenSEF for on almost all my sites.

OpenSEF also rewrites all internal links so you can leave them the standard way. I haven't tried SEF Advance but would imagine it operates much the same.

Thanks for the article and kind regards.

Steve Burge
February 05, 2007

Hi Zorro

Thanks for the great comment. Open-SEF has become a great product - it really is driven by talented developers.

I normally find that my choice of SEF component is determined by which ones have sef_ext.php files for the other components we're using. For example, if we're using SOBI we go for Artio SEF, but if we're using Community Builder it's SEF Advanced.

ronn
July 04, 2007

Yeah, I agree that we must stick to one SEF component as long as we can. I just completely messed up my site simply by changing from OpenSEF to SH404SEF.

Even though I was happy with OpenSEF, several good reviews I read of SH404SEF made me feel the need to try the component, which I would definitely say was a big mistake.

It messed up my content as well as my PageRank. After several hours of headache I just installed OpenSEF again.

regards

David Towers
October 04, 2007

Is it really necessary to disable the print function? Does this not just make the browser load the page with a link to a different CSS file?

In version 1.5 of Joomla, there is no need to disable PDFs as they have the nofollow attribute built into the link; however, the print buttons do not have that attribute built in.

So what's your advice: do you recommend disabling the print button on Joomla 1.5 sites then?

From an accessibility and usability point of view, it's really nice having a print option!

Steve Burge
October 04, 2007

Hi David

You're right - the PDF problem is much worse than the print problem.

In fact the PDF nofollow came from an idea by XTraze.net and a post on this site:

http://www.alledia.com/blog/search-engine-optimisation-(seo)/get-out-of-joomla-pdf-hell/

One easy way to remove the print pages might just be to use robots.txt and insert this:

disallow: /*print*

Klaus Nitsche
October 10, 2007

In Joomla 1.0.x, the best way to avoid having duplicate content is to disallow index2.php in the robots file. Both the print and PDF functions are run through index2.php which (from a search engine point of view) generates additional pages with the same content.

Printing through a "print" CSS is obviously the best and cleanest way - but Joomla doesn't support that out of the box. You have to switch off the print icon and make your own "print" CSS for this method to work.

In Joomla 1.5, disallowing index2.php doesn't work any more since they stopped using the index2.php method and now run everything through index.php. Steve's tip will remove the print pages, and with an additional

disallow: /*pdf*

you can remove the PDF pages as well if you don't trust the nofollow method.

Kind regards,
Zorro

David Towers
Good Web Practices
October 18, 2007

It's great to be up to speed about this before it's too late. So basically, from what I've understood, on my Joomla 1.5 setup I can still leave the PDF and Print pages on (if I want to) and not have duplicate content problems by including the following two lines:

disallow: /*pdf*
disallow: /*print*

Is that right?

Vacuum
November 21, 2007

If I understood Zorro correctly, if we disallow print and PDF in robots.txt, we can leave print and PDF on Joomla-wide and we won't have any print or PDF duplicate content?

Klaus Nitsche
November 22, 2007

Vacuum: Yes, that's what I'm saying. For Joomla 1.5, that is.

Kind regards,
Zorro
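
The robots.txt rules suggested in the comments above can be tried out before they go live. The Python sketch below is only an approximation of wildcard matching as the major search engines describe it ("*" matches any run of characters, a trailing "$" anchors the end). It ignores User-agent grouping, and the test paths, including the print=1 parameter, are hypothetical. Note that each rule needs its own Disallow line under a User-agent line.

```python
import re

# Example robots.txt: every rule sits on its own Disallow line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /*print*
Disallow: /*pdf*
"""


def disallow_rules(robots_txt):
    """Collect Disallow patterns (simplified: User-agent groups are ignored)."""
    rules = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "disallow" and value.strip():
            rules.append(value.strip())
    return rules


def rule_to_regex(rule):
    """Translate a wildcard rule into a regular expression anchored at the start."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path, rules):
    """True if any Disallow rule matches the URL path (including query string)."""
    return any(rule_to_regex(rule).match(path) is not None for rule in rules)


rules = disallow_rules(ROBOTS_TXT)
for path in [
    "/index.php?view=article&id=5&print=1",  # hypothetical print view
    "/blog/duplicate-content",               # normal article
    "/blueprint-for-seo",                    # contains "print" by accident
]:
    print(path, "blocked" if is_blocked(path, rules) else "allowed")
```

The last test path shows the cost of such a broad pattern: any URL that happens to contain "print" or "pdf" is blocked too, so a narrower rule may be safer on sites with SEF URLs.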

brendan wilde
December 06, 2007

"Use JPromoter. Analyze your site and then go to "Optimize Your Site". Search by using "Group by Same Titles". Make sure you choose "No index" and "No follow" for all but one copy of each page. This means that Google should only index the pages you want indexed."


Can one set no index and no follow for all but one url without JPromoter... i.e. simply using Open SEF??

Steve Burge
December 06, 2007

Hi Brendan

You can. This advice is a little out-of-date. We'd probably recommend using the SEF Patch and sh404SEF to do the same task today.

OpenSEF is more or less extinct unfortunately.

Steve

brendan wilde
December 06, 2007

okay thanks...

Micheas Herman
February 17, 2008

Google strongly hinted at joomla day west last year at the googleplex that if they detect a site is joomla site they drop into a special joomla site mode. This is why some pages that google says they will not index get indexed on joomla sites. There are a huge amount of joomla 1.0.x and mambo 4.x sites and they all do things that google hates, but there are so many of mambo and joomla site google has to cope.

Not that I don't use sh404sef for all of my joomla 1.0 sites, but I suspect that the penalty for not being google friendly is not that great at the moment. However, as sites slowly upgrade the odds of all but a small number of pages not being indexed by the major search engines increases.

Personally I suspect that it is a better use of resources to increase the organic links to your site than optimizing the site. Also, I find that sh404sef makes the site much easier for people to link to your site, so the effect of sh404sef may be more that the urls are human friendly and thus get reposted more.

Remember, SEO is mostly just the methodical work of posting good content and posting links to your content around the web. Shortcuts don't really exist.

Just my musings on the subject.

Steve Burge
February 17, 2008

Hi Michael

More than musings - some fascinating thoughts. I've mused on this topic before but never really been able to substantiate my ideas with much evidence:
http://www.alledia.com/blog/joomla-seo-ebook/do-search-engines-treat-joomla-sites-differently?/

With your first sentence: "if they detect a site is joomla site they drop into a special joomla site mode", could you tell us a little more about the hints they gave?

Rafi Michael - Toronto Weddings
March 16, 2008

That was great info. I will make sure to work on my site videobabylon.ca so I have no problems.

MikeH
September 29, 2008

Thanks for the insight!

Do you have any idea why the current version of Joomla (1.5.7) produces different URLs for section layouts and category layouts? And is there a good way of solving this problem when creating a silo-structured site? Currently (with core SEF turned on) category layouts link to content items like:

http://www.domain.com/sectionNAME/categoryNAME/ID-ArticleNAME

while sections link to the same piece of content like:

http://www.domain.com/sectionNAME/categoryID/ID-ArticleNAME

The front page uses the same formatting as categories, and I don't really see the point of formatting section URLs differently from everything else...

Is there any convenient way to get around this inconsistency?

SEOexpert...
November 14, 2008

And don't forget to disable RSS!!! This was made to create duplicate content... Really, who could be so stupid as to make it easy for other sites to duplicate your content?

I think I have outsmarted these search engines: I just create my content in many duplicates as PNG screenshots with meta tags and links back to the original.

STOP RSS XML, they are evil for the web....

jamesff
February 02, 2009

In regards to the duplicate content part of this blog post, I personally use the http://www.copygator.com website to find and stop duplicate content:

1. It's automated and brings me results instead of me searching for duplicated content. All I had to do was submit my feed, and it started monitoring it and showing me who has republished my articles on the web.

2. I get notified by email when it finds copies of my articles online.

3. I use their image badge feature to alert me directly on my website when my content is being lifted.

4. It's a free service, as opposed to the "per page" cost of copyscape/copysentry.

refco
March 31, 2009

made with joomla and the only component www.coomla.com weblink uses.
50 sheets of labels has been resolved the incident site because of labels have been 10,000 page index.
What is your comment on this issue

Rick
June 03, 2009

I don't believe that Google has a penalty for duplicate content on your own domain. It's when it's on more than one domain that it's a problem.
