Search Engine Optimization
May
03
2007
Get out of Joomla PDF Hell
Written by Steve Burge   

In previous blog posts, I've talked about how Joomla can create lots of duplicate content pages and wreak havoc with your Search Engine rankings. Today, we'll deal with a major culprit.

 

One of Joomla's major causes of duplicate content is the PDF generator. Brian Teeman has even pointed out that when he does his in-depth searches for Joomla Weekly News, he finds many PDF pages ranking higher than the original pages.

 

The problem is so bad, and the PDF so useless, that if you check the demo of Joomla 1.5, you'll see that it's about to be dropped. For those of us running the current version of Joomla, what do we do to avoid Joomla PDF hell?

 

  1. Unpublish the PDFs completely.
  2. Use robots.txt to stop Google from picking up the PDF pages.
  3. A very simple, but useful tip from XTraze.net. He suggests simply adding a "no-follow" to the PDF links. No-follow is often used by sites that suffer heavy spam attacks or have lots of extra pages that can reduce the value of their site as a whole.

 

For the no-follow fix, open up /components/com_content/content.html.php and find the PDF link:

    <a href="<?php echo $link; ?>" target="_blank" onclick="window.open('<?php echo $link; ?>','win2','<?php echo $status; ?>'); return false;" title="<?php echo _CMN_PDF;?>">


Add the rel="nofollow" attribute:

    <a href="<?php echo $link; ?>" rel="nofollow" target="_blank" onclick="window.open('<?php echo $link; ?>','win2','<?php echo $status; ?>'); return false;" title="<?php echo _CMN_PDF;?>">
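
If you'd rather take the robots.txt route (option 2 above), here is a minimal sketch. It assumes your PDF links run through index2.php with a do_pdf=1 parameter, for example /index2.php?option=com_content&do_pdf=1&id=12 (the id is made up). Hover over the PDF icon on your own site to confirm the URL pattern before copying this. Blocking /index2.php will most likely also block the print and other pop-up views, which is usually fine since they are duplicates too.

    # robots.txt, in the root of your site
    # Assumption: PDF links go through /index2.php (check your own URLs)
    User-agent: *
    Disallow: /index2.php

Google also understands * wildcards in robots.txt, so a narrower rule such as Disallow: /*do_pdf=1 in a Googlebot section should work there, but don't count on every other crawler honoring it.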


 

 

Comments  

 
#1 Anthony Olsen 2007-05-02 21:12
Yes, it's funny, isn't it? This was one of the features that attracted me to Joomla when I first saw it ... I thought wow! People can print, email or PDF your text; that's got to be useful. But I have never used it once and neither have any of my users (as far as I'm aware) ...
 
 
#2 Johan Janssens 2007-05-02 22:19
Actually, for 1.5 we have completely refactored the PDF library. It now supports images and is fully internationalised. I have made a quick change to 1.5 and added the nofollow to the PDF links. Thanks for the tip!
 
 
#3 Steve Burge 2007-05-03 08:25
Thanks Johan - that's great news.

It's not a 100% perfect solution ... no-follow doesn't work on Ask.com, but fortunately I don't think anyone uses them anymore :-)

Should work fine on Google, Yahoo and MSN

Steve
 
 
#4 sean 2007-05-09 14:57
It all makes sense now. I was checking the "friendly" URLs in Open-SEF on one of my sites and I saw about 400 extra URLs that could hardly be termed friendly. The week before, when I'd initially implemented Open-SEF, those URLs weren't there. Does that mean I have hundreds of PDFs lurking somewhere on my site, and, if so, where?
This remains my number one rated blog for SEO, period!
 
 
#5 XTraze 2007-05-17 05:29
I just went through that file's source and found this, but I couldn't find the Print button's link in there. Can you let me know if you got it?

Thanks for these lines mate.

Quote:
A very simple, but useful tip from XTraze.net. He suggests simply adding a "no-follow" to the PDF links. No-follow is often used by sites that suffer heavy spam attacks or have lots of extra pages that can reduce the value of their site as a whole.
 
 
#6 Gavin 2007-05-30 02:44
I have been thinking about this and I'm not 100% sure if this is causing any problems. I just read this article on WebmasterWorld: http://www.webmasterworld.com/forum44/711.htm

I don't publish the PDF option in Joomla myself. If I don't publish the PDF option, is it still necessary to add the rel="nofollow"?
 
 
#7 Steve Burge 2007-05-30 08:52
Hi Gavin

Thanks for the interesting link.

Even if they are right (I'd still disagree and say it can still cause problems with Google spidering your site), there's still the problem of PDF pages ranking above regular pages.

For example, you want to find the Joomla.org article about 100,000 forum members. Try searching Google for "joomla.org 100000" and the PDF comes up first. The original article is nowhere to be seen.

This means that visitors won't go to your site - they'll download the PDF. In all likelihood you've lost that visitor.
Quote
 
 
#8 Keith Schilling 2007-07-12 15:45
Hate to rehash an old topic, but Google treated my URLs just fine, as I've never had the PDF button published... HOWEVER, when using Site Explorer with Yahoo I have PDF hell everywhere. So, I'll try this trick and see if Yahoo treats me better.
 
 
#9 rachel 2007-11-22 05:42
Thank you so much for all the info on this site. I just found it through Google and love it! It's definitely going in my bookmarks.
 
 
#10 Kingdom 2008-04-29 13:33
I found I had an extra > which showed on the page when I changed the code!
 
