May 03 2007
Get out of Joomla PDF Hell
Thursday, 03 May 2007

In previous blog posts, I've talked about how Joomla can create lots of duplicate content pages and wreak havoc with your search engine rankings. Today, we'll deal with a major culprit.

One of Joomla's major causes of duplicate content is the PDF generator. Brian Teeman has even pointed out that when he does his in-depth searches for Joomla Weekly News, he finds many PDF pages ranking higher than the original pages.

The problem is so bad, and the PDF so useless, that if you check the demo of Joomla 1.5, you'll see that it's about to be dropped. For those of us running the current version of Joomla, what can we do to avoid Joomla PDF hell?

  1. Unpublish the PDFs completely.
  2. Use robots.txt to stop Google from picking up the PDF pages.
  3. Add rel="nofollow" to the PDF links. This very simple but useful tip comes from XTraze.net. Nofollow is often used by sites that suffer heavy spam attacks, or that have lots of extra pages that could reduce the value of the site as a whole.
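For option 2, a robots.txt rule might look like the sketch below. It is only an example under an assumption: it expects the default, non-SEF Joomla 1.0 PDF links, which look like index2.php?option=com_content&do_pdf=1&id=…, and a Joomla install in the web root. If you use an SEF component or a subdirectory, your PDF URLs will differ, so copy the path from one of your own PDF icons first. A Disallow line matches any URL that starts with the given string, so the normal article pages are left alone.

```text
# Hypothetical example: adjust the path to match your own PDF links.
User-agent: *
Disallow: /index2.php?option=com_content&do_pdf=1

# Google also understands * wildcards, which catches do_pdf=1 in any
# parameter position (not every crawler supports this):
# Disallow: /*do_pdf=1
```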

To apply the third tip, open /components/com_content/content.html.php and find the PDF link:

<a href="<?php echo $link; ?>" target="_blank" onclick="window.open('<?php echo $link; ?>','win2','<?php echo $status; ?>'); return false;" title="<?php echo _CMN_PDF;?>">


Add the rel="nofollow" attribute after the href:

<a href="<?php echo $link; ?>" rel="nofollow" target="_blank" onclick="window.open('<?php echo $link; ?>','win2','<?php echo $status; ?>'); return false;" title="<?php echo _CMN_PDF;?>">
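If you look after several Joomla 1.0 sites, the edit above can be scripted. The PHP below is a hypothetical helper, not something that ships with Joomla. It assumes the PDF link sits on a single line that also contains _CMN_PDF, like the line quoted above, and it puts rel="nofollow" straight after the opening "<a" rather than after the href (attribute order makes no difference in HTML). It keeps a .bak copy of the original file, so compare the two before uploading.

```php
<?php
// Hypothetical helper, not part of Joomla: adds rel="nofollow" to the PDF
// link in a Joomla 1.0 components/com_content/content.html.php file.
// Assumption: the PDF <a> tag sits on a single line that also contains
// the _CMN_PDF title constant, as in the line quoted in the post.

function add_nofollow_to_pdf_links($source)
{
    $lines = explode("\n", $source);
    foreach ($lines as $i => $line) {
        // Only touch lines that open a link, mention the PDF title constant
        // and do not already carry a rel attribute.
        if (strpos($line, '<a ') !== false
            && strpos($line, '_CMN_PDF') !== false
            && strpos($line, 'rel=') === false) {
            // Attribute order does not matter in HTML, so rel goes first.
            $lines[$i] = preg_replace('/<a /', '<a rel="nofollow" ', $line, 1);
        }
    }
    return implode("\n", $lines);
}

// Command-line use: php add_nofollow.php path/to/content.html.php
if (PHP_SAPI === 'cli' && isset($argv[1])) {
    $file = $argv[1];
    $original = file_get_contents($file);
    if ($original === false) {
        fwrite(STDERR, "Cannot read $file\n");
        exit(1);
    }
    // Keep an untouched copy next to the original.
    copy($file, $file . '.bak');
    file_put_contents($file, add_nofollow_to_pdf_links($original));
    echo "Updated $file (backup saved as $file.bak)\n";
}
```

Save it under any name (add_nofollow.php is just an example) and run it from the command line against your content.html.php. Lines that already have a rel attribute, or that are not the PDF link, are left unchanged, so running it twice is harmless.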


Comments (10)
...
written by Anthony, May 02, 2007
Yes, it's funny, isn't it? This was one of the features that attracted me to Joomla when I first saw it... I thought, wow! People can print, email or PDF your text, that's got to be useful. But I have never used it once, and neither have any of my users (as far as I'm aware)...
Joomla! Lead Developer
written by Johan Janssens, May 02, 2007
Actually, for 1.5 we have completely refactored the PDF library. It now supports images and is fully internationalised. I have made a quick change to 1.5 and added nofollow to the PDF links. Thanks for the tip!
Thanks Johan
written by steve, May 03, 2007
Thanks, Johan, that's great news.

It's not a 100% perfect solution... nofollow doesn't work on Ask.com, but fortunately I don't think anyone uses them anymore :)

It should work fine on Google, Yahoo and MSN.

Steve
WOW!
written by sean, May 09, 2007
It all makes sense now. I was checking the "friendly" URLs in Open-SEF on one of my sites and I saw about 400 extra URLs that could hardly be termed friendly. The week before, when I'd first implemented Open-SEF, those URLs weren't there. Does that mean I have hundreds of PDFs lurking somewhere on my site, and, if so, where?
This remains my number one rated blog for SEO, period!
What about the print button ?
written by XTraze, May 17, 2007
I just went through that source file and found this, but I couldn't find the Print button's link there. Can you let me know if you found it?

Thanks for these lines, mate.

Duplicate content from PDF?
written by Gavin, May 30, 2007
I have been thinking about this and I'm not 100% sure it is causing any problems. I just read this article on WebmasterWorld: http://www.webmasterworld.com/forum44/711.htm

I don't publish the PDF option in Joomla myself. If I don't publish the PDF option, is it still necessary to add rel="nofollow"?
...
written by steve, May 30, 2007
Hi Gavin

Thanks for the interesting link.

Even if they are right (I'd disagree, and say it can still cause problems with Google spidering your site), there's still the problem of PDF pages ranking above regular pages.

For example, say you want to find the Joomla.org article about 100,000 forum members. Try searching Google for "joomla.org 100000" and the PDF comes up first. The original article is nowhere to be seen.

This means that visitors won't go to your site - they'll download the PDF. In all likelihood you've lost that visitor.
Rehash :)
written by Keith Schilling, July 12, 2007
I hate to rehash an old topic, but Google has treated my URLs just fine, as I've never had the PDF button published... HOWEVER, when I use Site Explorer on Yahoo, I have PDF hell everywhere. So I'll try this trick and see if Yahoo treats me better.
...
written by rachel, November 22, 2007
Thank you so much for all the info on this site. I just found it through Google and love it! It's definitely going in my bookmarks.
...
written by Kingdom, April 29, 2008
I found I had an extra > which showed on the page when I changed the code!

All blog articles are licensed under a Creative Commons Attribution 3.0 United States License.