Written by Steve Burge
Back in January I mentioned that I'd met a puzzle I couldn't solve ... Google was indexing a lot of search results on Joomla sites.
The problem appeared on all kinds of Joomla sites and with all kinds of URL extensions. I just couldn't work out where the bug was in Joomla. It turns out the bug was in Google.
The Google Bug
Google is trying a new crawling method ... automatically filling in forms such as search boxes in order to try and find new URLs. Matt Cutts discusses it here. Unfortunately they are creating new pages as well as finding them, and in Joomla the main outcome is that random search pages are indexed.
Example of the Problem URLs
- Default Joomla URLs: /index.php?option=com_search&searchword=stuff
- Default SEF URLs: /component/option,com_search/Itemid,38/index.php?searchword=stuff
- sh404SEF: /search/newest-first.html?searchphrase=any&searchword=stuff
Solution
Add the search component to your robots.txt file. With the examples above, you would use this code:
- Default Joomla URLs: Disallow: /*com_search
- Default SEF URLs: Disallow: /*com_search*/
- sh404SEF: Disallow: /search/
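Before relying on a rule, it can help to check it against the problem URLs. The following is a minimal sketch, assuming Google-style pattern matching (a rule is a prefix match on the path plus query string, `*` matches any run of characters, and a trailing `$` anchors the end). The function name `rule_matches` is hypothetical and not part of Joomla or any Google tool. For the non-SEF URL the check uses `/*com_search`, because a pattern ending in `*/` only matches when a slash appears somewhere after `com_search`.

```python
import re


def rule_matches(pattern: str, url_path: str) -> bool:
    """Return True if a robots.txt Disallow pattern matches url_path.

    Assumption: Google-style matching, where the pattern is matched from
    the start of the path, '*' is a wildcard, and a trailing '$' anchors
    the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal chunk, then join the chunks with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only matches at the start of the string, giving prefix semantics.
    return re.match(regex, url_path) is not None


# Problem URLs from this post, paired with the rule meant to block each one.
EXAMPLES = {
    "/index.php?option=com_search&searchword=stuff": "/*com_search",
    "/component/option,com_search/Itemid,38/index.php?searchword=stuff": "/*com_search*/",
    "/search/newest-first.html?searchphrase=any&searchword=stuff": "/search/",
}

for path, rule in EXAMPLES.items():
    print(rule, "blocks", path, "->", rule_matches(rule, path))
```

Keep in mind that in a real robots.txt file the Disallow lines must sit under a `User-agent:` line, and that this sketch only models the matching of a single rule, not Allow/Disallow precedence.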
Comments
Thanks for your post again
Regards - Tina
This is good advice. I will be placing the code in my robots.txt.
Thanks
I see 300,000 problem URLs with basic Joomla URLs:
http://www.google.com/search?hl=en&q=allinurl: index.php?option=com_search
I'm sure there are a lot more with other URL variations.
I am writing on alledia.com for the very first time.
If I am doing anything wrong, please advise me. I am not a techie guy.
Google is crawling our site, but some URLs like the following are still showing up even after we blocked them through the robots.txt file.
When we search for "folding gate ottawa",
see the following URL:
http://www.google.com/search?q=folding gate ottawa&sourceid=navclient-ff&ie=UTF-8&rlz=1B3GGGL_enCA237CA238
Google shows a randomly created Joomla search result. Is Google creating URLs with the Joomla searchbot?
Is it due to the VirtueMart Product Searchbot, a permission setting, or still a robots.txt problem?
Any clue how to overcome this?