robots.txt – Why is Google Showing the Pages I Have Blocked?
Posted • October 17, 2009 • 1 Comment
Video from Google’s Matt Cutts explaining why your blocked pages are showing up in search results.
A robots.txt file restricts access to your site by the search engine bots that crawl the web. While Google won’t crawl or index the content of pages blocked by robots.txt, it may still index their URLs if it finds them linked from other pages on the web.
To use a robots.txt file, you’ll need access to the root (top-level) directory of your site, because crawlers only look for the file at /robots.txt.
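Because crawlers only request robots.txt from the top level of a host, you can work out which file governs any page from its URL. Here is a minimal sketch using Python’s standard urllib.parse. The function name robots_url_for is hypothetical, and www.yourdomain.com is a placeholder:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url.

    Crawlers look for robots.txt only at the root of each host, so the
    path, query and fragment of page_url are discarded.
    """
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); replace everything else.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url_for("http://www.yourdomain.com/wordpress/some-post/?p=1"))
# prints http://www.yourdomain.com/robots.txt
```

Each host has its own file, so www.yourdomain.com and yourdomain.com are governed by separate robots.txt files.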
A robots.txt file can be very useful. It tells search engines which folders to stay out of and where your sitemap.xml file is located. If you have an area of your site you don’t want search engines to crawl, this is what you use. (For example, you may have special landing pages for pay-per-click or other campaigns that contain duplicate content, which search engines might penalize you for.)
Here is an example robots.txt file showing the correct syntax. The first line is a comment, since lines starting with # are ignored by crawlers. It isn’t necessary, but because every site’s file must have the same name, robots.txt, I include it so I know which site’s file I am working on.
# /robots.txt file for http://www.yourdomain.com/
User-agent: *
Disallow: /awstats/
Disallow: /cgi-bin/
Disallow: /Scripts/
Disallow: /formmail/
Disallow: /problem/
Disallow: /webalizer/
Disallow: /wordpress/
Disallow: /contact/
Sitemap: http://www.yourdomain.com/sitemap.xml
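To check what the example file actually blocks, you can feed it to Python’s standard urllib.robotparser module. This is a minimal sketch, not Googlebot. Python’s parser does simple prefix matching and doesn’t implement Google’s wildcard extensions, but that is enough for a file made only of plain Disallow lines. site_maps() needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# The example robots.txt as a string (www.yourdomain.com is a placeholder).
ROBOTS_TXT = """\
# /robots.txt file for http://www.yourdomain.com/
User-agent: *
Disallow: /awstats/
Disallow: /cgi-bin/
Disallow: /Scripts/
Disallow: /formmail/
Disallow: /problem/
Disallow: /webalizer/
Disallow: /wordpress/
Disallow: /contact/
Sitemap: http://www.yourdomain.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # comment lines are skipped

for path in ["/", "/contact/", "/wordpress/hello-world/", "/scripts/app.js"]:
    url = "http://www.yourdomain.com" + path
    print(path, "allowed" if parser.can_fetch("Googlebot", url) else "blocked")
# / allowed
# /contact/ blocked
# /wordpress/hello-world/ blocked
# /scripts/app.js allowed

print(parser.site_maps())  # ['http://www.yourdomain.com/sitemap.xml']
```

Paths are case-sensitive, so Disallow: /Scripts/ does not block /scripts/app.js. Likewise, Disallow: /contact/ blocks everything inside that folder but not a bare /contact URL.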
For more information on the use of robots.txt and other important files, visit the Top Ten Rules for Beginning SEO.
(Post from Search Engine Optimizician.)
Google may show URLs blocked by robots.txt in its search results even though it hasn’t crawled them.
It just won’t show a summary from the page itself. It may, however, show related information from sites like DMOZ.
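The difference between crawling a page and listing its URL can be made concrete with a deliberately simplified, hypothetical model in Python. The names build_listings, SearchListing and fetch_snippet are made up, and real search engines are far more complicated. In this model any URL discovered through links can be listed, but only URLs that robots.txt allows are fetched, so only those get a summary:

```python
from dataclasses import dataclass
from urllib.robotparser import RobotFileParser


@dataclass
class SearchListing:
    url: str
    crawled: bool  # True if the page content was fetched
    snippet: str   # summary text shown under the result ("" if none)


def build_listings(discovered_urls, robots_lines, fetch_snippet):
    """Hypothetical model: every URL found via links elsewhere can be
    listed, but only URLs allowed by robots.txt are fetched."""
    robots = RobotFileParser()
    robots.parse(robots_lines)
    listings = []
    for url in discovered_urls:
        if robots.can_fetch("Googlebot", url):
            listings.append(SearchListing(url, True, fetch_snippet(url)))
        else:
            # Blocked: the URL is known from other pages, but its content
            # was never fetched, so there is nothing to summarize.
            listings.append(SearchListing(url, False, ""))
    return listings


robots = ["User-agent: *", "Disallow: /contact/"]
links = ["http://www.yourdomain.com/", "http://www.yourdomain.com/contact/"]
for item in build_listings(links, robots, lambda u: "Summary of " + u):
    print(item.url, item.crawled, repr(item.snippet))
```

The blocked /contact/ URL still comes out as a listing. It just has no snippet, which is the behavior described above.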
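If you really don’t want a page to show up in search results, add the robots noindex meta tag to the page’s <head>:
<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">
For this to work, the page must not also be blocked in robots.txt. Otherwise Google never crawls the page and never sees the tag. Alternatively, use the URL removal tool in Google Webmaster Tools.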
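To confirm that a page really carries the tag, you can use a minimal sketch built on Python’s standard html.parser. RobotsMetaParser, has_noindex and noindex_is_visible are hypothetical helper names. The second function reflects the robots.txt catch: a crawler can only obey a noindex tag on a page it is allowed to fetch.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() != "robots":
            return
        for directive in (attrs.get("content") or "").split(","):
            self.directives.add(directive.strip().lower())


def has_noindex(html):
    """True if the page carries a robots meta tag containing noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


def noindex_is_visible(robots_lines, url, html, agent="Googlebot"):
    """noindex only takes effect if robots.txt lets the crawler fetch the page."""
    robots = RobotFileParser()
    robots.parse(robots_lines)
    return robots.can_fetch(agent, url) and has_noindex(html)


page = '<html><head><META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW"></head></html>'
url = "http://www.yourdomain.com/contact/"
print(has_noindex(page))  # True
print(noindex_is_visible(["User-agent: *", "Disallow: /cgi-bin/"], url, page))  # True
print(noindex_is_visible(["User-agent: *", "Disallow: /contact/"], url, page))  # False
```

In the last case the tag is on the page, but robots.txt stops the crawler from ever reading it.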
Unlike robots.txt, the robots meta tag will actually remove previously indexed URLs from Google’s results once Google recrawls the page. Google also publishes information on removing URLs.
This answers a couple of questions, such as:
- Why is my URL showing up in Google when I blocked it in robots.txt? Did you fetch that URL?
- How do I make that URL disappear from Google?
For even more information about blocking or removing pages with a robots.txt file, visit Google Webmaster Central.
Late,
Gary Pool
Comments
One Response to “robots.txt – Why is Google Showing the Pages I Have Blocked?”
October 17th, 2009 @ 2:23 pm
[...] This post was mentioned on Twitter by Gary Pool and Alltop, Aggie Greiter. Aggie Greiter said: robots.txt – Why is Google Showing the Pages I Have Blocked? http://bit.ly/GKWPw [...]