

Search Engine Optimizician

Digital Strategist • Gary Pool SEO

robots.txt – Why is Google Showing the Pages I Have Blocked?

Posted • October 17, 2009 • 1 Comment

Video from Google’s Matt Cutts explaining why your blocked pages are showing up in search results.

A robots.txt file restricts access to your site by search engine bots that crawl the web. While Google won’t crawl or index the content of pages blocked by robots.txt, it may still index the URLs if they are found on other pages on the web.

To use a robots.txt file, you’ll need to be able to place files in the root of your domain, because the file has to be reachable at http://www.yourdomain.com/robots.txt.

A robots.txt file can be very useful. It tells the search engines which folders to stay out of and where your sitemap.xml file is located. If there is an area of your site you don’t want the search engines to crawl, this is the file you use (e.g., special landing pages for your pay-per-click or other campaigns whose duplicate content the search engines might penalize you for).

The correct syntax for the robots.txt file is:
# /robots.txt file for http://www.yourdomain.com/
(Although not necessary, I include this comment line because every robots.txt file has the same name, and I like to know which site I am working on.)

User-agent: *
Disallow: /awstats/
Disallow: /cgi-bin/
Disallow: /Scripts/
Disallow: /formmail/
Disallow: /problem/
Disallow: /webalizer/
Disallow: /wordpress/
Disallow: /contact/
Sitemap: http://www.yourdomain.com/sitemap.xml
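
If you want to sanity-check rules like the ones above before uploading them, here is a minimal sketch using Python's standard-library urllib.robotparser. The helper names (build_parser, is_crawlable) and the sample paths are my own assumptions for illustration, and this parser is not Google's crawler, so treat the results as a rough check rather than a guarantee of how any particular search engine will behave.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this post, using the placeholder domain.
ROBOTS_TXT = """\
# /robots.txt file for http://www.yourdomain.com/
User-agent: *
Disallow: /awstats/
Disallow: /cgi-bin/
Disallow: /Scripts/
Disallow: /formmail/
Disallow: /problem/
Disallow: /webalizer/
Disallow: /wordpress/
Disallow: /contact/
Sitemap: http://www.yourdomain.com/sitemap.xml
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text (hypothetical helper, not part of any SEO tool)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawlable(parser: RobotFileParser, url: str, user_agent: str = "*") -> bool:
    """Return True if the parsed rules let user_agent fetch url."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    # Sample paths are made up for illustration.
    for path in ("/", "/contact/", "/cgi-bin/form.pl", "/blog/post.html"):
        url = "http://www.yourdomain.com" + path
        status = "crawlable" if is_crawlable(parser, url) else "blocked"
        print(path, status)
    # site_maps() requires Python 3.8 or later.
    print(parser.site_maps())
```

Paths are matched by prefix and are case-sensitive, so /Scripts/ and /scripts/ are treated as different folders.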

For more information on the use of robots.txt and other important files, visit the Top Ten Rules for Beginning SEO.
(Post from Search Engine Optimizician.)
Google will show URLs blocked by robots.txt even though it hasn’t crawled them.
It just won’t show any summary from the page. It may, however, show related information from sites like DMOZ.

If you really don’t want the page to show up in search results, use the noindex meta tag
<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">
or use the URL removal tool inside Google Webmaster Tools.

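The meta robots tag will actually remove previously indexed URLs, as long as the page is not also blocked in robots.txt, because Google has to crawl the page to see the tag. Google’s information on removing URLs is here.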
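
To confirm that a page really carries the robots meta tag, a small check like the following can help. This is only a sketch built on Python's standard-library html.parser; the class and function names (RobotsMetaParser, robots_directives, has_noindex) are hypothetical, and it only inspects HTML you hand it, so it says nothing about what Google has actually indexed.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        for directive in attr.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


def has_noindex(html: str) -> bool:
    """True if the page asks search engines not to index it."""
    return "noindex" in robots_directives(html)


if __name__ == "__main__":
    page = (
        "<html><head>"
        '<META NAME="ROBOTS" CONTENT="NOINDEX,NOFOLLOW">'
        "</head><body>Landing page</body></html>"
    )
    print(sorted(robots_directives(page)))
    print(has_noindex(page))
```

The check is case-insensitive, so NOINDEX and noindex are treated the same way.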

This answers a couple of questions, such as:
- Why is my URL showing up in Google when I blocked it in robots.txt? Did you fetch that URL?
- How do I make that URL disappear from Google?

For even more information about blocking or removing pages using a robots.txt file visit Google Webmaster Central.

Late,
Gary Pool


