Thursday, February 19, 2015

The robots exclusion protocol (REP), or robots.txt, is a text file webmasters create to instruct robots (typically search engine robots) how to crawl and index pages on their website.

Search engines like Google, Bing, and Yahoo send spiders (also called crawlers) to crawl your website. When these crawlers reach your site, they first read your robots.txt file to check for any exclusion rules before crawling and indexing your pages.

Robots.txt is a plain text file, placed at the root of a website, that a webmaster uses to advise these crawlers which pages on the site they may access.


Pages that are restricted in your robots.txt file won't be crawled or indexed in search results. However, all of those pages remain publicly viewable to human visitors.

If you want to check your blog's robots.txt file, just append /robots.txt to your blog URL (e.g. example.com/robots.txt). A minimal robots.txt looks like this:

User-agent: *
Disallow: /
User-agent: * >> The asterisk means this section applies to all robots.

Disallow: / >> This tells robots that they should not visit any page on the site.
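The effect of these two directives can be verified with Python's standard urllib.robotparser module. This is a small sketch; the Googlebot user agent and the example.com URL are just placeholders:

```python
from urllib import robotparser

# The rules explained above, parsed directly instead of fetched over the network.
rules = [
    "User-agent: *",
    "Disallow: /",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# "Disallow: /" blocks every path for every crawler.
allowed = rp.can_fetch("Googlebot", "https://example.com/about.html")
print(allowed)  # False
```

The same `can_fetch` check works for any user agent string, since the `*` group applies to all robots.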
A Blogger blog's default robots.txt typically looks like this:

User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /

Here Mediapartners-Google (Google's AdSense crawler) is allowed everywhere (an empty Disallow means nothing is blocked), while all other robots are kept out of /search (the search and label result pages) but may crawl everything else.
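You can check how these Blogger-style rules treat different crawlers with the same urllib.robotparser approach. This is a sketch; the empty "Disallow:" line under Mediapartners-Google is an assumption based on Blogger's usual default file, and the URLs are placeholders:

```python
from urllib import robotparser

# Blogger-style rules. The empty "Disallow:" (meaning allow everything) for
# Mediapartners-Google is an assumption based on Blogger's usual default file.
rules = [
    "User-agent: Mediapartners-Google",
    "Disallow:",
    "",
    "User-agent: *",
    "Disallow: /search",
    "Allow: /",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Ordinary crawlers are kept out of /search but may fetch posts.
print(rp.can_fetch("Googlebot", "https://example.com/search/label/SEO"))  # False
print(rp.can_fetch("Googlebot", "https://example.com/2015/02/post.html"))  # True

# The AdSense crawler may fetch everything, including /search.
print(rp.can_fetch("Mediapartners-Google", "https://example.com/search"))  # True
```

Note that the more specific Mediapartners-Google group takes precedence for that crawler, so the `Disallow: /search` rule in the `*` group never applies to it.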

AnsMachine. Copyright 2017. All rights reserved.