What is robots.txt?
If, like me, you don’t know a great deal about the techie side of the internet, then this explanation is for you.
I have read about the robots.txt file but it has never really sunk in as to what it actually means, so I thought the only way I would ever understand it was to do a bit of research and write about it. As you will have seen by now, I love to research stuff.
So this guide will explain what the robots.txt file is in simple terms (because that’s the only way I can understand it) and I will provide a step-by-step guide to creating your own robots.txt file.
So first of all, what exactly is a robots.txt file? Well, as the file extension indicates, it is a text file. That means plain text, not HTML.
What does the robots.txt file do?
It asks search robots (the programs search engines send out to read web pages) not to crawl certain web pages, files or directories, which normally keeps them out of the search results. This is useful if you have certain pages that you want on the internet but don’t want the rest of the world to find through a search.
Bear in mind, however, that this won’t keep a web page completely private. It just means that the page shouldn’t show up in search engine results. Anyone who has a browser and knows the URL, or can find the URL through other means, will still be able to see it.
Also keep in mind that not all robots obey the robots.txt file, and a search engine that ignores it may list your page anyway.
The best way to stop someone from seeing or accessing a page on your website is to password protect it.
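To make the idea of a robot "obeying" the file a bit more concrete, here is a minimal sketch in Python using the standard library's `urllib.robotparser` module. It is only an illustration of how a rule-following robot might read the rules before visiting a page. The `/private/` directory, the robot name `ExampleBot` and the `is_allowed` helper are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Example rules only; /private/ is a made-up directory.
RULES = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(rules: str, robot_name: str, url: str) -> bool:
    """Return True if a rule-following robot may fetch the given URL."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())  # read the rules line by line
    return parser.can_fetch(robot_name, url)


# "ExampleBot" is a hypothetical robot name.
print(is_allowed(RULES, "ExampleBot", "http://www.mydomain.com/private/notes.html"))  # False
print(is_allowed(RULES, "ExampleBot", "http://www.mydomain.com/index.htm"))  # True
```

Notice that nothing here actually stops a robot from fetching the page. The check is voluntary, which is why a password is the safer option for anything truly private.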
How do you create your own robots.txt file?
1. Determine which files or directories you want excluded.
2. Open Notepad (you can access this by clicking Start > Programs > Accessories > Notepad) and type in the text from the left-hand column of the table below, depending on what you want to do. Each instruction goes on its own line:
| Type this into Notepad: | If you want to do this: |
|---|---|
| User-agent: *<br>Disallow: | Allow all robots to retrieve all pages on your website. Note that you can also just add an empty robots.txt file to do this. |
| User-agent: *<br>Disallow: / | Disallow all robots from retrieving all pages on your website. This will pretty much mean that none of your web pages will appear in the search engines. |
| User-agent: Googlebot<br>Disallow: / | Disallow a specific robot from accessing all files on your website. We have used the Googlebot robot here as an example, so substitute with the robot of your choice. This would mean that your website would not be indexed in the Google search engine but would be indexed in any of the other search engines like Yahoo and MSN. |
| User-agent: Googlebot<br>Disallow:<br><br>User-agent: *<br>Disallow: / | Allow one robot but disallow all others. We have used Googlebot as an example, so substitute with the robot of your choice. |
| User-agent: *<br>Disallow: /directory/file.html | Disallow all robots from retrieving a specific file. In this case we have used /directory/file.html as an example, so substitute your own. |
| User-agent: *<br>Disallow: /images/ | Disallow all robots from retrieving a specific directory of your website. In this case we have used /images/ as an example, so substitute your own. |
| User-agent: *<br>Disallow: /images/<br>Disallow: /private/ | Disallow all robots from retrieving more than one directory of your website. In this case we have used /images/ and /private/ as examples, so substitute your own. |
3. Save this file as robots.txt.
4. Now you need to upload this file to the root directory of your website. I use FileZilla to upload files to my website. The file needs to sit in the same folder as your home page (index.htm). In other words, once uploaded, its address should look like this: www.mydomain.com/robots.txt
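If you would like to double-check a set of rules from the table before uploading them, the sketch below runs two of them through Python's standard `urllib.robotparser` module. Treat it as a rough sanity check rather than a guarantee of how any particular search engine behaves. The `can_visit` helper, the robot name `SomeOtherBot` and the file names are assumptions made up for the example.

```python
from urllib.robotparser import RobotFileParser


def can_visit(rules: str, robot_name: str, path: str) -> bool:
    """Return True if the rules let this robot fetch the path."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    # www.mydomain.com is the placeholder domain used in this post.
    return parser.can_fetch(robot_name, "http://www.mydomain.com" + path)


# "Allow one robot but disallow all others" from the table.
GOOGLEBOT_ONLY = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

# "Disallow more than one directory" from the table.
TWO_DIRECTORIES = """\
User-agent: *
Disallow: /images/
Disallow: /private/
"""

print(can_visit(GOOGLEBOT_ONLY, "Googlebot", "/index.htm"))  # True
print(can_visit(GOOGLEBOT_ONLY, "SomeOtherBot", "/index.htm"))  # False
print(can_visit(TWO_DIRECTORIES, "SomeOtherBot", "/images/logo.gif"))  # False
print(can_visit(TWO_DIRECTORIES, "SomeOtherBot", "/index.htm"))  # True
```

The blank line between the two groups in the Googlebot example matters: it tells the parser where one robot's rules end and the next robot's rules begin.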
Let’s use an example here just to be clear how this is done. I have a page on my website which I want to exclude from the search engines. Its address is: www.mydomain.com/mypage.html.
I would now open up Notepad and type in the following text based on the table above:
User-agent: *
Disallow: /mypage.html
If the page was sitting in a directory called ‘mydirectory’, i.e. www.mydomain.com/mydirectory/mypage.html, then I would type in this:
User-agent: *
Disallow: /mydirectory/mypage.html
Once that information is entered into Notepad I save the file on my computer as robots.txt.
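If you are comfortable running a small script, the same file can be produced without Notepad. The sketch below is just one way to do it in Python: it writes a robots.txt that blocks the example page for all robots, then reads the file back to confirm the page is blocked. The helper names `write_robots_txt` and `blocks`, and the robot name `AnyBot`, are made up for illustration, and the file is written to a temporary folder so nothing on your computer is overwritten.

```python
import tempfile
from pathlib import Path
from urllib.robotparser import RobotFileParser


def write_robots_txt(folder, blocked_paths):
    """Write a robots.txt in `folder` that blocks the paths for all robots."""
    lines = ["User-agent: *"]
    lines += [f"Disallow: {path}" for path in blocked_paths]
    target = Path(folder) / "robots.txt"
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target


def blocks(robots_file, path):
    """Return True if the saved file tells robots to stay away from `path`."""
    parser = RobotFileParser()
    parser.parse(robots_file.read_text(encoding="utf-8").splitlines())
    # "AnyBot" is a hypothetical robot; www.mydomain.com is a placeholder.
    return not parser.can_fetch("AnyBot", "http://www.mydomain.com" + path)


# Use a throwaway folder for the demonstration.
with tempfile.TemporaryDirectory() as folder:
    robots_file = write_robots_txt(folder, ["/mydirectory/mypage.html"])
    print(robots_file.read_text(encoding="utf-8"))
    print(blocks(robots_file, "/mydirectory/mypage.html"))  # True
    print(blocks(robots_file, "/index.htm"))  # False
```

To keep the file, pass a real folder on your computer instead of the temporary one.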
I would then open up FileZilla, which is the file transfer program I use to upload my web pages, and transfer the robots.txt file over to the root folder where my website pages are stored.
Too easy!…well, hopefully you found it easy to do. You do need some idea of how to upload files to your website, but if you can do that bit everything else should be quite straightforward.
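Why the root folder? Robots look for the file at one fixed address, /robots.txt, directly under the domain name, no matter which page they are about to visit. The short Python sketch below shows that idea by working out the robots.txt address for any page. The function name `robots_txt_url` is made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the address where a robot would look for robots.txt."""
    parts = urlsplit(page_url)
    # Keep the scheme and domain, swap the path for /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.mydomain.com/mydirectory/mypage.html"))
# http://www.mydomain.com/robots.txt
```

So a robots.txt file uploaded into /mydirectory/ instead of the root folder would simply never be found.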
We are Wanda & Paula - 2 friends who successfully operate a number of online internet businesses. In fact we have over 20 blogs and websites making money. We have learnt a lot over the years particularly in the area of natural search, SEO and traffic generation. If you want to make a living online then we are here to help.








