#1
09-16-2009, 10:18 AM
John K.
First Sergeant
Join Date: Dec 2008
Location: Pattaya, Thailand
Posts: 75
A lot of robots.txt file folders to block question

Hello! I have read numerous threads, yet none answer my curiosity so far.

Is this a good example of the content for a robots.txt file for pages I do not care to have the bots crawl?
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
# Exclude Files From All Robots:

User-agent: *
Disallow: /110_general_managers_statement.html
Disallow: /120_terms_and_conditions.html




# End robots.txt file
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

I have not listed all the pages yet. I have about 27 pages that I do not wish to be crawled, simply because they are not important.

I wish only to make a sitemap.xml file for my main pages and to list every other page, the ones not on my sitemap, in robots.txt.

Is it normal to put every page in the /public_html folder into either sitemap.xml or robots.txt, so that they are all accounted for?

Additionally, I understand that .swf, .jpg, .mp3, and similar files are not crawled by the bots even though they are in the /public_html folder. Is that right?

Thank you kindly,

John
__________________
http://www.ProgressLanguage.com
Beginnings are fraught with many challenges. If it doesn't kill you, it will make you stronger.
#2
09-16-2009, 10:36 AM
John K.
Re: A lot of robots.txt file folders to block question

OK, what the heck! Here is the entire text I wish to post to the /public_html directory so that the bots will be blocked:
_ _ _ _ _

# Exclude Files From All Robots:

User-agent: *
Disallow: /110_general_managers_statement.html
Disallow: /120_terms_and_conditions.html
Disallow: /130_privacy_policy.html
Disallow: /210_our_staff.html
Disallow: /220_customer_testimonials.html
Disallow: /230_photo_gallery.html
Disallow: /300ru_student_visa.html
Disallow: /310_student_visa_process.html
Disallow: /310ru_student_visa_process.html
Disallow: /320_student_visa_policy.html
Disallow: /330_student_visa_referral_reward.html
Disallow: /510_why_is_english_important.html
Disallow: /610_learn_dutch.html
Disallow: /611_inburgerings_examen.html
Disallow: /620_learn_german.html
Disallow: /621_start_deutsch_1a1_test.html
Disallow: /630_other_languages.html
Disallow: /640_why_learn_another_language.html
Disallow: /710_all_document_translation.html
Disallow: /720_computer_skills_training.html
Disallow: /800a_test_preparation_course_prices.html
Disallow: /800b_academic_writing_and_business_course_prices.html
Disallow: /800c_other_language_course_prices.html
Disallow: /800d_thai_and_english_general_conversation_private_course_prices.html
Disallow: /800e_thai_and_english_general_conversation_group_course_prices.html
Disallow: /800f_private_and_group_computer_course_prices.html





# End robots.txt file
- - - - -

Please let me know if I am on the right track. I will try to test this with the free online testers, but I do appreciate the input of the higher ranking personnel in this forum.
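
A list like this can also be checked offline with Python's standard urllib.robotparser module. What follows is only a minimal sketch, not a replacement for the online testers: it assumes the rules are pasted in as a string (shortened here to three of the entries above), and the user agent name "ExampleBot", the helper name blocked_paths, and the page /index.html are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the rules from the post above (assumption: the real
# file would carry the full list of Disallow lines).
ROBOTS_TXT = """\
# Exclude Files From All Robots:

User-agent: *
Disallow: /110_general_managers_statement.html
Disallow: /120_terms_and_conditions.html
Disallow: /130_privacy_policy.html

# End robots.txt file
"""


def blocked_paths(robots_txt, paths, user_agent="ExampleBot"):
    """Return the subset of paths that the given rules disallow."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines.
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


if __name__ == "__main__":
    # /index.html is a hypothetical page that is not in the Disallow list.
    pages = ["/index.html", "/120_terms_and_conditions.html"]
    print(blocked_paths(ROBOTS_TXT, pages))
```

The standard-library parser matches each Disallow value as a plain prefix of the requested path, so a page is reported as blocked only when its path begins with one of the listed entries.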
#3
09-16-2009, 11:37 AM
John K.
Re: A lot of robots.txt file folders to block question

Sorry, I waited too long to edit my prior message. Another question I have about the robots.txt requesting the bots NOT to crawl the listed pages:

If the bots DO crawl even though I requested them not to, will I be penalized by Google if any of those pages are not SEO optimized?

Thank you

John
#4
09-17-2009, 12:11 AM
Vasili
General & Forum Moderator
Join Date: Mar 2006
Posts: 10,939
Re: A lot of robots.txt file folders to block question

Quote:
Originally Posted by John K.
If the bots DO crawl even though I requested them not to, will I be penalized by Google if any of those pages are not SEO optimized?
You will not be penalized (unless your site has distinct violations), but neither will those pages earn any greater valuation: they will get only basic "earnings" ... without being "punished."
__________________
Choice Pro SEO
Choice Pro Webs
#5
09-17-2009, 04:00 AM
John K.
Re: A lot of robots.txt file folders to block question

Thank you, Vasili. I am aware that I put up too much too fast, even though it took the time that it did. What I am most aware of out of all of this is that one click of the "Publish" button takes the person constructing a website into a whole new world of knowledge that is vital to the building process. The difference is between a) building a web site with an eye towards how it will be "read" by the bots, and b) building a web site from an artistic frame of mind, without any regard for or knowledge of how it will be "read" by the bots.

Anyway, at this point, I am going through my pages as quickly as possible, beginning with my sitemap pages and then my robots.txt pages, and making every change possible to receive the bots and get a decent rating. It is the first-time jitters, I guess.

To anyone who knows: I am also interested to know whether it is proper to NOT leave ANY published page unlisted, and to list each one either in the sitemap.xml doc (to be crawled) or in the robots.txt doc (to be blocked from being crawled). This would include even the error, success, and custom 404 pages. Is this right thinking?
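
The bookkeeping behind this question (every published page in exactly one of the two lists) can be sketched with plain Python sets. This is only an illustration of the check being asked about, not a recommendation that every page must be listed; the function name unaccounted_pages and all page names except /130_privacy_policy.html are hypothetical.

```python
def unaccounted_pages(published, in_sitemap, disallowed):
    """Return published pages listed in neither the sitemap nor robots.txt."""
    listed = set(in_sitemap) | set(disallowed)
    return sorted(set(published) - listed)


if __name__ == "__main__":
    # Hypothetical page lists, entered by hand for the sake of the example.
    published = ["/index.html", "/contact.html", "/130_privacy_policy.html", "/404.html"]
    in_sitemap = ["/index.html"]
    disallowed = ["/130_privacy_policy.html"]
    print(unaccounted_pages(published, in_sitemap, disallowed))
```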

Thank you
John
#6
09-17-2009, 10:18 AM
Vasili
Re: A lot of robots.txt file folders to block question

Quote:
Originally Posted by John K.
I am also interested to know if it is proper to NOT leave ANY published page unlisted, and to either list them all on the sitemap.xml doc (to be crawled), or the robots.txt doc (to be blocked from being crawled). This would include even the error, success, and custom 404 pages. Is this right thinking?
There is no reason to include your Email/Contact pages (including the "Success" page) in your 'Do Not Follow' (robots.txt) file, as they are commonly featured pages of normal navigation. Nor is there any reason to include custom 404 pages: a 404 page is a simple "form or error message" page that is 'non-content' and not included in any navigation, and as such is typically ignored.

Your sitemap file is generated, and should include all "content" and 'navigable' pages (pages linked from anywhere on your site) that contribute value; this file is generally not created or modified by hand.
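
As a rough idea of what a generated sitemap file contains, here is a minimal Python sketch using only the standard library. It is not the tool any particular host or sitemap generator uses; the base URL http://www.example.com, the page names, and the function name build_sitemap are assumptions for illustration. The namespace string is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, paths):
    """Return sitemap XML (as a string) with one <url><loc> entry per path."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + path
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical site and pages: only the main, navigable pages are listed.
    print(build_sitemap("http://www.example.com", ["/index.html", "/contact.html"]))
```

Only the required loc element is written here; optional fields such as lastmod are left out to keep the sketch short.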

You seem to be reading too much into the process, and might be getting bogged down in minutiae. Keep things simple and within general parameters and you will be fine!