A lot of robots.txt file folders to block question

  • A lot of robots.txt file folders to block question

    Hello! I have read numerous threads, but none of them has answered my question so far.

    Is this a good example of the content for a robots.txt file for pages I do not care to have the bots crawl?
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    # Exclude Files From All Robots:

    User-agent: *
    Disallow: /110_general_managers_statement.html
    Disallow: /120_terms_and_conditions.html

    # End robots.txt file
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
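If it helps, rules like these can be sanity-checked locally before uploading, using the parser in Python's standard library (a small sketch; the paths are taken from the example above):

```python
# Sanity-check robots.txt rules locally with Python's standard-library parser.
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /110_general_managers_statement.html
Disallow: /120_terms_and_conditions.html
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# Listed pages are blocked for every crawler; everything else stays allowed.
print(rp.can_fetch("*", "/110_general_managers_statement.html"))  # False
print(rp.can_fetch("*", "/index.html"))                           # True
```

The same check works for the full 27-page list; just paste the whole file into the string.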

    I have not listed all of the pages yet. There are about 27 pages that I do not want crawled, simply because they are not important.

    I want to make a sitemap.xml file for my main pages only, and then list every page that is not on the sitemap in robots.txt.

    Is it normal to put every page in the /public_html folder into either the sitemap.xml or the robots.txt, so they are all accounted for?

    Additionally, I understand that .swf, .jpg, .mp3, and other similar files are not crawled by the bots even though they are in the /public_html folder. Is that correct?

    Thank you kindly,

    John

  • #2
    Re: A lot of robots.txt file folders to block question

    OK, what the heck! Here is the entire text I wish to post to the /public_html directory so that the bots will be blocked:
    _ _ _ _ _

    # Exclude Files From All Robots:

    User-agent: *
    Disallow: /110_general_managers_statement.html
    Disallow: /120_terms_and_conditions.html
    Disallow: /130_privacy_policy.html
    Disallow: /210_our_staff.html
    Disallow: /220_customer_testimonials.html
    Disallow: /230_photo_gallery.html
    Disallow: /300ru_student_visa.html
    Disallow: /310_student_visa_process.html
    Disallow: /310ru_student_visa_process.html
    Disallow: /320_student_visa_policy.html
    Disallow: /330_student_visa_referral_reward.html
    Disallow: /510_why_is_english_important.html
    Disallow: /610_learn_dutch.html
    Disallow: /611_inburgerings_examen.html
    Disallow: /620_learn_german.html
    Disallow: /621_start_deutsch_1a1_test.html
    Disallow: /630_other_languages.html
    Disallow: /640_why_learn_another_language.html
    Disallow: /710_all_document_translation.html
    Disallow: /720_computer_skills_training.html
    Disallow: /800a_test_preparation_course_prices.html
    Disallow: /800b_academic_writing_and_business_course_prices.html
    Disallow: /800c_other_language_course_prices.html
    Disallow: /800d_thai_and_english_general_conversation_private_course_prices.html
    Disallow: /800e_thai_and_english_general_conversation_group_course_prices.html
    Disallow: /800f_private_and_group_computer_course_prices.html

    # End robots.txt file
    - - - - -
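One thing worth knowing: under the original robots exclusion standard, each Disallow value is matched as a path prefix, so groups of files sharing a common prefix can be blocked with a single line. A sketch that would cover all of the /800 price pages at once (only safe if no page you DO want crawled starts with /800):

```
User-agent: *
Disallow: /800
```

The explicit per-file list above works just as well; the prefix form is simply shorter to maintain.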

    Please let me know if I am on the right track. I will try to test this with the free online testers, but I would also appreciate input from the more experienced members of this forum.



    • #3
      Re: A lot of robots.txt file folders to block question

      Sorry, I waited too long to edit my previous message. Another question I have about the robots.txt requesting that the bots NOT crawl the listed pages:

      If the bots DO crawl even though I requested them not to, will I be penalized by Google if any of those pages are not SEO optimized?

      Thank you

      John



      • #4
        Re: A lot of robots.txt file folders to block question

        Originally posted by John K.:
        If the bots DO crawl even though I requested them not to, will I be penalized by Google if any of those pages are not SEO optimized?
        You will not be penalized (unless you have distinct violations on your site), but neither will you earn greater valuations: you'll only get basic "earnings" ... without being "punished."
        . VodaWebs....Luxury Group
        * Success Is Potential Realized *



        • #5
          Re: A lot of robots.txt file folders to block question

          Thank you, Vasilli. I am aware that I put up too much too fast, even though it took the time that it did. What I am most aware of now is that one click of the "Publish" button takes the person building a website into a whole new world of knowledge that is vital to the building process: the difference between a) building a website with an eye toward how it will be "read" by the bots, and b) building it from a purely artistic frame of mind, without any regard for or knowledge of how the bots will "read" it.

          Anyway, at this point I am going through my pages as quickly as possible, beginning with my sitemap pages and then my robots.txt pages, and making every change I can so the bots are received properly and the site gets a decent rating. First-time jitters, I guess.

          To anyone who knows: is it proper NOT to leave ANY published page unlisted, and to list every page either in the sitemap.xml doc (to be crawled) or in the robots.txt doc (to be blocked from being crawled)? This would include even the error, success, and custom 404 pages. Is this right thinking?

          Thank you
          John



          • #6
            Re: A lot of robots.txt file folders to block question

            Originally posted by John K.:
            I am also interested to know if it is proper to NOT leave ANY published page unlisted, and to either list them all on the sitemap.xml doc (to be crawled), or the robots.txt doc (to be blocked from being crawled). This would include even the error, success, and custom 404 pages. Is this right thinking?
            There is no reason to include your Email/Contact pages (including the "Success" page) in your 'Do Not Follow' file, as they are commonly featured pages of normal navigation/construction. Neither would there be any reason to include a custom 404 page ... it is a simple "form or error message" page that is 'non-content' and not included in any navigation, and as such is typically ignored.

            Your sitemap file is generated automatically, and should include all "content" and 'navigable' pages (pages included in any hyperlinks on your site) that contribute value: this file is generally not created or modified by hand.
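For reference, a minimal sitemap.xml entry follows the sitemaps.org protocol; a sketch with a placeholder domain and filename:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/index.html</loc>
  </url>
</urlset>
```

Most site builders and generators emit one `<url>` block per page automatically, which is why hand-editing is rarely needed.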

            You seem to be thinking too much into the process, and might be getting bogged down on minutia. Keep things simple and within general parameters and you will be fine!
            . VodaWebs....Luxury Group
            * Success Is Potential Realized *

