
Thread: Keeping HTTPS pages out of google index
      
   

  1. #1
    elvisyorkie is offline Private First Class
    Join Date
    Dec 2006
    Posts
    5

    Default Keeping HTTPS pages out of google index

    Hi,

    What is the best way to keep search engines from indexing https pages? When they index both the http and https versions of a page, you get thrown into the supplemental results because of duplicate content.

    I was reading that you should have a robots.txt file for both http and https, but I don't know where to put the file for the https pages.

    Can anyone help?

    Thanks

    Larry

  2. #2
    Join Date
    Feb 2006
    Location
    Earth
    Posts
    8,721

    Default Re: Keeping HTTPS pages out of google index

    You can place tags similar to these in the head of your page.
    Do a Google search on "meta tags noindex" or similar.

    Do not index, but follow links

    <META name="ROBOTS" content="NOINDEX">
    Use this for pages with many links on them, but not much useful data. Because "follow" is the default, you don't have to include it.

    Index, but do not follow links

    <META name="ROBOTS" content="NOFOLLOW">
    Use this for pages which have useful content but links which may be irrelevant or obsolete.

    Do not index or follow links

    <META name="ROBOTS" content="NOINDEX,NOFOLLOW">
    This is for pages which should not be indexed at all. If you put that in every page, the site should not be indexed.

    Index and follow links

    <META name="ROBOTS" content="INDEX,FOLLOW">
    This is the default behavior: you don't have to include this tag.

    Note: if you add Robots META tags to a framed site, be sure to include them on both the FRAMESET and the FRAME pages.
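
As a rough illustration of how the four combinations above resolve, here is a minimal Python sketch. The function name `parse_robots_meta` is made up for this example, and it only models the directives mentioned in this post; real crawlers recognize more values and may treat edge cases differently.

```python
def parse_robots_meta(content: str = "") -> dict:
    """Interpret the content attribute of a robots META tag.

    Anything not explicitly switched off falls back to the default,
    which is to index the page and follow its links.
    """
    # Directives are comma-separated and case-insensitive.
    directives = {part.strip().lower() for part in content.split(",") if part.strip()}
    return {
        "index": "noindex" not in directives,
        "follow": "nofollow" not in directives,
    }


if __name__ == "__main__":
    for value in ("NOINDEX", "NOFOLLOW", "NOINDEX,NOFOLLOW", "INDEX,FOLLOW", ""):
        print(repr(value), "->", parse_robots_meta(value))
```

An empty content value gives the same result as INDEX,FOLLOW, which is why that tag is optional.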

  3. #3
    elvisyorkie is offline Private First Class
    Join Date
    Dec 2006
    Posts
    5

    Default Re: Keeping HTTPS pages out of google index

    Chris,

    This will not work on a site that has an SSL cert. You want the search engines to index all your http URLs and not index your https URLs. When they index both, you get duplicate content and end up in the supplemental results.

    Larry

  4. #4
    elvisyorkie is offline Private First Class
    Join Date
    Dec 2006
    Posts
    5

    Default Re: Keeping HTTPS pages out of google index

    To remove only the https version of indexed pages from search engines, place the following in a robots.txt file in the folder that serves the secured pages of your site:
    User-agent: *
    Disallow: /

    The question is: where is the folder that serves the secured pages of my site?

    thanks

    Larry

  5. #5
    Karen Mac is offline General
    Join Date
    Apr 2006
    Location
    X marks the spot
    Posts
    8,353

    Default Re: Keeping HTTPS pages out of google index

    Larry

    You don't need it in a folder. Just add this to your robots.txt file:
    User-agent: * (the * refers to all bots; you can name them specifically if you want)
    Disallow: https:/

    That would disallow all bots from your https pages so you don't inadvertently get smacked for duplicate content, because bots haven't quite figured out everything with these new algorithms yet. :)

    This goes in the ROOT folder for the site in question. If it's your MAIN domain, then it goes in the public_html folder along with the other files you see listed there. If it's an addon, then put it inside the addon folder, i.e. elvisyorkshireterrior

    Karen

    VodaHost

    Your Website People!
    1-302-283-3777 North America / International
    07031847328 / United Kingdom

    ------------------------

    Top 3 Best Sellers

    Web Hosting - Unlimited disk space & bandwidth.

    Reseller Hosting - Start your own web hosting business.

    Search Engine & Directory Submission - 300 directories + (Google,Yahoo,Bing)



  6. #6
    Karen Mac is offline General
    Join Date
    Apr 2006
    Location
    X marks the spot
    Posts
    8,353

    Default Re: Keeping HTTPS pages out of google index

    To clarify, when I said it didn't need a folder, I meant its own folder or an https folder. It goes in either the public_html folder or the domain folder in the root.

    Karen




  7. #7
    elvisyorkie is offline Private First Class
    Join Date
    Dec 2006
    Posts
    5

    Default Re: Keeping HTTPS pages out of google index

    Karen,

    I did what you said, but what concerns me is what Google says about this issue:

    Each port must have its own robots.txt file. In particular, if you serve content via both http and https, you'll need a separate robots.txt file for each of these protocols. For example, to allow Googlebot to index all http pages but no https pages, you'd use the robots.txt files below.
    For your http protocol (http://yourserver.com/robots.txt):
    User-agent: *
    Allow: /
    For the https protocol (https://yourserver.com/robots.txt):
    User-agent: *
    Disallow: /
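
To see what the two files quoted above mean to a crawler, here is a small Python sketch using the standard library's `urllib.robotparser`. It is only an illustration: `yourserver.com` is the placeholder host from the quote, the helper names `build_parser` and `may_crawl` are made up for this example, and Googlebot's own parser may differ in details from Python's.

```python
from urllib.robotparser import RobotFileParser

# The two files from the quoted guidance, one per protocol.
HTTP_ROBOTS = "User-agent: *\nAllow: /\n"
HTTPS_ROBOTS = "User-agent: *\nDisallow: /\n"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# A crawler consults the robots.txt served under the same scheme as the page.
PARSERS = {"http": build_parser(HTTP_ROBOTS), "https": build_parser(HTTPS_ROBOTS)}


def may_crawl(url: str, user_agent: str = "Googlebot") -> bool:
    scheme = url.split("://", 1)[0]
    return PARSERS[scheme].can_fetch(user_agent, url)


if __name__ == "__main__":
    print(may_crawl("http://yourserver.com/page.html"))   # True
    print(may_crawl("https://yourserver.com/page.html"))  # False
```

The same path is allowed over http and blocked over https, which is the behavior the quoted guidance is after.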

    My other concern is that I use relative link URLs instead of absolute URLs, which from what I read may contribute to the problem; but if you use the two robots.txt files, you can solve the problem.

    I contacted customer service and had to wait over 8 hours for a response. They finally replied, and this was their response:

    "You can not really have the https version of a page not list and the http version of a page listed. At the end of the day, they are the exact same page."

    Based upon this response one really wonders how they even turn their computers on.

    Karen, if you know how I can do what Google says has to be done, it would be greatly appreciated. Or if you know someone I can call to resolve this problem, that would also be great.

    Thanks

    Larry

  8. #8
    Bethers is offline Major General & Forum Moderator
    Join Date
    Feb 2006
    Posts
    5,232

    Default Re: Keeping HTTPS pages out of google index

    Larry,
    I got your phone message - but I'm on vacation.

    However, you shouldn't have ANY pages https EXCEPT your checkout pages - and if this was the case, it wouldn't be a problem.

    So - what pages are both? NONE should be both. Checkout should be https - the rest should be http.

    Now, if you for some stupid reason have made some pages accessible both ways - then a robots.txt Disallow for the https version will work.

    Again - YOU SHOULD NOT HAVE a problem if the only HTTPS pages are the ones that SHOULD BE.

    And, YES I'M YELLING.

  9. #9
    Karen Mac is offline General
    Join Date
    Apr 2006
    Location
    X marks the spot
    Posts
    8,353

    Default Re: Keeping HTTPS pages out of google index

    Larry

    Define PORT, or what you think they mean by a PORT. Each website should have its own robots.txt file. You could install one in your admin area, but I don't know that it would really serve any purpose other than to back up the root one.

    I would define port as each website. Now, your https website is only virtual, because you have a dedicated IP and encryption, but it's the same SITE as your http one; the only difference is when the encryption is called for. So one robots.txt should cover this. Now, there are only two ways for Google to get this information. Either your software isn't reverting back to http when a product is followed into the cart and then "continue shopping" is hit, so the user remains in https mode (Google would also follow this path), or, when you created your sitemap, you also allowed those URLs and didn't omit them.

    Now, if you create another domain or subdomain, then you would need another robots.txt file for this ROOT OR PORT. You don't technically have an HTTPS ROOT to install a robots.txt file on, so I think you are complicating what Google's intent is. You can also set your preferences for this domain in Google's webmaster tools.
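
For what it's worth, a single root folder does not rule out two different robots.txt answers, because the server can look at the protocol of the request before deciding what to send back. The sketch below shows the idea as a tiny Python WSGI app. It is an assumption-laden illustration, not how the hosting in this thread is set up: the names `robots_app`, `fetch_robots`, and `ROBOTS_BY_SCHEME` are invented here, and a real site would normally do the equivalent in its web server configuration.

```python
# Hypothetical mapping: which robots.txt body to send for which protocol.
ROBOTS_BY_SCHEME = {
    "http": "User-agent: *\nAllow: /\n",
    "https": "User-agent: *\nDisallow: /\n",
}


def robots_app(environ, start_response):
    """Minimal WSGI app: answer /robots.txt according to the request scheme."""
    if environ.get("PATH_INFO") != "/robots.txt":
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found\n"]
    # WSGI servers report "http" or "https" under the wsgi.url_scheme key.
    scheme = environ.get("wsgi.url_scheme", "http")
    body = ROBOTS_BY_SCHEME.get(scheme, ROBOTS_BY_SCHEME["http"]).encode("utf-8")
    headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
    start_response("200 OK", headers)
    return [body]


def fetch_robots(scheme):
    """Call the app directly with a hand-built environ, for illustration only."""
    statuses = []
    environ = {"PATH_INFO": "/robots.txt", "wsgi.url_scheme": scheme}
    chunks = robots_app(environ, lambda status, headers: statuses.append(status))
    return statuses[0], b"".join(chunks).decode("utf-8")


if __name__ == "__main__":
    for scheme in ("http", "https"):
        print(scheme, fetch_robots(scheme))
```

Both requests hit the same application and the same path; only the scheme decides which rules come back.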

    Karen




  10. #10
    Karen Mac is offline General
    Join Date
    Apr 2006
    Location
    X marks the spot
    Posts
    8,353

    Default Re: Keeping HTTPS pages out of google index

    OK, I just went and read up on this headache, and what you said, Larry, was true. However, they don't tell you HOW to do this, and when I looked at yours I think I put the slash in the wrong place.

    HTTP normally uses port 80 and HTTPS uses port 443. HTTPS is Hypertext Transfer Protocol over Secure Sockets Layer. (Look all that up; that will keep you busy for 2 or 3 hours.)
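
To make the port point concrete: a crawler looks for robots.txt at the root of the same protocol, host, and port as the page it wants. The Python sketch below works that out for a URL. The function name `robots_location` and the example paths are made up for illustration; `yourserver.com` is the placeholder host used earlier in the thread.

```python
from typing import Tuple
from urllib.parse import urlsplit

# Default ports used when a URL does not spell one out.
DEFAULT_PORTS = {"http": 80, "https": 443}


def robots_location(page_url: str) -> Tuple[str, int]:
    """Return the robots.txt URL that governs page_url, and the port it is served on."""
    parts = urlsplit(page_url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return f"{parts.scheme}://{parts.netloc}/robots.txt", port


if __name__ == "__main__":
    print(robots_location("http://yourserver.com/shop/item.html"))
    print(robots_location("https://yourserver.com/cart/checkout.html"))
```

The http page and the https page map to two different robots.txt URLs even though the host name is identical.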

    Given that you can't SEE or really access either of the ports, I don't have the foggiest idea how Google would expect you to create 2 robots.txt files.

    So, I would stick with one robots.txt file per root. Your domain isn't changing; only the http or https is the culprit. So it would also stand to reason, at least to my tired brain, that https only kicks in when you are IN the shopping cart preparing to TRANSFER info to the virtual terminal via port 443. Therefore, I would disallow whatever that cart's name is, and generally that is in your INCLUDES or SCRIPTS and housed in your ADMIN folder. Therefore, my most learned and experienced caffeine-deprived brain says...
    Disallow: /ADMIN AREA (whatever your cart may be called, e.g. SOHOADMIN, or in osCommerce: ADMIN)

    And forget about Disallow: /https, which most likely would give a syntax error anyway since it's NOT a folder but a virtual folder, and you could only access this on WMH hosting by port number.

    And by the way, I didn't generate your sitemap, but make sure there are no references in it to the https protocol or to the admin or SECURE area of your store.

    And by the way, Larry, I went into your cart and ran a test order and I never did get the HTTPS protocol, which I should have gotten while putting the card numbers in the cart. When the fake card was declined, some include syntax error came up, so I'd say your cart isn't exactly up to par somewhere. The SSL is installed just fine; I checked that. The problem is in how your cart admin area directs to https.

    God I HATE GOOGLE.. LOL

    OK, I'm going to bed now. I'll get Matt Cutts hate mail tomorrow! :)

    Karen




  11. #11
    Karen Mac is offline General
    Join Date
    Apr 2006
    Location
    X marks the spot
    Posts
    8,353

    Default Re: Keeping HTTPS pages out of google index

    Oh, one more thing: NO PHONE CALLS TILL 11am. I'm sleeping in! IF my phone rings, somebody better be bleeding... or else they will be!

    Karen




  12. #12
    Bethers is offline Major General & Forum Moderator
    Join Date
    Feb 2006
    Posts
    5,232

    Default Re: Keeping HTTPS pages out of google index

    Larry,
    I don't know what you've done - but I went to check out on your site, and I never even hit the secure pages - therefore I would never give you my credit card info.

    It's like you've somehow cloned your site for https instead of installing it where it's needed. I can't tell you how to fix it - but it's definitely wrong - and no robots.txt file is gonna fix this - you need to fix the pages.

    Back to vacation :)
