The Moz Q&A Forum

    Blocking https from being crawled

    • Sean_Dawes

      I have an ecommerce site where the https versions of some pages are being crawled. I'm wondering whether the solution below will fix the issue.

      www.example.com will be my domain

      In the nav there is a login page, www.example.com/login, which redirects to https://www.example.com/login.

      If I just disallow /login in robots.txt, wouldn't crawlers skip the redirect and not index the https pages behind it?

      The redirect part is what I am questioning.
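A rough way to reason about the redirect question, using Python's standard `urllib.robotparser`: a compliant crawler checks robots.txt before it requests a URL, so a disallowed /login is never fetched and its redirect is never seen. The robots.txt contents below are hypothetical, and the sketch only models the http file, since robots.txt applies per scheme and host.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical contents of http://www.example.com/robots.txt (an assumption,
# not taken from the thread).
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without making any network request."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


http_rules = build_parser(ROBOTS_TXT)

# The crawler asks this question *before* requesting the URL, so a disallowed
# URL is never fetched and its redirect target is never discovered this way.
login_allowed = http_rules.can_fetch("*", "http://www.example.com/login")
home_allowed = http_rules.can_fetch("*", "http://www.example.com/")

print("may fetch /login:", login_allowed)
print("may fetch /:", home_allowed)
```

Note that `Disallow: /login` is a prefix rule, so it would also cover paths such as /login-help.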

      • RebekahMay

        You can disallow the https portion in robots.txt, but remember that robots.txt isn't always a surefire way to keep an area of your site from being crawled. If there is other important content to crawl from the secured page, be careful that you are not blocking robots from it.

        If the page is linked from other places on the web and those links aren't nofollow, search engines may still crawl it. Can you make the link in your navigation nofollow as well? I would also add a meta noindex tag to the page itself, and a canonical tag to the https version.
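To check that the nofollow and noindex signals suggested above are actually present in a page, something like the following sketch could work. It uses Python's standard `html.parser`; the class name `PageSignals` and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class PageSignals(HTMLParser):
    """Collects a meta robots noindex flag and the hrefs of nofollow links."""

    def __init__(self):
        super().__init__()
        self.noindex = False
        self.nofollow_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            # content is a comma-separated list such as "noindex, follow"
            content = (attrs.get("content") or "").lower()
            directives = [d.strip() for d in content.split(",")]
            if "noindex" in directives:
                self.noindex = True
        elif tag == "a" and "nofollow" in (attrs.get("rel") or "").lower().split():
            self.nofollow_links.append(attrs.get("href"))


# Hypothetical markup combining both signals in one page for brevity.
SAMPLE_PAGE = """
<html><head>
<meta name="robots" content="noindex">
</head><body>
<a href="/login" rel="nofollow">Log in</a>
<a href="/widgets">Widgets</a>
</body></html>
"""

signals = PageSignals()
signals.feed(SAMPLE_PAGE)
print("noindex:", signals.noindex)
print("nofollow links:", signals.nofollow_links)
```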

        • NakulGoyal

          Are the pages being crawled under https the same pages that are available under http as well? If so, can you just add a canonical tag on these pages pointing to the http version? That should fix it. And if your login page is the entry point, your fix will help as well. But then, as Rebekah said, what if somebody is linking to your https page? I would suggest adding a canonical tag on these pages pointing to http, if that makes sense and is doable.
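As a minimal sketch of the canonical suggestion, assuming the http and https versions share the same host, path, and query string: the helper below (`http_canonical_tag`, a hypothetical name) builds the tag that each https page would carry. The /widgets URL is an invented example.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def http_canonical_tag(url: str) -> str:
    """Return a canonical <link> tag pointing at the http version of url.

    Assumes the http and https pages differ only in scheme. An explicit
    port in the URL, if any, is kept unchanged.
    """
    parts = urlsplit(url)
    canonical = urlunsplit(("http", parts.netloc, parts.path or "/", parts.query, ""))
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}">'


tag = http_canonical_tag("https://www.example.com/widgets?page=2")
print(tag)
```

The tag goes in the `<head>` of the https page, and it only has an effect if crawlers are still allowed to fetch that page.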

            • Sean_Dawes, replying to @RebekahMay

            Yeah, I was going to nofollow the link in the nav and add a meta tag, but I was curious how the robots file would handle this since the URL is a redirect.

            Thanks for your input

              • Sean_Dawes, replying to @NakulGoyal

              Gotcha. Yeah, I commented above that I was going to add a canonical as well as a noindex in the meta, but I was curious how it handled the redirect that was happening.

              thanks for your help 🙂

              • Dr-Pete

                So, the "/login" page gets redirected to https: and then every link on that page goes secure and Google crawls them all? I think blocking the "/login" page is a perfectly good way to go here - cut the crawl path, and you'll cut most of the problem.

                You could request removal of "/login" in Google Webmaster Tools, too. Sometimes, I find that robots.txt isn't great at removing pages that are already indexed. I would definitely add the canonical as well, if it's feasible. Cutting the path may not cut the pages that have already been indexed with https:.

                Sorry, I'd actually reverse that:

                (1) Add the canonicals, and let Google sweep up the duplicates

                (2) A few weeks later, block the "/login" page

                Sounds counter-intuitive, but if you block the crawl path to the https: pages first, then Google won't crawl the canonical tags on those versions. Use canonical to clean up the index, and then block the page to prevent future problems.
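The ordering argument above can be illustrated with a toy crawl over a small link graph. Everything in the graph is hypothetical, and the model is deliberately simplified: real crawlers also revisit URLs they already know and follow links from other sites.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links or redirects to.
LINKS = {
    "http://www.example.com/": ["http://www.example.com/login"],
    "http://www.example.com/login": ["https://www.example.com/login"],  # redirect
    "https://www.example.com/login": [
        "https://www.example.com/",
        "https://www.example.com/widgets",
    ],
    "https://www.example.com/": [],
    "https://www.example.com/widgets": [],
}


def reachable(start, blocked):
    """Breadth-first crawl from start that never fetches a blocked URL."""
    seen, queue = set(), deque([start])
    while queue:
        url = queue.popleft()
        if url in seen or url in blocked:
            continue
        seen.add(url)
        queue.extend(LINKS.get(url, []))
    return seen


def https_pages(urls):
    """Return the https URLs from urls, sorted for stable output."""
    return sorted(u for u in urls if u.startswith("https://"))


start = "http://www.example.com/"
before = https_pages(reachable(start, blocked=set()))
after = https_pages(reachable(start, blocked={"http://www.example.com/login"}))

print("https pages crawled before blocking /login:", before)
print("https pages crawled after blocking /login:", after)
```

In this model, once /login is blocked none of the https pages are fetched, so any canonical tags on them go unread. That is the reason for adding the canonicals first and blocking later.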

                • Sean_Dawes, replying to @Dr-Pete

                  Correct, once /login gets redirected to https://www.example.com/login, all nav links etc. are https.

                  What I ended up doing was blocking /login in robots.txt, and I'm now adding canonicals on the https pages as well as nofollowing the /login link in the nav that redirects.

                  Will see what happens now.
