The Moz Q&A Forum


    Best way to handle indexed pages you don't want indexed

    Technical SEO Issues
    • gavinhoman

      We've had a lot of pages indexed by Google which we didn't want indexed. They relate to an AJAX category filter module that works fine for front-end customers, but under the bonnet Google has been following all of the links.

      I've put a rule in the robots.txt file to stop Google from crawling any dynamic pages (those with a ?) and also any AJAX pages, but the pages are still indexed on Google.
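      For reference, robots.txt rules for this kind of blocking typically look something like the following. This is a hedged sketch: the patterns are assumptions, not the site's actual rules, and the wildcard (*) syntax is a Googlebot extension rather than part of the original robots.txt standard:

```
User-agent: *
# Block any URL containing a query string (dynamic pages)
Disallow: /*?
# Block the AJAX filter URLs (hypothetical parameter name)
Disallow: /*ajax=
```

      Note that robots.txt only stops crawling; it does not remove URLs that Google has already indexed, which is the problem described here.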

      At the moment there are over 5,000 indexed pages which I don't want on there, and I'm worried it's causing issues with my rankings.

      Would a redirect rule work, or could someone offer any advice?

      https://www.google.co.uk/search?q=site:outdoormegastore.co.uk+inurl:default&num=100&hl=en&safe=off&prmd=imvnsl&filter=0&biw=1600&bih=809#hl=en&safe=off&sclient=psy-ab&q=site:outdoormegastore.co.uk+inurl%3Aajax&oq=site:outdoormegastore.co.uk+inurl%3Aajax&gs_l=serp.3...194108.194626.0.194891.4.4.0.0.0.0.100.305.3j1.4.0.les%3B..0.0...1c.1.SDhuslImrLY&pbx=1&bav=on.2,or.r_gc.r_pw.r_qf.&fp=ff301ef4d48490c5&biw=1920&bih=860

      • NakulGoyal

        OMG, that does not look good. I completely understand. The best way, in my opinion, would be to add a noindex meta tag to these pages and let Google crawl them. Once Google re-crawls them and sees the noindex, that should take care of the problem. Be careful, though: make sure the noindex tag does not appear on your real pages, just the AJAX ones.
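        The tag being suggested here is the standard robots meta tag, placed in the <head> of each AJAX page (a minimal example):

```html
<meta name="robots" content="noindex">
```

        Google must be able to crawl the page to see this tag, so the same URLs should not also be blocked in robots.txt.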

        Another option might be the canonical tag, but technically these pages are not duplicates; they just should not exist. Are you verified in Google Webmaster Tools? If so, see if you can get some of these pages excluded via the URL removal tool. In my opinion, though, adding the noindex tag is the best way.

        • gavinhoman @NakulGoyal

          Thanks for the quick reply! I'm desperate to get these removed as soon as possible. I've got Webmaster Tools access, but requesting over 5,000 pages to be removed one by one would take too long. You can't do page removal in bulk, can you?

          I'm going to work on the noindex option.

          • josh-riley @gavinhoman

            If they are already indexed, it's going to take time for Google to recrawl them, read the tag, and drop them out, so patience will be key. It's not a quick thing to undo.

            If the pages all live in one location, you can add a Disallow rule for that folder in robots.txt to prevent the entire folder from being crawled. But again, the damage is already done, so you're going to have to wait for all those pages to fall out.
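            If the unwanted pages did all sit under a single folder, the folder-level rule would look something like this (the folder name here is purely hypothetical):

```
User-agent: *
Disallow: /ajax/
```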

            • GeorgeAndrews

              I'm not sure if you're aware or not, but I think I know why Google is indexing these pages.

              Right now, you are outputting complete URLs into your page's source code as arguments to a JavaScript function call (the initSlider call below). I believe this is because your page, and this function call, is programmatically created.

              Instead of outputting the whole URL to the page, you could output only the individual values that need to be there, then change the signature of the JavaScript function so that it accepts those inputs and builds the URL itself:

              function initSlider(price, low, high, category, subcategory, product, store, ajax, ?) {
                  // build the URL from the individual inputs
                  var URL = 'http://www.outdoormegastore.co.uk/' + category + '/' + subcategory + '/' + product + '.html?_' + store + '&' + ajax;
                  // continue...
              }

              Right now, because that URL is output to the page, Google sees it as a URL it should follow and index. If you build the URL inside a function in an external JavaScript file, I don't think it will be indexed.

              Your developer(s) should know what I'm talking about.

              Hope this helps!

              • Dr-Pete

                Definitely review George's comment, as you need to figure out why they're being crawled. As Andrea said, any solution takes time, I'm sorry to say. Robots.txt is not a good solution for removing pages that are already indexed, especially in bulk; it's better at prevention than cure.

                META NOINDEX can be effective, or you could rel=canonical these pages to the appropriate non-AJAX URL (I'm not sure exactly how the structure is set up). Those are probably the two fastest and most powerful approaches. Google parameter handling (in Webmaster Tools) is another option, but it's a bit unpredictable whether, and how quickly, they honor it.
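                As an illustration of the rel=canonical option: an AJAX URL could point at its clean equivalent with a single tag in its <head>. The URL pattern below is taken from elsewhere in this thread, and this assumes a one-to-one mapping to a clean URL actually exists:

```html
<link rel="canonical" href="http://www.outdoormegastore.co.uk/cycling/cycling-clothing/protective-clothing.html">
```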

                If I recall correctly, you can only do mass removal if everything sits in one folder; there's no way to bulk remove unless all of the pages are structurally under one root URL.

                • gavinhoman

                  Thanks for all of the replies. My best option seems to be the meta noindex rule, but the pages getting indexed are just one long AJAX string with no access to the header area. I hope I've already 'prevented' Google from following the links in future by adding the rules to robots.txt, but I'm now desperate to clean up (cure) the existing ones.

                  My next thought would be to put a rule in .htaccess and redirect anything with ajax in the URL to a 404 page?

                  I'm worried that this may have even worse side effects on rankings, but it's based on this article that Google publishes: https://support.google.com/webmasters/bin/answer.py?hl=en&answer=59819

                  "To remove a page or image, you must do one of the following:

                  • Make sure the content is no longer live on the web. Requests for the page must return an HTTP 404 (Not Found) or 410 (Gone) status code."

                  What would your thoughts be on this?
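                  For what it's worth, the .htaccess idea above could be sketched with mod_rewrite along these lines. This is only an assumption about the URL structure (an ajax=1 query parameter), not tested against the actual site, and the [G] flag returns 410 Gone rather than 404:

```apache
RewriteEngine On
# Match any request whose query string contains ajax=1
RewriteCond %{QUERY_STRING} (^|&)ajax=1(&|$)
# Return 410 Gone for those requests
RewriteRule ^ - [G]
```

                  Bear in mind that a rule like this would affect real visitors' AJAX requests as well as Googlebot.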

                  • NakulGoyal @gavinhoman

                    Gavin, that's a more generic guideline from Google. In this scenario, unless you can actually make those URLs return a 404, it won't work and therefore isn't applicable. Noindex and/or the canonical tag are the choices, and I would try to get those going if possible.

                    • Dr-Pete @gavinhoman

                      The AJAX URLs are used by the site for visitors, though, right? If you 404 them, you may be breaking that functionality, not just impacting Google.

                      Another problem is that if these pages are no longer crawlable and you add a page-level directive (whether it's a 404, 301, canonical, NOINDEX, etc.), Google won't process those new instructions, so the pages could get stuck in the index. If that's the case, it may actually be more effective to block the "ajax=" parameter with parameter handling in Google Webmaster Tools (there's a similar option in Bing).

                      If you know the path is cut and this isn't a recurrent problem, that could be the fastest short-term solution. You do need to monitor it, though, as the pages can re-enter the index later.

                      • gavinhoman

                        Right... We think we've managed to get the noindex code into the dodgy pages. The only way we could think of doing it without breaking the user interface was to put this rule into the PHP.

                        if (!empty($_SERVER['HTTP_X_REQUESTED_WITH']) && strtolower($_SERVER['HTTP_X_REQUESTED_WITH']) == 'xmlhttprequest')
                        {
                            // normal code for genuine AJAX requests
                        }
                        else
                        {
                            echo '<html>';
                            echo '<head>';
                            echo '<meta name="robots" content="noindex">';
                            echo '</head>';
                            echo '<body>';
                            echo '404';
                            echo '</body>';
                            echo '</html>';
                        }

                        It's rendering OK on the front end, if anyone would like to test... I'm just hopeful it will work for Google?

                        http://www.outdoormegastore.co.uk/cycling/cycling-clothing/protective-clothing.html?ajax=1

                        One thing I am not sure about is how Google is going to revisit these pages. I have put various rules into robots.txt, as well as URL parameter handling in Webmaster Tools, to prevent any future pages from being crawled... Would those rules need to be removed?

                        • NakulGoyal @gavinhoman

                          Gavin, since you have added the noindex to the pages, one option is to let Google crawl those pages, see the noindex, and remove them. The other is to keep everything as is and request removal of these parameter pages via your Google Webmaster Console. Option 1: you never know how long it will take. Option 2: this should happen relatively fast. I would therefore suggest keeping everything as is and doing a removal request.

