The Moz Q&A Forum

    How does Google index pagination variables in Ajax snapshots? We're seeing random huge variables.

    Intermediate & Advanced SEO
    • sitestrux
      sitestrux last edited by

      We're using the Google snapshot method to index dynamic Ajax content.  Some of this content is from tables using pagination. The pagination is tracked with a var in the hash, something like:

      #!home/?view_3_page=1

      We're seeing all sorts of calls from Google now with huge numbers for these URL variables that we are not generating with our snapshots.  Like this:

      #!home/?view_3_page=10099089

      These aren't trivial since each snapshot represents a server load, so we'd like these vars to only represent what's returned by the snapshots.

      Is Google generating random numbers going fishing for content?  If so, is this something we can control or minimize?

      • FedeEinhorn
        FedeEinhorn last edited by

        I think you are right. Google is fishing for content. I would find a solution to make those URLs friendly by removing the hash and using URL rewriting and pushState to paginate that content instead.

        Here's a previous question that may help: http://moz.com/community/q/best-way-to-break-down-paginated-content

        • sitestrux
          sitestrux last edited by

          Hi Federico, thanks for the response.

          Unfortunately this is an SEO solution for a third-party JavaScript product, so removing the hash isn't an option.

          I'm still interested in knowing if this is a formal Google practice and if there's some way to control or mitigate this.

          • FedeEinhorn
            FedeEinhorn @sitestrux last edited by

            We also noticed some weird crawls last year with random numbers at the end of the URL. In Google Webmaster Tools, most of those URLs were reported as not found. When we checked where the links came from, Google listed some of our own URLs, but those pages didn't contain any links to the URLs Google was trying to fetch. After 2 or 3 months those crawls stopped. We never found out where Google got those URLs...

            • evolvingSEO
              evolvingSEO last edited by

              Hi There

              I'm an associate here at Moz, and have asked the other associates if they might know the answer, as this one's a little outside of my experience. Please follow up and let us know if you don't hear from anyone.

              Thanks!

              -Dan

              • sitestrux
                sitestrux @evolvingSEO last edited by

                Awesome, thanks for looking into it.  We've gotten nowhere with any kind of answer.

                • randfish
                  randfish @FedeEinhorn last edited by

                  I agree with Federico. I've seen Google go fishing with URL parameters (?param=xyz) and I've seen it with AJAX and hashbangs as well. How far they take this and when they choose to apply it doesn't seem to follow a consistent pattern. You can see some folks on StackExchange discussing this, too: http://webmasters.stackexchange.com/questions/25560/does-the-google-crawler-really-guess-url-patterns-and-index-pages-that-were-neve

                  • Carson-Ward
                    Carson-Ward @FedeEinhorn last edited by

                    Google seems to do this only for parameters that it has decided "changes, re-orders, or narrows content." It may also crawl things that look like URLs in JavaScript, even when they're part of a function, but it doesn't seem like that's what's happening in this case.

                    Depending on the setup of the site, you can either manually configure the variable in WMT (don't do this if the parameter is material), write a clever robots.txt rule (e.g. to block anything after a number of digits after the parameter), or (the best solution) re-work the system to generate URLs that don't rely on parameters.

                    I'm not sure I understand why the server is rendering a page if the URL isn't supposed to exist. Depending on your server config, you may also be able to return a 404 and make a rule for which (valid) pages to render. From there you can just ignore the 404 errors until Google figures it out.
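The 404 rule described above could be sketched roughly as follows. This is a minimal illustration, assuming the snapshot handler can look up how many records the paginated table holds; `snapshot_status`, `total_records`, and `per_page` are hypothetical names and not part of the product discussed in this thread.

```python
import math
import re

# Pagination variable name taken from the URLs quoted in the thread.
PAGE_PARAM = "view_3_page"


def snapshot_status(fragment: str, total_records: int, per_page: int = 25) -> int:
    """Return 200 for a page the snapshots could have linked to, else 404."""
    # Highest page number that actually contains records (at least page 1).
    last_page = max(1, math.ceil(total_records / per_page))

    match = re.search(rf"[?&]{PAGE_PARAM}=([^&]*)", fragment)
    if match is None:
        return 200  # no pagination variable: the default first page

    value = match.group(1)
    # Anything that is not a short run of ASCII digits is not a valid page.
    if not re.fullmatch(r"[0-9]{1,9}", value):
        return 404

    page = int(value)
    return 200 if 1 <= page <= last_page else 404
```

With 60 records and 25 per page, `home/?view_3_page=2` maps to 200 and `home/?view_3_page=10099089` maps to 404, so the expensive render only runs for pages that exist.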

                    I think that's the best I can do without seeing the site.

                    • richardbaxter
                      richardbaxter last edited by

                      100% of my experience in this situation is from using Angular.js with Phantom rendering the snapshot, so I tend to use the meta fragment directive in the page header (because I don't use #!'s). With that said I do think my debugging / test experience might be useful, so I'll splurge it out here just in case.

                      For the record I don't think this is a simple case of Google fabricating URLs - I think it's worth making sure there's not something happening in-between. The real reason tends to come out in testing.

                      Have you looked in your log files at requests specifically containing your ?view_3_page= parameter? I'd get a sample of Googlebot requests and look for that parameter. Every time I've come across this problem so far, it's been about the framework not responding well to the parameter ordering in the URL when combined with the _escaped_fragment_= parameter.
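One way to pull that sample out of an access log is sketched below. It assumes a log format in which the request line and the user-agent string appear on the same line; `googlebot_page_values` is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the pagination variable wherever it appears in the logged request.
PAGE_RE = re.compile(r"view_3_page=(\d+)")


def googlebot_page_values(lines: Iterable[str]) -> Counter:
    """Count each view_3_page value requested by a client identifying as Googlebot."""
    counts: Counter = Counter()
    for line in lines:
        # Plain substring check on the user agent; it does not verify the client.
        if "Googlebot" not in line:
            continue
        match = PAGE_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Passing an open log file (for example `googlebot_page_values(open("access.log"))`, where the path is a placeholder) and printing `.most_common(20)` shows which page numbers are requested most often. The user-agent string can be spoofed, so this only narrows down the requests worth inspecting.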

                      Sometimes, when the request is made by Google with _escaped_fragment_= in the request URI, you have to be certain that you understand the behavior that particular request URL is likely to trigger.

                      So when the initial request is made: yourdomain.com/#!home/?view_3_page=1

                      What does this do: yourdomain.com/?_escaped_fragment_=home/?view_3_page=1

                      Side note: it could be yourdomain.com/?_escaped_fragment_=home/&view_3_page=1, but as Carson said, without looking at how your site behaves in this situation it's difficult to know, so I'll just put the different outcome options in here in case one of them is close.
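To see which snapshot URL a given hashbang URL corresponds to, the mapping can be sketched as below. It uses the `_escaped_fragment_` parameter name from Google's AJAX crawling scheme and escapes only the characters that scheme lists (space and control characters, `#`, `%`, `&`, `+`, and non-ASCII bytes); `to_escaped_fragment_url` is a hypothetical helper, and how a particular framework handles the result is exactly what needs testing.

```python
from urllib.parse import quote, urlsplit

# Printable ASCII characters left unescaped: everything except # % & +
_SAFE = "".join(chr(c) for c in range(0x21, 0x7F) if chr(c) not in "#%&+")


def to_escaped_fragment_url(url: str) -> str:
    """Map a #! URL to the ?_escaped_fragment_= form a crawler would request."""
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        return url  # not a hashbang URL: nothing to map

    base = url.split("#", 1)[0]
    # Append to an existing query string with &, otherwise start one with ?
    separator = "&" if parts.query else "?"
    escaped = quote(parts.fragment[1:], safe=_SAFE)
    return f"{base}{separator}_escaped_fragment_={escaped}"


print(to_escaped_fragment_url("http://yourdomain.com/#!home/?view_3_page=1"))
```

Note that under this escaping a literal `&` inside the fragment becomes `%26`, so a server that splits the query string on `&` before decoding will see a different parameter set than one that decodes first.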

                      So, check your server logs and look at how the snapshot request URI is formed. Then check those pages out in a browser, making sure (obviously) you're responding with the right server header response and that the page code makes sense.

                      What tends to happen (if you've got this far) is that in unusual circumstances (e.g. a chain of parameters with the escaped fragment pre-fetch directive bolted in) you might be serving malformed versions of what you'd hoped would be your perfectly constructed HTML snapshot.

                      If that's the case, I would spend a lot of time evaluating what Google sees and therefore what it attempts to crawl. You might find that if you're serving something a bit strange, Google might be discovering URLs you didn't know you were capable of generating. That should give you enough scope to detect a problem and get a change request assigned to fix it.
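A quick way to evaluate what a snapshot exposes is to list every link in its HTML that carries the pagination variable. The sketch below uses only the Python standard library; `LinkCollector` and `pagination_links` are hypothetical names, and the snapshot HTML is assumed to have been fetched separately.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def pagination_links(snapshot_html: str, param: str = "view_3_page") -> List[str]:
    """Return the hrefs in a snapshot that mention the pagination variable."""
    collector = LinkCollector()
    collector.feed(snapshot_html)
    return [href for href in collector.links if param in href]
```

If the list for a given snapshot contains anything other than the expected "prev" and "next" targets, that snapshot is a likely source of the unexpected URLs.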

                      If not, then I suppose Google really is making these URLs up - but honestly, I spend a lot of time trawling through log files and it's been a long time since I haven't been able to find an explanation from the actual code.

                      As a side note: I'd try to avoid hashbangs in the medium / long term. As soon as they're there, you're committed to a lifetime of supporting them. A much more elegant solution is to use pushState (or $location if you're using Angular) but (obviously) continue to serve the snapshot trigger via the meta fragment directive. I'm sure you're quite tired of being told to get rid of hashbangs, though.

                      Hope that helps?

                      Richard Baxter
                      SEOgadget.com

                      • sitestrux
                        sitestrux last edited by

                        Thanks for the great replies all.  Just to clarify, this is the page we're referencing:

                        http://www.knackhq.com/business-directory-user-demo/?_escaped_fragment_=

                        You can see the one pagination var "next" that points here:

                        http://www.knackhq.com/business-directory-user-demo/?_escaped_fragment_=home/?view_3_page=2

                        As you can see this is pretty simple.  There's only one potential variable (the "prev" and "next" links) for introducing these huge numbers and that's pretty limited.  We tested the Google URLs up and down the app and haven't seen anything that would send it fishing for larger numbers.  But Google keeps hammering us with:

                        GET /business-directory-user-demo/?_escaped_fragment_=home/?view_3_page=1000251

                        For now we're trying to respond to those with 404s and hope they eventually die.

                        Unfortunately we can't avoid hashbangs.

