The Moz Q&A Forum

    What is the best tool to crawl a site with millions of pages?

    Intermediate & Advanced SEO
    • iCrossing_UK

      I want to crawl a site that has so many pages that Xenu and Screaming Frog keep crashing at some point after 200,000 pages.

      What tools will allow me to crawl a site with millions of pages without crashing?

      • YannickVeys

        For what purpose do you want to crawl the site?

        A web crawler isn't really hard to write; you can probably code one in about 100 lines. The question, of course, is what you want out of the crawl.
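
To illustrate how small a basic crawler can be, here is a minimal sketch using only the Python standard library. It is not from the original reply: the names `LinkParser`, `fetch_html`, `crawl` and `max_pages` are hypothetical, and the sketch assumes a breadth-first crawl restricted to the start URL's host. It ignores robots.txt, rate limiting and retries, all of which a crawl of millions of pages would need.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch_html(url):
    """Download a page and return its body as text (hypothetical helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=1000):
    """Breadth-first crawl limited to the host of start_url.

    Returns the URLs fetched successfully, in visit order.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}  # every URL ever queued, to avoid revisiting
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to download
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Passing the download function in as `fetch` keeps the crawl loop easy to test without network access. Note that `seen` and `queue` live in memory, so for millions of URLs they would likely need to move to disk or a database.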

          • iCrossing_UK @YannickVeys

          Only basic stuff: URL, Title, Description, and a few HTML elements.

          I am aware that building a crawler would be fairly easy, but is there one out there that already does it without consuming too many resources?
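
For the fields listed above (URL, title, description), a standard-library sketch of the extraction step could look like the following. The names `PageInfoParser` and `extract_page_info` are hypothetical, and the sketch assumes the description lives in a `<meta name="description">` tag.

```python
from html.parser import HTMLParser


class PageInfoParser(HTMLParser):
    """Captures the <title> text and the meta description of a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attrs = dict(attrs)
            # Attribute values keep their original case, so compare in lowercase.
            if (attrs.get("name") or "").lower() == "description":
                self.description = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_page_info(url, html):
    """Return a dict with the URL, title and meta description of one page."""
    parser = PageInfoParser()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": parser.title.strip(),
        "description": parser.description.strip(),
    }
```

Writing each returned row straight to a CSV file or database as pages are processed, rather than collecting all rows in a list, is one way to keep memory use roughly flat on a very large site.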

          • McCannSEO

            Don't forget to exclude pages that don't contain the information you are looking for: query parameters that just produce duplicate content, system files, and so on. That may help bring the number of pages down.
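
The exclusions described above could be sketched as a small URL filter. The extension and path lists below are hypothetical placeholders, as are the names `normalize_url` and `should_crawl`. Dropping the whole query string is an assumption that only holds if parameters never select distinct content on the site.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical exclusion lists; adjust them to the site being crawled.
SKIPPED_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".gif", ".pdf", ".zip")
SKIPPED_PREFIXES = ("/admin/", "/cgi-bin/")


def normalize_url(url):
    """Drop the query string and fragment so parameter variants collapse to one URL."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path or "/", "", ""))


def should_crawl(url):
    """Return False for asset files and system paths that carry no page content."""
    path = urlsplit(url).path.lower()
    if path.endswith(SKIPPED_EXTENSIONS):
        return False
    return not path.startswith(SKIPPED_PREFIXES)
```

Applying `normalize_url` before checking whether a URL has already been seen means `/shoes?sort=price` and `/shoes?sort=name` count as one page.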


            © 2021 - 2026 SEOMoz, Inc., a Ziff Davis company. All rights reserved. Moz is a registered trademark of SEOMoz, Inc.