The Moz Q&A Forum


    How to extract URLs from a site (without bringing the server down!)

    • neooptic

      Hi everybody.

      One of my clients is migrating to a new ecommerce platform, and we need a list of URLs from the existing site so we can start mapping out the 301 redirects. Usually, I'd use a tool like Xenu or Integrity to crawl the site and output a list.

      However, the database and server setup is so bad that it can't handle the requests from these tools, and they take the site down. This, unsurprisingly, is one of the reasons for the migration.

      Does anybody know of a way to get a full list of URLs without making a bunch of HTTP requests that will kill the site? Any advice would be much appreciated!

      • YannickVeys

        • Scrape Google?

        • Make your own scraper and keep the requests per second really low?

        • Maybe the site has an automated sitemap somewhere?

        • Google Webmaster Tools -> download the "internal links" table
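
If the site does expose an XML sitemap, a single request for that file can yield the URL list without any crawling. A minimal Python sketch, assuming the file follows the sitemaps.org format; the function name, the sample document, and the example.com URLs are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Namespace used by sitemaps.org-style sitemap and sitemap index files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text):
    """Return every <loc> value found in a sitemap (or sitemap index) document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


# Hypothetical sample; in practice this text would come from the site's sitemap file.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://www.example.com/</loc></url>
  <url><loc>http://www.example.com/products/widget</loc></url>
</urlset>"""

print(urls_from_sitemap(sample))
```

For a sitemap index, the returned values are the locations of the child sitemaps, each of which would need to be downloaded and parsed the same way.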

          • neooptic @YannickVeys

          Thanks Yannick, I don't know why I didn't think of using a scraper! Can you recommend any good code (PHP perhaps)?
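
A slow scraper of the kind suggested above might look roughly like the following. This is a sketch in Python rather than PHP, using only the standard library; the function names, the 5-second default delay, and the page cap are all assumptions to tune for the server in question:

```python
import time
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_page(url):
    """Download one page and return its body as text."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, delay_seconds=5.0, max_pages=1000,
          fetch=fetch_page, sleep=time.sleep):
    """Breadth-first crawl of one host, one request at a time with a pause between."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = [start_url]
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.pop(0)
        if fetched:
            sleep(delay_seconds)  # wait before every request after the first
        fetched += 1
        try:
            html = fetch(url)
        except OSError:
            continue  # page failed to load; it stays in the list, links unknown
        parser = LinkParser()
        parser.feed(html)
        for href in parser.hrefs:
            link = urldefrag(urljoin(url, href)).url  # absolute URL, no #fragment
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(seen)


# Example call (hypothetical address, makes real requests):
# for url in crawl("http://www.example.com/", delay_seconds=10):
#     print(url)
```

Because requests are made strictly one after another, the load on the server is capped at one request per `delay_seconds`, at the cost of a long run time on a large site. The sketch ignores robots.txt and does not check content types, so it is a starting point only.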

            • AlanMosley @neooptic

            Why not find the links to the site? You will only need to 301 the URLs that have external links; let the rest 404. I use Bing WMT, as it has the most complete collection IMO. It also exports to CSV.
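
Once a link report has been exported, the CSV can be boiled down to a unique list of linked-to URLs for the redirect map. A small Python sketch; the column name "Target URL" and the sample rows are hypothetical, so check the header row of the actual export before using it:

```python
import csv
import io


def linked_urls(csv_file, url_column="Target URL"):
    """Return the unique, non-empty values of url_column in first-seen order."""
    seen = {}
    for row in csv.DictReader(csv_file):
        url = (row.get(url_column) or "").strip()
        if url:
            seen.setdefault(url, None)  # dict keys keep insertion order
    return list(seen)


# Hypothetical export: one row per inbound link, so target URLs repeat.
sample = io.StringIO(
    "Target URL,Source URL\n"
    "http://www.example.com/old-page,http://blog.test/post\n"
    "http://www.example.com/old-page,http://news.test/story\n"
    "http://www.example.com/sale,http://forum.test/thread\n"
)
print(linked_urls(sample))
```

With a real export, pass an open file instead of the in-memory sample, for example `open("links.csv", newline="")`, where the file name is a placeholder.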

            • Dan-Petrovic

              Copy the site, set it up on a staging server and run http://www.xml-sitemaps.com/ on it?

              • Dr-Pete

                Just a follow-up to my endorsement. It looks like Screaming Frog will let you control the number of pages crawled per second, but to do a full crawl you'll need to get the paid version (the free version only crawls 500 URLs):

                http://www.screamingfrog.co.uk/seo-spider/

                It's a good tool, and nice to have around, IMO.
