The Moz Q&A Forum

    How to extract URLs from a site (without bringing the server down!)

    Technical SEO Issues
    • neooptic
      neooptic last edited by

      Hi everybody.

      One of my clients is migrating to a new ecommerce platform, and we need a list of URLs from the existing site so we can start mapping out the 301 redirects. Usually, I'd use a tool like Xenu or Integrity to crawl the site and output a list.

      However, the database and server setup is so bad that it can't handle the requests from these tools, and crawling takes the site down. This, unsurprisingly, is one of the reasons for the migration.

      Does anybody know of a way to get a full list of URLs without making a flood of HTTP requests that will kill the site? Any advice would be much appreciated!

      • YannickVeys
        YannickVeys last edited by

        • Scrape Google?

        • Make your own scraper and keep the requests per second really low?

        • Maybe the site has an automatically generated sitemap somewhere?

        • Google Webmaster Tools -> download the "internal links" table
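
If the site does expose a sitemap, the whole URL list can come from one or two requests. Here is a minimal Python sketch of that idea, assuming the file follows the standard sitemaps.org format. The function names are made up for illustration, and the sitemap location is something you would need to find yourself (robots.txt is a common place to look).

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text):
    """Return every <loc> value in a sitemap (or sitemap index) document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


def fetch_sitemap(sitemap_url):
    """Download a sitemap with a single HTTP request (hypothetical helper)."""
    with urlopen(sitemap_url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

For a sitemap index, the returned values are the child sitemap URLs, so each child would need one more `fetch_sitemap` call. That is still far fewer requests than a full crawl.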

        • neooptic
          neooptic @YannickVeys last edited by

          Thanks Yannick, I don't know why I didn't think of using a scraper! Can you recommend any good code (PHP perhaps)?
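
A low-rate scraper of the kind suggested above could look roughly like the sketch below. It is in Python rather than PHP and uses only the standard library. It fetches one page at a time, waits between requests, and stays on the starting host. The 5-second delay and the page cap are assumptions to tune for the server in question, and all function and parameter names are illustrative.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download one page and return its HTML as text."""
    with urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, delay_seconds=5.0, max_pages=10000,
          fetch_page=fetch, sleep=time.sleep):
    """Breadth-first crawl of one host, one request at a time."""
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        if fetched:
            sleep(delay_seconds)  # pause before every request after the first
        fetched += 1
        try:
            html = fetch_page(url)
        except OSError:
            continue  # skip pages that fail to load; the URL is still listed
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urldefrag(urljoin(url, href)).url  # absolute, no #fragment
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return sorted(seen)
```

Calling `crawl("http://www.example.com/")` would return a sorted list of the URLs discovered on that host. At one request every 5 seconds a large site takes hours, which is the trade-off for not overloading the server.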

          • AlanMosley
            AlanMosley @neooptic last edited by

            Why not find the links to the site? You will only need to 301 the URLs with external links; let the rest 404. I use Bing WMT, as it has the most complete collection IMO. They also export to a CSV.
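
As a rough sketch of this approach in Python: read the exported CSV of linked pages, then split the site's URL list into pages worth redirecting and pages with no external links. The column name `Target URL` is a placeholder, not the actual header of any particular export, so check the file you download. The function names are also illustrative.

```python
import csv
import io


def linked_urls_from_csv(csv_text, column="Target URL"):
    """Return the set of URLs found in one column of a CSV export.

    The default column name is an assumption; pass the real header name.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column].strip() for row in reader if row.get(column)}


def split_for_redirects(all_urls, linked_urls):
    """Split URLs into (has external links, has none), keeping input order."""
    to_redirect = [url for url in all_urls if url in linked_urls]
    unlinked = [url for url in all_urls if url not in linked_urls]
    return to_redirect, unlinked
```

The first list is the starting point for the 301 map. The second is the set of URLs that, under this advice, could be left to return 404.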

            • Dan-Petrovic
              Dan-Petrovic last edited by

              Copy the site, set it up on a staging server and run http://www.xml-sitemaps.com/ on it?

              • Dr-Pete
                Dr-Pete last edited by

                Just a follow-up to my endorsement. It looks like Screaming Frog will let you control the number of pages crawled per second, but to do a full crawl you'll need to get the paid version (the free version only crawls 500 URLs):

                http://www.screamingfrog.co.uk/seo-spider/

                It's a good tool, and nice to have around, IMO.


                © 2021 - 2026 SEOMoz, Inc., a Ziff Davis company. All rights reserved. Moz is a registered trademark of SEOMoz, Inc.