The Moz Q&A Forum

    Sitemap Help!

    On-Page / Site Optimization
    • wazza1985
      wazza1985 last edited by

      Hi Guys,

      Quick question regarding sitemaps. I am currently working on a huge site that has masses of pages.

      I am looking to create a sitemap. How would you guys do this? I have looked at some tools, but they say they will only handle up to roughly 30,000 pages. The site is so large that it would be impossible to do this by hand... any suggestions?

      Also, how do I find out how many of my site's pages are actually indexed and how many are not?

      Thank You all

      Wayne

      • saibose
        saibose last edited by

        To extract URLs, you can use Xenu's Link Sleuth. Then you must make a hierarchy of sitemaps so that all sitemaps are efficiently crawled by Google.

        • krissy-cca
          krissy-cca last edited by

          Hey,

          I'm assuming you mean XML sitemaps here: You can create a sitemap index file which essentially lists a number of sitemaps in one file (A sitemap of sitemap files if that makes sense). See http://www.google.com/support/webmasters/bin/answer.py?answer=71453

          There are automatic sitemap generators out there. If your site has categories with thousands of pages, I'd split them up and have a sitemap per category.

          DD
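
          As a rough illustration of the sitemap index idea above, here is a minimal Python sketch. The function name, the example.com domain and the per-category file names are all made up for the example, and the namespace is the sitemaps.org 0.9 one rather than anything quoted in this thread.

```python
from xml.etree import ElementTree as ET

# Assumption: the standard sitemaps.org namespace, not taken from the thread.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index document (as a string) listing each child sitemap."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(root, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical per-category sitemap files, one per site category.
index_xml = build_sitemap_index([
    "http://www.example.com/sitemap-shoes.xml",
    "http://www.example.com/sitemap-hats.xml",
])
print(index_xml)
```

          You would then submit only the index file, and each category sitemap it lists is picked up from there.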

          • StalkerB
            StalkerB last edited by

            How big we talking?

            Probably best grabbing something server side if your CMS can't do it. Check out - http://code.google.com/p/sitemap-generators/wiki/SitemapGenerators - I know Google says they've not tested any (and neither have I) but they must have looked at them at some point.

            Secondly you'll need to know how to submit multiple sitemap parts and how to break them up.

            Looking at it, Amazon seem to cap theirs at 50,000 URLs and eBay at 40,000, so I think you should be fine with numbers around there.

            Here's how to set up multiple sitemaps in the same directory - http://googlewebmastercentral.blogspot.com/2006/10/multiple-sitemaps-in-same-directory.html

            Once you've submitted your sitemaps Webmaster Tools will tell you how many URLs you've submitted vs. how many they've indexed.

            • wazza1985
              wazza1985 @krissy-cca last edited by

              Thanks for your help

              Do you feel it is important to have HTML and video sitemaps as well? What difference does this make?

              • wazza1985
                wazza1985 @StalkerB last edited by

                Is there any way I can see pages that have not been indexed?

                Is it more beneficial to include various sitemaps or just the one?

                Thanks for your help!!

                • StalkerB
                  StalkerB @wazza1985 last edited by

                  Is there any way I can see pages that have not been indexed?

                  Not that I can tell, and using site: isn't going to be feasible on a large site, I guess.

                  Is it more beneficial to include various sitemaps or just the one?

                  Well, the maximum size for a single sitemap file is 50,000 URLs or 10MB uncompressed (you can gzip them), so if you have more than 50,000 URLs you'll have to use several.
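
                  To make the splitting concrete, here is a small Python sketch that cuts a URL list into files of at most 50,000 entries and gzips each one. The helper names, the file naming scheme and the example.com URLs are invented for illustration, and it only enforces the URL count, not the byte limit.

```python
import gzip
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # per-file URL cap mentioned in the thread


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield successive lists of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def render_sitemap(urls):
    """Render one <urlset> document containing only <loc> entries."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    lines += [f"  <url><loc>{escape(u)}</loc></url>" for u in urls]
    lines.append("</urlset>")
    return "\n".join(lines) + "\n"


def gzip_sitemaps(urls, size=MAX_URLS_PER_SITEMAP):
    """Return {filename: gzipped bytes}, one entry per chunk of URLs."""
    files = {}
    for n, chunk in enumerate(chunk_urls(urls, size), start=1):
        xml = render_sitemap(chunk).encode("utf-8")
        files[f"sitemap-{n}.xml.gz"] = gzip.compress(xml)
    return files


# 120,001 made-up URLs need three files: 50,000 + 50,000 + 20,001.
urls = [f"http://www.example.com/page-{i}" for i in range(120_001)]
files = gzip_sitemaps(urls)
print(sorted(files))
```

                  In practice you would write each entry of `files` to disk and list the resulting files in a sitemap index.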

                  • StalkerB
                    StalkerB @wazza1985 last edited by

                    HTML sitemaps are good for users; having 100,000 links on a page, though, not so much.

                    If you can do video and image sitemaps (and certainly with a site this large), you'll help Google get around your site.

                      • Lauroca
                        Lauroca last edited by

                        Hi! With this file, you can create a Google-friendly sitemap for any given folder almost automatically. No limits on the number of files. Please note that the code is courtesy of @frkandris, who generously helped me out when I had a similar problem. I hope it will be as helpful to you as it was to me 😄

                        • Copy / paste the code below into a text editor.
                        • Edit the beginning of the file: where you see seomoz.com, put your own domain name there
                        • Save the file as getsitemap.php and ftp it to the appropriate folder.
                        • Write the full URL in your browser: http://www.yourdomain.com/getsitemap.php
                        • The moment you do it, a sitemap.xml will be generated in your folder
                        • Refresh your ftp client and download the sitemap. Make further changes to it if you wish.

                        ===  CODE STARTS HERE ===

                        <?php
                        // Builds sitemap.xml from every file and folder found under DIRBASE.
                        define('DIRBASE', './');
                        define('URLBASE', 'http://www.seomoz.com/');

                        $isoLastModifiedSite = "";
                        $newLine = "\n";
                        $indent = " ";
                        if (empty($rootUrl)) $rootUrl = "http://www.seomoz.com";

                        $xmlHeader = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>$newLine";

                        $urlsetOpen = "<urlset xmlns=\"http://www.google.com/schemas/sitemap/0.84\"$newLine"
                            ."xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\"$newLine"
                            ."xsi:schemaLocation=\"http://www.google.com/schemas/sitemap/0.84 http://www.google.com/schemas/sitemap/0.84/sitemap.xsd\">$newLine";
                        $urlsetValue = "";
                        $urlsetClose = "</urlset>$newLine";

                        // Escape characters that are not allowed raw inside XML.
                        function makeUrlString ($urlString) {
                            return htmlentities($urlString, ENT_QUOTES, 'UTF-8');
                        }

                        // Turn "Y-m-d" or "Y-m-d H:i:s" into an ISO 8601 date or timestamp.
                        function makeIso8601TimeStamp ($dateTime) {
                            if (!$dateTime) {
                                $dateTime = date('Y-m-d H:i:s');
                            }
                            if (is_numeric(substr($dateTime, 11, 1))) {
                                $isoTS = substr($dateTime, 0, 10) ."T"
                                         .substr($dateTime, 11, 8) ."+00:00";
                            }
                            else {
                                $isoTS = substr($dateTime, 0, 10);
                            }
                            return $isoTS;
                        }

                        // Build one <url> element for the sitemap.
                        function makeUrlTag ($url, $modifiedDateTime, $changeFrequency, $priority) {
                            GLOBAL $newLine;
                            GLOBAL $indent;
                            GLOBAL $isoLastModifiedSite;
                            $urlOpen = "$indent<url>$newLine";
                            $urlValue = "";
                            $urlClose = "$indent</url>$newLine";
                            $locOpen = "$indent$indent<loc>";
                            $locValue = "";
                            $locClose = "</loc>$newLine";
                            $lastmodOpen = "$indent$indent<lastmod>";
                            $lastmodValue = "";
                            $lastmodClose = "</lastmod>$newLine";
                            $changefreqOpen = "$indent$indent<changefreq>";
                            $changefreqValue = "";
                            $changefreqClose = "</changefreq>$newLine";
                            $priorityOpen = "$indent$indent<priority>";
                            $priorityValue = "";
                            $priorityClose = "</priority>$newLine";

                            $urlTag = $urlOpen;
                            $urlValue = $locOpen .makeUrlString("$url") .$locClose;
                            if ($modifiedDateTime) {
                                $urlValue .= $lastmodOpen .makeIso8601TimeStamp($modifiedDateTime) .$lastmodClose;
                                if (!$isoLastModifiedSite) { // last modification of web site
                                    $isoLastModifiedSite = makeIso8601TimeStamp($modifiedDateTime);
                                }
                            }
                            if ($changeFrequency) {
                                $urlValue .= $changefreqOpen .$changeFrequency .$changefreqClose;
                            }
                            if ($priority) {
                                $urlValue .= $priorityOpen .$priority .$priorityClose;
                            }
                            $urlTag .= $urlValue;
                            $urlTag .= $urlClose;
                            return $urlTag;
                        }

                        // Recursively list every directory and file below $base.
                        function rscandir($base='', &$data=array()) {
                            $array = array_diff(scandir($base), array('.', '..')); /* remove '.' and '..' from the array */
                            foreach($array as $value) : /* loop through the array at the level of the supplied $base */
                                if (is_dir($base.$value)) : /* if this is a directory */
                                    $data[] = $base.$value.'/'; /* add it to the $data array */
                                    $data = rscandir($base.$value.'/', $data); /* then make a recursive call with the
                                    current $value as the $base supplying the $data array to carry into the recursion */
                                elseif (is_file($base.$value)) : /* else if the current $value is a file */
                                    $data[] = $base.$value; /* just add the current $value to the $data array */
                                endif;
                            endforeach;
                            return $data; // return the $data array
                        }

                        // Swap the local DIRBASE prefix for the public URLBASE.
                        function kill_base($t) {
                            return(URLBASE.substr($t, strlen(DIRBASE)));
                        }

                        $dir = rscandir(DIRBASE);
                        $a = array_map("kill_base", $dir);

                        foreach ($a as $key => $pageUrl) {
                            $pageLastModified = date ("Y-m-d", filemtime($dir[$key]));
                            $pageChangeFrequency = "monthly";
                            $pagePriority = 0.8;
                            $urlsetValue .= makeUrlTag ($pageUrl, $pageLastModified, $pageChangeFrequency, $pagePriority);
                        }

                        $current = "$xmlHeader$urlsetOpen$urlsetValue$urlsetClose";
                        file_put_contents('sitemap.xml', $current);
                        ?>

                        === CODE ENDS HERE ===

                        • linztm
                          linztm last edited by

                          The problem that I have with CMS-side sitemap generators is that they often pull content from existing pages and add entries based on that information. If those pages link to pages that are no longer there, as is often the case with dynamic content, then you'll be imposing 404s on yourself like crazy.

                          Just something to watch out for but it's probably your best solution.
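
                          One way to guard against that is to check a generated sitemap for dead entries before submitting it. The Python sketch below is only an illustration: the function names are invented, it assumes the sitemap uses the sitemaps.org 0.9 namespace, and the status lookup is passed in as a parameter so the example can run offline with a stub.

```python
import urllib.error
import urllib.request
from xml.etree import ElementTree as ET

# Assumption: the sitemap uses the sitemaps.org 0.9 namespace.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def http_status(url, timeout=10):
    """Return the HTTP status of a HEAD request, or None if the host is unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except urllib.error.URLError:
        return None


def find_dead_urls(sitemap_xml, status_of=http_status):
    """Return every <loc> in the sitemap whose status is not 200."""
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.findall("sm:url/sm:loc", NS) if el.text]
    return [url for url in locs if status_of(url) != 200]


# Offline demo: a made-up sitemap and a stub lookup instead of real requests.
sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://www.example.com/a</loc></url>"
    "<url><loc>http://www.example.com/gone</loc></url>"
    "</urlset>"
)
fake_statuses = {
    "http://www.example.com/a": 200,
    "http://www.example.com/gone": 404,
}
dead = find_dead_urls(sample, status_of=fake_statuses.get)
print(dead)
```

                          Calling `find_dead_urls(xml_text)` without the stub would issue real HEAD requests, which is slow on a site this size, so you would probably want to run it on a sample or in batches.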
