How can I get a listing of just the URLs that are indexed in Google?
-
As Google will not show you everything, even using the site command, I use Yahoo SiteExplorer:
http://siteexplorer.search.yahoo.com/search?p=seomoz.org&bwm=i&bwmo=d&bwmf=s
and wrote a PHP script to take the TSV it exports and create a line for each page. I could probably make that available on one of my sites.
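
For anyone who wants to try the same idea without the PHP script, here is a minimal sketch in Python of pulling one URL per line out of a TSV export. The column name "URL" and the sample rows are assumptions for illustration only; a real export may label its columns differently, so adjust `url_column` to match.

```python
import csv
import io


def urls_from_tsv(tsv_text, url_column="URL"):
    """Return the non-empty values of one column from tab-separated text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    urls = []
    for row in reader:
        # Missing or empty cells are skipped rather than emitted as blank lines.
        url = (row.get(url_column) or "").strip()
        if url:
            urls.append(url)
    return urls


# Hypothetical sample data; the header names are assumed, not taken from a real export.
sample = (
    "Title\tURL\n"
    "Home\thttp://www.example.com/\n"
    "About\thttp://www.example.com/about\n"
    "Broken row\t\n"
)

for url in urls_from_tsv(sample):
    print(url)
```

To use it on a downloaded file, read the file's contents into a string and pass that in place of `sample`.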
-
Thanks, that let me grab the first 1000.
-
You're welcome. If that fully answered your question, please mark it as answered.

-
Ooh, that would be great to let others use. Maybe even a YOUmoz post?
-
A bit of a teaser... our new Firefox toolbar, which is coming out soon, will be able to download the page of SERPs from the SERP overlay.

-
Well here it is for those paying attention to this thread:
http://www.stevenferrino.com/scripts/redirect-parser.php
I'm not sure if posting a link will work, since they tend not to for me, but you can always copy and paste.
I'm considering the YOUmoz addition and have already sent you an email, Jennifer.
-
It didn't fully answer the question because I was only able to get the first 1000 URLs. I need to get the entire list.
-
If you import the TSV into Excel, you will get a column of just the URLs.

-
OK, you haven't stated how big the site is. As I already stated, Google will not show you everything it has in its index. Yahoo will give you 1000 URLs, SEOmoz might have additional ones, and you should also check your Google Webmaster Tools (if you have that set up).
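
Since no single source gives the whole list, one way to get closer is to combine the partial lists from each source and drop the duplicates. A rough sketch in Python follows; the normalization rules (lowercasing the scheme and host, treating an empty path as "/", dropping fragments) are my own assumptions about what counts as "the same URL", and the sample lists are made up.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to a comparable form (assumed rules, adjust as needed)."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    # The fragment (#...) is dropped because it never reaches the server.
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def merge_url_lists(*lists):
    """Merge several URL lists, keeping first-seen order and removing duplicates."""
    seen = set()
    merged = []
    for urls in lists:
        for url in urls:
            if not url.strip():
                continue
            key = normalize(url)
            if key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


# Hypothetical lists standing in for exports from two different tools.
from_yahoo = ["http://Example.com", "http://example.com/a"]
from_webmaster_tools = ["http://example.com/", "http://example.com/b#top"]

for url in merge_url_lists(from_yahoo, from_webmaster_tools):
    print(url)
```

This will not find URLs that none of the sources report, so it narrows the gap rather than guaranteeing the entire list.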
The second thing to keep in mind is incoming links from other places. It sounds like there was no housekeeping before the restructure, so I would keep an eye on the web server logs, analytics, etc. and add 301 redirects for any other requested URLs that no longer exist.
It's not just about Google; it's also about the user experience. Landing on a non-existent page can give visitors the impression that whatever they are looking for is no longer mentioned on your website, which potentially loses customers.
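
As a rough illustration of the log check described above, the Python sketch below counts requests that returned a 404 so the most-requested missing pages can be redirected first. It assumes the server writes Common or Combined Log Format lines; the sample lines, IP addresses, and paths are invented for the example.

```python
import re
from collections import Counter

# Matches the quoted request and the status code that follows it,
# e.g. "GET /old-page.html HTTP/1.1" 404 512
REQUEST_RE = re.compile(r'"[A-Z]+ (\S+) [^"]*" (\d{3}) ')


def missing_paths(log_lines):
    """Return (path, hits) pairs for requests answered with 404, most hits first."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2) == "404":
            counts[match.group(1)] += 1
    return counts.most_common()


# Hypothetical log lines in Common Log Format.
sample_log = [
    '192.0.2.1 - - [10/Oct/2010:13:55:36 -0700] "GET /old-page.html HTTP/1.1" 404 512',
    '192.0.2.2 - - [10/Oct/2010:13:56:01 -0700] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.3 - - [10/Oct/2010:13:57:12 -0700] "GET /gone HTTP/1.1" 404 512',
    '192.0.2.4 - - [10/Oct/2010:13:58:40 -0700] "GET /old-page.html HTTP/1.1" 404 512',
]

for path, hits in missing_paths(sample_log):
    print(hits, path)
```

Each path it reports is a candidate for a 301; which new URL it should point to still has to be decided by hand.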
-
This question still remains unanswered. Why did it get marked as answered?