The URL Keri posted has a lot of helpful info, but it does not answer Luke's original question. I am very curious about such a tool as well.
Here is my research on the subject:
There are numerous sites offering to check 20-50 URLs for index status (search for "bulk indexation checker"), but none that can check the thousands of pages on a typical enterprise site.
Why not just trust WMT for the list? A) Large, complex sites have too many pages excluded via robots.txt or noindex tags, so important pages can slip through the cracks. More importantly, B) I do not trust WMT data.
Why not just use Screaming Frog, IIS, SEOTools for Excel, Xenu, or another site crawler to check whether a cached copy exists? It seems simple: plug in a list of URLs like http://webcache.googleusercontent.com/search?q=cac... and report which ones return 404 errors. Unfortunately, Google dislikes this and starts forcing CAPTCHAs after 30 or so requests.
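For anyone curious what that approach looks like in practice, here is a minimal sketch (Python, using the requests library; the example.com URLs are placeholders, and the webcache URL format is assumed from the one above). It works for the first few dozen requests and then hits exactly the CAPTCHA wall described above:

```python
# Rough sketch of the cache-check approach.
# Assumes the webcache URL format shown above; Google will start serving
# CAPTCHAs after a few dozen requests, which is exactly the problem.
import time
import requests

def check_cached(urls, delay=2.0):
    """Return a dict of url -> True/False for whether a cached copy exists."""
    results = {}
    for url in urls:
        cache_url = "http://webcache.googleusercontent.com/search?q=cache:" + url
        resp = requests.get(cache_url, timeout=10)
        # 404 means no cached copy; 200 means one exists.
        # Anything else (e.g. a CAPTCHA page) means we have been throttled.
        results[url] = (resp.status_code == 200)
        time.sleep(delay)  # being polite only delays the CAPTCHA, it does not avoid it
    return results

if __name__ == "__main__":
    pages = ["www.example.com/page-1", "www.example.com/page-2"]
    for url, cached in check_cached(pages).items():
        print(url, "cached" if cached else "not cached")
```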
After much research, I have uncovered only two ways to check large numbers of URLs: 1) Scrapebox, with rotating proxies
2) Using the Moz toolbar with this search: https://www.google.com/search?q=site:www.zappos.co... Export the SERP to CSV using the Mozbar, then repeat for the next 200 results: https://www.google.com/search?q=site:www.zappos.co... Merge and de-dupe (a rough sketch of that step follows below).
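Here is a quick sketch of the merge-and-purge step for method 2 (Python). The "URL" column name is an assumption; adjust it to whatever the actual Mozbar export uses:

```python
# Merge every Mozbar SERP export into one de-duplicated set of indexed URLs.
# Assumes each CSV has a column named "URL" holding the result URL.
import csv
import glob

def merge_exports(pattern="mozbar_export_*.csv"):
    """Combine all exports matching the pattern into one de-duplicated URL set."""
    indexed = set()
    for path in glob.glob(pattern):
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                url = row.get("URL", "").strip()
                if url:
                    indexed.add(url)
    return indexed

if __name__ == "__main__":
    urls = merge_exports()
    print(f"{len(urls)} unique indexed URLs found across all exports")
```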
The idea would be to use either method to generate a list of indexed URLs, then check it against a site crawl or sitemap file to see whether all important pages are indexed.
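That comparison step is straightforward once you have the indexed-URL list. A sketch, assuming a standard single-file sitemap.xml (a sitemap index file would need one extra loop):

```python
# Compare the sitemap against the indexed-URL list and report gaps.
# Assumes a standard single-file sitemap using the sitemaps.org namespace.
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(path="sitemap.xml"):
    """Return the set of <loc> URLs in a standard sitemap file."""
    tree = ET.parse(path)
    return {loc.text.strip() for loc in tree.iter(SITEMAP_NS + "loc") if loc.text}

def missing_pages(sitemap_path, indexed_urls):
    """URLs present in the sitemap that never appeared in the indexed list."""
    return sorted(sitemap_urls(sitemap_path) - set(indexed_urls))

if __name__ == "__main__":
    indexed = {"http://www.example.com/page-1"}  # e.g. the merged Mozbar exports
    for url in missing_pages("sitemap.xml", indexed):
        print("Not indexed:", url)
```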
QUESTION: Does anyone know of a better method than 1 or 2 above? I'd be happy to share the results of my research with anyone similarly interested.
Thanks,
Carl