Why do my crawl diagnostics show duplicate content?
-
My crawl diagnostics show duplicate content at mysite.com and mysite.com/index.html, which are essentially the same file.
-
"Essentially" the same file isn't the same as "the same file." Your best bet is probably to pick one of them (probably mysite.com) as the preferred URL and point to it with rel=canonical.
-
If the crawl report found those two URLs, then your website has at least one link to each of those URLs (otherwise Rogerbot wouldn't have found them).
You should follow Collin's advice to define the canonical page.
It also won't hurt to figure out where those links are being used in your content, and then make sure you use only one of the two URLs whenever you link to your home page.
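One rough way to hunt down those links is to scan each page's HTML for anchors that resolve to the /index.html variant. The sketch below uses only the Python standard library; the names `LinkCollector` and `links_to_index_variant` and the sample markup are made up for illustration, and it assumes you already have each page's HTML as a string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def links_to_index_variant(html, page_url):
    """Return absolute URLs of links whose path ends in /index.html."""
    collector = LinkCollector()
    collector.feed(html)
    # Resolve relative hrefs against the URL of the page they appear on.
    absolute = [urljoin(page_url, href) for href in collector.hrefs]
    return [url for url in absolute if urlparse(url).path.endswith("/index.html")]


# Hypothetical page content: two links to the home page, one of each form.
sample = (
    '<a href="/">Home</a> '
    '<a href="/index.html">Home</a> '
    '<a href="about.html">About</a>'
)
print(links_to_index_variant(sample, "http://mysite.com/about.html"))
```

Any URL this returns is a link you would probably want to rewrite to the bare mysite.com/ form.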
Cheers
Michel
-
mysite.com is a domain, not a file; mysite.com/index.html is the home page. I'm not sure how I would do what you suggest.
-
Michel is right - Google doesn't care that they're one template - if both URLs are being crawled, then they'll see that as two "pages". Every unique, crawlable URL can become an indexed page. That's why duplicate content problems are so common.
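The good news is that you can put a canonical tag on just the one template/file and it will cover all of the paths/URLs that land on that file. The tag goes in your <head> section and looks like this (with your preferred URL as the href):
<link rel="canonical" href="http://mysite.com/" />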
I'd check the internal links, though, and see if you're linking to both versions. It's best to use one, consistent URL in your internal links for any given page.
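If you want to confirm the canonical tag is actually being served, one option is to pull the HTML for both URLs and check that each declares the same canonical href. This is only a sketch using the Python standard library; `CanonicalFinder` and `find_canonical` are hypothetical names, and the sample page and its http://mysite.com/ href are assumptions for illustration.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> it sees."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # rel can hold several space-separated tokens, so split before checking.
        rel = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel and self.canonical is None:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical URL declared in the HTML, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical markup, as it might be served at both mysite.com and mysite.com/index.html.
page = (
    '<html><head><link rel="canonical" href="http://mysite.com/" /></head>'
    "<body></body></html>"
)
print(find_canonical(page))
```

Run it against the HTML returned by each URL; if both report the same canonical href, the one tag on the shared template is covering both paths.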