404 Errors for Form-Generated Pages - Noindex, Nofollow, or 301 Redirect?
-
Hi there
I wonder if someone can help me out and provide the best solution for a problem with form generated pages.
I have blocked the search results pages from being indexed by using the 'noindex' tag, and I wondered if I should take the same approach for the following pages. I have seen a huge increase in 404 errors since the new site structure went live and forms started being filled in. This is because every time a form is filled in, it generates a new URL, which only Google Search Console is reporting as a 404.
Whilst some 404s can be explained and resolved, I wondered what is best to prevent Google from crawling these pages, such as this one: mydomain.com/webapp/wcs/stores/servlet/TopCategoriesDisplay?langId=-1&storeId=90&catalogId=1008&homePage=Y
Implement a 301 redirect using rules, so that all these pages redirect to the homepage. Whilst in theory this will protect any linked-to pages, it does not resolve the issue of why GSC is recording 404s in the first place. It could also come across to Google as 100,000+ redirected links, which might look spammy.
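If you did go the redirect route, a server-side rule could look something like this (a minimal .htaccess sketch for Apache's mod_rewrite; the path is taken from the example URL above, and you would need to adapt it to your actual server setup):

```apache
# Sketch only: 301 any request for the TopCategoriesDisplay servlet
# (with any query string) to the homepage.
RewriteEngine On
RewriteRule ^webapp/wcs/stores/servlet/TopCategoriesDisplay$ /? [R=301,L]
```

The trailing `?` in the rewrite target drops the original query string, so the redirect lands on a clean homepage URL rather than carrying the parameters across.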
Place a noindex tag on these pages too, so they will not get picked up, in the same way the search result pages are not being indexed.
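For reference, the noindex option is just a meta tag placed in the head of each generated page. One caveat: Google has to be able to crawl the page to see the tag, so this option shouldn't be combined with a robots.txt block on the same URLs.

```html
<!-- Placed in the <head> of each form-generated page -->
<meta name="robots" content="noindex">
```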
Block in robots.txt - this will prevent any 'result' pages from being crawled, which will free up the crawl time currently being taken up. However, I'm not entirely sure if the block will be possible? I would need to block anything after domain/webapp/wcs/stores/servlet/TopCategoriesDisplay?. Hopefully this is possible?
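This kind of block should be possible, since robots.txt Disallow rules are prefix matches. A sketch, assuming the path from the example URL above (dropping the trailing ? also catches the servlet URL when it has no parameters at all):

```
User-agent: *
Disallow: /webapp/wcs/stores/servlet/TopCategoriesDisplay
```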
The noindex tag will take time to set up, as it needs to be scheduled in with the development team, but the robots.txt will be a quicker fix as this can be done in GSC.
I really appreciate any feedback on this one.
Many thanks
-
Hi Ric,
I believe your first step would be blocking via robots.txt, with something along the lines of:
Disallow: /webapp/wcs/stores/servlet/TopCategoriesDisplay?
But I think you are mistaken that you can make this change within GSC. You can test in GSC, but this doesn't change anything on your site. You will still have to reach out to a dev to get this change complete.
Out of curiosity, are these 404s being marked as soft 404s?
-
Hi - thank you for your response. Apologies, I meant test in GSC.
To answer your question, these are not soft 404s.
Many thanks
-
Hi Ric,
That makes sense. So do these pages return a non-404 when reached from a search, but direct traffic results in a 404? Or are these 404s only appearing in GSC?
Did the robots.txt blocking work out? Are any of these URLs mentioned in the sitemap.xml? Have you tried crawling the site with a crawler like Screaming Frog to see if they surface in that? If they do, you might need to approach your search results a different way.
-
Hi there
I wonder if you would still be able to help. The number of 404s is increasing significantly, and the majority only appear in GSC. The reason I think this could be search-URL related is that these are increasing significantly every day.
The robots.txt has blocked some, but as the number continues to increase I am thinking there could be a few reasons, which I need to look into more.
A Siteliner report cannot crawl the site due to 'too many redirections for this URL'. This is one reason why I suspect there is a wider issue to investigate with the HTTPS/HTTP setup.
Moz and Screaming Frog are recording some errors (which we expected and need to resolve), but in the hundreds, compared to the thousands recorded in GSC.
Any other ideas / suggestions would be appreciated.
Many thanks