Posts made by willcritchlow
-
RE: Forced to remove Categories with high volume & revenue
Very hard to prove these things before they're done - good luck with getting buy-in for what you need to do and in undoing the worst of the damage.
-
RE: Google has discovered a URL but won't index it?
You're not the only person reporting odd indexation behaviour here on Q&A (see for example this question). And, just as I found for that question, your site appears to have more pages indexed in Bing than in Google - which at least suggests we haven't missed something obvious like a meta noindex or similar.
I did also see that Google reported issues with the site: command (link), but I don't think that can have anything to do with your situation: they say that issue is now fixed, and I couldn't find the missing pages even with non-site: searches (i.e. those pages do genuinely appear to be absent from the index).
While I am loath to point straight at links these days, I do wonder whether this is simply a case of the site needing more authority before it is seen as big enough and important enough to justify having more pages in the index.
-
RE: Need Advice on Categorizing Posts, Using Topics, Site Navigation & Structure
It sounds as though you should be OK in that case - if they are all at site.com/post, then it shouldn't matter how many categories they appear in.
In theory you can have both Topics and Categories - it all depends on how the site is set up - but without knowing the site and all of its considerations inside out, my best guess is that it's better to focus your efforts on one.
Good luck.
-
RE: Over 40+ pages have been removed from the indexed and this page has been selected as the google preferred canonical.
It seems like Google only has a handful of distinct pages indexed at the moment, whereas Bing has about 10x as many. So that seems to indicate something wrong specifically for Google.
I'd start by checking Search Console - are there any errors? If you use the URL inspection tool on the URL in question (studyplaces.com/15-best-minors-for-business-majors/), does it tell you why it has been canonicalised? What happens if you view the page as Google saw it? Is there any chance that you are blocking googlebot or cloaking in any way? Have you had any website outages or downtime?
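If it helps, here's a rough first-pass check for the blocking / cloaking question - a sketch only, assuming Python with the requests library. A spoofed user agent can't reproduce everything Google sees (many setups verify googlebot by IP), but big differences between the two responses would be worth digging into:

```python
# Rough first-pass cloaking / blocking check. Not definitive: sites that verify
# googlebot by IP will treat this spoofed user agent like any other visitor.
import requests

URL = "https://studyplaces.com/15-best-minors-for-business-majors/"  # assuming https

USER_AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "googlebot-like": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}

for name, ua in USER_AGENTS.items():
    resp = requests.get(URL, headers={"User-Agent": ua}, allow_redirects=True, timeout=30)
    print(f"{name}: status={resp.status_code}, final_url={resp.url}, bytes={len(resp.content)}")
```

If the status codes, final URLs, or response sizes differ markedly between the two, that points at user-agent-based blocking or cloaking.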
As others noted, your about page is now missing - did you do that deliberately to see if it resolved this issue?
-
RE: Why does the order of the keywords affect my SERP? And what can I do to improve?
The answer Google would like you to believe is that it's possible for the word ordering to imply different intents. In reality though, I think it's mainly an artefact of them not fully understanding meaning, or not being able to classify all pages and keywords perfectly.
My colleague Sam Nemzer wrote a post with research on this topic that you might find interesting.
-
RE: Website homepage temporarily getting removed from google index
This sounds tricky - it appears to be indexed for me at the moment.
Some things to check:
- Server monitoring - is the website up and available at all times (including e.g. robots.txt)?
- Server logs - are you serving any status code other than 200 to googlebot at any point?
- Search Console - are there any errors recorded, or if you go to the URL inspector during an outage, has it canonicalised the homepage to any other page somehow?
- I noticed a heavy reliance on JavaScript - and some of the text that appears on the homepage doesn't return the homepage when I search for it in quotes (e.g. "Not just any old snack box, SnackMagic lets everyone customize") - so I'd check for JS rendering issues for googlebot as well; there's a quick raw-HTML check sketched after this list
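To make that last check concrete, here's a minimal sketch (assuming Python with the requests library; the homepage URL is a placeholder you'd swap for your own, and the phrase is the one quoted above). If the phrase is visible in a browser but absent from the raw HTML, it is almost certainly being injected by JavaScript, which is worth investigating for googlebot:

```python
# Minimal availability + raw-HTML check. HOMEPAGE is a placeholder (assumption);
# PHRASE is text you expect to be indexable on the homepage.
import requests

HOMEPAGE = "https://www.example.com/"
PHRASE = "Not just any old snack box, SnackMagic lets everyone customize"

# Is the homepage (and robots.txt) consistently returning 200?
for path in ("", "robots.txt"):
    r = requests.get(HOMEPAGE + path, timeout=30)
    print(f"{HOMEPAGE + path} -> {r.status_code}")

# Does the key phrase appear in the raw, unrendered HTML?
raw_html = requests.get(HOMEPAGE, timeout=30).text
print("Phrase present in raw (unrendered) HTML:", PHRASE.lower() in raw_html.lower())
```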
Hope something there helps.
-
RE: Need Advice on Categorizing Posts, Using Topics, Site Navigation & Structure
Hi there,
The answer to this depends a little on how your CMS / website treats content that appears in multiple categories. The "correct" way for this to work is for those pages to be accessible at only one URL (i.e. not at both /category1/slug and /category2/slug). This is normally achieved either by having a primary category that appears in the URL (with the content also listed on the category2 page, but without category2 in its URL) or by having the content available at /slug and listed on both the category1 and category2 pages.
Assuming this is the case, then it's perfectly safe to have content appear in more than one category.
If not, you could investigate whether you can add a rel=canonical link from the secondary-category versions to the primary-category version. This would be OK, but you might want to limit it to content that really needs to be in both categories; otherwise, you may waste crawl budget, depending on the scale of your site.
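If you want to sanity-check whichever setup you end up with, a quick sketch along these lines would do it (the URLs are hypothetical, it assumes Python with the requests library, and it only looks for a canonical tag in the HTML, not in HTTP headers):

```python
# Check that the same post, reachable under two category paths, declares a single
# canonical URL. The URLs below are hypothetical examples.
import re
import requests

URLS = [
    "https://www.example.com/category1/my-post/",
    "https://www.example.com/category2/my-post/",
]

for url in URLS:
    html = requests.get(url, timeout=30).text
    canonical = None
    # Find <link> tags, keep the href of any that declare rel=canonical
    for tag in re.findall(r"<link\b[^>]*>", html, re.I):
        if re.search(r'rel=["\']?canonical["\']?', tag, re.I):
            href = re.search(r'href=["\']([^"\']+)["\']', tag)
            canonical = href.group(1) if href else None
    print(url, "->", canonical or "no canonical tag found")
```

If both URLs report the same canonical (or the content only ever resolves to one URL in the first place), you're in the safe setup described above.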
I actually wrote a post about the difference between URL structure and site architecture that you might find useful.
-
RE: Page with "random" content
Hi Jeroen,
Many websites have category or listings pages that contain substantially different lists of links each time Google crawls them. This can be because they are rotating the top listings (like you describe) or simply because the velocity of content creation (and in some cases archiving / removal) is high enough that it appears to change dramatically (think e.g. the reddit "new" page).
As such, I don't think you need to do anything particularly special here - it should "just work" for the page in question. Depending on the details, you might want to make sure there is enough other content on the page for it to stand as substantial in its own right.
The other thing I'd consider is whether you want to have more static crawl-paths available to make sure that googlebot always has a way of discovering and crawling all listings - whether you do this via categories, tags, or via some other means.
-
RE: Hi anyone please help I use this code but now getting 404 error. please help.
In addition to the advice and tips you have already received here (in general: be super careful with .htaccess / httpd.conf files, and revert to a previous version if you see unexpected behaviour), one extra tip is to turn on logging while you debug the problem.
-
RE: Redirecting 2 established websites to 1 new one.
Note that, in addition to what others have said here, consolidating two websites in the same industry can often result in less traffic than the two were receiving in total beforehand. This is because both sites may have been ranking for some of the same queries before the merge; if the merge doesn't move the surviving result up (because the result above it is significantly more powerful), the net effect for that query is simply to remove one result you own from the search results page.
Just a word of caution as you model this impact.
-
RE: Google has discovered a URL but won't index it?
Hi Daniel. Did you get this resolved / did it resolve itself? I'd happily take a look if you'd like if not - just let me know the URL.
-
RE: Category text not present in mobile version
You're right that the best plan is likely to be adding that content to the mobile versions of the category pages (though it's worth rolling it out slowly and carefully if you can't split test, because we have seen it be good or bad in different circumstances - see this Whiteboard Friday for example).
In theory, with mobile-first indexing, Google will be crawling your site with a mobile user agent, and so as long as you are treating googlebot the same as you treat other similar user agents, it should see the page exactly as you do when you visit with a mobile browser (or emulate mobile using Chrome, for example).
There are various ways to check different parts of this:
- Check what is actually indexed - by viewing the cached version of the page and / or searching for unique text that only appears on a specific category page "in quotes"
- Check what Google sees by using the URL inspection tool in Search Console and selecting "view crawled page" (there's a rough approximation of this sketched after this list)
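As a very rough approximation of the second check, you could request the page with a smartphone crawler-style user agent and look for the category text in the raw HTML (a sketch only, assuming Python with the requests library - the URL and the text snippet are placeholders, and this won't render JavaScript or reproduce IP-verified googlebot behaviour):

```python
# Quick approximation: does the category text come back in the raw HTML when the
# page is requested with a smartphone crawler-style user agent? Placeholders below.
import requests

CATEGORY_URL = "https://www.example.com/category/widgets/"
CATEGORY_TEXT = "a unique sentence from your category description"

SMARTPHONE_UA = (
    "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 "
    "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)

resp = requests.get(CATEGORY_URL, headers={"User-Agent": SMARTPHONE_UA}, timeout=30)
print("Status:", resp.status_code)
print("Category text in raw HTML:", CATEGORY_TEXT.lower() in resp.text.lower())
```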
Good luck - I hope that helps.
-
RE: Forced to remove Categories with high volume & revenue
Hi Frankie,
Sorry for the slow reply to this one. I hope it's still relevant to offer some thoughts.
First, at the top level, I would say that the stated reasons don't necessarily mean you can't have the kinds of pages you describe. My first preference would be to modify the functionality so that the filters you describe users actually using become those sub-category pages. Even if this meant changing URLs (and hence 301 redirecting the pages you currently have), it is possible for filter / facet pages to be indexable with unique URLs and meta information.
If that's not possible for whatever reason, I would separate my efforts into the micro and the macro:
- Micro: apply an 80:20 or 90:10 rule to the pages you are losing - find the small number of highest-traffic / highest-converting pages and find a way to keep versions of them (again, even if you have to 301 redirect them, you could create them as static content pages targeting those keywords if you had to); there's a rough way to find that cut-off from analytics data sketched after this list
- Macro: where you simply have no choice but to lose these pages, I think your best bet will be to redirect them to the absolutely best (/ next best!) page on the site for those queries - these might be other (sub-)category pages or they might be individual products or content pages, but at least for the highest traffic end, it'd be worth specific research effort to identify the best redirect targets
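For the micro step, something like this sketch would find that cut-off (it assumes a hypothetical CSV export of pages and sessions from your analytics - the filename and column names are made up):

```python
# Find the small set of pages that account for ~90% of sessions.
# "category_page_traffic.csv" with columns "page" and "sessions" is a hypothetical export.
import csv

THRESHOLD = 0.90

with open("category_page_traffic.csv", newline="") as f:
    rows = [(row["page"], int(row["sessions"])) for row in csv.DictReader(f)]

rows.sort(key=lambda r: r[1], reverse=True)
total = sum(sessions for _, sessions in rows) or 1

running = 0
keep = []
for page, sessions in rows:
    running += sessions
    keep.append(page)
    if running / total >= THRESHOLD:
        break

print(f"{len(keep)} of {len(rows)} pages account for {THRESHOLD:.0%} of sessions")
for page in keep:
    print(" -", page)
```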
One final thought: it's not always the case that the URL has to represent every level in the hierarchy. I don't know your underlying technology, but it might be possible to recreate some of these sub-categories as top-level categories if products are allowed by your CMS to be in more than one category at once. I wrote this article about the difference between URL structures and site architecture that might give more clarity on what I mean here.
-
RE: Got hit by Spammy Structured Markup Penalty. Need Help
Hi Rashmi,
In my experience, the normal extent of a spammy structured markup penalty is the removal of the SERP features associated with that markup - and if, as is common, the remedy is to remove the offending markup, you don't get the SERP features back either, so there often isn't much of a "recovery" possible in this kind of situation.
What kind of symptoms are you seeing / how do you know you have an ongoing structured markup penalty?
I don't know of any situation where a legitimate penalty persists even after you have removed all structured markup, so I suspect there must be something else going on: either the situation is resolved but the Search Console message remains (noting that if you have removed the markup, you've probably lost the rich snippets as well), or the issue that remains is unrelated to structured data.
Can you share more about the symptoms / notices / communications you have had with the Google team? Thanks!
-
RE: Captcha wall to access content and cloaking sanction
In general, Google cares only about cloaking in the sense of treating their crawler differently to human visitors - it's not a problem to treat them differently to other crawlers.
So: if you are tracking the "2 pages visited" limit using cookies (which I assume you must be - there is no other reliable way to know the 2nd request is from the same user), then you can treat googlebot exactly the same as human users: every googlebot request is stateless (it doesn't carry cookies), and so it will always be able to crawl. You can then treat non-googlebot scrapers more strictly, and rate limit / throttle / deny them as you wish.
I think that if real human users get at least one "free" visit, then you are probably OK - but you may want to consider not showing the recaptcha to real human users coming from Google (though you could find yourself in an arms race with scrapers pretending to be human visitors from Google).
In general, I would expect that if it's a recaptcha ("prove you are human") step rather than a paywall / registration wall, you will likely be OK in the situation where (there's a rough sketch of this logic after the list):
- Googlebot is never shown the recaptcha
- Other scrapers are aggressively blocked
- Human visitors get at least one page without a recaptcha wall
- Human visitors can visit more pages after completing a recaptcha (but without paying / registering)
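If it's useful, here's a rough sketch of that decision logic (framework-agnostic Python with made-up function names; note that I've used Google's documented reverse-and-forward DNS lookup to verify googlebot rather than trusting the user-agent string - that verification step is my addition, not something from your description):

```python
# Sketch only. Verifies googlebot via reverse DNS + forward confirmation (as Google
# documents), then applies the "one free page, then recaptcha" rule to everyone else.
import socket

def is_verified_googlebot(ip: str) -> bool:
    """Reverse-DNS the IP, check it resolves to a Google hostname, then forward-confirm it."""
    try:
        host = socket.gethostbyaddr(ip)[0]
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        return ip in socket.gethostbyname_ex(host)[2]
    except (socket.herror, socket.gaierror):
        return False

FREE_PAGES = 1  # how many pages a human gets before the recaptcha wall

def should_show_recaptcha(ip: str, pages_viewed_this_session: int) -> bool:
    if is_verified_googlebot(ip):
        return False  # never challenge verified googlebot (its requests are stateless anyway)
    return pages_viewed_this_session >= FREE_PAGES
```

(In practice you'd cache the DNS verification result per IP, and you could treat other crawlers you want to allow in the same way.)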
Hope that all helps. Good luck!
-
RE: Mind Boggling GMB Merge Issue
Ugh. Yeah - sorry - no more bright ideas forthcoming from our side. I think you have a clear-eyed view of the risks and difficulties of the different options. Sorry I don't have anything more substantial for you. Good luck!
-
RE: Mind Boggling GMB Merge Issue
Firstly - wow - after speaking to a few folks here (at Distilled), we're surprised that you've had such an honest (but useless) answer from the GMB team.
I'm going to continue asking around to see if anyone has any genuinely bright or authoritative ideas for you, but on a first pass, if I were in your shoes, I would go back to the people you've already escalated to and keep trying to escalate further / get them to look into it with even more technical people. You can describe what you are continuing to see, as you did here, and hopefully that will help them debug. This feels by far the safest option at this stage.
I'll come back to you if anyone comes up with any better ideas though.
-
RE: I'm looking for a bulk way to take off from the Google search results over 600 old and inexisting pages?
What is the business issue this is causing? Are you seeing these 404 / 410 pages appearing in actual searches?
If it's just that they remain technically indexed, I'd be tempted not to be too worried about it - they will drop out eventually.
Unfortunately, most of the ways to get pages (re-)indexed are only appropriate for real pages that you want to remain in the index (e.g. including them in a new sitemap file and submitting that), or work best for individual pages, which has the same downside as removing them one by one via Search Console.
If the removed pages are grouped neatly into folders, you can remove whole folders at a time via Search Console, which might speed things up.
Otherwise, I would probably consider prioritising the list (using data about which are getting visits or visibility in search) and removing as many as you can be bothered to work through.
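To see whether the folder route is viable, a quick sketch like this would group the removed URLs by their first path segment (it assumes a plain text file with one URL per line - the filename is made up):

```python
# Group removed URLs by first path segment to see whether a handful of folder-level
# removal requests would cover most of them. "removed_urls.txt" is a hypothetical file.
from collections import Counter
from urllib.parse import urlparse

with open("removed_urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

folders = Counter()
for url in urls:
    path = urlparse(url).path.strip("/")
    first_segment = path.split("/")[0] if path else "(root)"
    folders["/" + first_segment + "/"] += 1

for folder, count in folders.most_common():
    print(f"{count:5d}  {folder}")
```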
Hope that helps.
-
RE: How does ARIA-hidden text appear to search engines
Unless I've misunderstood, I'm not sure that aria-hidden is going to be able to deliver what you are looking to do - I don't think you can use it to hide the alt attribute of the image without hiding the image as well.
If you mean adding non-alt-attribute text to the page so that it is visible to sighted users, I would expect that it would make sense to keep that accessible to screen readers as well - it should be useful to all kinds of site visitor, I would have thought.
In general, I would tend to suggest that alt attributes should primarily be used for their intended accessibility purpose, and that writing them that way tends to add more valuable content to the page, which the search engines may also find useful. I found this guide to be one of the best I have seen on the subject.
As a sidenote, I tend to think alt attributes are over-rated for SEO purposes anyway. In our testing, we have not yet detected a statistically significant uplift from adding alt attributes to images that did not previously have them.
Good luck!